Ticket #276 (new defect)

Opened 1 year ago

Last modified 9 months ago

Precise semantics for 'this' propagation

Reported by: lth Assigned to: lth
Type: defect Priority: major
Milestone: Component: Spec
Version: 4 Keywords:
Cc: brendan, dherman, cormac, jeffdyer, graydon, pratapl@microsoft.com, erights@gmail.com

Description (last modified by lth)

We've agreed to change the way this is propagated in calls to lexically nested functions (http://wiki.ecmascript.org/doku.php?id=proposals:bug_fixes, http://wiki.ecmascript.org/doku.php?id=es3.1:inner_functions_and_this). What remains is to agree about some of the details.

There are two cases: function definitions and function expressions.

(1) Definitions: My understanding is that if

  • there is a FunctionDefinition of a function f,
  • that definition is lexically nested inside any other function definition or expression (including methods), and
  • the calling code calls f directly by name as in f(...)

then the value of this is propagated. Examples:

   function g() {
      function f() { return this }
      f()
   }
   function h() {
      function f() { return this }
      function m() { return f() }
      m()
   }

Note that f need not be defined inside its caller: it can be a peer of the caller (as in the second example), a peer of the caller's enclosing function, or indeed one of the functions enclosing the caller. The only requirement is that f is not a top-level function (i.e., defined in Program, Package, or Class scope).
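A minimal sketch of what rule (1) would change, runnable in a current engine. The names obj, g and f are hypothetical, and f.call(this) is only a manual way to emulate the proposed propagation; it is not part of the proposal itself.

```javascript
// Hypothetical example: obj, g and f are illustrative names only.
var obj = {
  g: function () {
    function f() { return this; }
    return {
      plain: f(),               // ES3 semantics: f does not receive obj as `this`
      propagated: f.call(this)  // emulates the proposed propagation by hand
    };
  }
};

var result = obj.g();
console.log(result.plain === obj);      // false
console.log(result.propagated === obj); // true
```

Under the proposal, the plain call f() inside g would behave like the second line.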

(2) Expressions: There are two cases, named and unnamed.

(2a) Named expressions. Consider this code:

   function g() {
      var v = function f() { ... f() ... }
      v()
      var o = { f: v }
      o.f()
   }

Here it seems to me that the call through v should pass the global object as this but that the recursive call to f should probably propagate the value of this, as motivated by the call to o.f. This case is not covered by the three conditions above, so we get a fourth:

  • If f is a named FunctionExpression containing a recursive call to itself by its given name, then that recursive call propagates the value of this.
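A small sketch of the difference the fourth rule would make, assuming today's semantics for the plain recursive call. All names (v, w, f, f2, o) are hypothetical, and f2.call(this, ...) merely emulates the proposed behavior by hand.

```javascript
// Hypothetical example: v, w, f, f2 and o are illustrative names only.
var v = function f(depth) {
  if (depth === 0) { return this; }
  return f(depth - 1);             // plain recursive call: `this` is not passed on today
};
var w = function f2(depth) {
  if (depth === 0) { return this; }
  return f2.call(this, depth - 1); // emulates the proposed fourth rule by hand
};

var o = { f: v, h: w };
console.log(o.f(0) === o); // true: method call, no recursion involved
console.log(o.f(2) === o); // false: the receiver is lost in the recursive calls
console.log(o.h(2) === o); // true
```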

(2b) Unnamed expressions. Consider the following program:

    function g() {
        print( function () { return this } () )
        let x = function () { return this }
        print( x() )
    }

By the previous argument, the call through x should pass the global object as this. The second part of g is a simple refactoring of the first part, so the first part should produce the same result. Therefore direct calls to function expressions (named or unnamed) should not propagate this.
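The refactoring argument can be checked against current behavior with a sketch like the following. The enclosing object o is a hypothetical addition, used only so that there is a receiver that could have been propagated; var stands in for let.

```javascript
// Hypothetical example: o, direct, x and viaX are illustrative names only.
var o = {
  g: function () {
    var direct = (function () { return this; })(); // direct call to a function expression
    var x = function () { return this; };
    var viaX = x();                                // the refactored form
    return { direct: direct, viaX: viaX };
  }
};

var r = o.g();
console.log(r.direct === r.viaX); // true: both forms see the same `this`
console.log(r.direct === o);      // false: neither form receives o
```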

Note that:

  • direct calls to function expressions will be much less common in idiomatic ES4 than in ES3
  • it's not obvious that propagating this to direct calls to function expressions is all that useful in practice

Note also that:

  • in all cases, whether the variable f is mutable or not (or may have been mutated or not) is not taken into account.
  • with does not figure into the equation, because if a variable is looked up and found to be bound by a with object then that object is the this value.
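The remark about with can be illustrated as follows. This is a sketch with hypothetical names; the Function constructor is used only so that the body runs as non-strict code, where with is permitted.

```javascript
// Hypothetical example: o and callInsideWith are illustrative names only.
var o = { f: function () { return this; } };

// Inside `with (o)`, the name f resolves to o.f, and o becomes the `this` value.
var callInsideWith = new Function("o", "with (o) { return f(); }");

console.log(callInsideWith(o) === o); // true
```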

Change History

Changed 1 year ago by brendan

I buy it!

/be

Changed 1 year ago by waldemar

I'd like to see this fleshed out a little more before deciding on what the right course of action is. The meaning of "calling directly by name" is too fuzzy for me to understand the implications of this.

Also, a quick question on 2a: I was under the, perhaps mistaken, impression that "f" was in scope only inside the lambda-expression. How does {f:f} get to the function? Which proposal discusses this?

Changed 1 year ago by waldemar

Some examples of things I don't understand:

  • Is (f)(a, b, c) different from f(a, b, c)?
  • What happens if f is parametrized by a type?
  • In the function h() example, what would happen if you passed m to an outside function that takes a callback and that function in turn called m()?

Changed 1 year ago by waldemar

In the third bullet point above, my question is what value of "this" function f gets.

Changed 1 year ago by lth

  • description changed (in the 2a example, { f: f } corrected to { f: v })

Re 2a, my fault, a typo. Should be { f: v }. Will correct the description.

"Calling directly by name" is meant to match exactly Ident(Expr,...), and yes, the intent was that (f)(a,b,c) is different from f(a,b,c), although I could see how the parser might like to generate the same tree for the two. I could be persuaded to change my opinion here.

If f is parameterized by a type -- good point. This means the pattern needs to include also Ident.<TExpr>(Expr,...).

In the case of h(), since m is not defined by a FunctionDefinition in the scope of the function calling it, the global object would be passed as 'this' to m, and by the rules of this propagation that object would be passed on to f, I think. I think you're probably asking whether the this object originally passed to h is somehow propagated (captured by m, in a way). My thinking is that it is not.

Changed 1 year ago by brendan

Is (f)(a, b, c) different from f(a, b, c)?

No, not in ES3 for this binding, and not in ES4.

/be

Changed 1 year ago by lth

Two more notes.

  • I did not intend for this to apply to functions nested inside methods, thinking that they had this functionality already. But in ActionScript they do not.
  • A couple of people have made comments on the mailing list to the effect that the proposed change does not serve several interesting use cases, which effectively call for the 'this' value to be captured by a closure.

Changed 10 months ago by crock

I propose these rules:

1. The global variable this is bound to the global object.

2. The variable this in global functions is bound to undefined. [This is new behavior, motivated by security concerns.]

3. The variable this in inner functions is bound to this in the outer function. [This is new behavior, motivated by correctness. this is bound and closed like ordinary variables.]

4. When a function is invoked in the method form, the binding of this is overridden, binding to the invoked object.

5. When a function is invoked with apply or call, the binding of this is overridden, binding this to the parameter unless the parameter is undefined. [This is new behavior.]

Changed 10 months ago by brendan

Comments, hope you can make use of them:

A. Item 2 seems like too big a change to sell in a low-migration-cost successor language.

B. Items 2 and 3 are at odds in this sense: this is bound in the outer (global) code. The security concerns motivating the symmetry break should be spelled out with concrete examples, if possible.

C. Items 3 and 4 together are nice, and where not compatible probably fix latent bugs.

D. Item 5 is good for making apply and call usable to invoke a function with its lexically bound this, but in the case of call, there's no point in this since you can just invoke the function directly:

f.call(undefined, a, b, c) // why not just call f(a, b, c)?

For apply, the same cannot be done of course, unless you use a "splat" operator not yet proposed, but discussed on es4-discuss@mozilla.org:

f.apply(undefined, [a,b,c]) // could be f(...[a,b,c])

The advantages of "splat" include (a) works on other callable objects than function objects; (b) composes with new, so you can say

new f(...args) // args is an array of actual params

This makes me want to leave apply alone, to avoid a migration tax per my first comment above (secure dialect is a separate issue), and work on "splat".
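For readers who want to try the equivalences, the "splat" form discussed here matches the spread syntax that later ECMAScript editions (ES2015 onward) standardized; that mapping is an assumption relative to this ticket, which predates it. The names sum3, Point and args are hypothetical.

```javascript
// Hypothetical example: sum3, Point and args are illustrative names only.
function sum3(a, b, c) { return a + b + c; }
function Point(x, y) { this.x = x; this.y = y; }

var args = [1, 2, 3];
var viaApply = sum3.apply(undefined, args); // the apply form
var viaSpread = sum3(...args);              // the "splat" form
var p = new Point(...[4, 5]);               // composes with new, unlike apply

console.log(viaApply, viaSpread); // 6 6
console.log(p.x, p.y);            // 4 5
```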

/be

Changed 10 months ago by allen

Another Alternative Proposal

(modified from earlier email)

This is yet another attempt to informally define a semantics for ES this that 1) respects well known concepts of lexical closures; 2) respects well known concepts of this/self binding for object-oriented languages; 3) introduces minimal breakage/incompatibilities with existing ES code. Except for its treatment of this in global functions, I believe that it is largely equivalent to Crock’s proposal above, but I explain it in slightly different terms.

Informal semantics summary: The symbol “this” is lexically scoped and captured by function closures. When a function closure is invoked via a function invocation expression (i.e. not as a method call) the captured “this” binding is available for use during the invocation. When a function closure is invoked as a method on an object the symbol “this” is bound (or if necessary, rebound) to that object for the duration of that invocation.

Programmer conceptual model: The symbol “this” is lexically scoped and captured by function closures just like any other name. However, when a closure is invoked as a method the binding of “this” is always reset to the object upon which the method was invoked.
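The conceptual model can be approximated in a current engine with a wrapper, as a rough sketch only. The helper withCapturedThis and the names owner, other, getName and detached are hypothetical. The emulation cannot tell a plain call apart from an explicit call/apply that passes the global object or undefined, so it is not a faithful implementation of the proposal.

```javascript
// The global object, obtained from a non-strict function body.
var GLOBAL = new Function("return this")();

// Hypothetical helper: returns a function that uses `captured` as `this` when
// called as a plain function, and the receiver when called as a method.
function withCapturedThis(captured, fn) {
  return function () {
    var calledAsMethod = this !== undefined && this !== null && this !== GLOBAL;
    return fn.apply(calledAsMethod ? this : captured, arguments);
  };
}

var owner = { name: "owner" };
var getName = withCapturedThis(owner, function () { return this.name; });
var other = { name: "other", getName: getName };
var detached = other.getName;

console.log(getName());       // "owner": plain call uses the captured binding
console.log(other.getName()); // "other": method call rebinds `this`
console.log(detached());      // "owner": the captured binding again
```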

Rules and special cases:

1. The global variable “this” is a constant binding to the global object.
2. The symbol “this” is always treated as a constant binding; it may not be the target of an assignment operator.
3. The symbol “this” may not be explicitly bound as the name of a function, variable, or formal parameter.
4. An object initialiser does not introduce a new lexical binding of “this”. The binding of “this” within the initialization expression for an object initialiser property is its current lexical binding.
5. The invocation of a constructor function by the new operator is considered to be a method invocation on the newly created object. Within such an invocation “this” is bound to the new object.
6. The apply and call functions rebind “this” for the invoked function using either ES3 rules (more compatible) or Crock’s suggested handling of undefined as first argument (less compatible).
7. Rules for ES4 declarative constructs need to be further defined but should generally apply these same principles, taking into account ES4's additional lexical scoping constructs.

Note on object initialisers.

Rule 4 is compatible with ES3. It does not solve the problem that there is no declarative way for an inner initialiser to capture a reference to the object created by its enclosing initialiser, or for a property initialization expression to reference the object being created by its containing object initialiser.

Clarifying examples, important use cases, etc:

Global this binding, lexical this capture, method invocation this rebinding

var foo = "global foo";  //define global variable foo
function getFoo () {return this.foo} // lexically captures global this binding
var obj = {foo: "obj member foo"}  // define “instance variable” foo
obj.getFoo= getFoo;  //binding global definition of getFoo as a method
getFoo();             //should return “global foo”
obj.getFoo();     //should return “obj member foo”
(obj.getFoo)();  //should return “global foo” --  this is not a method invocation;  FF2 AND IE7 CURRENTLY RETURN “obj member foo”
var reboundGetFoo = obj.getFoo;
reboundGetFoo();  // should return “global foo” – using the original captured binding of “this”;  FF2 AND IE7 CURRENTLY DO THIS (by dynamically binding this?)
(function (f) {return f()}) (obj.getFoo) // should return “global foo” – IE7 DOES THIS; FF2 RETURNS undefined

Rebinding this on method delegation

var proto=  {baz: "proto member baz", getBaz: function (){return this.baz}}
proto.getBaz() ;   //should return “proto member baz"
function Bazer(name) {this.baz=name; return this};
Bazer.prototype=proto;
var baz1= new Bazer("baz1 member baz ");
baz1.getBaz();  // should return “baz1 member baz”; this reference in getBaz rebound by method invocation
(baz1.getBaz)();  //should return “proto member baz”  using lexically captured “this”; FF2 AND IE7 RETURN “baz1 member baz”
var fgb = baz1.getBaz;
fgb();  //should return “proto member baz” using lexically captured “this” FF2 AND IE7 RETURN undefined because “this” not captured by closure
var baz = "global variable baz"
fgb(); //should return “proto member baz” using lexically captured “this” FF2 AND IE7 RETURN "global variable baz"  because they dynamically scope “this” on invocation

Changed 9 months ago by MILLEM-GOO

  • cc changed from brendan,dherman,cormac,jeffdyer,graydon,pratapl@microsoft.com, to brendan, dherman, cormac, jeffdyer, graydon, pratapl@microsoft.com, erights@gmail.com

Alternate suggestion I just discussed with Crock and Pratap:

ES3.1, like ES4, should behave differently depending on whether "strict mode" is enabled. Strict mode is opt-in. When someone enables strict mode, we should assume they'd prefer that broken programs fail fast and visibly, but well behaved ES3 programs should continue working. If a developer desires that all legacy programs on a page continue to work no matter how broken they are, then they shouldn't opt-in to strict mode on that page.

Referring to Crock's numbering first

Regarding "this" binding, in strict mode:

  1. As with Crock's (1) which agrees with ES3.
  2. When a function is invoked as a function "f(...)", its "this" is always bound to undefined. This is like Crock's (2) but applies to all functions.
  3. When the programmer intends that a function lexically capture the outer "this", ES3.1 (and ES3 in practice) already provides two means to do so:

var that = this; function(){...that...}

and

function(){...this...}.bind(this)

The second form is better behaved, clearer, and smaller; and so should be encouraged.

  4. As with Crock's (4), which agrees with ES3.
  5. When invoked by "call" or "apply" (or indirectly by "bind") "this" is bound to the first argument of the call, apply, or bind, no matter what that value is.

Without strict mode on, (1,3,4,5) still operate as I propose above, but (2) reverts to ES3 behavior. (Note that my (5) is a real difference with ES3 that I propose be unconditional.)
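The strict-mode rules (2) and (5) and the two capture idioms from (3) can be tried out with the sketch below. It relies on the "use strict" directive and Function.prototype.bind as later standardized in ECMAScript 5, which is an assumption relative to this ticket. The names whoAmI and counter are hypothetical.

```javascript
// Hypothetical example: whoAmI and counter are illustrative names only.
function whoAmI() { "use strict"; return this; }

console.log(whoAmI() === undefined);     // true: rule (2), plain call
console.log(whoAmI.call(null) === null); // true: rule (5), value passed through as is
console.log(whoAmI.apply(7, []) === 7);  // true: rule (5), no boxing of primitives

var counter = {
  count: 0,
  viaThat: function () {
    var that = this; // first idiom: capture in an ordinary variable
    return function () { "use strict"; that.count += 1; return that.count; };
  },
  viaBind: function () {
    // second idiom: bind fixes `this` for the returned function
    return function () { "use strict"; this.count += 1; return this.count; }.bind(this);
  }
};

console.log(counter.viaThat()()); // 1
console.log(counter.viaBind()()); // 2
```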

Referring to Brendan's lettering

  A. My (2) is even more severe than the form Crock proposes. However, this behavior would only be enabled by strict mode, which would be opt-in. Anyone interested in software engineering quality, reliability, or security would consider any program that (2) breaks a program that should be fixed anyway.
  B. I agree with Brendan about the symmetry break. My proposal has no such problem.
  C. N/A
  D. Given that calling a function as a function causes its "this" to be bound to undefined, and that a reflective call binds "this" to the first argument to call, bind, or apply, then

f.call(undefined, a, b) ==== f(a, b) ==== f.apply(undefined, [a, b])

so we don't need splat or any new reflective primitives for these cases.

For the remaining case Brendan mentions, I have repeatedly encountered the need for a reflective "new" operation. Rather than introduce the new splat syntax, which would violate ES3.1 design rules, I propose

f.newInstance([a,b]) ==== new f(a, b)

which I think may have been proposed in this forum before.
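A reflective "new" of this kind can be sketched in terms of bind, as one possible emulation rather than the proposed primitive. The free function newInstance below is a hypothetical stand-in for f.newInstance, written so that Function.prototype is left untouched; Pair is an illustrative constructor.

```javascript
// Hypothetical stand-in for the proposed f.newInstance(args).
function newInstance(F, args) {
  // F.bind(null, a, b) yields a function that, under `new`, constructs F with
  // the pre-bound arguments; the bound `this` (null) is ignored by `new`.
  var Bound = Function.prototype.bind.apply(F, [null].concat(args));
  return new Bound();
}

function Pair(a, b) { this.a = a; this.b = b; }

var p = newInstance(Pair, [1, 2]);
console.log(p instanceof Pair, p.a, p.b); // true 1 2
```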

From a security perspective, this proposal so far still has a bad problem: A function cannot easily tell whether it has been invoked as a method vs as a constructor. One possibility is that "new f(a, b)" be considered syntactic sugar for "f.newInstance([a,b])", so if a function overrides newInstance it can react distinctly to "new" calls on itself.

Referring to Allen's numbering

Allen's (1-5) seem to agree with ES3 and all the proposals above. For Allen's (6), my (5) proposes that "this" is always bound to the first argument of call/apply/bind. This is even less compatible, but if I recall, we agreed to (5) at the last face-to-face EcmaScript meeting.

I didn't understand Allen's (7). Allen?

I have only skimmed Allen's use cases. One jumped out at me:

obj.getFoo(); //should return “obj member foo”
(obj.getFoo)(); //should return “global foo” -- this is not a method invocation; FF2 AND IE7 CURRENTLY RETURN “obj member foo”

What do ES3 and ES4 specify here? I would prefer the behavior Allen states. But if (as I suspect) fixing this would require changing most existing ES3 parsers, then we should consider this change to violate the "no new syntax in ES3.1" design rule.

Note that "(true && obj.getFoo)()" does return "global foo" on FF2.
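The difference between the parenthesized call and the && form can be reproduced in a current engine with a sketch like this one. The names obj and getThis are hypothetical; getThis returns this directly so the check does not depend on a global variable.

```javascript
// Hypothetical example: obj and getThis are illustrative names only.
var obj = { getThis: function () { return this; } };

console.log(obj.getThis() === obj);           // true: ordinary method call
console.log((obj.getThis)() === obj);         // true: parentheses alone keep the receiver
console.log((true && obj.getThis)() === obj); // false: && yields a plain function value
console.log((0, obj.getThis)() === obj);      // false: so does the comma operator
```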
