Ticket #47 (new enhancement)

Opened 2 years ago

Last modified 7 months ago

Close method for iterators

Reported by: ibukanov
Assigned to: brendan
Type: enhancement
Priority: major
Milestone:
Component: RefImpl
Version: 4
Keywords:
Cc: lth, dherman

Description (last modified by brendan)

Consider the following example:

let iter = f();
for (i in iter)
  break;
print(iter.next());

where f is defined as:

function f() { return ["a","b"].iterator::get(true); }

According to the current spec, it should print 1, the index of the second array element, because the spec expands the for-in loop to:

let ($it = iter.iterator::get(true)) {
  while (true) {
    try {
      i = $it.next();
    } catch (e : iterator::StopIterationClass) {
      break;
    }
    break;
  }
}

or, assuming that iter defines iterator::get to return itself:

while (true) {
  try {
    i = iter.next();
  } catch (e : iterator::StopIterationClass) {
    break;
  }
  break;
}
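
As a sketch of the current-spec behavior, the expansion above can be transcribed into plain JavaScript (not ES4). StopIterationClass, arrayKeyIterator, iteratorGet, and forInCurrentSpec are hypothetical stand-ins for iterator::StopIterationClass, the array's index iterator, iterator::get, and the desugared loop; the loop body is passed as a callback that returns true to mean break.

```javascript
// Sketch only: plain-JS model of the current-spec for-in expansion.
// All names are hypothetical stand-ins, not ES4 APIs.
class StopIterationClass extends Error {}

function arrayKeyIterator(arr) {
  let index = 0;
  const it = {
    iteratorGet() { return it; }, // iterator::get returns the iterator itself
    next() {
      if (index >= arr.length) throw new StopIterationClass();
      return index++; // for-in over an array produces indices
    },
  };
  return it;
}

// Current-spec expansion of `for (i in iter) body`: nothing closes $it.
function forInCurrentSpec(iter, body) {
  const $it = iter.iteratorGet(true);
  while (true) {
    let i;
    try {
      i = $it.next();
    } catch (e) {
      if (e instanceof StopIterationClass) break;
      throw e;
    }
    if (body(i)) break; // body returns true to request `break`
  }
}

const iter = arrayKeyIterator(["a", "b"]);
forInCurrentSpec(iter, () => true); // the loop body is just `break`
console.log(iter.next()); // prints 1: the iterator resumes after the loop
```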

Now consider the same example, but with f() defined as a generator:

function f() { yield 0; yield 1; }

According to the proposal, this time the example throws iterator::StopIterationClass at "print(iter.next())", because the new for-in loop expands to:

try {
  while (true) {
    try {
      i = iter.next();
    } catch (e : iterator::StopIterationClass) {
      break;
    }
    break;
  }
} finally {
  iter.close();
}

This unconditionally closes the iterator, so the later iter.next() call finds the generator already closed and throws.
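
A similar plain-JavaScript sketch of the generator case, under stated assumptions: es4Generator is a hypothetical adapter that gives a native JS generator the ES4-style next()/close() interface this ticket assumes, with next() throwing StopIterationClass when done and close() finishing the generator through the standard Generator.prototype.return.

```javascript
// Sketch only: plain-JS model of the proposal's generator expansion.
// StopIterationClass and es4Generator are hypothetical, not ES4 APIs.
class StopIterationClass extends Error {}

// Wraps a native JS generator: next() returns a value or throws
// StopIterationClass; close() marks it closed and finishes the generator.
function es4Generator(genFn) {
  const gen = genFn();
  let closed = false;
  return {
    next() {
      if (closed) throw new StopIterationClass();
      const r = gen.next();
      if (r.done) {
        closed = true;
        throw new StopIterationClass();
      }
      return r.value;
    },
    close() {
      closed = true;
      gen.return(undefined); // runs any finally blocks pending in the generator
    },
  };
}

// The proposal's expansion: close runs on every exit path, unconditionally.
function forInGeneratorProposal(iter, body) {
  try {
    while (true) {
      let i;
      try {
        i = iter.next();
      } catch (e) {
        if (e instanceof StopIterationClass) break;
        throw e;
      }
      if (body(i)) break; // body returns true to request `break`
    }
  } finally {
    iter.close();
  }
}

const iter = es4Generator(function* f() { yield 0; yield 1; });
forInGeneratorProposal(iter, () => true);
try {
  console.log(iter.next());
} catch (e) {
  console.log(e instanceof StopIterationClass); // prints true
}
```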

To remove this discrepancy, I propose adding a close method to IteratorType:

type IteratorType.<T> = {
  next: function () : T,
  close: function () : void
};

and always define for (i in o) body as:

let ($it = o.iterator::get(true)) {
  try {
    while (true) {
      try {
        i = $it.next();
      } catch (e : iterator::StopIterationClass) {
        break;
      }
      body
    }
  } finally {
    $it.close();
  }
}
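
Transcribed into plain JavaScript, the proposed expansion might look like the following sketch. All names are hypothetical: iteratorGet stands in for iterator::get, both iterator stand-ins implement close() as the proposal requires, and the loop body is a callback returning true for break. With close in the finally, the array and generator cases from the description end up behaving the same way.

```javascript
// Sketch only: the proposed uniform expansion of `for (i in o) body`.
// All names are hypothetical stand-ins, not ES4 APIs.
class StopIterationClass extends Error {}

// Array index iterator that, under the proposal, must also provide close().
function arrayKeyIterator(arr) {
  let index = 0;
  let closed = false;
  const it = {
    iteratorGet() { return it; },
    next() {
      if (closed || index >= arr.length) throw new StopIterationClass();
      return index++;
    },
    close() { closed = true; },
  };
  return it;
}

// Generator stand-in with the same next()/close() shape.
function es4Generator(genFn) {
  const gen = genFn();
  let closed = false;
  const it = {
    iteratorGet() { return it; },
    next() {
      if (closed) throw new StopIterationClass();
      const r = gen.next();
      if (r.done) { closed = true; throw new StopIterationClass(); }
      return r.value;
    },
    close() { closed = true; gen.return(undefined); },
  };
  return it;
}

// for (i in o) body: get the iterator, loop, and close it on every exit path.
function forIn(o, body) {
  const $it = o.iteratorGet(true);
  try {
    while (true) {
      let i;
      try {
        i = $it.next();
      } catch (e) {
        if (e instanceof StopIterationClass) break;
        throw e;
      }
      if (body(i)) break; // body returns true to request `break`
    }
  } finally {
    $it.close();
  }
}

// Runs `for (i in iter) break;` and then reports what iter.next() does.
function afterBreak(iter) {
  forIn(iter, () => true);
  try {
    return iter.next();
  } catch (e) {
    if (e instanceof StopIterationClass) return "StopIteration";
    throw e;
  }
}

console.log(afterBreak(arrayKeyIterator(["a", "b"]))); // StopIteration
console.log(afterBreak(es4Generator(function* () { yield 0; yield 1; }))); // StopIteration
```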

The close method will also let a script close an iterator explicitly, releasing its resources, when the script needs to drive the iteration by hand instead of using a for-in loop.


Change History

Changed 1 year ago by brendan

  • description changed: in the proposed IteratorType, "public function close() : void" was changed to "close: function () : void"; the rest of the description is unchanged.

Changed 1 year ago by brendan

The downside of adding close to IteratorType is that it requires all compatible objects to implement two methods. There are many iterators that do not want or need close, e.g. range/xrange implemented in JS, or other cursor-based iterators that do not hold object references.
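
To illustrate the cost described here, a minimal JavaScript sketch of a range iterator under a protocol where close is mandatory. range and toArray are hypothetical names, and the close method is empty because the iterator holds nothing that needs releasing.

```javascript
// Sketch only: a JS-implemented range iterator when close() is required.
// Names are hypothetical; StopIterationClass stands in for ES4's.
class StopIterationClass extends Error {}

function range(start, stop) {
  let current = start;
  return {
    next() {
      if (current >= stop) throw new StopIterationClass();
      return current++;
    },
    // Holds no object references, so there is nothing to release; this
    // empty method exists only to satisfy a close-bearing IteratorType.
    close() {},
  };
}

// Drains an iterator explicitly and closes it afterwards.
function toArray(it) {
  const out = [];
  try {
    while (true) {
      try {
        out.push(it.next());
      } catch (e) {
        if (e instanceof StopIterationClass) break;
        throw e;
      }
    }
  } finally {
    it.close();
  }
  return out;
}

console.log(toArray(range(0, 3))); // [ 0, 1, 2 ]
```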

Here is a revised IteratorType that also takes advantage of the return-this capability (see self type), all with |use default namespace iterator| in effect:

type IteratorType.<T> = {
  iterator::get: function (boolean=) : this,
  next: function () : T
};
// To separate closeability from the iteration protocol, we add:
type CloseableType = {
  close: function () : void
};

The Generator nominal type is a subtype of both IteratorType and CloseableType, but other iterator-protocol implementors could match CloseableType too, and have their close methods called automatically on exit from for-in constructs that start the iteration.

This raises a design issue: should the close be automated for any for-in that starts an iteration, i.e., calls next for the first time on a newborn iterator object? Or, should close be automated only if the for-in construct first called iterator::get to create or find the iterator, and that iterator was started by the loop?

The latter alternative rule for when close is automated has the advantage that only with contortions can the iterator escape the loop. IOW, the first example in this ticket's description would not see close called for the generator (or any closeable iterator) case.

Comments?

/be

Changed 1 year ago by ibukanov

> There are many iterators that do not want or need close, e.g. range/xrange implemented in JS, or other cursor-based iterators that do not hold object references.

Even in those cases, an explicit close that sets fields to null helps the GC find the garbage sooner.
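
A minimal JavaScript sketch of that point, assuming a hypothetical arrayCursor: its close drops the cursor's only reference to the array, so the array can become unreachable even while the cursor object itself stays alive.

```javascript
// Sketch only: a cursor-style iterator whose close() drops its reference
// to the underlying array. All names are hypothetical.
class StopIterationClass extends Error {}

function arrayCursor(arr) {
  let data = arr; // the only reference the cursor holds
  let index = 0;
  return {
    next() {
      if (data === null || index >= data.length) throw new StopIterationClass();
      return data[index++];
    },
    close() {
      data = null; // release the array; later next() calls report exhaustion
    },
    isReleased() { return data === null; }, // for illustration only
  };
}

const cursor = arrayCursor(["x", "y", "z"]);
console.log(cursor.next()); // x
cursor.close();
console.log(cursor.isReleased()); // true
```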

But I would like to emphasize that the main reason for the close method is to remove the discrepancy in using the generator after the loop, and to simplify the spec and implementations.

For example, since it is not possible to predict whether a particular for-in loop iterates over a generator or some other iterator, an implementation will have to add a hidden finally to every for-in loop in order to call close on generators. With the current spec, the implementation will also have to perform an extra check in that finally to see whether the iterator is closeable.
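
The two hidden finally clauses can be sketched in JavaScript as follows. forInUniversal, forInSplit, and countTo are hypothetical names, and the typeof test merely stands in for whatever structural CloseableType check an implementation would perform. The split version accepts a next-only iterator; the universal version fails on it, which is the price of dropping the check.

```javascript
// Sketch only: the hidden finally of a for-in under the two protocols.
// All names are hypothetical stand-ins, not ES4 APIs.
class StopIterationClass extends Error {}

function loop($it, body) {
  while (true) {
    let i;
    try {
      i = $it.next();
    } catch (e) {
      if (e instanceof StopIterationClass) return;
      throw e;
    }
    if (body(i)) return; // body returns true to request `break`
  }
}

// close is part of IteratorType: the finally can call it directly.
function forInUniversal(o, body) {
  const $it = o.iteratorGet(true);
  try { loop($it, body); } finally { $it.close(); }
}

// close lives in a separate CloseableType: the finally needs an extra check.
function forInSplit(o, body) {
  const $it = o.iteratorGet(true);
  try {
    loop($it, body);
  } finally {
    if (typeof $it.close === "function") $it.close();
  }
}

// A next-only iterator: valid under the split protocol only.
function countTo(n) {
  let k = 0;
  const it = {
    iteratorGet() { return it; },
    next() {
      if (k >= n) throw new StopIterationClass();
      return k++;
    },
  };
  return it;
}

const seen = [];
forInSplit(countTo(3), (i) => { seen.push(i); });
console.log(seen); // [ 0, 1, 2 ]

try {
  forInUniversal(countTo(3), () => {});
} catch (e) {
  console.log(e instanceof TypeError); // true: $it.close is not a function
}
```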

Or consider how one would write a zip iterator that yields a sum of two sequences produced by two iterators. Here one must manipulate the iterators explicitly, and a proper implementation that ensures both generators are always closed will look uglier, with extra checks in the finally blocks for Generator or Closeable instances.
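
A possible shape for such a zip iterator under the universal-close protocol, sketched in JavaScript with hypothetical names (valuesIterator, zipSum) and assuming an element-wise sum that stops when either input runs out. Because both inputs are known to have close, the cleanup needs no type checks, only a nested try/finally so that b is closed even if closing a throws.

```javascript
// Sketch only: an explicitly driven zip-sum iterator when every iterator
// has close(). All names are hypothetical stand-ins, not ES4 APIs.
class StopIterationClass extends Error {}

function valuesIterator(arr) {
  let i = 0;
  const it = {
    closed: false,
    next() {
      if (it.closed || i >= arr.length) throw new StopIterationClass();
      return arr[i++];
    },
    close() { it.closed = true; },
  };
  return it;
}

// Yields a[k] + b[k]; a StopIterationClass from either input ends the zip.
function zipSum(a, b) {
  return {
    next() { return a.next() + b.next(); },
    close() {
      try { a.close(); } finally { b.close(); } // close b even if a.close() throws
    },
  };
}

// Driving the iteration by hand, as a script without for-in would.
const a = valuesIterator([1, 2, 3]);
const b = valuesIterator([10, 20]);
const z = zipSum(a, b);
const sums = [];
try {
  while (true) {
    try {
      sums.push(z.next());
    } catch (e) {
      if (e instanceof StopIterationClass) break;
      throw e;
    }
  }
} finally {
  z.close();
}
console.log(sums, a.closed, b.closed); // [ 11, 22 ] true true
```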

The bottom line is that putting close on every iterator complicates life for those who code against the explicit iterator protocol instead of using generators, since they have to add an empty close method. But it simplifies life for iterator users, implementers, and spec writers, as they no longer need to differentiate between generators and other iterators.

> This raises a design issue: should the close be automated for any for-in that starts an iteration, i.e., calls next for the first time on a newborn iterator object? Or, should close be automated only if the for-in construct first called iterator::get to create or find the iterator, and that iterator was started by the loop?

But how would one distinguish between those two close/don't-close cases, given that for-in always calls iterator::get? Would an implementation have to check whether iterator::get returns the same object, and call close later only if the objects differ? That is very fragile.
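
The fragility can be made concrete with a JavaScript sketch of the identity-based rule. forInCloseIfCreated, countTo, and cachingContainer are hypothetical; the container hands out one cached iterator, so iteratorGet returns a different object even though the loop did not create it, and the rule closes an iterator the loop does not own.

```javascript
// Sketch only: "close only if iterator::get returned a different object",
// and one way it misfires. All names are hypothetical.
class StopIterationClass extends Error {}

function countTo(n) {
  let k = 0;
  const it = {
    closed: false,
    iteratorGet() { return it; }, // an iterator is its own iterator
    next() {
      if (it.closed || k >= n) throw new StopIterationClass();
      return k++;
    },
    close() { it.closed = true; },
  };
  return it;
}

// A container that caches one shared iterator and hands it out every time.
function cachingContainer(n) {
  const shared = countTo(n);
  return { shared, iteratorGet() { return shared; } };
}

function forInCloseIfCreated(o, body) {
  const $it = o.iteratorGet(true);
  const looksCreated = $it !== o; // the fragile identity test
  try {
    while (true) {
      let i;
      try {
        i = $it.next();
      } catch (e) {
        if (e instanceof StopIterationClass) break;
        throw e;
      }
      if (body(i)) break; // body returns true to request `break`
    }
  } finally {
    if (looksCreated) $it.close();
  }
}

const direct = countTo(3);
forInCloseIfCreated(direct, () => true);
console.log(direct.closed); // false: get returned the same object

const box = cachingContainer(3);
forInCloseIfCreated(box, () => true);
console.log(box.shared.closed); // true: a shared iterator was closed anyway
```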

Or consider "for (let i in iter) use_i;" versus "for (let i in debug_printer(iter)) use_i;", where iter is an already existing generator instance. To preserve that close/don't-close semantics, debug_printer would have to implement the iterator protocol explicitly instead of using the straightforward generator:

function debug_printer(iter) {
  for (let i in iter) {
    print(i);
    yield i;
  }
}

because that would cause iter to be closed at the end of the iteration.
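
For comparison, a JavaScript sketch of how both loops behave under the universal-close expansion. es4Generator, forIn, and debugPrinterBody are hypothetical stand-ins; debugPrinterBody is debug_printer with its inner for-in expanded by hand, because a JS callback cannot yield on behalf of the enclosing generator. Breaking out of either loop leaves iter closed, so the two forms agree.

```javascript
// Sketch only: direct loop vs. loop through a debug_printer-style wrapper,
// both under the universal-close expansion. All names are hypothetical.
class StopIterationClass extends Error {}

function es4Generator(genFn, ...args) {
  const gen = genFn(...args);
  let closed = false;
  const it = {
    iteratorGet() { return it; },
    next() {
      if (closed) throw new StopIterationClass();
      const r = gen.next();
      if (r.done) { closed = true; throw new StopIterationClass(); }
      return r.value;
    },
    close() { closed = true; gen.return(undefined); }, // runs pending finally blocks
  };
  return it;
}

function forIn(o, body) {
  const $it = o.iteratorGet(true);
  try {
    while (true) {
      let i;
      try { i = $it.next(); }
      catch (e) { if (e instanceof StopIterationClass) break; throw e; }
      if (body(i)) break; // body returns true to request `break`
    }
  } finally {
    $it.close();
  }
}

// debug_printer's body with its inner for-in expanded by hand.
function* debugPrinterBody(iter, log) {
  try {
    while (true) {
      let i;
      try { i = iter.next(); }
      catch (e) { if (e instanceof StopIterationClass) return; throw e; }
      log.push(i); // stands in for print(i)
      yield i;
    }
  } finally {
    iter.close(); // inserted by the proposed for-in expansion
  }
}

// Probes by calling next(); a still-open iterator would be advanced.
function isClosed(iter) {
  try { iter.next(); return false; }
  catch (e) { if (e instanceof StopIterationClass) return true; throw e; }
}

const counter = function* () { yield 0; yield 1; yield 2; };

const iterA = es4Generator(counter);
forIn(iterA, () => true); // for (let i in iter) break;

const iterB = es4Generator(counter);
const log = [];
forIn(es4Generator(debugPrinterBody, iterB, log), () => true); // via the wrapper

console.log(isClosed(iterA), isClosed(iterB), log); // true true [ 0 ]
```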

Changed 1 year ago by brendan

To weigh the trade-off between close in IteratorType vs. CloseableType, it would help to survey iterator use cases in Python. Indeed, just in Lib/*.py of the Python 2.5 source, most __iter__ definitions return self, fewer return iter(something-else), and at least as often as the latter, __iter__ is defined as (or delegated to) a generator function.

This change is worth a thread on es4-discuss@mozilla.org. I will post in a bit.

On the when-to-automate-close question: sorry, we've been over this, I should have remembered. The only sane way is to auto-close a newborn iterator that for-in starts (however it came to be used by the for-in), as you note here.

/be

Changed 1 year ago by brendan

  • owner set to brendan

Changed 1 year ago by brendan

  • cc set to lth, dherman
  • component changed from Proposals to RefImpl

TG1 agrees that making close universal for all iterator types is better. I'll update the iterators and generators wiki page before re-export.

/be

Changed 7 months ago by brendan

There's no protocol by which for-in can tell whether an arbitrary iterator is unstarted, so I do not see how to automate calling close for only those iterators that a for-in loop starts.

Generators are a special case. The runtime knows the nominal type of the one and only generator implementation, so it can detect an unstarted generator-iterator and remember to close it on all exit paths from the loop.
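
A JavaScript sketch of that special case, with hypothetical names: Es4Generator stands in for the one built-in generator class, and its started flag models internal state that only the runtime can see. forIn auto-closes only a generator that was newborn when the loop began, and leaves every other iterator alone because it has no way to ask them.

```javascript
// Sketch only: a runtime that knows its one generator class closes only
// generators that were unstarted when the loop began. Names are hypothetical.
class StopIterationClass extends Error {}

// Stand-in for the single built-in generator implementation (nominal type).
class Es4Generator {
  constructor(genFn) {
    this.gen = genFn();
    this.started = false; // internal state, visible to the runtime only
    this.closed = false;
  }
  iteratorGet() { return this; }
  next() {
    this.started = true;
    if (this.closed) throw new StopIterationClass();
    const r = this.gen.next();
    if (r.done) { this.closed = true; throw new StopIterationClass(); }
    return r.value;
  }
  close() { this.closed = true; this.gen.return(undefined); }
}

function forIn(o, body) {
  const $it = o.iteratorGet(true);
  // Only a newborn generator is auto-closed; other iterators cannot be asked.
  const autoClose = $it instanceof Es4Generator && !$it.started;
  try {
    while (true) {
      let i;
      try { i = $it.next(); }
      catch (e) { if (e instanceof StopIterationClass) break; throw e; }
      if (body(i)) break; // body returns true to request `break`
    }
  } finally {
    if (autoClose) $it.close();
  }
}

const counter = function* () { yield 0; yield 1; yield 2; };

const fresh = new Es4Generator(counter);
forIn(fresh, () => true);
console.log(fresh.closed); // true: the loop started it, so the loop closed it

const resumed = new Es4Generator(counter);
resumed.next(); // started before the loop
forIn(resumed, () => true);
console.log(resumed.closed, resumed.next()); // false 2
```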

It's too late to invent more protocol, and we lose if the iterator structural type has more methods than the Pythonic next method. I suggest this bug be resolved as wontfix. Igor, what do you think?

/be
