Ticket #286 (new defect)

Opened 1 year ago

Last modified 1 year ago

RegExp: can String.prototype.replace, called with a substitution function, substitute for named submatches?

Reported by: lth
Assigned to: anonymous
Type: defect
Priority: major
Milestone:
Component: Proposals
Version: 4
Keywords:
Cc: StevenLevithan, brendan

Description

Steven Levithan points out that because .replace() passes the submatches to its substitution function as individual arguments, rather than in a match result object, the substitution function cannot look up submatches by name. This seems a shame.

Recall: the substitution function is called with m+3 arguments, where m is the number of capturing submatches. The first argument is the string that matched, the next m arguments are the submatches (strings or undefined); argument m+2 is the offset in the string of the match and argument m+3 is the search string itself.
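To make the argument layout concrete, here is a minimal sketch that unpacks what .replace() passes today. The helper name describeArgs is made up for illustration, and the regular expression deliberately has no named submatches, so the m+3 count described above applies.

```javascript
// Hypothetical helper: split the arguments passed to a substitution
// function into the four parts described above.
function describeArgs() {
  var args = Array.prototype.slice.call(arguments);
  var m = args.length - 3;              // number of capturing submatches
  return {
    match: args[0],                     // the string that matched
    submatches: args.slice(1, m + 1),   // m strings or undefined values
    offset: args[m + 1],                // argument m+2
    input: args[m + 2]                  // argument m+3
  };
}

var seen;
"x12y".replace(/(\d)(\d)(z)?/, function () {
  seen = describeArgs.apply(null, arguments);
  return "";
});
// seen.match is "12", seen.submatches is ["1", "2", undefined],
// seen.offset is 1, seen.input is "x12y"
```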

We can't add arguments at the beginning or at the end, since programs that are not tailored to one particular regular expression will invariably use the number of arguments passed as an indicator of the number of submatches.
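A sketch of the kind of generic code that would break. The function countSubmatches and the appended object are hypothetical; the calls are made directly rather than through .replace() so that the extra argument can be simulated.

```javascript
// A generic substitution function, not tailored to any one regex:
// it infers the submatch count from the number of arguments.
function countSubmatches() {
  return arguments.length - 3;
}

// Today's call shape for a regex with two submatches: m + 3 = 5 arguments.
var current = countSubmatches("12", "1", "2", 0, "12");
// current is 2

// Hypothetical call shape with a match result object appended at the end.
var extended = countSubmatches("12", "1", "2", 0, "12", { a: "1", b: "2" });
// extended is 3, so the function now treats the offset as a submatch
```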

Steven suggests that perhaps one of the first m+1 arguments can be a String object with additional properties (for example, the first argument can carry properties for all the named submatches), but he's worried that this may break existing programs.

E262-3 does not, as far as I can tell, require the first m+1 arguments to be primitive string values; the spec only says "string". But all browsers currently generate primitive strings here.
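The difference matters to existing code because String objects and primitive strings behave differently under typeof, strict equality and truthiness tests. A small illustration, using an empty submatch as the example (the variable names are only for this sketch):

```javascript
var prim = "";             // what browsers pass today for an empty submatch
var obj = new String("");  // what a String-object argument would look like

var checks = {
  typeofPrim: typeof prim,   // "string"
  typeofObj: typeof obj,     // "object"
  strictEqual: obj === "",   // false: an object is never === a primitive
  looseEqual: obj == "",     // true: the object is converted first
  primIsFalsy: !prim,        // true
  objIsFalsy: !obj           // false: every object is truthy
};
```

A substitution function that tests a submatch with typeof, ===, or a plain truthiness check could therefore change behaviour if it started receiving String objects.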

I think Steven's approach is a little too brittle -- there are real risks of name clashes with properties on the prototype -- but a slight adjustment makes it better: We could specify that the first argument is a String object with a single property "matchResult", either only when there are named submatches or always. (I'd favor the latter.) The value of the property would probably be a standard match result; I haven't checked all the details, but that's the working assumption.
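A rough emulation of this idea in today's language, to show what a substitution function would see. Everything here is an assumption made for illustration: the helper replaceWithMatchResult, the names array (which stands in for the names that (?<name>...) syntax would supply, one entry per capturing submatch), and the choice to expose named submatches as properties directly on the match result.

```javascript
// Hypothetical emulation: wrap the first argument in a String object
// that carries a single extra property, matchResult.
function replaceWithMatchResult(str, re, names, fn) {
  return str.replace(re, function () {
    var args = Array.prototype.slice.call(arguments);
    var m = names.length;               // one name slot per capturing submatch
    var result = args.slice(0, m + 1);  // [match, submatch 1, ..., submatch m]
    result.index = args[m + 1];
    result.input = args[m + 2];
    for (var i = 0; i < m; i++) {
      if (names[i]) result[names[i]] = args[i + 1];  // assumed name lookup
    }
    var first = new String(args[0]);    // String object, not a primitive
    first.matchResult = result;
    args[0] = first;
    return fn.apply(this, args);
  });
}

var out = replaceWithMatchResult("12", /(.)(.)/, ["a", "b"], function ($0) {
  return "value: " + $0.matchResult.a;
});
// out is "value: 1"
```

The argument count is unchanged, so generic code that counts arguments keeps working; only code that depends on the first argument being a primitive is affected.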

Here's a competing proposal: If a submatch has a name then the value passed to the substitution function in that argument position will be a String object, and it will have a dynamic property called "name" that holds the name of the submatch.
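The competing proposal can be emulated in the same spirit. Again the helper replaceWithNamedArgs and the names array are hypothetical stand-ins for what named-submatch syntax would provide; leaving unmatched (undefined) submatches unwrapped is also an assumption of this sketch.

```javascript
// Hypothetical emulation: each named submatch is passed as a String
// object with a dynamic "name" property.
function replaceWithNamedArgs(str, re, names, fn) {
  return str.replace(re, function () {
    var args = Array.prototype.slice.call(arguments);
    for (var i = 0; i < names.length; i++) {
      // Only wrap submatches that have a name and actually matched.
      if (names[i] && args[i + 1] !== undefined) {
        var s = new String(args[i + 1]);
        s.name = names[i];
        args[i + 1] = s;
      }
    }
    return fn.apply(this, args);
  });
}

var named = replaceWithNamedArgs("12", /(.)(.)/, ["a", "b"], function ($0, a, b) {
  return "value: " + a + ", name: " + a.name;
});
// named is "value: 1, name: a"
```

Note that the function still has to know which argument position holds which submatch; the name is only available after the positional lookup has been done.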

In either case, we could choose to put the properties into namespace "regexp" to avoid all risk of name clashes with prototype properties (regexp::matchResult, regexp::name), current or future, at the cost of introducing yet another namespace.

Change History

Changed 1 year ago by StevenLevithan

The competing proposal, as I understand it, offers little benefit over ECMA-262v3, since the user will not typically care about using the name for anything other than accessing the value of the submatch, and it doesn't accomplish the goal of using names rather than knowledge of backreference ordering to look up submatches. I.e., you can already do something like this in ES3:

"12".replace(/(.)(.)/, function ($0, a, b) { return "value: " + a; });

// returns "value: 1"

My understanding of the competing proposal:

"12".replace(/(?<a>.)(?<b>.)/, function ($0, a, b) { return "value: " + a + ", name: " + a.name; });

// would return "value: 1, name: a"

Attaching named submatches as properties to arguments[0] is indeed a bit brittle, but IMO no more so than attaching named submatches to a match result object (e.g. via var match = /(?<a>.)(?<b>.)/("12");). I think it would be best to be consistent: either use an object named matchResult or similar (.NET and Python use groups / group) with both match result objects and arguments[0], or use it with neither.

As for using namespace "regexp", I don't currently have enough knowledge of this feature in ES4 to contribute meaningfully.
