Ticket #261 (new defect)

Opened 1 year ago

RegExp: Allow leading, unescaped "]" in character classes

Reported by: StevenLevithan
Assigned to: anonymous
Type: defect
Priority: major
Milestone:
Component: Proposals
Version: 4
Keywords:
Cc: lth, brendan

Description

ECMAScript 4 proposals indicate that "/" will be allowed to appear unescaped within character classes within regex literals, for compatibility with IE. That is definitely a positive change, IMO.

However, there is an additional character class syntax change I would strongly recommend for the same reason: a leading, unescaped "]" within a character class should be considered a literal character and therefore should not end the class (this would also apply to embedded character classes used for class subtraction and intersection). This would be consistent with Internet Explorer's native handling, and it matches the behavior of Perl, PCRE, .NET, Python, Ruby, JGsoft, and just about any other regex engine, including the traditional Unix regular expression engines I have used. In Firefox 2.0.0.8, however, the following quirks can be observed (for the record, I have not closely examined ECMA-262v3 to determine how this should be handled according to that standard):

  • "[^]" is equivalent to "[\S\s]", although it should throw an error when there is no following "]"
  • "[^]]" is equivalent to "[\S\s]]", although it should be equivalent to "[^\]]" or "(?!])[\S\s]"
  • "[]" is equivalent to "(?!)" (which will never match), although it should throw an error when there is no following "]"
  • "[]]" is equivalent to "(?!)]" (which will never match), although it should be equivalent to "[\]]" or "]"
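
The four cases above can be checked with a small behavior test. This is only a sketch: the function names detectLeadingBracketHandling and probe are made up for illustration, and the results depend entirely on the engine running the code.

```javascript
// Hypothetical probe: classifies how the running engine treats a leading "]".
function detectLeadingBracketHandling() {
  try {
    // If "]" is a literal class member, "[]]" is a one-character class matching "]".
    // If "[]" is an empty class, the pattern acts like "(?!)]" and never matches.
    return new RegExp("[]]").test("]") ? "literal" : "empty-class";
  } catch (e) {
    // An engine may also reject the pattern outright.
    return "error";
  }
}

// Hypothetical helper: returns true, false, or "error" for one pattern/input pair.
function probe(pattern, input) {
  try {
    return new RegExp(pattern).test(input);
  } catch (e) {
    return "error";
  }
}

// One probe per pattern from the list above.
const results = {
  "[^] against 'a'": probe("[^]", "a"),
  "[^]] against 'a'": probe("[^]]", "a"),
  "[^]] against 'a]'": probe("[^]]", "a]"),
  "[] against 'a'": probe("[]", "a"),
  "[]] against ']'": probe("[]]", "]")
};

console.log(detectLeadingBracketHandling(), results);
```

Under the handling recommended here, "[^]" and "[]" would report "error", "[^]]" would match "a", and "[]]" would match "]".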

Currently, it is impossible to parse a regular expression's source in JavaScript in a cross-browser fashion without browser sniffing, behavior testing, or first converting leading "]" characters within character classes to "\]".
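
The third workaround, converting leading "]" characters to "\]", could be sketched roughly as follows. The function name escapeLeadingBracket is hypothetical, and the sketch assumes the input was written with the "leading ] is literal" convention and contains no nested character classes (such as those proposed for class subtraction and intersection).

```javascript
// Hypothetical helper: escapes a "]" that directly follows "[" or "[^",
// so the pattern means the same thing regardless of how the engine
// treats a leading, unescaped "]".
function escapeLeadingBracket(source) {
  let out = "";
  let inClass = false;
  for (let i = 0; i < source.length; i++) {
    const ch = source[i];
    if (ch === "\\") {
      // Copy an escape sequence verbatim (backslash plus the next character).
      out += ch + (i + 1 < source.length ? source[i + 1] : "");
      i++;
    } else if (!inClass && ch === "[") {
      inClass = true;
      out += ch;
      if (source[i + 1] === "^") {
        // Keep the negation marker in place.
        out += "^";
        i++;
      }
      if (source[i + 1] === "]") {
        // A "]" in the leading position is treated as a literal and escaped.
        out += "\\]";
        i++;
      }
    } else if (inClass && ch === "]") {
      // First unescaped, non-leading "]" closes the class.
      inClass = false;
      out += ch;
    } else {
      out += ch;
    }
  }
  return out;
}

console.log(escapeLeadingBracket("[]]"));  // source text for a class containing only "]"
console.log(escapeLeadingBracket("[^]]")); // source text for a class excluding only "]"
```

The converted source can then be passed to new RegExp() and should behave the same in engines that follow either interpretation, since an escaped "]" inside a class is unambiguous.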

This brings up a related question I am interested in, which I have not seen addressed at http://wiki.ecmascript.org/doku.php?id=proposals:extend_regexps or the accompanying discussion page. What will "[&&[a-z]]" and "[^&&[a-z]]" mean in ECMAScript 4? java.util.regex considers both patterns equivalent to "[a-z]", which is an interpretation I think is intuitive and ideal. However, it is not consistent with Firefox's current handling of empty character classes.

This issue also affects something like /[ ]/x, if ECMAScript 4 does not abandon the idea of applying the /x flag within character classes (see ticket #254).
