Ticket #213 (new defect)

Opened 3 years ago

Last modified 2 years ago

Unicode: we need to distinguish external and internal formats

Reported by: lth Assigned to: lth
Type: defect Priority: major
Component: Proposals Version: 4
Keywords: unicode Cc:

Description

The Update Unicode proposal contains language that is, at least on the surface, incompatible with JSON and with encodeURI/decodeURI. The proposal says that if a character is encoded in a string as a surrogate pair, the implementation must keep the pair as two separate code points; this keeps it compatible with 16-bit implementations (that is, all existing implementations). Yet JSON and URI decoders require such a pair to be merged into a single Unicode character.
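To make "merged into a single Unicode character" concrete, here is a minimal TypeScript sketch of the standard UTF-16 surrogate arithmetic (from the Unicode Standard, not from the proposal text). The helper names `combineSurrogates` and `splitToSurrogates` are hypothetical and used only for illustration.

```typescript
// Merge a UTF-16 surrogate pair into one Unicode code point (U+10000..U+10FFFF).
function combineSurrogates(high: number, low: number): number {
  if (high < 0xd800 || high > 0xdbff) throw new RangeError("not a high surrogate");
  if (low < 0xdc00 || low > 0xdfff) throw new RangeError("not a low surrogate");
  return (high - 0xd800) * 0x400 + (low - 0xdc00) + 0x10000;
}

// Split a supplementary code point back into its two 16-bit code units.
function splitToSurrogates(cp: number): [number, number] {
  if (cp < 0x10000 || cp > 0x10ffff) throw new RangeError("not a supplementary code point");
  const v = cp - 0x10000;
  return [0xd800 + (v >> 10), 0xdc00 + (v & 0x3ff)];
}

// U+1F600 is stored in a 16-bit string as the pair D83D DE00.
const s = "\uD83D\uDE00";
const merged = combineSurrogates(s.charCodeAt(0), s.charCodeAt(1));
console.log(s.length);               // 2 code units
console.log(merged.toString(16));    // "1f600": one code point
console.log(splitToSurrogates(merged).map((u) => u.toString(16))); // [ 'd83d', 'de00' ]
```

A 16-bit implementation keeps the two code units; a 32-bit implementation that follows the Unicode rules would store only `merged`.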

This issue is probably best resolved by distinguishing ECMAScript string data, on the one hand, from the external representations of that data produced by encodeURI and the JSON encoder, on the other, and by noting that the two are not the same thing even when the encoders' output is itself represented as an ECMAScript string.
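The internal/external split can be observed in a current engine. The following TypeScript sketch assumes a modern ECMAScript runtime with 16-bit strings (as in ES3 and later). It shows that the internal string keeps the surrogate pair as two code units, while the external URI form encodes a single character, and that a lone surrogate cannot be represented externally at all.

```typescript
// Internal form: an ECMAScript string holding U+1F600 as two 16-bit code units.
const internal = "\uD83D\uDE00";

// External form: encodeURIComponent treats the pair as ONE character and
// emits its UTF-8 bytes (F0 9F 98 80), percent-encoded.
const external = encodeURIComponent(internal);

// Decoding the external form yields the same two-code-unit internal string.
const roundTrip = decodeURIComponent(external);

// A JSON \u escape of the pair also decodes to the same internal string.
const fromJson: string = JSON.parse('"\\uD83D\\uDE00"');

// A lone surrogate has no external encoding: the URI encoder rejects it.
let loneSurrogateRejected = false;
try {
  encodeURIComponent("\uD83D");
} catch (e) {
  loneSurrogateRejected = e instanceof URIError;
}

console.log(internal.length);            // 2
console.log(external);                   // %F0%9F%98%80
console.log(roundTrip === internal);     // true
console.log(fromJson === internal);      // true
console.log(loneSurrogateRejected);      // true
```

The external encoder therefore has to "merge" the pair to produce valid UTF-8, without requiring the internal string to change representation.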


Change History

Changed 3 years ago by lth

Waldemar says that our current Update Unicode proposal violates the Unicode spec, because that spec requires surrogate pairs to be merged in implementations that support 32-bit Unicode, and that that is the source of our woes.

Brendan is on record as unwilling to violate the Unicode spec. The choices are: going back to ES3 (UTF-16); breaking working code (by merging characters in some implementations); or going forward to full Unicode in all cases.

Changed 3 years ago by lth

  • owner set to lth

Changed 3 years ago by lth

Also see #37.

Changed 2 years ago by David-Sarah Hopwood

  • keywords set to unicode

Will Harmony still require UTF-16?
