The performance of the String class is rather poor because its methods call utf8_decode() repeatedly. This is a consequence of the design decision to store the String as UTF-8 internally while presenting it as a string of characters rather than bytes.
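To illustrate why character-level access on a UTF-8 buffer is costly, here is a minimal Python sketch. The function `char_at` is hypothetical and is not this library's code. It only shows that finding the n-th character requires walking the bytes from the start, because characters have variable width.

```python
def char_at(utf8: bytes, index: int) -> str:
    """Return the character at `index` by scanning the UTF-8 bytes from the start."""
    if index < 0:
        raise IndexError("negative index not supported in this sketch")
    pos = 0    # current byte offset
    count = 0  # number of characters skipped so far
    while pos < len(utf8):
        lead = utf8[pos]
        # The lead byte determines how many bytes the character occupies.
        if lead < 0x80:
            width = 1
        elif lead >> 5 == 0b110:
            width = 2
        elif lead >> 4 == 0b1110:
            width = 3
        elif lead >> 3 == 0b11110:
            width = 4
        else:
            raise ValueError("invalid UTF-8 lead byte")
        if count == index:
            return utf8[pos:pos + width].decode("utf-8")
        pos += width
        count += 1
    raise IndexError("character index out of range")


data = "普通话/普通話".encode("utf-8")
slash = char_at(data, 3)  # has to step over three 3-byte characters first
last = char_at(data, 6)
```

Each lookup costs time proportional to the index, so a loop that indexes every character this way is quadratic in the string length. A fixed-width encoding such as UTF-32 avoids the scan at the price of more memory.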
It's probably better to have both a UTF-8 byte-string String class and a UTF-32 String32 or uString class and let the programmer decide what she wants to use.
For example, as in Python 2, where a plain string is a byte string and a u'' string is a Unicode string:
>>> s = '普通话/普通話'
>>> s
'\xe6\x99\xae\xe9\x80\x9a\xe8\xaf\x9d/\xe6\x99\xae\xe9\x80\x9a\xe8\xa9\xb1'
>>> len(s)
19
>>> s[0]
'\xe6'
>>> s[1]
'\x99'
>>> s[2]
'\xae'
>>> us = u'普通话/普通話'
>>> len(us)
7
>>> us[0]
u'\u666e'
(This example demonstrates the behavior of len and operator[].)
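For comparison, Python 3 makes the same split explicit with two separate types, `bytes` and `str`. The short sketch below shows the two views of the same text; the variable names are chosen only for illustration.

```python
text = "普通话/普通話"          # str: a sequence of code points
utf8 = text.encode("utf-8")   # bytes: the UTF-8 encoded form

byte_len = len(utf8)          # length counted in bytes
char_len = len(text)          # length counted in characters

first_byte = utf8[0]          # indexing bytes yields an integer byte value
first_char = text[0]          # indexing str yields a one-character string
```

Here `byte_len` is 19 and `char_len` is 7, matching the session above, and the programmer picks the view explicitly by choosing the type.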
Note that changing the design of String is a major change that would break backwards compatibility.