Description
There are different kinds of substrings that this library could support. Currently, the implementation supports substrings with respect to chars, but some users will likely want substrings with respect to graphemes instead. Word and sentence substrings could also be supported using the relevant Unicode standards.
Altogether, I see the following substring variants being possible:
- CharSubstring: with respect to chars, which are Unicode scalar values.
- GraphemeSubstring: with respect to graphemes (using unicode-segmentation).
- WordSubstring: with respect to words (using unicode-segmentation).
- SentenceSubstring: with respect to sentences (using unicode-segmentation).
- ByteSubstring: with respect to individual bytes (equivalent to slicing the string).
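To make the char variant concrete, here is a rough sketch of what a CharSubstring trait could look like using only the standard library. The method name `char_substring`, its signature, the `byte_offset` helper, and the clamping of out-of-range indices are all my assumptions for illustration, not the crate's current API.

```rust
/// Hypothetical sketch of a char-indexed substring trait.
/// Names and clamping behavior are assumptions, not the crate's real API.
trait CharSubstring {
    /// Returns the slice covering chars `start..end` (end exclusive).
    fn char_substring(&self, start: usize, end: usize) -> &str;
}

/// Maps a char index to its byte offset, clamping to the string length
/// when the index is past the last char.
fn byte_offset(s: &str, char_idx: usize) -> usize {
    s.char_indices().nth(char_idx).map_or(s.len(), |(i, _)| i)
}

impl CharSubstring for str {
    fn char_substring(&self, start: usize, end: usize) -> &str {
        let lo = byte_offset(self, start);
        // An inverted range yields an empty slice rather than a panic.
        let hi = byte_offset(self, end).max(lo);
        &self[lo..hi]
    }
}
```

With this sketch, `"e\u{301}x".char_substring(0, 1)` returns just `"e"` and leaves the combining accent behind, because `e` followed by U+0301 is two chars but a single grapheme. That is the case a GraphemeSubstring would handle differently.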
Since we are already looking at a breaking change with #9, the Substring trait can be renamed to CharSubstring so there is no ambiguity between substring variants. The unicode-segmentation variants (grapheme, word, and sentence) can be guarded behind a unicode feature (or perhaps separate features for each?). The byte variant can wait for now, since it isn't really needed and raises problems with well-formed strings: slicing at an arbitrary byte offset can split a multi-byte UTF-8 sequence.
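The problem with byte substrings can be shown with the standard library alone. In this sketch, `byte_substring` is a hypothetical helper name of mine; it just wraps `str::get`, which returns `None` when either end of the range is not on a char boundary.

```rust
/// Hypothetical helper: a byte-indexed substring that refuses to split
/// a UTF-8 sequence. Returns `None` if `start..end` is out of bounds,
/// inverted, or does not fall on char boundaries.
fn byte_substring(s: &str, start: usize, end: usize) -> Option<&str> {
    s.get(start..end)
}
```

For example, `"\u{e9}"` (é) is encoded as two bytes, so `byte_substring("\u{e9}", 0, 1)` gives `None` while `byte_substring("\u{e9}", 0, 2)` gives the whole string. A ByteSubstring variant would need to decide between returning an `Option` like this and panicking the way plain slicing does.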
This solution will give maximum clarity as to what this crate offers, and will give flexibility for users to choose from the various types of substrings offered.