Support Different Kinds of Substrings #10

This library could support several kinds of substrings. Currently, the implementation indexes substrings by `char`, but some users will likely want substrings indexed by grapheme instead. Word and sentence substrings could also be supported using the relevant Unicode standards.

Altogether, I see the following substring variants being possible:

- CharSubstring: with respect to `char`s, which are Unicode scalar values.
- GraphemeSubstring: with respect to graphemes (using unicode-segmentation).
- WordSubstring: with respect to words (using unicode-segmentation).
- SentenceSubstring: with respect to sentences (using unicode-segmentation).
- ByteSubstring: with respect to individual bytes (equivalent to slicing the string).
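
To make the difference between the variants concrete, here is a minimal sketch using only the standard library. The `CharSubstring` trait name comes from this proposal, but the `char_substring` method name, its signature, and its clamping behaviour are assumptions for illustration, not the crate's actual API.

```rust
/// Hypothetical sketch of the proposed `CharSubstring` trait.
/// The method name and signature are assumptions, not the crate's API.
trait CharSubstring {
    /// Returns the substring covering chars `start..end`.
    fn char_substring(&self, start: usize, end: usize) -> &str;
}

impl CharSubstring for str {
    fn char_substring(&self, start: usize, end: usize) -> &str {
        // Map a char index to its byte offset; indices past the end clamp to `len()`.
        let byte_at = |n: usize| self.char_indices().nth(n).map_or(self.len(), |(i, _)| i);
        let (from, to) = (byte_at(start), byte_at(end));
        if from >= to { "" } else { &self[from..to] }
    }
}

fn main() {
    let s = "héllo"; // 'é' occupies two bytes in UTF-8

    // Char indices: chars 1..3 are 'é' and 'l'.
    println!("{}", s.char_substring(1, 3));

    // Byte indices: bytes 1..3 cover exactly 'é', but 1..2 splits it,
    // so the checked slice returns `None`.
    println!("{:?} {:?}", s.get(1..3), s.get(1..2));

    // Char indices can still split a grapheme: 'e' followed by a combining
    // acute accent is one user-perceived character but two `char`s.
    let g = "e\u{301}x";
    println!("{:?}", g.char_substring(0, 1));
}
```

The last case is the motivation for a grapheme variant: a `char`-based substring can separate a base letter from its combining mark, while a byte-based one can fail to produce a valid string at all.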

Since we are already looking at a breaking change with #9, the `Substring` trait can be renamed to `CharSubstring` so that there is no ambiguity between substring variants. The unicode-segmentation variants (grapheme, word, and sentence) can be guarded behind a `unicode` feature (or perhaps a separate feature for each?). The byte variant can be held off on for now, since it isn't really needed and byte indices that fall inside a multi-byte character cannot produce a well-formed string.
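
As a rough sketch of what the feature guard could look like, the grapheme variant might be compiled only when `unicode` is enabled. This assumes the feature is declared in `Cargo.toml` and pulls in unicode-segmentation as an optional dependency. The `grapheme_substring` method and the `unicode_enabled` helper are hypothetical names used for illustration.

```rust
// Compiled only when the (assumed) `unicode` feature is enabled, so users who
// only need char substrings do not pull in unicode-segmentation.
#[cfg(feature = "unicode")]
pub trait GraphemeSubstring {
    /// Returns the substring covering extended grapheme clusters `start..end`.
    fn grapheme_substring(&self, start: usize, end: usize) -> &str;
}

#[cfg(feature = "unicode")]
impl GraphemeSubstring for str {
    fn grapheme_substring(&self, start: usize, end: usize) -> &str {
        use unicode_segmentation::UnicodeSegmentation;
        // Map a grapheme index to its byte offset; clamp past-the-end indices.
        let byte_at = |n: usize| {
            self.grapheme_indices(true)
                .nth(n)
                .map_or(self.len(), |(i, _)| i)
        };
        let (from, to) = (byte_at(start), byte_at(end));
        if from >= to { "" } else { &self[from..to] }
    }
}

/// Hypothetical helper: reports whether the grapheme, word, and sentence
/// variants were compiled in.
pub fn unicode_enabled() -> bool {
    cfg!(feature = "unicode")
}

fn main() {
    println!("unicode feature enabled: {}", unicode_enabled());
}
```

Splitting this into one feature per variant would follow the same pattern with a different feature name on each `#[cfg]` attribute.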

This solution makes it clear what the crate offers and lets users choose among the types of substrings offered.
