
Adding support for languages with discernible delimiters #40

@arnavkapoor

Description

Languages without delimiters: Japanese, Chinese (Simplified and Traditional), and possibly other East Asian languages don't put any delimiter between number words, e.g. 九千九百九十九 (9999 in Japanese). Structurally these are very similar to English, but the lack of a delimiter makes parsing harder.
Similarly, German and Dutch have no delimiter as such (up to a certain number): the number words are written together as a single word.
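
To illustrate the "similar structure" point: once the characters are separated, a number like 九千九百九十九 is just digits followed by multipliers, the same way "nine thousand nine hundred ninety-nine" is read. Below is a minimal sketch, assuming a hypothetical `parse_cjk_number` helper and toy character tables that only cover numbers below 10,000 (no 万, and no zero such as 零/〇).

```python
# Hypothetical character tables covering only numbers below 10,000.
DIGITS = {"一": 1, "二": 2, "三": 3, "四": 4, "五": 5,
          "六": 6, "七": 7, "八": 8, "九": 9}
MULTIPLIERS = {"十": 10, "百": 100, "千": 1000}


def parse_cjk_number(text: str) -> int:
    """Sum digit x multiplier pairs, e.g. 九千 -> 9 * 1000."""
    total = 0
    current = 0  # digit waiting for its multiplier
    for ch in text:
        if ch in DIGITS:
            current = DIGITS[ch]
        elif ch in MULTIPLIERS:
            # A bare multiplier such as 十 means one ten.
            total += (current or 1) * MULTIPLIERS[ch]
            current = 0
        else:
            raise ValueError(f"unsupported character: {ch!r}")
    # A trailing digit without a multiplier is the units place.
    return total + current


print(parse_cjk_number("九千九百九十九"))  # 9999
```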

One approach to the missing delimiter is to read the input character by character and, as soon as the accumulated characters match one of the known number words, insert a space. After this pre-processing step, we can follow the same logic as for delimited languages. This does increase the complexity to O(n²), where n is the string length, which I believe shouldn't be a major issue. (We can apply this function only to the languages that lack delimiters.)

Concrete example

five thousand nine hundred and thirteen - English (5913) 
fünftausendneunhundertdreizehn - German (5913)

nine hundred and thirteen - English (913)
negenhonderddertien - Dutch (913)

To handle this, we check the prefixes f, fü, fün and finally hit fünf = 5, insert a space, and start again from the next character. Similarly, for the Dutch example we get negen = 9.
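
The prefix-matching pre-processing step can be sketched as below. This is a minimal illustration, not the project's implementation: `segment`, `GERMAN` and `DUTCH` are hypothetical, and the vocabularies are toy sets containing only the words needed for the examples above. With `longest=False` it stops at the first (shortest) matching prefix, as in the f, fü, fün, fünf walk-through. Note that this splits "dreizehn" into "drei zehn", so the sketch also offers an assumed `longest=True` variant that keeps the longest match instead. As a further assumption, it only tries prefixes up to the length of the longest vocabulary word, which limits the work to at most that many prefix checks per position.

```python
from __future__ import annotations

# Hypothetical toy vocabularies, only enough for the examples in this issue.
GERMAN = {"fünf", "neun", "drei", "zehn", "dreizehn", "hundert", "tausend"}
DUTCH = {"negen", "honderd", "dertien"}


def segment(text: str, vocabulary: set[str], longest: bool = False) -> str:
    """Insert spaces between number words in an undelimited string."""
    max_len = max(map(len, vocabulary), default=0)
    tokens = []
    i = 0
    while i < len(text):
        match_end = None
        # Only prefixes up to the longest vocabulary word can ever match.
        for j in range(i + 1, min(len(text), i + max_len) + 1):
            if text[i:j] in vocabulary:
                match_end = j
                if not longest:
                    break  # shortest match: stop at the first hit
        if match_end is None:
            # No known word starts here: keep the rest unsplit and stop.
            tokens.append(text[i:])
            break
        tokens.append(text[i:match_end])
        i = match_end  # restart from the character after the match
    return " ".join(tokens)


print(segment("fünftausendneunhundertdreizehn", GERMAN))
# fünf tausend neun hundert drei zehn
print(segment("fünftausendneunhundertdreizehn", GERMAN, longest=True))
# fünf tausend neun hundert dreizehn
print(segment("negenhonderddertien", DUTCH))
# negen honderd dertien
```

The output of `segment` can then be handed to the existing whitespace-based parsing logic.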
