casholab/language-data

Casho Lab Languages

The term "language" can mean many different things depending on the context. There are macrolanguages, which cover multiple sub-languages and dialects. Some languages are spoken the same way but written differently. Some are written the same way but spoken differently. Some languages have multiple scripts, writing directions, dialects, regional variants, and so on.

This library provides an easy way to select a single, specific language given all of these circumstances.

This library is focused primarily on written languages.

There are thousands of languages (around 7000 in ISO 639-3).

Around 400 languages are more commonly used and have ISO 639-2 codes.

Around 200 languages have two-letter codes (ISO 639-1).

Language Standards Explanations

ISO 639-1 describes the most common languages. It is further expanded by ISO 639-2 and ISO 639-3, which together give codes to over 7000 "languages".

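Regions are described by ISO 3166-1 and scripts by ISO 15924. BCP 47 defines the format that combines these with a language code into a single tag, for example zh-Hant-TW (Chinese, Traditional script, Taiwan).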
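As a rough illustration of how a language code, a script, and a region fit together in one tag, here is a minimal Python sketch. The names `parse_tag` and `build_tag` are hypothetical and not part of this library, and the pattern only covers the simple `language[-Script][-REGION]` shape. Full BCP 47 also allows variants, extensions, and private-use subtags, which are ignored here.

```python
import re
from typing import Optional

# Simplified pattern: language[-Script][-REGION].
# This is an assumption for illustration, not a full BCP 47 parser.
_TAG = re.compile(
    r"^(?P<language>[A-Za-z]{2,3})"
    r"(?:-(?P<script>[A-Za-z]{4}))?"
    r"(?:-(?P<region>[A-Za-z]{2}|[0-9]{3}))?$"
)


def parse_tag(tag: str) -> dict:
    """Split a simple tag into language, script, and region subtags."""
    match = _TAG.match(tag)
    if match is None:
        raise ValueError(f"unsupported tag: {tag!r}")
    parts = match.groupdict()
    # Conventional casing: language lower, Script title case, REGION upper.
    return {
        "language": parts["language"].lower(),
        "script": parts["script"].title() if parts["script"] else None,
        "region": parts["region"].upper() if parts["region"] else None,
    }


def build_tag(language: str, script: Optional[str] = None,
              region: Optional[str] = None) -> str:
    """Join subtags back into a tag, skipping the ones that are missing."""
    subtags = [language.lower()]
    if script:
        subtags.append(script.title())
    if region:
        subtags.append(region.upper())
    return "-".join(subtags)
```

For example, `parse_tag("zh-hant-tw")` yields the language `zh`, the script `Hant`, and the region `TW`, and `build_tag("sr", "latn", "rs")` gives `sr-Latn-RS`.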

language selector

To turn this into a language selector, we can combine these features. Depending on which set of languages we select from, we can implement different selection screens.

We can include flags that represent countries in which the languages are commonly spoken. We can arrange them by the number of speakers or by the percentage of speakers in those countries.

We can display the language name in the user's language (or a standard language such as English) and also display the name of the language in its own language and script.

We can make the list searchable by either of those names.
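A minimal Python sketch of searching by either name could look like the following. The `LanguageEntry` type, the `ENTRIES` sample rows, and the `search` function are all hypothetical names chosen for illustration. The library's real data layout may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LanguageEntry:
    tag: str           # BCP 47 tag
    english_name: str  # name in the fallback display language
    native_name: str   # name in the language's own script


# Hypothetical sample rows, not the library's actual data.
ENTRIES = [
    LanguageEntry("de", "German", "Deutsch"),
    LanguageEntry("ja", "Japanese", "日本語"),
    LanguageEntry("ko", "Korean", "한국어"),
]


def search(query: str, entries=ENTRIES) -> list:
    """Return entries whose English or native name contains the query."""
    needle = query.strip().casefold()
    if not needle:
        # An empty query shows the full list.
        return list(entries)
    return [
        e for e in entries
        if needle in e.english_name.casefold()
        or needle in e.native_name.casefold()
    ]
```

With this sketch, a user who types `deu` and a user who types `germ` both find German, and `日本` finds Japanese without any English input.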

Once a language is selected, the region can be chosen in a secondary selection window (or we can display the regions in the original list).

We can do the same with scripts, showing them either in the original list or in a sub-selection screen.
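The two-step flow can be sketched in Python as a grouping of full tags under their primary language subtag. The function names `group_by_language` and `next_step` are assumptions made for this example and are not part of the library.

```python
from collections import defaultdict


def group_by_language(tags):
    """Map each primary language subtag to the full tags that share it."""
    groups = defaultdict(list)
    for tag in tags:
        language = tag.split("-", 1)[0].lower()
        groups[language].append(tag)
    return dict(groups)


def next_step(language, groups):
    """Decide whether a second selection screen is needed for a language."""
    options = groups.get(language, [])
    if len(options) <= 1:
        # Nothing to disambiguate: pick the only tag, or the bare language.
        return {"selected": options[0] if options else language, "choices": []}
    # Several regions or scripts exist, so show them on a second screen.
    return {"selected": None, "choices": options}
```

Given the tags `en-US`, `en-GB`, `zh-Hans-CN`, `zh-Hant-TW`, and `ja`, choosing `ja` completes the selection immediately, while choosing `zh` returns the two script and region variants for a second screen.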

disclaimers and notes

ai/llms

Certain elements were generated using LLMs. These can contain hallucinations and errors.

If you feel that any of the data is incorrect, you can raise an issue. Please back it with at least one external source. Languages are ever evolving, so some data may be "wrong" for certain regions or peoples.

This library chooses to be opinionated about singular default values. We choose the data value based on global and computer standards (ISO, BCP). For elements that do not have standards to reference, we first take an opinionated approach and the authors decide on a value. If the authors do not have an opinion or information about a value, we consult multiple AIs and then have a grader AI select the final value.
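The precedence described above (standard first, then author opinion, then graded LLM answers) can be sketched in Python as follows. This is only an illustration of the order of precedence and is not the library's actual generation script. `resolve_value` and `majority_grader` are hypothetical names, and the majority vote merely stands in for the grader AI.

```python
from collections import Counter
from typing import Callable, Optional, Sequence


def majority_grader(candidates: Sequence[str]) -> str:
    """Stand-in for the grader AI: pick the most common candidate."""
    return Counter(candidates).most_common(1)[0][0]


def resolve_value(
    standard: Optional[str],
    author_choice: Optional[str],
    llm_candidates: Sequence[str],
    grader: Callable[[Sequence[str]], str] = majority_grader,
) -> tuple:
    """Return (value, source) following the precedence described above."""
    if standard is not None:
        return standard, "standard"
    if author_choice is not None:
        return author_choice, "author"
    if not llm_candidates:
        raise ValueError("no value available from any source")
    # No standard and no author opinion: let the grader choose.
    return grader(llm_candidates), "llm"
```

Returning the source alongside the value makes it possible to record which entries came from a standard and which came from an LLM.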

The LLM data generation scripts are all documented. Depending on the "cruciality" of the data and cost concerns, either one LLM or multiple LLMs may have been used.

Due to cost constraints and copyright infringement concerns, we did not use search or external sources. All results come from the LLMs themselves, personal opinions, and open source standards.

flags

Flags are in SVG format. Some of the SVGs are over 300 KB, but most are much smaller. They were gathered from an open source library at

Some people feel flags do not represent languages well, but I feel they do. They handle some unusual edge cases: a deaf user who is illiterate, or a user who does not read either of the languages shown, for instance a South Korean user trying to find Chinese who speaks neither English nor Chinese.

In these cases flags provide the closest approximation of information, and that is why I feel they are useful.

Flag order is heavily debatable. We built our flag orders by weighing the number of speakers and the percentage of L1 (primary language) speakers in each country. We did not use historical or name context.
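One way to combine the two factors is a weighted score, sketched below in Python. The equal weights, the field names, and the `flag_order` function are assumptions for illustration, and the actual weighting used to build the flag orders is not stated here. The sample rows use ISO 3166 user-assigned codes (XA, XB, XC) with placeholder figures, not real statistics.

```python
def flag_order(countries, speaker_weight=0.5, l1_weight=0.5):
    """Sort country codes by a weighted mix of speaker count and L1 share."""
    if not countries:
        return []
    max_speakers = max(c["speakers"] for c in countries) or 1

    def score(c):
        # Normalize speaker counts to 0..1 so both terms share a scale.
        return (speaker_weight * c["speakers"] / max_speakers
                + l1_weight * c["l1_share"])

    return [c["code"] for c in sorted(countries, key=score, reverse=True)]


# Placeholder figures with user-assigned codes, not real statistics.
SAMPLE = [
    {"code": "XA", "speakers": 1_000_000, "l1_share": 0.20},
    {"code": "XB", "speakers": 400_000, "l1_share": 0.90},
    {"code": "XC", "speakers": 50_000, "l1_share": 0.10},
]
```

With equal weights, XB ranks ahead of XA because its high L1 share outweighs its smaller speaker count. Setting `l1_weight=0` falls back to ordering purely by number of speakers.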

license

This is free for anyone to use. You may not copyright or claim this information, but you may use it for any commercial use case.
