Project Goals
Elmurod Talipov edited this page Jul 20, 2018 · 2 revisions
- Implement a database abstraction library for managing the word database.
- Create a word database with the following part-of-speech (POS) tags:
| # | Uzbek | English | Tag |
|---|---|---|---|
| 1 | От | Noun | NOUN |
| 2 | Феъл | Verb | VERB |
| 3 | Сифат | Adjective | ADJ |
| 4 | Равиш | Adverb | ADV |
| 5 | Олмош | Pronoun | PRON |
| 6 | Сон | Numeral | NUM |
| 7 | Боғловчи | Conjunction | CONJ |
| 8 | Юклама | Particle | PART |
| 9 | Ундов сўз | Interjection | INTJ |
| 10 | Тақлид сўз | Onomatopoeic words | X |
| 11 | Модал сўз | Modal words | AUX |
| 12 | Кўмакчи | Postposition | ADP |
This tag set may not be fully compatible with the Universal Dependencies POS tags.
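To make the compatibility caveat concrete, the tag column of the table can be compared against the UPOS inventory of Universal Dependencies v2. The snippet below is only a sketch: `PROJECT_TAGS` and `incompatible_tags` are hypothetical names, not part of the project's database library, and the choice of UD v2 as the reference inventory is an assumption.

```python
# UPOS inventory of Universal Dependencies v2 (17 tags).
UD_V2_UPOS = {
    "ADJ", "ADP", "ADV", "AUX", "CCONJ", "DET", "INTJ", "NOUN", "NUM",
    "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "SYM", "VERB", "X",
}

# Hypothetical lookup copied from the table: English name -> project tag.
PROJECT_TAGS = {
    "Noun": "NOUN",
    "Verb": "VERB",
    "Adjective": "ADJ",
    "Adverb": "ADV",
    "Pronoun": "PRON",
    "Numeral": "NUM",
    "Conjunction": "CONJ",
    "Particle": "PART",
    "Interjection": "INTJ",
    "Onomatopoeic words": "X",
    "Modal words": "AUX",
    "Postposition": "ADP",
}


def incompatible_tags(project_tags=PROJECT_TAGS, ud_tags=UD_V2_UPOS):
    """Return the project tags that are not in the given UD tag inventory."""
    return sorted(set(project_tags.values()) - ud_tags)


print(incompatible_tags())  # prints ['CONJ']
```

Under this assumption only `CONJ` falls outside the inventory, because UD v2 splits conjunctions into `CCONJ` and `SCONJ`.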
- Perform statistical analysis of word usage and generate a usage table from PDF books and news feeds.
- Implement stemming rules with a corresponding table in the database (for fast reference), plus a table for entity names.
- Implement a basic natural language analysis tool that provides functionality such as tokenizing, parsing, stemming, POS tagging, and named entity recognition.
Expected deliverable: the tool tahlih takes input text in the Uzbek language and generates parsed (tokenized, stemmed, POS-tagged) output.
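As a rough illustration of the tokenizing and word-usage goals, the sketch below splits text into word tokens and counts them. It is not the tahlih tool: `tokenize` and `word_frequencies` are hypothetical names, and the regular expression rests on the simplifying assumption that a word is a run of letters, optionally joined by apostrophes (as in Latin-script Uzbek). Stemming and POS tagging are deliberately left out.

```python
import re
from collections import Counter

# Assumption: a token is a run of Unicode letters, optionally joined by
# apostrophe-like characters (e.g. Latin-script "o'zbek").
_WORD = re.compile(r"[^\W\d_]+(?:['ʻʼ’][^\W\d_]+)*")


def tokenize(text):
    """Return lowercased word tokens; punctuation and digits are dropped."""
    return [match.group(0).lower() for match in _WORD.finditer(text)]


def word_frequencies(text):
    """Count how often each token occurs in the text."""
    return Counter(tokenize(text))


print(tokenize("Китоб, китоб ва дафтар."))
# prints ['китоб', 'китоб', 'ва', 'дафтар']
```

Feeding the text of many documents through `word_frequencies` and summing the counters would give one simple form of usage table.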
- Manually (or semi-automatically) generate an Uzbek treebank in CoNLL-U format that can be contributed to the Universal Dependencies (UD) framework.
- Feed the Uzbek treebank to SyntaxNet, then perform analysis, training, and improvement.
- Implement initial applications with the Uzbek NLU project: a Telegram Q&A bot, a Twitter bot, and a news summarizer.
- TBD
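For the treebank goal above, it may help to see what a CoNLL-U sentence looks like when written out. CoNLL-U stores one token per line in ten tab-separated columns (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC) and ends each sentence with a blank line. The function below is a minimal sketch under that format; `to_conllu` and the token dictionary layout are hypothetical, and unknown fields are written as `_`.

```python
# The ten CoNLL-U columns, in order.
CONLLU_FIELDS = (
    "ID", "FORM", "LEMMA", "UPOS", "XPOS",
    "FEATS", "HEAD", "DEPREL", "DEPS", "MISC",
)


def to_conllu(tokens):
    """Serialize one sentence to CoNLL-U.

    `tokens` is a list of dicts with a required "form" key and optional
    "lemma", "upos", "head", and "deprel" keys (hypothetical layout).
    Missing values are written as "_".
    """
    lines = []
    for index, token in enumerate(tokens, start=1):
        row = [
            str(index),                    # ID
            token["form"],                 # FORM
            token.get("lemma", "_"),       # LEMMA
            token.get("upos", "_"),        # UPOS
            "_",                           # XPOS
            "_",                           # FEATS
            str(token.get("head", "_")),   # HEAD
            token.get("deprel", "_"),      # DEPREL
            "_",                           # DEPS
            "_",                           # MISC
        ]
        lines.append("\t".join(row))
    # A blank line terminates the sentence.
    return "\n".join(lines) + "\n\n"
```

A caller would pass the tokens of one sentence, each with whatever annotation is already available, and concatenate the returned strings to build a treebank file.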