From @gilgameshjw's run using GNDB data:
- ara is the source
- ara_diacri is the diacriticized Arabic produced with rababa
- DEST_FULL_NAME_RO is the manual transliteration provided in GNDB
- ara_latinised is the output of Interscript
| | ara | ara_diacri | ara_latinised | index | DEST_FULL_NAME_RO | dist_edit | dist_jaro_winkler |
|---|---|---|---|---|---|---|---|
| 0 | گرجان | گرِجانَ | grijna | 0 | girjān | 0.666667 |
| 1 | چم كورك | چمَ كُوَرِكَ | chma kūarika | 1 | cham kūrik | 0.400000 |
| 2 | وادي نوباندي | وَادِي نُوبَانْدِي | wādī nūbāndī | 2 | wādī nūbāndī | 0.000000 |
| 3 | وادي خازيانلي | وَادِي خَازِيَانْلِيٍّ | wādī khāziyānlīyin | 3 | wādī khāzyānlī | 0.285714 |
| 4 | وادي ام بطمة | وَادِي امْ بُطْمَةَ | wādī am buṭmata | 4 | wādī umm buţmah | 0.333333 |
| ... | ... | ... | ... | ... | ... | ... |
| 89 | القباقب | القَبَاقِبُ | al-qabāqibu | 89 | al qabāqib | 0.200000 |
| 90 | العِقلة | العَقْلَةِ | al-‘aqlahi | 90 | al ‘iqlah | 0.333333 |
| 91 | الظهرور | الظُّهْرُورُ | al-ẓẓuhrūru | 91 | az̧ z̧ahrūr | 0.636364 |
| 92 | أم الدنانير | أَمْ الدَّنَانِيرَ | am al-ddanānīra | 92 | umm ad danānīr | 0.428571 |
| 93 | أرض الرجوم | أَرْضِ الرُّجُومِ | arḍi al-rrujūmi | 93 | arḑ ar rujūm | 0.500000 |
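Some entries clearly differ. In rows 91 and 93, for example, the two romanizations appear to follow different transliteration systems: Interscript gives `al-ẓẓuhrūru` and `arḍi al-rrujūmi`, while GNDB gives `az̧ z̧ahrūr` and `arḑ ar rujūm`. The GNDB forms use cedillas, assimilate the article before sun letters and drop case endings; the Interscript output uses dots below, keeps `al-` with a doubled consonant and retains the final vowels.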
@gilgameshjw, can you help confirm:
- which GNDB dataset are you using?
- which transliteration system are you using?
Is there an easy way to reproduce this output? 😉 Thanks!
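As a starting point for reproducing the numbers, here is a minimal sketch of one way the distance values could be recomputed. It rests on an assumption inferred from the table rather than confirmed by the original run: the final numeric value in rows 0, 1, 2, 3 and 89 is consistent with a plain Levenshtein distance between `ara_latinised` and `DEST_FULL_NAME_RO`, divided by the length of `DEST_FULL_NAME_RO`. The function names are hypothetical, and the NFC normalization step is my own addition.

```python
import unicodedata


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance over Unicode code points."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or keep on a match
            ))
        previous = current
    return previous[-1]


def normalized_edit_distance(candidate: str, reference: str) -> float:
    """Edit distance divided by the reference length (assumed normalization)."""
    # NFC keeps precomposed letters such as "ā" as a single code point.
    candidate = unicodedata.normalize("NFC", candidate)
    reference = unicodedata.normalize("NFC", reference)
    if not reference:
        return 0.0 if not candidate else 1.0
    return levenshtein(candidate, reference) / len(reference)


# (ara_latinised, DEST_FULL_NAME_RO) pairs copied from rows 0, 2 and 89.
pairs = [
    ("grijna", "girjān"),
    ("wādī nūbāndī", "wādī nūbāndī"),
    ("al-qabāqibu", "al qabāqib"),
]

for latinised, manual in pairs:
    score = normalized_edit_distance(latinised, manual)
    print(f"{latinised!r} vs {manual!r}: {score:.6f}")
```

Because the score counts every code-point difference, systematic convention differences (hyphen versus space after the article, cedilla versus dot below, retained case endings) inflate it even where the underlying reading agrees, so it may be worth normalizing those before comparing.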