Problem with Arabic transliteration #7

@ronaldtse

From @gilgameshjw's run using GNDB data:

  • ara is the source
  • ara_diacri is the diacriticized Arabic produced with rababa
  • DEST_FULL_NAME_RO is the manual transliteration provided in GNDB
  • ara_latinised is the output of Interscript

| | ara | ara_diacri | ara_latinised | index | DEST_FULL_NAME_RO | dist_edit | dist_jaro_winkler |
|---|---|---|---|---|---|---|---|
| 0 | گرجان | گرِجانَ | grijna | 0 | girjān | 0.666667 | |
| 1 | چم كورك | چمَ كُوَرِكَ | chma kūarika | 1 | cham kūrik | 0.400000 | |
| 2 | وادي نوباندي | وَادِي نُوبَانْدِي | wādī nūbāndī | 2 | wādī nūbāndī | 0.000000 | |
| 3 | وادي خازيانلي | وَادِي خَازِيَانْلِيٍّ | wādī khāziyānlīyin | 3 | wādī khāzyānlī | 0.285714 | |
| 4 | وادي ام بطمة | وَادِي امْ بُطْمَةَ | wādī am buṭmata | 4 | wādī umm buţmah | 0.333333 | |
| ... | ... | ... | ... | ... | ... | ... | |
| 89 | القباقب | القَبَاقِبُ | al-qabāqibu | 89 | al qabāqib | 0.200000 | |
| 90 | العِقلة | العَقْلَةِ | al-‘aqlahi | 90 | al ‘iqlah | 0.333333 | |
| 91 | الظهرور | الظُّهْرُورُ | al-ẓẓuhrūru | 91 | az̧ z̧ahrūr | 0.636364 | |
| 92 | أم الدنانير | أَمْ الدَّنَانِيرَ | am al-ddanānīra | 92 | umm ad danānīr | 0.428571 | |
| 93 | أرض الرجوم | أَرْضِ الرُّجُومِ | arḍi al-rrujūmi | 93 | arḑ ar rujūm | 0.500000 | |

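Some entries clearly differ. In rows 91 and 93, for example, the two outputs follow different transliteration systems: Interscript keeps the article as "al-" and doubles the following consonant (al-ẓẓuhrūru, al-rrujūmi), while GNDB assimilates the article (az̧ z̧ahrūr, ar rujūm) and drops the final case vowel.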
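
For reference, the single distance value shown per row is consistent with a Levenshtein distance divided by the length of the GNDB string (for example 4/6 for row 0 and 2/10 for row 89), which would make it the dist_edit column. That is an inference from the numbers, not something confirmed by the run. A minimal Python sketch under that assumption follows; the function names `levenshtein` and `normalized_edit_distance` are hypothetical and not taken from Interscript or the original script.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance over Unicode code points (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute, free when equal
            ))
        prev = cur
    return prev[-1]


def normalized_edit_distance(candidate: str, reference: str) -> float:
    """Assumed normalization: divide by the length of the reference string."""
    if not reference:
        return 0.0 if not candidate else 1.0
    return levenshtein(candidate, reference) / len(reference)


# Usage: compare an Interscript output with the GNDB romanization.
print(round(normalized_edit_distance("al-qabāqibu", "al qabāqib"), 6))
```

Because the comparison is per code point, results depend on Unicode normalization: a letter written with a combining mark (as in "z̧") counts as two characters unless both strings are normalized the same way first.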

@gilgameshjw can you help confirm:

  • which GNDB dataset are you using?
  • which transliteration system are you using?

Is there an easy way to reproduce this output? 😉 Thanks!
