Automatic Phonetization-based Statistical Linguistic Study of Standard Arabic
Fadi Sindran, Firas Mualla, Tino Haderlein, Khaled Daqrouq, Elmar Nöth
Pages: 38-53 | Revised: 30-11-2016 | Published: 31-12-2016
KEYWORDS
Statistical Studies, Standard Arabic, Phonetic Transcription, Phonetization, Ranked Frequency Distribution, Phonemes, Allophones, Syllables, Allosyllables, Fit of Equation.
ABSTRACT
Statistical studies based on the automatic phonetic transcription of Standard Arabic texts are rare. The few that exist operate on only one level (phoneme or syllable), and their results cannot be generalized to the language as a whole. In this paper, we automatically derive accurate statistical information about phonemes, allophones, syllables, and allosyllables in Standard Arabic. We prepared and preprocessed a corpus of more than 5 million words, comprising words and sentences from both Classical Arabic and Modern Standard Arabic. We developed a software package that performs rule-based automatic transcription of written Standard Arabic text into the corresponding linguistic units at four levels: phoneme, allophone, syllable, and allosyllable. After testing the software on four corpora containing more than 57,000 vocabulary words and obtaining an accuracy above 99% at all four levels, we used it as a reliable tool to transcribe the corpus of this study automatically, and evaluated the following: 1) the inventory of phonemes, allophones, syllables, and allosyllables in Standard Arabic, with the percentage of each; 2) the equation that best fits the ranked distribution of the normalized frequencies of phonemes, allophones, syllables, and allosyllables; 3) further statistical information, such as the percentages of consonants and vowels, the percentages of consonants classified by place and manner of articulation, the transition probability matrix between phonemes, and the percentages of syllables by syllable type.
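The statistics listed in the abstract can be illustrated with a small sketch. It assumes the text has already been transcribed into phoneme and syllable sequences (it does not reproduce the authors' rule-based phonetizer) and uses a hypothetical ASCII symbol set with short vowels a, i, u and long vowels aa, ii, uu. For the ranked distribution it fits only one candidate curve, a power law (Zipf-like) estimated by least squares in log-log space; the paper compares candidate equations, and this is not necessarily the one it selects. All function names are illustrative.

```python
import math
from collections import Counter, defaultdict

# Hypothetical ASCII symbols for the Standard Arabic vowels (an assumption of this sketch).
SHORT_VOWELS = {"a", "i", "u"}
LONG_VOWELS = {"aa", "ii", "uu"}
VOWELS = SHORT_VOWELS | LONG_VOWELS


def ranked_normalized_frequencies(sequences):
    """Return [(unit, relative_frequency)] sorted by descending frequency (rank 1 first)."""
    counts = Counter(unit for seq in sequences for unit in seq)
    total = sum(counts.values())
    # Ties are broken alphabetically so the ranking is deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(unit, n / total) for unit, n in ranked]


def fit_power_law(ranked):
    """Fit f(r) = c * r**(-s) by linear least squares on log f vs. log r; return (c, s)."""
    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(freq) for _, freq in ranked]
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)  # nonzero only if there are >= 2 distinct units
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return math.exp(mean_y - slope * mean_x), -slope


def transition_probabilities(sequences):
    """Estimate P(next | current) from adjacent phoneme pairs inside each sequence."""
    pair_counts = defaultdict(Counter)
    for seq in sequences:
        for cur, nxt in zip(seq, seq[1:]):
            pair_counts[cur][nxt] += 1
    matrix = {}
    for cur, row in pair_counts.items():
        row_total = sum(row.values())
        matrix[cur] = {nxt: n / row_total for nxt, n in row.items()}
    return matrix


def vowel_percentage(sequences):
    """Percentage of phoneme tokens that are vowels (short or long)."""
    units = [unit for seq in sequences for unit in seq]
    return 100.0 * sum(unit in VOWELS for unit in units) / len(units)


def syllable_type(syllable):
    """Map a syllable to its C/V pattern, writing a long vowel as VV (e.g. CVVC)."""
    return "".join(
        "VV" if u in LONG_VOWELS else "V" if u in SHORT_VOWELS else "C"
        for u in syllable
    )


def syllable_type_percentages(syllables):
    """Percentage of syllables belonging to each C/V pattern."""
    counts = Counter(syllable_type(s) for s in syllables)
    total = sum(counts.values())
    return {pattern: 100.0 * n / total for pattern, n in counts.items()}


if __name__ == "__main__":
    # Toy input: kataba and kitaab, hand-transcribed with the assumed symbols.
    words = [["k", "a", "t", "a", "b", "a"], ["k", "i", "t", "aa", "b"]]
    syllables = [["k", "a"], ["t", "a"], ["b", "a"], ["k", "i"], ["t", "aa", "b"]]
    ranked = ranked_normalized_frequencies(words)
    print(ranked)
    print(fit_power_law(ranked))
    print(transition_probabilities(words)["k"])  # {'a': 0.5, 'i': 0.5}
    print(round(vowel_percentage(words), 1))  # 45.5
    print(syllable_type_percentages(syllables))  # {'CV': 80.0, 'CVVC': 20.0}
```

Each row of the transition matrix sums to 1 because it is normalized by the number of transitions leaving that phoneme. In a real study, the power-law fit would be one of several candidate equations compared with goodness-of-fit measures such as R² or RMSE.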
Mr. Fadi Sindran
Faculty of Engineering, Department of Computer Science
Pattern Recognition Lab
Friedrich-Alexander-Universität Erlangen-Nürnberg
Erlangen, 91058, Germany
fadi.sindran@faui51.informatik.uni-erlangen.de
Mr. Firas Mualla
Faculty of Engineering, Department of Computer Science
Pattern Recognition Lab
Friedrich-Alexander-Universität Erlangen-Nürnberg
Erlangen, 91058, Germany
Dr. Tino Haderlein
Faculty of Engineering, Department of Computer Science
Pattern Recognition Lab
Friedrich-Alexander-Universität Erlangen-Nürnberg
Erlangen, 91058, Germany
Professor Khaled Daqrouq
Department of Electrical and Computer Engineering
King Abdulaziz University, Jeddah, 22254, Saudi Arabia
Professor Elmar Nöth
Faculty of Engineering, Department of Computer Science
Pattern Recognition Lab
Friedrich-Alexander-Universität Erlangen-Nürnberg
Erlangen, 91058, Germany