Language Identifier for Languages of Pakistan Including Arabic and Persian
Qaiser Abbas, M. S. Ahmad, Sadia Niazi
Pages - 27 - 35 | Revised - 30-11-2010 | Published - 20-12-2010
KEYWORDS
Identifier, Probabilistic, HAIL, Digrams, LIJ
ABSTRACT
A language recognizer (also called a language identifier or guesser) is a basic application for identifying the language of a text document. It takes a file as input, processes its text, and decides the language of the document with precision using LIJ-I, LIJ-II and LIJ-III. LIJ-I alone yields poor accuracy; accuracy improves with LIJ-II and rises further with LIJ-III. The application also calculates the probability of digrams and the average percentages of accuracy. LIJ-I considers the complete character set of each language, while LIJ-II considers only the difference between them. A Java-based language recognizer is developed and presented in detail in this paper.
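The abstract does not spell out how LIJ-I, LIJ-II and LIJ-III work, so the following is only a minimal sketch of the two general ideas it mentions: telling languages apart by the characters that differ between their character sets, and scoring text by digram (character-pair) probabilities. The class name `LanguageGuessSketch`, all method names, and the toy Latin-letter training strings are assumptions made for illustration, not the authors' implementation or data.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; not the LIJ algorithms from the paper.
public class LanguageGuessSketch {

    /** Collects the distinct non-whitespace characters of a sample text. */
    public static Set<Character> charSet(String text) {
        Set<Character> chars = new HashSet<>();
        for (char c : text.toCharArray()) {
            if (!Character.isWhitespace(c)) {
                chars.add(c);
            }
        }
        return chars;
    }

    /** Characters of one language that occur in none of the other character sets. */
    public static Set<Character> distinctive(Set<Character> own,
                                             Iterable<Set<Character>> others) {
        Set<Character> result = new HashSet<>(own);
        for (Set<Character> other : others) {
            result.removeAll(other);
        }
        return result;
    }

    /** Relative frequency of every digram (adjacent character pair) in the text. */
    public static Map<String, Double> digramProbabilities(String text) {
        Map<String, Double> counts = new HashMap<>();
        int total = 0;
        for (int i = 0; i + 1 < text.length(); i++) {
            counts.merge(text.substring(i, i + 2), 1.0, Double::sum);
            total++;
        }
        final int n = total;
        counts.replaceAll((digram, count) -> count / n);
        return counts;
    }

    /** Sums the model probabilities of the digrams found in the input. */
    public static double score(String input, Map<String, Double> model) {
        double sum = 0.0;
        for (int i = 0; i + 1 < input.length(); i++) {
            sum += model.getOrDefault(input.substring(i, i + 2), 0.0);
        }
        return sum;
    }

    /** Returns the name of the model with the highest score, or null if there are none. */
    public static String guess(String input, Map<String, Map<String, Double>> models) {
        String best = null;
        double bestScore = -1.0;
        for (Map.Entry<String, Map<String, Double>> entry : models.entrySet()) {
            double s = score(input, entry.getValue());
            if (s > bestScore) {
                bestScore = s;
                best = entry.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Toy training strings standing in for real per-language corpora (assumption).
        Map<String, Map<String, Double>> models = new HashMap<>();
        models.put("A", digramProbabilities("abab"));
        models.put("B", digramProbabilities("cdcd"));

        System.out.println(guess("ab", models));
        System.out.println(distinctive(charSet("abc"), Arrays.asList(charSet("bcd"))));
    }
}
```

In a real setting the models would be trained on sizeable corpora for each language, and the distinctive-character check would matter most for languages that share a script, where most characters overlap.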
Mr. Qaiser Abbas
University of Sargodha - Pakistan
qaiser.abbas@uos.edu.pk
Mr. M. S. Ahmad
University of Sargodha - Pakistan
Mr. Sadia Niazi
University of Sargodha - Pakistan