Language Combinatorics: A Sentence Pattern Extraction Architecture Based on Combinatorial Explosion
Michal Ptaszynski, Rafal Rzepka, Yoshio Momouchi
Pages 24-36 | Revised 01-07-2011 | Published 05-08-2011
KEYWORDS
Computational Linguistics, Information Retrieval and Extraction, Corpus Linguistics
ABSTRACT
A "sentence pattern" in modern Natural Language Processing is often treated as a contiguous sequence of words (an n-gram). However, many branches of linguistics, such as Pragmatics and Corpus Linguistics, have observed that simple n-gram patterns are not sufficient to reveal the full sophistication of grammar patterns. We present a language-independent architecture for extracting patterns more sophisticated than n-grams from sentences. In this architecture, a "sentence pattern" is an n-element ordered combination of sentence elements. Experiments showed that the method extracts significantly more frequent patterns than the usual n-gram approach.
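To illustrate the distinction the abstract draws, the following is a minimal Python sketch, not the authors' implementation. It contrasts contiguous n-grams with n-element ordered combinations of sentence elements. The function names, the whitespace tokenization, and the example sentence are assumptions made for illustration only.

```python
from itertools import combinations


def ngrams(tokens, n):
    """Contiguous n-element subsequences (the usual n-gram view)."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ordered_combinations(tokens, n):
    """All n-element combinations of tokens that preserve sentence order.

    itertools.combinations emits elements in input order, so each tuple
    keeps the original left-to-right order even when elements are skipped.
    """
    return list(combinations(tokens, n))


def all_patterns(tokens):
    """Ordered combinations of every length from 1 to len(tokens)."""
    return [
        pattern
        for n in range(1, len(tokens) + 1)
        for pattern in ordered_combinations(tokens, n)
    ]


# Hypothetical example sentence, split on whitespace (an assumption;
# the source does not specify how sentence elements are obtained).
tokens = "what a nice day".split()

bigrams = ngrams(tokens, 2)
pairs = ordered_combinations(tokens, 2)
patterns = all_patterns(tokens)

print(bigrams)        # contiguous pairs only
print(pairs)          # also includes non-adjacent pairs such as ('what', 'day')
print(len(patterns))  # grows as 2**k - 1 for a sentence of k elements
```

Every n-gram is also an ordered combination, but not the reverse: a non-adjacent pair such as `('what', 'day')` appears only in the second list. A sentence of k elements yields 2^k - 1 non-empty ordered combinations, which is the combinatorial explosion the title refers to.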
1. Nakajima, Y., Ptaszynski, M., Honma, H., & Masui, F. (2016). An Extraction Method for Future Reference Expressions Using Morphological and Semantic Patterns.
2. Sakuta, H., & Adachi, E. How Differently Do We Talk? A Study of Sentence Patterns in Groups of Different Age, Gender and Social Status.
3. Nakajima, Y., Ptaszynski, M., Honma, H., & Masui, F. (2014). FAN-14-029 Extraction of Future Reference Expressions in Trend Information. Intelligent System Symposium Lecture Proceedings, 2014(24), 129-134.
4. Ptaszynski, M., Masui, F., Rzepka, R., & Araki, K. (2014). First Glance on Pattern-based Language Modeling. Language Acquisition and Understanding Research Group (LAU), Technical Reports, Summer.
5. Nakajima, Y., Ptaszynski, M., Honma, H., & Masui, F. (2014, March). Investigation of Future Reference Expressions in Trend Information. In Proceedings of the 2014 AAAI Spring Symposium Series (pp. 31-38).
6. Ptaszynski, M., Masui, F., Rzepka, R., & Araki, K. (2014). Detecting emotive sentences with pattern-based language modelling. Procedia Computer Science, 35, 484-493.
7. D'hondt, E. K. L. (2014). Cracking the patent: using phrasal representations to aid patent classification. [Sl: sn].
8. Ptaszynski, M., Masui, F., Rzepka, R., & Araki, K. (2014). Automatic Extraction of Emotive and Non-emotive Sentence Patterns. In Proceedings of The Twentieth Annual Meeting of The Association for Natural Language Processing (NLP2014) (pp. 868-871).
9. Ptaszynski, M., Masui, F., Dybala, P., Rzepka, R., & Araki, K. Open Source Affect Analysis System with Extensions.
10. Nakajima, Y., Ptaszynski, M., Honma, H., & Masui, F. Extracting References to the Future from News using Morphosemantic Patterns.
11. Ptaszynski, M., Dokoshi, H., Oyama, S., Rzepka, R., Kurihara, M., Araki, K., & Momouchi, Y. (2013). Affect analysis in context of characters in narratives. Expert Systems with Applications, 40(1), 168-176.
12. Ptaszynski, M., Hasegawa, D., & Masui, F. Women Like Backchannel, But Men Finish Earlier: Pattern Based Language Modeling of Conversations Reveals Gender and Social Distance Differences.
13. D'hondt, E., Verberne, S., Weber, N., Koster, C., & Boves, L. (2012). Using skipgrams and pos-based feature selection for patent classification. Computational Linguistics in the Netherlands Journal, 2, 52-70.
14. Lempa, P., Ptaszynski, M., & Masui, F. Cyberbullying Blocker Application for Android.
15. Ptaszynski, M., Masui, F., Kimura, Y., Rzepka, R., & Araki, K. Extracting Patterns of Harmful Expressions for Cyberbullying Detection.
Dr. Michal Ptaszynski
- Japan
ptaszynski@hgu.jp
Dr. Rafal Rzepka
- Japan
Dr. Yoshio Momouchi
- Japan