Publications

In Preparation

 

Habash, Nizar. Introduction to Arabic Natural Language Processing, book in preparation.

 

Habash, Nizar and Fatiha Sadat. Arabic Preprocessing Schemes and Combinations for Statistical Machine Translation, in preparation.

 

Habash, Nizar, Bonnie Dorr and Christof Monz. Symbolic to Statistical Hybrid Machine Translation: The Case of Generation-Heavy MT, to appear in MT Journal.

 

2009

Biadsy, Fadi, Nizar Habash and Julia Hirschberg. Improving the Arabic Pronunciation Dictionary for Phone and Word Recognition with Linguistically-Based Pronunciation Rules. In Proceedings of the North American Chapter of the Association for Computational Linguistics (NAACL), Boulder, Colorado, USA, 2009.

 

Habash, Nizar, Owen Rambow and Ryan Roth. MADA+TOKAN: A Toolkit for Arabic Tokenization, Diacritization, Morphological Disambiguation, POS Tagging, Stemming and Lemmatization. In Proceedings of the 2nd International Conference on Arabic Language Resources and Tools (MEDAR), Cairo, Egypt, 2009.

 

Habash, Nizar. REMOOV: A Tool for Online Handling of Out-of-Vocabulary Words in Machine Translation. In Proceedings of the 2nd International Conference on Arabic Language Resources and Tools (MEDAR), Cairo, Egypt, 2009. 

 

Habash, Nizar, Reem Faraj and Ryan Roth. Syntactic Annotation in the Columbia Arabic Treebank. In Proceedings of the 2nd International Conference on Arabic Language Resources and Tools (MEDAR), Cairo, Egypt, 2009.

 

Habash, Nizar and Jun Hu. Improving Arabic-Chinese Statistical Machine Translation using English as Pivot Language. In Proceedings of the Workshop on Statistical Machine Translation at the meeting of the European Chapter of the Association for Computational Linguistics (EACL), Athens, Greece, 2009.

 

Biadsy, Fadi, Julia Hirschberg and Nizar Habash. Spoken Arabic Dialect Identification Using Phonotactic Modeling. In Proceedings of the Workshop on Computational Approaches to Semitic Languages at the meeting of the European Chapter of the Association for Computational Linguistics (EACL), Athens, Greece, 2009.

 

Elming, Jakob and Nizar Habash. Syntactic Reordering for English-Arabic Phrase-Based Machine Translation. In Proceedings of the Workshop on Computational Approaches to Semitic Languages at the meeting of the European Chapter of the Association for Computational Linguistics (EACL), Athens, Greece, 2009.

 

Buckwalter, Tim and Nizar Habash. Buckwalter Arabic Morphological Analyzer (BAMA) in Arabic Presentation Form. A participating system in the Arab League Educational, Cultural and Scientific Organization (ALECSO) workshop on Arabic Morphological Analyzers, Damascus, Syria, 2009.

2008

Habash, Nizar and Ahmed Elkholy. SEPIA: Surface Span Extension to Syntactic Dependency Precision-based MT Evaluation. In Proceedings of the Workshop on Metrics for Machine Translation at the meeting of the Association for Machine Translation in the Americas (AMTA-2008), Waikiki, Hawai’i, 2008.

Habash, Nizar and Hayden Metsky. Automatic Learning of Morphological Variations for Handling Out-of-Vocabulary Terms in Urdu-English Machine Translation. In Proceedings of the Association for Machine Translation in the Americas (AMTA-2008), Waikiki, Hawai’i, 2008.

Habash, Nizar. Four Techniques for Online Handling of Out-of-Vocabulary Words in Arabic-English Statistical Machine Translation. In Proceedings of the Association for Computational Linguistics (ACL), Columbus, Ohio, 2008.

 

Roth, Ryan, Owen Rambow, Nizar Habash, Mona Diab, and Cynthia Rudin. Arabic Morphological Tagging, Diacritization, and Lemmatization Using Lexeme Models and Feature Ranking. In Proceedings of the Association for Computational Linguistics (ACL), Columbus, Ohio, 2008.

 

Crego, Josep M. and Nizar Habash. Using Shallow Syntax Information to Improve Word Alignment and Reordering for SMT. In Proceedings of the Statistical Machine Translation Workshop at the Conference of the Association for Computational Linguistics (ACL), Columbus, Ohio, 2008.

 

Habash, Nizar and Ryan Roth. Identification of Naturally Occurring Numerical Expressions in Arabic. In Proceedings of the Language Resources and Evaluation Conference (LREC), Marrakech, Morocco, 2008.

 

Farber, Benjamin, Dayne Freitag, Nizar Habash and Owen Rambow. Improving NER in Arabic Using a Morphological Tagger. In Proceedings of the Language Resources and Evaluation Conference (LREC), Marrakech, Morocco, 2008.

 

Habash, Nizar, Owen Rambow, Mona Diab and Reem Farraj. Guidelines for Annotating Arabic Dialect. In Proceedings of the Workshop on Arabic and its Local Languages, LREC, Marrakech, Morocco, 2008.

 

Elming, Jakob, Nizar Habash and Josep Crego. Combination of Statistical Word Alignments Based on Multiple Preprocessing Schemes. Book chapter in “Learning for Machine Translation.” Editors Cyril Goutte, Nicola Cancedda, Marc Dymetman, and George Foster. MIT Press, 2008.

 

2007

Habash, Nizar. Syntactic Preprocessing for Statistical Machine Translation. In Proceedings of the Machine Translation Summit (MT-Summit), Copenhagen, Denmark, 2007.

Diab, Mona, Mahmoud Ghoneim and Nizar Habash. Arabic Diacritization in the Context of Statistical Machine Translation. In Proceedings of the Machine Translation Summit (MT-Summit), Copenhagen, Denmark, 2007.

 

Kirchhoff, Katrin, Owen Rambow, Nizar Habash and Mona Diab. Semi-Automatic Error Analysis for Large-Scale Statistical Machine Translation Systems. In Proceedings of the Machine Translation Summit (MT-Summit), Copenhagen, Denmark, 2007.

 

Habash, Nizar, Ryan Gabbard, Owen Rambow, Seth Kulick and Mitch Marcus. Determining Case in Arabic: Learning Complex Linguistic Behavior Requires Complex Linguistic Features. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), Prague, Czech Republic, 2007.

 

Habash, Nizar and Owen Rambow. Arabic Diacritization through Full Morphological Tagging. In Proceedings of the North American Chapter of the Association for Computational Linguistics (NAACL), Rochester, New York, 2007.

 

Elming, Jakob and Nizar Habash. Combination of Statistical Word Alignments Based on Multiple Preprocessing Schemes. In Proceedings of the North American Chapter of the Association for Computational Linguistics (NAACL), Rochester, New York, 2007.

 

Habash, Nizar and Owen Rambow. Morphophonemic and Orthographic Rules in a Multi-Dialectal Morphological Analyzer and Generator for Arabic Verbs. International Symposium on Computer and Arabic Language (ISCAL), Riyadh, Saudi Arabia, 2007.

 

Habash, Nizar. “Arabic Morphological Representations for Machine Translation.” Book Chapter. In Arabic Computational Morphology: Knowledge-based and Empirical Methods. Editors Antal van den Bosch and Abdelhadi Soudi. 2007.

 

Habash, Nizar, Abdelhadi Soudi, and Tim Buckwalter. “On Arabic Transliteration.” Book Chapter. In Arabic Computational Morphology: Knowledge-based and Empirical Methods. Editors Antal van den Bosch and Abdelhadi Soudi. 2007.

2006

Biadsy, Fadi, Jihad El-Sana and Nizar Habash. Arabic Online Handwriting Recognition. International Workshop on Handwriting and Optical Character Recognition, Paris, France, 2006.

 

Habash, Nizar, Bonnie Dorr and Christof Monz. Challenges in Building an Arabic Generation-heavy Machine Translation System and Extending it with Statistical Components. In Proceedings of the Association for Machine Translation in the Americas (AMTA-2006), Boston, MA, 2006.

 

Habash, Nizar. “On Arabic and its Dialects,” Multilingual Magazine. #81 Volume 17 Issue 5, 2006.

 

Habash, Nizar and Owen Rambow. Morphological Analysis for Arabic Dialects. In Proceedings of COLING-ACL, Sydney, Australia, 2006.

 

Sadat, Fatiha and Nizar Habash. Morphological Preprocessing Scheme Combination for Statistical MT. In Proceedings of COLING-ACL, Sydney, Australia, 2006.

 

Habash, Nizar and Fatiha Sadat. Arabic Preprocessing Schemes for Statistical Machine Translation. In Proceedings of the North American Chapter of the Association for Computational Linguistics (NAACL), New York, 2006.

 

Chiang, David, Mona Diab, Nizar Habash, Owen Rambow, and Safi Shareef. Arabic Dialect Parsing. In Proceedings of the European Chapter of the Association for Computational Linguistics (EACL), 2006.

 

Habash, Nizar, Clinton Mah, Randy Calistri-Yeh, Sabiha Imran and Paraic Sheridan. The Design and Validation of an Arabic WordNet for Information Retrieval. In Proceedings of the International Conference on Language Resources and Evaluation (LREC), 2006.

 

Rambow, Owen, Bonnie Dorr, David Farwell, Rebecca Green, Nizar Habash, Stephen Helmreich, Eduard Hovy, Lori Levin, Keith J. Miller, Teruko Mitamura, Florence Reeder and Advaith Siddharthan. Parallel Syntactic Annotation of Multiple Languages. In Proceedings of the International Conference on Language Resources and Evaluation (LREC), 2006.

 

Passonneau, Rebecca, Nizar Habash and Owen Rambow. Interannotator Agreement on a Multilingual Semantic Annotation Task. In Proceedings of the International Conference on Language Resources and Evaluation (LREC), 2006.

 

Maamouri, Mohamed, Ann Bies, Tim Buckwalter, Mona Diab, Nizar Habash, Owen Rambow and Dalila Tabessi. Developing and Using a Pilot Dialectal Arabic Treebank. In Proceedings of the International Conference on Language Resources and Evaluation (LREC), 2006.

2005

Habash, Nizar, Owen Rambow and George Kiraz. Morphological Analysis and Generation for Arabic Dialects. In Proceedings of the Workshop on Computational Approaches to Semitic Languages at the Conference of the Association for Computational Linguistics (ACL’05), Ann Arbor, Michigan, 2005.

 

Habash, Nizar and Owen Rambow. Arabic Tokenization, Morphological Analysis, and Part-of-Speech Tagging in One Fell Swoop. In Proceedings of the Conference of the Association for Computational Linguistics (ACL’05), Ann Arbor, Michigan, 2005.

 

Darwish, Kareem, Mona Diab and Nizar Habash, Eds. Computational Approaches to Semitic Languages. Workshop Proceedings. Association for Computational Linguistics, Ann Arbor, Michigan, 2005.

2004

Habash, Nizar. The Use of a Structural N-gram Language Model in Generation-Heavy Hybrid Machine Translation. In Proceedings of the Third International Conference on Natural Language Generation (INLG-04), Careys Manor, UK, July 2004.

 

Habash, Nizar. Large Scale Lexeme Based Arabic Morphological Generation. In Proceedings of Traitement Automatique du Langage Naturel (TALN-04), Fez, Morocco, 2004.

 

Habash, Nizar and Owen Rambow. Extracting a Tree Adjoining Grammar from the Penn Arabic Treebank. In Proceedings of Traitement Automatique du Langage Naturel (TALN-04), Fez, Morocco, 2004.

 

Habash, Nizar, Bonnie Dorr, Eduard Hovy and Florence Reeder, Eds. Determining Interlingua Utility for Machine Translation. Seventh Interlingua Workshop. Sixth Biennial Conference of the Association for Machine Translation in the Americas (AMTA-04), Georgetown, Washington DC, 2004.

 

Ayan, Fazil, Bonnie J. Dorr, and Nizar Habash. Application of Alignment to Real-World Data: Combining Linguistic and Statistical Techniques for Adaptable MT. In Proceedings of the 6th Conference of the Association for Machine Translation in the Americas (AMTA-2004), Georgetown University, Washington DC, 2004.

 

Reeder, Florence, Bonnie Dorr, David Farwell, Nizar Habash, Stephen Helmreich, Eduard Hovy, Lori Levin, Teruko Mitamura, Keith Miller, Owen Rambow, Advaith Siddharthan. Interlingual Annotation for MT Development. In Proceedings of the 6th Conference of the Association for Machine Translation in the Americas (AMTA-2004), Georgetown University, Washington DC, 2004.

 

Farwell, David, Stephen Helmreich, Bonnie J. Dorr, Nizar Habash, Florence Reeder, Keith Miller, Lori Levin, Teruko Mitamura, Eduard Hovy, Owen Rambow, and Advaith Siddharthan. Interlingual Annotation of Multilingual Text Corpora. In Proceedings of the North American Chapter of the Association for Computational Linguistics Workshop on Frontiers in Corpus Annotation, Boston, MA, pp. 55-62, 2004.

 

Mitamura, Teruko, Keith J. Miller, Bonnie J. Dorr, David Farwell, Nizar Habash, Lori Levin, Stephen Helmreich, Eduard Hovy, Owen Rambow, Florence Reeder, and Advaith Siddharthan. Semantic Annotation of Multilingual Text Corpora. In Proceedings of the Workshop on Beyond Named Entity Recognition: Semantic Labeling for NLP Tasks, LREC, Portugal, 2004.

 

Dorr, Bonnie J., Rebecca Green, Lori Levin, Owen Rambow, David Farwell, Nizar Habash, Stephen Helmreich, Eduard Hovy, Keith J. Miller, Teruko Mitamura, Florence Reeder, and Advaith Siddharthan. Semantic Annotation and Lexico-Syntactic Paraphrase. In Proceedings of the Workshop on Building Lexical Resources from Semantically Annotated Corpora, LREC, Portugal, 2004.

2003

Dorr, Bonnie J., Necip Fazil Ayan, Nizar Habash, Nitin Madnani, and Rebecca Hwa. Rapid Porting of DUSTer to Hindi. ACM Transactions on Asian Language Information Processing (TALIP), 2:3, 2003.

 

Habash, Nizar. Matador: A Large Scale Spanish-English GHMT System. In Proceedings of the MT Summit, New Orleans, LA, pp. 149-156, 2003.

 

Cavalli-Sforza, Violetta, Alon Lavie and Nizar Habash, Eds. Proceedings of the MT Summit IX Workshop on Machine Translation for Semitic Languages: Issues and Approaches. September 23, 2003, New Orleans, LA, USA.

 

Habash, Nizar. Generation-Heavy Hybrid Machine Translation. Doctoral Dissertation. Computer Science Department, University of Maryland, College Park, 2003.

 

Habash, Nizar and Bonnie Dorr. A Categorial Variation Database for English. In Proceedings of the North American Chapter of the Association for Computational Linguistics (NAACL), Edmonton, Canada, pp. 96-102, 2003.

 

Habash, Nizar, Bonnie Dorr, and David Traum. Hybrid Natural Language Generation from Lexical Conceptual Structures. MT Journal, 18(2): 81-128, 2003.

2002

Habash, Nizar and Bonnie Dorr. Handling Translation Divergences: Combining Statistical and Symbolic Techniques in Generation-Heavy Machine Translation. In Proceedings of the Fifth Conference of the Association for Machine Translation in the Americas (AMTA-2002), Tiburon, CA, 2002.

 

Dorr, Bonnie and Nizar Habash. Interlingua Approximation: A Generation-Heavy Approach. In Proceedings of the Workshop on Interlingua Reliability, Fifth Conference of the Association for Machine Translation in the Americas (AMTA-2002), Tiburon, CA, 2002.

 

Dorr, Bonnie, Lisa Pearl, Rebecca Hwa and Nizar Habash. DUSTer: A Method for Unraveling Cross-Language Divergences for Statistical Word-Level Alignment. In Proceedings of the Fifth Conference of the Association for Machine Translation in the Americas (AMTA-2002), Tiburon, CA, 2002.

 

Habash, Nizar. Generation-Heavy Hybrid Machine Translation. In Proceedings of the International Natural Language Generation Conference (NLG-02). New York, 2002.

 

2001

Habash, Nizar and Bonnie Dorr. Large Scale Language Independent Generation Using Thematic Hierarchies. In Proceedings of the MT Summit VIII, Santiago de Compostela, Spain, 2001.

        

2000

Habash, Nizar. oxyGen: A Language Independent Language Realization Engine. In Proceedings of the Fourth Conference of the Association for Machine Translation in the Americas (AMTA-2000), Cuernavaca, Mexico, 2000.

 

Traum, David and Nizar Habash. Generation from Lexical Conceptual Structures. Workshop on Applied Interlinguas, ANLP-2000, Seattle, WA, 2000.

1999

Habash, Nizar. Nuun: A System for Developing Platform and Browser Independent Arabic Web Applications. In Proceedings of the Arabic Translation and Localization Conference (ATLAS-99), Tunis, Tunisia, 1999. Republished in Arabic in the Arab Journal of Science, 33, June 1999.

 

Habash, Nizar. Issues in Palestinian Arabic Spelling Standardization. NACAL 27, Baltimore, MD, 1999.

1998

Dorr, Bonnie, Nizar Habash and David Traum. A Thematic Hierarchy for Efficient Generation from Lexical-Conceptual Structures. In Proceedings of the Association for Machine Translation in the Americas (AMTA-98), Langhorne, PA, 1998.

 

Habash, Nizar. Introduction to Delason: The Complete Guide to the Artificial Language. Unpublished Manuscript, 1998.