Keynote Speakers


1. Pushpak Bhattacharyya

Professor, Indian Institute of Technology - Bombay, India.

Dr. Pushpak Bhattacharyya is a Professor of Computer Science and Engineering at IIT Bombay. He received his B.Tech from IIT Kharagpur, M.Tech from IIT Kanpur and PhD from IIT Bombay. He has held visiting positions at MIT, Cambridge, USA, Stanford University, USA and University Joseph Fourier, Grenoble, France. Dr. Bhattacharyya's research interests include Natural Language Processing, Machine Translation and Machine Learning. He has more than 130 publications in top conferences and journals and has served as program chair, area chair, workshop chair and PC member of top fora like ACL, COLING, LREC, SIGIR, CIKM, NAACL, GWC and others. He has guided 7 PhDs and over 100 masters and undergraduate students in their thesis work. Dr. Bhattacharyya plays a leading role in India's large-scale projects on Machine Translation, Cross Lingual Search, and Wordnet and Dictionary Development. He has received a number of prestigious awards, including the IBM Innovation Award, a United Nations Research Grant, a Microsoft Research Grant, IIT Bombay's Patwardhan Award for Technology Development and the Ministry of IT and Digital India Foundation's Manthan Award. Recently he has been appointed Associate Editor of the prestigious journal ACM Transactions on Asian Language Information Processing.
The topic of his talk is as follows:

Unsupervised Morphology Learning


Morphology analysis and statistical stemming are considered crucial in most NLP and IR applications. Languages that do not have a well-established linguistic tradition find it difficult to create their morphology analyzers. Though statistical stemmers are very much in vogue, they typically suffice for IR, but not so much for, say, machine translation. An interesting possibility exists of discovering morphological regularities from a corpus of word forms using seminal approaches like that of Goldsmith. In this talk we first investigate differences between morphology analyzers and statistical stemmers. Tools like Snowball and Morfessor are studied as representatives of the rule-based and statistical schools. Different approaches to unsupervised morphology learning are examined, with Swedish, Arabic and Hungarian as case studies. We then move on to our work on unsupervised paradigm discovery. One of our recent works, on finding stems with the help of a suffix set, shows that the exact lexicon reduction problem through morphology analysis is NP-hard. An approximation algorithm is proposed for the stemming problem. An entropy-minimization-based probabilistic approach is also proposed. The performance of these algorithms is investigated using an unannotated corpus and a suffix set of Malayalam, a morphologically rich language of India belonging to the Dravidian family. (Joint work with Vasudevan N., CSE Department, IIT Bombay)
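To give a flavour of what "finding stems with the help of a suffix set" can look like, here is a minimal Python sketch. It is not the approximation algorithm presented in the talk, whose details the abstract does not give; it is a generic greedy set-cover heuristic used purely for illustration. The function names, the toy English word list and the suffix set are all assumptions made for this example.

```python
from collections import defaultdict


def candidate_splits(word, suffixes):
    """Yield every (stem, suffix) split of `word` whose suffix is in `suffixes`."""
    for i in range(1, len(word) + 1):
        if word[i:] in suffixes:
            yield word[:i], word[i:]


def greedy_stems(words, suffixes):
    """Illustrative greedy heuristic (an assumption, not the talk's algorithm):
    repeatedly pick the stem that explains the most still-unexplained words,
    so that few stems are needed to cover the whole word list."""
    covers = defaultdict(set)  # stem -> words it can generate with some suffix
    for w in words:
        for stem, _ in candidate_splits(w, suffixes):
            covers[stem].add(w)

    uncovered = set(words)
    analysis = {}  # word -> (stem, suffix)
    while uncovered:
        # Tie-break on stem length, then alphabetically, to stay deterministic.
        best = max(
            covers,
            key=lambda s: (len(covers[s] & uncovered), len(s), s),
            default=None,
        )
        if best is None:
            break  # no word has a suffix from the set
        newly = covers[best] & uncovered
        if not newly:
            break  # remaining words cannot be split with this suffix set
        for w in newly:
            analysis[w] = (best, w[len(best):])
        uncovered -= newly
    return analysis


# Toy data (hypothetical): the empty suffix lets a word be its own stem.
words = ["walk", "walks", "walked", "walking", "talk", "talks", "talked"]
suffixes = {"", "s", "ed", "ing"}
analysis = greedy_stems(words, suffixes)
print(sorted({stem for stem, _ in analysis.values()}))  # ['talk', 'walk']
```

On this toy input the seven word forms reduce to two stems. The exact version of this lexicon reduction problem is the one the abstract states to be NP-hard, which is why a heuristic or approximation of some kind is needed in practice.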