Computation For Indian Language Technology - CFILT

Official handle for the Computation for Indian Language Technology Lab @ IIT Bombay. The lab was known as the Center for Indian Language Technology until 2020.

Computation for Indian Language Technology (CFILT) was set up in 2000 at the Department of Computer Science and Engineering, IIT Bombay, with a generous grant from the Department of Information Technology (DIT), Ministry of Communication and Information Technology, Government of India. Before this, the Natural Language Processing (NLP) activity of the CSE Department, IIT Bombay, took off in 1996 with a grant from the United Nations University, Tokyo, to create a multilingual information exchange system for the web. That project, called Universal Networking Language (UNL; www.undl.org), involved 15 research groups across continents.

At any given time, about 30 members work in CFILT, including PhD, master's and bachelor's students, faculty members, linguists and lexicographers. Deep semantics and multilinguality have played a pivotal role in CFILT's activities throughout. This emphasis on semantics has led to research on the following fronts:

Lexical Resources: Multilingual wordnets and ontologies and their linking

Lexical and Structural Disambiguation: Resolve word and attachment ambiguities

Shallow Parsing: Identifying correct parts of speech, named entities and non-recursive noun phrases for Marathi and Hindi

Cross-Lingual Information Retrieval: Retrieving English and Hindi documents for Indian-language queries

Machine Translation: Automatic translation involving Marathi, Hindi and English

Text Entailment: Testing whether a piece of text (the hypothesis) can be inferred from another (the text)

Sentiment Analysis: Detecting the polarity (positive, negative or neutral) of a given document, especially reviews

26/12/2022

CFILT presented its projects at Techconnect, IIT Bombay's largest research outreach activity, which is part of the institute's TechFest. Our stalls drew an overwhelming footfall. Here are a few snaps from the event!

23/07/2022

Eros Investments partners IIT Bombay (CFILT Lab) to develop AI-based movie script generating tool "Kurosawa".

Read more at:
https://economictimes.indiatimes.com/industry/media/entertainment/eros-investments-partners-iit-bombay-to-develop-ai-based-script-generating-tool-kurosawa/articleshow/93010134.cms?utm_source=contentofinterest&utm_medium=text&utm_campaign=cppst

Kurosawa is based on the latest Deep Learning and Natural Language Processing technologies. It is led by Prof. Pushpak Bhattacharyya along with his students Prerak Gandhi and Vishal Pramanik. Computation For Indian Language Technology (CFILT) Lab acknowledges the support of Ash*ta Saxena, Narjis Asad and Nihar Ranjan Sahoo in this project.

Named after Akira Kurosawa, the noted Japanese film director, Kurosawa is envisaged as an attempt to empower the entertainment industry with cutting-edge technology that generates a full-length feature film script. The software will assist filmmakers in developing the plots and scripts of movies.

01/10/2020

Several papers from CFILT have recently been accepted at top international conferences. Congratulations to all the authors. -Pushpak

COLING 2020

1. Harnessing Cross-lingual Features to Improve Cognate Detection for Low-resource Languages, Diptesh Kanojia, Raj Dabre, Shubham Dewangan, Pushpak Bhattacharyya, Gholamreza Haffari and Malhar Kulkarni

2. Filtering Back-Translated Data in Unsupervised Neural Machine Translation, Jyotsana Khatri and Pushpak Bhattacharyya

3. A Retrofitting Model for Incorporating Semantic Relations into Word Embeddings, Sapan Shah, Sreedhar Reddy and Pushpak Bhattacharyya

4. Analysing cross-lingual transfer in lemmatisation for Indian languages, Kumar Saurav, Kumar Saunack and Pushpak Bhattacharyya

Findings of EMNLP 2020

5. Looking inside Noun Compounds: Unsupervised Prepositional and Free Paraphrasing using Language Models, Girishkumar Ponkiya, Rudra Murthy, Pushpak Bhattacharyya and Girish Palshikar

AACL-IJCNLP 2020

6. Happy Are Those Who Grade without Seeing: A Multi-Task Learning Approach to Grade Essays Using Gaze Behaviour, Sandeep Mathias, Rudra Murthy, Diptesh Kanojia, Abhijit Mishra and Pushpak Bhattacharyya

27/09/2020

Prof. Pushpak Bhattacharyya, Professor of Computer Science and Engineering, IIT Bombay, and former Director of IIT Patna, has taken on the following new professional roles:
1. Member, Governing Council (GC), Centre for Advanced Financial Research and Learning (CAFRAL), Reserve Bank of India.
2. Member, Core Faculty, Center for Machine Intelligence and Data Science (C-MInDS), IIT Bombay.
3. Professor-in-Charge, Data and Information Sciences Program, under the Institute of Eminence, IIT Bombay.

21/09/2017

“Having 2 hours to write a paper is fun!”: Detecting Sarcasm in Numerical Portions of Text, a paper from our lab by Lakshya and Arpan with Prof. Pushpak Bhattacharyya, was featured in the Times of London! Congratulations to the authors.

Thomas Carlyle saw sarcasm as the “language of the devil”, Oscar Wilde saw it as the lowest form of wit and Fyodor Dostoyevsky saw it as the last refuge of the mortally offended. Computers struggle...

17/09/2017

We are glad to share our tutorial on Computational Sarcasm at EMNLP 2017, presented by Aditya Joshi. The details and a link to the slides are below.

Computational Sarcasm: EMNLP 2017 Tutorial

7th September 2017, Spisenhuset, DGI-Byen, 9:00-12:30.

Link on EMNLP Website: http://emnlp2017.net/tutorials/day1/0/sarcasm.html

Abstract

Sarcasm is a form of verbal irony that is intended to express contempt or ridicule. Motivated by the challenges that sarcastic text poses to sentiment analysis, computational approaches to sarcasm have attracted growing interest at NLP forums over the past decade. Computational sarcasm refers to automatic approaches pertaining to sarcasm. The tutorial will provide a bird’s-eye view of research in computational sarcasm for text, focusing on significant milestones.

The tutorial begins with linguistic theories of sarcasm, with a focus on incongruity: a useful notion that underlies sarcasm and other forms of figurative language. Since the most significant work in computational sarcasm is sarcasm detection (predicting whether a given piece of text is sarcastic or not), sarcasm detection forms the focus hereafter. We begin our discussion of sarcasm detection with datasets, touching on strategies, challenges and the nature of datasets. Then, we describe algorithms for sarcasm detection: rule-based (where a specific piece of evidence of sarcasm is used as a rule), statistical classifier-based (where features are designed for a statistical classifier), a topic model-based technique, and deep learning-based algorithms. For each of these algorithms, we refer to our work on sarcasm detection and share our learnings. Since contextual information beyond the text to be classified is useful for sarcasm detection, we then describe approaches that use such information through conversational context or author-specific context.

We then cover novel areas in computational sarcasm such as sarcasm generation and sarcasm vs. irony classification. Next, we summarise the tutorial and describe future directions based on errors reported in past work. The tutorial will end with a demonstration of our work on sarcasm detection.

This tutorial will be of interest to researchers investigating computational sarcasm and related areas such as computational humour, figurative language understanding, and emotion and sentiment analysis. The tutorial is motivated by our continually evolving survey paper on sarcasm detection, available on arXiv: Joshi, Aditya, Pushpak Bhattacharyya, and Mark James Carman. “Automatic Sarcasm Detection: A Survey.” arXiv preprint arXiv:1602.03426 (2016). The paper has been selected for publication in ACM Computing Surveys, Volume 50, in 2017; that will be the most recent version of the paper.
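The rule-based family of detectors described in the abstract, where one specific piece of evidence of sarcasm is used as a rule, can be illustrated with a minimal Python sketch built on the incongruity idea: positive sentiment co-occurring with a negative situation. The word lists and function names below are hypothetical and chosen only for illustration. This is not the tutorial's or CFILT's actual system, and real rule-based detectors use far richer evidence than two small keyword sets.

```python
import re

# Hypothetical, hand-picked word lists for illustration only; a real system
# would draw on a sentiment lexicon and a learned set of negative situations.
POSITIVE_SENTIMENT = {"love", "enjoy", "fun", "great", "awesome"}
NEGATIVE_SITUATIONS = {"traffic", "waiting", "deadline", "monday", "homework"}


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return the set of alphabetic tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def is_sarcastic(text: str) -> bool:
    """Apply a single incongruity rule: flag the text as sarcastic when a
    positive sentiment word co-occurs with a negative-situation word."""
    tokens = tokenize(text)
    has_positive = not tokens.isdisjoint(POSITIVE_SENTIMENT)
    has_negative_situation = not tokens.isdisjoint(NEGATIVE_SITUATIONS)
    return has_positive and has_negative_situation


if __name__ == "__main__":
    for sentence in ["I love waiting in traffic", "I love this movie"]:
        print(sentence, "->", is_sarcastic(sentence))
```

Because the rule fires only on co-occurrence, a plainly negative sentence such as "Stuck in traffic again" is not flagged; the statistical and deep learning approaches in the abstract replace such fixed rules with learned features.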

The slides are here: http://www.cfilt.iitb.ac.in/tutorial-computational-sarcasm.pdf

--Speaker Profiles--

Dr. Pushpak Bhattacharyya
Indian Institute of Technology Bombay, Mumbai, India & Indian Institute of Technology Patna, Patna, India.
http://www.cse.iitb.ac.in/~pb/

Prof. Pushpak Bhattacharyya is the current President of ACL (2016-17). He is the Director of IIT Patna and Vijay and Sita Vashee Chair Professor in the Computer Science and Engineering Department, IIT Bombay. He was educated at IIT Kharagpur (B.Tech), IIT Kanpur (M.Tech) and IIT Bombay (PhD). He has been a visiting scholar and faculty member at MIT, Stanford, UT Houston and University Joseph Fourier (France). Prof. Bhattacharyya’s research areas are Natural Language Processing, Machine Learning and AI. He has guided more than 250 students (PhD, masters and bachelors), has published more than 250 research papers and has led government and industry projects of international and national importance. A significant contribution of his is Multilingual Lexical Knowledge Bases and Projection. The author of the textbook ‘Machine Translation’, Prof. Bhattacharyya is loved by his students for his inspiring teaching and mentorship. He is a Fellow of the National Academy of Engineering and a recipient of the Patwardhan Award of IIT Bombay and the VNMM Award of IIT Roorkee, both for technology development, as well as faculty grants from IBM, Microsoft, Yahoo and the United Nations.

Aditya Joshi
IITB-Monash Research Academy, Mumbai, India.
http://www.cse.iitb.ac.in/~adityaj/

Since January 2013, Aditya Joshi has been a PhD student at the IITB-Monash Research Academy, a joint PhD programme between the Indian Institute of Technology Bombay, India, and Monash University, Australia. His PhD advisors are Pushpak Bhattacharyya (IITB) and Mark Carman (Monash). His primary research focus is computational sarcasm, where he has explored different ways of capturing incongruity in order to detect and generate sarcasm. In addition, he has worked on innovative applications of NLP such as sentiment analysis for Indian languages, drunk-texting prediction, news headline translation and political issue extraction.

The 2017 Conference on Empirical Methods in Natural Language Processing

28/02/2017

Our survey paper on computational sarcasm, 'Automatic Sarcasm Detection: A Survey' (Aditya Joshi et al.), has been accepted for publication in ACM Computing Surveys.

Hearty congratulations to all the authors.

22/02/2017

Hearty congratulations to Prof. Pushpak.

The Institution of Engineers (India) awarded Prof. Pushpak Bhattacharyya "The Eminent Engineer Award" at its 31st National Convention of Computer Engineers, held in Shillong on 3rd Feb. 2017.

04/07/2016

# # 3rd Workshop on Asian Translation (WAT2016) # #

This is an announcement of the 3rd Workshop on Asian Translation (WAT2016), to be held as a workshop at COLING 2016. If you are working on machine translation, please join us.

**** Indian NLP researchers would be particularly interested in the Indian language shared tasks: ****

- English Hindi (Corpus details here: http://www.cfilt.iitb.ac.in/iitb_parallel)
- Japanese Hindi (A pivot language task)

Please visit http://lotus.kuee.kyoto-u.ac.jp/WAT/ for registration for the tasks and other details.

# # # What's New about WAT2016 # # #

1. Workshop at Coling 2016
2. New language pairs in shared tasks
- Hindi-English, Hindi-Japanese, Indonesian-English, Chinese-English
3. Research papers invited

# # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # #

=== CALL FOR PAPERS & PARTICIPATION ===

---------------------------------------------------------------------------
WAT 2016
(The 3rd Workshop on Asian Translation)
in conjunction with COLING 2016
http://lotus.kuee.kyoto-u.ac.jp/WAT/
December 12, 2016, Osaka, Japan

Following the success of the previous Workshops on Asian Translation
(WAT 2014 and WAT 2015), WAT 2016 will bring together machine
translation researchers and users to try, evaluate, share and discuss
brand-new ideas of machine translation. We are working toward the
practical use of machine translation among all Asian countries.

For the WAT 2016, we adopt new translation subtasks
"Hindi-to-English/Japanese mixed domain translations" as well as
"Indonesian/Chinese/Japanese-to-English newswire translation" in
addition to the subtasks that were conducted in the previous two
workshops. The workshop will also feature research papers on topics
related to machine translation, especially for Asian languages.

WAT 2016 also invites researchers to submit their original work on
machine translation of Asian languages. The scope covers studies and
reports on theories, techniques, and resources to improve the machine
translation of Asian languages. All submitted research papers will
undergo double-blind peer review to decide whether they will appear
at the workshop.

Topics of interest include, but are not limited to:
- Word-/phrase-/syntax-/semantics-/rule-based, neural and hybrid machine translation
- Asian language processing
- Incorporating linguistic information into machine translation
- Decoding algorithms
- System combination
- Error analysis
- Manual and automatic machine translation evaluation
- Machine translation applications
- Quality estimation
- Domain adaptation
- Machine translation for low resource languages
- Language resources

************************* IMPORTANT NOTICE *************************
Participants of the previous workshop are also required to sign up to
WAT2016
********************************************************************

IMPORTANT DATES
---------------

August 19 Crowdsourcing evaluation due
September 25 System description draft and research paper (new!) due
October 16 System description draft Review feedback
October 16 Research paper acceptance notification
October 30 System description and research paper camera-ready paper due
December 12 WAT 2016

TASK
----

The task is to improve the text translation quality for scientific
papers and patent documents. Participants choose any of the subtasks
in which they would like to participate and translate the test data
using their machine translation systems. The WAT organizers will
evaluate the results submitted using automatic evaluation and human
evaluation. We will also provide a baseline machine translation.

Subtasks:
Scientific Paper Subtasks:
English/Chinese <-> Japanese
Patent Subtasks:
English/Chinese/Korean <-> Japanese
Newswire Subtasks:
Indonesian/Chinese/Japanese <-> English
Mixed domain Subtasks:
Hindi <-> English/Japanese

Dataset:

* Scientific paper Subtasks:

WAT uses ASPEC for the dataset including training, development,
development test and test data. Participants in the scientific paper
subtasks must obtain a copy of ASPEC themselves. ASPEC consists of
approximately 3 million Japanese-English parallel sentences from paper
abstracts (ASPEC-JE) and approximately 0.7 million Japanese-Chinese
paper excerpts (ASPEC-JC).

* Patent Subtasks:

WAT uses JPO Patent Corpus, which is constructed by Japan Patent
Office (JPO). This corpus consists of 1 million Chinese-Japanese
parallel sentences and 1 million Korean-Japanese parallel sentences
from patent descriptions in four categories. Participants in the patent
subtasks must obtain it from the WAT2016 site of the JPO Patent Corpus.

* Newswire Subtasks (Indonesian <-> English):

WAT uses the BPPT Corpus, which is constructed by Badan Pengkajian dan
Penerapan Teknologi (BPPT). This corpus consists of 50,000
Indonesian-English parallel sentences from news descriptions in five
categories. Participants in the newswire subtasks must obtain it from
the WAT2016 site of the BPPT Corpus.

* Newswire Subtasks (Chinese/Japanese <-> English):

TBA

* Mixed domain Subtask:

WAT uses HINDEN for the training, development, development test and
test data. The training corpus is a mixed domain corpus of around
1 million lines of sentences and phrases; its composition is described
in the readme of the corpus you download. To access the corpus,
participants should sign the following agreement, scan it and send it
to the address mentioned in it. The development and test sets are from
the news domain and are exactly the same as those used in WMT 2014.

Automatic evaluation:
We are providing an automatic evaluation server. It is free for
everyone, but you need to create an account for evaluation. Viewing
the list of evaluation results does not require an account.

Sign-up: http://lotus.kuee.kyoto-u.ac.jp/WAT/registration/index.html
Eval. result: http://lotus.kuee.kyoto-u.ac.jp/WAT/evaluation/index.html

Human evaluation:
Both crowdsourcing evaluation and JPO adequacy evaluation will be
carried out for selected subtasks and selected submitted systems (the
details will be announced later). Participants can submit one
translation result for each subtask.

INVITED TALK
------------

TBA

ORGANIZERS
----------

Toshiaki Nakazawa, Japan Science and Technology Agency (JST), Japan
Hideya Mino, National Institute of Information and Communications Technology (NICT), Japan
Chenchen Ding, National Institute of Information and Communications Technology (NICT), Japan
Isao Goto, Japan Broadcasting Corporation (NHK), Japan
Graham Neubig, Nara Institute of Science and Technology (NAIST), Japan
Sadao Kurohashi, Kyoto University, Japan
Ir. Hammam Riza, Agency for the Assessment and Application of Technology (BPPT), Indonesia
Pushpak Bhattacharyya, Indian Institute of Technology Bombay (IIT), India

CONTACT
-------

[email protected]

15/12/2015

Slides for the ICON 2015 tutorial on 'Translation & Transliteration between Related Languages' conducted on 11th Dec 2015 by Anoop Kunchukuttan & Mitesh Khapra:

Slides: http://www.cse.iitb.ac.in/~anoopk/publications/presentations/icon_2013_smt_tutorial_slides.pdf

Handouts: http://www.cse.iitb.ac.in/~anoopk/publications/presentations/icon-2015-tutorial-translation-related-lang-handouts.pdf

Address

IIT Bombay
Mumbai
400076

Opening Hours

Monday 9am - 5pm
Tuesday 9am - 5pm
Wednesday 9am - 5pm
Thursday 9am - 5pm
Friday 9am - 5pm

Telephone

+912225764729
