Professional Portfolio

About Us

COIL-D (Centre for Indian Language Data) is a project funded by the Ministry of Electronics and Information Technology (MeitY), Govt. of India. The project is to be executed in consortium mode, led by IIT Patna. The other partnering institutions are IIT Delhi, IIIT Guwahati, IIIT Delhi, IGDTUW, Digital India Bhashini Division DIC and MIT Manipal. The project seeks to develop language resources for Human Language Technology (HLT) and to establish applications, standards, guidelines, and best practices for building and benchmarking Machine Translation (MT) systems and other NLP tools. MIT Manipal is responsible for developing parallel corpora for the Dravidian languages Kannada, Tamil, Malayalam, and Telugu.

Aims

Develop a suite of language resources, including parallel corpora for machine translation between Indian languages and benchmark datasets for key NLP tasks like Part-of-Speech (PoS) tagging, Named Entity Recognition (NER), Automatic Speech Recognition (ASR), and Text-to-Speech (TTS).

Primary Focus

50% domain-specific content from a mix of sectors: science, healthcare, agriculture, climate, tourism, and judiciary.
30% educational content, focusing on academic materials, textbooks, and learning resources.
20% conversational and governance content, including social media, dialogues, and official documents.
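As a rough illustration of how the 50/30/20 mix translates into collection targets, the sketch below splits a corpus size into per-category sentence quotas. The function name, the category keys, and the figure of 100,000 sentences are assumptions made for this example only; the project does not state a corpus size here.

```python
# Content mix as stated above, in percent.
CONTENT_MIX = {
    "domain_specific": 50,
    "educational": 30,
    "conversational_and_governance": 20,
}


def sentence_quotas(total_sentences: int) -> dict[str, int]:
    """Return per-category sentence counts that sum to total_sentences."""
    quotas = {
        name: total_sentences * percent // 100
        for name, percent in CONTENT_MIX.items()
    }
    # Integer division can leave a small remainder; assign it to the
    # largest category so the quotas still add up to the requested total.
    remainder = total_sentences - sum(quotas.values())
    quotas["domain_specific"] += remainder
    return quotas


# Hypothetical target of 100,000 sentences, used only for illustration.
print(sentence_quotas(100_000))
```

For the hypothetical target above, this gives 50,000 domain-specific, 30,000 educational, and 20,000 conversational and governance sentences.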

Key Focus

Standardize, preserve, and create language resources to support NLP research and applications.

Applications

This dataset will support translation from Tamil into Kannada, Malayalam, and Telugu, and will enable the development of multilingual chatbots. Together, these will facilitate more effective cross-sector communication.

Initiative

Establish a centralized repository for Indian language data and provide a platform for developing and benchmarking Human Language Technology (HLT) applications.

Results

The goal is rapid and efficient translation: the final output should require minimal post-editing, making it suitable for publication on print and digital platforms with little delay.

Development

Create comprehensive leaderboards to systematically evaluate the performance of models in key Natural Language Processing (NLP) tasks. These leaderboards will serve as benchmarks for assessing machine translation (MT), Part-of-Speech (PoS) tagging, Named Entity Recognition (NER), Natural Language Generation (NLG), sentiment analysis, Automatic Speech Recognition (ASR), and Text-to-Speech (TTS).

Delivery

These advanced, no-cost tools offer a superior alternative to existing solutions like Google Translate, while simultaneously contributing to the preservation and growth of our linguistic heritage.

Our Primary Objectives

The COIL-D (Centre for Indian Language Data) project is building a single, comprehensive hub for Indian language data. Its main goals are to set up a standardized platform to evaluate machine translation and other natural language processing systems, encourage the development and preservation of language resources for human language technology applications, and define benchmarks for linguistic performance.

Step 1: Identification of Tamil language resources

Identify and list existing Tamil datasets and tools to understand the current resource availability.

Step 2: Acquisition of resources across target domains

To ensure comprehensive coverage of real-world language usage, data will be collected from key domains, including Education, Governance & Policy, Judiciary, Science & Technology, Healthcare, Agriculture, Climate, and Tourism.

Step 3: Creation of language resources and benchmarks

Develop language resources and datasets for Machine Translation and NLP, and establish benchmarks to systematically evaluate tool performance.

Step 4: Development of MT evaluation leaderboards

Develop MT evaluation leaderboards to assess translation systems, enabling performance tracking and fostering continuous improvement.
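A minimal sketch of what a leaderboard computation could look like is given below. It scores each system's translations against reference translations and ranks the systems by mean score. The metric used here, a unigram F1 overlap, is only a simple stand-in chosen to keep the example self-contained; it is not the project's metric, and a real leaderboard would more likely use an established MT metric. The function names and the placeholder sentences are assumptions for illustration.

```python
from collections import Counter


def unigram_f1(hypothesis: str, reference: str) -> float:
    """Stand-in metric: F1 over whitespace-separated tokens."""
    hyp, ref = Counter(hypothesis.split()), Counter(reference.split())
    overlap = sum((hyp & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(hyp.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def build_leaderboard(system_outputs, references):
    """Return (system, mean score) pairs sorted from best to worst."""
    rows = []
    for name, hypotheses in system_outputs.items():
        if len(hypotheses) != len(references):
            raise ValueError(f"{name}: expected {len(references)} outputs")
        scores = [unigram_f1(h, r) for h, r in zip(hypotheses, references)]
        rows.append((name, sum(scores) / len(scores)))
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Placeholder tokens stand in for real reference and system translations.
references = ["a b c d", "e f g"]
system_outputs = {
    "system_a": ["a b c d", "e f x"],
    "system_b": ["a b", "x y z"],
}
leaderboard = build_leaderboard(system_outputs, references)
for rank, (name, score) in enumerate(leaderboard, start=1):
    print(rank, name, round(score, 3))
```

Keeping the metric as a separate function means the ranking logic stays unchanged if the stand-in is swapped for a different metric.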

Step 5: Leaderboards for ASR and TTS technologies

Develop evaluation scoreboards for speech recognition and synthesis systems to measure accuracy, clarity, and overall performance.
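For speech recognition, accuracy is commonly reported as word error rate (WER): the word-level edit distance between the reference transcript and the system output, divided by the number of reference words. The sketch below is one plain implementation of that definition, offered as an illustration and not as the project's scoring code. It does not cover TTS, and the example sentences are placeholders.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so
    # far and the first j hypothesis words (row-by-row Levenshtein).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# One word of a six-word reference is missing from the hypothesis.
print(round(word_error_rate("the cat sat on the mat", "the cat sat on mat"), 3))
```

Because the function splits on whitespace, transcripts would need consistent normalization (punctuation, numerals, spelling variants) before scoring; how that is done for each language is outside this sketch.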

Step 6: Benchmarks for PoS and NER taggers

Define evaluation protocols for tools such as PoS and NER taggers to ensure fair and consistent assessment.
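One way such a protocol could be made concrete is sketched below: token-level accuracy for PoS tagging and entity-level F1 for NER, where a predicted entity counts as correct only if its span and label both match the gold annotation. The BIO tag scheme, the function names, and the example tags are assumptions for illustration; the project's actual protocol may differ.

```python
def tagging_accuracy(gold, predicted):
    """Fraction of tokens whose predicted tag equals the gold tag."""
    if not gold or len(gold) != len(predicted):
        raise ValueError("gold and predicted must be non-empty and equal length")
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


def bio_spans(tags):
    """Collect (start, end, label) entity spans from a BIO tag sequence."""
    spans = set()
    start, label = None, None
    # The trailing "O" is a sentinel that closes a span ending at the last token.
    for i, tag in enumerate(list(tags) + ["O"]):
        continues = tag.startswith("I-") and label == tag[2:]
        if start is not None and not continues:
            spans.add((start, i, label))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
        # An "I-" tag with no open span of the same label is ignored here.
    return spans


def entity_f1(gold, predicted):
    """Entity-level F1: span boundaries and label must both match."""
    gold_spans, pred_spans = bio_spans(gold), bio_spans(predicted)
    true_positives = len(gold_spans & pred_spans)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred_spans)
    recall = true_positives / len(gold_spans)
    return 2 * precision * recall / (precision + recall)


gold = ["B-PER", "I-PER", "O", "B-LOC"]
pred = ["B-PER", "I-PER", "O", "O"]
print(tagging_accuracy(gold, pred), round(entity_f1(gold, pred), 3))
```

Scoring whole entities instead of individual tokens penalizes partially recognized names, which is why the two numbers can differ noticeably for the same output.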

Leadership

Commander (Dr.) Anil Rana

Director

Manipal Institute of Technology

MAHE, Manipal, India

Dr. Chandrakala C B

Joint Director

Additional Professor

School of Computer Engineering

MIT, MAHE Manipal, India

Dr. Radhika M Pai

Dean and Professor

School of Computer Engineering

MIT, MAHE Manipal, India

Dr. P C Siddalingaswamy

Professor and Associate Dean

School of Computer Engineering

MIT, MAHE Manipal, India

Dr. Smitha N Pai

Professor and Associate Dean

School of Computer Engineering

MIT, MAHE Manipal, India

Project Investigators

Dr. Muralikrishna SN

Principal Investigator

Associate Professor

School of Computer Engineering

MIT, MAHE Manipal, India

Dr. Ashalatha Nayak

Co-Investigator

Professor

School of Computer Engineering

MIT, MAHE Manipal, India

Dr. Ashwath Rao B

Co-Investigator

Assistant Professor - Selection Grade

School of Computer Engineering

MIT, MAHE Manipal, India

Dr. Raghavendra Ganiga

Co-Investigator

Associate Professor

School of Computer Engineering

MIT, MAHE Manipal, India

Mr. Ganesh Babu C

Co-Investigator

Assistant Professor Senior Scale

School of Computer Engineering

MIT, MAHE Manipal, India

Dr. Raghurama Holla

Co-Investigator

Assistant Professor

School of Computer Engineering

MIT, MAHE Manipal, India

Project Staff

Ms. Niveditha

Liaison Officer

Mr. PVSS Harshavardhan

Junior Research Associate (Tech)

Ms. Deeksha

Junior Research Associate

Ms. Shruthi

Junior Research Associate

Ms. Sona T

Junior Research Associate

Ms. Shama Bhat

Junior Research Associate

Ms. Shrilatha Kulal

Junior Research Associate

Ms. Raksha

Junior Research Associate

Dr. Umalatha Kannoth

Junior Research Associate

Ms. Kavya

Former Junior Research Associate

Interns

Current Interns

Name: Sakshi

Project: Machine Translation models (IndicTrans2)

Name: Rajdeep

Project: Machine Translation models (IndicTrans2)

Name: Shrikanth Nayak

Project: Web-based data acquisition system

Name: Shreesha

Project: Web-based data acquisition system

Name: Prathiksha

Project: Speaker Diarization

Name: Sameeksha

Project: Speaker Diarization

Previous Interns

Name: Ranjan Shettigar

Project: Speech acquisition and recognition

Name: Bhavin Kumar

Project: Speech acquisition and recognition

Name: Sathwik

Project: POS taggers for Dravidian languages

Name: Prajwal

Project: POS taggers for Dravidian languages

Name: Athreya

Project: Speech annotation

Name: Adithya Chawhan

Project: Speaker Diarization

For internship inquiries and applications, please contact our team at coild.mit@manipal.edu

Open Positions

Freelance Translators and Reviewers

We are looking for freelance translators and reviewers proficient in Dravidian languages. The language pairs for translation are:

Tamil - Kannada
Tamil - Malayalam
Tamil - Telugu

Note that Tamil is the source language in every pair.

Eligibility Criteria

Proficiency in Tamil
Proficiency in at least one of Kannada, Malayalam, or Telugu
Basic Computer Knowledge

Translation price: ₹ 1.5/source word
Review price: ₹ 0.75/source word

Apply now
