Faculty at the Indian Institute of Technology Madras (IIT-M) and AI4Bharat have developed artificial intelligence (AI) models and datasets for processing text in 11 Indian languages. AI4Bharat, a platform for developing AI solutions to problems relevant to India, and IIT-M released the models and datasets for Tamil, Hindi, Malayalam, Telugu, Kannada, Punjabi, Bengali, Odia, Assamese, Gujarati, and Marathi.
The multilingual AI models and datasets created by IIT Madras and AI4Bharat provide essential building blocks for students, faculty, startups, and industry. They will allow professionals and students to build Indian-language tools and extend the reach of the technology.
The cutting-edge resources are open source and free of charge, so anyone can access them. They are available at https://indicnlp.ai4bharat.org/, and anyone interested can download them from the project's GitHub repository.
Mitesh M. Khapra, Assistant Professor, Department of Computer Science and Engineering (CSE), said, "We have a very rich diversity of languages in our country. As we move towards a digital economy, it is important that our languages find a space online. This requires a lot of innovation in creating input tools, datasets, and AI models for Indian languages."
For example, when a learner posts a question on an e-learning platform in a native or regional language, the platform needs a tool that can automatically process the question in that language and classify it into a specific topic.
Khapra said, "While such tools are available for English and other foreign languages, there are hardly any tools for Indian languages and this is the critical gap that we are trying to address through this initiative. These models are available free of cost as we want the entire country to benefit from them."
AI4Bharat was co-founded by Khapra and Pratyush Kumar of IIT Madras. The initiative works on solving problems relevant to the Indian community in an open-source manner.
Anoop Kunchukuttan, AI4Bharat volunteer and the lead researcher on this project, said, "We have an urgent responsibility to take the rapid advances of AI and make them accessible to the common man. One way of achieving this is to improve interactions between humans and machines. That is where the field of Natural Language Processing (NLP) comes in. NLP is a branch of AI that deals with the interaction between computers and humans using natural language."
Last year, a team of researchers comprising students, faculty, and volunteers from IIT Madras and AI4Bharat collected data and trained powerful models for processing text written in Indian languages.
The models are able to take advantage of the similarities between the Indian languages to use data efficiently.
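The article does not say how the models exploit these similarities. One property that makes sharing across languages possible is that the Unicode blocks for the major Indic scripts follow a parallel layout, so a letter in one script can often be mapped to its counterpart in another by a fixed offset. The sketch below illustrates only that idea and is not the AI4Bharat method. The helper name `to_common_script` and the table `SCRIPT_BLOCK_START` are hypothetical, and the mapping is approximate because some positions are unassigned or script-specific.

```python
# Start code points of the Unicode blocks for nine Indic scripts.
# Each block spans 128 code points and follows a parallel layout.
SCRIPT_BLOCK_START = {
    "devanagari": 0x0900,  # Hindi, Marathi
    "bengali": 0x0980,     # Bengali, Assamese
    "gurmukhi": 0x0A00,    # Punjabi
    "gujarati": 0x0A80,
    "odia": 0x0B00,
    "tamil": 0x0B80,
    "telugu": 0x0C00,
    "kannada": 0x0C80,
    "malayalam": 0x0D00,
}
BLOCK_SIZE = 0x80


def to_common_script(text: str, source: str, target: str = "devanagari") -> str:
    """Shift characters of the source script block into the target block.

    Hypothetical helper for illustration only; characters outside the
    source block (spaces, digits, Latin letters) pass through unchanged.
    """
    src = SCRIPT_BLOCK_START[source]
    dst = SCRIPT_BLOCK_START[target]
    out = []
    for ch in text:
        cp = ord(ch)
        if src <= cp < src + BLOCK_SIZE:
            out.append(chr(cp - src + dst))
        else:
            out.append(ch)
    return "".join(out)


# The letter "ka" sits at offset 0x15 in every block listed above.
print(to_common_script("\u0b95", "tamil"))   # Tamil ka  -> Devanagari ka
print(to_common_script("\u0c15", "telugu"))  # Telugu ka -> Devanagari ka
```

Writing text from several languages in one script lets related words share the same characters, which is one way a single model could reuse data across languages.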