Microsoft CEO Satya Nadella is in India, and during his visit he will focus on expanding the company's AI footprint in the country, including providing AI skilling opportunities to 2 million people by 2025. He will also meet villagers who are helping the company develop its AI tools.
“Great to be in India this week meeting with changemakers like the team at Karya, who are doing the critical work of building high-quality datasets for AI—and expanding economic opportunity at the same time,” he said in a post on X (formerly Twitter).
What is Karya
Karya is a company that creates datasets in several Indian languages for training AI models and for research, while creating jobs for Indians, mainly in rural areas. It started as a Microsoft Research project in Bengaluru in 2017 and has since roped in people to create high-quality datasets in several Indian languages. These workers are paid for their contributions, which helps them lift themselves out of poverty.
In 2021, the project was spun off as an organisation independent of Microsoft. Workers use the Karya app to record and write in their native languages. The project, including the app, is built on Microsoft Azure and uses Azure OpenAI Service as well as Azure AI Cognitive Services to validate its data. Microsoft is one of its major clients.
How will this benefit Indians
The project aims to make technology accessible in under-resourced languages. AI tools such as OpenAI's ChatGPT and Microsoft's Copilot work well in English, thanks to the abundance of written and audio material in that language on the internet, but the same is not true for many Indian languages.
India has 22 official languages, along with hundreds of other languages and thousands of dialects. Of India's more than 1.4 billion people, about 60% speak Hindi and about 10% speak English, which leaves millions without digital tools that can help them get ahead in the modern world.
“I think we want to rectify that most of the internet being in English is not a very good place to start,” says Kalika Bali, a language technologist and researcher at the Microsoft Research Lab in Bengaluru. She uses data collected by Karya for her research.
Bali said that AI has greatly sped up language preservation and the use of this language data in large language models (LLMs), which in turn are used both to create AI tools and to preserve rare or dying languages.