You’re reading Entrepreneur India, an international franchise of Entrepreneur Media.
The Union government has launched AI Kosha, a dedicated platform for hosting non-personal datasets that support the development of Artificial Intelligence (AI) models and tools. The initiative is part of the broader IndiaAI Mission and is designed to make structured datasets readily accessible for AI research and innovation. At launch, AI Kosha includes 316 datasets. A major focus is on enabling the creation and validation of translation tools for Indian languages, with the aim of promoting linguistic inclusivity and digital transformation.
As a crucial pillar of the IndiaAI Datasets Platform, AI Kosha reflects the government's commitment to advancing AI research and development. The INR 10,370 crore IndiaAI Mission also includes initiatives such as pooled access to Graphics Processing Units (GPUs), which lets startups and academic institutions train and deploy AI models. IT Minister Ashwini Vaishnaw announced that 14,000 GPUs had been commissioned for shared access, up from the earlier 10,000, with further expansions planned every quarter to meet growing demand.
Beyond compute infrastructure, India is actively working towards building a homegrown foundational AI model, an effort that has gained momentum following the success of China's low-cost DeepSeek models. With startups showing increasing interest, the initiative underscores India's aspiration to emerge as a key player in AI innovation. In addition, the government's existing Open Government Data (OGD) Platform (data.gov.in), which hosts more than 12,000 datasets, supports AI development by encouraging cross-sector data sharing.
Access to non-personal data, however, remains a point of contention. In 2020, a government-appointed committee headed by Infosys co-founder Kris Gopalakrishnan proposed compelling private firms to share non-personal data, such as traffic data collected by ride-hailing companies, to help startups and policymakers. Although these recommendations met resistance from the private sector, the rise of Large Language Models (LLMs) has reignited discussions about ethical and strategic frameworks for data sharing.