How is India's Sarvam AI Surpassing Global Leaders in OCR and Speech Technology?
Mumbai, Feb 9 (NationPress) Bengaluru-based startup Sarvam AI has announced that its vision and speech models have surpassed major global competitors such as Google Gemini and ChatGPT on key benchmarks for optical character recognition and for text-to-speech in Indian languages.
In a recent update on X, Pratyush Kumar, one of the co-founders of Sarvam AI, stated, "Sarvam Vision achieves a remarkable accuracy of 84.3% on the olmOCR-Bench (English only subset), surpassing cutting-edge models such as Gemini 3 Pro and recent OCR innovations like DeepSeek OCR 2."
On the English-only subset of OmniDocBench v1.5, Sarvam Vision scored 93.28% overall, excelling in particular at interpreting complex formulas and parsing layouts, and moving closer to the current state of the art, Kumar noted.
He also mentioned that Sarvam AI's Bulbul V3 text-to-speech model supports 35 voices across all 22 scheduled Indian languages and handles scans and content of varying quality.
"For Indian languages, Sarvam Vision stands out as the leading model, offering support for all 22 scheduled Indian languages," he asserted.
The Vision series features a 3-billion-parameter state-space model proficient in tasks such as image captioning, scene text recognition, chart interpretation, and intricate table parsing.
Sarvam AI says it is committed to making artificial intelligence accessible to everyone in India. "We aspire for India to engage confidently and with control in this significant technological evolution. Our goal is to develop foundational components and adapt them to cater to the country’s unique requirements," the company said.
Kumar showcased examples on social media where the platform successfully extracted technical terminology from complex tables with merged rows and columns. Additionally, it demonstrated the ability to extract data from a chart featured in the latest Economic Survey.
Beyond document processing, his posts illustrated Sarvam Vision’s ability to understand general natural scenes, with the model accurately interpreting a landscape photograph.
Union IT Minister Ashwini Vaishnaw remarked in a recent post on X that the achievements of this startup reflect the triumph of India’s AI mission.