Voice Recognition Accuracy Across Indian Languages in Healthcare Chatbots

DOI: https://doi.org/10.63345/ijrsml.v13.i10.2

Dr Reeta Mishra

IILM University

Knowledge Park II, Greater Noida, Uttar Pradesh 201306, India

Abstract

Voice-enabled healthcare chatbots can broaden access to medical advice and services, particularly in linguistically diverse regions such as India. Prior research has focused predominantly on high‑resource languages, leaving a gap in our understanding of automatic speech recognition (ASR) performance across Indic languages in real‑world healthcare dialogues. This study evaluates three leading ASR engines, Engine A (multilingual global provider), Engine B (regional Indic‑focused provider), and Engine C (open‑source), across five major Indian languages: Hindi, Bengali, Tamil, Telugu, and Marathi. We collected 2,500 voice samples from 500 native speakers, stratified by age, gender, and urban or rural residence, using a standardized healthcare dialogue script covering symptom reporting, medication inquiries, appointment scheduling, and lifestyle advice. Each audio file was manually transcribed to serve as ground truth and then processed by each ASR engine. We measured word error rate (WER), semantic error rate (SER), and task completion rate (TCR), and conducted statistical analyses to assess the effects of language, engine, dialect, and demographic factors. The results show pronounced variability: Hindi and Marathi yielded the lowest average WERs (12.4% and 13.7%, respectively) and the highest TCRs (86.7% and 84.3%), whereas Telugu exhibited the highest WER (22.8%) and the lowest TCR (62.3%). Dialectal variation and rural speech patterns increased WER by up to 15%, and misrecognition of medical terminology accounted for 18% of semantic errors. Regression analyses confirmed that rural speakers and older adults experienced significantly higher error rates.
Based on these findings, we propose a multi‑pronged strategy to improve ASR accuracy and reliability in healthcare chatbot deployments: acoustic model adaptation with region‑specific corpora, context‑aware language modeling enriched with medical lexicons, and user‑adaptive feedback loops with confirmation prompts. The study quantifies the current limitations of Indic ASR in clinical contexts and offers actionable recommendations for developers and policymakers to advance inclusive, safe, and effective voice‑based digital health interventions in India’s multilingual landscape.
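
For readers unfamiliar with the metrics named in the abstract, the following is a minimal sketch of how WER and TCR are commonly computed. It is not the authors' evaluation code. It assumes whitespace tokenization and exact word matching for WER, and it assumes TCR is the fraction of dialogues marked as completed; the function names and the sample sentences are hypothetical.

```python
from typing import Sequence


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count.

    Assumption: words are split on whitespace and compared exactly.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] is the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of an extra hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


def task_completion_rate(outcomes: Sequence[bool]) -> float:
    """Fraction of dialogues flagged as completed (assumed TCR definition)."""
    if not outcomes:
        raise ValueError("at least one dialogue outcome is required")
    return sum(outcomes) / len(outcomes)
```

Under these assumptions, `word_error_rate("take two tablets daily", "take to tablets daily")` returns 0.25 (one substitution over four reference words), and `task_completion_rate([True, True, False, True])` returns 0.75. The first case also shows why a low WER can still hide a clinically relevant error: a single substituted word changes the dosage.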

Keywords

Health‑bot, ASR accuracy, Indian languages, word error rate, healthcare dialogue
