Chinese generative AI models (DeepSeek and Qwen) rival ChatGPT-4 in ophthalmology queries with excellent performance in Arabic and English
DOI: https://doi.org/10.52225/narra.v5i1.2371
Keywords: LLM, OpenAI, DeepSeek, Qwen, eye disease
Abstract
The rapid evolution of generative artificial intelligence (genAI) has ushered in a new era of digital medical consultations, with patients turning to AI-driven tools for guidance. The emergence of Chinese-developed genAI models such as DeepSeek-R1 and Qwen-2.5 has challenged the dominance of OpenAI’s ChatGPT. The aim of this study was to benchmark the performance of Chinese genAI models against ChatGPT-4o and to assess disparities in performance between English and Arabic. Following the METRICS checklist for genAI evaluation, Qwen-2.5, DeepSeek-R1, and ChatGPT-4o were assessed on common patient ophthalmology queries for completeness, accuracy, and relevance using the CLEAR tool. In English, Qwen-2.5 demonstrated the highest overall performance (CLEAR score: 4.43±0.28), outperforming both DeepSeek-R1 (4.31±0.43) and ChatGPT-4o (4.14±0.41), with p=0.002. A similar hierarchy emerged in Arabic, with Qwen-2.5 again leading (4.40±0.29), followed by DeepSeek-R1 (4.20±0.49) and ChatGPT-4o (4.14±0.41), with p=0.007. Each tested genAI model exhibited near-identical performance across the two languages: ChatGPT-4o demonstrated the most balanced linguistic capabilities (p=0.957), while Qwen-2.5 and DeepSeek-R1 performed marginally better in English. An in-depth examination of genAI performance across key CLEAR components revealed that Qwen-2.5 consistently excelled in content completeness, factual accuracy, and relevance in both English and Arabic, setting a new benchmark for genAI in medical inquiries. Despite minor linguistic disparities, all three models exhibited robust multilingual capabilities, challenging the long-held assumption that genAI is inherently biased toward English. These findings highlight the evolving nature of AI-driven medical assistance, with Chinese genAI models able to rival or even surpass ChatGPT-4o in ophthalmology-related queries.
License
Copyright (c) 2025 Malik Sallam, Israa M. Alasfoor, Shahad W. Khalid, Rand I. Al-Mulla, Amwaj Al-Farajat, Maad M. Mijwil, Reem Zahrawi, Mohammed Sallam, Jan Egger, Ahmad S. Al-Adwan

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.