Chinese generative AI models (DeepSeek and Qwen) rival ChatGPT-4 in ophthalmology queries with excellent performance in Arabic and English

Malik Sallam; Israa M. Alasfoor; Shahad W. Khalid; Rand I. Al-Mulla; Amwaj Al-Farajat; Maad M. Mijwil; Reem Zahrawi; Mohammed Sallam; Jan Egger; Ahmad S. Al-Adwan

doi:10.52225/narra.v5i1.2371

Authors

Malik Sallam: Department of Pathology, Microbiology and Forensic Medicine, School of Medicine, The University of Jordan, Amman, Jordan; Department of Clinical Laboratories and Forensic Medicine, Jordan University Hospital, Amman, Jordan. ORCID: https://orcid.org/0000-0002-0165-9670
Israa M. Alasfoor: Section of Ophthalmology, Department of Special Surgery, School of Medicine, The University of Jordan, Amman, Jordan; Section of Ophthalmology, Department of Special Surgery, Jordan University Hospital, Amman, Jordan. ORCID: https://orcid.org/0009-0003-7257-6762
Shahad W. Khalid: Section of Ophthalmology, Department of Special Surgery, School of Medicine, The University of Jordan, Amman, Jordan; Section of Ophthalmology, Department of Special Surgery, Jordan University Hospital, Amman, Jordan. ORCID: https://orcid.org/0009-0004-3856-3340
Rand I. Al-Mulla: Section of Ophthalmology, Department of Special Surgery, School of Medicine, The University of Jordan, Amman, Jordan; Section of Ophthalmology, Department of Special Surgery, Jordan University Hospital, Amman, Jordan. ORCID: https://orcid.org/0009-0003-4175-3965
Amwaj Al-Farajat: Section of Ophthalmology, Department of Special Surgery, School of Medicine, The University of Jordan, Amman, Jordan; Section of Ophthalmology, Department of Special Surgery, Jordan University Hospital, Amman, Jordan. ORCID: https://orcid.org/0000-0003-2367-537X
Maad M. Mijwil: College of Administration and Economics, Al-Iraqia University, Baghdad, Iraq; Department of Computer Techniques Engineering, Baghdad College of Economic Sciences University, Baghdad, Iraq. ORCID: https://orcid.org/0000-0002-2884-2504
Reem Zahrawi: Department of Ophthalmology, Mediclinic Parkview Hospital, Mediclinic Middle East, Dubai, United Arab Emirates
Mohammed Sallam: Department of Pharmacy, Mediclinic Parkview Hospital, Mediclinic Middle East, Dubai, United Arab Emirates; Department of Management, Mediclinic Parkview Hospital, Mediclinic Middle East, Dubai, United Arab Emirates; Department of Management, School of Business, International American University, Los Angeles, United States; College of Medicine, Mohammed Bin Rashid University of Medicine and Health Sciences (MBRU), Dubai, United Arab Emirates. ORCID: https://orcid.org/0000-0003-3273-524X
Jan Egger: Institute for Artificial Intelligence in Medicine (IKIM), Essen University Hospital (AöR), Girardetstraße, Germany; Center for Virtual and Extended Reality in Medicine (ZvRM), Essen University Hospital (AöR), Hufelandstraße, Germany; Cancer Research Center Cologne Essen (CCCE), University Medicine Essen (AöR), Hufelandstraße, Germany; University of Duisburg-Essen, Faculty of Computer Science, Schützenbahn, Germany. ORCID: https://orcid.org/0000-0002-5225-1982
Ahmad S. Al-Adwan: Department of Business Technology, Al-Ahliyya Amman University, Amman, Jordan. ORCID: https://orcid.org/0000-0001-5688-1503

DOI:

https://doi.org/10.52225/narra.v5i1.2371

Keywords:

LLM, OpenAI, DeepSeek, Qwen, eye disease

Abstract

The rapid evolution of generative artificial intelligence (genAI) has ushered in a new era of digital medical consultations, with patients turning to AI-driven tools for guidance. The emergence of Chinese-developed genAI models such as DeepSeek-R1 and Qwen-2.5 has challenged the dominance of OpenAI’s ChatGPT. The aim of this study was to benchmark the performance of Chinese genAI models against ChatGPT-4o and to assess disparities in performance between English and Arabic. Following the METRICS checklist for genAI evaluation, Qwen-2.5, DeepSeek-R1, and ChatGPT-4o were assessed for completeness, accuracy, and relevance using the CLEAR tool on common patient ophthalmology queries. In English, Qwen-2.5 demonstrated the highest overall performance (CLEAR score: 4.43±0.28), outperforming both DeepSeek-R1 (4.31±0.43) and ChatGPT-4o (4.14±0.41) (p=0.002). A similar hierarchy emerged in Arabic, with Qwen-2.5 again leading (4.40±0.29), followed by DeepSeek-R1 (4.20±0.49) and ChatGPT-4o (4.14±0.41) (p=0.007). Each tested genAI model exhibited near-identical performance across the two languages: ChatGPT-4o demonstrated the most balanced linguistic capabilities (p=0.957), while Qwen-2.5 and DeepSeek-R1 scored marginally higher in English. An in-depth examination of genAI performance across key CLEAR components revealed that Qwen-2.5 consistently excelled in content completeness, factual accuracy, and relevance in both English and Arabic, setting a new benchmark for genAI in medical inquiries. Despite minor linguistic disparities, all three models exhibited robust multilingual capabilities, challenging the long-held assumption that genAI is inherently biased toward English. These findings highlight the evolving nature of AI-driven medical assistance, with Chinese genAI models able to rival or even surpass ChatGPT-4o in ophthalmology-related queries.

