Comparative Evaluation of ChatGPT-4o, Claude3.5, Gemini1.5 Pro, and Copilot for Determining Oral Medication Dosages

Authors

  • Morteza Heydari, Student Research Committee, Faculty of Pharmacy, Ahvaz Jundishapur University of Medical Sciences, Ahvaz, Iran
  • Mohammadreza Razavizadeh, Department of Pharmaceutics, Faculty of Pharmacy, Tehran University of Medical Sciences, Tehran, Iran
  • Saman Sameri, Student Research Committee, Faculty of Pharmacy, Ahvaz Jundishapur University of Medical Sciences, Ahvaz, Iran
  • Kaveh Eslami, Department of Clinical Pharmacy, Faculty of Pharmacy, Ahvaz Jundishapur University of Medical Sciences, Ahvaz, Iran

Abstract

Delivering the most effective treatment to patients in the shortest possible time remains one of the most pressing challenges in modern healthcare. Large language models (LLMs) are widely accessible and have shown remarkable potential across domains, including achieving passing scores on the United States Medical Licensing Examination (USMLE), reducing physician visits, and lowering healthcare costs. This study assesses the capabilities, limitations, and practical considerations of integrating LLMs into practice alongside pharmacists, with a focus on determining oral medication dosages across different age groups. Questions were organized into seven domains, each comprising three questions accompanied by clinical case scenarios, and prompts were designed using a zero-shot approach. Responses were evaluated against UpToDate on five criteria: response rate, accuracy, completeness, clarity, and safety. Although none of the models had direct access to UpToDate, GPT-4o achieved the highest performance, correctly answering 100% of case-based questions. Copilot achieved 71.43% overall accuracy and 85.71% accuracy on case-based questions but ranked lowest in completeness and clarity. Gemini 1.5 Pro had the lowest response rate, while Copilot and Claude 3.5 Sonnet V2 generated unsafe outputs. Overall, the findings underscore the importance of evaluating context-dependent effectiveness before large language models are adopted more broadly in clinical practice.
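The abstract reports per-model percentages over a design of seven domains with three questions each (21 questions). The Python sketch below shows one way such figures could be tallied. It is illustrative only: the `Response` record, the `summarize` and `pct` helpers, the assumption that one question per domain is "case-based", and the rule that accuracy is computed over all 21 questions are hypothetical choices, not the authors' published scoring procedure. Completeness, clarity, and safety are omitted because the abstract does not say how they were rated. The toy data are invented and are not the study's per-question results; they are simply chosen so that the arithmetic matches percentages of the reported kind (15/21 rounds to 71.43%, 6/7 rounds to 85.71%).

```python
from dataclasses import dataclass


# Hypothetical record for one model's answer to one question.
@dataclass
class Response:
    domain: int        # 1..7
    question: int      # 1..3 within the domain
    answered: bool     # False if the model declined or gave no dosage
    correct: bool      # dose agrees with the reference (e.g. UpToDate)
    case_based: bool   # assumption: the question is a clinical case scenario


def pct(numerator: int, denominator: int) -> float:
    """Percentage rounded to two decimals; 0.0 if there is nothing to divide by."""
    return round(100 * numerator / denominator, 2) if denominator else 0.0


def summarize(responses: list[Response]) -> dict[str, float]:
    """Response rate and accuracy over all questions, plus case-based accuracy."""
    total = len(responses)
    answered = sum(r.answered for r in responses)
    correct = sum(r.correct for r in responses)
    case = [r for r in responses if r.case_based]
    case_correct = sum(r.correct for r in case)
    return {
        "response_rate": pct(answered, total),
        "accuracy": pct(correct, total),
        "case_accuracy": pct(case_correct, len(case)),
    }


# Invented toy data: 7 domains x 3 questions, question 1 of each domain treated
# as case-based; 6 answers marked incorrect (one of them case-based).
toy = [
    Response(
        domain=d,
        question=q,
        answered=True,
        correct=not ((d == 7 and q == 1) or (q == 3 and d <= 5)),
        case_based=(q == 1),
    )
    for d in range(1, 8)
    for q in range(1, 4)
]
summary = summarize(toy)
print(summary)
```

Whether a declined answer should count against accuracy (denominator of all questions) or be excluded (denominator of answered questions) changes the reported figure, so any reanalysis would need to confirm which convention the authors used.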

Published

2025-10-29

Issue

Vol. 2, No. 3

Section

Original Articles

How to Cite

Heydari, M., Razavizadeh, M., Sameri, S., & Eslami, K. (2025). Comparative Evaluation of ChatGPT-4o, Claude3.5, Gemini1.5 Pro, and Copilot for Determining Oral Medication Dosages. The Journal of Telemedicine, 2(3). https://tjtmed.com/index.php/tjt/article/view/55