Comparative Evaluation of ChatGPT-4o, Claude3.5, Gemini1.5 Pro, and Copilot for Determining Oral Medication Dosages

Authors

  • Morteza Heydari, Ahvaz Jundishapur University of Medical Sciences
  • Mohammadreza Razavizadeh, Tehran University of Medical Sciences
  • Saman Sameri, Ahvaz Jundishapur University of Medical Sciences
  • Kaveh Eslami, Ahvaz Jundishapur University of Medical Sciences

DOI:

https://doi.org/10.22034/TJT.2.3.55

Keywords:

Generative Artificial Intelligence, Artificial Intelligence, Large Language Models, Drug Information Services, Electronic Prescribing

Abstract

Delivering the most effective treatment to patients in the shortest possible time remains one of the most pressing challenges in modern healthcare. Large language models (LLMs) are widely accessible and have shown remarkable potential across domains, including achieving passing scores on the United States Medical Licensing Examination (USMLE), reducing physician visits, and lowering healthcare costs. This study assesses the capabilities, limitations, and practical considerations of integrating LLMs alongside pharmacists, focusing on oral medication dosage prescriptions across different age groups. Questions were organized into seven domains, each comprising three questions accompanied by clinical case scenarios, and prompts were designed using a zero-shot approach. Responses were evaluated against UpToDate on five criteria: response rate, accuracy, completeness, clarity, and safety. Although none of the models had direct access to UpToDate, GPT-4o achieved the highest performance, correctly answering 100% of case-based questions. Copilot achieved 71.43% overall accuracy and 85.71% accuracy on case-based questions but ranked lowest in completeness and clarity. Gemini 1.5 Pro had the lowest response rate, while Copilot and Claude 3.5 Sonnet V2 generated unsafe outputs. Overall, the findings underscore the importance of evaluating context-dependent effectiveness before large language models are more broadly adopted in clinical practice.

Published

2025-10-29

Section

Original Articles

How to Cite

Heydari M, Razavizadeh M, Sameri S, Eslami K. Comparative Evaluation of ChatGPT-4o, Claude3.5, Gemini1.5 Pro, and Copilot for Determining Oral Medication Dosages. J Telemed. [Internet]. 2025 Oct. 29 [cited 2025 Nov. 30];2(3). Available from: https://tjtmed.com/index.php/tjt/article/view/55
