ChatGPT isn't ready for medical 'tests': it got 83% of cases wrong


A study in JAMA Pediatrics reveals that ChatGPT-4, an AI language model, performed poorly in evaluating children's health cases. Tested against 100 pediatric case studies, the chatbot provided correct answers in only 17 instances, an 83% error rate that highlights the risks of relying on unvetted AI in healthcare. The inaccurate diagnoses raise concerns about the readiness of AI for medical applications, though the study suggests ChatGPT-4 could still serve as a supplementary tool for clinicians in complex cases.

A new study published in JAMA Pediatrics has thrown cold water on the hopes of some for AI-powered medical diagnoses, revealing that the popular language model ChatGPT-4 performed poorly in evaluating children's health cases. According to a report by Ars Technica, with a staggering error rate of 83%, the study underscores the dangers of relying on unvetted AI in high-stakes situations like healthcare.

Researchers from Cohen Children's Medical Center in New York tested ChatGPT-4 against 100 anonymised paediatric case studies, covering a range of common and complex conditions. The chatbot's dismal performance, missing vital clues and providing inaccurate diagnoses in the overwhelming majority of cases, raises serious concerns about the readiness of current AI technology for medical applications.
Out of 100 cases, ChatGPT provided correct answers in only 17 instances. In 72 cases it gave outright incorrect responses, and in the remaining 11 cases it did not fully capture the correct diagnosis. Among the 83 incorrect diagnoses, 57% (47 cases) involved the same organ system as the correct diagnosis, as per the report.
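For readers who want to check the tallies, here is a minimal sketch (the category counts are from the study as reported; the script only verifies the percentages):

# Category counts for ChatGPT-4 on 100 paediatric cases, as reported.
correct = 17
incorrect = 72
not_fully_captured = 11   # clinically related but too broad or unspecific

misses = incorrect + not_fully_captured   # the study counts both as wrong
assert correct + misses == 100
print(f"error rate: {misses / 100:.0%}")                       # -> 83%

same_organ_system = 47    # misses involving the correct organ system
print(f"same organ system: {same_organ_system / misses:.0%}")  # -> 57%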
How was ChatGPT evaluated?

During ChatGPT's evaluation, the researchers inserted the pertinent text of medical cases into the prompt. Subsequently, two qualified physician-researchers assessed the AI-generated responses, categorising them as correct, incorrect, or "did not fully capture the diagnosis." In instances where ChatGPT fell into the last category, it often provided a clinically related condition that was overly broad or insufficiently specific to be deemed the accurate diagnosis.

For example, in diagnosing one child's case, ChatGPT identified a branchial cleft cyst, a lump in the neck or below the collarbone, when the correct diagnosis was branchio-oto-renal syndrome. According to the report, this syndrome is a genetic condition leading to abnormal tissue development in the neck, along with malformations in the ears and kidneys. Notably, one of the indicators of this condition is the occurrence of branchial cleft cysts.
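The article does not publish the researchers' exact prompt or tooling, but the described setup (paste the case text into the prompt, then collect the model's answer for physician grading) can be sketched roughly as follows, assuming the OpenAI Python client and hypothetical placeholder case texts:

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def diagnose(case_text: str) -> str:
    # Ask the model for a diagnosis on one case. The prompt wording
    # here is illustrative; the study's actual prompt was not published
    # in this article.
    response = client.chat.completions.create(
        model="gpt-4",  # the study evaluated ChatGPT-4
        messages=[
            {"role": "system",
             "content": "You are assisting with paediatric diagnosis."},
            {"role": "user",
             "content": f"{case_text}\n\nWhat is the most likely diagnosis?"},
        ],
    )
    return response.choices[0].message.content

# Hypothetical workflow: one answer per case, saved for physician review
# and graded as correct / incorrect / did not fully capture the diagnosis.
cases = ["<anonymised case text 1>", "<anonymised case text 2>"]  # placeholders
answers = [diagnose(case) for case in cases]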

However, the study did note that ChatGPT can serve as a supplementary tool. As part of the findings, the study stated that “LLM-based chatbots could be used as a supplementary tool for clinicians in diagnosing and developing a differential list for complex cases.”

Article From: timesofindia.indiatimes.com