
Evaluating Artificial Intelligence in Spinal Cord Injury Management: A Comparative Analysis of ChatGPT-4o and Google Gemini Against American College of Surgeons Best Practices Guidelines for Spine Injury

Global Spine Journal, 2025 · DOI: 10.1177/21925682251321837 · Published: January 1, 2025

Spinal Cord Injury · Healthcare · Bioinformatics

Simple Explanation

This study compares how well two AI chatbots, ChatGPT-4o and Google Gemini Advanced, follow established guidelines for treating spinal cord injuries. The American College of Surgeons (ACS) has published best practice guidelines for managing spinal injuries. Researchers tested the chatbots by asking them questions based on these guidelines and then checked whether the answers matched the recommendations, which helps show whether AI can give accurate advice for spinal injury care. The goal was to see if these AI models could be helpful tools for doctors and patients making decisions about spinal injury treatment, and also to identify any limitations they might have.

Study Duration: Not specified
Participants: Not specified
Evidence Level: Comparative Analysis

Key Findings

  1. ChatGPT-4o was correct on 73.07% of the questions, while Gemini Advanced was correct on 69.23%. Most incorrect answers were due to not providing enough information rather than giving wrong information.
  2. The models agreed on most questions, but when they disagreed, ChatGPT-4o was more often correct. Both models did well on general information questions, but Gemini Advanced was better at diagnostic questions, while ChatGPT-4o was better at treatment questions.
  3. The study found that neither model was statistically significantly better than the other at following the guidelines. The researchers suggest that while AI can be helpful, it has limitations and should not replace the expertise of healthcare professionals.

Research Summary

This study evaluated the ability of ChatGPT-4o and Google Gemini Advanced to adhere to the 2022 American College of Surgeons (ACS) Best Practices Guidelines for spine injury management. The AI models were posed 52 questions derived from the ACS guidelines, and their responses were assessed for concordance with the guideline recommendations. ChatGPT-4o had a concordance rate of 73.07%, while Gemini Advanced had a rate of 69.23%, and most non-concordant responses were due to insufficient information rather than incorrect information. There was no statistically significant difference between the two models' performance. The study concludes that while both AI models show potential as valuable assets in spinal injury management, their current limitations mean they are not clinically safe or practical to rely on in trauma settings without careful clinician oversight.
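To make the numbers concrete, the sketch below reconstructs the reported concordance rates and illustrates one plausible way to compare two models answering the same 52 questions. The per-question counts (38 and 36 correct) are inferred from the reported percentages, the discordant split is hypothetical, and the exact statistical test the authors used is not stated in this summary; an exact McNemar test on paired responses is shown only as an example.

```python
# Minimal sketch (not the authors' analysis code). Only the 52 questions and the
# two concordance percentages come from the summary; the per-question counts are
# inferred from those percentages, and the discordant split and choice of test
# below are assumptions for illustration.
from math import comb

TOTAL_QUESTIONS = 52

# 38/52 ≈ 73.1% and 36/52 ≈ 69.2%, consistent with the reported rates.
chatgpt_correct = 38
gemini_correct = 36

print(f"ChatGPT-4o concordance:      {chatgpt_correct / TOTAL_QUESTIONS:.2%}")
print(f"Gemini Advanced concordance: {gemini_correct / TOTAL_QUESTIONS:.2%}")


def exact_mcnemar_p(only_a_correct: int, only_b_correct: int) -> float:
    """Two-sided exact McNemar p-value from the discordant counts, i.e. the
    questions that exactly one of the two models answered correctly."""
    n = only_a_correct + only_b_correct
    if n == 0:
        return 1.0
    k = min(only_a_correct, only_b_correct)
    # Probability of an outcome at least this lopsided under a fair 50/50 split.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical discordant split (the paper's actual breakdown is not given here):
# 5 questions only ChatGPT-4o got right, 3 questions only Gemini Advanced got right.
print(f"Exact McNemar p-value: {exact_mcnemar_p(5, 3):.3f}")
```

Under a discordant split like the hypothetical one above, a gap of roughly four percentage points over 52 questions is well within what chance alone could produce, which lines up with the study's report of no statistically significant difference between the two models.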

Practical Implications

Clinical Decision Support

AI models can assist healthcare providers in making informed decisions about spinal injury management by providing responses aligned with current best practices.

Educational Tool

AI chatbots can serve as educational resources for medical professionals and patients seeking information on spinal cord injuries and their management.

Guideline Adherence Monitoring

AI can be used to monitor and assess the adherence of healthcare providers to clinical guidelines in spinal injury management.

Study Limitations

  1. The responses generated by ChatGPT-4o, which is based on a large language model trained on data up to April 2023, may not reflect significant discoveries or guideline updates made after that date.
  2. The scope of the study was limited: it evaluated only a specific set of questions drawn from a single set of guidelines for spinal cord injury.
  3. The assessment process was inherently subjective and did not provide a precise quantitative measure of the models' accuracy. This subjectivity may introduce bias into the evaluation.
