Exploring the Performance of AI Scribes

Project Spotlight: Exploring the Performance of AI Scribes in Clinical Consultations

At CoDE, we are committed to independently evaluating emerging technologies to inform safe and effective adoption across healthcare. Our second project focuses on understanding how different AI scribes perform when summarising clinical consultations.

Project Overview

With the rapid expansion of clinical AI scribes, healthcare providers need clear insights into how these tools perform in realistic settings. Our evaluation compared seven commercially available AI scribes across a series of simulated consultations reflecting a range of typical clinical scenarios.

Each system was tested under consistent conditions, and its output was compared with a gold-standard reference. Our focus was on identifying issues in the completeness, accuracy, and quality of the generated summaries. Specifically, the measures we used included:

Inclusion of critical information
Signal-to-noise ratio
Ability to record information without introducing misleading statements
Background noise interference
Performance across different accents
Ability to capture the correct meaning of terms
Performance with speech impediments
Influence of personality traits
Evaluation of the software by health professionals
Microphone choice and positioning

The findings from these initial experiments are currently being finalised.

High-Level Insights

Performance Varies Between Systems: While several AI scribes showed strong potential, none was error-free, and the systems differed significantly in how well they handled complex consultations.

Omissions Were the Most Common Error: Missing important clinical information was the leading issue across all systems, underlining the need for continued clinical oversight.

Different Scenarios, Different Challenges: More complex or ambiguous consultations, such as dermatological and sleep apnoea cases, presented greater difficulties for all AI systems.

Emerging Themes

The Need for Vigilance: AI scribes can greatly support clinicians, but they must be used with a robust safety net of review and verification.

Quality Varies Widely: Careful selection, piloting, and validation are crucial before adoption.

Continuous Improvement: Ongoing feedback and learning are key to evolving these technologies to reliably meet clinical standards.

Why It Matters

AI scribes have the potential to significantly reduce the administrative burden on clinicians and enhance patient-centred care. However, our research emphasises that independent evaluation, clinician oversight, and careful implementation are essential to ensure safe and effective use.

Interested in learning more or trialling AI scribe technology in your setting? Contact us to get involved.