Description

The rapid expansion of artificial intelligence (AI) in academic research has improved efficiency and accessibility, particularly in literature review and citation generation. AI-driven tools embedded within large language models are increasingly used to identify, synthesize, and cite scholarly sources. Despite their convenience, concerns persist regarding the accuracy and reliability of AI-generated references. In emergency medicine, where evidence-based practice and scholarly rigor are essential, inaccurate citations may undermine research credibility and propagate misinformation. Prior studies have documented fabricated, incomplete, or misattributed references generated by AI; however, the prevalence and characteristics of these inaccuracies within emergency medicine remain poorly defined. Systematic evaluation of citation accuracy is therefore critical to guiding responsible AI use.

We will conduct a cross-sectional analysis of randomly selected research articles and systematic reviews published in the August 2025 issues of Annals of Emergency Medicine and Prehospital Emergency Care. All citations within selected articles will be cross-referenced with the cited source to assess accuracy. Errors will be categorized by type and severity using a predefined classification scale. Eligible publications will be assigned numeric identifiers, and a random number generator will select four original research articles and two review articles. Three medical students will evaluate citations, with the principal investigator reviewing a random subset. An additional 10-20% of citations will be assessed by emergency medicine physicians and residents to evaluate inter-rater reliability. Primary outcomes include citation accuracy and error severity; secondary outcomes include inter-rater reliability (κ) and clinician assessment of the classification system.
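As an illustration of the inter-rater reliability outcome, the following is a minimal sketch of unweighted Cohen's κ for two raters scoring the same citations. The abstract does not state which κ variant is used, so the unweighted form is an assumption. The function name `cohen_kappa` and the example scores are hypothetical and are not study data.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters scoring the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must score the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items given the identical score.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal score frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[s] * freq_b[s] for s in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical score throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical severity scores (0 = no error); NOT data from the study.
student = [0, 0, 4, 0, 2, 0, 0, 4, 0, 0]
physician = [0, 0, 4, 0, 0, 0, 0, 4, 0, 2]
kappa = cohen_kappa(student, physician)
```

Because the severity scale is ordinal, a weighted κ could also be appropriate; that choice would depend on the study's analysis plan.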

From the August issues of Annals of Emergency Medicine, Volume 86 (9), and Prehospital Emergency Care, Volume 29 (20), 29 articles in total met inclusion criteria. Three systematic reviews formed an additional subset. From these groups, six articles were selected from the main set and one from the systematic review subset. One article was used as a pilot to establish rater consensus guidelines and was excluded from analysis. The final evaluation included six articles comprising 237 unique citations. Student raters were assigned 122 and 115 citations, respectively, with the PI reviewing three overlapping articles (137 citations).

Nineteen citations with errors were identified, representing 8% of all citations. There were 4 incidents of initial non-consensus, one of which was flagged by a single reviewer. The total number of non-zero review scores (with overlap) was 28. Error type counts were as follows:

- Existence errors (5): 1
- Accuracy errors (4): 17
- Obsolete errors (3): 3
- Relevance errors (2): 6
- Assembly errors (1): 1
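The reported figures can be cross-checked with a few lines of arithmetic. This is a sketch only: the counts are copied from the results above, and reading the parenthesized numbers as severity scores on the classification scale is an assumption, since the abstract does not define them.

```python
# Counts as reported in the results above.
total_citations = 237
citations_with_errors = 19
assigned_per_student = [122, 115]

# Assumed reading: key = severity score, value = (error type, count of
# non-zero review scores of that type, with rater overlap).
error_scores = {
    5: ("Existence", 1),
    4: ("Accuracy", 17),
    3: ("Obsolete", 3),
    2: ("Relevance", 6),
    1: ("Assembly", 1),
}

# Share of unique citations with at least one identified error.
error_rate = citations_with_errors / total_citations

# Sum of per-type counts, to compare with the reported total of 28.
total_nonzero_scores = sum(count for _, count in error_scores.values())

print(f"Error rate: {error_rate:.1%}")
print(f"Non-zero review scores: {total_nonzero_scores}")
```

The per-type counts sum to the reported 28 non-zero scores, which exceeds the 19 citations with errors because overlapping reviews are counted separately.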

Given the wide availability and growing prevalence of AI-assisted citation tools, authors should take a measured approach when using them for scholarly publications and should verify citations, both to ensure academic credibility and integrity and to prevent the propagation of inaccurate and misleading references.

Publication Date

5-8-2026

Disciplines

Emergency Medicine

Comments

2026 Research Day Corewell Health West, Grand Rapids, MI, May 8, 2026. Abstract 2070

Accuracy of AI-Assisted Citation Generation in Emergency Medicine Publications