[1] Title and abstract
Item 1a:
Identification as a randomised trial in the title
Item 1b:
Structured summary of trial design, methods, results, and conclusions
CONSORT-AI 1a,b (i) Elaboration:
Indicate that the intervention involves artificial intelligence / machine learning in the title and/or abstract and specify the type of model.
Indicating in the title and/or abstract that the intervention involves a form of AI is encouraged: it immediately identifies the intervention as an artificial intelligence/machine learning intervention and facilitates indexing and searching of the trial report. The title should be understandable to a wide audience; a broad umbrella term such as artificial intelligence or machine learning is therefore encouraged. More precise terms should be reserved for the abstract rather than the title, unless they are broadly recognised as denoting a form of artificial intelligence/machine learning. Specific terminology relating to the model type and architecture should be detailed in the abstract.
CONSORT-AI 1a,b (ii) Elaboration:
State the intended use of the AI intervention within the trial in the title and/or abstract.
The title and/or abstract should state the intended use of the AI intervention, including its purpose and the disease context. Some AI interventions have multiple intended uses, or the intended use may evolve over time; documenting it therefore allows readers to understand the intended use of the algorithm at the time of the trial.
[2] Introduction
Item 2a:
Scientific background and explanation of rationale
CONSORT-AI 2a (i) Extension:
Explain the intended use for the AI intervention in the context of the clinical pathway, including its purpose and its intended users (e.g. healthcare professionals, patients, public).
In order to understand how the AI intervention is intended to fit into a clinical pathway, a detailed description of its role should be included in the background of the trial report. AI interventions may be designed to interact with different users, including healthcare professionals, patients and the public, and their roles can be wide-ranging (for example, the same AI intervention could, in theory, replace, augment or adjudicate components of clinical decision-making). Clarifying the intended use of the AI intervention and its intended users helps readers understand the purpose for which the AI intervention was evaluated in the trial.
Item 2b:
Specific objectives or hypotheses
[3-12] Methods
3. Trial design
Item 3a:
Description of trial design (such as parallel, factorial) including allocation ratio
Item 3b:
Important changes to methods after trial commencement (such as eligibility criteria), with reasons
4. Participants
Item 4a:
Eligibility criteria for participants
CONSORT-AI 4a (i) Elaboration:
State the inclusion and exclusion criteria at the level of participants.
The inclusion and exclusion criteria should be defined at the participant level, as is usual practice in reports of non-AI interventional trials. This is distinct from eligibility criteria applied at the level of the input data, which are addressed in item 4a (ii).
CONSORT-AI 4a (ii) Extension:
State the inclusion and exclusion criteria at the level of the input data.
Input data refer to the data required by the AI intervention to serve its purpose (e.g. for a breast cancer diagnostic system, the input data could be the unprocessed or vendor-specific post-processed mammography scan on which a diagnosis is made; for an early warning system, the input data could be physiological measurements or laboratory results from the electronic health record). The trial report should state whether there were minimum requirements for the input data (such as image resolution, quality metrics or data format) that determined pre-randomisation eligibility, and should specify when, how and by whom this was assessed. For example, if a participant met the eligibility criteria of item 4a (i), such as being able to lie flat for a CT scan, but the scan quality was compromised (for any reason) to a level deemed unfit for use by the AI system, this should be reported as an exclusion criterion at the input data level. Note that where input data are acquired after randomisation, any exclusion is considered to be from the analysis, not from enrolment (see CONSORT item 13b and Figure 1).
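As a purely illustrative sketch, a pre-randomisation input-data eligibility check of the kind described above might be operationalised as follows; the `ScanMetadata` fields, thresholds and accepted formats are hypothetical assumptions for the example, not requirements of the guideline.

```python
from dataclasses import dataclass

# Hypothetical metadata for one imaging study; all fields and
# thresholds below are illustrative, not CONSORT-AI requirements.
@dataclass
class ScanMetadata:
    width_px: int
    height_px: int
    file_format: str      # e.g. "DICOM"
    quality_score: float  # 0.0-1.0, from an upstream QC step

MIN_WIDTH, MIN_HEIGHT = 1024, 1024  # assumed minimum image resolution
ACCEPTED_FORMATS = {"DICOM"}        # assumed accepted data formats
MIN_QUALITY = 0.7                   # assumed minimum quality metric

def input_data_eligible(scan: ScanMetadata) -> tuple[bool, list[str]]:
    """Return eligibility plus the reason for any failure, so that
    every exclusion at the input-data level can be logged and
    reported, as item 4a (ii) asks."""
    reasons = []
    if scan.width_px < MIN_WIDTH or scan.height_px < MIN_HEIGHT:
        reasons.append("resolution below pre-specified minimum")
    if scan.file_format not in ACCEPTED_FORMATS:
        reasons.append(f"unsupported data format: {scan.file_format}")
    if scan.quality_score < MIN_QUALITY:
        reasons.append("quality metric below pre-specified threshold")
    return (not reasons, reasons)
```

Recording the reason for each failed check, together with when and by whom the check was run, maps directly onto what this item asks trial reports to state.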
Item 4b:
Settings and locations where the data were collected
CONSORT-AI 4b Extension:
Describe how the AI intervention was integrated into the trial setting, including any onsite or offsite requirements.
The generalisability of AI algorithms is limited, not least when they are used outside their development environment. AI systems depend on their operational environment, and the report should provide details of the hardware and software requirements needed to technically integrate the AI intervention at each study site. For example, it should be stated whether the AI intervention required vendor-specific devices, whether specialised computing hardware was needed at each site, or whether the site had to support cloud integration, particularly if this was vendor-specific. If any changes to the algorithm were required at each study site as part of the implementation procedure (such as fine-tuning the algorithm on local data), this process should also be clearly described.
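To illustrate the kind of per-site detail this item calls for, the sketch below records hypothetical integration requirements for two sites; every field name and value is an assumption made for the example.

```python
from dataclasses import dataclass

# Illustrative record of technical integration requirements per site;
# all fields and values are assumptions for this sketch.
@dataclass
class SiteDeployment:
    site_id: str
    vendor_specific_device: str | None  # e.g. a required scanner model
    compute: str                        # e.g. "on-site GPU server"
    cloud_endpoint: str | None          # set if inference runs off-site
    locally_fine_tuned: bool            # was the algorithm adapted on local data?
    fine_tuning_notes: str = ""         # what changed, and why

sites = [
    SiteDeployment("site-A", "VendorX scanner", "on-site GPU server",
                   None, False),
    SiteDeployment("site-B", None, "thin client only",
                   "https://inference.example.org", True,
                   "decision threshold recalibrated on local data"),
]
for s in sites:
    print(s.site_id, "locally fine-tuned:", s.locally_fine_tuned)
```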
5. Interventions
Item 5:
The interventions for each group with sufficient details to allow replication, including how and when they were actually administered
CONSORT-AI 5 (i) Extension:
State which version of the AI algorithm was used.
Similar to other forms of software as a medical device, AI systems are likely to undergo multiple iterations and updates over their lifespan. It is therefore important to specify which version of the AI system was used in the clinical trial, whether this is the same version evaluated in the previous studies used to justify the study rationale, and whether the version changed during the conduct of the trial. If applicable, the report should describe what changed between the relevant versions and the rationale for the changes. Where available, the report should include a regulatory marking reference, such as a Unique Device Identifier (UDI), which requires a new identifier for updated versions of the device.
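A minimal sketch of how the deployed version might be pinned down for reporting, assuming a hypothetical `ModelVersion` record; checksumming the weights file is one way (not the guideline's prescription) to state unambiguously which artefact was evaluated.

```python
import hashlib
from dataclasses import dataclass

@dataclass
class ModelVersion:
    # Illustrative fields only; a UDI is issued under the relevant
    # regulatory framework, and updated versions need a new identifier.
    name: str
    version: str
    udi: str | None
    weights_sha256: str
    changes_from_prior: str

def sha256_of(path: str) -> str:
    """Checksum of the deployed weights file, pinning the exact
    artefact used in the trial."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest()
```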
CONSORT-AI 5 (ii) Extension:
Describe how the input data were acquired and selected for the AI intervention.
The measured performance of any AI system may depend critically on the nature and quality of the input data. A description of the handling of the input data, including acquisition, selection and pre-processing prior to analysis by the AI system, should be provided. The completeness and transparency of this description are integral to the replicability of the intervention beyond the clinical trial, in real-world settings. It also helps readers identify whether input data handling procedures were standardised across trial sites.
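For example, the pre-processing applied before the AI system sees the data could be pinned down as an explicit, ordered pipeline so that the same steps run at every site; the steps below are placeholders for illustration, not steps the guideline prescribes.

```python
# Placeholder pre-processing steps; real implementations would
# resample, normalise and crop the acquired data.
def resample(x):
    return x  # e.g. resample to a fixed resolution

def normalise(x):
    return x  # e.g. rescale intensities to [0, 1]

def crop_roi(x):
    return x  # e.g. crop to the region of interest

# Naming the steps and their order makes input handling reproducible
# and auditable across trial sites.
PIPELINE = [("resample", resample),
            ("normalise", normalise),
            ("crop_roi", crop_roi)]

def preprocess(raw_input):
    """Apply every step in the pre-specified order and return an
    audit trail of what was done."""
    applied = []
    x = raw_input
    for name, step in PIPELINE:
        x = step(x)
        applied.append(name)
    return x, applied
```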
CONSORT-AI 5 (iii) Extension:
Describe how poor quality or unavailable input data were assessed and handled.
As in item 4a (ii), input data refer to the data required by the AI intervention to serve its purpose. As discussed there, the performance of AI systems may be compromised by poor quality or missing input data (for example, excessive movement artefact on an electrocardiogram). The trial report should state the amount of missing data, as well as how missing data were identified and handled. The report should also specify whether there was a minimum standard required for the input data and, where this standard was not achieved, how this was handled (including the impact on, or any changes to, the participant care pathway).
Poor quality or unavailable data can also affect non-AI interventions; for example, the sub-optimal quality of a scan could impair a radiologist's ability to interpret it and make a diagnosis. It is therefore important that this information is reported equally for the control intervention, where relevant. If this minimum quality standard differed from the inclusion criteria for input data used to assess eligibility before randomisation, this should be stated.
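The sketch below shows how such a quality gate might be applied identically in both arms, with every disposition counted so the report can state how much input data was missing or rejected and what happened to those participants; the threshold and categories are assumptions.

```python
from collections import Counter
from enum import Enum

class Disposition(Enum):
    ANALYSED = "input met the minimum standard"
    FALLBACK = "below standard; routed to usual care pathway"
    MISSING = "input data unavailable"

MIN_QUALITY = 0.7  # assumed minimum standard for the input data

def quality_gate(quality_score: float | None) -> Disposition:
    """Apply the same pre-specified gate in both trial arms."""
    if quality_score is None:
        return Disposition.MISSING
    if quality_score < MIN_QUALITY:
        return Disposition.FALLBACK
    return Disposition.ANALYSED

# Counting dispositions yields the numbers the report should state.
scores = [0.9, None, 0.4, 0.8]
print(Counter(quality_gate(s) for s in scores))
```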
CONSORT-AI 5 (iv) Extension:
Specify whether there was human-AI interaction in the handling of the input data, and what level of expertise was required of users.
A description of the human-AI interface and the requirements for successful interaction when handling input data should be provided: for example, clinician-led selection of regions of interest from a histology slide that are then interpreted by an AI diagnostic system, or endoscopist selection of colonoscopy video clips as input data for an algorithm designed to detect polyps. Describing any user training provided, and the instructions for how users should handle the input data, supports the transparency and replicability of trial procedures. Poor clarity about the human-AI interface may lead to the absence of a standard approach and carries ethical implications, particularly in the event of harm: for example, it may become unclear whether an error occurred because a human deviated from the instructed procedure or because the AI system itself erred.
CONSORT-AI 5 (v) Extension:
Specify the output of the AI intervention.
The output of the AI intervention should be clearly specified in the trial report. For example, an AI system may output a diagnostic classification or probability, a recommended action, an alarm alerting to an event, or an instigated action in a closed-loop system (such as titration of a drug infusion). The nature of the AI intervention's output has direct implications for its usability and for how it may lead to downstream actions and outcomes.
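One way to make the output type explicit is as a typed record, as in the sketch below; the categories merely mirror the examples in the text and are not an exhaustive taxonomy.

```python
from dataclasses import dataclass
from enum import Enum

class OutputKind(Enum):
    # Categories mirroring the examples given in the text.
    DIAGNOSTIC_CLASS = "diagnostic classification"
    PROBABILITY = "probability"
    ALARM = "alarm alerting to an event"
    CLOSED_LOOP_ACTION = "instigated action in a closed-loop system"

@dataclass
class AIOutput:
    kind: OutputKind
    value: object  # e.g. "malignant", 0.92, or an action identifier

# e.g. a system that outputs a probability of malignancy:
output = AIOutput(OutputKind.PROBABILITY, 0.92)
```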
CONSORT-AI 5 (vi) Extension:
Explain how the AI intervention’s outputs contributed to decision-making or other elements of clinical practice.
Since health outcomes may also critically depend on how humans interact with the AI intervention, the report should explain how the outputs of the AI system were used to contribute to decision-making or other elements of clinical practice, including an adequate description of the downstream interventions that can affect outcomes. As with CONSORT-AI 5 (iv), any human-AI interaction with the outputs should be described in detail, including the level of expertise required to understand them and any training or instructions provided for this purpose. For example, for a skin cancer detection system that outputs a percentage likelihood, the report should explain how this output was interpreted and acted upon by the user, specifying both the intended pathways (e.g. skin lesion excision if the diagnosis is positive) and the thresholds for entry to these pathways (e.g. excision if the diagnosis is positive and the probability is >80%). The information produced by comparator interventions should be described in the same way, alongside an explanation of how that information was used to arrive at clinical decisions on patient management, where relevant. Any discrepancy between how decision-making occurred and how it was intended to occur (i.e. as specified in the trial protocol) should be reported.
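Taking the skin-lesion example above, the mapping from AI output to downstream pathway could be pre-specified as explicitly as the following sketch; the 0.80 threshold comes from the text's illustrative ">80%" rule, while the intermediate pathway for lower-probability positives is an added assumption.

```python
EXCISION_THRESHOLD = 0.80  # the text's illustrative ">80%" rule

def downstream_pathway(diagnosis_positive: bool, probability: float) -> str:
    """Map the AI output to a pre-specified clinical pathway.
    Actions below are illustrative, not recommendations."""
    if diagnosis_positive and probability > EXCISION_THRESHOLD:
        return "skin lesion excision"
    if diagnosis_positive:
        return "specialist review"  # assumed pathway for lower-probability positives
    return "routine follow-up"

print(downstream_pathway(True, 0.92))  # -> skin lesion excision
```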
6. Outcomes
Item 6a:
Completely defined pre-specified primary and secondary outcome measures, including how and when they were assessed
Item 6b:
Any changes to trial outcomes after the trial commenced, with reasons
7. Sample size
Item 7a:
How sample size was determined
Item 7b:
When applicable, explanation of any interim analyses and stopping guidelines
8. Randomisation: sequence generation
Item 8a:
Method used to generate the random allocation sequence
Item 8b:
Type of randomisation; details of any restriction (such as blocking and block size)
9. Allocation concealment mechanism
Item 9:
Mechanism used to implement the random allocation sequence (such as sequentially numbered containers), describing any steps taken to conceal the sequence until interventions were assigned
10. Implementation
Item 10:
Who generated the random allocation sequence, who enrolled participants, and who assigned participants to interventions
11. Blinding
Item 11a:
If done, who was blinded after assignment to interventions (for example, participants, care providers, those assessing outcomes) and how
Item 11b:
If relevant, description of the similarity of interventions
12. Statistical methods
Item 12a:
Statistical methods used to compare groups for primary and secondary outcomes
Item 12b:
Methods for additional analyses, such as subgroup analyses and adjusted analyses
[13-19] Results
13. Participant flow
Item 13a:
For each group, the numbers of participants who were randomly assigned, received intended treatment, and were analysed for the primary outcome
Item 13b:
For each group, losses and exclusions after randomisation, together with reasons
14. Recruitment
Item 14a:
Dates defining the periods of recruitment and follow-up
Item 14b:
Why the trial ended or was stopped
15. Baseline data
Item 15:
A table showing baseline demographic and clinical characteristics for each group
16. Numbers analysed
Item 16:
For each group, number of participants (denominator) included in each analysis and whether the analysis was by original assigned groups
17. Outcomes and estimation
Item 17a:
For each primary and secondary outcome, results for each group, and the estimated effect size and its precision (such as 95% confidence interval)
Item 17b:
For binary outcomes, presentation of both absolute and relative effect sizes is recommended
18. Ancillary analyses
Item 18:
Results of any other analyses performed, including subgroup analyses and adjusted analyses, distinguishing pre-specified from exploratory
19. Harms
Item 19:
All important harms or unintended effects in each group
CONSORT-AI 19 Extension:
Describe results of any analysis of performance errors and how errors were identified, where applicable. If no such analysis was planned or done, explain why not.
Reporting performance errors and failure-case analyses is especially important for AI interventions. AI systems can make errors that are hard to foresee but which, if the system were deployed at scale, could have catastrophic consequences. Reporting error cases and defining risk mitigation strategies are therefore important for informing when, and for which populations, the intervention can be safely implemented. The results of any performance error analysis should be reported and their implications discussed.
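As a sketch of what a simple failure-case analysis might tabulate, the example below counts error cases within pre-specified subgroups; the subgroups and data are entirely hypothetical.

```python
from collections import Counter

# Hypothetical (subgroup, prediction_correct) records from the trial.
cases = [
    ("age < 40", True), ("age < 40", True), ("age < 40", False),
    ("age >= 40", True), ("age >= 40", False), ("age >= 40", False),
]

errors = Counter(group for group, correct in cases if not correct)
totals = Counter(group for group, _ in cases)

# Error rates by subgroup inform when, and for which populations,
# the intervention can be safely implemented.
for group in sorted(totals):
    rate = errors[group] / totals[group]
    print(f"{group}: {errors[group]}/{totals[group]} errors ({rate:.0%})")
```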
[20-22] Discussion
20. Limitations
Item 20:
Trial limitations, addressing sources of potential bias, imprecision, and, if relevant, multiplicity of analyses
21. Generalisability
Item 21:
Generalisability (external validity, applicability) of the trial findings
22. Interpretation
Item 22:
Interpretation consistent with results, balancing benefits and harms, and considering other relevant evidence
[23-25] Other information
23. Registration
Item 23:
Registration number and name of trial registry
24. Protocol
Item 24:
Where the full trial protocol can be accessed, if available
25. Funding
Item 25:
Sources of funding and other support (such as supply of drugs), role of funders
CONSORT-AI 25 Extension:
State whether and how the AI intervention and/or its code can be accessed, including any restrictions to access or re-use.
The trial report should make clear whether and how the AI intervention and/or its code can be accessed or re-used. This should include details of the licence and any restrictions on access or re-use.
Citation
When referring to the CONSORT-AI guidelines, please cite one of the following articles:
Nature Medicine
The Lancet Digital Health
BMJ