CORE-MD: A Path to Clearer AI Device Evidence Standards?
One of the most common questions our clients ask us is deceptively simple: "What level of evidence do we need for regulatory approval for our AI tool?"
The answer isn’t always clear. While the EU Medical Device Regulation (MDR) states that devices "shall achieve the performance intended by their manufacturer" and "be safe and effective and shall not compromise the clinical condition or the safety of patients", it offers little concrete guidance on what counts as sufficient clinical evidence to demonstrate this - especially for software and AI-based medical devices. Guidance documents such as MDCG 2020-1 provide high-level direction, but are neither specific nor particularly helpful for novel technologies.
This ambiguity creates a second challenge: divergent interpretations among Notified Bodies. Reviewers may hold different views on what qualifies as "sufficient" evidence, leading to misalignment between manufacturers and assessors and, often, unexpected delays or requests for additional data.
Against this backdrop, the recent proposal from the CORE-MD consortium, published in npj Digital Medicine, is a welcome development. The paper introduces the CORE-MD AI Risk Score, a structured framework to guide the level of clinical evidence required for regulatory approval of AI medical device software, tailored to the specific risk profile of the product.
Key Contributions
The CORE-MD AI Risk Score introduces a three-component framework for assessing AI-based medical devices, built on three universal steps of evidence generation for medical devices - scientific, technical, and clinical validity:
Valid Clinical Association Score (VCAS): Evaluates the strength and transparency of the relationship between input data and the clinical condition or physiological state the software is intended to address.
Valid Technical Performance Score (VTPS): Assesses the robustness of the technical validation process, including the breadth of external validation and independence of datasets.
Clinical Performance Score (CPS): Quantifies the clinical risk based on the seriousness of the health condition and the intended function of the software (informing, driving decisions, or diagnosing/treating). This language aligns with IMDRF terminology for classifying software as a medical device.
Each subscore ranges from 1 to 3, with the CPS combining two subcomponents for a possible score of 2 to 6, yielding a total risk score range of 4 to 12.
Table 1: Components of the CORE-MD AI risk score, reproduced from Rademakers et al., 2025.
The framework maps these scores to recommended levels of clinical evaluation pre- and post-market. For example, the authors recommend that high-risk tools (total scores ≥10) undergo extensive pre-market clinical investigations, while lower-risk tools could be conditionally approved on the basis of limited pre-market pilot safety data and robust post-market clinical follow-up.
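For those who prefer to see the arithmetic spelled out, here is a minimal sketch of how the score and its evidence mapping could be computed. The class and function names, the assumption that higher values mean higher risk, and the wording of the lower evidence band are illustrative only; the subscore ranges and the ≥10 threshold are taken from the summary above, and the authoritative definitions sit in Rademakers et al., 2025.

```python
# Minimal sketch of the CORE-MD AI Risk Score arithmetic (illustrative only).
# Names, scale orientation, and the lower evidence band are assumptions;
# see Rademakers et al., 2025 for the authoritative definitions.
from dataclasses import dataclass


@dataclass
class CoreMdAiRiskScore:
    vcas: int           # Valid Clinical Association Score, 1-3
    vtps: int           # Valid Technical Performance Score, 1-3
    cps_condition: int  # CPS subcomponent: seriousness of the health condition, 1-3
    cps_function: int   # CPS subcomponent: software function (inform / drive / diagnose-treat), 1-3

    def total(self) -> int:
        # VCAS (1-3) + VTPS (1-3) + CPS (2-6) gives a total of 4-12
        return self.vcas + self.vtps + self.cps_condition + self.cps_function


def recommended_evidence(total: int) -> str:
    # Only the >=10 threshold comes from the paper's example above;
    # the lower band here is a placeholder, not the published mapping.
    if not 4 <= total <= 12:
        raise ValueError("total CORE-MD AI risk score must be between 4 and 12")
    if total >= 10:
        return "extensive pre-market clinical investigation"
    return "limited pre-market pilot safety data plus robust post-market clinical follow-up"


# Example: a hypothetical decision-support tool for a serious condition
score = CoreMdAiRiskScore(vcas=2, vtps=2, cps_condition=2, cps_function=3)
print(score.total(), "->", recommended_evidence(score.total()))
```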
Regulatory Value Add
This proposal could provide some welcome benefits to the EU medical device regulatory framework:
Risk-Proportional Evidence Requirements: The framework aligns clinical evidence obligations with the potential risk posed by the device, allowing for a consistent interpretation of the regulations.
Defining Risk Appetite: It suggests how regulators may be able to increase their risk appetite for lower-risk devices through a greater emphasis on post-market, rather than pre-market, evidence.
Alignment with International Standards: The authors align the framework with IMDRF guidance and the National Institute of Standards and Technology (NIST) AI lifecycle models, laying the groundwork for regulatory convergence between the EU and the US.
Limitations and Areas for Development
Despite its promise, the framework also presents some limitations:
Subjectivity in Scoring: Determining whether a clinical association is "strong" or "moderate," or whether oversight is "easy" or "difficult," is still fundamentally subjective and could lead to inconsistent opinions on scoring.
Limited Coverage of Emerging Architectures: The framework does not currently account for foundation models or large language models (LLMs) that may have multiple functionalities and variable outputs. These types of tools are increasingly entering clinical workflows - whether they have the necessary regulatory approvals or not - and the evidence requirements for them are not clear.
Uncertain Regulatory Adoption: The score is not yet an official part of MDCG guidance and will require further validation and stakeholder alignment before formal adoption. Therefore, even though the framework is helpful to structure our thinking and dialogue, Notified Bodies aren’t obliged to follow its recommendations.
Conclusion
The CORE-MD AI Risk Score is a timely proposal that offers a proportionate, evidence-based framework for the regulatory evaluation of AI medical device software. If adopted, it could streamline the approval of lower-risk AI devices while ensuring that high-risk systems undergo appropriately rigorous evaluation. As AI continues to evolve, regulatory mechanisms like this will be essential to balance innovation with safety and efficacy in the digital health ecosystem.
If you’re building cutting-edge AI medical devices and need support in navigating the ever-changing evidence requirements, the Clinical team at Hardian are here to support you.
Hardian Health is a clinical digital consultancy focused on leveraging technology into healthcare markets through clinical evidence, market strategy, scientific validation, regulation, health economics and intellectual property.