MHRA/FDA Principles of Good Machine Learning Practice
The U.S. Food and Drug Administration (FDA), Health Canada, and the United Kingdom’s Medicines and Healthcare products Regulatory Agency (MHRA) have jointly identified 10 guiding principles that can inform the development of Good Machine Learning Practice (GMLP) for Artificial Intelligence as a Medical Device (AIaMD).
These principles were published by the MHRA on its website on 27th October 2021. This blog attempts to answer three questions for medical device manufacturers, particularly those in the early stages of realising AIaMD products:
What do the guiding principles cover?
So what does that mean to us for our AIaMD software?
Now what should we do, to adhere to these principles?
What do the guiding principles cover, and so what does that mean to us for our AIaMD software?
1. Total Product Lifecycle (TPLC)
The first bit of good news is that this guidance aligns very nicely with existing regulatory frameworks and the concept of TPLC. Namely, developers should start thinking about how the product integrates into the clinical workflow even before starting to develop software. The first piece of documentation for any medical device, as we tell all of our clients, should be the Intended Use Statement, which includes the following:
Intended Medical Indication and Intended Patient Population
Who are the patients - what demographics do you wish to cover? This is important for defining the clinical evidence required to show comprehensive, unbiased performance across that patient population. Remember that patient demographics may differ between regions (e.g. the racial mix in the USA vs the UK vs other parts of the world) and you should think about how to factor that into your clinical evidence generation.
Intended User Groups, Operating Environment and Operating Principle
Similarly to thinking about patient demographics, think about the user demographics:
Primary users - those who input data into your AIaMD - what skill and experience do they need in order to input correct data?
Secondary users - those who consume the results from your AIaMD - are they
Specialists, who will have reports presented to them in specialised terms
General practitioners, who may need more signposting to what results mean
Lay users such as patients or their caregivers, who will need a whole different level of reporting to that offered to health care professionals (HCPs)
Tertiary users - those who have access to data but not for clinical purposes - for example
What will your Post Market Surveillance (PMS) people need to be able to monitor AIaMD performance in the field - what data will they need access to in order to provide meaningful metrics?
What will your IT Support people not be allowed access to, to preserve the confidentiality of patient medical information?
Think about the differences between clinical workflows in the various parts of the world you wish to operate in. Map out the existing workflows with and without your AIaMD included, and see how your offering could fit in (what I colloquially call the driving on the left vs driving on the right problem). Also make sure that the metrics used for clinical reporting are available for all regions (what I colloquially call the miles per hour versus kilometres per hour problem).
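To make the miles per hour versus kilometres per hour point concrete, here is a minimal sketch of presenting the same clinical value in the units each region expects. The choice of blood glucose, the region-to-unit mapping and the function names are purely illustrative assumptions, not part of the guidance.

```python
# Illustrative sketch: report the same measured value in region-appropriate units.
# Blood glucose is used as an example; the region-to-unit mapping is hypothetical.
# 1 mmol/L of glucose is approximately 18 mg/dL (a widely used approximation).

GLUCOSE_MGDL_PER_MMOLL = 18.0

REGION_GLUCOSE_UNIT = {
    "US": "mg/dL",   # conventional units in the USA
    "UK": "mmol/L",  # conventional units in the UK
}

def format_glucose(value_mmol_l: float, region: str) -> str:
    """Format a blood glucose value (stored internally in mmol/L) for a given region."""
    unit = REGION_GLUCOSE_UNIT.get(region, "mmol/L")  # default to mmol/L if region unknown
    if unit == "mg/dL":
        return f"{value_mmol_l * GLUCOSE_MGDL_PER_MMOLL:.0f} mg/dL"
    return f"{value_mmol_l:.1f} mmol/L"

print(format_glucose(5.5, "UK"))  # 5.5 mmol/L
print(format_glucose(5.5, "US"))  # 99 mg/dL
```

The same thinking applies to any reported metric: decide up front which internal representation you store, and treat regional presentation as a documented, testable requirement rather than an afterthought.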
Foreseeable Misuse
Think about how the system could be misused - maybe not maliciously, but through lack of understanding by the users:
Clinical misuse
Is your system for detection or for diagnosis - are you pointing to suspicious areas such as lesions for a clinician to interpret and diagnose, or is your system intended to automate the diagnosis?
What accuracy do you intend to offer, and what will be the consequence of Type I vs Type II errors (see the sketch after this list)?
Cybersecurity misuse
What could the data you accumulate in your system be used for that may be contrary to what the patients consented for, and how do you protect against such misuse?
How could data for individual or groups of patients leak from your system, and how do you protect against such leaks?
What would be the consequence on your business and on patient confidentiality of a hacker taking control of your system and the patient data it contains?
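To make the Type I versus Type II question above concrete, here is a minimal sketch relating the two error types to sensitivity and specificity. The counts are invented purely for illustration; they are not real clinical data.

```python
# Hypothetical confusion matrix for an AIaMD that flags suspicious findings.
# All counts below are made up purely for illustration.

true_positives  = 90   # disease present, flagged by the AIaMD
false_negatives = 10   # disease present, missed (Type II error)
true_negatives  = 850  # disease absent, correctly not flagged
false_positives = 50   # disease absent, flagged anyway (Type I error)

sensitivity = true_positives / (true_positives + false_negatives)   # 1 - Type II error rate
specificity = true_negatives / (true_negatives + false_positives)   # 1 - Type I error rate

print(f"Sensitivity: {sensitivity:.1%}  (missed cases carry the Type II cost)")
print(f"Specificity: {specificity:.1%}  (false alarms carry the Type I cost)")
```

Which of the two costs matters more depends on the intended use: a detection aid that refers suspicious areas to a clinician may tolerate more false positives than a system intended to automate the diagnosis.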
2. Good Software Engineering and Security Practices
Secondly, these principles also align with existing medical device regulations. Manufacturers should be asking whether their dev team understands what it takes to develop software for medical devices, starting with documenting in a way that is trainable (for your team) and auditable (by the authorities) across the whole Software Development Lifecycle (SDLC):
A typical SDLC is split into the development process:
Starting with requirements and risk analysis, covering patient safety as well as cybersecurity
Architectural design that assures safety and security
Detailed design that describes the way in which the algorithms work, including logging of actions and explainability of results
Coding, unit testing, integrating
Formal software verification against your technical requirements and the accompanying documentation (instructions for use) that your users rely on to use the product safely and securely
Product validation to show that your users can indeed use the AIaMD to achieve safe, secure, outcomes
Controlled software release - after the first version you upload to your application servers, do subsequent versions maintain the safety and security that you worked hard to achieve in the first place?
and the supporting processes:
Risk management - how are patient safety and cybersecurity risks identified and managed throughout development and operation of your AIaMD?
Configuration management - how are software configurations, including configurations of any curated content that you refer your users to, managed to preserve their currency and correctness?
Problem resolution - how do you deal with reported problems, in a controlled fashion?
Maintenance and change control - how do you bring all the above together for the long term?
Anyone who has brought SaMD to market following these principles should have no trouble following the guidance to incorporate these well-established processes into their AIaMD development workflows.
3. Clinical Study Participants and Data Sets should be representative
This all goes back to the intended use, as stated above. Once you know which medical indications, patient populations and user groups you intend to cover, and you know the performance you wish to achieve (clearly identified through a systematic literature review - do you want to be superior to existing practice, or at least non-inferior?), you can work out what representative data you need from a basis of sound data science.
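As a flavour of what that sound data science might look like, here is a back-of-the-envelope sketch using a textbook normal-approximation sample size formula. The expected sensitivity, non-inferiority margin, alpha and power values are illustrative assumptions only; a real clinical investigation plan needs a statistician and a design matched to the study.

```python
# Back-of-the-envelope sample size for demonstrating a sensitivity claim
# (one-sided test, normal approximation). All numbers are illustrative assumptions.

from math import ceil
from statistics import NormalDist

def positive_cases_needed(expected_sensitivity: float, margin: float,
                          alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate number of positive cases needed to show that sensitivity
    exceeds (expected_sensitivity - margin)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)   # critical value for the one-sided test
    z_beta = NormalDist().inv_cdf(power)        # quantile giving the desired power
    p = expected_sensitivity
    n = (z_alpha + z_beta) ** 2 * p * (1 - p) / margin ** 2
    return ceil(n)

# e.g. we expect 90% sensitivity and want to rule out anything below 85%
print(positive_cases_needed(expected_sensitivity=0.90, margin=0.05))  # roughly 220-230 positive cases
```

The point is not the specific formula but that the dataset size, and how representative it must be, follows from the intended use and the claim you want to make, rather than from whatever data happens to be available.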
4. Training Data Sets are Independent of Test Data Sets
Following on from 1, 2 and 3 above, what does your SDLC say about how much data is needed and how you can partition it? Splitting test and training data is a no-brainer, but it’s important to note that this has been included as one of the 10 guiding principles. Think about how you can prove to an external auditor that this has been done.
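One way to make the split demonstrable to an external auditor is to partition at patient level, so that no patient contributes data to both sets, and to keep a versioned record of the assignment. The sketch below is a minimal illustration; the identifiers, the 80/20 ratio and the output file name are assumptions.

```python
# Minimal sketch: assign each patient to exactly one partition, reproducibly,
# and record the assignment in a file that can be version-controlled and audited.

import csv
import random

def split_by_patient(patient_ids, test_fraction=0.2, seed=42):
    """Deterministically assign each patient ID to 'train' or 'test'."""
    unique_ids = sorted(set(patient_ids))
    rng = random.Random(seed)                     # fixed seed makes the split reproducible
    rng.shuffle(unique_ids)
    n_test = round(len(unique_ids) * test_fraction)
    test_ids = set(unique_ids[:n_test])
    return {pid: ("test" if pid in test_ids else "train") for pid in unique_ids}

patient_ids = [f"P{i:04d}" for i in range(1, 501)]  # hypothetical patient identifiers
assignment = split_by_patient(patient_ids)

with open("dataset_partition.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["patient_id", "partition"])
    for pid, partition in sorted(assignment.items()):
        writer.writerow([pid, partition])
```

Keeping the split under configuration management, alongside the code and the data provenance records, gives you something concrete to show when asked how independence was assured.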
5. Selected Reference Datasets Are Based Upon Best Available Methods
The SDLC needs to justify which datasets will be used and how - not just that a dataset is available, but why that particular dataset supports the intended use of the AIaMD.
6. Model Design is Tailored to the Available Data and Reflects the Intended Use of the Device
There it is again, the phrase Intended Use!
7. Focus is Placed on the Performance of the Human-AI Team
Where the model has a “human in the loop”, formative usability evaluation will almost always need to occur; summative usability evaluation may need to occur based on the risk analysis performed early on in the SDLC. Again this process is no different to non-AIaMD devices, with the caveat that you should be looking for possible anchoring biases from AI outputs, to ensure human end users are not being swayed simply because the AI is perceived to be performant.
8. Testing Demonstrates Device Performance during Clinically Relevant Conditions
The principle of Analytical Validity to verify software performance against a set of technical requirements, followed by Clinical Validity to validate software performance in real-world conditions, is a whole blog topic in itself…
9. Users are Provided Clear, Essential Information
What is often an afterthought is the accompanying documentation (formal instructions for use, training materials, even what the software itself provides such as help screens and tip pop-ups) - I always suggest that these are thought through in requirements and risk analysis up front, and are subject to usability evaluation so that representative users get to try out the documentation, not just the software.
10. Deployed Models are Monitored for Performance and Retraining
While the concepts of Post Market Surveillance (PMS) and Post Market Clinical Follow-up (PMCF) are relatively new to the EU MDR, the principles are tried-and-tested. They consist of both proactive and reactive elements:
Proactive measurement of performance, which may well entail planning to follow up individual cases with clinicians and patients to match outcomes to product claims
Reactive response to problems raised with your own AIaMD as well as problems reported with competing or complementary technology - corrective action against your own problems and preventive actions to avoid problems others are having becoming problems for you too
Make sure you have identified how you are going to monitor your AIaMD’s performance in a live clinical setting on an ongoing basis. Will it be regular audits, error reporting or constant human feedback - or maybe a mix of the three?
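As one illustration of what proactive monitoring could look like in software, the sketch below tracks sensitivity over a rolling window of confirmed cases and raises an alert when it falls below a threshold. The window size, threshold and simulated feedback are assumptions, not recommendations.

```python
# Minimal sketch: rolling-window performance monitoring for a deployed model.
# Window size, alert threshold and the simulated outcomes are illustrative assumptions.

import random
from collections import deque

class RollingSensitivityMonitor:
    def __init__(self, window: int = 200, alert_threshold: float = 0.85):
        self.outcomes = deque(maxlen=window)   # 1 = detected, 0 = missed (confirmed positives only)
        self.alert_threshold = alert_threshold

    def record_confirmed_positive(self, detected: bool) -> None:
        """Record whether the AIaMD flagged a case later confirmed positive at follow-up."""
        self.outcomes.append(1 if detected else 0)

    def check(self) -> str:
        if len(self.outcomes) < self.outcomes.maxlen:
            return "Not enough follow-up data yet"
        sensitivity = sum(self.outcomes) / len(self.outcomes)
        if sensitivity < self.alert_threshold:
            return f"ALERT: rolling sensitivity {sensitivity:.1%} is below threshold"
        return f"OK: rolling sensitivity {sensitivity:.1%}"

monitor = RollingSensitivityMonitor()
random.seed(0)
for _ in range(250):                           # simulated post-market follow-up feedback
    monitor.record_confirmed_positive(random.random() < 0.9)
print(monitor.check())
```

Whatever mechanism you choose, the alert threshold and the action taken when it is breached belong in your PMS plan, not just in the code.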
Now what should we do, to adhere to these principles?
Fundamentally - as any existing Hardian Health client I have ever talked to has heard me say - plan your Clinical Evaluation Lifecycle (CELC) in parallel with your Software Development Lifecycle (SDLC), as in this diagram:
Hardian's regulatory workflow
How does Hardian help manufacturers comply with this guidance?
Typically we work with clients on the following concepts to show that they are adhering to the 10 point guidance on developing AIaMD:
We would show how to incorporate recognised standards - such as, for cybersecurity, the ISO international standards, the US-based NEMA and AAMI cybersecurity standards, and the EU MDCG guidance - into your requirements and risk management.
We would integrate sound data science into your clinical investigation planning, as part of satisfying the requirements that, for example, the MHRA has for good clinical practice in clinical investigations of medical devices (ISO 14155).
We would make sure that PMS and PMCF are planned for, in terms of satisfying requirements in your software and in your organisation for the collection and use of appropriate metrics; also that contracts with your customers include the appropriate responsibilities for data sharing and data governance in general.
All of the above are there to lead to the goal we surely all have: to provide safe, secure, medical device products that will benefit individual patients and society in general.
Hardian Health is a clinical digital consultancy focussed on leveraging technology into healthcare markets through clinical strategy, scientific validation, regulation, health economics and intellectual property.