Ethnic and racial minorities are commonly underrepresented in clinical trials. The problem is severe enough that in April the U.S. Food and Drug Administration (FDA) expanded its existing guidance to further emphasize its recommendation that sponsors developing treatments increase clinical trial enrollment from underrepresented populations in the U.S., including African-American, Hispanic, Asian and other persons of color. In the updated guidance, the FDA detailed what sponsors should include in a Race and Ethnicity Diversity Plan when submitting an investigational new drug (IND) application or a new drug application (NDA).
Our industry has a long-standing commitment to expanding participation and diversity in trials. However, we know we can and must do better to advance fairness and access. At the annual Drug Information Association (DIA) meeting in June, I shared insights on how to leverage artificial intelligence (AI) and machine learning (ML) for clinical trial-site matching to simultaneously improve enrollment rate and diversity.
In taking a deeper look into how AI can help achieve the goal of increasing enrollment of underrepresented populations in trials, we first have to define the issues that need to be addressed. Discriminatory biases, whether unconscious or not, in established policies or practices, can impact data collection processes, how variables are defined and more. As such, applying learning algorithms and related automation processes to data insights influenced by these biases may perpetuate existing disparities and unfairness toward certain communities. By leveraging principles of fairness in ML, AI solutions can help address these challenges to improve trial participation and proportionate representation.
AI/ML: informing trial site selection
The pharma industry has now firmly established that ML can effectively rank a list of sites for a given study to optimize enrollment. Our newest approach, however, is based on the principles of reinforcement learning: the ML model learns to identify a set of trial sites that, together, yield a high expected patient enrollment for a given clinical trial and ensure that the enrolled cohort is diverse. Specifically, during training, the model learns to use trial protocol details (e.g., condition, inclusion/exclusion criteria), trial site features, previous performance, claims data and patient demographics at the trial sites (e.g., ethnicity, age) to produce a ranked list of potentially desirable trial sites that accounts for both performance and diversity. By helping to pinpoint sites that can engage diverse patient populations, it is possible to help improve trial awareness, access and participation.
To generate a list of potential target trial sites that reach diverse patient populations, we have to consider which datasets should feed the ML model. Datasets to consider are:
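To make the idea of scoring a set of sites jointly on enrollment and diversity concrete, here is a minimal sketch. It is not the reinforcement learning approach described above: it swaps in a simple greedy heuristic, and the site records, the group labels, the `diversity_weight` parameter and the entropy-based scoring rule are all assumptions made for illustration.

```python
import math
from typing import Dict, List

# Hypothetical site record: {"name": str, "group_counts": {group: expected enrollees}}.
Site = Dict[str, object]


def shannon_entropy(counts: Dict[str, float]) -> float:
    """Shannon entropy (natural log) of a dictionary of non-negative counts."""
    total = sum(counts.values())
    if total <= 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts.values() if c > 0)


def pooled_counts(sites: List[Site]) -> Dict[str, float]:
    """Sum expected enrollees per patient group across a set of sites."""
    pooled: Dict[str, float] = {}
    for site in sites:
        for group, n in site["group_counts"].items():
            pooled[group] = pooled.get(group, 0.0) + n
    return pooled


def set_score(sites: List[Site], diversity_weight: float) -> float:
    """Assumed scoring rule: total expected enrollment, boosted by the
    normalized entropy (0 to 1) of the pooled cohort."""
    pooled = pooled_counts(sites)
    enrollment = sum(pooled.values())
    n_groups = len(pooled)
    if n_groups < 2:
        normalized = 0.0
    else:
        normalized = shannon_entropy(pooled) / math.log(n_groups)
    return enrollment * (1.0 + diversity_weight * normalized)


def greedy_select(candidates: List[Site], k: int, diversity_weight: float) -> List[str]:
    """Greedily add the site that most improves the score of the selected set."""
    selected: List[Site] = []
    remaining = list(candidates)
    while remaining and len(selected) < k:
        best = max(remaining, key=lambda s: set_score(selected + [s], diversity_weight))
        selected.append(best)
        remaining.remove(best)
    return [s["name"] for s in selected]


# Illustrative, made-up numbers only.
example_sites = [
    {"name": "A", "group_counts": {"group_a": 28, "group_b": 1, "group_c": 1}},
    {"name": "B", "group_counts": {"group_a": 8, "group_b": 9, "group_c": 8}},
    {"name": "C", "group_counts": {"group_a": 26, "group_b": 1, "group_c": 1}},
]
print(greedy_select(example_sites, k=2, diversity_weight=0.0))
print(greedy_select(example_sites, k=2, diversity_weight=1.0))
```

With the weight set to zero, the sketch simply picks the highest-enrolling sites. Raising the weight favors sites that balance the pooled cohort, which is the trade-off a learned model would have to make as well.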
Clinical trial metadata
No two studies are identical, and each has its own nuances. As such, it is key that clinical trial metadata is pre-processed as part of the ML algorithm to ensure all predictions are tailored to the specific study. This dataset can include the trial type (observational, interventional or expanded access); the condition the trial seeks to address; the minimum and maximum ages of eligible patients; and the inclusion and exclusion criteria.
Claims data
Medical and pharmacy claims can also be collected and used. Medical claims can be available through practice management software vendors and switch clearinghouses. These data include patient-level diagnoses and the medical procedures and in-office treatments provided by office-based professionals and by ambulatory and general healthcare facilities. Pharmacy claims (from retail, mail order, long-term care and more) can include data points such as pharmacy identification, prescription fill date, patient out-of-pocket amount and much more.
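As a rough illustration of what pre-processing trial metadata might look like, the sketch below turns the fields listed above into a flat feature dictionary. The `TrialMetadata` class, the `preprocess` function and the feature names are hypothetical, and counting criteria is only a crude stand-in for the text encoding a real model would need.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Trial types named in the text.
TRIAL_TYPES = ("observational", "interventional", "expanded access")


@dataclass
class TrialMetadata:
    """Hypothetical container for the metadata fields described above."""
    trial_type: str
    condition: str
    min_age: Optional[int] = None
    max_age: Optional[int] = None
    inclusion_criteria: List[str] = field(default_factory=list)
    exclusion_criteria: List[str] = field(default_factory=list)


def preprocess(trial: TrialMetadata) -> Dict[str, object]:
    """Convert one trial's metadata into a flat, model-ready feature dictionary."""
    trial_type = trial.trial_type.strip().lower()
    if trial_type not in TRIAL_TYPES:
        raise ValueError(f"unknown trial type: {trial.trial_type!r}")

    # One-hot encode the trial type.
    features: Dict[str, object] = {
        f"type_{t.replace(' ', '_')}": int(t == trial_type) for t in TRIAL_TYPES
    }
    # Normalize the condition string so identical conditions compare equal.
    features["condition"] = trial.condition.strip().lower()
    # Missing age bounds get explicit indicator flags instead of silent defaults.
    features["has_min_age"] = int(trial.min_age is not None)
    features["min_age"] = trial.min_age if trial.min_age is not None else 0
    features["has_max_age"] = int(trial.max_age is not None)
    features["max_age"] = trial.max_age if trial.max_age is not None else -1
    # Crude proxy for protocol complexity: number of criteria of each kind.
    features["n_inclusion"] = len(trial.inclusion_criteria)
    features["n_exclusion"] = len(trial.exclusion_criteria)
    return features


example = TrialMetadata(
    trial_type=" Interventional ",
    condition="Type 2 Diabetes",
    min_age=18,
    inclusion_criteria=["diagnosed at least 6 months ago"],
    exclusion_criteria=["pregnancy", "renal impairment"],
)
print(preprocess(example))
```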
Patient group membership data
In a real-world setting, it is common to obtain a list of potential investigators who are relevant to the clinical trial under consideration. Patient group membership data can help identify investigators who have access to a diverse patient cohort, where diversity is defined in terms of the representation of different patient groups. In the U.S., one can use claims and electronic health record (EHR) data to link the patients in each physician’s panel to their recorded ethnicity and so obtain the ethnic composition of each investigator’s patients. Where those data are not available, the investigator’s ZIP code combined with census data can serve as an alternative.
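The composition calculation described here can be sketched in a few lines. The record layout `(investigator_id, patient_id, ethnicity)`, the function names and the use of Shannon entropy as a diversity measure are assumptions for illustration; real claims and EHR data would need linkage, de-identification and handling of missing values first.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical record layout: (investigator_id, patient_id, ethnicity).
Record = Tuple[str, str, str]


def ethnic_composition(records: Iterable[Record]) -> Dict[str, Dict[str, float]]:
    """Return, for each investigator, the share of their patients in each group."""
    seen = set()
    counts: Dict[str, Counter] = defaultdict(Counter)
    for investigator_id, patient_id, ethnicity in records:
        key = (investigator_id, patient_id)
        if key in seen:
            continue  # count each patient once per investigator, not once per claim
        seen.add(key)
        counts[investigator_id][ethnicity] += 1
    return {
        inv: {group: n / sum(c.values()) for group, n in c.items()}
        for inv, c in counts.items()
    }


def diversity_index(shares: Dict[str, float]) -> float:
    """Shannon entropy of group shares: 0 for a single group, higher when balanced."""
    return -sum(p * math.log(p) for p in shares.values() if p > 0)


# Illustrative, made-up records; the repeated row mimics several claims for one patient.
example_records = [
    ("inv1", "p1", "group_a"),
    ("inv1", "p1", "group_a"),
    ("inv1", "p2", "group_b"),
    ("inv2", "p3", "group_a"),
]
composition = ethnic_composition(example_records)
for inv, shares in composition.items():
    print(inv, shares, round(diversity_index(shares), 3))
```

Entropy is only one possible choice. Depending on the goal, comparing each investigator's shares against a target distribution, such as census figures for the disease population, may be more appropriate.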
Investigator performance data
Today, clinical research organizations (CROs) and other service partners can provide deep and broad data from thousands of clinical trials, which makes it possible to include investigator performance data. The goal here is to select investigators who rank highly on both enrollment performance and diversity.
To achieve fairness and to understand efficacy variables, it is important for trial sponsors, CROs and other stakeholders to combine focused efforts at the trial level that address access with a lifecycle approach that explores potential variability. Both require proactive planning, rich data analysis and a careful eye toward demographic and clinical factors throughout.
AI can outperform traditional methods of improving diversity in trials, and it is non-moral and non-judgmental: it has no preference for or prejudice against a particular group, individual or feature. While systematic discriminatory biases are still present, we continue to work toward our long-standing commitment to dramatically improve patient diversity in clinical trials and advance health equity.
About Lucas Glass
Lucas Glass is the Vice President of the Analytics Center of Excellence (ACOE) at IQVIA. The ACOE is a team of over 200 data scientists, engineers, and product managers who research, develop, and operationalize machine learning and data science solutions within the R&D space. Lucas has launched more than a dozen machine learning offerings within R&D, such as site recommender systems, trial matching solutions, enrollment rate algorithms, drug target interactions, drug repurposing, and molecular optimization. Lucas’ machine learning research, which is dedicated to R&D, has been published by AAAI, WWW, NIPS, ICML, JAMIA, KDD, and many others.