Joseph P. Cook, Viatris; Aaron Galaznik, Acorn AI / Medidata Solutions; Joseph S. Imperato; Max Ma, Johnson & Johnson; Jun Su, Astellas; May Yamada-Lifton, SAS; and Kelly H. Zou, Viatris
Real-world data (RWD) can come from a variety of sources, including claims, electronic health records, biobanks, genomics tests, and imaging modalities. Increasingly, it is coming from digital data through electronic health and mobile health modalities. Recent events have pushed innovation in the use of RWD to maximize the value of real-world evidence (RWE) in an era of big data, data science, and artificial intelligence (AI).
In the wake of the COVID-19 pandemic, digital innovation, patient preference assessments, and electronic patient-reported outcomes are likely to see increased use. RWE is also increasingly used to identify patients for randomized clinical trials (RCTs), optimize RCT design, and strengthen evidence packages to accelerate the approval and reimbursement processes.
A panel was held during the 2020 International Chinese Statistical Association’s Applied Statistics Symposium with the goal of providing insights into the current trends and future outlook for RWE. What follows is a summary of the key topics discussed.
Technology: Health Innovations
With respect to RWD, people most commonly think of insurance claims or electronic medical records (EMRs). Often used for health services research, these are administrative byproducts of care delivery repurposed for research. In addition to commonly used RWD such as pharmacy claims and EMRs, other data sources can be leveraged, including chart reviews, registries, patient-reported outcomes (PROs), biobanks, genomics tests, and imaging modalities. Specific database needs are driven by the study questions investigators aim to address, with each type of data providing unique value.
The study design and statistical methods considered should be those that best address the study objectives. One should consider whether the approach should be descriptive, causal-comparative, quasi-experimental, or experimental, and whether the appropriate cohort calls for a prospective or retrospective observational study. Further, one should ask whether a case-control, cross-sectional, or case report/series design should be used. What is the optimal statistical approach? Is there a role for meta-analytic or predictive modeling methods?
Linkage of electronic health record (EHR) data with claims databases, registries, PROs, and surveys is seen more frequently to address specific research questions. Increasingly, it is possible to link these sources to increase the richness of what they can provide. We are beginning to figure out how to use genomics, wearables, consumer data, and even social media as new data sources.
RWE is being widely used to gain an understanding of patient populations and subpopulations, the patient journey, when and how treatments are used, and any resulting gaps in care. Using this knowledge, RWE can be leveraged across the lifecycle of drug development, including planning and early development, as well as business and commercial activities, including market access, health technology assessments (HTAs), contracting, or tenders.
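As a rough illustration of what linking an EHR extract to a claims extract can involve, here is a minimal sketch of deterministic linkage on a tokenized identifier. It assumes both sources carry a shared `patient_id` field, which is often not true in practice. The field names, the salt, and the toy rows are all hypothetical, and this is not the method used by any panelist or vendor.

```python
import hashlib


def link_key(patient_id: str, salt: str) -> str:
    """Return a salted SHA-256 token so raw identifiers are never compared directly."""
    normalized = patient_id.strip().lower()
    return hashlib.sha256((salt + normalized).encode("utf-8")).hexdigest()


def link_records(ehr_rows, claims_rows, salt):
    """Deterministically link EHR rows to claims rows that share a tokenized ID."""
    # Index claims by token so each EHR row can find its matches in one lookup.
    claims_by_key = {}
    for row in claims_rows:
        claims_by_key.setdefault(link_key(row["patient_id"], salt), []).append(row)

    linked = []
    for ehr in ehr_rows:
        key = link_key(ehr["patient_id"], salt)
        for claim in claims_by_key.get(key, []):
            linked.append(
                {
                    "token": key,
                    "diagnosis": ehr["diagnosis"],
                    "paid_amount": claim["paid_amount"],
                }
            )
    return linked


# Hypothetical toy data: two EHR patients, three claims (one with a messy ID).
ehr_rows = [
    {"patient_id": "A001", "diagnosis": "type 2 diabetes"},
    {"patient_id": "A002", "diagnosis": "hypertension"},
]
claims_rows = [
    {"patient_id": "a001 ", "paid_amount": 120.0},
    {"patient_id": "A001", "paid_amount": 80.0},
    {"patient_id": "A003", "paid_amount": 45.0},
]

linked = link_records(ehr_rows, claims_rows, salt="demo-salt")
print(len(linked))  # two claims link to patient A001; A002 and A003 stay unlinked
```

Real linkage projects often lack a clean shared identifier and fall back on probabilistic matching over names, dates of birth, and similar fields. The sketch only shows the simplest deterministic case.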
RWD can serve a crucial role in helping payors understand the financial impact of new treatments for their specific cost structures and in their populations. It can also help bridge the efficacy-effectiveness divide when seeking to understand how new treatments’ clinical trial results generalize to their populations. HTAs routinely seek to estimate the incremental economic impact on a health care system or insurer using models employing RCT data. Direct measurement of the achieved cost burden and experienced cost-effectiveness for a given health care system, however, requires data from a real-world environment.
Anyone who has worked with RWD knows much of the work is in preparing the data and deriving meaningful variables. Companies have found success in employing AI to enhance data anomaly detection, standardization, and quality checking at this pre-processing stage. Rigor and transparency about how data is then transformed and how machine learning (ML) is applied will help increase trust and understanding of where and how to employ ML effectively. Improved data linkage and interoperability will be needed to provide the real-time feedback loops in RWD necessary to unleash the potential of AI for clinical decision-making.
The key to data access and linkage is interoperability. No single health system has all the data, and there is an increasing tendency to use federated analysis to deal with privacy issues. The difficulty is that not all analytics are adapted or suited to federated analysis. We need a shared system based on trust. For example, what SAS learned while supporting the response to the opioid crisis in Massachusetts was that laws are sometimes required: Chapter 55 was put into law to interconnect multiple databases, and that helped inform state policy and a program to manage overdose-based mortality.
Technology is already entering the health care industry robustly. Integrating information from wearables and diagnostic devices has the potential to significantly increase the reliability and comprehensiveness of electronic health records. For example, Apple has its health app and the Apple Watch, and Google is adding Fitbit to its holdings. Moreover, we see new ways of doing business with many stakeholders looking for ways to partner that will help encourage greater value for patients. There are many startups and other organizations working on this, but having groups collaborate on data sharing, quality, and transparency can help bring structure and order to these innovations.
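One common way to summarize incremental economic impact is the incremental cost-effectiveness ratio (ICER): the difference in cost divided by the difference in effect between a new and a comparator treatment. The panel summary does not name a specific metric, so using the ICER here is an assumption. The sketch below contrasts a model-based estimate with one computed from observed per-patient data, and all numbers are invented for illustration.

```python
def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def icer(cost_new, cost_old, effect_new, effect_old):
    """Incremental cost-effectiveness ratio: extra cost per extra unit of effect."""
    delta_effect = effect_new - effect_old
    if delta_effect == 0:
        raise ValueError("ICER is undefined when the treatments have equal effect")
    return (cost_new - cost_old) / delta_effect


# Hypothetical model-based inputs (for example, projected from RCT results).
model_icer = icer(cost_new=12000.0, cost_old=8000.0, effect_new=6.5, effect_old=6.0)

# Hypothetical per-patient costs and QALYs observed in one health system's RWD.
rw_costs_new = [13000.0, 12500.0, 13500.0]
rw_costs_old = [8000.0, 8500.0, 7500.0]
rw_qalys_new = [6.25, 6.5, 6.0]
rw_qalys_old = [6.0, 6.0, 6.0]

real_world_icer = icer(
    mean(rw_costs_new), mean(rw_costs_old), mean(rw_qalys_new), mean(rw_qalys_old)
)

print(f"Model-based ICER: {model_icer:,.0f} per QALY")
print(f"Real-world ICER:  {real_world_icer:,.0f} per QALY")
```

In this toy example the real-world ratio is higher than the modeled one, which is the kind of gap between projected and experienced cost-effectiveness that RWD can expose. A real analysis would also need to adjust for confounding, since the observed treatment groups are not randomized.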
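To make the idea of federated analysis concrete, here is a minimal sketch in which each site shares only aggregate counts and sums, and a coordinator pools them into an overall mean and sample variance without ever seeing patient-level rows. The site names, values, and function names are hypothetical. Real federated platforms add governance, disclosure controls, and secure transport that are not shown.

```python
def local_summary(values):
    """Run inside each site; only these three aggregates leave the site."""
    return {
        "n": len(values),
        "sum": sum(values),
        "sum_sq": sum(v * v for v in values),
    }


def pooled_mean_and_variance(summaries):
    """Combine site-level aggregates into an overall mean and sample variance."""
    n = sum(s["n"] for s in summaries)
    total = sum(s["sum"] for s in summaries)
    total_sq = sum(s["sum_sq"] for s in summaries)
    if n < 2:
        raise ValueError("need at least two observations across all sites")
    mean = total / n
    # Sum-of-squares identity; simple, but can lose precision on large values.
    variance = (total_sq - n * mean * mean) / (n - 1)
    return mean, variance


# Hypothetical lab values held separately by two health systems.
site_a_values = [5.0, 7.0, 9.0]
site_b_values = [6.0, 8.0]

summaries = [local_summary(site_a_values), local_summary(site_b_values)]
pooled_mean, pooled_variance = pooled_mean_and_variance(summaries)
print(pooled_mean, pooled_variance)
```

Means and variances federate easily because they decompose into per-site sums. Statistics that do not decompose this way, such as an exact median or an analysis that matches patients across sites, are harder to federate, which is one reason not every analytic is suited to the approach.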
Applications: RCTs and RWE
RWE and RCT data complement each other. RCT data is generated within a controlled experiment to tease out the incremental benefit of a therapy in a defined setting. RWD is necessary to understand what happens when therapies are deployed in real life. It permits understanding effectiveness in a broader range of patient types, in a larger group than is achievable in an RCT setting. Thus, the external validity of RWD complements the internal validity of RCTs. We think by using the two side-by-side, one can better bridge the clinical-trial-to-real-world divide, as well as contextualize the representativeness of trial data populations in a broader real-world context.
While RWD can be incredibly rich and varied, it can also be messy, and it is challenging to tease out a signal from background noise. Mining this data, whether with old-fashioned data mining techniques or with AI tools, can be quite challenging. AI is just one tool for data mining. The latest deep reinforcement learning and graph neural network developments show great potential to mine the data in good depth. Change occurs rapidly in technology. Devices of many types are also being developed to help patients better track their health, and their data, if shared, can provide additional richness.
For now, RWE complements RCTs. The holy grail is to not just extend medication indications and labels, but to also get an approval for new medications using RWE sources, as it has the potential to be cheaper and to better account for real-world practice than conducting an RCT. We can have a better understanding of efficacy and effectiveness of medicines in patients with this approach.
It would be ideal to create a feedback loop where “patients like this get treated this way/that way” for clinical decision support that optimizes the outcomes. For this, we would need to define clear goals such as quality of life or cost-effectiveness.
Synthetic control arms and RCTs with real-world data sources are also intriguing. SAS can support these applications, but to do so requires partnership between many stakeholders. Focusing on fundamentals such as data sharing, quality, transparency, standardized processes, and imputation methods for use of RWE would increase confidence in its usage.
The quality and availability of RWD are improving rapidly, providing more reliable data for analysis and for generating RWE. In addition, advances in statistical algorithms continue to improve our ability to leverage RWE for the inferential statistics and hypothesis testing required by regulatory agencies around the world.
Outlook: Aftermath of the Pandemic
The COVID-19 pandemic has accelerated drug development and trial and manufacturing processes at our major customers, and RWD is being captured actively in many countries. Scientific breakthroughs in the future should be faster if we leverage this experience to tackle regulatory processes, incorporate new data sources, and leverage emerging analytics and technology infrastructure. There is a push for greater racial diversity in the patient populations studied and analyzed for COVID-19, but this has not been the case in many historical trials.
We know many people don’t live in areas with access to clinical trials, and social determinants of health are key drivers of treatment success. We must pay attention to and eliminate biases from our data sources and models. Secondary data collected for other purposes can introduce collection bias. For example, using claims data already restricts the analysis to a subset of patients who are employed. Heightened awareness of this bias in data is a good thing for the future of RWE.
The pandemic has spurred dramatic changes to market access conditions for patients that will have lasting positive effects. For example, to help boost adherence with better access, patients were increasingly allowed to get 90-day prescriptions, use mail-order and home delivery, experience lower out-of-pocket costs at the register for insulin and COVID-19–related health care, and gain improved access to telemedicine and chronic care medications.
In terms of predictions for the aftermath of the pandemic, health care markets will increasingly find ways to use the available RWE to shape the way markets work, but not without limits. There will be a blurring of the distinction between retrospective and prospective data gathering, both with respect to different RWD types and in linking RWD to RCT data. Innovation in integrated data collection and comprehensive evidence generation will be used to gain insight into the real world, to inform stakeholders’ understanding, and to improve patients’ lives. Greater confidence in data quality, together with increased data sharing, integration, and transparency, will mean greater uptake of RWE. Automating access to RWD to generate RWE will continue to move our decision-making from intuition to insight.
However, there may be both surprises and disappointments in our future, despite the hope of gaining more accurate and reliable results using AI and algorithms. Thus, explainable AI may be important to understand and interpret the results, as well as to provide forecasts. We are in the early days of using AI for medicine; data standards are still being defined, governance of models could be improved, and users may not understand and trust the results. From this perspective, the failure rate of AI projects can be high. AI requires a combination of disciplines: science, engineering, statistics, math, and biology. Some even say data science is an art due to its exploratory nature and the need to convince humans.
Ultimately, AI projects should focus on business value, not AI value. Life is not an AI reality talent show. AI solutions should be no-brainers for end users to incorporate as part of their workflow and not be standalone solutions. Thus, fit-for-purpose data, sound methodologies, and impactful applications go hand-in-hand when dealing with RWD and big data.
Editor’s Note: The views expressed are the authors’ own and do not necessarily represent those of their employers.