Quantumrun


Synthetic health data: A balance between information and privacy

Researchers are using synthetic health data to scale up medical studies while minimizing the risk of data privacy violations.

Author: Quantumrun Foresight
June 16, 2023

Insight highlights

Synthetic health data helps researchers get around the difficulty of accessing high-quality information while protecting patient confidentiality. It could transform healthcare by accelerating research, supporting technology development, and improving health system modeling, all while reducing the risk of data misuse. However, potential challenges, such as security vulnerabilities, AI bias, and the underrepresentation of some groups, will need to be addressed through new regulations.

Synthetic health data context

Access to high-quality health and healthcare-related data can be challenging due to cost, privacy regulations, and various legal and intellectual property limitations. To respect patient confidentiality, researchers and developers frequently rely on anonymized data for hypothesis testing, data model validation, algorithm development, and innovative prototyping. However, the threat of re-identifying anonymized data, particularly for patients with rare conditions, is significant and practically impossible to eliminate. Additionally, interoperability challenges often make it complicated to integrate data from diverse sources when developing analysis models, algorithms, and software applications. Synthetic data, which is artificially generated to reproduce the statistical properties of real records without corresponding to actual patients, can speed up the process of initiating, refining, or testing pioneering research methods.

Privacy laws in both the United States and Europe protect individuals' health information from access by third parties. Consequently, details like a patient's mental health, prescribed medications, and cholesterol levels are kept private. However, algorithms can generate sets of artificial patients whose records statistically mirror different segments of the population, opening the door to a new wave of research and development.

At the start of the COVID-19 pandemic, Israel-based Sheba Medical Center partnered with MDClone, a local start-up that generates synthetic data from medical records. The initiative produced synthetic data from the records of the hospital's COVID-19 patients, enabling researchers in Israel to study the virus's progression. This work resulted in an algorithm that helped medical professionals prioritize ICU patients more effectively.

Disruptive impact

Synthetic health data could significantly accelerate and enhance medical research. By creating realistic, large-scale datasets without compromising patient privacy, researchers could study various health conditions, trends, and outcomes more efficiently. This capability could lead to faster development of treatments and interventions, more accurate predictive models, and a better understanding of complex diseases. Synthetic data could also help tackle health disparities by enabling research on under-studied populations for whom collecting sufficient real-world data might be difficult or ethically problematic.

Synthetic health data could also transform the development and validation of healthcare technologies. Innovators in digital health, artificial intelligence (AI), and machine learning (ML) stand to benefit significantly from access to rich, varied datasets for training and testing algorithms. With synthetic health data, they can improve their tools' accuracy, fairness, and utility without many of the legal, ethical, and practical hurdles of handling actual patient data. This access could accelerate progress in diagnostic AI tools and personalized digital health interventions, and might even give rise to new, data-driven healthcare paradigms.

Finally, synthetic health data could have important implications for healthcare policy and management. High-quality synthetic data could support more robust health systems modeling, informing the planning and evaluation of healthcare services. It could also enable the exploration of hypothetical scenarios, such as the likely impact of different public health interventions, without the need for expensive, time-consuming, and potentially risky real-world trials.

Implications of synthetic health data

Wider implications of synthetic health data may include:

A lower risk of sensitive patient information being leaked or misused. However, synthetic datasets and the systems that generate them could introduce new security vulnerabilities if not managed properly.

Better modeling of health conditions and treatment outcomes across different populations, leading to improved access to healthcare for underrepresented groups. However, if the synthetic data inherits biases from the real data or AI models used to generate it, it could also worsen medical discrimination.

Reduced cost of medical research by lessening the need for expensive and time-consuming patient recruitment and data collection processes.

Governments creating new laws and regulations to protect patient privacy, govern data usage, and ensure equitable access to the benefits of this technology.

More sophisticated AI/ML applications, trained on a wealth of synthetic data with fewer privacy concerns, that automate electronic health record processing and management.

Global sharing of synthetic health data improving international cooperation in responding to health crises, such as pandemics, without violating patient privacy. This development could lead to more robust global health systems and faster response mechanisms.