Predictive Persons: Privacy Law and Digital Twins

Imagine a clinical trial where the control group never sets foot in a clinic – because they don’t need to, and they don’t have feet. These “participants” are digital twins: computational models of real patients, built from health data to forecast disease trajectories and treatment response. 

Twin Origins

The conceptual underpinnings of a digital twin date back to David Gelernter’s 1992 book Mirror Worlds, where the American computer scientist envisioned digital models that directly influence real-world structures and systems. The technology itself, however, was born earlier, first deployed during NASA’s Apollo program in the 1960s and early 1970s. After the oxygen tank explosion on Apollo 13, engineers relied on ground-based spacecraft simulators tied to real-time telemetry from the spacecraft in flight, enabling them to troubleshoot and guide astronauts safely home. These early “twins” demonstrated how virtual replicas could simulate real conditions, anticipate failures, and support human decision-making. Today, NASA continues to employ digital twins for developing next-generation vehicles and aircraft. And the technology is migrating into medicine.

Twin Turning Point

Digital twins for health integrate a wide range of attributes, including genetic data, lifestyle factors, and even physical characteristics, all fed into models that continuously update as new information streams in to generate biologically realistic data. Their power lies in real-time monitoring and a bidirectional connection with the physical individual, allowing simulations to evolve and remain a faithful replica of their physical counterpart.
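
To make that architecture concrete, here is a minimal, hypothetical sketch in Python of the update loop such a twin implies: observations from the physical patient flow into the model’s state, and the model’s forecasts flow back to inform decisions about that patient. The class, field names, and toy forecast rule are illustrative assumptions, not any vendor’s actual implementation.

```python
from dataclasses import dataclass, field

@dataclass
class DigitalTwin:
    """Illustrative patient twin: a state vector that updates as new data arrive."""
    patient_id: str
    state: dict = field(default_factory=dict)    # e.g., labs, vitals, genetic markers
    history: list = field(default_factory=list)  # audit trail of every observation

    def ingest(self, observation: dict) -> None:
        """Fold a new real-world measurement into the twin's current state."""
        self.history.append(observation)
        self.state.update(observation)

    def forecast_response(self, therapy: str) -> float:
        """Toy prediction rule; a real twin would run a mechanistic or ML model."""
        baseline = self.state.get("disease_activity", 0.5)
        assumed_effect = {"biologic_A": 0.3, "standard_of_care": 0.1}.get(therapy, 0.0)
        return max(baseline - assumed_effect, 0.0)

# The connection is bidirectional: measurements from the patient update the twin,
# and the twin's forecasts feed back into decisions about the physical patient.
twin = DigitalTwin(patient_id="PATIENT-001")
twin.ingest({"disease_activity": 0.7, "crp_mg_l": 12.0})
print(round(twin.forecast_response("biologic_A"), 2))  # 0.4
```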

For costly drug discovery and development, twins can simulate trial arms, optimize dosing, and anticipate toxicity, folding real-world evidence into model-informed development that could shorten trial timelines. This development reduces reliance on placebo arms, accelerates recruitment, and sharpens safety and efficacy signals, considerations of particular salience for early-phase trials.

But as health information is replicated into digital twin models, it creates a “proxy” of the individual that continues to exist in external systems, leaving the data susceptible to re-identification, aggregation, or use (and misuse) in ways the individual never authorized.

The Predictive Patient

Consider a hypothetical patient, Mia, a 33-year-old living with lupus who joins a clinical study for a new biologic therapy. She signs an informed consent form allowing her electronic health record data to be used in developing a patient-specific digital twin that integrates lab results, imaging, and genetic sequencing to simulate treatment responses.

At the outset, Mia’s information is protected by the Health Insurance Portability and Accountability Act (HIPAA). But HIPAA is a privacy framework, not a property statute, and only governs disclosures by covered entities, such as providers and health plans. The digital twin model itself (the code, feature engineering, and trained parameters) resides under intellectual property law. Once Mia’s data are de-identified and transferred to the sponsor’s contracted AI vendor (a tech firm outside HIPAA’s reach), HIPAA protections effectively vanish. Despite de-identification, the data retain enough unique elements, like genetic variants, to allow potential re-identification. U.S. law grants Mia no property interest in her health data and no mechanism to withdraw consent once it has been shared, leaving her reliant on the company’s voluntary data-use policies rather than enforceable rights.
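
The re-identification concern can be illustrated schematically. In the sketch below, the field names, variant IDs, and external dataset are all invented for illustration; the point is only that stripping direct identifiers leaves quasi-identifiers, such as a rare combination of genetic variants, that can be linked back to a person through an outside resource.

```python
# Schematic only: field names, variant IDs, and the external dataset are invented.
record = {
    "name": "Mia E.", "mrn": "12345",        # direct identifiers, removed below
    "birth_year": 1992, "zip3": "021",       # quasi-identifiers retained in limited form
    "variants": {"rs1111111", "rs2222222"},  # hypothetical rare-variant combination
    "diagnosis": "SLE",
}

DIRECT_IDENTIFIERS = {"name", "mrn"}
deidentified = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}

# An external genomic resource sharing the same rare variants can re-link the record.
external_db = [{"person": "Mia E. (public genealogy profile)",
                "variants": {"rs1111111", "rs2222222"}}]
matches = [row["person"] for row in external_db
           if row["variants"] <= deidentified["variants"]]
print(matches)  # the "de-identified" record is singled out again
```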

The vendor then enriches her record with non-clinical data from wearables to enhance predictive accuracy. Such health-adjacent information falls entirely outside HIPAA, subject instead to the Federal Trade Commission’s (FTC) limited oversight under the Health Breach Notification Rule, which addresses breaches but not everyday aggregation or resale. By this stage, Mia’s “twin” exists in multiple systems, yet she has no visibility into where her data are stored or how they are used.

As the model matures, its architecture and parameters become proprietary trade secrets. If the sponsor submits the twin’s output to the Food and Drug Administration as evidence in a new drug application (NDA), intellectual-property protections may preclude public disclosure of how the model was validated. At this point, innovation secrecy collides with regulatory transparency, and Mia’s data drive a simulation whose inner workings remain opaque to both patient and clinician.

To complicate things further, the sponsor’s European partner accesses the data. Under the European Union’s General Data Protection Regulation (GDPR), such transfers require a lawful basis for processing, such as explicit consent, scientific research in the public interest, or a legitimate interest balanced against individual rights. If none of these bases can be met, the sponsor may argue that “de-identification” exempts it from GDPR oversight. But European data protection bodies increasingly reject that claim, emphasizing that genetic and biometric data are inherently identifying and thus create re-identification risks. “De-identification” is not the same as “non-personal” data under E.U. law.

Months later, the AI firm licenses its twin-based algorithms to insurers to predict hospitalization or medication adherence. Even without direct identifiers, model-derived inferences, such as flare frequency, feed into risk scores that inform decisions. While the Affordable Care Act (ACA) and the Genetic Information Nondiscrimination Act (GINA) bar health-insurance discrimination based on preexisting conditions or genetic information (which might be included in algorithms or risk scores), those protections do not extend to life, disability, or long-term-care insurance markets, where such predictive analytics could lawfully affect premiums or eligibility.
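
A simplified sketch shows why the absence of direct identifiers offers little protection here: the weights and feature names below are hypothetical assumptions, not drawn from any actual insurer’s model, but they show how purely model-derived inferences can be combined into a score that shapes decisions about the underlying person.

```python
# Hypothetical weights and feature names; not any insurer's actual model.
WEIGHTS = {
    "predicted_flares_per_year": 0.5,
    "predicted_nonadherence": 0.3,
    "predicted_hospitalization": 0.2,
}

def risk_score(twin_inferences: dict) -> float:
    """Combine model-derived inferences into a single underwriting score."""
    return sum(WEIGHTS[k] * twin_inferences.get(k, 0.0) for k in WEIGHTS)

# No name, record number, or raw genetic sequence appears here, yet the score
# can still shape premiums or eligibility in markets GINA and the ACA do not reach.
inferences = {
    "predicted_flares_per_year": 3.0,
    "predicted_nonadherence": 0.4,
    "predicted_hospitalization": 0.25,
}
print(round(risk_score(inferences), 2))  # 1.67
```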

When Mia discovers her data’s secondary use, she requests withdrawal of consent. The sponsor responds that deletion is infeasible because her data have been anonymized and incorporated into trained models, which is permissible under U.S. law. However, in the E.U., the GDPR confers rights that U.S. law does not. Mia could theoretically exercise rights of access, rectification, restriction, erasure (the “right to be forgotten”), and objection to certain uses, including automated decision-making. Yet once her data have been embedded in model parameters or derived insights, enforcing those rights becomes technically and legally complex. The contrast reflects two philosophies: U.S. privacy law treats consent as a limited, one-time authorization that remains static after disclosure, while E.U. law treats data rights as continuous and revocable.

Mia’s hypothetical experience exposes the governance fractures for digital twins, including HIPAA’s narrow scope, the FTC’s limited enforcement, tensions between intellectual-property secrecy and public accountability, and the absence of harmonized standards for cross-border data use. Ensuring that consent remains meaningful once a twin begins to “live” beyond its human counterpart is an unresolved ethical challenge. Addressing these gaps may require expanding HIPAA to cover downstream processors and establishing legally enforceable rights of withdrawal and explainability.

About the author

  • Julia Etkin

    Julia is a Dean’s Scholar at Harvard Medical School, pursuing a Master of Science in the Center for Bioethics. Her research interests include biopsychosocial pharmacovigilance, health policy of novel psychoactive substances (NPS), trauma studies, and FDA regulation. She has published on topics at the intersection of ethics and equity, including work on FDA advisory committee reform with an emphasis on increasing public trust.