
This post is part of Bill of Health’s symposium on the Law, Ethics, and Science of Re-Identification Demonstrations. You can call up all of the symposium contributions here. Please note that Bill of Health continues to have problems receiving some comments. If you post a comment to any symposium piece and do not see it within half an hour or so, please email your comment to me at mmeyer @ law.harvard.edu and I will post it. -MM
Scientists should share. Methods, samples, and data — sharing these is a foundational aspect of the scientific method. Sharing enables researchers to replicate, validate, and build upon the work of colleagues. As Isaac Newton famously wrote: “If I have seen further it is by standing on the shoulders of giants.”
When scientists study humans, however, this impulse to share runs into another motivating force: respect for individual privacy. Clinical research has traditionally been conducted using de-identified data, and participants have been assured privacy. As digital information and computational methods have increased the ability to re-identify participants, researchers have become correspondingly more restrictive with sharing. Various solutions have been proposed to maximize research value while protecting privacy, but these can fail. And, as Gymrek et al. have recently confirmed, biological materials can be highly identifying through their genetic material alone.
When George Church proposed the Personal Genome Project in 2005, he recognized this inherent tension between privacy and data sharing. He proposed an extreme solution, cutting the Gordian knot by removing assurances of privacy:
If the study subjects are consented with the promise of permanent confidentiality of their records, then the exposure of their data could result in psychological trauma to the participants and loss of public trust in the project. On the other hand, if subjects are recruited and consented based on expectation of full public data release, then the above risks to the subjects and the project can be avoided.
—Church, GM “The Personal Genome Project” Molecular Systems Biology (2005)
Thus, the first ten PGP participants — the PGP-10 — identified themselves publicly.
After this, the PGP protocol was relaxed to no longer request public sharing of names, but the high likelihood of re-identification was consistently communicated. This possibility is discussed in the consent form. Participants are given a study guide and must pass an online exam demonstrating their knowledge of potential risks. Participants are then invited to share as much (or as little) as they wish on their public profile (e.g. health history, ZIP code, and direct-to-consumer genetic testing data). Genome sequencing is performed for some participants and shared publicly on the same profile pages. As our database has grown, it should come as no surprise that many participants within it are, in fact, identifiable.
That the re-identifiability of PGP participants surprised anyone prompts us to think about how to further improve communication about the nature of the project. It’s a teachable moment: as in the past, we take the media coverage as an opportunity to explain that genetic material cannot be anonymized. To help reduce misunderstandings and make clear that participant profiles are not anonymous, in the coming months we’ll work on letting participants add their name and profile photo to their public PGP profile.
I’d also like to invite suggestions, here and elsewhere, for data use policies the PGP could request of those using participant data. Our project is committed to sharing data without restriction — but we could also publish some recommended behavior for working with this data.
Some participants may be unhappy having their identity connected to sensitive information on their profile. Some groups have suggested modifying data to minimize details while maintaining scientific utility (e.g. removing the last two digits of the ZIP code). I would emphasize a different response. If participants are concerned about being connected to sensitive data, they should remove that data, not obfuscate the profile in the hope of maintaining anonymity. The Erlich lab has demonstrated that Y-chromosomes can be matched to surnames, and we’re sure many other methods of identification from genetic data will be found in the future. One could imagine researchers in the future making facial predictions from genetic data. We expect all participants will eventually be identifiable through genome data alone.
In other words, my advice to participants is to treat your PGP profile as if your name were already on it. Sharing sensitive personal information can greatly benefit society, but don’t share because you think it’s anonymous. If you’re worried about people learning that you’ve used cocaine or had an abortion, remove those details — not your ZIP code.
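To illustrate why trimming a ZIP code is weak protection, here is a minimal sketch in Python. The profiles are invented for illustration and are not PGP data, and the helper names `truncate_zip` and `unique_count` are hypothetical. It counts how many toy profiles are unique on ZIP code, birth year, and sex, before and after removing the last two ZIP digits.

```python
from collections import Counter

# Hypothetical toy profiles: (5-digit ZIP, birth year, sex).
# Invented for illustration only; not real PGP data.
profiles = [
    ("02138", 1975, "F"),
    ("02139", 1975, "F"),
    ("02142", 1982, "M"),
    ("02446", 1982, "M"),
    ("02115", 1990, "F"),
]


def truncate_zip(zip_code, keep=3):
    """Keep the first `keep` digits of a ZIP code and mask the rest."""
    return zip_code[:keep] + "*" * (len(zip_code) - keep)


def unique_count(records):
    """Count records whose combination of fields no other record shares."""
    counts = Counter(records)
    return sum(1 for record in records if counts[record] == 1)


# Mask the last two ZIP digits, leaving the other fields untouched.
masked = [(truncate_zip(z), year, sex) for z, year, sex in profiles]

print("unique before masking:", unique_count(profiles))
print("unique after masking: ", unique_count(masked))
```

In this toy set, masking lowers the number of unique profiles but does not bring it to zero. The sketch also ignores genetic data entirely, which is identifying on its own.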