While all scientific research produces data, genomic analysis is unusual in that it inherently produces vast quantities of it. Every human genome contains roughly 20,000-25,000 genes, so even the most routine genomic sequencing or mapping will generate enormous amounts of data. Furthermore, next-generation sequencing techniques are being pioneered to allow researchers to sequence genomes quickly. These advances have dramatically reduced both the time and the cost needed to sequence a given genome. Along with novel methods of sequencing genomes, there have been improvements in storing and sharing genomic data, particularly using computer- and internet-based databases, giving rise to Big Data in the field of genetics.
While big data has proven useful for genomic research, the aggregation of so much data could give rise to new ethical concerns. One concern is that promises of privacy made to individual participants might be undermined if re-identification of research subjects is possible.
Re-identification of individual participants from de-identified data contained in genetic databases can occur when researchers apply algorithms that cross-reference numerous data sets with the available genetic information. This can enable diligent researchers to re-identify specific individuals, even from data sets that are thought to be anonymized. Such re-identification represents a genuine threat to the privacy of the individual, as a researcher could learn about genetic risk factors for diseases, or other sensitive health and personal information, by combing through an individual’s genetic information.
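To make the cross-referencing idea concrete, here is a minimal Python sketch of a linkage on shared attributes. Everything in it is hypothetical: the records, the names, the field names, and the choice of ZIP code, birth year, and sex as the linking attributes are assumptions made for illustration. It is a toy, not the method used in any of the studies cited here.

```python
from collections import defaultdict

# Hypothetical "de-identified" research records: names are removed,
# but a few ordinary attributes (quasi-identifiers) remain.
research_records = [
    {"zip": "02139", "birth_year": 1975, "sex": "F", "finding": "risk_allele_A"},
    {"zip": "02139", "birth_year": 1980, "sex": "M", "finding": "none"},
    {"zip": "94110", "birth_year": 1975, "sex": "F", "finding": "risk_allele_B"},
]

# Hypothetical public data set that carries names alongside the same attributes.
public_records = [
    {"name": "Alice Example", "zip": "02139", "birth_year": 1975, "sex": "F"},
    {"name": "Bob Example", "zip": "02139", "birth_year": 1980, "sex": "M"},
    {"name": "Carl Example", "zip": "02139", "birth_year": 1980, "sex": "M"},
]

# Assumed linking attributes; a real attack could use different ones.
QUASI_IDENTIFIERS = ("zip", "birth_year", "sex")


def linking_key(record):
    """Build the tuple of attribute values used to join the two data sets."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def link(research, public):
    """Return (name, finding) pairs where exactly one public record matches."""
    names_by_key = defaultdict(list)
    for person in public:
        names_by_key[linking_key(person)].append(person["name"])

    matches = []
    for record in research:
        candidates = names_by_key.get(linking_key(record), [])
        # Only a unique match singles out one person.
        if len(candidates) == 1:
            matches.append((candidates[0], record["finding"]))
    return matches


reidentified = link(research_records, public_records)
print(reidentified)
```

In this toy data only the first research record is linked to a name. The second matches two people and so stays ambiguous, and the third matches no one. The more outside data sets that can be joined in this way, the more often a match becomes unique.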
In recent years, concerns about re-identification from genetic information have proven to be far from theoretical. Indeed, several groups of researchers have demonstrated that re-identification is possible, even with the limited information available in any one particular data set.[1]
As the internet facilitates the aggregation of personal information, the potential for re-identification is likely to increase in the coming years. Because of this, re-identification is an issue that must be addressed when conducting genetic research, as standard promises of anonymity might be rendered moot by the threat of re-identification.
The potential for re-identification should change the way that researchers discuss anonymized genetic databases that will become available for large-scale research. Participants have to understand that while the information will not be linked to them in a traditional sense, there is a potential for re-identification, depending on the availability of other information. Re-identification does not mean that ethical genetic research is doomed, but researchers cannot ignore the risk it presents. Rather, researchers should explain that the risk remains extremely small, and that any re-identification is incredibly unlikely to cause any genuine problems for the research participant. Furthermore, by discussing the risk of re-identification directly, researchers can ensure that participants are fully informed, so that they can give meaningful consent.
There are also straightforward steps that researchers can take to help reduce the risk of re-identification. Researchers can better control access to sensitive genetic data, so that only established researchers will have access to the information. Furthermore, researchers should establish, and enforce, sanctions against anyone found to have deliberately attempted to re-identify individuals from research data. Combating re-identification is an important job, and it is encouraging to see that researchers are developing novel ideas for reducing the risk of re-identification.[2]
In the meantime, it is crucial that researchers begin grappling with how to talk with participants about re-identification. If the small risk of re-identification is presented incorrectly, it could seriously dissuade individuals from participating in essential genetic research. This would be a truly unfortunate situation, one that could turn the small threat of re-identification into something that severely damages public trust in the genetic research process.
However, if researchers modify the way they discuss anonymity, privacy, and consent with participants in genetic research, so that expectations can be managed, then research can proceed ethically and respectfully, even with the potential for re-identification.
[1] See Nature, https://dx.doi.org/10.1038/nature.2013.12237; 2013; Schadt et al.
[2] See, e.g., https://blog.wellcome.ac.uk/2014/03/25/taking-steps-to-prevent-jigsaw-re-identification-in-genomic-research/; https://www.pnas.org/content/107/17/7898.abstract; https://www.genomicslawreport.com/index.php/2010/04/13/genomic-privacy-and-re-identification-redux/