By Alessandro Blasimme and Effy Vayena
Imagine a clinical research protocol to test the efficacy of a nutritional regime on the aging trajectory of the participants. Such a study would need to be highly powered and include thousands of people in order to observe a credible effect size. Participants would remain enrolled in the study for many years, maybe decades. Endpoints would include novel measures of healthy aging such as functioning (the capacity to perform certain activities) and the quality of social life. Participants would thus be asked to provide enormous amounts of personal data covering at the same time their health state, their habits and their social activities – most likely with the help of smart appliances, sensor-equipped wearables, mobile phones and electronic records.
In a different scenario, a research team aims to develop clinical protocols for treating cancer patients according to the unique genomic signature of their tumor. The team will need patients willing to undergo whole genome germline and tumor sequencing right at the moment of diagnosis and to be included in a basket trial. Therapy would then be targeted to the specific genetic alterations of each individual in the hope that a combination of targeted drugs would generate better medical outcomes than the current standard of care.
These two scenarios correspond to the prototypical form of, respectively, precision medicine and precision oncology studies. The first is likely to require large (very large) longitudinal cohorts of extensively characterized individuals – like the All of Us Research Program. The second will require sustained sharing of genomic data, information on patients’ clinical history and response to treatment, and possibly a unique repository into which such information would flow – something akin to the NCI’s Genomic Data Commons.
This kind of data-intense research, in particular, introduces game-changing features: increased uncertainty about foreseeable data uses, an expanded temporal span of research activities due to virtually unlimited data lifecycles, and finally, the relational nature of data. This last feature refers both to the fact that, for instance, zip codes can reveal other types of sensitive information, such as ethnic background (redundant encoding), and to the fact that data about one person contain information about others – as is the case, for instance, with genetic data among family members.
If this is the kind of medical research we aspire to conduct, however, we must also ask whether the existing model of research oversight is fit for purpose, especially for research protocols in precision medicine, precision oncology and digital health.
We suspect the answer is: not quite. The current system of research oversight focuses – and rightly so – on the protection of participants as autonomous and physically vulnerable human beings. But the type of research we have sketched above has participants' data at its core as much as it has the participants themselves. Moreover, new types of data (including big data) and new data analytics populate a continuously evolving data ecosystem. We need a novel regulatory mindset to adapt to this new scenario and to offer up-to-date safeguards to research participants in the years to come.
In a paper recently published in the Journal of Law, Medicine and Ethics, we have argued for a novel set of criteria to craft oversight mechanisms that are adapted to this shifting scenario. Our proposed approach – which we label Systemic Oversight – targets new types of risk and emerging forms of vulnerability arising in the health data ecosystem.
Systemic oversight is articulated in six key features that oversight mechanisms for data-intense health research should incorporate.
Adaptivity. The ever-expanding boundaries of the digital are making data collection and data processing virtually ubiquitous. This increases uncertainty as to who is doing what with our data and for which purpose. As a consequence, oversight mechanisms should adapt to novel data types being introduced in health research practices – be they GPS data or data from credit card use.
Flexibility. We generally assign different data types to different levels of oversight. Genetic data, for instance, tend to be more tightly regulated. This approach is not without its merits. However, it is increasingly difficult to establish a priori that, for instance, data from publicly available Tweets do not pose relevant risks to data subjects if used for research purposes. Oversight tools need to be flexible enough to treat data on the basis of their actual use, rather than their source.
Monitoring. Thanks to the increasing adoption of data analytics in health research such as machine learning and big data mining, data are ever more likely to be reused over time even beyond the scope of initial collection. Such expanded temporality of health data – old and new – calls for oversight activities to continue beyond initial study protocol approval. Continued monitoring will aim at detecting emerging vulnerabilities due to data processing, such as potential harms, discrimination and privacy risks.
Responsiveness. Besides monitoring for emergent risks linked to data processing, systemic oversight prescribes that measures be adopted to prevent new types of risks from translating into actual harms for data subjects. Moreover, plans for containing the effects of failures such as malicious privacy breaches should be in place. This resonates with already existing regulatory aims of clinical research ethics, namely, risk minimization and institutional preparedness.
Reflexivity. Data reveal information not just about the person they were generated from, but also about others who relate to this person, or with whom they share features. To cope with this relational nature of data, oversight practices should pay attention to the way classificatory activities enabled by new data analytics affect the rights and wellbeing of individuals and communities. This requires, for instance, reflexive scrutiny of assumptions and biases embedded in machine learning algorithms.
Inclusiveness. In order to increase the likelihood that potentially harmful or discriminatory biases emerge before they become technically entrenched in new tools and practices, systemic oversight prescribes upstream engagement of relevant stakeholders – including traditionally under-represented ones – to foster public dialogue and mutual learning.
As we alluded to above, systemic oversight does not supplant, but rather complements, existing regulatory practices. Just as importantly, its six normative features draw on approaches that are already under discussion in various domains of health research – from biobanking governance to genomics. What systemic oversight intends to achieve is not new forms of legally mandated procedures to replace informed consent and ethics review boards. Rather, we seek to provide a normative reference framework for the emergence of new or improved oversight tools that better correspond to the changing reality of health research. Systemic oversight can thus produce normative alignment of regulatory practices around a coherent set of principles. This is expected to reinforce public trust in data-intense health research as a responsible and accountable scientific activity.
The principles of systemic oversight are relevant across the board, that is, at any given phase of the research pipeline, from the planning of a study protocol, to the publication of results, up to the sharing of data with other research groups. More work is needed in order to move from systemic oversight principles to accountable oversight mechanisms. In particular, funders, academic institutions and scientific societies are called upon to develop guidelines to help researchers align with such principles. New or refined oversight instruments such as assessment tools, standardized procedures and ad hoc bodies will need to be created.
When the Belmont Report was translated into regulatory standards for clinical research, stakeholders had a way to know in advance what to do, and to plan their research activities accordingly. This has greatly benefited research, both in organizational terms and in building trust around the research enterprise. Systemic Oversight need not be turned into a set of binding norms. But if the six principles described above are translated into stable oversight practices, precision medicine, precision oncology and data-oriented health research can hope for the same payoff both in terms of process management and public support.