Better, Broader, Safer Uses of Data Help Data Saves Lives | Bennett Institute for Applied Data Science

Today the NHS has published the final version of its “Data Saves Lives” strategy, setting out its commitments in seven key areas: improving trust in the health and care system’s use of data; giving health and care professionals the information they need to provide the best possible care; improving data for adult social care; supporting local and national decision makers with data; empowering researchers with the data they need; working with partners to develop innovations that improve health and care; and developing the right technical infrastructure.

It is an excellent document that has progressed hugely since the draft was published last year, showing true commitment to listening to public and professional feedback, and - crucially from our perspective - to the findings and recommendations of the Goldacre Review, Better, Broader, Safer: Using Health Data for Research and Analysis. In particular it shows a rare willingness to move beyond aphorisms and get into technical detail, not shying away from complex questions about data architecture. In doing so, it finally sets out not only a vision but a clearly achievable and readily deliverable roadmap for unlocking the phenomenal potential of NHS data in a way that will deliver faster, more reliable, more secure, and more efficient use of data for researchers and analysts, while provably protecting patients’ privacy.

Here, Jess, our director of policy and lead researcher on the Goldacre Review, summarises the response to the Review in the following three areas:

Privacy and Security (Trusted Research Environments)
IG, Engagement, and Ethics
Open Working

Privacy and Security (Trusted Research Environments)

In the Goldacre Review, we spend considerable time setting out the limitations of the current mechanism of data sharing in the NHS, namely pseudonymising data and disseminating it to multiple (potentially unknown) endpoints. We stress that this does little to protect the privacy of patients as GP data is incredibly disclosive and people are, therefore, readily re-identifiable even if basic demographic information has been removed. We explain that recognition of this limitation has led to a false belief that the NHS can either provide broad access to its data for research and analytics purposes, or it can preserve patients’ privacy. We counter this argument, explaining how well-designed and performant Trusted Research Environments (TREs) can provide both best-in-class privacy preservation and broad access to NHS data. Following this discussion, we make six summary recommendations:

Build trust by taking concrete action on privacy and transparency: trust cannot be earned through communications and public engagement alone.
Ensure all NHS data policies actively acknowledge the shortcomings of ‘pseudonymisation’ and ‘trust’ as techniques to manage patient privacy: these outdated techniques cannot scale to support more users (academics, NHS analysts, and innovators) using ever more comprehensive patient data to save lives.
Build a small number of secure analytics platforms – shared ‘Trusted Research Environments’ – then make these the norm for all analysis of NHS patient records data by academics, NHS analysts and innovators, wherever there is any privacy risk to patients, unless those patients have consented to their data flowing elsewhere. Every new TRE brings a risk of duplicated effort, duplicated information governance, duplicated privacy risks, monopolies on access or task, and obstructive divergence around data curation and similar activity: there should be as few TREs as possible, with a strong culture of openness and re-use around all code and platforms.
Use the enhanced privacy protections of TREs to create new, faster access rules and processes for safe users of NHS data; ensure all TREs publish logs of all activity, to build public trust.
Map all current bulk flows of pseudonymised NHS GP data, and then shut these down, wherever possible, as soon as TREs for GP data meet all reasonable user needs.
Use TREs – where all analysts work in a standard environment – as a strategic opportunity to drive modern, efficient, open, collaborative approaches to data science.

Chapters 1 and 5 of the Data Saves Lives Strategy respond to these recommendations, first acknowledging that the original plan for GPDPR as launched in summer 2021 was a mistake, and has left public trust damaged and in need of repair, and then committing to rebuilding trust by:

keeping data safe and secure
being open about how data is used
ensuring fair terms from data partnerships
giving the public a bigger say in how data is used
improving the public’s access to their own data.

Specifically, the strategy commits to implementing secure data environments (TREs) as the default across the NHS, setting out the following guidelines which will soon be solidified through the development of a technical specification and accreditation framework for secure data environments:

Secure data environments will be the standard way to access NHS Health and Social Care data for research and analysis.
Secure data environments providing access to NHS Health and Social Care data must meet, or demonstrate a credible roadmap to meeting, criteria set out within our accreditation framework.
Secure data environments must maintain the highest level of cyber security to prevent any unauthorised access to data.
Secure data environment owners must be transparent about the data within their environment, who is accessing it, and what it is being used for.
The secure data environment may only be accessed by appropriate, verified users.
Secure data environments must ensure that patients and the public are actively involved in the decision making processes to build trust in how their data is used.
Data made available for analysis in a secure data environment will be de-identified in a proportionate manner to protect patient confidentiality.
NHS Health and Social Care data should only be linked with other datasets within an accredited NHS secure data environment.
All accredited NHS secure data environments must adhere to a policy of open-working, support code-sharing and facilitate use of technology that supports this, such as reproducible analytical pipelines (RAP).
Secure data environments must be able to support flexible and high quality analysis for the diverse range of uses they will support.
Secure data environments must ensure that nothing is brought in, or removed from, the Environment without assessment and approval.

This is a historic moment for the NHS, and represents an unprecedented opportunity to modernise the data management and analysis work done across the NHS data ecosystem. Moving to working with NHS data in shared TREs addresses the privacy issues, as is well recognised; but TREs also bring huge benefits for productivity and efficiency. They help to eradicate duplication of storage, where the same data is held in multiple locations. They help to address duplication of data curation effort, because everyone is using the same data in the same platforms with, where appropriate, the same tools. Because of this, they help to promote code sharing, and they can make it the default to use modern open working (consistent with Reproducible Analytical Pipelines, a GDS and ONS brand for best practice in this space). In addition to all of this, TREs can help to eradicate data access monopolies. In short, this paradigm shift in the way the NHS approaches data sharing and data access will deliver significant benefits to researchers, innovators, analysts, and - most importantly - patients.
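
To make the re-identification concern raised at the start of this section more concrete, here is a minimal sketch in Python. It is purely illustrative and is not taken from the Goldacre Review or the strategy: the records, pseudonyms, event descriptions, and the function name unique_histories are all made up for this example. It shows that even when names are replaced with pseudonyms, a record whose combination of clinical events is unique in the dataset can be picked out by anyone who already knows those events about a person.

```python
from collections import Counter

# Entirely synthetic, invented records: no real patient data.
# Names have been replaced with pseudonyms, but clinical events remain.
records = [
    {"pseudo_id": "a91f", "events": (("2019-03", "asthma review"), ("2020-11", "fracture"))},
    {"pseudo_id": "7c2e", "events": (("2019-03", "asthma review"),)},
    {"pseudo_id": "e405", "events": (("2019-03", "asthma review"),)},
    {"pseudo_id": "19bd", "events": (("2018-07", "childbirth"), ("2021-01", "diabetes diagnosis"))},
]


def unique_histories(rows):
    """Return pseudonyms whose event history appears only once in rows.

    Hypothetical helper for illustration: a unique history acts like a
    fingerprint, so the pseudonym alone does not hide who the record is about.
    """
    counts = Counter(row["events"] for row in rows)
    return [row["pseudo_id"] for row in rows if counts[row["events"]] == 1]


unique_ids = unique_histories(records)
print(unique_ids)  # ['a91f', '19bd']
```

In this toy dataset two of the four pseudonymised records have a unique history. Real GP records contain far more events per person, which is why the Review argues that access should happen inside a TRE rather than through dissemination of pseudonymised copies.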

Governance

In the Goldacre Review, we also explore how the NHS currently tries to enhance the privacy protection afforded to patients when pseudonymised records are disseminated, by relying on contracts and trust. We explain how, essentially, each user requesting a substantial download of potentially re-identifiable patient data is evaluated to determine whether they and their host organisation are able to manage the data, trustworthy, and able to commit to not misusing the data. We note that these evaluation processes often include multiple organisations, multiple committees, and long delays, meaning that good, potentially beneficial, research is often blocked. And yet, as we discuss, these contractual mechanisms do little to protect against misuse because, when large volumes of data are transferred, the data moves out of the direct control and oversight of the NHS, and it becomes harder to confidently track what is done with it. Finally, we note that even if these mechanisms were working as intended, there is no guarantee that they would be considered trustworthy by patients and publics, as the only way to truly achieve earned trustworthiness is through meaningful patient and public involvement and engagement from the very beginning of all research projects. With these, and other, considerations in mind, we make the following recommendations:

Rationalise approvals: create one map of all approval processes; require all relevant organisations to amend it until all agree it is accurate; de-duplicate work by creating a single common application form (or standard components) for all ethics, information governance, and other access permissions; coordinate shared meetings when approval requires multiple organisations; have researchers available to address misunderstandings of their project; build institutions to help users who are blocked; recognise and address the risk of data controllers asserting access monopolies to obstruct competitors; publish data on delays annually; ensure high quality patient and public involvement and engagement (PPIE) is done.
Have a frank public conversation about commercial use of NHS data for innovation, but only after privacy issues have been addressed through adoption of TREs; ensure the NHS gets appropriate financial return where marketable innovations are driven by NHS data, which has been collected at great cost over many decades; avoid exclusive commercial arrangements.
Develop clear rules around the use of NHS patient records in performance management of NHS organisations, aiming to: ensure reasonable use in improving services; avoid distracting NHS organisations with unhelpful performance measures.
Address the problem of 160 trusts and 6,500 GPs all acting as separate data controllers. Do this either through one national organisation acting as Data Controller for a copy of all NHS patients’ records in a TRE, or an ‘approvals pool’ where trusts and GPs can nominate a single entity to review and approve requests on their behalf.
Review the National Data Opt Out Policy after TREs are established.
Revise the definitions of ‘anonymous’, ‘identifiable’ and ‘linked’ data; add a new category of ‘pseudonymised but re-identifiable’.
Provide researchers with easy access to practical guidance, and examples of best-practice PPIE.

The Data Saves Lives Strategy responds to these recommendations in various places throughout the text, committing to simplifying the Information Governance Framework; creating fit-for-purpose rules around different types of data - including pseudonymised data; simplifying the national data opt-out; standardising approaches to PPIE; and tackling the ‘bigger’ issues through participatory engagement such as citizens juries. Specifically, the data strategy makes the following commitments:

Embedding the Information Governance Portal as the one-stop shop for help and guidance
Creating fit-for-purpose rules around different types of data (such as pseudonymised), so that staff can clearly understand them, addressing concerns around pseudonymised data as raised by the Goldacre Review
Developing a national information governance transformation plan, focusing on practical data-sharing situations, creating professional standards and addressing training for frontline staff
Co-designing a transparency statement, as part of a regularly updated online hub, setting out how publicly-held health and care data is used across the sector
Developing a standard for public engagement, setting out best practice for health and care organisations, and any other body using NHS data, to engage appropriately with the public and staff across the system on data programmes and issues
Undertaking in-depth engagement with the public and professionals, through forums such as focus groups with seldom heard groups, and large-scale public engagement on topics and questions that are high priority or particularly complex, including how we deliver secure data environments and the future of the national data opt-out, and working closely with regions to understand local needs.
Working with the public, the expert advisory group, the National Data Guardian and other stakeholders to ensure that there is a simple opt-out system in place that provides clarity and choice, giving patients confidence and ensuring data continues to support the functioning of the health and care system

Again, these commitments mark a step change in attitudes towards patient and public trust in the way in which the NHS handles the use of data for research and analytics purposes. They demonstrate a recognition that contracts and communication offer insufficient reassurance that data is being used in a socially acceptable way, and that layers of obfuscating governance rules are weakening rather than strengthening protection of trust. Of course, reworking information governance should only be done once TREs are in place, and checks related to the purpose of research will always be necessary regardless of the additional security protection offered by TREs, so it is also reassuring to see that the timelines of the commitments recognise this.

Open Working

Finally, both the Goldacre Review and the Data Saves Lives Strategy tackle open working, specifically, the benefits of conducting health data research and analysis using modern, open, computational data science approaches. In the Review we stress that because medicine, the NHS, and data science are all complex, technical areas, it is not reasonable to expect one person, or even one team, to complete all the tasks associated with one single analytical output on their own, in isolation, behind closed doors, without writing code. Instead, we explain, data preparation, analysis, visualisation, and other tasks are completed by huge arcing chains of mutual interdependency, writing complex code across multiple teams and organisations. In short, we explain that the key to delivering high quality analytics that are reproducible, re-usable, auditable, efficient, and more likely to be free from error, is for the NHS to adopt open, collaborative, and software-driven approaches to data science and analysis - as has become the norm in other academic disciplines and in other areas of Government. We recognise that the barriers to open working are currently numerous, including: lack of skills and knowledge; anxiety; lack of obligation; lack of resource; obstructive TRE design; concern about legal liabilities; lack of credit or reward; and culture. We make the following recommendations to help overcome these barriers:

Write an ‘open analytics policy’ for the NHS: bring together DHSC and the NHS Transformation Directorate to write a policy that makes it clear to all analyst teams across the NHS, and all general managers, that sharing code is not the same as sharing data and that open is the preferred and default method for all analysis conducted using public data and public funding.
Promote and resource ‘Reproducible Analytical Pipelines’ (RAP, a set of best practices and training created in ONS) as the minimum standard for academic and NHS data analysis: this will produce high quality, shared, reviewable, re-usable, well-documented code for data curation and analysis; minimise inefficient duplication; avoid unverifiable ‘black box’ analyses; and make each new analysis faster.
Ensure all code for data curation and analysis paid for by the state through academic funders and NHS procurement is shared openly, with appropriate technical documentation, to all data users. Data preparation, analysis and visualisation is complex technical work, requiring collaboration by many individuals, who may never meet, in a range of organisations, across the NHS and other sectors. The only way to manage this shared complexity is by sharing information, as in other technical fields.
Recognise software development as a central feature of all good work with data. UKRI/NIHR should provide open, competitive, high status, standalone funding for software projects and developers working on health data. Universities should embrace research software engineering (RSE) as an intellectually and academically creative collaborative discipline, especially in health, with realistic salaries and recognition.
Bridge the gap between health research and software development: train academic researchers and NHS analysts in contemporary computational data science techniques, using RAP where appropriate; offer ‘onboarding’ training for software developers and data scientists who are entering health services research and epidemiology; use in-person and online training; make online resources openly available where possible.
Note that ‘open code’ is different to ‘open data’: it is reasonable for the NHS and government to do some analyses discreetly without sharing all results in real time.

The Data Saves Lives strategy responds to these recommendations by making the glorious opening statement: “Public services are built with public money, and so the code they are based on should be made available across the health and care system, and those working with it, to reuse and build on.” It then goes on to make the following commitments:

Developing an Open Analytics policy
Beginning to make new source code that we produce or commission open and reusable by default (with clear exceptions).
Publishing code under appropriate licences to encourage further innovation (such as MIT and OGLv3, alongside suitable open datasets or dummy data). Subject to consultation, the relevant policies will also aim to be open and reusable.
Consulting with UK Research and Innovation (UKRI) and the National Institute for Health Research (NIHR) to consider how outputs from research they fund, involving health and care data, can follow open and reusable code principles
Ensuring all accredited NHS secure data environments adhere to a policy of open-working, support code-sharing and facilitate use of technology that supports this, such as reproducible analytical pipelines (RAP).
Building the profile of data and analysis as a profession, including consistent and appropriate competency frameworks, networks, training, career opportunities and status.
Developing an online Analytics Hub, working with AnalystX, to share, promote and endorse training, events and other resources aimed at analysts and non-analysts across all career levels.

In many ways this is the hardest set of recommendations from the Goldacre Review to turn into concrete actions, as ‘adopt open working as the default’ is as much about culture change as it is about changes in policy, and all familiar with the NHS will know that achieving widespread and sustainable culture change is a significant challenge. However, simply setting out the intention to make open the norm, committing to setting clear expectations, and stating in a national strategy that publicly funded research should translate into publicly available code is a huge step forward. The adoption of open working has been long overdue in the health space, and this level of support from central government will undoubtedly make a significant impact on attitudes throughout the health and care system.

Last words

Overall, the Data Saves Lives strategy is a momentous achievement from the NHS and from the Department of Health and Social Care. It takes to heart the benefits of adopting a better, broader, and safer way of using NHS data for research and analysis and turns this into clearly articulated actions with ambitious timelines and clear commitments.

Of course, there are gaps. In the Goldacre Review we also made recommendations regarding the role of universities and academic funding, the importance of data curation, and the need to put in place a comprehensive multi-layered strategy for modernising the NHS analytics workforce. The Data Saves Lives strategy does not specifically address these; there has been so much else for the team to focus on. It does, however, make very welcome and clear commitments to engage with the other parts of the system to help take these recommendations and new ways of working forward. Similarly, commitments on paper do not always translate into actions in the real world and, as always with Government strategies, the ‘proof will be in the pudding.’ However, this remains the clearest articulation of how the Government intends to ensure data saves lives in a secure, efficient, and modern way, and this in itself should be celebrated; it is a significant achievement.

As we said in the Review, this is a generational opportunity, and we look forward to helping the NHS capitalise on it in any way we can, to help save lives through better, broader, faster, safer use of NHS data.