Making TREs the default across the NHS: A first step | Bennett Institute for Applied Data Science

Background

In June 2022, the NHS published its new data strategy Data Saves Lives. The strategy is a broad-ranging document, covering seven ‘key areas’ including: giving health and care professionals the information they need to provide the best possible care; improving data for adult social care; supporting local and national decision makers with data; and empowering researchers with the data they need. Despite this breadth, there is a common aim throughout, namely building public trust in the use of patient data for research and analysis by:

Keeping data safe and secure.
Being open about how data is used.
Ensuring fair terms from data partnerships.
Giving the public a bigger say in how data is used.
Improving the public’s access to their own data.

In keeping with this aim, the Strategy committed to implementing Trusted Research Environments (TREs), using the broader term Secure Data Environments (SDEs), as the default data access mechanism across the NHS. We were particularly pleased to see this commitment, as it was a key recommendation in our review, Better, Broader, Safer (the Goldacre Review). At the time, we wrote that this commitment in the NHS Data Strategy represented an historic moment for the NHS and an unprecedented opportunity to modernise the data management and analysis work done across the NHS data ecosystem.

We were quick to celebrate this commitment to TREs not only because they help address the significant privacy issues associated with providing access to confidential patient records, which is what TREs (or SDEs) are best known for, but also because they bring huge benefits for productivity and efficiency. They help to eliminate the duplication of work that arises when the same data is stored in multiple locations. They help to address duplication of data curation effort, because everyone is using the same data in the same platforms with, where appropriate, the same tools. They help to promote code sharing, and they can make modern, open working methods the default. Finally, they can help to eradicate data access monopolies. In short, we wanted to celebrate the paradigm shift in how the NHS approaches data sharing and data access that the commitment to TREs represents, because we believe this shift will deliver significant benefits to researchers, innovators, analysts and, most importantly, patients.

It is for all these reasons, and more, that we are thrilled that the NHS has taken its first steps towards making its commitment to TREs a reality. On 6th September, the NHS England (NHSE) Transformation Directorate published both a public explainer on SDEs and a set of SDE policy guidelines setting out how NHSE expects SDEs to be used across the health and care system.

ONS Five Safes and NHS Guidelines

When it was published in June 2022, Data Saves Lives included the following initial set of guidelines for implementing secure data environments (SDEs) as the default across the NHS:

Secure data environments will be the standard way to access NHS Health and Social Care data for research and analysis.
Secure data environments providing access to NHS Health and Social Care data must meet, or demonstrate a credible roadmap to meeting, criteria set out within our accreditation framework.
Secure data environments must maintain the highest level of cyber security to prevent any unauthorised access to data.
Secure data environment owners must be transparent about the data within their environment, who is accessing it, and what it is being used for.
The secure data environment may only be accessed by appropriate, verified users.
Secure data environments must ensure that patients and the public are actively involved in the decision making processes to build trust in how their data is used.
Data made available for analysis in a secure data environment will be de-identified in a proportionate manner to protect patient confidentiality.
NHS Health and Social Care data should only be linked with other datasets within an accredited NHS secure data environment.
All accredited NHS secure data environments must adhere to a policy of open-working, support code-sharing and facilitate use of technology that supports this, such as reproducible analytical pipelines (RAP).
Secure data environments must be able to support flexible and high quality analysis for the diverse range of uses they will support.
Secure data environments must ensure that nothing is brought in, or removed from, the Environment without assessment and approval.

Now, in the final published version of the guidelines, these initial guidelines have been solidified and mapped to the popular ONS Five Safes Framework as follows:

‘Safe’	Guidelines
Safe Settings	(1) Secure Data Environments will be the default way to access NHS health and social care data for research and analysis; (2) Secure Data Environments providing access to NHS health and social care data must meet defined criteria; (3) Secure Data Environments must maintain the highest level of cybersecurity to prevent unauthorised access to data; (4) Secure Data Environment owners must be transparent about how data is used within their environment
Safe People	(5) The Secure Data Environments may only be accessed by appropriate, verified users; (6) Secure Data Environments must make sure that patients and the public are actively involved in the decision making processes to build trust in how their data is used
Safe Data	(7) Data made available for analysis in a Secure Data Environment must protect patient confidentiality; (8) Inputs to a Secure Data Environment must be assessed and approved
Safe Projects	(9) Secure Data Environments must adhere to a policy of open-working and support code-sharing; (10) Secure Data Environments must be able to support flexible and high-quality analysis for a diverse range of uses; (11) All uses of data within secure data environments must be for the public good
Safe Outputs	(12) Outputs from a Secure Data Environment must be assessed and approved and must not identify individuals

This is a comprehensive set of guidelines that reflects the NHS’s understanding that Secure Data Environments must ensure:

Appropriate use of data (Safe Projects)
Appropriate users of data (Safe People)
Protection against unauthorised access (Safe Settings)
Protection against disclosure risk (Safe Data)
Protection against disclosive results (Safe Outputs)

The inclusion of Guidelines 6 (involving patients and the public in decision making) and 11 (ensuring data use is for the public good), in particular, demonstrates an understanding that data protection is as much a social issue as it is a technical one. In other words, recognising the importance of involving patients and publics in decisions about the purpose of research and analysis conducted inside a Secure Data Environment reflects a growing awareness that public trust cannot be guaranteed through technical design alone.

As we say in the Goldacre Review:

“Health data represents people; each EHR used in each analysis represents an individual person; and each individual data point – a diagnostic code, referral, prescription record or similar – represents a moment in a person’s life that may have had deep meaning for them at that time, or a continued impact on their experience of life. It is, therefore, absolutely essential that these individuals are respected, and that their autonomy is protected when health data is being used for research and analysis. […]

This is why the most useful, successful, and impactful health data research projects are often those that design with, and for, patients and the public from the very beginning; that involve a diverse range of representatives in every decision, from what data to request, to how to interpret results and disseminate findings; that listen to and act on the advice, feedback, and input of these representatives; and that treat their values, beliefs and experiences as crucial to success as well curated data, performant software, well executed code, or a carefully designed statistical model.”

That the majority of the guidelines focus on design requirements (both the expected technical requirements and the outlined governance requirements) rather than on these ‘softer’ social requirements is, however, proportionate and appropriate. NHS SDEs will only deliver the many expected benefits if they meet sufficiently robust technical specifications. This is why, in the Goldacre Review, we provided a high-level list of TRE guidelines, similarly mapped to the ONS Five Safes, alongside a detailed set of technical requirements. The recommendations, and how they align with the ONS Five Safes and the NHS guidelines, are summarised below:

‘Safe’	NHS Guidelines	Goldacre Review Recommendations
Settings	1-4	TREs must provide a secure computing environment, i.e.: meet and ideally exceed all relevant standards for a secure data centre containing highly sensitive, disclosive, re-identifiable patient data; ensure all installed tools for data management, analysis and visualisation meet security specifications, to the degree that is necessary for the security context in which they are being used; ensure that only users and projects with appropriate permissions are able to execute code on the platform. TREs must provide a performant computing environment, i.e.: support the rapid and scalable provisioning of appropriate resources (processor power, memory, storage, and so on). TREs must earn patient trust, i.e.: publish the governance arrangements (including transparency notice, DPIA and relevant Terms of Reference of governance groups), including how decisions about access are made, according to which criteria, and who is responsible for making these decisions; openly disclose all code and technical methods used to preserve patients’ privacy; keep, and ideally publish, detailed informative technical logs of all activity in the platform, attached to users, analyses, and their associated permissions; ensure all outputs of all analyses executed in the platform are shared openly, other than for pre-specified and pre-arranged exceptions; retain copies of all analysis results for audit.
People	5-6	TREs must earn patient trust, i.e.: publish appropriate information about all users, analyses, and their associated permissions, in near-real-time, using metadata from actual usage of the platform. TREs must be surrounded by good governance, and this must be supported with relevant technical features, i.e.: check that all users are appropriately qualified and have relevant permissions.
Data	7-8	TREs must preserve patients’ privacy, i.e.: ensure all intentional identifiers (name, date of birth, and so on) are removed at source, but recognise that this data is nonetheless re-identifiable and manage it as such; obstruct attempts to re-identify patients in data; detect attempts to re-identify patients in data; obstruct attempts to view disclosive information about single individuals; detect attempts to export disclosive information about single individuals; support privacy-enhancing techniques, such as code development on dummy data, to minimise access to real patient data; prevent bulk export of identifiable or re-identifiable data; provide tools, personnel, training and workflow for automated and manual checking of all exported outputs to ensure they are safe and non-disclosive; regularly re-evaluate and compare all currently performant or realistic mechanisms to achieve the above, and ensure that only the safest are used.
Projects	9-11	TREs must support reproducible analytical pipelines (RAP) and modern, efficient, high-quality, reproducible data analysis, i.e.: support an appropriate range of tools for data management, analysis, and visualisation; permit the execution of analytic code; support, and ideally require, sharing and easy discovery of all code for data management and analysis; support, and ideally require, sharing and easy discovery of “good enough” technical documentation alongside users’ shared code, consistent with minimum RAP standards; support the use of git, GitLab, GitHub or related tools for code management, version control and best practice in software development; support flexible standardisation of re-usable code for common data management and analysis tasks, where this meets users’ needs; provide robust open technical documentation of all platform features; meet the needs of technically skilled users and those with fewer computational skills, providing relevant security mitigations around the latter. TREs must be surrounded by good governance, and this must be supported with relevant technical features, i.e.: check that all projects are appropriate and have relevant permissions; check that all data access is limited to the minimum participant count and granularity necessary to achieve the analytic objectives to a high standard; ensure that all access arrangements are appropriately time-limited; check that lapsed or otherwise incomplete projects have their permissions reviewed and revoked.
Outputs	12	TREs must preserve patients’ privacy, i.e.: ensure all outputs are checked for potentially disclosive material by a mixture of appropriately validated automated methods and manual checking.

Similarly, the NHS is in the process of developing its initial guidelines into technical criteria, via an extensive process of stakeholder consultation, review, and engagement. It is expected that the resultant ‘technical criteria’ will be published by the end of the year. Longer-term, the ambition is that these criteria will form the basis of an NHS SDE accreditation scheme.

Conclusion and Next Steps

The publication of the NHS’s first ever set of guidelines for SDEs is a major policy milestone. Whilst the guidelines themselves might seem very high-level and abstract in the absence of the forthcoming technical criteria, this is to be expected. ‘First over the line’ policy documents should always be seen primarily as intention-setting documents. They create the environment within which the details will be worked out, and they give those likely to be affected by the change in policy an opportunity to reflect on and prepare for the changes ahead.

The creation of this space between setting intentions and finalising policy is always important. It is, however, especially important when the policy in question represents a particularly significant change or development. This is certainly the case with the introduction of SDEs as the default way of accessing NHS data for research and analysis, which represents an entirely new way of working with data for most of the NHS. If this shift were instigated overnight, key considerations, complexities, and unintended consequences would likely be missed, and this could undermine the success of the overall policy programme. We were, therefore, keen in the Goldacre Review to recommend that the NHS approach this seismic shift in behaviour ‘impatiently, but incrementally.’ It seems that this is indeed the approach being taken. This is something to be celebrated and encouraged, and we look forward to working with the team and the community on the next increments, to help ensure great delivery for everyone.