We created OpenSAFELY in March 2020 to carry out urgent research into the emerging COVID pandemic. In creating it, we embedded best-practice principles on security, transparency and reproducibility that have made the platform highly productive in generating research outputs, both for our internal team and for many external users.

Here Alex Walker, Director of Research at the Bennett Institute, outlines some of the research that has been produced with it, as well as the features that make it such a great platform for research.

OpenSAFELY research has enabled protection, vaccination and treatment in the pandemic

Protection

The first OpenSAFELY study was pre-printed on 7th May 2020 and subsequently published in Nature, describing which sub-populations were most likely to die from COVID. The speed and scale of this study provided critical information to SAGE in their continued development of the shielding risk groups. Ethnicity was identified as an important factor in the initial paper, and we subsequently published a study in the Lancet to look at this in much greater detail.

Since then, we have been able to look in great depth at other risk groups, such as people living with HIV, people with learning difficulties, and people living with children.

Vaccination

OpenSAFELY has played a key role in several parts of COVID vaccine delivery in the UK, specifically in prioritisation, delivery, and measurement of safety and effectiveness. The early COVID mortality risk stratification described above was a critical part of creating the prioritisation groups, with JCVI requesting that OpenSAFELY present data on several occasions.

Once vaccination had started, we rapidly published a series of regularly updated reports on the rate at which groups in the eligible population were receiving vaccines. To our knowledge, this was the first indication that there were large disparities in vaccination rate between demographic groups: for example, 68% of eligible black people had been vaccinated compared to 96% of eligible white people at the time of our first paper on the subject.

The remaining pieces of the COVID vaccine puzzle are effectiveness and safety. Following early reports of neurological events thought to be associated with COVID vaccination, we carried out a study that found elevated rates of some events after ChAdOx1 vaccination, though absolute risks remained low. While effectiveness studies are methodologically challenging in routinely collected data, a broad collaborative OpenSAFELY group has carefully defined methods to describe the comparative effectiveness of initial vaccines, how that effectiveness has changed over time, and the effectiveness of booster vaccination.

Treatment

Following the licensing of outpatient treatments for COVID-19, we rapidly linked data to patient records in OpenSAFELY. We then undertook work describing the real-world effectiveness of these treatments, including in the high-risk population of patients requiring kidney replacement therapy. More recently we expanded this work to include Paxlovid. This research, together with bespoke estimates of current treatment and hospitalisation rates that we were able to provide, was critical in the decision by NICE to recommend ongoing use of sotrovimab for patients with advanced kidney disease where there was an unmet need for treatment. We have partnered with NICE to continue to monitor use and effectiveness of these treatments in the face of future variants of COVID-19. We also described which patient groups were being given the treatments, finding large regional variation, with particularly low administration in socioeconomically deprived areas and care homes.

Continued work on critical COVID questions

The types of question we address have evolved over time in response to the ever-changing situation. Examples of ongoing COVID research include:

  1. Continued COVID vaccine effectiveness and safety. As more follow-up time accrues and additional booster doses are administered, it is important to keep monitoring vaccine effectiveness and safety.
  2. COVID treatments. A number of additional new treatments are emerging. Monitoring the outcomes of patients treated with them is critical for patient safety and for ensuring value from drugs used in the NHS.
  3. Post-COVID follow-up. Through our participation in the CONVALESCENCE collaboration, we are carrying out extensive studies on the outcomes experienced by COVID patients. This includes emerging conditions such as Long COVID, as well as other serious outcomes such as cardiovascular events and mental health outcomes.
  4. Comparison with other COVID data sources. We have active collaborations with the ONS COVID Infection Survey (CIS), the Post-hospitalisation COVID-19 study (PHOSP), and the International Severe Acute Respiratory and emerging Infection Consortium (ISARIC). Through linkage of these data sources with OpenSAFELY, we will “compare notes” on whether patients identified by each study are recorded as having similar outcomes between data sources, and whether the populations identified are comparable. This helps to describe the validity and generalisability of research from these platforms.

OpenSAFELY is a unique resource for research

As well as having access to huge datasets, OpenSAFELY has a suite of tools that make it easy for researchers to work collaboratively: sharing, checking and reusing analytic code.

Scale

Enabled in part by our enhanced security model, OpenSAFELY contains the full primary care records for 58m patients. This is an unprecedented scale of data. Other UK resources have previously made only part of this population available. Internationally, while countries such as Sweden and Denmark are known for their high data quality, their smaller populations put hard limits on their scale. The NHS provides a unique resource, with nearly all healthcare use routinely captured for a large population.

Security

Despite access to such a scale of data, researchers do not come into direct contact with data. Instead they use a series of OpenSAFELY tools to define their population, and write their analytic code using generated “dummy” data. This code is then run on real data, with the researcher only viewing summary outputs from their analyses. This enhanced security model has the potential to lower the barriers to access data.
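As a rough illustration of this pattern (not OpenSAFELY's actual tooling), the sketch below develops an analysis against randomly generated dummy records and returns only aggregate counts. The function names, fields and age bands are hypothetical, invented purely for the example.

```python
import random
from collections import Counter

AGE_BANDS = ["18-49", "50-69", "70+"]  # hypothetical grouping for illustration


def make_dummy_patients(n, seed=0):
    """Generate synthetic patient records; no real data is involved."""
    rng = random.Random(seed)
    return [
        {"age_band": rng.choice(AGE_BANDS), "vaccinated": rng.random() < 0.8}
        for _ in range(n)
    ]


def summarise_uptake(patients):
    """Return aggregate counts per age band, never row-level records."""
    totals = Counter()
    vaccinated = Counter()
    for patient in patients:
        totals[patient["age_band"]] += 1
        if patient["vaccinated"]:
            vaccinated[patient["age_band"]] += 1
    return {
        band: {"n": totals[band], "vaccinated": vaccinated[band]}
        for band in sorted(totals)
    }


# The researcher writes and tests the analysis on dummy data only.
summary = summarise_uptake(make_dummy_patients(1000))
```

In this sketch the same `summarise_uptake` function could later be run on real records inside the secure environment, with only its summary table released to the researcher.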

Public code

All analytic code run on OpenSAFELY is made public, giving a number of advantages:

  • Better scrutiny: anyone can see exactly how an analysis has been done, at every stage of the development cycle.
  • Reproducibility: researchers can easily replicate a previous study to run it in a different population, or ensure that it is robust by running additional analyses.
  • Reuse of analytic code: researchers can reuse specific parts of an analysis, such as a disease definition.
  • Collaboration: working together is made easier through version control tools such as Git and GitHub.

Federated analytics

The standard set of tools in the OpenSAFELY platform means that it’s possible to deploy OpenSAFELY in many environments containing suitable data. It can then be used to run the same analytic code in multiple datasets, giving us a bigger sample, and allowing us to check that similar results are obtained from the different data sources. OpenSAFELY-TPP and OpenSAFELY-EMIS are two deployments of OpenSAFELY using different electronic health record systems. Together they encompass >99% of GP patients in England. By conducting federated analyses using these two datasets we’ve been able to do the following on a national scale: monitor uptake of COVID vaccines, measure changes in prescribing safety during the pandemic, and measure differences in recording of Long COVID in the population.
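To make the idea of a federated analysis concrete, here is a minimal sketch of one way to combine summary outputs from two deployments. It assumes each deployment has already returned a table of aggregate counts; the counts below are made up for illustration and the helper names are hypothetical, not part of OpenSAFELY.

```python
def pool_counts(summaries):
    """Add up per-deployment aggregate counts into one pooled table."""
    pooled = {}
    for summary in summaries:
        for group, counts in summary.items():
            entry = pooled.setdefault(group, {"n": 0, "events": 0})
            entry["n"] += counts["n"]
            entry["events"] += counts["events"]
    return pooled


def rate(counts):
    """Proportion of patients with the event; NaN if the group is empty."""
    return counts["events"] / counts["n"] if counts["n"] else float("nan")


# Illustrative, made-up summary outputs from two deployments.
deployment_a = {"50-69": {"n": 300, "events": 240}, "70+": {"n": 200, "events": 180}}
deployment_b = {"50-69": {"n": 100, "events": 70}, "70+": {"n": 100, "events": 95}}

pooled = pool_counts([deployment_a, deployment_b])

# Compare each deployment's rate with the pooled rate, group by group.
comparison = {
    group: {
        "a": rate(deployment_a[group]),
        "b": rate(deployment_b[group]),
        "pooled": rate(pooled[group]),
    }
    for group in pooled
}
```

Pooling gives the bigger sample, while the side-by-side rates offer a simple check on whether the two data sources agree.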

Tools for more efficient, robust, reproducible research

As well as delivering specific research studies, a central goal of OpenSAFELY is to develop tools that enable researchers to carry out better research, more quickly. This is made possible through close collaboration between researchers and software engineers. OpenSAFELY has already made substantial progress in this area through tools such as opencodelists.org, and use of collaborative tools such as GitHub. Going forward, we will focus on the following areas:

  • Standardised, reproducible pipelines, extending work started by our reusable actions library, for example:
    • data management, to shape raw data into research ready datasets
    • specific analytic methods, which ingest research ready datasets and produce results tables, figures etc.
  • Data curation, to better characterise the data that exists in the NHS, and describe the challenges and limitations of using it.
  • Libraries of disease definitions, for ease of use and greater standardisation.