Challenges in Estimating COVID-19 Vaccine Effectiveness Using Observational Data | Bennett Institute for Applied Data Science

Introduction

Our new paper from the OpenSAFELY COVID-19 vaccine working group is now published in the Annals of Internal Medicine. We describe some of the biases that exist when estimating COVID-19 vaccine effectiveness in mass vaccination settings using routinely-collected health data, and discuss the use of target trial emulation to avoid or mitigate these biases.

The early randomised controlled trials (RCTs) for the Pfizer and AstraZeneca COVID-19 vaccines showed that they were very effective at reducing symptomatic infection, and this quickly led to national vaccination programmes across the world. RCTs like these are vital, typically providing the best available evidence for the performance of a vaccine, but they don’t address other important questions that help inform vaccine policy. For example, they were conducted before the emergence of new SARS-CoV-2 variants, they recruited too few people to assess protection against severe outcomes, and follow-up was too short to examine possible waning of protection over time.

So how can we answer these questions without conducting an RCT? We can emulate one.

Target Trial Emulation

Target trial emulation is an approach for designing studies that use observational data to estimate the benefit of an intervention. The basic idea is to design a hypothetical randomised trial that would answer the question of interest (for instance, how would the number of COVID-19 deaths differ if everyone aged over 70 were vaccinated compared with if none of them were?), specifying eligibility criteria, intervention groups and how they are assigned, outcomes and how they are compared, and so on. Then, emulate that trial within the constraints of the observational data that are available.

Considering such a target trial helps to avoid biases in the design of the observational analysis and allows investigators to focus their attention on confounding – systematic differences between people who did and did not receive the intervention that impede fair assessments of the intervention if not properly mitigated.

Let’s think about how target trial emulation might work for COVID-19 vaccine trials.

How were the COVID-19 vaccine trials conducted?

Recruitment was open to anyone meeting the eligibility criteria. After recruitment, each person was randomly assigned to either vaccination or placebo and immediately received their assigned treatment, without knowing which group they were in. Nobody was eligible to re-enter the trial as each person was randomised once and only once. Once recruitment ended and sufficient time had passed, the outcome rates between the vaccinated and unvaccinated groups were compared.

Can we emulate such a trial with observational data from the UK vaccine programme?

Not really. Some obvious differences between the trial and non-trial settings include the lack of a placebo (unvaccinated people did not receive a placebo, they received nothing) and blinding (each person knows if they’re vaccinated or not). This demonstrates that target trial emulation rarely attempts to emulate an “ideal” trial (a common misconception) but, more pragmatically, a trial that can be emulated. That is, a trial that accords reasonably well with the observational data available.

This pragmatism has a more profound impact on target trial design when considering how to emulate the treatment assignment strategy. In the vaccine trials, people assigned to the vaccination group remained vaccinated (you can’t unvaccinate someone) and people assigned to the unvaccinated group remained unvaccinated (because the vaccine was not available outside of the trial). In the national vaccine programme, however, unvaccinated people were continuously eligible for vaccination.

Continuous eligibility creates two key issues for trial emulation. First, it means that unvaccinated people in the emulated control group may become vaccinated at any point during follow-up. This treatment switching would never have occurred in the real trial. Second, for the unvaccinated group there’s no natural analogue of randomisation time, or time-zero, when follow-up begins. At what time should we start follow-up in unvaccinated people?

To deal with this, we have to think about other RCT designs that, though unlikely to be conducted in reality, are more feasibly emulated given the observational data that are available. Our paper considers two such target trial designs — the sequential trial approach and the single trial approach.

Sequential trial approach

The sequential trial approach deals with continuous eligibility by splitting the eligibility period up, day-by-day, and treating each new day as the start of a new trial. On each day, everybody who is vaccinated on that day is assigned to the vaccinated group. Each vaccinated person is then matched to someone who is not vaccinated on that day, and these matches are assigned to the unvaccinated group. This emulates a scenario where a new trial is conducted each day, with a new group of people recruited and randomised into the vaccinated or unvaccinated group.

Matches can be selected randomly, or by matching on important potential confounding characteristics like age and sex. People are not permitted to be matched more than once into the unvaccinated group, but can be included in the vaccinated group if they subsequently become vaccinated. This process continues each day until there are too few people eligible for matching in either group or until the trial period stops.

Time-zero for the unvaccinated group is now easy to define — it’s simply the day on which they were selected as a match.
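
The daily matching described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not the code used in our paper: the `Person` class, the `sequential_match` function, the choice of age band and sex as the only matching variables, and the toy data are all hypothetical.

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass
class Person:
    pid: int
    age_band: str
    sex: str
    vax_day: Optional[int]  # day of vaccination, or None if never vaccinated


def sequential_match(people, n_days, seed=0):
    """Return (trial_day, vaccinated_pid, control_pid) for each matched pair."""
    rng = random.Random(seed)
    used_as_control = set()
    pairs = []
    for day in range(n_days):
        # Everyone vaccinated on this day enters that day's vaccinated group.
        for treated in (p for p in people if p.vax_day == day):
            # Candidate controls: still unvaccinated on this day, same age
            # band and sex, and not already used as a control in any trial.
            candidates = [
                c for c in people
                if (c.vax_day is None or c.vax_day > day)
                and c.pid not in used_as_control
                and (c.age_band, c.sex) == (treated.age_band, treated.sex)
            ]
            if not candidates:
                continue  # no suitable match, so this person is excluded
            control = rng.choice(candidates)
            used_as_control.add(control.pid)
            # Time-zero for both members of the pair is the trial day.
            pairs.append((day, treated.pid, control.pid))
    return pairs


# Hypothetical toy population.
people = [
    Person(1, "70-74", "F", 0),
    Person(2, "70-74", "F", 2),
    Person(3, "70-74", "F", None),
    Person(4, "75-79", "M", 1),
]
pairs = sequential_match(people, n_days=3)
print(pairs)
```

Person 4 never appears in the output because nobody shares their age band and sex, which is the loss of eligible people that matching can cause. Person 2 may serve as a control on day 0 and still join the vaccinated group on day 2.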

If a person in the unvaccinated group becomes vaccinated then they are censored (follow-up is stopped), along with the vaccinated person that they were matched with. Censoring of matched pairs in this way prevents follow-up time being systematically longer in the vaccinated group than in the unvaccinated group.
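
As a rough sketch of this pair-level censoring, the function below works out follow-up for both members of a matched pair. The function name, its arguments, and the convention that an event on the censoring day still counts are assumptions made for illustration.

```python
def follow_up(time_zero, study_end, control_vax_day,
              treated_event_day=None, control_event_day=None):
    """Follow-up time and event indicator for each member of a matched pair.

    Days are integers on a common calendar. Both members are censored on the
    day the unvaccinated member is vaccinated, or at study end if earlier.
    """
    if control_vax_day is None:
        censor_day = study_end
    else:
        censor_day = min(control_vax_day, study_end)

    def arm(event_day):
        # Assumption: events strictly after time-zero and up to (and
        # including) the censoring day are counted.
        had_event = event_day is not None and time_zero < event_day <= censor_day
        end_day = event_day if had_event else censor_day
        return {"days": end_day - time_zero, "event": had_event}

    return {
        "vaccinated": arm(treated_event_day),
        "unvaccinated": arm(control_event_day),
    }


# The control is vaccinated on day 40, so the pair is censored then and the
# vaccinated member's event on day 60 falls outside follow-up.
result = follow_up(time_zero=10, study_end=100, control_vax_day=40,
                   treated_event_day=60)
print(result)
```

Both members contribute 30 days here, so follow-up stays balanced between the two groups.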

We could analyse each of these trials separately and pool the results, but it’s more efficient to pool the vaccinated and unvaccinated groups respectively and analyse as one big trial. Any remaining differences between the two groups after the matching can be dealt with using adjustment or weighting in the usual way.

Single trial approach

In the single trial approach, the continuous period of eligibility is a feature of the trial. Everybody is recruited at the start of the eligibility period, assigned to the unvaccinated group, and is under follow-up straight away. Each day, anyone who becomes vaccinated is switched from the unvaccinated group to the vaccinated group. This emulates a trial where, each day, people who remain in the unvaccinated group are randomly selected for vaccination.

Since everybody is under follow-up from the start, everybody has the same time-zero — the date on which the trial started.
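
One way to picture the data behind this design is as one row per person per day, with a vaccination flag that switches on from the day of vaccination. The sketch below assumes that layout; the function and field names are hypothetical.

```python
def person_days(vax_day, end_day):
    """One row per day of follow-up, from day 0 (time-zero) up to end_day.

    `vax_day` is the day of vaccination, or None if never vaccinated.
    """
    return [
        {"day": day, "vaccinated": vax_day is not None and day >= vax_day}
        for day in range(end_day)
    ]


# A person vaccinated on day 2 contributes two unvaccinated person-days and
# then two vaccinated person-days.
rows = person_days(vax_day=2, end_day=4)
for row in rows:
    print(row)
```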

Confounder adjustment in this approach relies on a model that estimates the probability of each person remaining unvaccinated each day given a set of (possibly time-varying) characteristics, and then weights person-time by the inverse of this probability each day. Models fitted to data weighted in this way are known as marginal structural models. They go further than simply including time-varying confounders in a regression model, which does not sufficiently account for the causal feedback that occurs with time-varying treatments (that is, treatment status over time influencing confounders, which in turn influence treatment status).
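
To make the weighting concrete, here is a minimal sketch of unstabilised inverse probability weights for one person. It assumes the daily probabilities of remaining unvaccinated have already been estimated by some model; the function name and the example probabilities are made up, and real analyses often use stabilised weights instead.

```python
def ip_weights(p_remain_unvax, vax_day):
    """Unstabilised inverse probability weights, one per day of follow-up.

    p_remain_unvax[d] is the modelled probability that the person remains
    unvaccinated on day d, given they were unvaccinated before day d.
    `vax_day` is the day they were actually vaccinated, or None.
    """
    weights = []
    cumulative_prob = 1.0
    for day, p in enumerate(p_remain_unvax):
        if vax_day is None or day < vax_day:
            cumulative_prob *= p          # observed: stayed unvaccinated
        elif day == vax_day:
            cumulative_prob *= 1.0 - p    # observed: vaccinated today
        # After vaccination the status cannot change, so the factor is 1.
        weights.append(1.0 / cumulative_prob)
    return weights


probs = [0.9, 0.8, 0.5]  # hypothetical model output
never_vaccinated = ip_weights(probs, vax_day=None)
vaccinated_day_1 = ip_weights(probs, vax_day=1)
print(never_vaccinated)
print(vaccinated_day_1)
```

Each weight is the inverse of the probability of the vaccination history that was actually observed up to that day. If the model says vaccination on a given day was very unlikely for someone who was in fact vaccinated then, `1.0 - p` is close to zero and that person's weight becomes very large.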

Which approach should you use?

The sequential trial approach envisages a sequence of trials where only a relatively small number of unvaccinated people are recruited on each day, 50% of whom are randomised to vaccination. The single trial approach envisages a trial where the entire trial population is recruited at the start and, on each subsequent day, a small proportion of those who are still unvaccinated are randomised to vaccination.

Each approach has pros and cons. The sequential trial approach is arguably easier to implement as it doesn’t require any modelling of time-dependent confounders, including time itself. However, lots of valid person-time is lost because unvaccinated people are only under follow-up once matched rather than as soon as eligible, and follow-up of each matched pair is censored if the unvaccinated person becomes vaccinated. Further, matching means many eligible people are excluded because there are no suitable matches for them. The single trial approach uses all eligible people and all their follow-up time, but depends on a probability-of-vaccination model that may be very difficult to estimate reliably. For example, people with COVID-19 symptoms or a recent positive SARS-CoV-2 test have a very low probability of vaccination, which can lead to very large inverse probability weights and unstable estimates. This approach also requires explicit modelling of time, typically with splines, which are vulnerable to under- or over-fitting.

Our paper goes into more detail, and presents the results of both approaches in people eligible for vaccination at the start of the UK vaccine programme in December 2020 using data from the OpenSAFELY-TPP database. All code is open and reusable so that other researchers interested in target trial emulation for vaccines (or other similar interventions) can readily adopt these methods in their own work.