Body mass index (BMI) has been identified as a risk factor for clinical outcomes in patients with COVID-19. Studies identifying this risk have used electronic health record (EHR) platforms in which clinical conditions must be properly identified. We set out to define and evaluate various methods of deriving BMI measurements in OpenSAFELY-TPP, an EHR platform that has been used in many studies relating to the COVID-19 pandemic.


With the approval of NHS England, we use routine clinical data from >22 million patients in England to define four derivations of BMI. We compare the number of patients with each type of BMI measurement and the number of measurements themselves. We also examine the plausibility of each derivation by looking at the distribution of measurements and counting values outside the expected range. To evaluate how frequently each BMI derivation is recorded, we track the number of new measurements recorded over time and the average time between updates in patients with multiple measurements.
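As an illustration of the plausibility check described above, the following minimal sketch computes BMI from height and weight and counts values outside an expected range. The function names and the bounds `BMI_MIN` and `BMI_MAX` are assumptions made for this example; the abstract does not state the range used in the study.

```python
from typing import Iterable, Optional

# Hypothetical plausibility bounds in kg/m^2 (assumed for illustration only;
# the study's actual expected range is not given in this abstract).
BMI_MIN = 4.0
BMI_MAX = 200.0


def calculated_bmi(weight_kg: Optional[float], height_m: Optional[float]) -> Optional[float]:
    """Return weight / height^2, or None if either input is missing or unusable."""
    if weight_kg is None or height_m is None or height_m <= 0:
        return None
    return weight_kg / height_m ** 2


def is_plausible(bmi: Optional[float], low: float = BMI_MIN, high: float = BMI_MAX) -> bool:
    """True when a BMI value is present and lies within [low, high]."""
    return bmi is not None and low <= bmi <= high


def count_out_of_range(values: Iterable[Optional[float]]) -> int:
    """Count recorded (non-missing) values that fall outside the plausible range."""
    return sum(1 for v in values if v is not None and not is_plausible(v))
```

A unit error in height (for example, centimetres entered where metres are expected) is one way such a calculation could produce an extreme value: `calculated_bmi(70, 175)` gives a BMI far below any plausible bound, and this check would count it as out of range.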


The primary constraints in creating an optimal BMI derivation are coverage, accuracy, and computational complexity. BMI derivations calculated from height and weight contain a few extreme outliers that affect aggregated statistics. SNOMED-recorded BMI values are more accurate on average and offer better coverage across the population. The canonical OpenSAFELY definition – which uses calculated BMI in the first instance and SNOMED-recorded BMI if that is missing – offers the best coverage, but contains the same extreme outliers found in calculated BMI and is the most computationally expensive of all methods.
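The fallback logic of the canonical definition can be sketched as follows. This is a simplified illustration, not the OpenSAFELY implementation: the function names and the toy cohort values are hypothetical, and each patient is reduced to a single (calculated, recorded) pair.

```python
from typing import List, Optional, Tuple


def derived_bmi(calculated: Optional[float], recorded: Optional[float]) -> Optional[float]:
    """Prefer the calculated BMI; fall back to the recorded BMI when it is missing."""
    return calculated if calculated is not None else recorded


def coverage(values: List[Optional[float]]) -> float:
    """Fraction of patients with a non-missing value."""
    if not values:
        return 0.0
    return sum(v is not None for v in values) / len(values)


# Toy cohort of (calculated BMI, SNOMED-recorded BMI) pairs; values are made up.
cohort: List[Tuple[Optional[float], Optional[float]]] = [
    (24.0, None),
    (None, 31.2),
    (None, None),
    (22.0, 22.1),
]

calculated_cov = coverage([c for c, _ in cohort])
recorded_cov = coverage([r for _, r in cohort])
derived_cov = coverage([derived_bmi(c, r) for c, r in cohort])
```

Because the fallback only fills gaps, the combined derivation can never cover fewer patients than either source alone. It also passes through any calculated value unchanged, which is why outliers in calculated BMI persist in the combined definition.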


Across all derivations, some cleaning should be performed to drop implausible outliers. Using calculated BMI on its own offers neither the best coverage nor the best accuracy. In choosing between SNOMED-recorded BMI and the current OpenSAFELY implementation, users should decide whether to prioritise computational efficiency or coverage.