Introduction to the Data Team | Bennett Institute for Applied Data Science

Hello, from the Bennett Institute’s Data Team!

In this short blog post, we’ll describe our mission, our backgrounds in industry and academia, and what we’re working on now.

What’s our mission?

The Data Team’s mission is to support high-quality epidemiology and service analytics by making it easy for researchers to work with the Electronic Health Records (EHR) data they need. To do so, we work closely with the Pipeline Team (who, like us, are part of the Bennett Institute’s tech group) as well as the Epidemiology and Service Analytics Teams.

How did we get here?

Before joining the Bennett Institute, we worked in industry and academia, at organisations ranging from small startups and large multinationals to universities. We have considerable experience of software development, data science, and data engineering, as well as research in fields from visual neuroscience, antimicrobial resistance, and psychotherapy to physical chemistry and data visualisation. We have contributed code to prominent open source projects, such as the Django web framework; we have helped organise conferences, such as PyCon UK and DjangoCon Europe; and we have presented at conferences, such as NHS-R.

What are we working on now?

At the moment, we’re working on three main areas: ehrQL, documentation, and tools for understanding the characteristics of EHR data.

ehrQL

ehrQL (pronounced err-kul), the Electronic Health Records Query Language, makes it easy for researchers to extract the EHR data they need. It allows a researcher to specify the variables they need, using the concepts they are familiar with, in a programming language they can reason about quickly and accurately. However, there’s more to ehrQL than EHR data! Working in close collaboration with our academic colleagues within and beyond the Bennett Institute, we have introduced two further abilities: generating dummy data, so a researcher can develop their analysis code without needing to access sensitive patient data; and connecting to several sources of EHR data, enabling truly federated analytics.
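To give a feel for the idea behind dummy data, here is a minimal sketch in plain Python. It is not ehrQL and does not use ehrQL's API: the function names, the column names, and the date range are all hypothetical, chosen only for illustration. It shows how a small table of made-up patients can be produced reproducibly, so analysis code can be written and tested without touching real records.

```python
import random
from datetime import date, timedelta


def random_date(rng, start, end):
    """Return a date chosen uniformly between start and end (inclusive)."""
    span_in_days = (end - start).days
    return start + timedelta(days=rng.randint(0, span_in_days))


def generate_dummy_patients(n, seed=0):
    """Build n made-up patient rows (hypothetical columns, not ehrQL's schema).

    A fixed seed makes the output reproducible, which is useful when the
    dummy data is used to test analysis code.
    """
    rng = random.Random(seed)
    rows = []
    for patient_id in range(1, n + 1):
        rows.append(
            {
                "patient_id": patient_id,
                "date_of_birth": random_date(rng, date(1930, 1, 1), date(2020, 12, 31)),
                "sex": rng.choice(["female", "male"]),
            }
        )
    return rows


if __name__ == "__main__":
    for row in generate_dummy_patients(3):
        print(row)
```

Because the rows are generated rather than extracted, nothing in them relates to a real patient; the trade-off is that dummy data only mimics the shape of real data, not its quirks.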

Documentation

An important principle of the Bennett Institute is that high-quality, open-access technical documentation improves the productivity of software developers and researchers. We’re constantly improving our technical documentation, from automatically checking for broken links to writing tutorials for ehrQL. You can read it all on docs.opensafely.org.
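As an illustration of what automated link checking can involve, here is a minimal sketch in plain Python. It is not the tooling used for docs.opensafely.org; the function name, the page paths, and the assumption that pages are Markdown files with relative links are all hypothetical. It checks only internal links, and skips external URLs rather than fetching them.

```python
import posixpath
import re

# Matches Markdown inline links such as [text](target) and captures the target.
LINK_PATTERN = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")


def find_broken_links(pages):
    """Return (page, target) pairs for internal links that point nowhere.

    `pages` maps each page's path (e.g. "ehrql/tutorial.md") to its
    Markdown text. Hypothetical helper, for illustration only.
    """
    broken = []
    for page, text in pages.items():
        for target in LINK_PATTERN.findall(text):
            # Skip external links and same-page anchors in this sketch.
            if "://" in target or target.startswith(("#", "mailto:")):
                continue
            # Drop any "#section" suffix, then resolve relative to the page.
            path = target.split("#", 1)[0]
            resolved = posixpath.normpath(
                posixpath.join(posixpath.dirname(page), path)
            )
            if resolved not in pages:
                broken.append((page, target))
    return broken


if __name__ == "__main__":
    example_pages = {
        "index.md": "See the [tutorial](ehrql/tutorial.md) and [FAQ](faq.md).",
        "ehrql/tutorial.md": "Back to [home](../index.md#top).",
    }
    print(find_broken_links(example_pages))
```

Run as part of a continuous integration job, a check like this can flag a broken link before the page is published.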

Developing tools for understanding EHR data

Another important principle of the Bennett Institute is that the tools we’ve developed for understanding EHR data aren’t secondary outputs, hidden in academic papers; they’re primary outputs, openly accessible. As well as ehrQL, we’re working on tools that help researchers understand EHR data – it has many unexpected characteristics! – whilst maintaining high standards of security and privacy. These tools move researchers away from ad hoc queries against sensitive patient data, towards reproducible queries that are subject to the auditing and disclosure controls provided by the OpenSAFELY platform.

We’re always keen to hear from people working with EHR data, writing documentation, and building tools to support researchers. If you’re interested in finding out more, then get in touch!