Getting the data for the Drug Tariff tool (*sigh*) | Bennett Institute for Applied Data Science

Today we’ve released our new Drug Tariff viewer. It was a dataset that we previously didn’t hold, but thought that it would be useful to turn into a graphical viewer for people to use to see how the reimbursement price on drugs changes over time, and also to use as the basis of the effect that price concessions are having on the NHS.

However, obtaining and normalising the data proved to be a bit of a challenge.

Finding the data

Finding the more recent data was not a problem; the NHS Business Services Authority (NHSBSA) has an archive for the last two years of the Drug Tariff.

To find earlier datasets we turned to the NHSBSA Freedom of Information (FoI) archive. This webpage is pretty frustrating as it doesn’t have a search function. Therefore we had to use Ctrl + F in Chrome to see if anybody has requested it before (and only the first two lines of the request are shown on the page). We eventually found a zip file related to this request, which gave us data from 2010 to March 2014 in tabbed Excel files (one for each year), and a zip file in this request which added in April 2014 to March 2015 in single files for each month.

This just left us the problem of April, May and June 2015. Although we could have requested the data via FoI, it would have likely taken 20 working days. Instead, after a bit of playing around, we discovered that some documents remain on the NHSBSA website - they’re just not listed anywhere that we could find. For June’s data, you can just change the URL for July’s data (https://www.nhsbsa.nhs.uk/sites/default/files/2017-06/drug-tariff-part-viiia-july-2015.xlsx) to https://www.nhsbsa.nhs.uk/sites/default/files/2017-06/drug-tariff-part-viiia-june-2015.xlsx and it will download.

Unfortunately that didn’t work for April and May 2015 data. We searched Wayback Machine, but still couldn’t find the xls files. Then we made a further discovery: if you then search Google using “may 2015 drug tariff site:nhsbsa.nhs.uk filetype:xlsx” the file appears (https://www.nhsbsa.nhs.uk/sites/default/files/2017-02/Part_VIIIA_May_2015.xlsx). The April data doesn’t show up in a Google search, BUT if you replace May for April in the URL it magically downloads.

This all feels a bit unnecessary - these are datasets which remain useful, are FOI-able, and aren’t very big (≃150kb/month), so it’s not clear why they are not made more easily available without needing to use Google tricks and URL editing.

Normalising the data

Once we had all the data downloads, we had to create a single usable table. And unfortunately, once again, it took a while to fix.

Firstly, the file names weren’t consistent, which made them more difficult to work with. We also had 49 downloaded files, some (from the earlier FOI) had whole years in one file, whereas later datasets were in single month files. Using an Excel add-in we grouped all the files into multiple tabbed workbooks, one for each year.

None of the worksheets had a date as a column, so we created a blank column and populated it with worksheet tab name, which was the month the data was for (changed where necessary as these were also inconsistent) using ’=MID(CELL(“filename”,A1),FIND("]",CELL(“filename”,A1))+1,255)’ . We then consolidated all the data into a single table, using another Excel add-in.

The main problem with the data was that the older datasets did not have a Dictionary of Medicines and Devices (dm+d) Virtual Medicinal Product Pack (VMPP) code. The VMPP code is specific to a drug, form, strength, and pack size, so aspirin dispersible tablets 75mg will have different VMPPs for 28 tablets and 100 tablets pack sizes. We therefore had to map these older datasets to a VMPP code manually.

All the datasets contained drug name, pack size and form, which we could use to map to VMPP. However, all were as per dm+d standards at the time they were created, which may no longer be current. In order to map the data we undertook the following:

By creating a concatenation of the drug name, pack size, and form, and creating a VLOOKUP to the latest month’s data, we were able to find most VMPP codes for older months. However, there were several that didn’t map.
Some of the unmapped drugs were because of name changes. For example “Ferrous sulphate 200mg tablets” in the older data sets had become “Ferrous sulfate 200mg tablets” - as the dm+d had changed in 2013. Once we’d normalised the names (using a simple search/replace in Excel) more drugs were matched.
We were left with a number of drugs that then required matching to the dm+d latest release. This was a manual check, and needed several amendments. Reasons included: changes to the order in which drug names are listed in combination products, changes in strength descriptors (e.g. percentage becoming mg/ml), and changes in formulation nomenclature.

Eventually (after some further tweaking) we mapped all of the Drug Tariff data to a VMPP code, allowing us to map the drugs over time.

In comparison, getting price concession data wasn’t complicated. Although it’s only available as a webpage, we were able to create a tool to scrape the data and map to dm+d codes, as the naming convention was the same. There are some differences that the tool will automatically translate (such as 0.5mg to 500 microgram), but where it still cannot find an exact match, it will provide us with a list of most likely drugs to choose from.

Undertaking analyses

We then created the visualisation of Drug Tariff and Price Concession data, like this:

But there was one final issue where readily available open-source data would make life easier. Currently the prescribing dataset from the NHSBSA which powers our main OpenPrescribing site doesn’t use VMPP (or any dm+d) codes. Instead it uses hierarchical BNF codes, based on the legacy BNF. Therefore without a map between the two we can’t undertake any analysis on our Drug Tariff or Price Concession datasets in relation to prescribing levels. There is a map on the NHS Digital TRUD site (login required), but this only goes down to chemical substance level, and therefore is not of use.

We currently manually request detailed mapping information (individual presentation BNF codes to VMPP) from the NHSBSA every month, and receive it via email. We have been told consistently that this table will be available online. Unfortunately it doesn’t appear to have been released yet.

What Can Government and the NHS do Better?

We think data is vitally important. It’s the meat and drink of the NHS. Teams like ours should be springing up throughout universities and NHS facilities, making useful open tools from sensibly open data, to improve efficiency and patient care.

We think government can do four things

Make it easy for data users to contact you, and give feedback on what data is needed, and how it could most effectively be shared.
Listen to users, especially if they are using your data to produce something useful, or have good plans to.
If a regularly updating dataset is being requested and used regularly, then consider routinely publishing it online.
Don’t delete old datasets from the internet, or change their web address.

This is just a brief overview. We have a paper coming on barriers we have met in producing the OpenPrescribing tools in general, which sets out various models for how the NHS could share data more effectively to help improve quality and reduce costs.

Despite the frustrations, the data has now been put into the tool, which we hope users find useful. As well as this, the Drug Tariff viewer page also gives you the option to download the entire normalised dataset - we’ve done the hard work so that you don’t have to!