When you produce online tools from data, you often get useful feedback that helps you improve the outputs. (Send us feedback any time!) Additionally, when you use data, you learn about interesting glitches in it, some of which can be entirely undocumented. Here we share one example of helpful feedback, and how we used it to improve our tool.

First some background. Trial reporting is a huge problem in medicine: the results of clinical trials are routinely withheld from doctors, researchers, and patients. We think all trials should be reported. The WHO agrees. A US law called the FDA Amendments Act requires some trials to report their results on ClinicalTrials.gov: this law has many loopholes, but it’s an important start. Since the results reporting requirements of FDAAA came into full force, our FDAAA TrialsTracker has been identifying the individual trials that have breached it. You can read our paper on how the tool works: we also blogged about our methods for identifying overdue trials. Staff in universities who manage trial reporting are already telling us that they find our tool useful.

A trial sponsor reached out to express concern about one of their trials, which we had marked as “overdue” under FDAAA. They argued that, while their trial definitely does meet the FDA’s pACT (“probable Applicable Clinical Trial”) criteria, it is not actually required to report results, because it only involved over-the-counter (OTC) medicines, and these are not considered to be “FDA-regulated” for the purposes of the Act. We think all trials should be reported: but for our FDAAA TrialsTracker we wanted to use the most conservative possible approach, to look only at the narrow issue of FDAAA compliance, and to accurately implement all its loopholes.

So, how could we best address this loophole? “FDA Regulated Drug” and “FDA Regulated Device” are required data elements that tell you which trials to drop on the basis of the intervention not being “FDA regulated”. We already use them to drop trials from the list of those due under FDAAA. But these data fields are only required for trials that began after the implementation of the FDAAA “Final Rule” in January 2017, and this trial started well before then. However, we happen to know (because we have pored over this data) that sponsors can complete these fields voluntarily and retrospectively, even for trials beginning before 2017. We therefore advised the sponsor to update their data on ClinicalTrials.gov, to confirm in the public data record that this trial was of a non-FDA-regulated intervention. They promptly did so: and, because our tracker updates every day, the trial was dropped from our tracker at the next update. This is great, because we didn’t want to take one email from one person at one sponsor as evidence that a trial fits into an FDAAA loophole: if the sponsor puts the data on ClinicalTrials.gov, then they’re making a formal legal statement, and it’s in the shared public dataset used by everyone.
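To make the rule concrete, here is a minimal Python sketch. It is an illustration only, not the tracker's actual code: the function name `stays_on_tracker` and the True/False/None encoding of the two registry fields are assumptions made for this example, as is the choice to treat a single “no” alongside a blank as enough to drop a trial.

```python
from typing import Optional


def stays_on_tracker(fda_regulated_drug: Optional[bool],
                     fda_regulated_device: Optional[bool]) -> bool:
    """Decide whether a trial stays on the list, using only the current fields.

    Each argument is True ("yes"), False ("no"), or None (field left blank,
    which is common for trials that started before January 2017).
    """
    # Any explicit "yes" means the intervention is FDA regulated: keep the trial.
    if fda_regulated_drug is True or fda_regulated_device is True:
        return True
    # Both fields blank: the sponsor has said nothing, so we cannot drop it.
    if fda_regulated_drug is None and fda_regulated_device is None:
        return True
    # At least one explicit "no" and no "yes": drop the trial (assumed rule).
    return False


# Before the sponsor's update, both fields were blank: the trial is listed.
print(stays_on_tracker(None, None))
# After the sponsor records "no" in both fields, the trial drops off.
print(stays_on_tracker(False, False))
```

The point of the sketch is that the decision depends only on what the sponsor has recorded in the public registry, which is why asking them to update their record was enough to resolve the issue.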

That’s a good ending. But we wanted to see if we could find a way to prevent this from happening again. In our conversations with the sponsor, they noted that ClinicalTrials.gov previously contained another data field on FDA-regulation status. We established that this data field was retrospectively deleted from the entire ClinicalTrials.gov database as of January 11, 2017: and, to be clear, we regard this as poor data stewardship by the team at ClinicalTrials.gov. We have our own archives, but we wanted one as close as possible to January 11: thanks to the folks at the Clinical Trials Transformation Initiative we obtained and processed an archived copy of the ClinicalTrials.gov database from January 5, 2017, and extracted the field.

This was not an easy piece of work, however: we had reason to suspect that this data would be messier than the current “FDA Regulated” data field. So we manually examined a sample to check if the data was usable. We found that some trials would be correctly removed from the FDAAA tracker database by using it. But we found some edge cases, and some trials where the “is_fda_regulated” field seems to be just wrong. On balance, we have decided to use this field to drop trials from the FDAAA tracker for now: this is consistent with our choice to be conservative, implement FDAAA’s loopholes, and only point the finger at trials which are required to report, even at the expense of excluding some trials which are required to report. Notably, we also found some trials where the old “is_fda_regulated” data field said “no”, but which also had data in the new “FDA regulated” data fields, saying “yes”, in frank contradiction: these trials stay on the tracker.
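The precedence between the old and new fields can be sketched in a few lines of Python. Again, this is a hedged illustration rather than our production code: the function name `should_drop`, the argument names, and the True/False/None encoding are all hypothetical. The one rule taken directly from the text above is that a “yes” in the new fields overrides a “no” in the archived “is_fda_regulated” field.

```python
from typing import Optional


def should_drop(new_drug: Optional[bool],
                new_device: Optional[bool],
                legacy_is_fda_regulated: Optional[bool]) -> bool:
    """Return True if a trial should be dropped as not FDA regulated.

    new_drug / new_device: the current "FDA Regulated Drug" and
    "FDA Regulated Device" fields (True, False, or None if blank).
    legacy_is_fda_regulated: the value recovered from the archived
    January 2017 copy of the registry (True, False, or None if absent).
    """
    # A "yes" in the new fields wins, even if the old field said "no".
    if new_drug is True or new_device is True:
        return False
    # An explicit "no" in the new fields (with no "yes") drops the trial.
    if new_drug is False or new_device is False:
        return True
    # New fields are blank: fall back on the archived field.
    return legacy_is_fda_regulated is False


# Old field says "no", new fields blank: dropped.
print(should_drop(None, None, False))
# Old field says "no", but a new field says "yes": stays on the tracker.
print(should_drop(True, None, False))
```

Falling back on the archived field only when the new fields are blank keeps the current, sponsor-maintained record as the primary source, and uses the messier old data only where nothing better exists.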

The updated code went live on the site within a day of us hearing of the issue. The changes we’ve discussed, as well as any other updates or clarifications, will be reflected in the next revision to our preprint paper on bioRxiv. So, what did we learn? Lots, as ever, through delivering live tools rather than academic papers alone! Although most of it was a reminder of what we already know. Here’s a list:

  1. Feedback is great!
  2. Public data can be messy (we think bodies like ClinicalTrials.gov should document errors more clearly; we will do more work on fixing up the old FDA Regulated field if we get more resource for our trackers program; you might also enjoy our paper on registry errors).
  3. Public data can be unhelpfully deleted (we think this could be handled better by ClinicalTrials.gov commenting on data fields, rather than deleting them outright).
  4. Getting organisations to correct their data at source is better than other parties correcting it as secondary users of the data (this issue comes up on OpenPrescribing.net sometimes too).

Cheers!