Over the past year, technology companies have made headlines claiming that their artificially intelligent (AI) products can outperform clinicians at diagnosing breast cancer,1 brain tumours,2 and diabetic retinopathy.3 Claims such as these have influenced policy makers, and AI now forms a key component of the national health strategies in England, the United States, and China.

It is positive to see healthcare systems embracing data analytics and machine learning. However, there are reasonable concerns about the efficacy, ethics, and safety of some commercial AI health solutions.4 5 Trust in AI applications (or apps) relies heavily on the myth of the objective and omniscient algorithm, and our systems for generating and implementing evidence have yet to meet the specific new challenges that AI presents. They may even have failed on the basics. In a linked article, Freeman and colleagues6 (doi:10.1136/bmj.m127) throw these general concerns into stark relief with a close examination of the evidence on diagnostic apps for skin cancer. Here we comment on the article's findings and their policy implications.