ORIGINAL RESEARCH article

Front. Artif. Intell.
Sec. Medicine and Public Health
doi: 10.3389/frai.2022.1059093

Statistical Biopsy: An Emerging Screening Approach for Early Detection of Cancers

  • 1 Bill and Melinda Gates Foundation, United States
  • 2 Yale University, United States
  • 3 Wright-Patterson Air Force Base, United States
  • 4 Sun Nuclear (United States), United States
  • 5 Physics, Florida Atlantic University, United States
  • 6 Yale Medicine, United States
Provisionally accepted:
The final, formatted version of the article will be published soon.

Despite large investment, cancer continues to be a major source of mortality and morbidity throughout the world. Traditional methods of detection and diagnosis, such as biopsy and imaging, tend to be expensive and carry risks of complications. As data become more abundant and machine learning continues to advance, it is natural to ask how they can help solve some of these problems. In this paper, we show that it is possible to predict a person’s risk for a wide variety of cancers from their personal health data. We dub this process a “statistical biopsy”.

Specifically, we trained two neural networks, one predicting risk for 16 different cancer types in females and the other predicting risk for 15 different cancer types in males. The networks were trained as binary classifiers to identify individuals who were diagnosed with each cancer type within five years of joining the PLCO trial. However, rather than use the binary output of the classifiers, we show that the continuous output can instead be used as a cancer risk score, allowing a holistic look at an individual's cancer risks. We tested our multi-cancer model on the UK Biobank dataset, showing that for most cancers the predictions generalized well and that assessing multiple cancer risks at once from personal health data is feasible.
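
To illustrate the difference between the binary output of a classifier and its continuous output used as a risk score, consider the minimal sketch below. It is not the authors' model: a single logistic unit per cancer type stands in for the neural networks, and the feature values, weights, biases, and cancer names are hypothetical numbers chosen only for illustration.

```python
import math


def sigmoid(z):
    """Map a real-valued score to the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def risk_scores(features, weights, biases):
    """Return one continuous score per cancer type for a single person.

    A single logistic unit per cancer type is an assumption made to keep
    the sketch short; it stands in for a trained neural network.
    """
    scores = {}
    for cancer, w in weights.items():
        z = biases[cancer] + sum(wi * xi for wi, xi in zip(w, features))
        scores[cancer] = sigmoid(z)
    return scores


def binary_labels(scores, threshold=0.5):
    """Collapse continuous scores to yes/no predictions at a threshold."""
    return {cancer: s >= threshold for cancer, s in scores.items()}


# Hypothetical parameters and inputs, not taken from the paper.
weights = {"lung": [0.8, 1.2], "colorectal": [0.5, 0.1]}
biases = {"lung": -3.0, "colorectal": -2.5}
person = [1.0, 1.0]  # e.g. two scaled health features

scores = risk_scores(person, weights, biases)
labels = binary_labels(scores)

# Ranking by continuous score keeps information the binary labels discard.
ranked = sorted(scores, key=scores.get, reverse=True)
print(labels)
print(ranked)
```

In this toy case both binary labels are negative, so the thresholded output cannot distinguish the two cancers, while the continuous scores still order them. That ordering is the kind of side-by-side view of several cancer risks that the continuous output makes possible.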

While the statistical biopsy will not be able to replace traditional biopsies for diagnosing cancers, we hope for a paradigm shift in how statistical models are used in cancer detection, toward something more powerful and more personalized than general population screening guidelines.

Keywords: cancer screening, machine learning and AI, neural network, biopsy, data mining, cancer detection, individualized medicine

Received: 30 Sep 2022; Accepted: 14 Dec 2022.

Copyright: © 2022 Hart, Yan, Nartowt, Roffman, Stark, Muhammad and Deng. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: Prof. Jun Deng, Yale Medicine, New Haven, 06510, Connecticut, United States