Detecting features to predict cancer spread from biopsy slides

As you might have noticed, we have published the world’s largest open data set of cancer biopsy images - namely the Breast Cancer Biopsy Data Set contributed by Providence St. Joseph Health with support from the Gordon and Betty Moore Foundation.

It currently covers 4,200 patients (~140 TB) and will be expanded to include slides and pertinent clinical information for 11,000 patients (>300 TB). We have successfully conducted three machine learning challenges, including one with the support of NIH AIM-AHEAD to improve health equity.

The next step in making these ML models useful in medicine is to have them report the features they find. They might find features that are already described in clinical medicine, such as lymphovascular invasion. Alternatively, they might find new features, as in this recent paper by L’Imperio et al. (2023), where the ML model discovered a novel ‘Tumor Adipose Factor’ that was subsequently validated by pathologists.

I want to pose two open questions to clinicians and researchers at all levels of expertise:

  1. Which features accepted in current standards of care are pertinent to predicting spread for each type of breast cancer?

  2. What are some novel features that are being hypothesized as important predictors of spread?