How foundation models may finally enable AI to transform radiology

Benjamin Belot
Kurma Partners
7 min read · Oct 27, 2023


Back in 2016, the famous computer scientist, AI expert and Turing Award recipient Geoffrey Hinton urged the world to “stop training radiologists”. As algorithms became more and more accurate, it seemed like AI could solve the well-documented global shortage of radiologists. Fast forward a few years, and it’s fair to say that despite the hopes and the hype, AI has had a somewhat limited impact on the practice of radiology.

With the advent of large language models (LLMs), we could be ushering in a new era for AI in healthcare. Early use cases are already being identified, especially around medical documentation and the back-office workflows of care (medical coding, revenue cycle management…). These use cases have an exciting potential to ease the lives of healthcare providers, who dedicate their time and energy to helping patients yet are growing more and more fatigued.

AI solutions improving the delivery of care itself are admittedly more challenging to develop, but they might be even more impactful. In this article, we will focus on radiology, looking into the main challenges that hindered AI adoption in the first place and how advances in model architectures and training methods may significantly move the needle.

AI in Radiology: why has adoption been limited thus far?

As of January 2023, 396 of the 520 AI algorithms approved by the FDA pertained to radiology, making it a key area of focus for AI companies.

FDA-approved AI-based medical devices recap (by the Medical Futurist)

AI promised to revolutionize medical imaging, offering the potential for faster, more accurate, and cost-effective disease detection. Technologically, it achieved significant progress through deep learning, with some algorithms surpassing clinical experts in accuracy on specific tasks (example).

…so what happened?

Below, we list some of the key hurdles we have identified.

Let’s first dig into what may be called the tunnel vision trap. Until now, models have typically been trained through supervised learning, using large amounts of (well) labeled data to learn to recognize patterns and perform classification tasks. Such data is particularly hard to come by in healthcare: it is both costly and time-consuming to have new data labeled by experts, who are themselves in short supply. Furthermore, this task-specific approach created inflexible models that are limited to predefined tasks and cannot adapt to other tasks without costly retraining.

The result is a collection of algorithms each detecting a single ailment on a single imaging modality, making them impractical for radiologists to include in their workflow. Indeed, radiologists must be able to identify dozens of abnormalities in their daily practice, and they may be looking for more than one on a given image.

Furthermore, radiologists don’t reach conclusions on images alone, also incorporating lab test reports, patient history… into their analysis. Traditional AI models struggled to handle this multifaceted information, failing to capture all the complexities of the job.

Another shortfall of supervision-based algorithms is their limited generalization potential. Many early AI models were designed to perform well on specific datasets or tasks but struggled to generalize to new or diverse cases, whereas radiology demands a high level of generalization given the wide variety of patient conditions, workflows and imaging devices in service.

Challenges related to interpretability and trust should also be mentioned. Radiologists require transparency in AI decision-making. The “black box” nature of deep learning AI models has hindered their acceptance, as healthcare professionals often need to understand the rationale behind AI-generated insights.

All of the above cast a shadow of doubt on the real benefits of AI for radiology, even prompting some practitioners to declare that it was adding to their already heavy workload!

LLMs: how can they move the needle?

Over the past few years, significant advances have been made in machine learning, particularly in model architecture and learning methods. First described in a Google paper from 2017, transformers are a generation of neural networks capable of identifying subtle patterns within a dataset, notably between elements that may be sequentially far from each other (e.g. words in a sentence).
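To make the mechanism concrete, here is a minimal sketch of scaled dot-product attention, the core operation inside transformers. It is a toy numpy illustration (not any specific medical model): every position in a sequence computes similarity scores against every other position, so even distant elements can interact directly in one step.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Each row of Q attends to all rows of K, so two tokens that are far
    apart in the sequence still influence each other in a single step."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                          # pairwise similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)  # softmax over each row
    return weights @ V, weights

# Toy "sequence" of 5 tokens, each embedded in 4 dimensions
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 4))
out, w = scaled_dot_product_attention(X, X, X)  # self-attention: Q = K = V
```

Each row of `w` is a probability distribution over the whole sequence, which is exactly why transformers capture long-range relationships that sliding-window convolutional models struggle with.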

The use of transformers paved the way to train models on large amounts of unlabeled data using self-supervised learning (SSL). As a result, models learn a richer representation of the environment and can then be adapted or fine-tuned to a wide range of downstream tasks.

In other words, the so-called “foundation model” can then be trained for specific tasks in a much more agile way, requiring smaller amounts of well-labeled data, just as ChatGPT can perform summaries, translate text, generate new content… Down the line, the opportunity is for a single foundation model to replace many task-specific models.
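The economics of this "pretrain once, adapt cheaply" pattern can be sketched in a few lines. In the hypothetical toy below, `frozen_encoder` stands in for a large pretrained foundation model whose weights stay untouched; only a tiny logistic-regression head is trained on a small labeled dataset, which is the point of fine-tuning with limited annotations.

```python
import numpy as np

rng = np.random.default_rng(42)

def frozen_encoder(x):
    """Stand-in for a pretrained foundation model: maps raw inputs to
    richer embeddings. In reality this would be a large transformer
    whose weights are reused as-is across downstream tasks."""
    W = np.linspace(-1, 1, x.shape[-1] * 8).reshape(x.shape[-1], 8)
    return np.tanh(x @ W)

# Small labeled dataset for one downstream task
X = rng.normal(size=(64, 16))
y = (X[:, 0] > 0).astype(float)

Z = frozen_encoder(X)              # embeddings; encoder weights never change
w, b = np.zeros(Z.shape[1]), 0.0   # the only trainable parameters: a tiny head

for _ in range(500):               # plain gradient descent on logistic loss
    p = 1 / (1 + np.exp(-(Z @ w + b)))
    grad = p - y
    w -= 0.1 * Z.T @ grad / len(y)
    b -= 0.1 * grad.mean()
```

Swapping in a different head (or a different small labeled set) adapts the same frozen encoder to another task, which is how one foundation model can replace a fleet of task-specific ones.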

Some early use cases of generative AI in healthcare are presented below. As you can see, they mostly address documentation and back-office workflows of healthcare. It’s also noticeable that the competition is already becoming fierce.

Mapping of front & back office generative AI solutions in Healthcare (by Sequoia Capital)

In terms of performance, it’s interesting to note that SSL models have outperformed their supervised counterparts on many tasks (example), despite being fine-tuned with smaller amounts of data.

Transformers also tend to fare much better when exposed to new data from different domains. As they excel at learning patterns and relationships from extensive and varied datasets, they can be pre-trained on vast medical databases, ensuring robust generalization across different patient demographics and imaging technologies, a robustness that is essential for real-world clinical applications.

Finally, on the topic of transparency and explainability: LLMs and transformers can incorporate natural language processing (NLP) to analyze and generate textual explanations for their predictions. This feature enhances interpretability, allowing radiologists and healthcare professionals to understand and trust AI-generated recommendations.

Of course, we must still remain cautious: biases, data privacy and hallucinations (i.e. the tendency of these models to overconfidently generate false information) are some of the serious challenges that will need to be addressed. But the potential to fix some of the shortcomings of previous AI models is definitely there.

Multimodal AI: the next frontier in radiology

While the first LLMs typically focused on text, the latest generations are multimodal, i.e. equipped with “extra sensory skills” to also process images, audio, videos… thus bringing them closer to human intelligence.

This is particularly interesting in the context of medicine: an inherently multimodal discipline. When providing care, HCPs typically analyze data from a variety of modalities, such as medical images, clinical notes, blood tests & more. All of these data are combined in clinical decision making: for example a brain lesion may be difficult to characterize without any context and based on imaging features alone. If we want to bring true AI assistants (i.e. copilots) for doctors, they also need to be multimodal.
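One common way to combine such heterogeneous inputs is late fusion: encode each modality separately, then concatenate the embeddings into a single vector for a downstream model. The sketch below is purely illustrative; the three encoders are hypothetical stand-ins for the large per-modality models a real system would use.

```python
import numpy as np

rng = np.random.default_rng(7)

def encode_image(pixels):
    """Stand-in for an imaging encoder (e.g. a vision transformer)."""
    return pixels.reshape(-1)[:32] / 255.0

def encode_text(note):
    """Stand-in for a clinical-notes text encoder."""
    vec = np.zeros(32)
    for i, ch in enumerate(note.encode()[:32]):
        vec[i] = ch / 255.0
    return vec

def encode_labs(values):
    """Stand-in for structured lab results, crudely normalized."""
    v = np.zeros(32)
    v[:len(values)] = np.asarray(values) / 100.0
    return v

def fuse(*embeddings):
    """Late fusion: concatenate per-modality embeddings into one vector
    that a downstream classifier or copilot could consume."""
    return np.concatenate(embeddings)

scan = rng.integers(0, 256, size=(8, 8))       # toy 8x8 "image"
fused = fuse(encode_image(scan),
             encode_text("small hyperintense lesion, left frontal lobe"),
             encode_labs([4.5, 13.2, 41.0]))
```

Real multimodal models go further than concatenation (e.g. cross-attention between modalities), but the principle is the same: the decision is made on the joint representation, not on the image alone.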

As the mighty Eric Topol put it, “the big shift ahead is the ability to transcend narrow, unimodal tasks, confined to images, and broaden machine capabilities to include text and speech, encompassing all input modes, setting the foundation for multimodal AI”. Hence the recent interest in models such as Med-PaLM M, Med-Flamingo, RETFound or GPT-4V(ision).

Med-Palm M: multimodal medical LLM for healthcare produced by Google

Business-wise, there is also an opportunity here, in a context where everybody is wondering who will extract value from the generative AI value chain (besides Nvidia, that is…) versus who will be gobbled up by big tech or AI incumbents (this category is so underrated!).

Large models from big tech players are generalist by design and hence have not been exposed to large amounts of healthcare-specific data. Unsurprisingly, they tend to perform less well than the specialized models mentioned above. Multimodal AI opens up the opportunity to build deeper data moats as well as vertical, SaaS-style AI products that better meet the needs and fit within the workflows of healthcare users.

At Kurma Partners, we believe in the potential for a second wave of “AI for radiology” companies, leveraging the latest technology developments to completely revamp the approach, from model training to workflow integration.

Rather than incremental steps, the approach might entail rebuilding an AI-ready multimodal viewer from the ground up, one that can be interacted with through prompts (just like ChatGPT) and works across imaging modalities (MRI, CT…)… and more (clinical notes, biology test reports…). Conceptually, this “ground up” approach is similar to what companies developing AI for drug discovery are trying to do: turning the tables from an approach based on screening millions of compounds to see which one sticks to a proactive design approach.

There are foundational (pun intended) generative AI-native companies to be built in healthcare and we are seeing more and more emerge from stealth, one of them being our own portfolio company Raidium. Make sure to check it out if the thesis developed in this article resonates.

A view from the Nvidia headquarters in Santa Clara



Partner at Kurma Partners, investing in early-stage healthtech & techbio across Europe. Passionate about healthcare, geeky about music, emotional about football