3 Best Chest X-Ray Analysis Tools Using Deep Learning I've Found

Guillaume Georges
Dec 21, 2024

Updated: Dec 29, 2024

Is deep learning good enough to diagnose pneumonia or tuberculosis from a patient's X-ray?

I explore the cutting-edge role of deep learning in chest X-ray analysis and its potential for healthcare organizations. I’ve reviewed leading scientific studies and examined the most-used commercial tools currently on the market. I will discuss the strengths and limitations researchers have identified in these tools and datasets, and explain how deep learning for X-ray analysis can be a practical solution for healthcare professionals.

If you'd like a more extensive list of commercial X-ray tools available on the market, I recommend the Radiology Health AI Register, where you can check the EU certifications and FDA compliance of these software solutions.

AI-Rad Companion Chest X-Ray

AI-Rad specializes in supporting radiologists in their diagnoses. Siemens Healthineers states that it can identify "pulmonary lesions, pleural effusion, pneumothorax, consolidation and atelectasis."

In a use case I found and linked below, AI-Rad flagged a pulmonary lesion on a young patient's X-ray that, according to the author, had gone unnoticed. A follow-up CT scan confirmed the lesion.

AI technology highlights a pulmonary lesion on a lung radiograph. Source video: https://youtu.be/mb2D4_FhsTw?si=_O3KQOlj1HuqTKRw

In a research paper available through the U.S. National Library of Medicine, Julius H. Niehoff, a radiologist at Johannes Wesling University Hospital in Germany, reviewed AI-Rad with very encouraging results. AI-Rad demonstrated high accuracy in detecting lung lesions, which were later confirmed by CT scans. Niehoff (2023) stated: "Radiologists verifying their own negative findings in chest radiographs by considering the evaluation of AI-Rad may gain higher diagnostic confidence, leading to faster reporting."

Example of a lung lesion detected by the AI Rad Companion Chest X-ray. Image A is the original X-Ray, B is the AI detection, and C is the CT scan confirming the lesion. Source: https://pmc.ncbi.nlm.nih.gov/articles/PMC9985819/

Lunit INSIGHT CXR

Lunit INSIGHT CXR was evaluated in a study published in the journal Quantitative Imaging in Medicine and Surgery, where it ranked among the top three deep learning tools in evaluations by radiologists. It is also one of the most peer-reviewed tools I have found.

In a 2024 study by Arzamasov K., the model achieved 92% accuracy in identifying the probability of pathology in X-ray images and 78% accuracy when evaluated by experts, one of the best results among all the models assessed. While these results are very encouraging, they show that AI outputs still require confirmation by expert radiologists and do not replace their expertise.

Annalise Enterprise CXR

This software differs from the other programs I listed by providing lateral X-ray analysis, which scientists have noted as missing from other deep learning tools. It meets CE standards for sale in Europe and FDA standards for sale in the US. Several peer-reviewed studies have evaluated its performance, and the model has been found to perform well for airspace disease, pneumothorax, and pleural effusion (Plesner L.L., 2023).

Pitfalls of Deep Learning in X-Ray Analysis

While deep learning appears to be a highly effective tool, it is important to note that its performance depends heavily on training with thousands, or even millions, of images. Realistically, most hospitals simply do not have access to datasets of that magnitude. As Mazurowski (2018) points out, "The datasets for medical images are typically much smaller, with a typical number of patients in the hundreds range."

To address this limitation, deep learning engineers use a technique called Data Augmentation, which generates additional training images by transforming the originals. For example, we can flip, rotate, adjust contrast, blur parts of the image, or introduce imperfections. These transformations increase both the diversity and quantity of training images, enabling the model to learn from a larger and more varied dataset.
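To make the transformations above concrete, here is a minimal sketch in plain Python. It is an illustration under my own assumptions, not the pipeline of any tool reviewed here: the image is a small list of pixel rows with values between 0 and 1, the function name `augment` is hypothetical, and the flip probability, contrast range, and noise level are arbitrary choices. Real projects typically use an image library and smaller rotation angles.

```python
import random

def augment(image, rng):
    """Return one randomly transformed copy of a grayscale image.

    `image` is a list of rows of floats in [0, 1] (an assumed toy format).
    """
    out = [row[:] for row in image]
    # Random horizontal flip: mirror each row.
    if rng.random() < 0.5:
        out = [row[::-1] for row in out]
    # Random rotation by 0, 90, 180 or 270 degrees clockwise.
    for _ in range(rng.randrange(4)):
        out = [list(col) for col in zip(*out[::-1])]
    # Contrast change: scale each pixel's distance from the mean intensity.
    pixels = [p for row in out for p in row]
    mean = sum(pixels) / len(pixels)
    factor = rng.uniform(0.8, 1.2)
    # Small Gaussian noise stands in for acquisition imperfections.
    out = [[(p - mean) * factor + mean + rng.gauss(0.0, 0.02) for p in row]
           for row in out]
    # Clamp every pixel back into the valid [0, 1] range.
    return [[min(1.0, max(0.0, p)) for p in row] for row in out]

rng = random.Random(0)
# Placeholder 8x8 "radiograph" filled with random values.
xray = [[rng.random() for _ in range(8)] for _ in range(8)]
augmented = [augment(xray, rng) for _ in range(5)]
print(len(augmented), "augmented copies from one original")
```

One caveat worth discussing with a radiologist: flipping a chest X-ray horizontally swaps left and right, so it may not be appropriate for findings where laterality matters.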

Deep Learning Bias

Another significant challenge in deep learning for X-ray analysis is bias. These models can perform exceptionally well in certain scenarios, such as analyzing adult X-rays, giving the illusion that the model is robust and ready for deployment. However, when faced with different scenarios, like analyzing pediatric X-rays, the model may fail due to insufficient training data for that specific group.
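One simple way to surface this kind of bias is to report accuracy per patient group instead of a single overall number. The sketch below assumes each test result is stored as a (group, true label, predicted label) tuple; the function name `accuracy_by_group` and the records are made up for illustration and do not come from any study cited here.

```python
from collections import defaultdict

def accuracy_by_group(records):
    """Compute accuracy separately for each patient group.

    `records` is an iterable of (group, true_label, predicted_label).
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, truth, predicted in records:
        total[group] += 1
        correct[group] += int(truth == predicted)
    return {group: correct[group] / total[group] for group in total}

# Made-up predictions, for illustration only.
records = [
    ("adult", "pneumonia", "pneumonia"),
    ("adult", "normal", "normal"),
    ("adult", "normal", "normal"),
    ("adult", "pneumonia", "normal"),
    ("pediatric", "pneumonia", "normal"),
    ("pediatric", "normal", "normal"),
]
per_group = accuracy_by_group(records)
print(per_group)  # {'adult': 0.75, 'pediatric': 0.5}
```

A large gap between groups is a signal to collect more data for the weaker group before deployment.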

In such cases, it is crucial for medical experts to work closely with data engineers to anticipate and mitigate these issues. How can this be achieved? Start by mapping the needs of the deep learning tools you are developing and the scenarios in which they will be used. Analyze your dataset of images and collaborate with the data engineer to review the labels assigned to the images in your original dataset. Categorize these images based on their labels, and identify both under-represented and over-represented categories.
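The label review described above can start with a simple count. This sketch compares each category's count with the count it would have in a perfectly balanced dataset. The function name `representation_report` and the thresholds (below half or above twice the balanced count) are my own assumptions and should be tuned with the medical team; the label counts are invented.

```python
from collections import Counter

def representation_report(labels, low=0.5, high=2.0):
    """Flag categories that are under- or over-represented.

    `low` and `high` are assumed thresholds on the ratio between a
    category's count and the count it would have if all were balanced.
    """
    counts = Counter(labels)
    balanced_count = len(labels) / len(counts)
    report = {}
    for label, n in counts.items():
        ratio = n / balanced_count
        if ratio < low:
            status = "under-represented"
        elif ratio > high:
            status = "over-represented"
        else:
            status = "balanced"
        report[label] = (n, status)
    return report

# Invented label counts, for illustration only.
labels = ["normal"] * 70 + ["pneumonia"] * 25 + ["tuberculosis"] * 5
for label, (n, status) in representation_report(labels).items():
    print(f"{label}: {n} images, {status}")
```

The same count can be repeated per patient group (for example adult versus pediatric) to find gaps that an overall tally would hide.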

Concept of Explainable AI

Another important consideration for deep learning researchers is explainable AI. Mazurowski (2018) notes that "systems that produce classification labels without providing any reasoning behind their predictions raise concerns about trustworthiness among radiologists". Experts are more likely to trust and efficiently evaluate AI findings if there is an explanation or visualization of how the model reached its conclusions. For example, identifying the location of a nodule with a bounding box or measuring cardiac and thoracic diameters for detecting cardiomegaly can significantly enhance trust.
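The cardiomegaly example is easy to make explicit, because the measurement itself is the explanation. The cardiothoracic ratio (CTR) divides the widest cardiac diameter by the widest thoracic diameter, and a value above 0.5 on a standard frontal radiograph is the conventional cut-off for suspecting an enlarged heart. The sketch below assumes the two diameters have already been measured, by a model or by hand; the function names and the sample values are hypothetical.

```python
def cardiothoracic_ratio(cardiac_width_mm, thoracic_width_mm):
    """Ratio of the widest cardiac diameter to the widest thoracic diameter."""
    if thoracic_width_mm <= 0:
        raise ValueError("thoracic width must be positive")
    return cardiac_width_mm / thoracic_width_mm

def explain_ctr(cardiac_width_mm, thoracic_width_mm, threshold=0.5):
    """Return a human-readable explanation that shows the measurements used."""
    ctr = cardiothoracic_ratio(cardiac_width_mm, thoracic_width_mm)
    finding = "suggests cardiomegaly" if ctr > threshold else "within the usual range"
    return (f"CTR = {cardiac_width_mm:.0f} mm / {thoracic_width_mm:.0f} mm "
            f"= {ctr:.2f} ({finding})")

# Hypothetical measurements, for illustration only.
print(explain_ctr(160, 300))  # CTR = 160 mm / 300 mm = 0.53 (suggests cardiomegaly)
```

A radiologist can check both diameters on the image and redo the division, which is much easier to trust than a bare "cardiomegaly: 87%" label.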

However, a major challenge lies in the fact that, during training, we do not fully understand the reasoning process behind the model's decisions. It would be highly beneficial if models could explain their predictions by indicating, for example, "I identified this lesion because it is similar to X number of images in my training set that had the same form." While models may provide a certainty score or confidence level, they often fail to explain how they arrived at their conclusions, leaving a gap in transparency and trustworthiness.
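One way to approximate the "similar to images in my training set" explanation is to retrieve the training cases whose feature vectors are closest to the new image. The sketch below is only an illustration of that idea, not a description of how any tool in this article works. It assumes each image has already been reduced to a short feature vector by a model; the case IDs, vectors, and labels are invented, and the function names are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def most_similar_cases(query, training_set, k=3):
    """Return the k training cases most similar to the query vector.

    `training_set` holds (case_id, feature_vector, label) tuples.
    """
    scored = [(cosine_similarity(query, features), case_id, label)
              for case_id, features, label in training_set]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:k]

# Invented feature vectors and labels, for illustration only.
training_set = [
    ("case-001", [0.9, 0.1, 0.0], "lesion"),
    ("case-002", [0.8, 0.2, 0.1], "lesion"),
    ("case-003", [0.1, 0.9, 0.2], "normal"),
    ("case-004", [0.0, 0.2, 0.9], "effusion"),
]
query = [0.85, 0.15, 0.05]
neighbors = most_similar_cases(query, training_set, k=2)
for score, case_id, label in neighbors:
    print(f"{case_id} ({label}), similarity {score:.3f}")
```

Showing the retrieved cases next to the prediction gives the radiologist something concrete to inspect alongside the confidence score.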