Loss Design and Architecture Selection for Long-Tailed Multi-Label Chest X-Ray Classification
- URL: http://arxiv.org/abs/2603.02294v1
- Date: Mon, 02 Mar 2026 17:33:00 GMT
- Title: Loss Design and Architecture Selection for Long-Tailed Multi-Label Chest X-Ray Classification
- Authors: Nikhileswara Rao Sulake,
- Abstract summary: Long-tailed class distributions pose a significant challenge for multi-label chest X-ray classification. We present a systematic empirical evaluation of loss functions, CNN backbone architectures, and post-training strategies on the CXR-LT 2026 benchmark. Our experiments demonstrate that LDAM with deferred re-weighting consistently outperforms standard BCE and asymmetric losses for rare-class recognition.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Long-tailed class distributions pose a significant challenge for multi-label chest X-ray (CXR) classification, where rare but clinically important findings are severely underrepresented. In this work, we present a systematic empirical evaluation of loss functions, CNN backbone architectures, and post-training strategies on the CXR-LT 2026 benchmark, comprising approximately 143K images with 30 disease labels from PadChest. Our experiments demonstrate that LDAM with deferred re-weighting (LDAM-DRW) consistently outperforms standard BCE and asymmetric losses for rare-class recognition. Amongst the architectures evaluated, ConvNeXt-Large achieves the best single-model performance with 0.5220 mAP and 0.3765 F1 on our development set, whilst classifier re-training and test-time augmentation further improve ranking metrics. On the official test leaderboard, our submission achieved 0.3950 mAP, ranking 5th amongst all 68 participating teams with a total of 1528 submissions. We provide a candid analysis of the development-to-test performance gap and discuss practical insights for handling class imbalance in clinical imaging settings. Code is available at https://github.com/Nikhil-Rao20/Long_Tail.
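The LDAM-with-deferred-re-weighting recipe named in the abstract can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions: the margin scaling constant, the effective-number `beta`, and the multi-label sigmoid adaptation are choices of this sketch, not details taken from the paper.

```python
import math

def ldam_margins(class_counts, max_margin=0.5):
    # LDAM assigns each class j a margin proportional to n_j^(-1/4), so
    # rare classes get larger margins. The scaling so that the largest
    # margin equals max_margin is an assumption of this sketch.
    raw = [n ** -0.25 for n in class_counts]
    scale = max_margin / max(raw)
    return [scale * r for r in raw]

def drw_weights(class_counts, beta=0.9999):
    # "Effective number" class re-weighting, applied only after a deferred
    # epoch in the DRW schedule; normalized here to mean 1.
    w = [(1.0 - beta) / (1.0 - beta ** n) for n in class_counts]
    mean = sum(w) / len(w)
    return [x / mean for x in w]

def ldam_bce_loss(logits, targets, margins, weights=None):
    # Multi-label adaptation (assumed): subtract the class margin from the
    # logit of each positive label before sigmoid cross-entropy.
    loss = 0.0
    for j, (z, y) in enumerate(zip(logits, targets)):
        zm = z - margins[j] if y == 1 else z
        p = 1.0 / (1.0 + math.exp(-zm))
        term = -(y * math.log(p + 1e-12) + (1 - y) * math.log(1 - p + 1e-12))
        if weights is not None:
            term *= weights[j]
        loss += term
    return loss / len(logits)
```

With counts of 1000 vs. 10, the rare class receives both the larger margin and the larger DRW weight, which is the intended pressure toward tail-class recognition.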
Related papers
- Overview of the CXR-LT 2026 Challenge: Multi-Center Long-Tailed and Zero Shot Chest X-ray Classification [14.263392973355666]
We present the CXR-LT 2026 challenge. This third iteration of the benchmark introduces a multi-center dataset comprising over 145,000 images from the PadChest and NIH Chest X-ray datasets. The challenge defines two core tasks: (1) Robust Multi-Label Classification on 30 known classes and (2) Open-World Generalization to 6 unseen (out-of-distribution) rare disease classes. We report the results of the top-performing teams, evaluating them via mean Average Precision (mAP), AUROC, and F1-score.
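The mAP ranking metric mentioned above can be illustrated with a short generic sketch: per-class average precision, averaged over classes. This is a standard textbook formulation, not the challenge's official evaluation code.

```python
def average_precision(scores, labels):
    # AP for one class: mean of precision@k over the ranks k at which a
    # positive example occurs, with examples sorted by descending score.
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    hits, total, ap = 0, sum(labels), 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            ap += hits / rank
    return ap / total if total else 0.0

def mean_average_precision(score_matrix, label_matrix):
    # mAP over a multi-label problem: average the per-class AP
    # (rows are examples, columns are classes).
    n_classes = len(score_matrix[0])
    aps = [average_precision([row[c] for row in score_matrix],
                             [row[c] for row in label_matrix])
           for c in range(n_classes)]
    return sum(aps) / n_classes
```

For example, a class ranking of [1, 0, 1, 0] by score yields AP = (1/1 + 2/3)/2 = 5/6.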
arXiv Detail & Related papers (2026-02-25T16:39:21Z) - Handling Supervision Scarcity in Chest X-ray Classification: Long-Tailed and Zero-Shot Learning [14.888577410967129]
The CXR-LT 2026 challenge addresses these issues on a PadChest-based benchmark with a 36-class label space split into 30 in-distribution classes for training and 6 out-of-distribution classes for zero-shot evaluation. We present task-specific solutions tailored to the distinct supervision regimes. For Task 1 (long-tailed multi-label classification), we adopt an imbalance-aware multi-label learning strategy to improve recognition of tail classes while maintaining stable performance on frequent findings. For Task 2 (zero-shot OOD recognition), we propose a prediction approach that produces scores for unseen disease categories without using any supervised labels.
arXiv Detail & Related papers (2026-02-13T20:07:34Z) - FUGC: Benchmarking Semi-Supervised Learning Methods for Cervical Segmentation [63.7829089874007]
This paper introduces the Fetal Ultrasound Grand Challenge (FUGC), the first benchmark for semi-supervised learning in cervical segmentation. FUGC provides a dataset of 890 TVS images, including 500 training images, 90 validation images, and 300 test images. Methods were evaluated using the Dice Similarity Coefficient (DSC), Hausdorff Distance (HD), and runtime (RT), combined with weights of 0.4/0.4/0.2, respectively.
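The weighted 0.4/0.4/0.2 combination can be written down directly. Note that how HD and runtime are normalized into comparable higher-is-better scores is not specified in the summary above, so the normalized inputs here are an assumption of this sketch.

```python
def fugc_weighted_score(dsc, hd_norm, rt_norm, weights=(0.4, 0.4, 0.2)):
    # Combine DSC, normalized Hausdorff Distance, and normalized runtime.
    # hd_norm and rt_norm are assumed pre-mapped into [0, 1] with
    # higher = better (the exact normalization is not given above).
    return weights[0] * dsc + weights[1] * hd_norm + weights[2] * rt_norm
```

A method with DSC 0.9, normalized HD 0.8, and normalized runtime 0.5 would then score 0.4*0.9 + 0.4*0.8 + 0.2*0.5 = 0.78.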
arXiv Detail & Related papers (2026-01-22T01:34:39Z) - Benchmarking CXR Foundation Models With Publicly Available MIMIC-CXR and NIH-CXR14 Datasets [0.35441912284181126]
This work benchmarks two large-scale chest X-ray (CXR) embedding models on the public MIMIC-CXR and NIH ChestX-ray14 datasets. We extracted embeddings directly from pre-trained encoders, trained lightweight LightGBM classifiers on multiple disease labels, and reported mean AUROC and F1-score with 95% confidence intervals.
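Reporting mean AUROC with a 95% confidence interval, as described above, is commonly done with a percentile bootstrap over test examples. The sketch below shows one plausible scheme; the resampling details and the rank-based AUROC are generic assumptions of this illustration, not the paper's method.

```python
import random

def auroc(scores, labels):
    # Rank-based AUROC: probability that a random positive outscores a
    # random negative (ties count as half a win).
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        return 0.5  # degenerate resample: fall back to chance level
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_ci(scores, labels, metric=auroc, n_boot=1000, alpha=0.05, seed=0):
    # Percentile bootstrap: resample examples with replacement, recompute
    # the metric, and take the alpha/2 and 1 - alpha/2 quantiles.
    rng = random.Random(seed)
    n = len(scores)
    stats = sorted(
        metric([scores[i] for i in idx], [labels[i] for i in idx])
        for idx in ([rng.randrange(n) for _ in range(n)] for _ in range(n_boot))
    )
    return stats[int(n_boot * alpha / 2)], stats[int(n_boot * (1 - alpha / 2)) - 1]
```

The same resampling loop works for F1 or any other metric passed in as `metric`.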
arXiv Detail & Related papers (2025-12-03T12:55:44Z) - Advanced Multi-Architecture Deep Learning Framework for BIRADS-Based Mammographic Image Retrieval: Comprehensive Performance Analysis with Super-Ensemble Optimization [0.0]
Mammographic image retrieval systems require exact BIRADS categorical matching across five distinct classes. Current medical image retrieval studies suffer from methodological limitations.
arXiv Detail & Related papers (2025-08-06T18:05:18Z) - CXR-LT 2024: A MICCAI challenge on long-tailed, multi-label, and zero-shot disease classification from chest X-ray [64.2434525370243]
The CXR-LT series is a community-driven initiative designed to enhance lung disease classification using chest X-rays. CXR-LT 2024 expands the dataset to 377,110 chest X-rays (CXRs) and 45 disease labels, including 19 new rare disease findings. This paper provides an overview of CXR-LT 2024, detailing the data curation process and consolidating state-of-the-art solutions.
arXiv Detail & Related papers (2025-06-09T17:53:31Z) - Deep Rib Fracture Instance Segmentation and Classification from CT on the RibFrac Challenge [66.86170104167608]
The RibFrac Challenge provides a benchmark dataset of over 5,000 rib fractures from 660 CT scans.
During the MICCAI 2020 challenge period, 243 results were evaluated, and seven teams were invited to participate in the challenge summary.
The analysis revealed that several top rib fracture detection solutions achieved performance comparable or even better than human experts.
arXiv Detail & Related papers (2024-02-14T18:18:33Z) - Bag of Tricks for Long-Tailed Multi-Label Classification on Chest X-Rays [40.11576642444264]
This report presents a brief description of our solution in the ICCV CVAMD 2023 CXR-LT Competition.
We empirically explored the effectiveness of integrating several advanced designs for CXR diagnosis.
Our framework finally achieves 0.349 mAP on the competition test set, ranking in the top five.
arXiv Detail & Related papers (2023-08-17T08:25:55Z) - Revisiting Computer-Aided Tuberculosis Diagnosis [56.80999479735375]
Tuberculosis (TB) is a major global health threat, causing millions of deaths annually.
Computer-aided tuberculosis diagnosis (CTD) using deep learning has shown promise, but progress is hindered by limited training data.
We establish a large-scale dataset, namely the Tuberculosis X-ray (TBX11K) dataset, which contains 11,200 chest X-ray (CXR) images with corresponding bounding box annotations for TB areas.
This dataset enables the training of sophisticated detectors for high-quality CTD.
arXiv Detail & Related papers (2023-07-06T08:27:48Z) - SLCA: Slow Learner with Classifier Alignment for Continual Learning on a Pre-trained Model [73.80068155830708]
We present an extensive analysis for continual learning on a pre-trained model (CLPM).
We propose a simple but extremely effective approach named Slow Learner with Classifier Alignment (SLCA).
Across a variety of scenarios, our proposal provides substantial improvements for CLPM.
arXiv Detail & Related papers (2023-03-09T08:57:01Z) - Significantly improving zero-shot X-ray pathology classification via fine-tuning pre-trained image-text encoders [50.689585476660554]
We propose a new fine-tuning strategy that includes positive-pair loss relaxation and random sentence sampling.
Our approach consistently improves overall zero-shot pathology classification across four chest X-ray datasets and three pre-trained models.
arXiv Detail & Related papers (2022-12-14T06:04:18Z) - Vision Transformers for femur fracture classification [59.99241204074268]
The Vision Transformer (ViT) was able to correctly predict 83% of the test images.
Good results were also obtained on sub-fracture classification, using the largest and richest dataset of its kind.
arXiv Detail & Related papers (2021-08-07T10:12:42Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.