KU-DMIS-MSRA at RadSum23: Pre-trained Vision-Language Model for
Radiology Report Summarization
- URL: http://arxiv.org/abs/2307.07409v1
- Date: Mon, 10 Jul 2023 21:18:01 GMT
- Authors: Gangwoo Kim, Hajung Kim, Lei Ji, Seongsu Bae, Chanhwi Kim, Mujeen
Sung, Hyunjae Kim, Kun Yan, Eric Chang, Jaewoo Kang
- Abstract summary: CheXOFA is a new pre-trained vision-language model (VLM) for the chest X-ray domain.
We unify various domain-specific tasks into a simple sequence-to-sequence schema.
Our system achieves first place on the RadSum23 leaderboard for the hidden test set.
- Score: 29.443550756161667
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we introduce CheXOFA, a new pre-trained vision-language model
(VLM) for the chest X-ray domain. Our model is initially pre-trained on various
multimodal datasets within the general domain before being transferred to the
chest X-ray domain. Following a prominent VLM, we unify various domain-specific
tasks into a simple sequence-to-sequence schema. It enables the model to
effectively learn the required knowledge and skills from limited resources in
the domain. Demonstrating superior performance on the benchmark datasets
provided by the BioNLP shared task, our model benefits from its training across
multiple tasks and domains. With subtle techniques including ensemble and
factual calibration, our system achieves first place on the RadSum23
leaderboard for the hidden test set.
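The unified sequence-to-sequence schema described in the abstract can be sketched as follows. This is a minimal illustration only: the task names, prompt templates, and helper function are hypothetical assumptions, not CheXOFA's actual prompts or code.

```python
# Illustrative sketch: casting several chest X-ray tasks into one
# sequence-to-sequence schema, so a single encoder-decoder model can
# learn all of them. Prompt templates here are invented for illustration.

TASK_PROMPTS = {
    "report_summarization": "summarize the findings: {findings}",
    "report_generation": "describe the image: {image_token}",
    "vqa": "answer the question: {question} context: {image_token}",
}

def build_seq2seq_example(task: str, target: str, **fields) -> dict:
    """Render one training example as a (source, target) text pair."""
    prompt = TASK_PROMPTS[task].format(**fields)
    return {"source": prompt, "target": target}

example = build_seq2seq_example(
    "report_summarization",
    target="No acute cardiopulmonary abnormality.",
    findings="Lungs are clear. Heart size is normal.",
)
print(example["source"])
# → summarize the findings: Lungs are clear. Heart size is normal.
```

Framing every task as text-to-text in this way is what lets a model trained on one task (e.g. report generation) share knowledge with another (e.g. summarization) within the same parameters.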
Related papers
- DG-PIC: Domain Generalized Point-In-Context Learning for Point Cloud Understanding [41.49771026674969]
We introduce a novel, practical multi-domain, multi-task setting that handles multiple domains and multiple tasks within one unified model for domain-generalized point cloud understanding.
Our DG-PIC does not require any model updates during testing and can handle unseen domains and multiple tasks, i.e., point cloud reconstruction, denoising, and registration, within one unified model.
arXiv Detail & Related papers (2024-07-11T18:21:40Z)
- Virtual Classification: Modulating Domain-Specific Knowledge for Multidomain Crowd Counting [67.38137379297717]
Multidomain crowd counting aims to learn a general model for multiple diverse datasets.
Deep networks prefer modeling distributions of the dominant domains instead of all domains, which is known as domain bias.
We propose a Modulating Domain-specific Knowledge Network (MDKNet) to handle the domain bias issue in multidomain crowd counting.
arXiv Detail & Related papers (2024-02-06T06:49:04Z)
- Quality > Quantity: Synthetic Corpora from Foundation Models for Closed-Domain Extractive Question Answering [35.38140071573828]
We study extractive question answering within closed domains and introduce the concept of targeted pre-training.
Our proposed framework uses Galactica to generate synthetic, "targeted" corpora that align with specific writing styles and topics.
arXiv Detail & Related papers (2023-10-25T20:48:16Z)
- Label-Free Multi-Domain Machine Translation with Stage-wise Training [13.144729358707206]
We propose a label-free multi-domain machine translation model which requires only a few or no domain-annotated data in training and no domain labels in inference.
Our model is composed of three parts: a backbone model, a domain discriminator taking responsibility to discriminate data from different domains, and a set of experts that transfer the decoded features from generic to specific.
arXiv Detail & Related papers (2023-05-06T06:30:29Z)
- MultiMatch: Multi-task Learning for Semi-supervised Domain Generalization [55.06956781674986]
We address the semi-supervised domain generalization (SSDG) task, where only a small amount of labeled data is available in each source domain.
We propose MultiMatch, which extends FixMatch to a multi-task learning framework to produce high-quality pseudo-labels for SSDG.
A series of experiments validates the effectiveness of the proposed method, which outperforms existing semi-supervised methods and the SSDG baseline on several benchmark DG datasets.
arXiv Detail & Related papers (2022-08-11T14:44:33Z)
- Pretrained Domain-Specific Language Model for General Information Retrieval Tasks in the AEC Domain [5.949779668853556]
It is unclear how domain corpora and domain-specific pretrained DL models can improve performance in various information retrieval tasks.
This work explores the impacts of domain corpora and various transfer learning techniques on the performance of DL models for IR tasks.
BERT-based models dramatically outperform traditional methods in all IR tasks, with maximum improvements of 5.4% and 10.1% in the F1 score.
arXiv Detail & Related papers (2022-03-09T14:10:55Z)
- CLIN-X: pre-trained language models and a study on cross-task transfer for concept extraction in the clinical domain [22.846469609263416]
We introduce the pre-trained CLIN-X (Clinical XLM-R) language models and show how CLIN-X outperforms other pre-trained transformer models.
Our studies reveal stable model performance despite a lack of annotated data, with improvements of up to 47 F1 points when only 250 labeled sentences are available.
Our results highlight the importance of specialized language models as CLIN-X for concept extraction in non-standard domains.
arXiv Detail & Related papers (2021-12-16T10:07:39Z)
- Domain Generalization on Medical Imaging Classification using Episodic Training with Task Augmentation [62.49837463676111]
We propose a novel scheme of episodic training with task augmentation on medical imaging classification.
Motivated by the limited number of source domains in real-world medical deployment, we consider the unique task-level overfitting.
arXiv Detail & Related papers (2021-06-13T03:56:59Z)
- Boosting Binary Masks for Multi-Domain Learning through Affine Transformations [49.25451497933657]
The goal of multi-domain learning is to produce a single model performing a task in all the domains together.
Recent works showed how we can address this problem by masking the internal weights of a given original conv-net through learned binary variables.
We provide a general formulation of binary mask based models for multi-domain learning by affine transformations of the original network parameters.
arXiv Detail & Related papers (2021-03-25T14:54:37Z)
- Few-Shot Named Entity Recognition: A Comprehensive Study [92.40991050806544]
We investigate three schemes to improve the model generalization ability for few-shot settings.
We perform empirical comparisons on 10 public NER datasets with various proportions of labeled data.
We create new state-of-the-art results on both few-shot and training-free settings.
arXiv Detail & Related papers (2020-12-29T23:43:16Z)
- MultiCQA: Zero-Shot Transfer of Self-Supervised Text Matching Models on a Massive Scale [64.11709427403008]
We study the zero-shot transfer capabilities of text matching models on a massive scale, by self-supervised training on 140 source domains.
We show that all 140 models transfer surprisingly well, with the large majority of models substantially outperforming common IR baselines.
arXiv Detail & Related papers (2020-10-02T13:22:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.