Large (Vision) Language Models are Unsupervised In-Context Learners
- URL: http://arxiv.org/abs/2504.02349v1
- Date: Thu, 03 Apr 2025 07:33:02 GMT
- Title: Large (Vision) Language Models are Unsupervised In-Context Learners
- Authors: Artyom Gadetsky, Andrei Atanov, Yulun Jiang, Zhitong Gao, Ghazal Hosseini Mighan, Amir Zamir, Maria Brbic,
- Abstract summary: We introduce a joint inference framework for fully unsupervised adaptation.<n>Unlike zero-shot inference, the joint inference makes predictions simultaneously for all inputs in a given task.<n>Our experiments demonstrate substantial improvements over the standard zero-shot approach.
- Score: 14.930827851769276
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advances in large language and vision-language models have enabled zero-shot inference, allowing models to solve new tasks without task-specific training. Various adaptation techniques such as prompt engineering, In-Context Learning (ICL), and supervised fine-tuning can further enhance the model's performance on a downstream task, but they require substantial manual effort to construct effective prompts or labeled examples. In this work, we introduce a joint inference framework for fully unsupervised adaptation, eliminating the need for manual prompt engineering and labeled examples. Unlike zero-shot inference, which makes independent predictions, the joint inference makes predictions simultaneously for all inputs in a given task. Since direct joint inference involves computationally expensive optimization, we develop efficient approximation techniques, leading to two unsupervised adaptation methods: unsupervised fine-tuning and unsupervised ICL. We demonstrate the effectiveness of our methods across diverse tasks and models, including language-only Llama-3.1 on natural language processing tasks, reasoning-oriented Qwen2.5-Math on grade school math problems, vision-language OpenFlamingo on vision tasks, and the API-only access GPT-4o model on massive multi-discipline tasks. Our experiments demonstrate substantial improvements over the standard zero-shot approach, including 39% absolute improvement on the challenging GSM8K math reasoning dataset. Remarkably, despite being fully unsupervised, our framework often performs on par with supervised approaches that rely on ground truth labels.
Related papers
- Learning Task Representations from In-Context Learning [73.72066284711462]
Large language models (LLMs) have demonstrated remarkable proficiency in in-context learning.<n>We introduce an automated formulation for encoding task information in ICL prompts as a function of attention heads.<n>We show that our method's effectiveness stems from aligning the distribution of the last hidden state with that of an optimally performing in-context-learned model.
arXiv Detail & Related papers (2025-02-08T00:16:44Z) - Visual Cue Enhancement and Dual Low-Rank Adaptation for Efficient Visual Instruction Fine-Tuning [102.18178065928426]
We propose an efficient fine-tuning framework with two novel approaches: Vision Cue Enhancement (VCE) and Dual Low-Rank Adaptation (Dual-LoRA)
VCE enhances the vision projector by integrating multi-level visual cues, improving the model's ability to capture fine-grained visual features.
Dual-LoRA introduces a dual low-rank structure for instruction tuning, decoupling learning into skill and task spaces to enable precise control and efficient adaptation across diverse tasks.
arXiv Detail & Related papers (2024-11-19T11:03:09Z) - Exploring the Transferability of Visual Prompting for Multimodal Large Language Models [47.162575147632396]
Transferable Visual Prompting (TVP) is a simple and effective approach to generate visual prompts that can transfer to different models and improve their performance on downstream tasks after trained on only one model.
We introduce two strategies to address the issue of cross-model feature corruption of existing visual prompting methods and enhance the transferability of the learned prompts.
arXiv Detail & Related papers (2024-04-17T09:39:07Z) - Scalable Language Model with Generalized Continual Learning [58.700439919096155]
The Joint Adaptive Re-ization (JARe) is integrated with Dynamic Task-related Knowledge Retrieval (DTKR) to enable adaptive adjustment of language models based on specific downstream tasks.
Our method demonstrates state-of-the-art performance on diverse backbones and benchmarks, achieving effective continual learning in both full-set and few-shot scenarios with minimal forgetting.
arXiv Detail & Related papers (2024-04-11T04:22:15Z) - Post Hoc Explanations of Language Models Can Improve Language Models [43.2109029463221]
We present a novel framework, Amplifying Model Performance by Leveraging In-Context Learning with Post Hoc Explanations (AMPLIFY)
We leverage post hoc explanation methods which output attribution scores (explanations) capturing the influence of each of the input features on model predictions.
Our framework, AMPLIFY, leads to prediction accuracy improvements of about 10-25% over a wide range of tasks.
arXiv Detail & Related papers (2023-05-19T04:46:04Z) - POUF: Prompt-oriented unsupervised fine-tuning for large pre-trained
models [62.23255433487586]
We propose an unsupervised fine-tuning framework to fine-tune the model or prompt on the unlabeled target data.
We demonstrate how to apply our method to both language-augmented vision and masked-language models by aligning the discrete distributions extracted from the prompts and target data.
arXiv Detail & Related papers (2023-04-29T22:05:22Z) - Breakpoint Transformers for Modeling and Tracking Intermediate Beliefs [37.754787051387034]
We propose a representation learning framework called breakpoint modeling.
Our approach trains models in an efficient and end-to-end fashion to build intermediate representations.
We show the benefit of our main breakpoint transformer, based on T5, over conventional representation learning approaches.
arXiv Detail & Related papers (2022-11-15T07:28:14Z) - Improving Pre-trained Language Model Fine-tuning with Noise Stability
Regularization [94.4409074435894]
We propose a novel and effective fine-tuning framework, named Layerwise Noise Stability Regularization (LNSR)
Specifically, we propose to inject the standard Gaussian noise and regularize hidden representations of the fine-tuned model.
We demonstrate the advantages of the proposed method over other state-of-the-art algorithms including L2-SP, Mixout and SMART.
arXiv Detail & Related papers (2022-06-12T04:42:49Z) - SUPERB-SG: Enhanced Speech processing Universal PERformance Benchmark
for Semantic and Generative Capabilities [76.97949110580703]
We introduce SUPERB-SG, a new benchmark to evaluate pre-trained models across various speech tasks.
We use a lightweight methodology to test the robustness of representations learned by pre-trained models under shifts in data domain.
We also show that the task diversity of SUPERB-SG coupled with limited task supervision is an effective recipe for evaluating the generalizability of model representation.
arXiv Detail & Related papers (2022-03-14T04:26:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.