Related papers: Exploring Information Processing in Large Language Models: Insights from Information Bottleneck Theory

Exploring Information Processing in Large Language Models: Insights from Information Bottleneck Theory

URL: http://arxiv.org/abs/2501.00999v2
Date: Mon, 06 Jan 2025 01:49:09 GMT
Title: Exploring Information Processing in Large Language Models: Insights from Information Bottleneck Theory
Authors: Zhou Yang, Zhengyu Qi, Zhaochun Ren, Zhikai Jia, Haizhou Sun, Xiaofei Zhu, Xiangwen Liao,
Abstract summary: Large Language Models (LLMs) have demonstrated remarkable performance across a wide range of tasks.<n>This paper explores the working mechanism of LLMs in information processing from the perspective of Information Bottleneck Theory.<n>We introduce two novel approaches: an Information Compression-based Context Learning (IC-ICL) and a Task-Space-guided Fine-Tuning (TS-FT)
Score: 17.426872862387818
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large Language Models (LLMs) have demonstrated remarkable performance across a wide range of tasks by understanding input information and predicting corresponding outputs. However, the internal mechanisms by which LLMs comprehend input and make effective predictions remain poorly understood. In this paper, we explore the working mechanism of LLMs in information processing from the perspective of Information Bottleneck Theory. We propose a non-training construction strategy to define a task space and identify the following key findings: (1) LLMs compress input information into specific task spaces (e.g., sentiment space, topic space) to facilitate task understanding; (2) they then extract and utilize relevant information from the task space at critical moments to generate accurate predictions. Based on these insights, we introduce two novel approaches: an Information Compression-based Context Learning (IC-ICL) and a Task-Space-guided Fine-Tuning (TS-FT). IC-ICL enhances reasoning performance and inference efficiency by compressing retrieved example information into the task space. TS-FT employs a space-guided loss to fine-tune LLMs, encouraging the learning of more effective compression and selection mechanisms. Experiments across multiple datasets validate the effectiveness of task space construction. Additionally, IC-ICL not only improves performance but also accelerates inference speed by over 40\%, while TS-FT achieves superior results with a minimal strategy adjustment.

Related papers

Optimising Language Models for Downstream Tasks: A Post-Training Perspective [0.0]
Language models (LMs) have demonstrated remarkable capabilities in NLP.<n>But adapting them efficiently and robustly to specific tasks remains challenging.<n>This thesis proposes a series of methods to better adapt LMs to downstream applications.
arXiv Detail & Related papers (2025-06-26T00:49:35Z)
Feature Engineering for Agents: An Adaptive Cognitive Architecture for Interpretable ML Monitoring [2.1205272468688574]
We propose a cognitive architecture for ML monitoring that applies feature engineering principles to agents based on Large Language Models.<n>Decision Procedure module simulates feature engineering through three key steps: Refactor, Break Down, and Compile.<n> Experiments using multiple LLMs demonstrate the efficacy of our approach, achieving significantly higher accuracy compared to various baselines.
arXiv Detail & Related papers (2025-06-11T13:48:25Z)
Illusion or Algorithm? Investigating Memorization, Emergence, and Symbolic Processing in In-Context Learning [48.67380502157004]
Large-scale Transformer language models (LMs) trained solely on next-token prediction with web-scale data can solve a wide range of tasks.<n>The mechanism behind this capability, known as in-context learning (ICL), remains both controversial and poorly understood.
arXiv Detail & Related papers (2025-05-16T08:50:42Z)
Learning Task Representations from In-Context Learning [73.72066284711462]
Large language models (LLMs) have demonstrated remarkable proficiency in in-context learning. We introduce an automated formulation for encoding task information in ICL prompts as a function of attention heads. We show that our method's effectiveness stems from aligning the distribution of the last hidden state with that of an optimally performing in-context-learned model.
arXiv Detail & Related papers (2025-02-08T00:16:44Z)
Mitigating Forgetting in LLM Fine-Tuning via Low-Perplexity Token Learning [61.99353167168545]
We show that fine-tuning with LLM-generated data improves target task performance and reduces non-target task degradation.<n>This is the first work to provide an empirical explanation based on token perplexity reduction to mitigate catastrophic forgetting in LLMs after fine-tuning.
arXiv Detail & Related papers (2025-01-24T08:18:56Z)
TL-Training: A Task-Feature-Based Framework for Training Large Language Models in Tool Use [46.20445033086643]
Large language models (LLMs) achieve remarkable advancements by leveraging tools to interact with external environments.<n>Standard supervised fine-tuning approach, which relies on large-scale datasets, often overlooks task-specific characteristics in tool use.<n>We propose TL-Training, a task-feature-based framework that mitigates the effects of suboptimal training data.
arXiv Detail & Related papers (2024-12-20T02:21:36Z)
Eliciting Causal Abilities in Large Language Models for Reasoning Tasks [14.512834333917414]
We introduce the Self-Causal Instruction Enhancement (SCIE) method, which enables LLMs to generate high-quality, low-quantity observational data.<n>In SCIE, the instructions are treated as the treatment, and textual features are used to process natural language.<n>Our method effectively generates instructions that enhance reasoning performance with reduced training cost of prompts.
arXiv Detail & Related papers (2024-12-19T17:03:02Z)
Supervised Fine-Tuning Achieve Rapid Task Adaption Via Alternating Attention Head Activation Patterns [47.57912649802414]
We study the process that the SFT process adapts LLMs to downstream tasks via the perspective of attention patterns. We find that LLMs selectively activate task-specific attention heads during SFT; (2) activation patterns for complex tasks are combinations of basic task patterns; and (3) changes in a few parameters can significantly impact activation patterns after SFT on a small number of samples.
arXiv Detail & Related papers (2024-09-24T07:34:50Z)
Interpreting and Improving Large Language Models in Arithmetic Calculation [72.19753146621429]
Large language models (LLMs) have demonstrated remarkable potential across numerous applications. In this work, we delve into uncovering a specific mechanism by which LLMs execute calculations. We investigate the potential benefits of selectively fine-tuning these essential heads/MLPs to boost the LLMs' computational performance.
arXiv Detail & Related papers (2024-09-03T07:01:46Z)
SELF-GUIDE: Better Task-Specific Instruction Following via Self-Synthetic Finetuning [70.21358720599821]
Large language models (LLMs) hold the promise of solving diverse tasks when provided with appropriate natural language prompts. We propose SELF-GUIDE, a multi-stage mechanism in which we synthesize task-specific input-output pairs from the student LLM. We report an absolute improvement of approximately 15% for classification tasks and 18% for generation tasks in the benchmark's metrics.
arXiv Detail & Related papers (2024-07-16T04:41:58Z)
ZeroDL: Zero-shot Distribution Learning for Text Clustering via Large Language Models [5.011816280731356]
We propose a simple yet effective method to contextualize a task toward a specific large language model (LLMs) We show the effectiveness of this approach in text clustering tasks, and also highlight the importance of the contextualization through examples of the above procedure.
arXiv Detail & Related papers (2024-06-19T08:48:05Z)
Extending Token Computation for LLM Reasoning [5.801044612920816]
Large Language Models (LLMs) are pivotal in advancing natural language processing. LLMs often struggle with complex reasoning tasks due to inefficient attention distributions. We introduce a novel method for extending computed tokens in the Chain-of-Thought process, utilizing attention mechanism optimization.
arXiv Detail & Related papers (2024-03-22T03:23:58Z)
Improving Open Information Extraction with Large Language Models: A Study on Demonstration Uncertainty [52.72790059506241]
Open Information Extraction (OIE) task aims at extracting structured facts from unstructured text. Despite the potential of large language models (LLMs) like ChatGPT as a general task solver, they lag behind state-of-the-art (supervised) methods in OIE tasks.
arXiv Detail & Related papers (2023-09-07T01:35:24Z)
Instruction Tuning for Large Language Models: A Survey [52.86322823501338]
We make a systematic review of the literature, including the general methodology of supervised fine-tuning (SFT)<n>We also review the potential pitfalls of SFT along with criticism against it, along with efforts pointing out current deficiencies of existing strategies.
arXiv Detail & Related papers (2023-08-21T15:35:16Z)
Pre-training Multi-task Contrastive Learning Models for Scientific Literature Understanding [52.723297744257536]
Pre-trained language models (LMs) have shown effectiveness in scientific literature understanding tasks. We propose a multi-task contrastive learning framework, SciMult, to facilitate common knowledge sharing across different literature understanding tasks.
arXiv Detail & Related papers (2023-05-23T16:47:22Z)

This list is automatically generated from the titles and abstracts of the papers in this site.