Odysseus Navigates the Sirens' Song: Dynamic Focus Decoding for Factual and Diverse Open-Ended Text Generation
- URL: http://arxiv.org/abs/2503.08057v1
- Date: Tue, 11 Mar 2025 05:27:28 GMT
- Title: Odysseus Navigates the Sirens' Song: Dynamic Focus Decoding for Factual and Diverse Open-Ended Text Generation
- Authors: Wen Luo, Feifan Song, Wei Li, Guangyue Peng, Shaohang Wei, Houfeng Wang
- Abstract summary: Large Language Models (LLMs) are increasingly required to generate text that is both factually accurate and diverse across various open-ended applications. We introduce Dynamic Focus Decoding (DFD), a novel plug-and-play approach that resolves this trade-off without requiring additional data, knowledge, or models. DFD adaptively adjusts the decoding focus based on distributional differences across layers, leveraging the modular and hierarchical nature of factual knowledge within LLMs.
- Score: 18.835969818281125
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Language Models (LLMs) are increasingly required to generate text that is both factually accurate and diverse across various open-ended applications. However, current stochastic decoding methods struggle to balance such objectives. We introduce Dynamic Focus Decoding (DFD), a novel plug-and-play stochastic approach that resolves this trade-off without requiring additional data, knowledge, or models. DFD adaptively adjusts the decoding focus based on distributional differences across layers, leveraging the modular and hierarchical nature of factual knowledge within LLMs. This dynamic adjustment improves factuality in knowledge-intensive decoding steps and promotes diversity in less knowledge-reliant steps. DFD can be easily integrated with existing decoding methods, enhancing both factuality and diversity with minimal computational overhead. Extensive experiments across seven datasets demonstrate that DFD significantly improves performance, providing a scalable and efficient solution for open-ended text generation.
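The abstract does not specify DFD's exact update rule, but the core idea it describes, mapping cross-layer distributional disagreement to a per-step decoding focus, can be sketched as follows. This is a rough, hypothetical illustration: the function names, the use of Jensen-Shannon divergence, the choice of the middle layer, and the temperature range are my assumptions, not details from the paper.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a 1-D logits vector."""
    z = logits / temperature
    z = z - z.max()
    p = np.exp(z)
    return p / p.sum()

def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon divergence between two distributions (in nats)."""
    m = 0.5 * (p + q)
    def kl(a, b):
        return np.sum(a * np.log((a + eps) / (b + eps)))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def dynamic_focus_temperature(layer_logits, t_min=0.3, t_max=1.2):
    """Map cross-layer disagreement to a sampling temperature.

    Large divergence between an intermediate layer and the final layer
    is treated as a knowledge-intensive step (sharper, more factual
    sampling); small divergence allows a higher temperature for
    diversity. The middle-layer choice and [t_min, t_max] bounds are
    illustrative assumptions.
    """
    final = softmax(layer_logits[-1])
    mid = softmax(layer_logits[len(layer_logits) // 2])
    d = js_divergence(mid, final)           # JS divergence lies in [0, ln 2]
    focus = min(d / np.log(2), 1.0)         # normalize to [0, 1]
    return t_max - focus * (t_max - t_min)  # more focus -> lower temperature
```

In a real decoder, `layer_logits` would come from the model's intermediate hidden states projected through the output head (e.g. via `output_hidden_states=True` in common transformer libraries), and the returned temperature would feed the usual sampling step, which is what makes this style of method plug-and-play.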
Related papers
- Transfer between Modalities with MetaQueries [44.57406292414526]
We introduce MetaQueries, a set of learnable queries that act as an efficient interface between autoregressive multimodal LLMs and diffusion models.
Our method simplifies training, requiring only paired image-caption data and standard diffusion objectives.
Our method is flexible and can be easily instruction-tuned for advanced applications such as image editing and subject-driven generation.
arXiv Detail & Related papers (2025-04-08T17:58:47Z) - Your Language Model May Think Too Rigidly: Achieving Reasoning Consistency with Symmetry-Enhanced Training [66.48331530995786]
We propose syMmetry-ENhanceD (MEND) Data Augmentation, a data-centric approach that improves the model's ability to extract useful information from context.
Unlike existing methods that emphasize reasoning chain augmentation, our approach improves model robustness at the knowledge extraction stage.
Experiments on both logical and arithmetic reasoning tasks show that MEND enhances reasoning performance across diverse query variations.
arXiv Detail & Related papers (2025-02-25T03:03:35Z) - A Text-Based Knowledge-Embedded Soft Sensing Modeling Approach for General Industrial Process Tasks Based on Large Language Model [16.842988666530204]
Data-driven soft sensors (DDSS) have become mainstream methods for predicting key performance indicators in process industries. Their development requires complex and costly customized designs tailored to various tasks during the modeling process. We propose a general framework named LLM-TKESS (large language model for text-based knowledge-embedded soft sensing) for enhanced soft sensing modeling.
arXiv Detail & Related papers (2025-01-09T08:59:14Z) - SLED: Self Logits Evolution Decoding for Improving Factuality in Large Language Models [34.3296459569307]
Large language models (LLMs) have demonstrated remarkable capabilities, but their outputs can sometimes be unreliable or factually incorrect. We introduce Self Logits Evolution Decoding (SLED), a novel decoding framework that enhances the truthfulness of LLMs. We show that SLED consistently improves factual accuracy by up to 20% compared to existing decoding methods.
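The blurb above does not give SLED's update rule; the general shape of a "logits evolution" step that contrasts the final layer against an earlier one can be sketched as below. The function name, the contrastive direction, and the step size `alpha` are all my assumptions for illustration, not the paper's formulation.

```python
import numpy as np

def logits_evolution_step(final_logits, early_logits, alpha=0.5):
    """One illustrative logits-evolution step.

    Nudges the final-layer logits in the direction that amplifies
    tokens the final layer ranks higher than an early layer -- a
    contrastive cross-layer signal often associated with factual
    knowledge. `alpha` is a hypothetical step size.
    """
    direction = final_logits - early_logits
    return final_logits + alpha * direction
```

The evolved logits would then replace the raw final-layer logits before the usual softmax-and-sample step of decoding.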
arXiv Detail & Related papers (2024-11-01T17:33:34Z) - Reference Trustable Decoding: A Training-Free Augmentation Paradigm for Large Language Models [79.41139393080736]
Large language models (LLMs) have rapidly advanced and demonstrated impressive capabilities.
In-Context Learning (ICL) and Parameter-Efficient Fine-Tuning (PEFT) are currently two mainstream methods for adapting LLMs to downstream tasks.
We propose Reference Trustable Decoding (RTD), a paradigm that allows models to quickly adapt to new tasks without fine-tuning.
arXiv Detail & Related papers (2024-09-30T10:48:20Z) - MMA-DFER: MultiModal Adaptation of unimodal models for Dynamic Facial Expression Recognition in-the-wild [81.32127423981426]
Multimodal emotion recognition based on audio and video data is important for real-world applications.
Recent methods have focused on exploiting advances of self-supervised learning (SSL) for pre-training of strong multimodal encoders.
We propose a different perspective on the problem and investigate the advancement of multimodal DFER performance by adapting SSL-pre-trained disjoint unimodal encoders.
arXiv Detail & Related papers (2024-04-13T13:39:26Z) - Unified Generation, Reconstruction, and Representation: Generalized Diffusion with Adaptive Latent Encoding-Decoding [90.77521413857448]
Deep generative models are anchored in three core capabilities -- generating new instances, reconstructing inputs, and learning compact representations.
We introduce Generalized Encoding-Decoding Diffusion Probabilistic Models (EDDPMs).
EDDPMs generalize the Gaussian noising-denoising in standard diffusion by introducing parameterized encoding-decoding.
Experiments on text, proteins, and images demonstrate the flexibility to handle diverse data and tasks.
arXiv Detail & Related papers (2024-02-29T10:08:57Z) - CREMA: Generalizable and Efficient Video-Language Reasoning via Multimodal Modular Fusion [58.15403987979496]
CREMA is a generalizable, highly efficient, and modular modality-fusion framework for video reasoning. We propose a novel progressive multimodal fusion design supported by a lightweight fusion module and modality-sequential training strategy. We validate our method on 7 video-language reasoning tasks assisted by diverse modalities, including VideoQA and Video-Audio/3D/Touch/Thermal QA.
arXiv Detail & Related papers (2024-02-08T18:27:22Z) - FiD-Light: Efficient and Effective Retrieval-Augmented Text Generation [19.17759446168802]
We introduce FiD-Light to increase the efficiency of the state-of-the-art retrieval-augmented FiD model.
We adapt FiD-Light with re-ranking capabilities through source pointers to improve the top-ranked precision.
arXiv Detail & Related papers (2022-09-28T17:54:55Z) - CoDA: Contrast-enhanced and Diversity-promoting Data Augmentation for Natural Language Understanding [67.61357003974153]
We propose a novel data augmentation framework dubbed CoDA.
CoDA synthesizes diverse and informative augmented examples by integrating multiple transformations organically.
A contrastive regularization objective is introduced to capture the global relationship among all the data samples.
arXiv Detail & Related papers (2020-10-16T23:57:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.