Related papers: Benchmarking and Understanding Compositional Relational Reasoning of LLMs

Benchmarking and Understanding Compositional Relational Reasoning of LLMs

URL: http://arxiv.org/abs/2412.12841v1
Date: Tue, 17 Dec 2024 12:10:38 GMT
Title: Benchmarking and Understanding Compositional Relational Reasoning of LLMs
Authors: Ruikang Ni, Da Xiao, Qingye Meng, Xiangyu Li, Shihui Zheng, Hongliang Liang,
Abstract summary: We first propose a new synthetic benchmark called Generalized Associative Recall (GAR)<n> Evaluation shows that GAR is challenging enough for existing LLMs, revealing their fundamental deficiency in CRR.<n>We then use attribution patching to discover the core circuits reused by Vicuna-33B across different tasks and a set of vital attention heads.
Score: 1.915591735124465
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Compositional relational reasoning (CRR) is a hallmark of human intelligence, but we lack a clear understanding of whether and how existing transformer large language models (LLMs) can solve CRR tasks. To enable systematic exploration of the CRR capability of LLMs, we first propose a new synthetic benchmark called Generalized Associative Recall (GAR) by integrating and generalizing the essence of several tasks in mechanistic interpretability (MI) study in a unified framework. Evaluation shows that GAR is challenging enough for existing LLMs, revealing their fundamental deficiency in CRR. Meanwhile, it is easy enough for systematic MI study. Then, to understand how LLMs solve GAR tasks, we use attribution patching to discover the core circuits reused by Vicuna-33B across different tasks and a set of vital attention heads. Intervention experiments show that the correct functioning of these heads significantly impacts task performance. Especially, we identify two classes of heads whose activations represent the abstract notion of true and false in GAR tasks respectively. They play a fundamental role in CRR across various models and tasks. The dataset and code are available at https://github.com/Caiyun-AI/GAR.

Related papers

Continual Knowledge Consolidation LORA for Domain Incremental Learning [19.396572674271685]
Continual knowledge consolidation low rank adaptation (CONEC-LoRA) addresses the DIL problems.<n>CONEC-LoRA is developed from consolidations between task-shared LORA to extract common knowledge and task-specific LORA.<n>This module integrates the ball-generator loss and transformation module to address the synthetic sample bias problem.
arXiv Detail & Related papers (2025-10-17T11:16:08Z)
U-MARVEL: Unveiling Key Factors for Universal Multimodal Retrieval via Embedding Learning with MLLMs [24.551034147718312]
Universal multimodal retrieval (UMR) aims to address complex retrieval tasks where both queries and candidates span diverse modalities.<n>We present a study aimed at uncovering the key factors that drive effective embedding learning for UMR using MLLMs.<n>We introduce a unified framework termed U-MARVEL, which outperforms state-of-the-art competitors on the M-B benchmark.
arXiv Detail & Related papers (2025-07-20T10:27:34Z)
Curriculum Reinforcement Learning from Easy to Hard Tasks Improves LLM Reasoning [52.32193550674408]
We aim to improve the reasoning capabilities of language models via reinforcement learning (RL)<n>We propose to schedule tasks from easy to hard (E2H), allowing LLMs to build reasoning skills gradually.<n>E2H Reasoner significantly improves the reasoning ability of small LLMs (1.5B to 3B)
arXiv Detail & Related papers (2025-06-07T02:41:54Z)
Route-and-Reason: Scaling Large Language Model Reasoning with Reinforced Model Router [9.580226379350737]
Multi-step reasoning has proven essential for enhancing the problem-solving capabilities of Large Language Models.<n>Yet, many reasoning steps are relatively simple and can be handled by more efficient smaller-scale language models.<n>We propose R2-Reasoner, a novel framework that enables collaborative reasoning across heterogeneous LLMs.
arXiv Detail & Related papers (2025-06-06T09:18:56Z)
R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning [87.30285670315334]
textbfR1-Searcher is a novel two-stage outcome-based RL approach designed to enhance the search capabilities of Large Language Models. Our framework relies exclusively on RL, without requiring process rewards or distillation for a cold start. Our experiments demonstrate that our method significantly outperforms previous strong RAG methods, even when compared to the closed-source GPT-4o-mini.
arXiv Detail & Related papers (2025-03-07T17:14:44Z)
ARR: Question Answering with Large Language Models via Analyzing, Retrieving, and Reasoning [22.825527641316192]
Large language models (LLMs) achieve remarkable performance on challenging benchmarks that are structured as multiple-choice question-answering (QA) tasks. This paper introduces ARR, an intuitive and effective zero-shot prompting method that explicitly incorporates three key steps in QA solving: analyzing the intent of the question, retrieving relevant information, and reasoning step by step.
arXiv Detail & Related papers (2025-02-07T06:30:33Z)
Supervised Fine-Tuning Achieve Rapid Task Adaption Via Alternating Attention Head Activation Patterns [47.57912649802414]
We study the process that the SFT process adapts LLMs to downstream tasks via the perspective of attention patterns. We find that LLMs selectively activate task-specific attention heads during SFT; (2) activation patterns for complex tasks are combinations of basic task patterns; and (3) changes in a few parameters can significantly impact activation patterns after SFT on a small number of samples.
arXiv Detail & Related papers (2024-09-24T07:34:50Z)
Unveiling In-Context Learning: A Coordinate System to Understand Its Working Mechanism [28.751003584429615]
Large language models (LLMs) exhibit remarkable in-context learning capabilities. Recent research presents two conflicting views on ICL. We provide a Two-Dimensional Coordinate System that unifies both views into a systematic framework.
arXiv Detail & Related papers (2024-07-24T05:26:52Z)
Self-Retrieval: End-to-End Information Retrieval with One Large Language Model [97.71181484082663]
We introduce Self-Retrieval, a novel end-to-end LLM-driven information retrieval architecture. Self-Retrieval internalizes the retrieval corpus through self-supervised learning, transforms the retrieval process into sequential passage generation, and performs relevance assessment for reranking.
arXiv Detail & Related papers (2024-02-23T18:45:35Z)
Rephrase and Respond: Let Large Language Models Ask Better Questions for Themselves [57.974103113675795]
We present a method named Rephrase and Respond' (RaR) which allows Large Language Models to rephrase and expand questions posed by humans. RaR serves as a simple yet effective prompting method for improving performance. We show that RaR is complementary to the popular Chain-of-Thought (CoT) methods, both theoretically and empirically.
arXiv Detail & Related papers (2023-11-07T18:43:34Z)
Disentangled Latent Spaces Facilitate Data-Driven Auxiliary Learning [14.677411619418319]
Auxiliary tasks facilitate learning in situations when data is scarce or the principal task of focus is extremely complex.<n>We propose a novel framework, dubbed Detaux, whereby a weakly supervised disentanglement procedure is used to discover a new unrelated auxiliary classification task.<n>We generate the auxiliary classification task through a clustering procedure on the most disentangled subspace, obtaining a discrete set of labels.
arXiv Detail & Related papers (2023-10-13T17:40:39Z)
TRACE: A Comprehensive Benchmark for Continual Learning in Large Language Models [52.734140807634624]
Aligned large language models (LLMs) demonstrate exceptional capabilities in task-solving, following instructions, and ensuring safety. Existing continual learning benchmarks lack sufficient challenge for leading aligned LLMs. We introduce TRACE, a novel benchmark designed to evaluate continual learning in LLMs.
arXiv Detail & Related papers (2023-10-10T16:38:49Z)
Prompting Large Language Models for Counterfactual Generation: An Empirical Study [13.506528217009507]
Large language models (LLMs) have made remarkable progress in a wide range of natural language understanding and generation tasks. We present a comprehensive evaluation framework on various types of NLU tasks, which covers all key factors in determining LLMs' capability of generating counterfactuals.
arXiv Detail & Related papers (2023-05-24T06:44:32Z)
Search-in-the-Chain: Interactively Enhancing Large Language Models with Search for Knowledge-intensive Tasks [121.74957524305283]
This paper proposes a novel framework named textbfSearch-in-the-Chain (SearChain) for the interaction between Information Retrieval (IR) and Large Language Model (LLM) Experiments show that SearChain outperforms state-of-the-art baselines on complex knowledge-intensive tasks.
arXiv Detail & Related papers (2023-04-28T10:15:25Z)
oLMpics -- On what Language Model Pre-training Captures [84.60594612120173]
We propose eight reasoning tasks, which require operations such as comparison, conjunction, and composition. A fundamental challenge is to understand whether the performance of a LM on a task should be attributed to the pre-trained representations or to the process of fine-tuning on the task data.
arXiv Detail & Related papers (2019-12-31T12:11:35Z)

This list is automatically generated from the titles and abstracts of the papers in this site.