Table-Critic: A Multi-Agent Framework for Collaborative Criticism and Refinement in Table Reasoning
- URL: http://arxiv.org/abs/2502.11799v1
- Date: Mon, 17 Feb 2025 13:42:12 GMT
- Title: Table-Critic: A Multi-Agent Framework for Collaborative Criticism and Refinement in Table Reasoning
- Authors: Peiying Yu, Guoxin Chen, Jingjing Wang,
- Abstract summary: Table-Critic is a novel multi-agent framework that facilitates collaborative criticism and iterative refinement of the reasoning process.
Our framework consists of four specialized agents: a Judge for error identification, a Critic for comprehensive critiques, a Refiner for process improvement, and a Curator for pattern distillation.
Extensive experiments have demonstrated that Table-Critic achieves superior accuracy and error correction rates while maintaining computational efficiency and lower solution degradation rate.
- Score: 3.721438719967748
- License:
- Abstract: Despite the remarkable capabilities of large language models (LLMs) in various reasoning tasks, they still struggle with table reasoning tasks, particularly in maintaining consistency throughout multi-step reasoning processes. While existing approaches have explored various decomposition strategies, they often lack effective mechanisms to identify and correct errors in intermediate reasoning steps, leading to cascading error propagation. To address these issues, we propose Table-Critic, a novel multi-agent framework that facilitates collaborative criticism and iterative refinement of the reasoning process until convergence to correct solutions. Our framework consists of four specialized agents: a Judge for error identification, a Critic for comprehensive critiques, a Refiner for process improvement, and a Curator for pattern distillation. To effectively deal with diverse and unpredictable error types, we introduce a self-evolving template tree that systematically accumulates critique knowledge through experience-driven learning and guides future reflections. Extensive experiments have demonstrated that Table-Critic achieves substantial improvements over existing methods, achieving superior accuracy and error correction rates while maintaining computational efficiency and lower solution degradation rate.
Related papers
- Causality can systematically address the monsters under the bench(marks) [64.36592889550431]
Benchmarks are plagued by various biases, artifacts, or leakage.
Models may behave unreliably due to poorly explored failure modes.
causality offers an ideal framework to systematically address these challenges.
arXiv Detail & Related papers (2025-02-07T17:01:37Z) - Framework for Progressive Knowledge Fusion in Large Language Models Through Structured Conceptual Redundancy Analysis [0.0]
The organization of latent knowledge within large-scale models poses unique challenges when addressing overlapping representations and optimizing contextual accuracy.
A framework was proposed to restructure these redundancies through advanced clustering techniques and dynamic thresholding.
Evaluations revealed improved memory efficiency and faster inference times, alongside better alignment in latent knowledge clusters that enhanced interpretability.
arXiv Detail & Related papers (2025-01-23T11:34:04Z) - Progressive Multimodal Reasoning via Active Retrieval [64.74746997923967]
Multi-step multimodal reasoning tasks pose significant challenges for large language models (MLLMs)
We propose AR-MCTS, a universal framework designed to progressively improve the reasoning capabilities of MLLMs.
We show that AR-MCTS can optimize sampling diversity and accuracy, yielding reliable multimodal reasoning.
arXiv Detail & Related papers (2024-12-19T13:25:39Z) - Think Beyond Size: Adaptive Prompting for More Effective Reasoning [0.0]
We introduce Adaptive Prompting, a dynamic and iterative framework designed to enhance reasoning by incorporating real-time adjustments to prompt structures and validation mechanisms.
Results demonstrate that Adaptive Prompting significantly improves performance on diverse reasoning benchmarks, including arithmetic reasoning (GSM8K, MultiArithm), logical reasoning and commonsense tasks.
Our approach enables smaller models to achieve competitive performance with larger counterparts, such as GPT-4, while maintaining computational efficiency.
arXiv Detail & Related papers (2024-10-10T17:14:36Z) - Subtle Errors Matter: Preference Learning via Error-injected Self-editing [59.405145971637204]
We propose a novel preference learning framework called eRror-Injected Self-Editing (RISE)
RISE injects predefined subtle errors into partial tokens of correct solutions to construct hard pairs for error mitigation.
Experiments validate the effectiveness of RISE, with preference learning on Qwen2-7B-Instruct yielding notable improvements of 3.0% on GSM8K and 7.9% on MATH.
arXiv Detail & Related papers (2024-10-09T07:43:38Z) - Improving LLM Reasoning through Scaling Inference Computation with Collaborative Verification [52.095460362197336]
Large language models (LLMs) struggle with consistent and accurate reasoning.
LLMs are trained primarily on correct solutions, reducing their ability to detect and learn from errors.
We propose a novel collaborative method integrating Chain-of-Thought (CoT) and Program-of-Thought (PoT) solutions for verification.
arXiv Detail & Related papers (2024-10-05T05:21:48Z) - Can We Further Elicit Reasoning in LLMs? Critic-Guided Planning with Retrieval-Augmentation for Solving Challenging Tasks [68.49251303172674]
State-of-the-art large language models (LLMs) exhibit impressive problem-solving capabilities but may struggle with complex reasoning and factual correctness.
Existing methods harness the strengths of chain-of-thought and retrieval-augmented generation (RAG) to decompose a complex problem into simpler steps and apply retrieval to improve factual correctness.
We introduce Critic-guided planning with Retrieval-augmentation, CR-Planner, a novel framework that leverages fine-tuned critic models to guide both reasoning and retrieval processes through planning.
arXiv Detail & Related papers (2024-10-02T11:26:02Z) - CoT Rerailer: Enhancing the Reliability of Large Language Models in Complex Reasoning Tasks through Error Detection and Correction [9.44858963874474]
Chain-of-Thought (CoT) prompting enhances Large Language Models (LLMs) complex reasoning abilities.
We propose the CoT Rerailer to address these challenges, employing self-consistency and multi-agent debate systems.
We demonstrate the effectiveness of our approach across diverse question-answering datasets in various knowledge domains.
arXiv Detail & Related papers (2024-08-25T21:20:17Z) - Investigating the Role of Instruction Variety and Task Difficulty in Robotic Manipulation Tasks [50.75902473813379]
This work introduces a comprehensive evaluation framework that systematically examines the role of instructions and inputs in the generalisation abilities of such models.
The proposed framework uncovers the resilience of multimodal models to extreme instruction perturbations and their vulnerability to observational changes.
arXiv Detail & Related papers (2024-07-04T14:36:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.