MoC: Mixtures of Text Chunking Learners for Retrieval-Augmented Generation System
- URL: http://arxiv.org/abs/2503.09600v1
- Date: Wed, 12 Mar 2025 17:59:42 GMT
- Title: MoC: Mixtures of Text Chunking Learners for Retrieval-Augmented Generation System
- Authors: Jihao Zhao, Zhiyuan Ji, Zhaoxin Fan, Hanyu Wang, Simin Niu, Bo Tang, Feiyu Xiong, Zhiyu Li
- Abstract summary: This paper introduces a dual-metric evaluation method, comprising Boundary Clarity and Chunk Stickiness. We highlight the inherent limitations of traditional and semantic chunking in handling complex contextual nuances. We devise the granularity-aware Mixture-of-Chunkers (MoC) framework, which consists of a three-stage processing mechanism.
- Score: 11.793639794583498
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Retrieval-Augmented Generation (RAG), while serving as a viable complement to large language models (LLMs), often overlooks the crucial aspect of text chunking within its pipeline. This paper first introduces a dual-metric evaluation method, comprising Boundary Clarity and Chunk Stickiness, to enable direct quantification of chunking quality. Leveraging this assessment method, we highlight the inherent limitations of traditional and semantic chunking in handling complex contextual nuances, thereby substantiating the necessity of integrating LLMs into the chunking process. To address the inherent trade-off between computational efficiency and chunking precision in LLM-based approaches, we devise the granularity-aware Mixture-of-Chunkers (MoC) framework, which consists of a three-stage processing mechanism. Notably, our objective is to guide the chunker toward generating a structured list of chunking regular expressions, which are subsequently employed to extract chunks from the original text. Extensive experiments demonstrate that both our proposed metrics and the MoC framework effectively address the challenges of the chunking task, revealing the chunking kernel while enhancing the performance of the RAG system.
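The extraction mechanism named in the abstract is concrete enough to sketch. Below is a minimal Python illustration of that single step, assuming a hypothetical `extract_chunks` helper and stand-in `llm_patterns` for the structured list of regular expressions the chunker would emit; the paper's actual prompting scheme, three-stage routing, and regex format are not reproduced here.

```python
import re

# Minimal sketch (assumptions noted above): apply an LLM-proposed list of
# chunking regexes, in order, to carve a document into non-overlapping chunks.
def extract_chunks(text: str, patterns: list[str]) -> list[str]:
    chunks = []
    cursor = 0
    for pattern in patterns:
        match = re.compile(pattern, re.DOTALL).search(text, cursor)
        if match is None:
            continue  # an LLM-proposed regex may fail to match; skip it
        chunks.append(match.group(0).strip())
        cursor = match.end()  # advance so chunks never overlap
    if cursor < len(text):
        chunks.append(text[cursor:].strip())  # keep any trailing text
    return [c for c in chunks if c]

# Hypothetical chunker output: one regex per intended chunk.
llm_patterns = [r"^.+?(?=\n## )", r"\n## .+?(?=\n## |\Z)"]
doc = "Intro paragraph.\n## Methods\nDetails here.\n## Results\nNumbers here."
print(extract_chunks(doc, llm_patterns))
# ['Intro paragraph.', '## Methods\nDetails here.', '## Results\nNumbers here.']
```

The design point worth noting is that each regex delimits one chunk directly, so granularity is carried by the patterns themselves rather than by a fixed token budget, which fits the granularity-aware design the abstract describes.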
Related papers
- RALLRec+: Retrieval Augmented Large Language Model Recommendation with Reasoning [22.495874056980824]
We propose Representation learning and Reasoning empowered retrieval-Augmented Large Language model Recommendation (RALLRec+).
arXiv Detail & Related papers (2025-03-26T11:03:34Z)
- Efficient Knowledge Feeding to Language Models: A Novel Integrated Encoder-Decoder Architecture [0.0]
ICV recasts in-context learning by using latent embeddings of language models. ICV directly integrates information into the model, enabling it to process this information more effectively.
arXiv Detail & Related papers (2025-02-07T04:24:07Z)
- Enhancing Item Tokenization for Generative Recommendation through Self-Improvement [67.94240423434944]
Generative recommendation systems are driven by large language models (LLMs). Current item tokenization methods include using text descriptions, numerical strings, or sequences of discrete tokens. We propose a self-improving item tokenization method that allows the LLM to refine its own item tokenizations during the training process.
arXiv Detail & Related papers (2024-12-22T21:56:15Z)
- Self-Calibrated Listwise Reranking with Large Language Models [137.6557607279876]
Large language models (LLMs) have been employed in reranking tasks through a sequence-to-sequence approach.
This reranking paradigm requires a sliding window strategy to iteratively handle larger candidate sets; a generic sketch of such a pass follows this entry.
We propose a novel self-calibrated listwise reranking method, which aims to leverage LLMs to produce global relevance scores for ranking.
arXiv Detail & Related papers (2024-11-07T10:31:31Z)
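For context on the sliding-window strategy mentioned above, here is a minimal, generic sketch of one such pass. It is an illustration only, not this paper's self-calibrated method: `rank_window`, the window and stride sizes, and the toy length-based scorer are all assumptions.

```python
# Generic sliding-window listwise reranking pass (illustrative assumptions
# noted above). `rank_window` stands in for any LLM call that reorders a
# small candidate list.
from typing import Callable, List

def sliding_window_rerank(
    candidates: List[str],
    rank_window: Callable[[List[str]], List[str]],
    window: int = 4,
    stride: int = 2,
) -> List[str]:
    """Rerank from the tail to the head so strong candidates bubble upward."""
    docs = list(candidates)
    start = max(len(docs) - window, 0)
    while True:
        docs[start:start + window] = rank_window(docs[start:start + window])
        if start == 0:
            break
        start = max(start - stride, 0)
    return docs

# Toy scorer: pretend the "LLM" prefers longer strings.
ranked = sliding_window_rerank(
    ["b", "dddd", "a", "ccc", "ee"],
    rank_window=lambda w: sorted(w, key=len, reverse=True),
)
print(ranked)  # ['dddd', 'ccc', 'ee', 'b', 'a']
```

Processing windows from the tail toward the head lets strong candidates move upward across overlapping windows, which is why larger candidate sets require the iterative handling the entry describes.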
- Attribute Controlled Fine-tuning for Large Language Models: A Case Study on Detoxification [76.14641982122696]
We propose a constraint learning schema for fine-tuning Large Language Models (LLMs) with attribute control.
We show that our approach leads to an LLM that produces fewer inappropriate responses while achieving competitive performance on benchmarks and a toxicity detection task.
arXiv Detail & Related papers (2024-10-07T23:38:58Z)
- Tender: Accelerating Large Language Models via Tensor Decomposition and Runtime Requantization [0.6445087473595953]
Large language models (LLMs) demonstrate outstanding performance across a variety of machine learning tasks.
However, deploying LLM inference poses challenges due to its high compute and memory requirements.
We present Tender, an algorithm-hardware co-design solution that enables efficient deployment of LLM inference at low precision.
arXiv Detail & Related papers (2024-06-16T09:51:55Z)
- Enhancing Retrieval-Augmented LMs with a Two-stage Consistency Learning Compressor [4.35807211471107]
This work proposes a novel two-stage consistency learning approach for retrieved information compression in retrieval-augmented language models.
The proposed method is empirically validated across multiple datasets, demonstrating notable enhancements in precision and efficiency for question-answering tasks.
arXiv Detail & Related papers (2024-06-04T12:43:23Z)
- A Rationale-centric Counterfactual Data Augmentation Method for Cross-Document Event Coreference Resolution [29.34028569245905]
We formalize the decision-making process of the baseline ECR system using a Structural Causal Model (SCM).
We develop a rationale-centric counterfactual data augmentation method with LLM-in-the-loop.
Our approach achieves state-of-the-art performance on three popular cross-document ECR benchmarks and demonstrates robustness in out-of-domain scenarios.
arXiv Detail & Related papers (2024-04-02T13:15:07Z)
- Lookahead: An Inference Acceleration Framework for Large Language Model with Lossless Generation Accuracy [46.81745860690336]
Large Language Models (LLMs) have made significant advancements across various tasks, such as question answering, translation, text summarization, and dialogue systems.
This paper presents a generic framework for accelerating the inference process, resulting in a substantial increase in speed and a reduction in cost.
We conduct extensive experiments to demonstrate the significant improvements achieved by applying our inference acceleration framework.
arXiv Detail & Related papers (2023-12-20T02:55:15Z)
- Scalable Learning of Latent Language Structure With Logical Offline Cycle Consistency [71.42261918225773]
Conceptually, LOCCO can be viewed as a form of self-learning where the semantic parser being trained is used to generate annotations for unlabeled text.
As an added bonus, the annotations produced by LOCCO can be trivially repurposed to train a neural text generation model.
arXiv Detail & Related papers (2023-05-31T16:47:20Z)
- A Comparison of Semi-Supervised Learning Techniques for Streaming ASR at Scale [64.10124092250126]
Unpaired text and audio injection have emerged as dominant methods for improving ASR performance in the absence of a large labeled corpus.
In this work, we compare three state-of-the-art semi-supervised methods encompassing both unpaired text and audio as well as several of their combinations in a controlled setting.
We find that in our setting these methods offer many improvements beyond raw WER, including substantial gains in tail-word WER, decoder computation during inference, and lattice density.
arXiv Detail & Related papers (2023-04-19T18:09:27Z)
- CoCoMoT: Conformance Checking of Multi-Perspective Processes via SMT (Extended Version) [62.96267257163426]
We introduce the CoCoMoT (Computing Conformance Modulo Theories) framework.
First, we show how SAT-based encodings studied in the pure control-flow setting can be lifted to our data-aware case.
Second, we introduce a novel preprocessing technique based on a notion of property-preserving clustering.
arXiv Detail & Related papers (2021-03-18T20:22:50Z)
This list is automatically generated from the titles and abstracts of the papers on this site.