Tree-Based Hard Attention with Self-Motivation for Large Language Models
- URL: http://arxiv.org/abs/2402.08874v1
- Date: Wed, 14 Feb 2024 00:40:51 GMT
- Title: Tree-Based Hard Attention with Self-Motivation for Large Language Models
- Authors: Chenxi Lin, Jiayu Ren, Guoxiu He, Zhuoren Jiang, Haiyan Yu, Xiaomin
Zhu
- Abstract summary: Large language models (LLMs) excel at understanding and generating plain text.
They are not specifically tailored to handle hierarchical text structures.
We propose a novel framework called Tree-Based Hard Attention with Self-Motivation for Large Language Models.
- Score: 7.2677650379517775
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While large language models (LLMs) excel at understanding and generating
plain text, they are not specifically tailored to handle hierarchical text
structures. Extracting the task-desired property from their natural language
responses typically necessitates additional processing steps. In fact,
selectively comprehending the hierarchical structure of large-scale text is
pivotal to understanding its substance. Aligning LLMs more closely with the
classification or regression values of a specific task through prompting also
remains challenging. To this end, we propose a novel framework called
Tree-Based Hard Attention with Self-Motivation for Large Language Models
(TEAROOM). TEAROOM incorporates a tree-based hard attention mechanism for LLMs
to process hierarchically structured text inputs. By leveraging prompting, it
enables a frozen LLM to selectively focus on relevant leaves in relation to the
root, generating a tailored symbolic representation of their relationship.
Moreover, TEAROOM comprises a self-motivation strategy for another LLM equipped
with a trainable adapter and a linear layer. The selected symbolic outcomes are
integrated into another prompt, along with the predictive value of the task. We
iteratively feed output values back into the prompt, enabling the trainable LLM
to progressively approximate the golden truth. TEAROOM outperforms existing
state-of-the-art methods in experimental evaluations across three benchmark
datasets, showing its effectiveness in estimating task-specific properties.
Through comprehensive experiments and analysis, we have validated the ability
of TEAROOM to gradually approach the underlying golden truth through multiple
inferences.
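The abstract describes two stages: a prompted, frozen LLM that performs tree-based hard attention by keeping only the leaves relevant to the root and summarizing each retained root-leaf relation symbolically, and a trainable LLM (adapter plus linear layer) that repeatedly refines a scalar prediction by seeing its own previous estimate in the prompt. The sketch below illustrates that control flow only; the function names, prompt wording, and round count are illustrative assumptions, not the authors' released implementation.

```python
# A minimal sketch of the two-stage TEAROOM loop as described in the abstract.
# All callables, prompts, and defaults here are hypothetical stand-ins,
# not the authors' code.

from typing import Callable, List


def tree_hard_attention(
    frozen_llm: Callable[[str], str],
    root: str,
    leaves: List[str],
) -> str:
    """Prompt a frozen LLM to keep only the leaves relevant to the root and
    return a symbolic summary of each retained root-leaf relation
    (hard attention: unselected leaves are dropped entirely)."""
    selections = []
    for leaf in leaves:
        prompt = (
            "Root:\n" + root + "\n\nLeaf:\n" + leaf + "\n\n"
            "Is this leaf relevant to the root? If yes, describe their "
            "relationship in one short symbolic statement; if no, answer 'SKIP'."
        )
        answer = frozen_llm(prompt).strip()
        if answer.upper() != "SKIP":
            selections.append(answer)
    return "\n".join(selections)


def self_motivated_inference(
    trainable_llm_head: Callable[[str], float],
    symbolic_summary: str,
    num_rounds: int = 3,
) -> float:
    """Iteratively feed the model's previous estimate back into the prompt,
    letting the adapter-equipped LLM with a linear head refine its prediction
    round by round."""
    estimate = 0.0
    for _ in range(num_rounds):
        prompt = (
            "Evidence:\n" + symbolic_summary + "\n"
            f"Previous estimate: {estimate:.4f}\n"
            "Refine the estimate for the target property."
        )
        estimate = trainable_llm_head(prompt)  # scalar from the linear layer
    return estimate
```

In training, each round's output would be compared against the gold value so the adapter learns to shift its estimate toward the target across rounds, matching the abstract's claim that TEAROOM gradually approaches the golden truth over multiple inferences.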
Related papers
- StructText: A Synthetic Table-to-Text Approach for Benchmark Generation with Multi-Dimensional Evaluation [8.251302684712773]
StructText is an end-to-end framework for automatically generating high-fidelity benchmarks for key-value extraction from text.
We evaluate the proposed method on 71,539 examples across 49 documents.
arXiv Detail & Related papers (2025-07-28T21:20:44Z) - SAFT: Structure-Aware Fine-Tuning of LLMs for AMR-to-Text Generation [50.277959544420455]
SAFT is a structure-aware fine-tuning approach that injects graph topology into pretrained language models.
We compute direction-sensitive positional encodings from the magnetic Laplacian of transformed AMRs.
SAFT sets a new state-of-the-art on AMR 3.0 with a 3.5 BLEU improvement over baselines.
arXiv Detail & Related papers (2025-07-15T18:12:57Z) - Idiosyncrasies in Large Language Models [54.26923012617675]
We unveil and study idiosyncrasies in Large Language Models (LLMs).
We find that fine-tuning existing text embedding models on LLM-generated texts yields excellent classification accuracy.
We leverage LLMs as judges to generate detailed, open-ended descriptions of each model's idiosyncrasies.
arXiv Detail & Related papers (2025-02-17T18:59:02Z) - Graph of Records: Boosting Retrieval Augmented Generation for Long-context Summarization with Graphs [12.878608250420832]
We propose graph of records (GoR) to enhance RAG for long-context global summarization.
Inspired by the retrieve-then-generate paradigm of RAG, we construct a graph by establishing an edge between the retrieved text chunks and the corresponding LLM-generated response.
To further uncover the intricate correlations between them, GoR features a graph neural network and an elaborately designed BERTScore-based objective for self-supervised model training.
arXiv Detail & Related papers (2024-10-14T18:34:29Z) - TS-HTFA: Advancing Time Series Forecasting via Hierarchical Text-Free Alignment with Large Language Models [14.411646409316624]
We introduce Hierarchical Text-Free Alignment (TS-HTFA), a novel method for time-series forecasting.
We replace paired text data with adaptive virtual text derived from QR decomposition of word embeddings and learnable prompts.
Experiments on multiple time-series benchmarks demonstrate that TS-HTFA achieves state-of-the-art performance.
arXiv Detail & Related papers (2024-09-23T12:57:24Z) - Adaptable and Reliable Text Classification using Large Language Models [7.962669028039958]
This paper introduces an adaptable and reliable text classification paradigm that leverages Large Language Models (LLMs).
We evaluated the performance of several LLMs, machine learning algorithms, and neural network-based architectures on four diverse datasets.
It is shown that the system's performance can be further enhanced through few-shot or fine-tuning strategies.
arXiv Detail & Related papers (2024-05-17T04:05:05Z) - Who Wrote This? The Key to Zero-Shot LLM-Generated Text Detection Is GECScore [51.65730053591696]
We propose a simple yet effective black-box zero-shot detection approach based on the observation that human-written texts typically contain more grammatical errors than LLM-generated texts.
Experimental results show that our method outperforms current state-of-the-art (SOTA) zero-shot and supervised methods.
arXiv Detail & Related papers (2024-05-07T12:57:01Z) - Attribute Structuring Improves LLM-Based Evaluation of Clinical Text
Summaries [62.32403630651586]
Large language models (LLMs) have shown the potential to generate accurate clinical text summaries, but still struggle with issues regarding grounding and evaluation.
Here, we explore a general mitigation framework using Attribute Structuring (AS), which structures the summary evaluation process.
AS consistently improves the correspondence between human annotations and automated metrics in clinical text summarization.
arXiv Detail & Related papers (2024-03-01T21:59:03Z) - Hierarchical Indexing for Retrieval-Augmented Opinion Summarization [60.5923941324953]
We propose a method for unsupervised abstractive opinion summarization that combines the attributability and scalability of extractive approaches with the coherence and fluency of Large Language Models (LLMs).
Our method, HIRO, learns an index structure that maps sentences to a path through a semantically organized discrete hierarchy.
At inference time, we populate the index and use it to identify and retrieve clusters of sentences containing popular opinions from input reviews.
arXiv Detail & Related papers (2024-03-01T10:38:07Z) - Sequential Visual and Semantic Consistency for Semi-supervised Text
Recognition [56.968108142307976]
Scene text recognition (STR) is a challenging task that requires large-scale annotated data for training.
Most existing STR methods resort to synthetic data, which may introduce domain discrepancy and degrade the performance of STR models.
This paper proposes a novel semi-supervised learning method for STR that incorporates word-level consistency regularization from both visual and semantic aspects.
arXiv Detail & Related papers (2024-02-24T13:00:54Z) - Token Prediction as Implicit Classification to Identify LLM-Generated
Text [37.89852204279844]
This paper introduces a novel approach for identifying the possible large language models (LLMs) involved in text generation.
Instead of adding an additional classification layer to a base LM, we reframe the classification task as a next-token prediction task.
We utilize the Text-to-Text Transfer Transformer (T5) model as the backbone for our experiments.
arXiv Detail & Related papers (2023-11-15T06:33:52Z) - Controllable Data Augmentation for Few-Shot Text Mining with Chain-of-Thought Attribute Manipulation [35.33340453046864]
Chain-of-Thought Attribute Manipulation (CoTAM) is a novel approach that generates new data from existing examples.
We leverage chain-of-thought prompting to directly edit the text in three steps: (1) attribute decomposition, (2) manipulation proposal, and (3) sentence reconstruction.
arXiv Detail & Related papers (2023-07-14T00:10:03Z) - TextFormer: A Query-based End-to-End Text Spotter with Mixed Supervision [61.186488081379]
We propose TextFormer, a query-based end-to-end text spotter with Transformer architecture.
TextFormer builds upon an image encoder and a text decoder to learn a joint semantic understanding for multi-task modeling.
It allows for mutual training and optimization of classification, segmentation, and recognition branches, resulting in deeper feature sharing.
arXiv Detail & Related papers (2023-06-06T03:37:41Z) - Evaluating Factual Consistency of Texts with Semantic Role Labeling [3.1776833268555134]
We introduce SRLScore, a reference-free evaluation metric designed with text summarization in mind.
A final factuality score is computed by an adjustable scoring mechanism.
Correlation with human judgments on English summarization datasets shows that SRLScore is competitive with state-of-the-art methods.
arXiv Detail & Related papers (2023-05-22T17:59:42Z) - X-Mesh: Towards Fast and Accurate Text-driven 3D Stylization via Dynamic
Textual Guidance [70.08635216710967]
X-Mesh is a text-driven 3D stylization framework that incorporates a novel Text-guided Dynamic Attention Module.
We introduce a new standard text-mesh benchmark, MIT-30, and two automated metrics, which will enable future research to achieve fair and objective comparisons.
arXiv Detail & Related papers (2023-03-28T06:45:31Z) - M-Tuning: Prompt Tuning with Mitigated Label Bias in Open-Set Scenarios [103.6153593636399]
We propose a vision-language prompt tuning method with mitigated label bias (M-Tuning).
It introduces open words from WordNet to extend the prompt vocabulary beyond closed-set label words, so that prompts are tuned in a simulated open-set scenario.
Our method achieves the best performance on datasets with various scales, and extensive ablation studies also validate its effectiveness.
arXiv Detail & Related papers (2023-03-09T09:05:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.