Related papers: On the Sequence Evaluation based on Stochastic Processes

On the Sequence Evaluation based on Stochastic Processes

URL: http://arxiv.org/abs/2405.17764v2
Date: Sat, 15 Jun 2024 04:08:53 GMT
Title: On the Sequence Evaluation based on Stochastic Processes
Authors: Tianhao Zhang, Zhexiao Lin, Zhecheng Sheng, Chen Jiang, Dongyeop Kang,
Abstract summary: We introduce a likelihood-based training objective for the text encoder and design a more thorough measurement (score) for long text evaluation. The proposed training objective effectively preserves the sequence coherence, while the new score comprehensively captures both temporal and spatial dependencies. Experimental results show superior performance in various sequence evaluation tasks, including global and local discrimination within and between documents of different lengths.
Score: 17.497842325320825
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Modeling and analyzing long sequences of text is an essential task for Natural Language Processing. Success in capturing long text dynamics using neural language models will facilitate many downstream tasks such as coherence evaluation, text generation, machine translation and so on. This paper presents a novel approach to model sequences through a stochastic process. We introduce a likelihood-based training objective for the text encoder and design a more thorough measurement (score) for long text evaluation compared to the previous approach. The proposed training objective effectively preserves the sequence coherence, while the new score comprehensively captures both temporal and spatial dependencies. Theoretical properties of our new score show its advantages in sequence evaluation. Experimental results show superior performance in various sequence evaluation tasks, including global and local discrimination within and between documents of different lengths. We also demonstrate the encoder achieves competitive results on discriminating human and AI written text.

Related papers

Incremental Sequence Classification with Temporal Consistency [9.65650774513798]
We address the problem of incremental sequence classification, where predictions are updated as new elements in the sequence are revealed.<n>We leverage a temporal-consistency condition that successive predictions should satisfy to develop a novel loss function for training incremental sequence classifiers.<n>Our results show that models trained with our method are better able to distinguish promising generations from unpromising ones after observing only a few tokens.
arXiv Detail & Related papers (2025-05-22T11:37:53Z)
Exploring Training and Inference Scaling Laws in Generative Retrieval [50.82554729023865]
We investigate how model size, training data scale, and inference-time compute jointly influence generative retrieval performance. Our experiments show that n-gram-based methods demonstrate strong alignment with both training and inference scaling laws. We find that LLaMA models consistently outperform T5 models, suggesting a particular advantage for larger decoder-only models in generative retrieval.
arXiv Detail & Related papers (2025-03-24T17:59:03Z)
Warmup Generations: A Task-Agnostic Approach for Guiding Sequence-to-Sequence Learning with Unsupervised Initial State Generation [34.55224347308013]
Traditional supervised fine-tuning (SFT) strategies for sequence-to-sequence tasks often train models to directly generate the target output. We introduce a task-agnostic framework that enables models to generate intermediate "upwarm" sequences. We show that our approach outperforms traditional SFT methods, and offers a scalable and flexible solution for sequence-to-sequence tasks.
arXiv Detail & Related papers (2025-02-17T20:23:42Z)
Learning Task Representations from In-Context Learning [73.72066284711462]
Large language models (LLMs) have demonstrated remarkable proficiency in in-context learning. We introduce an automated formulation for encoding task information in ICL prompts as a function of attention heads. We show that our method's effectiveness stems from aligning the distribution of the last hidden state with that of an optimally performing in-context-learned model.
arXiv Detail & Related papers (2025-02-08T00:16:44Z)
How Hard is this Test Set? NLI Characterization by Exploiting Training Dynamics [49.9329723199239]
We propose a method for the automated creation of a challenging test set without relying on the manual construction of artificial and unrealistic examples. We categorize the test set of popular NLI datasets into three difficulty levels by leveraging methods that exploit training dynamics. When our characterization method is applied to the training set, models trained with only a fraction of the data achieve comparable performance to those trained on the full dataset.
arXiv Detail & Related papers (2024-10-04T13:39:21Z)
Pointer-Guided Pre-Training: Infusing Large Language Models with Paragraph-Level Contextual Awareness [3.2925222641796554]
"pointer-guided segment ordering" (SO) is a novel pre-training technique aimed at enhancing the contextual understanding of paragraph-level text representations. Our experiments show that pointer-guided pre-training significantly enhances the model's ability to understand complex document structures.
arXiv Detail & Related papers (2024-06-06T15:17:51Z)
State Sequences Prediction via Fourier Transform for Representation Learning [111.82376793413746]
We propose State Sequences Prediction via Fourier Transform (SPF), a novel method for learning expressive representations efficiently. We theoretically analyze the existence of structural information in state sequences, which is closely related to policy performance and signal regularity. Experiments demonstrate that the proposed method outperforms several state-of-the-art algorithms in terms of both sample efficiency and performance.
arXiv Detail & Related papers (2023-10-24T14:47:02Z)
Effective Long-Context Scaling of Foundation Models [90.57254298730923]
We present a series of long-context LLMs that support effective context windows of up to 32,768 tokens. Our models achieve consistent improvements on most regular tasks and significant improvements on long-context tasks over Llama 2.
arXiv Detail & Related papers (2023-09-27T21:41:49Z)
Instruction Position Matters in Sequence Generation with Large Language Models [67.87516654892343]
Large language models (LLMs) are capable of performing conditional sequence generation tasks, such as translation or summarization. We propose enhancing the instruction-following capability of LLMs by shifting the position of task instructions after the input sentences.
arXiv Detail & Related papers (2023-08-23T12:36:57Z)
Scalable Learning of Latent Language Structure With Logical Offline Cycle Consistency [71.42261918225773]
Conceptually, LOCCO can be viewed as a form of self-learning where the semantic being trained is used to generate annotations for unlabeled text. As an added bonus, the annotations produced by LOCCO can be trivially repurposed to train a neural text generation model.
arXiv Detail & Related papers (2023-05-31T16:47:20Z)
FineDiving: A Fine-grained Dataset for Procedure-aware Action Quality Assessment [93.09267863425492]
We argue that understanding both high-level semantics and internal temporal structures of actions in competitive sports videos is the key to making predictions accurate and interpretable. We construct a new fine-grained dataset, called FineDiving, developed on diverse diving events with detailed annotations on action procedures.
arXiv Detail & Related papers (2022-04-07T17:59:32Z)
Learning Temporal Point Processes for Efficient Retrieval of Continuous Time Event Sequences [24.963828650935913]
We propose NEUROSEQRET which learns to retrieve and rank a relevant set of continuous-time event sequences for a given query sequence. We develop two variants of the relevance model which offer a tradeoff between accuracy and efficiency. Our experiments with several datasets show the significant accuracy boost of NEUROSEQRET beyond several baselines.
arXiv Detail & Related papers (2022-02-17T11:16:31Z)
Contrastively Disentangled Sequential Variational Autoencoder [20.75922928324671]
We propose a novel sequence representation learning method, named Contrastively Disentangled Sequential Variational Autoencoder (C-DSVAE) We use a novel evidence lower bound which maximizes the mutual information between the input and the latent factors, while penalizes the mutual information between the static and dynamic factors. Our experiments show that C-DSVAE significantly outperforms the previous state-of-the-art methods on multiple metrics.
arXiv Detail & Related papers (2021-10-22T23:00:32Z)
Interpretable Feature Construction for Time Series Extrinsic Regression [0.028675177318965035]
In some application domains, it occurs that the target variable is numerical and the problem is known as time series extrinsic regression (TSER) We suggest an extension of a Bayesian method for robust and interpretable feature construction and selection in the context of TSER. Our approach exploits a relational way to tackle with TSER: (i), we build various and simple representations of the time series which are stored in a relational data scheme, then, (ii), a propositionalisation technique is applied to build interpretable features from secondary tables to "flatten" the data.
arXiv Detail & Related papers (2021-03-15T08:12:19Z)

This list is automatically generated from the titles and abstracts of the papers in this site.