Over-Generation Cannot Be Rewarded: Length-Adaptive Average Lagging for
Simultaneous Speech Translation
- URL: http://arxiv.org/abs/2206.05807v2
- Date: Wed, 15 Jun 2022 18:31:15 GMT
- Title: Over-Generation Cannot Be Rewarded: Length-Adaptive Average Lagging for
Simultaneous Speech Translation
- Authors: Sara Papi, Marco Gaido, Matteo Negri, Marco Turchi
- Abstract summary: Simultaneous speech translation (SimulST) systems aim at generating their output with the lowest possible latency.
Average Lagging (AL) provides underestimated scores for systems that generate predictions longer than the corresponding references.
We show that this problem has practical relevance, as recent SimulST systems indeed have a tendency to over-generate.
- Score: 17.305879157385675
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Simultaneous speech translation (SimulST) systems aim at generating their
output with the lowest possible latency, which is normally computed in terms of
Average Lagging (AL). In this paper we highlight that, despite its widespread
adoption, AL provides underestimated scores for systems that generate
predictions longer than the corresponding references. We also show that this
problem has practical relevance, as recent SimulST systems indeed have a
tendency to over-generate. As a solution, we propose LAAL (Length-Adaptive
Average Lagging), a modified version of the metric that takes into account the
over-generation phenomenon and allows for unbiased evaluation of both
under-/over-generating systems.
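Concretely, LAAL changes only the ideal-policy denominator of AL: the reference length is replaced by the maximum of reference and hypothesis lengths, so a system cannot lower its lagging score by emitting extra tokens. A minimal sketch of both metrics for a single sentence (function and variable names are illustrative, not the authors' implementation):

```python
def average_lagging(delays, src_duration, ref_len, hyp_len=None):
    """Compute AL, or LAAL when hyp_len is given, for one sentence.

    delays[i] is the amount of source (e.g. seconds of audio) already
    read when target token i was emitted; src_duration is the total
    source length; ref_len / hyp_len are reference / hypothesis lengths.
    """
    # LAAL divides by max(ref, hyp) length so that over-generation
    # (a hypothesis longer than the reference) cannot be rewarded;
    # plain AL uses the reference length only.
    denom = ref_len if hyp_len is None else max(ref_len, hyp_len)
    rate = src_duration / denom  # ideal per-token source consumption
    total, tau = 0.0, 0
    for i, d in enumerate(delays):
        tau += 1
        total += d - i * rate  # lag of token i behind the ideal policy
        if d >= src_duration:  # stop at the first token emitted after
            break              # the full source has been read
    return total / tau
```

For an over-generating hypothesis (5 tokens against a 3-token reference), LAAL yields a higher, unbiased latency score than AL, which is the bias the paper corrects.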
Related papers
- Nearest Neighbor Speculative Decoding for LLM Generation and Attribution [87.3259169631789]
Nearest Neighbor Speculative Decoding (NEST) is capable of incorporating real-world text spans of arbitrary length into the LM generations and providing attribution to their sources.
NEST significantly enhances the generation quality and attribution rate of the base LM across a variety of knowledge-intensive tasks.
In addition, NEST substantially improves the generation speed, achieving a 1.8x speedup in inference time when applied to Llama-2-Chat 70B.
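The span-level verification at the heart of speculative decoding can be sketched as follows; here a retrieved span plays the role of the draft, and `target_next_token` stands in for a greedy call to the base LM. Both names are hypothetical, and NEST's actual acceptance rule is more permissive than the exact-match check shown here:

```python
def verify_draft(draft, target_next_token, prefix):
    """Accept the longest prefix of a drafted span that the target
    model would itself generate greedily. On the first mismatch, the
    target's own token replaces the rejected draft token, so one
    verification pass always extends the sequence."""
    accepted = []
    for tok in draft:
        expected = target_next_token(prefix + accepted)
        if expected != tok:
            accepted.append(expected)  # target's choice replaces the mismatch
            return accepted, False     # draft rejected from this point on
        accepted.append(tok)
    return accepted, True              # whole draft accepted
```

The speedup comes from checking several drafted tokens per target-model call instead of generating them one at a time.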
arXiv Detail & Related papers (2024-05-29T17:55:03Z)
- Analytical Verification of Performance of Deep Neural Network Based Time-Synchronized Distribution System State Estimation [0.18726646412385334]
Recently, we demonstrated the success of a time-synchronized state estimator using deep neural networks (DNNs).
In this letter, we provide analytical bounds on the performance of that state estimator as a function of perturbations in the input measurements.
arXiv Detail & Related papers (2023-11-12T22:01:34Z)
- Time-series Generation by Contrastive Imitation [87.51882102248395]
We study a generative framework that seeks to combine the strengths of both: Motivated by a moment-matching objective to mitigate compounding error, we optimize a local (but forward-looking) transition policy.
At inference, the learned policy serves as the generator for iterative sampling, and the learned energy serves as a trajectory-level measure for evaluating sample quality.
arXiv Detail & Related papers (2023-11-02T16:45:25Z)
- Score-Guided Intermediate Layer Optimization: Fast Langevin Mixing for Inverse Problems [97.64313409741614]
We prove fast mixing and characterize the stationary distribution of the Langevin Algorithm for inverting random weighted DNN generators.
We propose to do posterior sampling in the latent space of a pre-trained generative model.
arXiv Detail & Related papers (2022-06-18T03:47:37Z)
- Low-rank Optimal Transport: Approximation, Statistics and Debiasing [51.50788603386766]
The low-rank optimal transport (LOT) approach was advocated in Scetbon et al. (2021).
LOT is seen as a legitimate contender to entropic regularization when compared on properties of interest.
We target each of these areas in this paper in order to cement the impact of low-rank approaches in computational OT.
arXiv Detail & Related papers (2022-05-24T20:51:37Z)
- Teaching BERT to Wait: Balancing Accuracy and Latency for Streaming Disfluency Detection [3.884530687475798]
A streaming BERT-based sequence tagging model is capable of detecting disfluencies in real time.
The model attains state-of-the-art latency and stability scores when compared with recent work on incremental disfluency detection.
arXiv Detail & Related papers (2022-05-02T02:13:24Z)
- Factual Error Correction for Abstractive Summaries Using Entity Retrieval [57.01193722520597]
We propose RFEC, an efficient factual error correction system based on an entity-retrieval post-editing process.
RFEC retrieves the evidence sentences from the original document by comparing the sentences with the target summary.
Next, RFEC detects the entity-level errors in the summaries by considering the evidence sentences and substitutes the wrong entities with the accurate entities from the evidence sentences.
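The three steps above, retrieval, detection, and substitution, can be sketched as a toy pipeline. Everything here is illustrative: RFEC uses learned components, whereas this sketch ranks evidence by word overlap, takes entity extraction as a pluggable callable, and substitutes by string similarity:

```python
from difflib import get_close_matches

def retrieve_evidence(summary, sentences, k=2):
    """Rank document sentences by word overlap with the summary."""
    s_words = set(summary.lower().split())
    scored = sorted(sentences,
                    key=lambda s: -len(s_words & set(s.lower().split())))
    return scored[:k]

def correct_entities(summary, sentences, extract_entities, k=2):
    """Replace summary entities unsupported by the retrieved evidence
    with the most similar entity found in the evidence sentences."""
    evidence = retrieve_evidence(summary, sentences, k)
    ev_entities = {e for s in evidence for e in extract_entities(s)}
    for ent in extract_entities(summary):
        if ent not in ev_entities:
            # substitute the closest entity present in the evidence
            match = get_close_matches(ent, list(ev_entities), n=1, cutoff=0.0)
            if match:
                summary = summary.replace(ent, match[0])
    return summary
```

With a capitalized-token extractor, a summary containing the misspelled entity "Jon Smith" is corrected to "John Smith" when the source document supports the latter.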
arXiv Detail & Related papers (2022-04-18T11:35:02Z)
- SALAD: Self-Adaptive Lightweight Anomaly Detection for Real-time Recurrent Time Series [1.0437764544103274]
This paper introduces SALAD, a Self-Adaptive Lightweight Anomaly Detection approach based on a special type of recurrent neural network called Long Short-Term Memory (LSTM).
Experiments based on two real-world open-source time series datasets demonstrate that SALAD outperforms five other state-of-the-art anomaly detection approaches in terms of detection accuracy.
In addition, the results also show that SALAD is lightweight and can be deployed on a commodity machine.
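Forecast-error anomaly detection of this kind can be sketched with an adaptive threshold over recent errors. This is a lightweight stand-in, not SALAD itself: the real system learns an LSTM forecaster online, whereas here `forecast` is any callable mapping the history so far to a prediction for the next point:

```python
import statistics

def detect_anomalies(series, forecast, window=30, k=3.0):
    """Flag points whose absolute forecast error exceeds
    mean + k * std of the recent (non-anomalous) errors."""
    errors, anomalies = [], []
    for t, x in enumerate(series):
        err = abs(x - forecast(series[:t]))
        recent = errors[-window:]
        if len(recent) >= 2 and err > (statistics.mean(recent)
                                       + k * statistics.pstdev(recent)):
            anomalies.append(t)   # anomalous: keep it out of the window
        else:
            errors.append(err)    # normal: adapt the threshold to it
    return anomalies
```

Excluding flagged errors from the threshold window is one simple way to keep a spike from inflating the threshold and masking later anomalies; the self-adaptive mechanism in the actual paper is more involved.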
arXiv Detail & Related papers (2021-04-19T10:36:23Z)
- Stream-level Latency Evaluation for Simultaneous Machine Translation [5.50178437495268]
Simultaneous machine translation has recently gained traction thanks to significant quality improvements and the advent of streaming applications.
This work proposes a stream-level adaptation of the current latency measures based on a re-segmentation approach applied to the output translation.
arXiv Detail & Related papers (2021-04-18T11:16:17Z)
- Non-Stationary Delayed Bandits with Intermediate Observations [10.538264213183076]
Online recommender systems often face long delays in receiving feedback, especially when optimizing for some long-term metrics.
We introduce the problem of non-stationary, delayed bandits with intermediate observations.
We develop an efficient algorithm based on UCRL, and prove sublinear regret guarantees for its performance.
arXiv Detail & Related papers (2020-06-03T09:27:03Z)
- Pseudo-Convolutional Policy Gradient for Sequence-to-Sequence Lip-Reading [96.48553941812366]
Lip-reading aims to infer the speech content from the lip movement sequence.
The traditional learning process of seq2seq models suffers from two problems.
We propose a novel pseudo-convolutional policy gradient (PCPG) based method to address these two problems.
arXiv Detail & Related papers (2020-03-09T09:12:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.