Related papers: Adaptive Endpointing with Deep Contextual Multi-armed Bandits

URL: http://arxiv.org/abs/2303.13407v1
Date: Thu, 23 Mar 2023 16:28:26 GMT
Title: Adaptive Endpointing with Deep Contextual Multi-armed Bandits
Authors: Do June Min, Andreas Stolcke, Anirudh Raju, Colin Vaz, Di He, Venkatesh Ravichandran, Viet Anh Trinh
Abstract summary: We propose an efficient method for choosing an optimal endpointing configuration given utterance-level audio features in an online setting. Our method does not require ground-truth labels; it learns online from reward signals alone.
Score: 30.13188582607401
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Current endpointing (EP) solutions learn in a supervised framework, which does not allow the model to incorporate feedback and improve in an online setting. Also, it is a common practice to utilize costly grid-search to find the best configuration for an endpointing model. In this paper, we aim to provide a solution for adaptive endpointing by proposing an efficient method for choosing an optimal endpointing configuration given utterance-level audio features in an online setting, while avoiding hyperparameter grid-search. Our method does not require ground truth labels, and only uses online learning from reward signals without requiring annotated labels. Specifically, we propose a deep contextual multi-armed bandit-based approach, which combines the representational power of neural networks with the action exploration behavior of Thompson modeling algorithms. We compare our approach to several baselines, and show that our deep bandit models also succeed in reducing early cutoff errors while maintaining low latency.
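The following is a minimal sketch of Thompson-sampling arm selection in a contextual bandit, included only to illustrate the general idea named in the abstract. It is not the authors' method: the neural network is swapped for a tabular Beta-Bernoulli posterior per (context bucket, arm) pair, and the context buckets, the three arms (standing in for endpointing configurations), the binary reward, and the `true_rates` environment are all hypothetical assumptions.

```python
import random


class TabularThompsonBandit:
    """Beta-Bernoulli Thompson sampling, one posterior per (context bucket, arm).

    Assumption: contexts are pre-discretized into buckets; a deep bandit
    would instead learn a representation of the raw features.
    """

    def __init__(self, n_buckets, n_arms, seed=0):
        self.n_arms = n_arms
        self.rng = random.Random(seed)
        # Beta(1, 1) prior stored as [alpha, beta] for every (bucket, arm).
        self.posterior = [
            [[1.0, 1.0] for _ in range(n_arms)] for _ in range(n_buckets)
        ]

    def select(self, bucket):
        # Sample one plausible success rate per arm, then play the largest.
        draws = [self.rng.betavariate(a, b) for a, b in self.posterior[bucket]]
        return max(range(self.n_arms), key=draws.__getitem__)

    def update(self, bucket, arm, reward):
        # reward is 1 or 0; no ground-truth label is needed, only this signal.
        self.posterior[bucket][arm][0] += reward
        self.posterior[bucket][arm][1] += 1 - reward


def simulate(rounds=5000, seed=0):
    # Hypothetical environment: success probability of each arm per bucket.
    true_rates = [[0.9, 0.5, 0.2], [0.2, 0.5, 0.9]]
    env = random.Random(seed + 1)
    bandit = TabularThompsonBandit(n_buckets=2, n_arms=3, seed=seed)
    picks = [[0, 0, 0] for _ in true_rates]
    for _ in range(rounds):
        bucket = env.randrange(2)                      # observe a context
        arm = bandit.select(bucket)                    # choose a configuration
        reward = 1 if env.random() < true_rates[bucket][arm] else 0
        bandit.update(bucket, arm, reward)             # learn from the reward
        picks[bucket][arm] += 1
    return picks


picks = simulate()
```

In this toy setup, `picks[b][a]` counts how often arm `a` was played in bucket `b`; because exploration comes from posterior sampling, the counts should concentrate on the best arm for each bucket as evidence accumulates, without any grid search over arms.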

Related papers

Neural Dueling Bandits [58.90189511247936]
We use a neural network to estimate the reward function using preference feedback for the previously selected arms. We then extend our theoretical results to contextual bandit problems with binary feedback, which is in itself a non-trivial contribution.
arXiv Detail & Related papers (2024-07-24T09:23:22Z)
Online Network Source Optimization with Graph-Kernel MAB [62.6067511147939]
We propose Grab-UCB, a graph-kernel multi-arms bandit algorithm to learn online the optimal source placement in large scale networks. We describe the network processes with an adaptive graph dictionary model, which typically leads to sparse spectral representations. We derive the performance guarantees that depend on network parameters, which further influence the learning curve of the sequential decision strategy.
arXiv Detail & Related papers (2023-07-07T15:03:42Z)
Online Continuous Hyperparameter Optimization for Generalized Linear Contextual Bandits [55.03293214439741]
In contextual bandits, an agent sequentially makes actions from a time-dependent action set based on past experience. We propose the first online continuous hyperparameter tuning framework for contextual bandits. We show that it could achieve a sublinear regret in theory and performs consistently better than all existing methods on both synthetic and real datasets.
arXiv Detail & Related papers (2023-02-18T23:31:20Z)
Provably Efficient Reinforcement Learning for Online Adaptive Influence Maximization [53.11458949694947]
We consider an adaptive version of the content-dependent online influence problem where seed nodes are sequentially activated based on real-time feedback. Our algorithm maintains a network model estimate and selects seeds adaptively, exploring the social network while improving the optimal policy optimistically.
arXiv Detail & Related papers (2022-06-29T18:17:28Z)
Spatial Autoregressive Coding for Graph Neural Recommendation [38.66151035948021]
Shallow models and deep Graph Neural Networks (GNNs) fail to adequately exploit neighbor proximity in sampled subgraphs or sequences. In this paper, we propose a novel framework SAC, namely Spatial Autoregressive Coding, to solve the above problems in a unified way. Experimental results on both public recommendation datasets and a real scenario web-scale dataset demonstrate the superiority of SAC compared with state-of-the-art methods.
arXiv Detail & Related papers (2022-05-19T12:00:01Z)
Optimal Stopping via Randomized Neural Networks [6.677219861416146]
This paper presents the benefits of using randomized neural networks instead of standard basis functions or deep neural networks. Our approaches are applicable to high dimensional problems where the existing approaches become increasingly impractical. In all cases, our algorithms outperform the state-of-the-art and other relevant machine learning approaches in terms of time.
arXiv Detail & Related papers (2021-04-28T09:47:21Z)
Non-Stationary Latent Bandits [68.21614490603758]
We propose a practical approach for fast personalization to non-stationary users. The key idea is to frame this problem as a latent bandit, where prototypical models of user behavior are learned offline and the latent state of the user is inferred online. We propose Thompson sampling algorithms for regret minimization in non-stationary latent bandits, analyze them, and evaluate them on a real-world dataset.
arXiv Detail & Related papers (2020-12-01T10:31:57Z)
Deep-3DAligner: Unsupervised 3D Point Set Registration Network With Optimizable Latent Vector [15.900382629390297]
We propose to develop a novel model that integrates the optimization to learning, aiming to address the technical challenges in 3D registration. In addition to the deep transformation decoding network, our framework introduces an optimizable deep Spatial Correlation Representation.
arXiv Detail & Related papers (2020-09-29T22:44:38Z)
Belief Propagation Reloaded: Learning BP-Layers for Labeling Problems [83.98774574197613]
We take one of the simplest inference methods, a truncated max-product Belief propagation, and add what is necessary to make it a proper component of a deep learning model. This BP-Layer can be used as the final or an intermediate block in convolutional neural networks (CNNs). The model is applicable to a range of dense prediction problems, is well-trainable and provides parameter-efficient and robust solutions in stereo, optical flow and semantic segmentation.
arXiv Detail & Related papers (2020-03-13T13:11:35Z)

This list is automatically generated from the titles and abstracts of the papers on this site.