Related papers: TokenVerse++: Towards Flexible Multitask Learning with Dynamic Task Activation

URL: http://arxiv.org/abs/2508.19856v1
Date: Wed, 27 Aug 2025 13:16:31 GMT
Title: TokenVerse++: Towards Flexible Multitask Learning with Dynamic Task Activation
Authors: Shashi Kumar, Srikanth Madikeri, Esaú Villatoro-Tello, Sergio Burdisso, Pradeep Rangappa, Andrés Carofilis, Petr Motlicek, Karthik Pandia, Shankar Venkatesan, Kadri Hacioğlu, Andreas Stolcke
Abstract summary: TokenVerse++ introduces learnable vectors in the acoustic embedding space of the XLSR-Transducer ASR model for dynamic task activation. We demonstrate this by successfully integrating a dataset with partial labels, specifically for ASR and an additional task, language identification. TokenVerse++ achieves results on par with or exceeding TokenVerse across multiple tasks, establishing it as a more practical multitask alternative without sacrificing ASR performance.
Score: 13.676666039727904
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Token-based multitasking frameworks like TokenVerse require all training utterances to have labels for all tasks, hindering their ability to leverage partially annotated datasets and scale effectively. We propose TokenVerse++, which introduces learnable vectors in the acoustic embedding space of the XLSR-Transducer ASR model for dynamic task activation. This core mechanism enables training with utterances labeled for only a subset of tasks, a key advantage over TokenVerse. We demonstrate this by successfully integrating a dataset with partial labels, specifically for ASR and an additional task, language identification, improving overall performance. TokenVerse++ achieves results on par with or exceeding TokenVerse across multiple tasks, establishing it as a more practical multitask alternative without sacrificing ASR performance.
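The dynamic task activation described in the abstract can be pictured with a toy sketch. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not say how the learnable vectors are combined with the acoustic embeddings, so prepending them to the frame sequence is an assumption, as are the function names, the embedding size, and the third task name "OTHER". Only ASR and language identification (LID) come from the abstract.

```python
import random

EMBED_DIM = 4  # toy size (assumption); real acoustic embeddings are much larger


def init_task_vectors(tasks, dim=EMBED_DIM, seed=0):
    """Create one vector per task (a stand-in for a learnable parameter)."""
    rng = random.Random(seed)
    return {task: [rng.uniform(-0.1, 0.1) for _ in range(dim)] for task in tasks}


def activate_tasks(frames, task_vectors, active_tasks):
    """Prepend the vectors of the active tasks to the acoustic frame sequence.

    Prepending is an assumption made for illustration; the abstract does not
    specify how the vectors enter the acoustic embedding space.
    """
    unknown = [t for t in active_tasks if t not in task_vectors]
    if unknown:
        raise KeyError(f"no vector for task(s): {unknown}")
    return [list(task_vectors[t]) for t in active_tasks] + [list(f) for f in frames]


# "OTHER" is a hypothetical third task used only to show a fully labeled case.
task_vectors = init_task_vectors(["ASR", "LID", "OTHER"])
frames = [[0.0] * EMBED_DIM for _ in range(3)]  # three dummy acoustic frames

# Fully labeled utterance: every task is activated.
full = activate_tasks(frames, task_vectors, ["ASR", "LID", "OTHER"])
# Partially labeled utterance: only ASR and language identification.
partial = activate_tasks(frames, task_vectors, ["ASR", "LID"])

print(len(full), len(partial))  # 6 5
```

In this sketch, an utterance labeled for only a subset of tasks activates only the matching vectors, so no placeholder labels are needed for the missing tasks.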

Related papers

From Cross-Task Examples to In-Task Prompts: A Graph-Based Pseudo-Labeling Framework for In-context Learning [55.90498988440303]
In-context learning (ICL) enables large language models to perform novel tasks without parameter updates. We propose a cost-efficient two-stage pipeline that reduces reliance on language models for data labeling. Experiments across five tasks demonstrate that our method achieves strong performance while lowering labeling costs.
arXiv Detail & Related papers (2025-10-28T15:37:51Z)
ToDRE: Visual Token Pruning via Diversity and Task Awareness for Efficient Large Vision-Language Models [59.47738955960352]
ToDRE is a two-stage and training-free token compression framework. It achieves superior performance by pruning tokens based on token Diversity and token-task RElevance.
arXiv Detail & Related papers (2025-05-24T15:47:49Z)
Multi-Task Label Discovery via Hierarchical Task Tokens for Partially Annotated Dense Predictions [44.78165979575075]
We propose a novel approach to optimize a set of compact learnable hierarchical task tokens. The global task tokens are designed for effective cross-task feature interactions in a global context. A group of fine-grained task-specific spatial tokens for each task is learned from the corresponding global task tokens. The learned global and local fine-grained task tokens are further used to discover pseudo task-specific dense labels at different levels of granularity.
arXiv Detail & Related papers (2024-11-27T23:53:27Z)
Learning Multi-Aspect Item Palette: A Semantic Tokenization Framework for Generative Recommendation [55.99632509895994]
We introduce LAMIA, a novel approach for multi-aspect semantic tokenization. Unlike RQ-VAE, which uses a single embedding, LAMIA learns an "item palette", a collection of independent and semantically parallel embeddings. Our results demonstrate significant improvements in recommendation accuracy over existing methods.
arXiv Detail & Related papers (2024-09-11T13:49:48Z)
TokenVerse: Towards Unifying Speech and NLP Tasks via Transducer-based ASR [3.717584661565119]
TokenVerse is a single Transducer-based model designed to handle multiple tasks. It is achieved by integrating task-specific tokens into the reference text during ASR model training. Our experiments show that the proposed method improves ASR by up to 7.7% in relative WER.
arXiv Detail & Related papers (2024-07-05T11:54:38Z)
Joint-Task Regularization for Partially Labeled Multi-Task Learning [30.823282043129552]
Multi-task learning has become increasingly popular in the machine learning field, but its practicality is hindered by the need for large, labeled datasets. We propose Joint-Task Regularization (JTR), an intuitive technique which leverages cross-task relations to simultaneously regularize all tasks in a single joint-task latent space.
arXiv Detail & Related papers (2024-04-02T14:16:59Z)
Universal Few-shot Learning of Dense Prediction Tasks with Visual Token Matching [26.26540176172197]
We propose Visual Token Matching (VTM) as a universal few-shot learner for arbitrary dense prediction tasks. VTM flexibly adapts to any task with a tiny amount of task-specific parameters that modulate the matching algorithm. We experiment with VTM on a challenging variant of the Taskonomy dataset and observe that it robustly few-shot learns various unseen dense prediction tasks.
arXiv Detail & Related papers (2023-03-27T07:58:42Z)
PartAL: Efficient Partial Active Learning in Multi-Task Visual Settings [57.08386016411536]
We show that it is more effective to select not only the images to be annotated but also a subset of tasks for which to provide annotations at each Active Learning (AL) iteration. We demonstrate the effectiveness of our approach on several popular multi-task datasets.
arXiv Detail & Related papers (2022-11-21T15:08:35Z)
Leveraging Auxiliary Tasks with Affinity Learning for Weakly Supervised Semantic Segmentation [88.49669148290306]
We propose a novel weakly supervised multi-task framework called AuxSegNet to leverage saliency detection and multi-label image classification as auxiliary tasks. Inspired by their similar structured semantics, we also propose to learn a cross-task global pixel-level affinity map from the saliency and segmentation representations. The learned cross-task affinity can be used to refine saliency predictions and propagate CAM maps to provide improved pseudo labels for both tasks.
arXiv Detail & Related papers (2021-07-25T11:39:58Z)
Structured Prediction as Translation between Augmented Natural Languages [109.50236248762877]
We propose a new framework, Translation between Augmented Natural Languages (TANL), to solve many structured prediction language tasks. Instead of tackling the problem by training task-specific discriminative classifiers, we frame it as a translation task between augmented natural languages. Our approach can match or outperform task-specific models on all tasks, and in particular, achieves new state-of-the-art results on joint entity and relation extraction.
arXiv Detail & Related papers (2021-01-14T18:32:21Z)
Adaptive Self-training for Few-shot Neural Sequence Labeling [55.43109437200101]
We develop techniques to address the label scarcity challenge for neural sequence labeling models. Self-training serves as an effective mechanism to learn from large amounts of unlabeled data. Meta-learning helps in adaptive sample re-weighting to mitigate error propagation from noisy pseudo-labels.
arXiv Detail & Related papers (2020-10-07T22:29:05Z)

This list is automatically generated from the titles and abstracts of the papers on this site.