TokenVerse++: Towards Flexible Multitask Learning with Dynamic Task Activation
- URL: http://arxiv.org/abs/2508.19856v1
- Date: Wed, 27 Aug 2025 13:16:31 GMT
- Title: TokenVerse++: Towards Flexible Multitask Learning with Dynamic Task Activation
- Authors: Shashi Kumar, Srikanth Madikeri, Esaú Villatoro-Tello, Sergio Burdisso, Pradeep Rangappa, Andrés Carofilis, Petr Motlicek, Karthik Pandia, Shankar Venkatesan, Kadri Hacioğlu, Andreas Stolcke,
- Abstract summary: TokenVerse++ introduces learnable vectors in the acoustic embedding space of the XLSR-Transducer ASR model for dynamic task activation. We demonstrate that this enables training on partially labeled data by integrating a dataset labeled only for ASR and one additional task, language identification. TokenVerse++ achieves results on par with or exceeding TokenVerse across multiple tasks, establishing it as a more practical multitask alternative without sacrificing ASR performance.
- Score: 13.676666039727904
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Token-based multitasking frameworks like TokenVerse require all training utterances to have labels for all tasks, hindering their ability to leverage partially annotated datasets and scale effectively. We propose TokenVerse++, which introduces learnable vectors in the acoustic embedding space of the XLSR-Transducer ASR model for dynamic task activation. This core mechanism enables training with utterances labeled for only a subset of tasks, a key advantage over TokenVerse. We demonstrate this by successfully integrating a dataset with partial labels, specifically for ASR and an additional task, language identification, improving overall performance. TokenVerse++ achieves results on par with or exceeding TokenVerse across multiple tasks, establishing it as a more practical multitask alternative without sacrificing ASR performance.
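The abstract does not say exactly how the task vectors are wired into the encoder, so the following is only a minimal sketch under stated assumptions: each auxiliary task owns one vector in the same space as the acoustic frames, the vectors of the active tasks are prepended to the frame sequence, ASR is always on, and an utterance activates exactly the tasks it has labels for. The names `AUX_TASKS`, `TASK_VECTORS`, `build_example`, the `ner` task and the `<task:label>` target-token format are all hypothetical, and plain Python lists stand in for tensors.

```python
import random
from typing import Dict, List, Tuple

Vector = List[float]

EMB_DIM = 4  # toy size; real XLSR frames are much wider
AUX_TASKS = ["lid", "ner"]  # hypothetical auxiliary tasks; ASR is assumed always active

rng = random.Random(0)
# One vector per auxiliary task, living in the acoustic embedding space.
# In a real model these would be trainable parameters updated by backprop.
TASK_VECTORS: Dict[str, Vector] = {
    t: [rng.gauss(0.0, 0.02) for _ in range(EMB_DIM)] for t in AUX_TASKS
}


def build_example(
    frames: List[Vector], transcript: str, labels: Dict[str, str]
) -> Tuple[List[Vector], str]:
    """Activate exactly the auxiliary tasks this utterance is labeled for.

    Unlabeled tasks get neither a task vector nor a target token, so an
    utterance with only a transcript (or transcript + LID) is still usable.
    """
    active = [t for t in AUX_TASKS if t in labels]  # fixed order for determinism
    encoder_input = [TASK_VECTORS[t] for t in active] + frames
    task_tokens = [f"<{t}:{labels[t]}>" for t in active]  # hypothetical token format
    target = " ".join(task_tokens + [transcript])
    return encoder_input, target
```

For example, `build_example(frames, "hello world", {"lid": "en"})` prepends one vector and yields the target `<lid:en> hello world`, while passing `{}` leaves the frames untouched and trains ASR alone.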
Related papers
- From Cross-Task Examples to In-Task Prompts: A Graph-Based Pseudo-Labeling Framework for In-context Learning [55.90498988440303]
In-context learning (ICL) enables large language models to perform novel tasks without parameter updates. We propose a cost-efficient two-stage pipeline that reduces reliance on language models for data labeling. Experiments across five tasks demonstrate that our method achieves strong performance while lowering labeling costs.
Date: 2025-10-28T15:37:51Z
- ToDRE: Visual Token Pruning via Diversity and Task Awareness for Efficient Large Vision-Language Models [59.47738955960352]
ToDRE is a two-stage and training-free token compression framework. It achieves superior performance by pruning tokens based on token Diversity and token-task RElevance.
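The summary does not give ToDRE's scoring rules, so the sketch below only illustrates the general shape of a two-stage, training-free pruner and is not the paper's algorithm. As assumptions: stage one greedily keeps mutually diverse tokens (max-min cosine distance, i.e. farthest-point selection), and stage two keeps the survivors most similar to a task embedding. The function names and the budgets `k_diverse` and `k_final` are hypothetical.

```python
import math
from typing import List, Sequence

Vec = Sequence[float]


def cosine(a: Vec, b: Vec) -> float:
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return num / den if den else 0.0


def select_diverse(tokens: List[Vec], k: int) -> List[int]:
    """Stage 1: greedy farthest-point selection on cosine distance."""
    if not tokens or k <= 0:
        return []
    chosen = [0]  # deterministic start: the first token
    while len(chosen) < min(k, len(tokens)):
        best, best_gap = -1, -1.0
        for i in range(len(tokens)):
            if i in chosen:
                continue
            # Distance from token i to its closest already-chosen token.
            gap = min(1.0 - cosine(tokens[i], tokens[j]) for j in chosen)
            if gap > best_gap:
                best, best_gap = i, gap
        chosen.append(best)
    return chosen


def prune_tokens(tokens: List[Vec], task: Vec, k_diverse: int, k_final: int) -> List[int]:
    """Stage 2: among the diverse tokens, keep those most relevant to the task."""
    candidates = select_diverse(tokens, k_diverse)
    ranked = sorted(candidates, key=lambda i: cosine(tokens[i], task), reverse=True)
    return sorted(ranked[:k_final])  # return indices in original token order
```

Near-duplicate tokens are discarded in stage one before relevance is ever consulted, which is the intuition behind combining the two criteria.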
Date: 2025-05-24T15:47:49Z
- Multi-Task Label Discovery via Hierarchical Task Tokens for Partially Annotated Dense Predictions [44.78165979575075]
We propose a novel approach to optimize a set of compact learnable hierarchical task tokens. The global task tokens are designed for effective cross-task feature interactions in a global context. A group of fine-grained task-specific spatial tokens for each task is learned from the corresponding global task tokens. The learned global and local fine-grained task tokens are further used to discover pseudo task-specific dense labels at different levels of granularity.
Date: 2024-11-27T23:53:27Z
- Learning Multi-Aspect Item Palette: A Semantic Tokenization Framework for Generative Recommendation [55.99632509895994]
We introduce LAMIA, a novel approach for multi-aspect semantic tokenization. Unlike RQ-VAE, which uses a single embedding, LAMIA learns an "item palette": a collection of independent and semantically parallel embeddings. Our results demonstrate significant improvements in recommendation accuracy over existing methods.
Date: 2024-09-11T13:49:48Z
- TokenVerse: Towards Unifying Speech and NLP Tasks via Transducer-based ASR [3.717584661565119]
TokenVerse is a single Transducer-based model designed to handle multiple tasks.
It is achieved by integrating task-specific tokens into the reference text during ASR model training.
Our experiments show that the proposed method improves ASR by up to 7.7% in relative WER.
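To make "integrating task-specific tokens into the reference text" concrete, here is a hedged sketch. The token names (`[SCD]` before a new speaker turn, `[NE]`/`[/NE]` around a named entity) and the function `build_reference` are hypothetical placeholders, not TokenVerse's actual vocabulary. It also illustrates one plausible reason the TokenVerse++ abstract says TokenVerse needs every task labeled: in the reference text an absent token reads as "no event", so a missing annotation would silently become a wrong target. The sketch therefore refuses partially labeled utterances.

```python
from typing import List, Optional, Set, Tuple


def build_reference(
    words: List[str],
    speaker_changes: Optional[Set[int]],
    entities: Optional[List[Tuple[int, int]]],
) -> str:
    """Interleave hypothetical task tokens with the transcript words.

    speaker_changes: indices of words that start a new speaker turn.
    entities: (start, end) word spans with `end` exclusive.
    """
    if speaker_changes is None or entities is None:
        # A missing token would be learned as "no event", so partial labels
        # cannot be used without corrupting the targets.
        raise ValueError("TokenVerse-style targets need labels for every task")
    starts = {s for s, _ in entities}
    ends = {e for _, e in entities}
    out: List[str] = []
    for i, w in enumerate(words):
        if i in ends:
            out.append("[/NE]")
        if i in speaker_changes:
            out.append("[SCD]")
        if i in starts:
            out.append("[NE]")
        out.append(w)
    if len(words) in ends:  # close an entity that ends on the last word
        out.append("[/NE]")
    return " ".join(out)
```

For instance, `build_reference(["hi", "john", "smith", "bye"], {3}, [(1, 3)])` gives `hi [NE] john smith [/NE] [SCD] bye`.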
Date: 2024-07-05T11:54:38Z
- Joint-Task Regularization for Partially Labeled Multi-Task Learning [30.823282043129552]
Multi-task learning has become increasingly popular in the machine learning field, but its practicality is hindered by the need for large, labeled datasets.
We propose Joint-Task Regularization (JTR), an intuitive technique which leverages cross-task relations to simultaneously regularize all tasks in a single joint-task latent space.
Date: 2024-04-02T14:16:59Z
- Universal Few-shot Learning of Dense Prediction Tasks with Visual Token Matching [26.26540176172197]
We propose Visual Token Matching (VTM) as a universal few-shot learner for arbitrary dense prediction tasks.
VTM flexibly adapts to any task with a tiny amount of task-specific parameters that modulate the matching algorithm.
We evaluate VTM on a challenging variant of the Taskonomy dataset and observe that it robustly learns various unseen dense prediction tasks from only a few examples.
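The summary describes VTM only at a high level. The minimal sketch below shows the generic non-parametric matching step such a learner builds on, under the assumption that matching is a softmax attention over query/support token similarities: each query token's label token is predicted as a similarity-weighted average of the support label tokens. The temperature `tau` and the function names are hypothetical, and VTM's task-specific modulation of the matching is not reproduced here.

```python
import math
from typing import List, Sequence

Vec = Sequence[float]


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def match_labels(
    query_tokens: List[Vec],
    support_tokens: List[Vec],
    support_labels: List[Vec],
    tau: float = 1.0,
) -> List[List[float]]:
    """Predict a label token for every query token by attending to the support set."""
    dim = len(support_labels[0])
    preds = []
    for q in query_tokens:
        # Scaled dot-product similarity to each support image token.
        scores = [sum(a * b for a, b in zip(q, s)) / tau for s in support_tokens]
        weights = softmax(scores)
        # Weighted average of the support label tokens.
        preds.append([
            sum(w * lab[d] for w, lab in zip(weights, support_labels))
            for d in range(dim)
        ])
    return preds
```

Because the label side is only ever averaged, the same matcher can serve any dense task whose labels are tokenized the same way, which is what makes the few-shot setting tractable.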
Date: 2023-03-27T07:58:42Z
- PartAL: Efficient Partial Active Learning in Multi-Task Visual Settings [57.08386016411536]
We show that it is more effective to select not only the images to be annotated but also a subset of tasks for which to provide annotations at each Active Learning (AL) iteration.
We demonstrate the effectiveness of our approach on several popular multi-task datasets.
Date: 2022-11-21T15:08:35Z
- Leveraging Auxiliary Tasks with Affinity Learning for Weakly Supervised Semantic Segmentation [88.49669148290306]
We propose a novel weakly supervised multi-task framework called AuxSegNet to leverage saliency detection and multi-label image classification as auxiliary tasks.
Inspired by their similar structured semantics, we also propose to learn a cross-task global pixel-level affinity map from the saliency and segmentation representations.
The learned cross-task affinity can be used to refine saliency predictions and propagate CAM maps to provide improved pseudo labels for both tasks.
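Propagating CAM maps with a pixel affinity is commonly done as a random walk: normalize the affinity matrix row-wise into a transition matrix and multiply the class scores by it a few times. The sketch below shows that generic step on a toy "image" flattened to a handful of pixels. The affinity values, the number of steps and the function names are assumptions, and AuxSegNet's exact refinement rule may differ.

```python
from typing import List

Matrix = List[List[float]]


def row_normalize(a: Matrix) -> Matrix:
    """Turn non-negative affinities into a row-stochastic transition matrix."""
    out = []
    for row in a:
        s = sum(row)
        out.append([x / s for x in row] if s > 0 else list(row))
    return out


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def propagate_cam(affinity: Matrix, cam: Matrix, steps: int = 2) -> Matrix:
    """Random-walk propagation: cam <- T @ cam, repeated `steps` times.

    affinity: N x N pixel-to-pixel affinities (N = number of pixels).
    cam: N x C class activation scores.
    """
    t = row_normalize(affinity)
    for _ in range(steps):
        cam = matmul(t, cam)
    return cam
```

With `affinity = [[1, 1, 0], [1, 1, 0], [0, 0, 1]]` and only pixel 0 activated, the activation spreads to its high-affinity neighbour (pixel 1) but not to the unrelated pixel 2, giving `[[0.5], [0.5], [0.0]]`.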
Date: 2021-07-25T11:39:58Z
- Structured Prediction as Translation between Augmented Natural Languages [109.50236248762877]
We propose a new framework, Translation between Augmented Natural Languages (TANL), to solve many structured prediction language tasks.
Instead of tackling the problem by training task-specific discriminative classifiers, we frame it as a translation task between augmented natural languages.
Our approach can match or outperform task-specific models on all tasks, and in particular, achieves new state-of-the-art results on joint entity and relation extraction.
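Framing structured prediction as translation means the target side is the input sentence with its labels written inline. Below is a minimal sketch of such an encoding for entity spans in a bracket-and-pipe style, `[ span | label ]`; treat the exact delimiters and the helper names `encode`/`decode` as assumptions rather than TANL's specification. The decoding side is a regular expression that recovers the (span, label) pairs from generated text.

```python
import re
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start word, end word exclusive, label)


def encode(words: List[str], spans: List[Span]) -> str:
    """Wrap each labeled span as '[ span text | label ]' (spans must not overlap)."""
    out: List[str] = []
    i = 0
    for start, end, label in sorted(spans):
        out.extend(words[i:start])
        out.append("[ " + " ".join(words[start:end]) + " | " + label + " ]")
        i = end
    out.extend(words[i:])
    return " ".join(out)


PATTERN = re.compile(r"\[ (.+?) \| (.+?) \]")


def decode(text: str) -> List[Tuple[str, str]]:
    """Recover (span text, label) pairs from augmented output."""
    return PATTERN.findall(text)
```

Encoding `"Tolkien wrote The Lord of the Rings".split()` with spans `[(0, 1, "person"), (2, 7, "book")]` gives `[ Tolkien | person ] wrote [ The Lord of the Rings | book ]`, and `decode` maps that string back to the two pairs.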
Date: 2021-01-14T18:32:21Z
- Adaptive Self-training for Few-shot Neural Sequence Labeling [55.43109437200101]
We develop techniques to address the label scarcity challenge for neural sequence labeling models.
Self-training serves as an effective mechanism to learn from large amounts of unlabeled data.
Meta-learning enables adaptive sample re-weighting to mitigate error propagation from noisy pseudo-labels.
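A meta-learned re-weighting scheme is beyond a short example, so the sketch below substitutes a much simpler heuristic (teacher-confidence weighting with a cutoff) to show where per-sample weights enter a self-training loss. It is explicitly not the paper's meta-learning procedure. The threshold value and the function name are hypothetical.

```python
import math
from typing import List


def weighted_pseudo_label_loss(
    student_probs: List[List[float]],
    teacher_probs: List[List[float]],
    threshold: float = 0.6,
) -> float:
    """Confidence-weighted cross-entropy against the teacher's argmax labels.

    student_probs / teacher_probs: per-token class distributions.
    Tokens whose teacher confidence is below `threshold` get weight 0,
    so likely-wrong pseudo-labels do not propagate into the student.
    """
    total, weight_sum = 0.0, 0.0
    for s, t in zip(student_probs, teacher_probs):
        label = max(range(len(t)), key=t.__getitem__)  # pseudo-label
        w = t[label] if t[label] >= threshold else 0.0
        total += -w * math.log(max(s[label], 1e-12))  # clamp to avoid log(0)
        weight_sum += w
    return total / weight_sum if weight_sum > 0 else 0.0
```

In a meta-learning variant the weights `w` would instead be produced by a learned procedure rather than read off the teacher's confidence.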
Date: 2020-10-07T22:29:05Z
This list is automatically generated from the titles and abstracts of the papers in this site.