Related papers: Small Models, Big Results: Achieving Superior Intent Extraction through Decomposition

URL: http://arxiv.org/abs/2509.12423v1
Date: Mon, 15 Sep 2025 20:20:30 GMT
Title: Small Models, Big Results: Achieving Superior Intent Extraction through Decomposition
Authors: Danielle Cohen, Yoni Halpern, Noam Kahlon, Joel Oren, Omri Berkovitch, Sapir Caduri, Ido Dagan, Anatoly Efros,
Abstract summary: This paper introduces a novel approach to understanding user intents from user interaction trajectories. First, we perform structured interaction summarization, capturing key information from each user action. Second, we perform intent extraction using a fine-tuned model operating on the aggregated summaries.
Score: 8.584946920657517
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Understanding user intents from UI interaction trajectories remains a challenging, yet crucial, frontier in intelligent agent development. While massive, datacenter-based, multi-modal large language models (MLLMs) possess greater capacity to handle the complexities of such sequences, smaller models which can run on-device to provide a privacy-preserving, low-cost, and low-latency user experience, struggle with accurate intent inference. We address these limitations by introducing a novel decomposed approach: first, we perform structured interaction summarization, capturing key information from each user action. Second, we perform intent extraction using a fine-tuned model operating on the aggregated summaries. This method improves intent understanding in resource-constrained models, even surpassing the base performance of large MLLMs.
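
The two-stage decomposition described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `Interaction` type, the `extract_intent` function, and the toy summarizer and intent model are all hypothetical stand-ins for the on-device models the paper refers to, and the numbered-list aggregation format is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Interaction:
    """One step of a UI trajectory (hypothetical fields, for illustration)."""
    screen: str  # short description of the screen the user was on
    action: str  # short description of what the user did


# Stage 1 maps a single interaction to a structured summary string.
Summarizer = Callable[[Interaction], str]
# Stage 2 maps the aggregated summaries to an intent description.
IntentModel = Callable[[str], str]


def extract_intent(
    trajectory: List[Interaction],
    summarize: Summarizer,
    infer_intent: IntentModel,
) -> str:
    # Stage 1: summarize each interaction independently.
    summaries = [summarize(step) for step in trajectory]
    # Aggregate the per-step summaries into one numbered text block
    # (the aggregation format here is an assumption).
    aggregated = "\n".join(f"{i}. {s}" for i, s in enumerate(summaries, start=1))
    # Stage 2: run the (fine-tuned, in the paper) model on the aggregate.
    return infer_intent(aggregated)


# Toy stand-ins so the sketch runs without any model.
def toy_summarize(step: Interaction) -> str:
    return f"On {step.screen}, user did: {step.action}"


def toy_infer(aggregated: str) -> str:
    return f"Intent inferred from {len(aggregated.splitlines())} summarized steps"


if __name__ == "__main__":
    demo = [
        Interaction("search page", "typed 'flights to Rome'"),
        Interaction("results page", "tapped the cheapest flight"),
    ]
    print(extract_intent(demo, toy_summarize, toy_infer))
```

Passing the two stages in as callables keeps the sketch independent of any particular model runtime; a real system would presumably replace both toy functions with model calls.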

Related papers

NExT-OMNI: Towards Any-to-Any Omnimodal Foundation Models with Discrete Flow Matching [64.10695425442164]
We introduce NExT-OMNI, an open-source omnimodal foundation model that achieves unified modeling through discrete flow paradigms. Trained on large-scale interleaved text, image, video, and audio data, NExT-OMNI delivers competitive performance on multimodal generation and understanding benchmarks. To advance further research, we release training details, data protocols, and open-source both the code and model checkpoints.
arXiv Detail & Related papers (2025-10-15T16:25:18Z)
Training-free Uncertainty Guidance for Complex Visual Tasks with MLLMs [61.64185573373394]
We propose a training-free framework that uses an MLLM's intrinsic uncertainty as a proactive guidance signal. We introduce a unified mechanism that scores candidate visual inputs by response uncertainty, enabling the model to autonomously focus on the most salient data. Our work validates that harnessing intrinsic uncertainty is a powerful, general strategy for enhancing fine-grained multimodal performance.
arXiv Detail & Related papers (2025-10-01T09:20:51Z)
Quantization Meets dLLMs: A Systematic Study of Post-training Quantization for Diffusion LLMs [54.70676039314542]
We present the first systematic study on quantizing diffusion-based language models. We identify the presence of activation outliers, characterized by abnormally large activation values. We implement state-of-the-art PTQ methods and conduct a comprehensive evaluation across multiple task types and model variants.
arXiv Detail & Related papers (2025-08-20T17:59:51Z)
Personas within Parameters: Fine-Tuning Small Language Models with Low-Rank Adapters to Mimic User Behaviors [1.8352113484137629]
A long-standing challenge in developing accurate recommendation models is simulating user behavior, mainly due to the complex nature of user interactions. We propose an approach to extracting robust user representations using frozen Large Language Models (LLMs) and simulating cost-effective, resource-efficient user agents powered by fine-tuned Small Language Models (SLMs). Our experiments provide compelling empirical evidence of the efficacy of our methods, demonstrating that user agents developed using our approach have the potential to bridge the gap between offline metrics and real-world performance of recommender systems.
arXiv Detail & Related papers (2025-08-18T22:14:57Z)
Foundation Model for Skeleton-Based Human Action Understanding [56.89025287217221]
This paper presents a Unified Skeleton-based Dense Representation Learning (USDRL) framework. USDRL consists of a Transformer-based Dense Spatio-Temporal Encoder (DSTE), Multi-Grained Feature Decorrelation (MG-FD), and Multi-Perspective Consistency Training (MPCT).
arXiv Detail & Related papers (2025-08-18T02:42:16Z)
LLM-based Bi-level Multi-interest Learning Framework for Sequential Recommendation [54.396000434574454]
We propose a novel multi-interest SR framework combining implicit behavioral and explicit semantic perspectives. It includes two modules: the Implicit Behavioral Interest Module and the Explicit Semantic Interest Module. Experiments on four real-world datasets validate the framework's effectiveness and practicality.
arXiv Detail & Related papers (2024-11-14T13:00:23Z)
A Closer Look at GAN Priors: Exploiting Intermediate Features for Enhanced Model Inversion Attacks [43.98557963966335]
Model Inversion (MI) attacks aim to reconstruct privacy-sensitive training data from released models by utilizing output information. Recent advances in generative adversarial networks (GANs) have contributed significantly to the improved performance of MI attacks. We propose a novel method, Intermediate Features enhanced Generative Model Inversion (IF-GMI), which disassembles the GAN structure and exploits features between intermediate blocks.
arXiv Detail & Related papers (2024-07-18T19:16:22Z)
Robust Latent Representation Tuning for Image-text Classification [9.789498730131607]
We propose a robust latent representation tuning method for large models. Our approach introduces a modality latent translation module to maximize the correlation between modalities, resulting in a robust representation. Within this framework, common semantics are refined during training, and robust performance is achieved even in the absence of one modality.
arXiv Detail & Related papers (2024-06-10T06:29:00Z)
Exploiting Modality-Specific Features For Multi-Modal Manipulation Detection And Grounding [54.49214267905562]
We construct a transformer-based framework for multi-modal manipulation detection and grounding tasks. Our framework simultaneously explores modality-specific features while preserving the capability for multi-modal alignment. We propose an implicit manipulation query (IMQ) that adaptively aggregates global contextual cues within each modality.
arXiv Detail & Related papers (2023-09-22T06:55:41Z)
Scaling Vision-Language Models with Sparse Mixture of Experts [128.0882767889029]
We show that mixture-of-experts (MoE) techniques can achieve state-of-the-art performance on a range of benchmarks over dense models of equivalent computational cost. Our research offers valuable insights into stabilizing the training of MoE models, understanding the impact of MoE on model interpretability, and balancing the trade-offs between compute and performance when scaling vision-language models.
arXiv Detail & Related papers (2023-03-13T16:00:31Z)
Goal-driven Self-Attentive Recurrent Networks for Trajectory Prediction [31.02081143697431]
Human trajectory forecasting is a key component of autonomous vehicles, social-aware robots and video-surveillance applications. We propose a lightweight attention-based recurrent backbone that acts solely on past observed positions. We employ a common goal module, based on a U-Net architecture, which additionally extracts semantic information to predict scene-compliant destinations.
arXiv Detail & Related papers (2022-04-25T11:12:37Z)
Self-Supervised Contrastive Learning for Efficient User Satisfaction Prediction in Conversational Agents [35.2098736872247]
We propose a self-supervised contrastive learning approach to learn user-agent interactions. We show that the pre-trained models using the self-supervised objective are transferable to the user satisfaction prediction. We also propose a novel few-shot transfer learning approach that ensures better transferability for very small sample sizes.
arXiv Detail & Related papers (2020-10-21T18:10:58Z)

This list is automatically generated from the titles and abstracts of the papers on this site.