Related papers: Learning to Route: Per-Sample Adaptive Routing for Multimodal Multitask Prediction

URL: http://arxiv.org/abs/2509.12227v2
Date: Mon, 29 Sep 2025 15:42:01 GMT
Title: Learning to Route: Per-Sample Adaptive Routing for Multimodal Multitask Prediction
Authors: Marzieh Ajirak, Oded Bein, Ellen Rose Bowen, Dora Kanellopoulos, Avital Falk, Faith M. Gunning, Nili Solomonov, Logan Grosenick
Abstract summary: We introduce a routing-based architecture that dynamically selects modality processing pathways and task-sharing strategies on a per-sample basis. Our model defines multiple modality paths, including raw and fused representations of text and numeric features. We evaluate the model on both synthetic data and real-world psychotherapy notes predicting depression and anxiety outcomes.
Score: 4.171905792428217
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: We propose a unified framework for adaptive routing in multitask, multimodal prediction settings where data heterogeneity and task interactions vary across samples. Motivated by applications in psychotherapy where structured assessments and unstructured clinician notes coexist with partially missing data and correlated outcomes, we introduce a routing-based architecture that dynamically selects modality processing pathways and task-sharing strategies on a per-sample basis. Our model defines multiple modality paths, including raw and fused representations of text and numeric features, and learns to route each input through the most informative expert combination. Task-specific predictions are produced by shared or independent heads depending on the routing decision, and the entire system is trained end-to-end. We evaluate the model on both synthetic data and real-world psychotherapy notes predicting depression and anxiety outcomes. Our experiments show that our method consistently outperforms fixed multitask or single-task baselines, and that the learned routing policy provides interpretable insights into modality relevance and task structure. This addresses critical challenges in personalized healthcare by enabling per-subject adaptive information processing that accounts for data heterogeneity and task correlations. Applied to psychotherapy, this framework could improve mental health outcomes, enhance treatment assignment precision, and increase clinical cost-effectiveness through personalized intervention strategies.
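To make the routing idea in the abstract concrete, here is a minimal sketch of per-sample routing over modality paths followed by a shared-or-independent head choice. It is not the authors' implementation: the path set, the averaging used for fusion, the hard argmax routing, the sigmoid sharing gate, and every name and parameter (PATHS, route, predict, demo_params) are assumptions made for illustration, and the weights are hand-set rather than trained end-to-end.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

# Hypothetical modality paths: raw text, raw numeric, and a simple fused average.
PATHS = {
    "text": lambda t, n: list(t),
    "numeric": lambda t, n: list(n),
    "fused": lambda t, n: [(a + b) / 2 for a, b in zip(t, n)],
}

def route(text_vec, num_vec, gate_weights):
    """Pick one modality path for a single sample (hard argmax over gate scores)."""
    gate_input = list(text_vec) + list(num_vec)
    names = list(PATHS)
    probs = softmax([dot(gate_weights[name], gate_input) for name in names])
    best = max(range(len(names)), key=lambda i: probs[i])
    return names[best], dict(zip(names, probs)), PATHS[names[best]](text_vec, num_vec)

def predict(text_vec, num_vec, params):
    """Route the sample, then use a shared trunk or independent heads per task."""
    path, probs, rep = route(text_vec, num_vec, params["gate"])
    share_prob = 1.0 / (1.0 + math.exp(-dot(params["share_gate"], rep)))
    if share_prob > 0.5:
        # Shared mode: one trunk value, scaled per task.
        trunk = dot(params["shared_trunk"], rep)
        outputs = {task: scale * trunk for task, scale in params["task_scale"].items()}
    else:
        # Independent mode: each task has its own linear head.
        outputs = {task: dot(w, rep) for task, w in params["independent"].items()}
    return {"path": path, "gate_probs": probs, "shared": share_prob > 0.5, "outputs": outputs}

# Hand-set illustrative parameters (assumed, not learned).
demo_params = {
    "gate": {"text": [1, 0, 0, 0], "numeric": [0, 0, 0, 0], "fused": [0, 0, 0, 2]},
    "share_gate": [1, 1],
    "shared_trunk": [2, 2],
    "task_scale": {"depression": 1.0, "anxiety": 0.5},
    "independent": {"depression": [1, 0], "anxiety": [0, 1]},
}

result = predict([1.0, 0.0], [0.0, 1.0], demo_params)
print(result["path"], result["shared"], result["outputs"])
```

The returned gate probabilities and the shared flag are the kind of per-sample quantities one could inspect for modality relevance and task structure. A trainable version would likely replace the hard argmax with a differentiable relaxation, which this sketch does not attempt.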

Related papers

Multiple Treatments Causal Effects Estimation with Task Embeddings and Balanced Representation Learning [0.22940141855172036]
It is important to estimate the single treatment effects and the interaction treatment effects that arise from treatment combinations. Previous studies have proposed using independent outcome networks withworks for interactions. We propose a novel deep learning framework that incorporates a task embedding network and a representation learning network with the balancing penalty.
arXiv Detail & Related papers (2025-11-12T23:36:41Z)
Data-Driven Discovery of Feature Groups in Clinical Time Series [11.418915308804822]
Grouping of features based on similarity and relevance to a prediction task has been shown to enhance the performance of deep learning architectures. We propose a novel method that learns feature groups by clustering weights of feature-wise embedding layers. We demonstrate that our method outperforms static clustering approaches on synthetic data and achieves performance comparable to expert-defined groups on real-world medical data.
arXiv Detail & Related papers (2025-11-11T13:53:39Z)
Federated Learning for Estimating Heterogeneous Treatment Effects [7.967701699385625]
Current machine learning approaches for estimating heterogeneous treatment effects (HTE) require access to substantial amounts of data per treatment. We propose a novel framework for collaborative learning of HTE estimators across institutions via Federated Learning.
arXiv Detail & Related papers (2024-02-27T17:33:23Z)
Multi-Task Learning with Summary Statistics [4.871473117968554]
We propose a flexible multi-task learning framework utilizing summary statistics from various sources. We also present an adaptive parameter selection approach based on a variant of Lepski's method. This work offers a more flexible tool for training related models across various domains, with practical implications in genetic risk prediction.
arXiv Detail & Related papers (2023-07-05T15:55:23Z)
Straggler-Resilient Personalized Federated Learning [55.54344312542944]
Federated learning allows training models from samples distributed across a large network of clients while respecting privacy and communication restrictions. We develop a novel algorithmic procedure with theoretical speedup guarantees that simultaneously handles two of these hurdles. Our method relies on ideas from representation learning theory to find a global common representation using all clients' data and learn a user-specific set of parameters leading to a personalized solution for each client.
arXiv Detail & Related papers (2022-06-05T01:14:46Z)
Selective Inference for Sparse Multitask Regression with Applications in Neuroimaging [2.611153304251067]
We propose a framework for selective inference to address a common multi-task problem in neuroimaging. Our framework offers a new conditional procedure for inference, based on a refinement of the selection event that yields a tractable selection-adjusted likelihood. We demonstrate through simulations that multi-task learning with selective inference can more accurately recover true signals than single-task methods.
arXiv Detail & Related papers (2022-05-27T20:21:20Z)
Disentangled Counterfactual Recurrent Networks for Treatment Effect Inference over Time [71.30985926640659]
We introduce the Disentangled Counterfactual Recurrent Network (DCRN), a sequence-to-sequence architecture that estimates treatment outcomes over time. With an architecture that is completely inspired by the causal structure of treatment influence over time, we advance forecast accuracy and disease understanding. We demonstrate that DCRN outperforms current state-of-the-art methods in forecasting treatment responses, on both real and simulated data.
arXiv Detail & Related papers (2021-12-07T16:40:28Z)
Multi-task Supervised Learning via Cross-learning [102.64082402388192]
We consider a problem known as multi-task learning, consisting of fitting a set of regression functions intended for solving different tasks. In our novel formulation, we couple the parameters of these functions, so that they learn in their task-specific domains while staying close to each other. This facilitates cross-fertilization in which data collected across different domains help improve the learning performance on each of the other tasks.
arXiv Detail & Related papers (2020-10-24T21:35:57Z)
Robust Learning Through Cross-Task Consistency [92.42534246652062]
We propose a broadly applicable and fully computational method for augmenting learning with Cross-Task Consistency. We observe that learning with cross-task consistency leads to more accurate predictions and better generalization to out-of-distribution inputs.
arXiv Detail & Related papers (2020-06-07T09:24:33Z)
Task-Feature Collaborative Learning with Application to Personalized Attribute Prediction [166.87111665908333]
We propose a novel multi-task learning method called Task-Feature Collaborative Learning (TFCL). Specifically, we first propose a base model with a heterogeneous block-diagonal structure regularizer to leverage the collaborative grouping of features and tasks. As a practical extension, we extend the base model by allowing overlapping features and differentiating the hard tasks.
arXiv Detail & Related papers (2020-04-29T02:32:04Z)
Generalization Bounds and Representation Learning for Estimation of Potential Outcomes and Causal Effects [61.03579766573421]
We study estimation of individual-level causal effects, such as a single patient's response to alternative medication. We devise representation learning algorithms that minimize our bound, by regularizing the representation's induced treatment group distance. We extend these algorithms to simultaneously learn a weighted representation to further reduce treatment group distances.
arXiv Detail & Related papers (2020-01-21T10:16:33Z)

This list is automatically generated from the titles and abstracts of the papers on this site.