Related papers: MULTIGAIN 2.0: MDP controller synthesis for multiple mean-payoff, LTL and steady-state constraints

Related papers

UniT: Unified Multimodal Chain-of-Thought Test-time Scaling [85.590774707406]
Unified models can handle both multimodal understanding and generation within a single architecture, yet they typically operate in a single pass without iteratively refining their outputs.<n>We introduce UniT, a framework for multimodal test-time scaling that enables a single unified model to reason, verify, and refine across multiple rounds.
arXiv Detail & Related papers (2026-02-12T18:59:49Z)
MATRIX: Multimodal Agent Tuning for Robust Tool-Use Reasoning [65.200259961515]
We develop a vision-centric agent tuning framework that automatically synthesizes multimodal trajectories.<n>We also introduce Pref-X, a set of 11K automatically generated preference pairs.<n>Across three benchmarks, Agent-X, GTA, and GAIA, MATRIX consistently surpasses both open- and closed-source VLMs.
arXiv Detail & Related papers (2025-10-09T17:59:54Z)
Optimal Control Meets Flow Matching: A Principled Route to Multi-Subject Fidelity [35.95129874095729]
Text-to-image (T2I) models excel on single-entity prompts but struggle with multi-subject descriptions.<n>We introduce the first theoretical framework with principled optimizable objective for steering sampling dynamics toward multi-subject fidelity.
arXiv Detail & Related papers (2025-10-02T17:59:58Z)
FindRec: Stein-Guided Entropic Flow for Multi-Modal Sequential Recommendation [50.438552588818]
We propose textbfFindRec (textbfFlexible unified textbfinformation textbfdisentanglement for multi-modal sequential textbfRecommendation)<n>A Stein kernel-based Integrated Information Coordination Module (IICM) theoretically guarantees distribution consistency between multimodal features and ID streams.<n>A cross-modal expert routing mechanism that adaptively filters and combines multimodal features based on their contextual relevance.
arXiv Detail & Related papers (2025-07-07T04:09:45Z)
Exploiting Mixture-of-Experts Redundancy Unlocks Multimodal Generative Abilities [69.26544016976396]
We exploit the redundancy within Mixture-of-Experts (MoEs) as a source of additional capacity for learning a new modality. We preserve the original language generation capabilities by applying low-rank adaptation exclusively to the tokens of the new modality.
arXiv Detail & Related papers (2025-03-28T15:21:24Z)
Explainable AI for Multivariate Time Series Pattern Exploration: Latent Space Visual Analytics with Temporal Fusion Transformer and Variational Autoencoders in Power Grid Event Diagnosis [1.170167705525779]
This paper proposes a novel visual analytics framework that integrates two generative AI models, Temporal Fusion Transformer (TFT) and Variational Autoencoders (VAEs) It reduces complex patterns into lower-dimensional latent spaces and visualizes them in 2D using dimensionality reduction techniques such as PCA, t-SNE, and UMAP with DBSCAN. The framework is demonstrated through a case study on power grid signal data, where it identifies multi-label grid event signatures, including faults and anomalies with diverse root causes.
arXiv Detail & Related papers (2024-12-20T17:41:11Z)
Multi-SpaCE: Multi-Objective Subsequence-based Sparse Counterfactual Explanations for Multivariate Time Series Classification [3.8305310459921587]
Multi-SpaCE balances proximity, sparsity, plausibility, and contiguity in time series data. It consistently achieves perfect validity and delivers superior performance compared to existing methods.
arXiv Detail & Related papers (2024-12-14T09:21:44Z)
GSIFN: A Graph-Structured and Interlaced-Masked Multimodal Transformer-based Fusion Network for Multimodal Sentiment Analysis [0.0]
Multimodal Sentiment Analysis (MSA) leverages multiple data modals to analyze human sentiment. Existing MSA models generally employ cutting-edge multimodal fusion and representation learning-based methods to promote MSA capability. Our proposed GSIFN incorporates two main components to solve these problems: (i) a graph-structured and interlaced-masked multimodal Transformer. It adopts the Interlaced Mask mechanism to construct robust multimodal graph embedding, achieve all-modal-in-one Transformer-based fusion, and greatly reduce the computational overhead.
arXiv Detail & Related papers (2024-08-27T06:44:28Z)
Advancing Multimodal Large Language Models with Quantization-Aware Scale Learning for Efficient Adaptation [70.22782550540714]
Quantization-aware Scale LeArning method based on multimodal Warmup, termed QSLAW. We introduce a Quantization-aware Scale LeArning method based on multimodal Warmup, termed QSLAW.
arXiv Detail & Related papers (2024-08-07T12:42:09Z)
VoCoT: Unleashing Visually Grounded Multi-Step Reasoning in Large Multi-Modal Models [32.10766568096317]
This paper proposes VoCoT, a multi-step Visually grounded object-centric Chain-of-Thought reasoning framework tailored for inference with LMMs. VoCoT is characterized by two key features: (1) object-centric reasoning paths that revolve around cross-modal shared object-level information, and (2) visually grounded representation of object concepts in a multi-modal interleaved and aligned manner.
arXiv Detail & Related papers (2024-05-27T08:12:00Z)
Deriving Compact QUBO Models via Multilevel Constraint Transformation [0.8192907805418583]
We propose a novel Multilevel Constraint Transformation Scheme (MLCTS) that derives QUBO models with fewer ancillary binary variables. For a proof-of-concept, we compare the performance of two QUBO models for the latter problem on both a general-purpose software-based solver and a hardware-based QUBO solver. The MLCTS-derived models demonstrate significantly better performance for both solvers, in particular, solving up to seven times more instances with the hardware-based approach.
arXiv Detail & Related papers (2024-04-04T17:34:08Z)
MADTP: Multimodal Alignment-Guided Dynamic Token Pruning for Accelerating Vision-Language Transformer [66.71930982549028]
Vision-Language Transformers (VLTs) have shown great success recently, but are accompanied by heavy computation costs. We propose a novel framework named Multimodal Alignment-Guided Dynamic Token Pruning (MADTP) for accelerating various VLTs.
arXiv Detail & Related papers (2024-03-05T14:13:50Z)
Double Duality: Variational Primal-Dual Policy Optimization for Constrained Reinforcement Learning [132.7040981721302]
We study the Constrained Convex Decision Process (MDP), where the goal is to minimize a convex functional of the visitation measure. Design algorithms for a constrained convex MDP faces several challenges, including handling the large state space.
arXiv Detail & Related papers (2024-02-16T16:35:18Z)
Sample Complexity Characterization for Linear Contextual MDPs [67.79455646673762]
Contextual decision processes (CMDPs) describe a class of reinforcement learning problems in which the transition kernels and reward functions can change over time with different MDPs indexed by a context variable. CMDPs serve as an important framework to model many real-world applications with time-varying environments. We study CMDPs under two linear function approximation models: Model I with context-varying representations and common linear weights for all contexts; and Model II with common representations for all contexts and context-varying linear weights.
arXiv Detail & Related papers (2024-02-05T03:25:04Z)
Evaluation of large language models for assessing code maintainability [4.2909314120969855]
We investigate the association between the cross-entropy of code generated by ten different models and quality aspects. Our results show that, controlling for the number of logical lines of codes, cross-entropy computed by LLMs is indeed a predictor of maintainability on a class level. While the complexity of LLMs affects the range of cross-entropy, this plays a significant role in predicting maintainability aspects.
arXiv Detail & Related papers (2024-01-23T12:29:42Z)
Efficiently Controlling Multiple Risks with Pareto Testing [34.83506056862348]
We propose a two-stage process which combines multi-objective optimization with multiple hypothesis testing. We demonstrate the effectiveness of our approach to reliably accelerate the execution of large-scale Transformer models in natural language processing (NLP) applications.
arXiv Detail & Related papers (2022-10-14T15:54:39Z)
Bi-Bimodal Modality Fusion for Correlation-Controlled Multimodal Sentiment Analysis [96.46952672172021]
Bi-Bimodal Fusion Network (BBFN) is a novel end-to-end network that performs fusion on pairwise modality representations. Model takes two bimodal pairs as input due to known information imbalance among modalities.
arXiv Detail & Related papers (2021-07-28T23:33:42Z)

This list is automatically generated from the titles and abstracts of the papers in this site.