Related papers: Relation-R1: Progressively Cognitive Chain-of-Thought Guided Reinforcement Learning for Unified Relation Comprehension

Related papers

CoTZero: Annotation-Free Human-Like Vision Reasoning via Hierarchical Synthetic CoT [9.44589968698834]
CoTZero is an annotation-free paradigm with two components.<n>CoTZero extracts visual primitives and composes them into structured question-reasoning forms.<n>In the cognition-aligned training component, built on the synthesized CoT data, we introduce Cognitively Coherent Verifiable Rewards.
arXiv Detail & Related papers (2026-02-09T07:26:40Z)
Stable Language Guidance for Vision-Language-Action Models [62.80963701282789]
Residual Semantic Steering is a probabilistic framework that disentangles physical affordance from semantic execution.<n> RSS achieves state-of-the-art robustness, maintaining performance even under adversarial linguistic perturbations.
arXiv Detail & Related papers (2026-01-07T16:16:10Z)
Generative Reasoning Recommendation via LLMs [48.45009951684554]
Large language models (LLMs) face fundamental challenges in functioning as generative reasoning recommendation models (GRRMs)<n>This work explores how to build GRRMs by adapting pre-trained LLMs, which achieves a unified understanding-reasoning-prediction manner for recommendation tasks.<n>We propose GREAM, an end-to-end framework that integrates three components: Collaborative-Semantic Alignment, Reasoning Curriculum Activation, and Sparse-Regularized Group Policy Optimization.
arXiv Detail & Related papers (2025-10-23T17:59:31Z)
CIR-CoT: Towards Interpretable Composed Image Retrieval via End-to-End Chain-of-Thought Reasoning [93.05917922306196]
Composed Image Retrieval (CIR) aims to find a target image from a reference image and a modification text.<n>CIR-CoT is the first end-to-end retrieval-oriented MLLM designed to integrate explicit Chain-of-Thought (CoT) reasoning.
arXiv Detail & Related papers (2025-10-09T09:41:45Z)
Explaining multimodal LLMs via intra-modal token interactions [55.27436637894534]
Multimodal Large Language Models (MLLMs) have achieved remarkable success across diverse vision-language tasks, yet their internal decision-making mechanisms remain insufficiently understood.<n>We propose enhancing interpretability by leveraging intra-modal interaction.
arXiv Detail & Related papers (2025-09-26T14:39:13Z)
Retrieval over Classification: Integrating Relation Semantics for Multimodal Relation Extraction [6.478238734128006]
Relation extraction (RE) aims to identify semantic relations between entities in unstructured text.<n>underlineRetrieval underlineOver underlineClassification (ROC) is a novel framework that reformulates multimodal RE as a retrieval task driven by relation semantics.<n>ROC integrates entity type and positional information through a multimodal encoder, expands relation labels into natural language descriptions using a large language model, and aligns entity-relation pairs via semantic similarity-based contrastive learning.
arXiv Detail & Related papers (2025-09-25T13:38:38Z)
CogDual: Enhancing Dual Cognition of LLMs via Reinforcement Learning with Implicit Rule-Based Rewards [53.36917093757101]
Role-Playing Language Agents (RPLAs) have emerged as a significant application direction for Large Language Models (LLMs)<n>We introduce textbfCogDual, a novel RPLA adopting a textitcognize-then-respond reasoning paradigm.<n>By jointly modeling external situational awareness and internal self-awareness, CogDual generates responses with improved character consistency and contextual alignment.
arXiv Detail & Related papers (2025-07-23T02:26:33Z)
Boosting Neural Language Inference via Cascaded Interactive Reasoning [38.125341836302525]
Natural Language Inference (NLI) focuses on ascertaining the logical relationship between a given premise and hypothesis.<n>This task presents significant challenges due to inherent linguistic features such as diverse phrasing, semantic complexity, and contextual nuances.<n>We introduce the Cascaded Interactive Reasoning Network (CIRN), a novel architecture designed for deeper semantic comprehension in NLI.
arXiv Detail & Related papers (2025-05-10T11:37:15Z)
Will Pre-Training Ever End? A First Step Toward Next-Generation Foundation MLLMs via Self-Improving Systematic Cognition [86.21199607040147]
Self-Improving cognition (SIcog) is a self-learning framework for constructing next-generation foundation language models.<n>We introduce Chain-of-Description, a step-by-step visual understanding method, and integrate structured chain-of-thought (CoT) reasoning to support in-depth multimodal reasoning.<n>Extensive experiments demonstrate that SIcog produces next-generation foundation MLLMs with substantially improved multimodal cognition.
arXiv Detail & Related papers (2025-03-16T00:25:13Z)
A Survey on Post-training of Large Language Models [185.51013463503946]
Large Language Models (LLMs) have fundamentally transformed natural language processing, making them indispensable across domains ranging from conversational systems to scientific exploration.<n>These challenges necessitate advanced post-training language models (PoLMs) to address shortcomings, such as restricted reasoning capacities, ethical uncertainties, and suboptimal domain-specific performance.<n>This paper presents the first comprehensive survey of PoLMs, systematically tracing their evolution across five core paradigms.
arXiv Detail & Related papers (2025-03-08T05:41:42Z)
EAGER-LLM: Enhancing Large Language Models as Recommenders through Exogenous Behavior-Semantic Integration [60.47645731801866]
Large language models (LLMs) are increasingly leveraged as foundational backbones in advanced recommender systems.<n>LLMs are pre-trained linguistic semantics but learn collaborative semantics from scratch via the llm-Backbone.<n>We propose EAGER-LLM, a decoder-only generative recommendation framework that integrates endogenous and endogenous behavioral and semantic information in a non-intrusive manner.
arXiv Detail & Related papers (2025-02-20T17:01:57Z)
Intrinsic Tensor Field Propagation in Large Language Models: A Novel Approach to Contextual Information Flow [0.0]
Intrinsic Field propagation improves contextual retention, dependency resolution, and inference across various linguistic structures. Experiments conducted on an open-source transformer-based model demonstrate that I provides measurable improvements in contextual retention, dependency resolution, and inference across various linguistic structures.
arXiv Detail & Related papers (2025-01-31T08:32:32Z)
DLF: Disentangled-Language-Focused Multimodal Sentiment Analysis [41.29318462528406]
We propose a Disentangled-Language-Focused (DLF) multimodal representation learning framework.<n>It incorporates a feature disentanglement module to separate modality-shared and modality-specific information.<n>A Language-Focused Attractor (LFA) is further developed to strengthen language representation by leveraging complementary modality-specific information.
arXiv Detail & Related papers (2024-12-16T10:03:44Z)
Graph-based Unsupervised Disentangled Representation Learning via Multimodal Large Language Models [42.17166746027585]
We introduce a bidirectional weighted graph-based framework to learn factorized attributes and their interrelations within complex data. Specifically, we propose a $beta$-VAE based module to extract factors as the initial nodes of the graph. By integrating these complementary modules, our model successfully achieves fine-grained, practical and unsupervised disentanglement.
arXiv Detail & Related papers (2024-07-26T15:32:21Z)
Cantor: Inspiring Multimodal Chain-of-Thought of MLLM [83.6663322930814]
We argue that converging visual context acquisition and logical reasoning is pivotal for tackling visual reasoning tasks. We propose an innovative multimodal CoT framework, termed Cantor, characterized by a perception-decision architecture. Our experiments demonstrate the efficacy of the proposed framework, showing significant improvements in multimodal CoT performance.
arXiv Detail & Related papers (2024-04-24T17:59:48Z)
Vision-and-Language Navigation via Causal Learning [13.221880074458227]
Cross-modal causal transformer (GOAT) is a pioneering solution rooted in the paradigm of causal inference. BACL and FACL modules promote unbiased learning by comprehensively mitigating potential spurious correlations. To capture global confounder features, we propose a cross-modal feature pooling module supervised by contrastive learning.
arXiv Detail & Related papers (2024-04-16T02:40:35Z)
How to Understand "Support"? An Implicit-enhanced Causal Inference Approach for Weakly-supervised Phrase Grounding [18.97081348819219]
Weakly-supervised Phrase Grounding (WPG) is an emerging task of inferring the fine-grained phrase-region matching. This paper proposes an Implicit-Enhanced Causal Inference approach to address the challenges of modeling the implicit relations.
arXiv Detail & Related papers (2024-02-29T12:49:48Z)
Prompt-based Logical Semantics Enhancement for Implicit Discourse Relation Recognition [4.7938839332508945]
We propose a Prompt-based Logical Semantics Enhancement (PLSE) method for Implicit Discourse Relation Recognition (IDRR) Our method seamlessly injects knowledge relevant to discourse relation into pre-trained language models through prompt-based connective prediction. Experimental results on PDTB 2.0 and CoNLL16 datasets demonstrate that our method achieves outstanding and consistent performance against the current state-of-the-art models.
arXiv Detail & Related papers (2023-11-01T08:38:08Z)
Understanding and Constructing Latent Modality Structures in Multi-modal Representation Learning [53.68371566336254]
We argue that the key to better performance lies in meaningful latent modality structures instead of perfect modality alignment. Specifically, we design 1) a deep feature separation loss for intra-modality regularization; 2) a Brownian-bridge loss for inter-modality regularization; and 3) a geometric consistency loss for both intra- and inter-modality regularization.
arXiv Detail & Related papers (2023-03-10T14:38:49Z)
Improving Distantly Supervised Relation Extraction by Natural Language Inference [9.181270251524866]
We propose a novel DSRE-NLI framework, which considers both distant supervision from existing knowledge bases and indirect supervision from pretrained language models for other tasks. DSRE-NLI energizes an off-the-shelf natural language inference (NLI) engine with a semi-automatic relation verbalization (SARV) mechanism to provide indirect supervision. With two simple and effective data consolidation strategies, the quality of training data is substantially improved.
arXiv Detail & Related papers (2022-07-31T02:48:34Z)
SAIS: Supervising and Augmenting Intermediate Steps for Document-Level Relation Extraction [51.27558374091491]
We propose to explicitly teach the model to capture relevant contexts and entity types by supervising and augmenting intermediate steps (SAIS) for relation extraction. Based on a broad spectrum of carefully designed tasks, our proposed SAIS method not only extracts relations of better quality due to more effective supervision, but also retrieves the corresponding supporting evidence more accurately.
arXiv Detail & Related papers (2021-09-24T17:37:35Z)
Learning Relation Alignment for Calibrated Cross-modal Retrieval [52.760541762871505]
We propose a novel metric, Intra-modal Self-attention Distance (ISD), to quantify the relation consistency by measuring the semantic distance between linguistic and visual relations. We present Inter-modal Alignment on Intra-modal Self-attentions (IAIS), a regularized training method to optimize the ISD and calibrate intra-modal self-attentions mutually via inter-modal alignment.
arXiv Detail & Related papers (2021-05-28T14:25:49Z)
ERICA: Improving Entity and Relation Understanding for Pre-trained Language Models via Contrastive Learning [97.10875695679499]
We propose a novel contrastive learning framework named ERICA in pre-training phase to obtain a deeper understanding of the entities and their relations in text. Experimental results demonstrate that our proposed ERICA framework achieves consistent improvements on several document-level language understanding tasks.
arXiv Detail & Related papers (2020-12-30T03:35:22Z)
High-order Semantic Role Labeling [86.29371274587146]
This paper introduces a high-order graph structure for the neural semantic role labeling model. It enables the model to explicitly consider not only the isolated predicate-argument pairs but also the interaction between the predicate-argument pairs. Experimental results on 7 languages of the CoNLL-2009 benchmark show that the high-order structural learning techniques are beneficial to the strong performing SRL models.
arXiv Detail & Related papers (2020-10-09T15:33:54Z)

This list is automatically generated from the titles and abstracts of the papers in this site.