Learning Noise-Resilient and Transferable Graph-Text Alignment via Dynamic Quality Assessment
- URL: http://arxiv.org/abs/2510.19384v1
- Date: Wed, 22 Oct 2025 09:01:17 GMT
- Title: Learning Noise-Resilient and Transferable Graph-Text Alignment via Dynamic Quality Assessment
- Authors: Yuhang Liu, Minglai Shao, Zengyi Wo, Yunlong Chu, Bing Hao, Shengzhong Liu, Ruijie Wang, Jianxin Li
- Abstract summary: Pre-training Graph Foundation Models (GFMs) on text-attributed graphs (TAGs) is central to web-scale applications such as search, recommendation, and knowledge discovery. However, existing CLIP-style graph-text aligners face two key limitations: they assume strict one-to-one correspondences between nodes and texts, and they rely on static alignment objectives that cannot adapt to varying data quality, making them brittle under noisy supervision. We propose ADAligner, a quality-aware graph-text alignment framework that dynamically adjusts between expressive many-to-many and conservative one-to-one objectives according to supervision quality.
- Score: 19.204800655283744
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Pre-training Graph Foundation Models (GFMs) on text-attributed graphs (TAGs) is central to web-scale applications such as search, recommendation, and knowledge discovery. However, existing CLIP-style graph-text aligners face two key limitations: they assume strict one-to-one correspondences between nodes and texts, overlooking the inherent many-to-many relations in real-world graphs; and they rely on static alignment objectives that cannot adapt to varying data quality, making them brittle under noisy supervision. Together, these limitations expose a core dilemma: embracing expressive many-to-many alignment amplifies noise, while reverting to strict one-to-one strategies sacrifices semantic diversity and fails to handle inherently mismatched pairs. To address these challenges, we propose ADAligner, a dynamic, quality-aware graph-text alignment framework that dynamically adjusts between expressive many-to-many and conservative one-to-one objectives according to supervision quality. ADAligner estimates batch-level alignment reliability in real time and adapts its optimization accordingly, promoting soft, subgraph-level many-to-many alignment when supervision is clean, while emphasizing reliable one-to-one alignment by dynamically filtering low-confidence pairs under noise. Theoretically, we prove that this dynamic mechanism forms a stable negative feedback process, ensuring convergence and robustness. Comprehensive experiments on nine diverse TAG datasets demonstrate that ADAligner consistently outperforms prior graph-text aligners on zero-/few-shot node classification, link prediction and cross-modal retrieval tasks. It maintains strong robustness under noisy supervision and accelerates pre-training by approximately 2 to 3 times compared to multimodal baselines, establishing a scalable and reliable foundation for graph-text representation learning in real-world web environments.
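As a rough illustration of the quality-aware switching idea described in the abstract, the following minimal sketch estimates a batch-level reliability score and then picks between a soft many-to-many objective and a filtered one-to-one objective. It is not the authors' implementation: the function names (`batch_reliability`, `adaptive_alignment_loss`), the diagonal-probability reliability score, the hard threshold, and the quantile-based filtering are all assumptions made for illustration only.

```python
import numpy as np


def log_softmax(x, axis=-1):
    """Numerically stable log-softmax."""
    shifted = x - x.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def batch_reliability(sim, temperature=0.1):
    """Hypothetical reliability score: mean softmax probability that each
    graph embedding assigns to its own paired text (the diagonal)."""
    probs = np.exp(log_softmax(sim / temperature, axis=1))
    return float(np.mean(np.diag(probs)))


def adaptive_alignment_loss(graph_emb, text_emb, temperature=0.1,
                            quality_threshold=0.5, keep_fraction=0.5):
    """Assumed switching rule (not taken from the paper):
    - reliable batch   -> soft many-to-many targets from graph-graph similarity
    - unreliable batch -> one-to-one loss on the most confident pairs only
    """
    # Cosine similarity between every graph embedding and every text embedding.
    g = graph_emb / np.linalg.norm(graph_emb, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    sim = g @ t.T
    log_probs = log_softmax(sim / temperature, axis=1)
    quality = batch_reliability(sim, temperature)

    if quality >= quality_threshold:
        # Soft targets: each row is a distribution over similar graph items.
        targets = np.exp(log_softmax((g @ g.T) / temperature, axis=1))
        loss = float(-(targets * log_probs).sum(axis=1).mean())
        mode = "many-to-many"
    else:
        # Keep only pairs whose own similarity is in the top keep_fraction.
        pair_conf = np.diag(sim)
        keep = pair_conf >= np.quantile(pair_conf, 1.0 - keep_fraction)
        loss = float(-np.diag(log_probs)[keep].mean())
        mode = "one-to-one"
    return loss, quality, mode
```

A hard threshold is used here only to keep the sketch short; the abstract describes a dynamic adjustment, which a smooth weighting between the two losses would approximate more closely.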
Related papers
- CGSTA: Cross-Scale Graph Contrast with Stability-Aware Alignment for Multivariate Time-Series Anomaly Detection [6.953121860419416]
We propose the CGSTA framework for time-series anomaly detection. DLGC forms local, regional, and global views of variable relations for each sliding window. SAA maintains a per-scale stable reference and guides the current window's fast-changing graphs toward it to suppress noise.
arXiv Detail & Related papers (2026-02-24T01:58:39Z)
- OmniVL-Guard: Towards Unified Vision-Language Forgery Detection and Grounding via Balanced RL [63.388513841293616]
Existing forgery detection methods fail to handle the interleaved text, images, and videos prevalent in real-world misinformation. To bridge this gap, this paper aims to develop a unified framework for omnibus vision-language forgery detection and grounding. We propose OmniVL-Guard, a balanced reinforcement learning framework for omnibus vision-language forgery detection and grounding.
arXiv Detail & Related papers (2026-02-11T09:41:36Z)
- DREAM: Dual-Standard Semantic Homogeneity with Dynamic Optimization for Graph Learning with Label Noise [53.55187452152358]
This paper proposes a novel method, Dual-Standard Semantic Homogeneity with Dynamic Optimization (DREAM), for reliable, relation-informed optimization on graphs with label noise. Specifically, we design a relation-informed dynamic optimization framework that iteratively reevaluates the reliability of each labeled node in the graph.
arXiv Detail & Related papers (2026-01-24T12:54:18Z)
- Unleashing the Power of Vision-Language Models for Long-Tailed Multi-Label Visual Recognition [55.189113121465816]
We propose a novel correlation adaptation prompt network (CAPNET) for long-tailed multi-label visual recognition. CAPNET explicitly models correlations from CLIP's textual encoder. It improves generalization through test-time ensembling and realigns visual-textual modalities.
arXiv Detail & Related papers (2025-11-25T18:57:28Z)
- TAWRMAC: A Novel Dynamic Graph Representation Learning Method [1.7230595437884768]
We introduce TAWRMAC, a novel framework that integrates Temporal Anonymous Walks with Restart, Memory Augmentation, and Neighbor Co-occurrence embedding. By providing stable, generalizable, and context-aware embeddings, TAWRMAC advances the state of the art in continuous-time dynamic graph learning.
arXiv Detail & Related papers (2025-10-10T21:38:07Z)
- GILT: An LLM-Free, Tuning-Free Graph Foundational Model for In-Context Learning [50.40400074353263]
Graph Neural Networks (GNNs) are powerful tools for processing relational data but often struggle to generalize to unseen graphs. We introduce Graph In-context Learning Transformer (GILT), a framework built on an LLM-free and tuning-free architecture.
arXiv Detail & Related papers (2025-10-06T08:09:15Z)
- Unsupervised Online 3D Instance Segmentation with Synthetic Sequences and Dynamic Loss [52.28880405119483]
Unsupervised online 3D instance segmentation is a fundamental yet challenging task. Existing methods, such as UNIT, have made progress in this direction but remain constrained by limited training diversity. We propose a new framework that enriches the training distribution through synthetic point cloud sequence generation.
arXiv Detail & Related papers (2025-09-27T08:53:27Z)
- Graph Alignment via Dual-Pass Spectral Encoding and Latent Space Communication [31.43539830271355]
We propose a novel graph alignment framework that simultaneously enhances node distinctiveness and enforces geometric consistency across latent spaces. Our approach introduces a dual-pass encoder that combines low-pass and high-pass spectral filters to generate embeddings that are both structure-aware and highly discriminative.
arXiv Detail & Related papers (2025-09-11T16:36:16Z)
- EGRA:Toward Enhanced Behavior Graphs and Representation Alignment for Multimodal Recommendation [50.848374648774374]
MultiModal Recommendation (MMR) systems have emerged as a promising solution for improving recommendation quality by leveraging rich item-side modality information. We propose EGRA, which incorporates into the behavior graph an item-item graph built from representations generated by a pretrained MMR model. It also introduces a novel bi-level dynamic alignment weighting mechanism to improve modality-behavior representation alignment.
arXiv Detail & Related papers (2025-08-22T07:47:54Z)
- Unsupervised Robust Cross-Lingual Entity Alignment via Neighbor Triple Matching with Entity and Relation Texts [17.477542644785483]
Cross-lingual entity alignment (EA) enables the integration of multiple knowledge graphs (KGs) across different languages. An EA pipeline that jointly performs entity-level and relation-level alignment using a neighbor triple matching strategy.
arXiv Detail & Related papers (2024-07-22T12:25:48Z)
- Interactive Test-Time Adaptation with Reliable Spatial-Temporal Voxels for Multi-Modal Segmentation [56.70910056845503]
Multi-modal test-time adaptation (MM-TTA) adapts models to an unlabeled target domain by leveraging the complementary multi-modal inputs in an online manner. Previous MM-TTA methods for 3D segmentation suffer from two major limitations: 1) unstable frame-wise predictions caused by temporal inconsistency, and 2) consistently incorrect predictions that violate the assumption of reliable modality guidance. This work introduces a comprehensive two-fold framework: Latte++ that better suppresses the unstable frame-wise predictions with more informative geometric correspondences, and Interactive Test-Time Adaptation (ITTA), a flexible add-on to empower effortless human feedback.
arXiv Detail & Related papers (2024-03-11T06:56:08Z)
- Multi-Label Meta Weighting for Long-Tailed Dynamic Scene Graph Generation [55.429541407920304]
Recognizing the predicate between subject and object pairs is imbalanced and multi-label in nature.
Recent state-of-the-art methods predominantly focus on the most frequently occurring predicate classes.
We introduce a multi-label meta-learning framework to deal with the biased predicate distribution.
arXiv Detail & Related papers (2023-06-16T18:14:23Z)
- Progressively Guide to Attend: An Iterative Alignment Framework for Temporal Sentence Grounding [53.377028000325424]
We propose an Iterative Alignment Network (IA-Net) for the temporal sentence grounding task.
We pad multi-modal features with learnable parameters to alleviate the nowhere-to-attend problem of non-matched frame-word pairs.
We also devise a calibration module following each attention module to refine the alignment knowledge.
arXiv Detail & Related papers (2021-09-14T02:08:23Z)
This list is automatically generated from the titles and abstracts of the papers on this site.