Graph Optimal Transport for Cross-Domain Alignment
- URL: http://arxiv.org/abs/2006.14744v3
- Date: Fri, 24 Jul 2020 20:04:49 GMT
- Title: Graph Optimal Transport for Cross-Domain Alignment
- Authors: Liqun Chen, Zhe Gan, Yu Cheng, Linjie Li, Lawrence Carin, Jingjing Liu
- Abstract summary: Cross-domain alignment is fundamental to computer vision and natural language processing.
We propose Graph Optimal Transport (GOT), a principled framework that germinates from recent advances in Optimal Transport (OT).
Experiments show that GOT consistently outperforms baselines across a wide range of tasks.
- Score: 121.80313648519203
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Cross-domain alignment between two sets of entities (e.g., objects in an
image, words in a sentence) is fundamental to both computer vision and natural
language processing. Existing methods mainly focus on designing advanced
attention mechanisms to simulate soft alignment, with no training signals to
explicitly encourage alignment. The learned attention matrices are also dense
and lack interpretability. We propose Graph Optimal Transport (GOT), a
principled framework that germinates from recent advances in Optimal Transport
(OT). In GOT, cross-domain alignment is formulated as a graph matching problem,
by representing entities as a dynamically constructed graph. Two types of OT
distances are considered: (i) Wasserstein distance (WD) for node (entity)
matching; and (ii) Gromov-Wasserstein distance (GWD) for edge (structure)
matching. Both WD and GWD can be incorporated into existing neural network
models, effectively acting as a drop-in regularizer. The inferred transport
plan also yields sparse and self-normalized alignment, enhancing the
interpretability of the learned model. Experiments show that GOT
consistently outperforms baselines across a wide range of tasks, including
image-text retrieval, visual question answering, image captioning, machine
translation, and text summarization.
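The two distances named in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not say which solver is used, so entropic Sinkhorn iterations stand in here, the cosine cost and squared-difference structure loss are assumptions, and all function names (`cosine_cost`, `sinkhorn_plan`, `wasserstein`, `gw_objective`) are hypothetical. For brevity the Gromov-Wasserstein objective is evaluated at the node-matching plan rather than minimized over its own plan.

```python
import math

def cosine_cost(xs, ys):
    """Pairwise cost 1 - cos(x, y) between two lists of feature vectors (assumed cost)."""
    def norm(v):
        return math.sqrt(sum(a * a for a in v)) or 1.0
    return [[1.0 - sum(a * b for a, b in zip(x, y)) / (norm(x) * norm(y))
             for y in ys] for x in xs]

def sinkhorn_plan(cost, eps=0.1, iters=200):
    """Entropic OT plan between uniform marginals (a stand-in solver, not from the source)."""
    n, m = len(cost), len(cost[0])
    a, b = [1.0 / n] * n, [1.0 / m] * m
    K = [[math.exp(-c / eps) for c in row] for row in cost]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(iters):
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]

def wasserstein(cost, plan):
    """Node-matching term: total transport cost <cost, plan>."""
    return sum(c * t for cr, tr in zip(cost, plan) for c, t in zip(cr, tr))

def gw_objective(cx, cy, plan):
    """Edge-matching term: Gromov-Wasserstein objective evaluated at a fixed plan.

    cx and cy are intra-domain cost matrices (the edge structure of each graph).
    The squared difference is an assumed choice of structure loss.
    """
    n, m = len(plan), len(plan[0])
    total = 0.0
    for i in range(n):
        for k in range(n):
            for j in range(m):
                for l in range(m):
                    total += (cx[i][k] - cy[j][l]) ** 2 * plan[i][j] * plan[k][l]
    return total

# Toy entities: three "object" features and the same features in permuted order.
xs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
ys = [xs[2], xs[0], xs[1]]

cost = cosine_cost(xs, ys)
plan = sinkhorn_plan(cost)
wd = wasserstein(cost, plan)
gwd = gw_objective(cosine_cost(xs, xs), cosine_cost(ys, ys), plan)
```

In a training setup, a weighted sum of `wd` and `gwd` would be added to the task loss as the regularizer the abstract describes, with the plan read off as the alignment. Note that an entropic plan is only approximately sparse, and becomes sharper as `eps` shrinks.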
Related papers
- Combining Optimal Transport and Embedding-Based Approaches for More Expressiveness in Unsupervised Graph Alignment [19.145556156889064]
Unsupervised graph alignment finds the one-to-one node correspondence between a pair of attributed graphs by only exploiting graph structure and node features.
We propose a principled approach to combine their advantages motivated by theoretical analysis of model expressiveness.
We are the first to guarantee the one-to-one matching constraint by reducing the problem to maximum weight matching.
arXiv Detail & Related papers (2024-06-19T04:57:35Z)
- Robust Graph Matching Using An Unbalanced Hierarchical Optimal Transport Framework [30.05543844763625]
We propose a novel and robust graph matching method based on an unbalanced hierarchical optimal transport framework.
We make the first attempt to exploit cross-modal alignment in graph matching.
Experiments on various graph matching tasks demonstrate the superiority and robustness of our method compared to state-of-the-art approaches.
arXiv Detail & Related papers (2023-10-18T16:16:53Z)
- Text Reading Order in Uncontrolled Conditions by Sparse Graph Segmentation [71.40119152422295]
We propose a lightweight, scalable and generalizable approach to identify text reading order.
The model is language-agnostic and runs effectively across multi-language datasets.
It is small enough to be deployed on virtually any platform including mobile devices.
arXiv Detail & Related papers (2023-05-04T06:21:00Z)
- Robust Attributed Graph Alignment via Joint Structure Learning and Optimal Transport [26.58964162799207]
We propose SLOTAlign, an unsupervised graph alignment framework that jointly performs Structure Learning and Optimal Transport Alignment.
We incorporate multi-view structure learning to enhance graph representation power and reduce the effect of structure and feature inconsistency inherited across graphs.
The proposed SLOTAlign shows superior performance and strongest robustness over seven unsupervised graph alignment methods and five specialized KG alignment methods.
arXiv Detail & Related papers (2023-01-30T08:41:36Z)
- Asymmetric Cross-Scale Alignment for Text-Based Person Search [15.618984100653348]
Text-based person search (TBPS) is of significant importance in intelligent surveillance, which aims to retrieve pedestrian images with high semantic relevance to a given text description.
To implement this task, one needs to extract multi-scale features from both image and text domains, and then perform the cross-modal alignment.
We present a transformer-based model to extract multi-scale representations, and perform Asymmetric Cross-Scale Alignment (ACSA) to precisely align the two modalities.
arXiv Detail & Related papers (2022-11-26T08:34:35Z)
- Graph Reasoning Transformer for Image Parsing [67.76633142645284]
We propose a novel Graph Reasoning Transformer (GReaT) for image parsing to enable image patches to interact following a relation reasoning pattern.
Compared to the conventional transformer, GReaT has higher interaction efficiency and a more purposeful interaction pattern.
Results show that GReaT achieves consistent performance gains with slight computational overheads on the state-of-the-art transformer baselines.
arXiv Detail & Related papers (2022-09-20T08:21:37Z)
- HDGT: Heterogeneous Driving Graph Transformer for Multi-Agent Trajectory Prediction via Scene Encoding [76.9165845362574]
We propose a backbone modelling the driving scene as a heterogeneous graph with different types of nodes and edges.
For spatial relation encoding, the coordinates of the node as well as its in-edges are in the local node-centric coordinate system.
Experimental results show that HDGT achieves state-of-the-art performance for the task of trajectory prediction.
arXiv Detail & Related papers (2022-04-30T07:08:30Z)
- RANSAC-Flow: generic two-stage image alignment [53.11926395028508]
We show that a simple unsupervised approach performs surprisingly well across a range of tasks.
Despite its simplicity, our method shows competitive results on a range of tasks and datasets.
arXiv Detail & Related papers (2020-04-03T12:37:58Z)
- iFAN: Image-Instance Full Alignment Networks for Adaptive Object Detection [48.83883375118966]
iFAN aims to precisely align feature distributions on both image and instance levels.
It outperforms state-of-the-art methods with a boost of 10%+ AP over the source-only baseline.
arXiv Detail & Related papers (2020-03-09T13:27:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.