Entity-enhanced Adaptive Reconstruction Network for Weakly Supervised
Referring Expression Grounding
- URL: http://arxiv.org/abs/2207.08386v1
- Date: Mon, 18 Jul 2022 05:30:45 GMT
- Title: Entity-enhanced Adaptive Reconstruction Network for Weakly Supervised
Referring Expression Grounding
- Authors: Xuejing Liu, Liang Li, Shuhui Wang, Zheng-Jun Zha, Zechao Li, Qi Tian
and Qingming Huang
- Abstract summary: Weakly supervised Referring Expression Grounding (REG) aims to ground a particular target in an image described by a language expression.
We design an entity-enhanced adaptive reconstruction network (EARN).
EARN includes three modules: entity enhancement, adaptive grounding, and collaborative reconstruction.
- Score: 214.8003571700285
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Weakly supervised Referring Expression Grounding (REG) aims to ground a
particular target in an image described by a language expression while lacking
the correspondence between target and expression. Two main problems exist in
weakly supervised REG. First, the lack of region-level annotations introduces
ambiguities between proposals and queries. Second, most previous weakly
supervised REG methods ignore the discriminative location and context of the
referent, causing difficulties in distinguishing the target from other
same-category objects. To address the above challenges, we design an
entity-enhanced adaptive reconstruction network (EARN). Specifically, EARN
includes three modules: entity enhancement, adaptive grounding, and
collaborative reconstruction. In entity enhancement, we calculate semantic
similarity as supervision to select the candidate proposals. Adaptive grounding
calculates the ranking score of candidate proposals upon subject, location and
context with hierarchical attention. Collaborative reconstruction measures the
ranking result from three perspectives: adaptive reconstruction, language
reconstruction and attribute classification. The adaptive mechanism helps to
alleviate the variance of different referring expressions. Experiments on five
datasets show EARN outperforms existing state-of-the-art methods. Qualitative
results demonstrate that the proposed EARN can better handle the situation
where multiple objects of a particular category are situated together.
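As a rough illustration of the first two modules described in the abstract, the sketch below selects candidate proposals by semantic similarity and then ranks them with expression-dependent weights over subject, location and context scores. It is not the authors' implementation: the function names, the toy embeddings, the per-module scores and the plain softmax weighting (standing in for the paper's hierarchical attention) are all assumptions made for this example, and the collaborative reconstruction losses are not modeled.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def select_candidates(entity_vec, proposal_label_vecs, k):
    """Entity enhancement (sketch): keep the k proposals whose label
    embedding is most similar to the expression's entity embedding."""
    sims = [cosine(entity_vec, p) for p in proposal_label_vecs]
    order = sorted(range(len(sims)), key=lambda i: sims[i], reverse=True)
    return order[:k]


def rank_candidates(candidates, subject, location, context, module_logits):
    """Adaptive grounding (sketch): weight the three per-proposal scores
    with softmax weights derived from the expression (assumed given)."""
    w_subj, w_loc, w_ctx = softmax(module_logits)
    scores = {
        i: w_subj * subject[i] + w_loc * location[i] + w_ctx * context[i]
        for i in candidates
    }
    best = max(scores, key=scores.get)
    return best, scores


# Toy data (hypothetical): four proposals, two of which match the entity.
entity_vec = [1.0, 0.0]
proposal_label_vecs = [[0.9, 0.1], [0.0, 1.0], [0.8, 0.2], [0.1, 0.9]]
subject = [0.6, 0.9, 0.6, 0.2]
location = [0.2, 0.5, 0.9, 0.1]
context = [0.5, 0.5, 0.5, 0.5]
# Logits favoring the location module, as for an expression like "the dog on the left".
module_logits = [0.0, 2.0, 0.0]

candidates = select_candidates(entity_vec, proposal_label_vecs, k=2)
best, scores = rank_candidates(candidates, subject, location, context, module_logits)
print("candidates:", candidates)
print("best proposal:", best)
```

In this toy setup the two same-category candidates have identical subject scores, so the location weight decides between them, which mirrors the abstract's point about distinguishing the target from other same-category objects.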
Related papers
- Hierarchical Alignment-enhanced Adaptive Grounding Network for Generalized Referring Expression Comprehension [46.07415235144545]
We address the challenging task of Generalized Referring Expression Comprehension (GREC).
Existing REC methods face challenges in handling the complex cases encountered in GREC.
We propose a Hierarchical Alignment-enhanced Adaptive Grounding Network (HieA2G)
(2025-01-02T18:57:59Z)
- Coherent Entity Disambiguation via Modeling Topic and Categorical Dependency [87.16283281290053]
Previous entity disambiguation (ED) methods adopt a discriminative paradigm, where prediction is made based on matching scores between mention context and candidate entities.
We propose CoherentED, an ED system equipped with novel designs aimed at enhancing the coherence of entity predictions.
We achieve new state-of-the-art results on popular ED benchmarks, with an average improvement of 1.3 F1 points.
(2023-11-06T16:40:13Z)
- Grounded Entity-Landmark Adaptive Pre-training for Vision-and-Language Navigation [23.94546957057613]
Cross-modal alignment is one key challenge for Vision-and-Language Navigation (VLN).
We propose a novel Grounded Entity-Landmark Adaptive (GELA) pre-training paradigm for VLN tasks.
(2023-08-24T06:25:20Z)
- A Cluster-based Approach for Improving Isotropy in Contextual Embedding Space [18.490856440975996]
The representation degeneration problem in Contextual Word Representations (CWRs) hurts the expressiveness of the embedding space.
We propose a local cluster-based method to address the degeneration issue in contextual embedding spaces.
We show that removing dominant directions of verb representations can transform the space to better suit semantic applications.
(2021-06-02T14:26:37Z)
- I^3Net: Implicit Instance-Invariant Network for Adapting One-Stage Object Detectors [64.93963042395976]
Implicit Instance-Invariant Network (I3Net) is tailored for adapting one-stage detectors.
I3Net implicitly learns instance-invariant features via exploiting the natural characteristics of deep features in different layers.
Experiments reveal that I3Net exceeds the state-of-the-art performance on benchmark datasets.
(2021-03-25T11:14:36Z)
- Towards Uncovering the Intrinsic Data Structures for Unsupervised Domain Adaptation using Structurally Regularized Deep Clustering [119.88565565454378]
Unsupervised domain adaptation (UDA) aims to learn classification models that make predictions for unlabeled data on a target domain.
We propose a hybrid model of Structurally Regularized Deep Clustering (H-SRDC), which integrates the regularized discriminative clustering of target data with a generative one.
Our proposed H-SRDC outperforms all the existing methods under both the inductive and transductive settings.
(2020-12-08T08:52:00Z)
- Contextual-Relation Consistent Domain Adaptation for Semantic Segmentation [44.19436340246248]
This paper presents an innovative local contextual-relation consistent domain adaptation technique.
It aims to achieve local-level consistencies during the global-level alignment.
Experiments demonstrate its superior segmentation performance as compared with state-of-the-art methods.
(2020-07-05T19:00:46Z)
- Harmonizing Transferability and Discriminability for Adapting Object Detectors [48.78231850215302]
We propose a Hierarchical Transferability Calibration Network (HTCN) that calibrates the transferability of feature representations to harmonize transferability and discriminability.
Experimental results show that HTCN significantly outperforms the state-of-the-art methods on benchmark datasets.
(2020-03-13T13:47:48Z)
This list is automatically generated from the titles and abstracts of the papers on this site.