Unbiased Directed Object Attention Graph for Object Navigation
- URL: http://arxiv.org/abs/2204.04421v1
- Date: Sat, 9 Apr 2022 08:13:05 GMT
- Title: Unbiased Directed Object Attention Graph for Object Navigation
- Authors: Ronghao Dang, Zhuofan Shi, Liuyi Wang, Zongtao He, Chengju Liu, Qijun
Chen
- Abstract summary: We propose a directed object attention (DOA) graph to guide the agent in explicitly learning the attention relationships between objects.
To distinguish features in different branches, a concise adaptive branch energy distribution (ABED) method is proposed.
- Compared with the state-of-the-art (SOTA) method, our method achieves increases of 7.4%, 8.1%, and 17.6% in success rate (SR), success weighted by path length (SPL), and success weighted by action efficiency (SAE), respectively.
- Score: 4.037009782513272
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Object navigation tasks require agents to locate specific objects in unknown
environments based on visual information. Previously, graph convolutions were
used to implicitly explore the relationships between objects. However, because
objects differ in visibility, the resulting object attention is easily biased.
Thus, in this paper, we propose a directed object attention
(DOA) graph to guide the agent in explicitly learning the attention
relationships between objects, thereby reducing the object attention bias. In
particular, we use the DOA graph to perform unbiased adaptive object attention
(UAOA) on the object features and unbiased adaptive image attention (UAIA) on
the raw images, respectively. To distinguish features in different branches, a
concise adaptive branch energy distribution (ABED) method is proposed. We
assess our methods on the AI2-Thor dataset. Compared with the state-of-the-art
(SOTA) method, our method achieves increases of 7.4%, 8.1%, and 17.6% in
success rate (SR), success weighted by path length (SPL), and success weighted
by action efficiency (SAE), respectively.
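The abstract includes no code, but the core idea lends itself to a short sketch. The following is a rough, hypothetical illustration (the names `doa_graph` and `directed_object_attention` and all toy shapes are ours, and the UAIA and ABED branches are omitted): a learned directed graph re-weights attention scores over detected object features, so attention is no longer driven by object visibility alone.

```python
import torch
import torch.nn.functional as F

def directed_object_attention(obj_feats, doa_graph, target_idx):
    """Re-weight object-to-object attention with a directed graph.

    obj_feats:  (N, D) features of N detected object classes
    doa_graph:  (N, N) learned directed edge weights; row i says how much
                object i should attend to each other object
    target_idx: index of the navigation target class
    """
    d = obj_feats.shape[-1]
    # Plain dot-product attention scores between all object pairs.
    scores = obj_feats @ obj_feats.t() / d ** 0.5          # (N, N)
    # Modulate the (visibility-biased) scores with the directed graph,
    # then renormalize per source object.
    attn = F.softmax(scores * doa_graph, dim=-1)           # (N, N)
    attended = attn @ obj_feats                            # (N, D)
    return attended[target_idx]                            # (D,)

# Toy usage: 10 object classes with 32-dim features.
feats = torch.randn(10, 32)
graph = torch.rand(10, 10)        # would be learned in practice
print(directed_object_attention(feats, graph, target_idx=3).shape)
```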
Related papers
- SOOD++: Leveraging Unlabeled Data to Boost Oriented Object Detection [59.868772767818975]
We propose a simple yet effective Semi-supervised Oriented Object Detection method termed SOOD++.
Specifically, we observe that objects in aerial images usually have arbitrary orientations, small scales, and a tendency to aggregate.
Extensive experiments conducted on various multi-oriented object datasets under various labeled settings demonstrate the effectiveness of our method.
arXiv Detail & Related papers (2024-07-01T07:03:51Z)
- EGTR: Extracting Graph from Transformer for Scene Graph Generation [5.935927309154952]
Scene Graph Generation (SGG) is a challenging task of detecting objects and predicting relationships between objects.
We propose a lightweight one-stage SGG model that extracts the relation graph from the various relationships learned in the multi-head self-attention layers of the DETR decoder.
We demonstrate the effectiveness and efficiency of our method on the Visual Genome and Open Image V6 datasets.
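As a hedged illustration of the general recipe (not EGTR's actual architecture), per-head self-attention weights between object queries can be read out as features of each (subject, object) pair and fed to a small relation classifier:

```python
import torch
import torch.nn as nn

num_queries, d_model, num_heads, num_rel = 5, 64, 8, 3

attn = nn.MultiheadAttention(d_model, num_heads, batch_first=True)
queries = torch.randn(1, num_queries, d_model)   # decoder object queries

# average_attn_weights=False keeps one (N, N) attention map per head.
_, weights = attn(queries, queries, queries,
                  need_weights=True, average_attn_weights=False)
# weights: (1, num_heads, N, N) -> per-pair feature of length num_heads.
pair_feats = weights.permute(0, 2, 3, 1)          # (1, N, N, num_heads)
rel_head = nn.Linear(num_heads, num_rel)          # tiny relation classifier
rel_logits = rel_head(pair_feats)                 # (1, N, N, num_rel)
print(rel_logits.shape)                           # torch.Size([1, 5, 5, 3])
```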
arXiv Detail & Related papers (2024-04-02T16:20:02Z)
- Localizing Active Objects from Egocentric Vision with Symbolic World Knowledge [62.981429762309226]
The ability to actively ground task instructions from an egocentric view is crucial for AI agents to accomplish tasks or assist humans virtually.
We propose to improve phrase grounding models' ability to localize active objects by learning the role of objects undergoing change and extracting them accurately from the instructions.
We evaluate our framework on the Ego4D and Epic-Kitchens datasets.
arXiv Detail & Related papers (2023-10-23T16:14:05Z)
- SOOD: Towards Semi-Supervised Oriented Object Detection [57.05141794402972]
This paper proposes a novel Semi-supervised Oriented Object Detection model, termed SOOD, built upon the mainstream pseudo-labeling framework.
Our experiments show that when trained with the two proposed losses, SOOD surpasses the state-of-the-art SSOD methods under various settings on the DOTA-v1.5 benchmark.
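The mainstream pseudo-labeling framework mentioned above boils down to a teacher keeping only its confident predictions as training targets for a student. A minimal, generic skeleton of that step (SOOD's actual rotation-aware losses are not reproduced here):

```python
import torch

def pseudo_labels(teacher_logits, conf_thresh=0.9):
    """Keep only confident teacher predictions as training targets.

    teacher_logits: (num_boxes, num_classes) raw detector class scores.
    """
    probs = torch.softmax(teacher_logits, dim=-1)
    conf, labels = probs.max(dim=-1)
    keep = conf > conf_thresh           # only trust confident boxes
    return labels[keep], keep

logits = torch.randn(6, 15) * 4         # toy scores for 6 boxes, 15 classes
labels, keep = pseudo_labels(logits)
print(labels, int(keep.sum()), "boxes kept for the student")
```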
arXiv Detail & Related papers (2023-04-10T11:10:42Z)
- Relation Regularized Scene Graph Generation [206.76762860019065]
Scene graph generation (SGG) is built on top of detected objects to predict object pairwise visual relations.
We propose a relation regularized network (R2-Net) which can predict whether there is a relationship between two objects.
Our R2-Net can effectively refine object labels and generate scene graphs.
arXiv Detail & Related papers (2022-02-22T11:36:49Z)
- Agent-Centric Relation Graph for Object Visual Navigation [25.097165101483284]
We introduce an Agent-Centric Relation Graph (ACRG) for learning the visual representation based on the relationships in the environment.
ACRG is a highly effective structure that consists of two relationships, i.e., the horizontal relationship among objects and the distance relationship between the agent and objects.
With the above graphs, the agent can perceive the environment and output navigation actions.
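A hypothetical sketch of the two relation types described above, computed from normalized detection boxes and a per-box depth estimate (this representation is ours, not the paper's):

```python
import torch

# (x1, y1, x2, y2) boxes in normalized image coordinates, plus a
# per-detection depth estimate standing in for agent-object distance.
boxes = torch.tensor([[0.20, 0.30, 0.40, 0.60],
                      [0.50, 0.10, 0.90, 0.50],
                      [0.10, 0.60, 0.30, 0.90]])
depth = torch.tensor([2.0, 3.5, 1.2])            # meters

cx = (boxes[:, 0] + boxes[:, 2]) / 2             # horizontal box centers
horizontal = cx[None, :] - cx[:, None]           # (N, N) signed offsets
print(horizontal)                                # object-object relations
print(depth)                                     # agent-object relations
```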
arXiv Detail & Related papers (2021-11-29T10:06:31Z)
- Salient Object Ranking with Position-Preserved Attention [44.94722064885407]
We study the Salient Object Ranking (SOR) task, which assigns a ranking order to each detected object according to its visual saliency.
We propose the first end-to-end framework of the SOR task and solve it in a multi-task learning fashion.
We also introduce a Position-Preserved Attention (PPA) module tailored for the SOR branch.
arXiv Detail & Related papers (2021-06-09T13:00:05Z)
- ConsNet: Learning Consistency Graph for Zero-Shot Human-Object Interaction Detection [101.56529337489417]
We consider the problem of Human-Object Interaction (HOI) Detection, which aims to locate and recognize HOI instances in the form of <human, action, object> triplets in images.
We argue that multi-level consistencies among objects, actions and interactions are strong cues for generating semantic representations of rare or previously unseen HOIs.
Our model takes visual features of candidate human-object pairs and word embeddings of HOI labels as inputs, maps them into visual-semantic joint embedding space and obtains detection results by measuring their similarities.
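A minimal sketch of this matching step, assuming simple linear projections into the joint space (the dimensions and names are ours, not ConsNet's):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

vis_dim, txt_dim, joint_dim, num_hoi = 1024, 300, 256, 4

vis_proj = nn.Linear(vis_dim, joint_dim)    # human-object pair features
txt_proj = nn.Linear(txt_dim, joint_dim)    # HOI label word embeddings

pair_feat = torch.randn(2, vis_dim)         # 2 candidate human-object pairs
label_emb = torch.randn(num_hoi, txt_dim)   # embeddings of 4 HOI labels

v = F.normalize(vis_proj(pair_feat), dim=-1)
t = F.normalize(txt_proj(label_emb), dim=-1)
scores = v @ t.t()                          # cosine similarities, (2, 4)
# A previously unseen HOI only needs a word embedding to be scored.
print(scores)
```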
arXiv Detail & Related papers (2020-08-14T09:11:18Z)
- Object-and-Action Aware Model for Visual Language Navigation [70.33142095637515]
Vision-and-Language Navigation (VLN) is unique in that it requires turning relatively general natural-language instructions into robot agent actions.
We propose an Object-and-Action Aware Model (OAAM) that processes these two forms of natural-language instruction separately.
This enables each process to flexibly match object-centered/action-centered instructions to its own counterpart visual perception/action orientation.
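An illustrative toy of that object/action split (the fixed routing vocabularies here are ours; OAAM learns the split rather than using word lists):

```python
# Route instruction words into object- and action-centered streams.
instruction = "walk past the sofa then stop".split()
object_words = {"sofa", "table", "door"}
action_words = {"walk", "stop", "turn"}

obj_stream = [w for w in instruction if w in object_words]
act_stream = [w for w in instruction if w in action_words]
print("object branch:", obj_stream)   # matched to visual perception
print("action branch:", act_stream)   # matched to action orientation
```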
arXiv Detail & Related papers (2020-07-29T06:32:18Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.