Sequential Decision-Making for Active Object Detection from Hand
- URL: http://arxiv.org/abs/2110.11524v1
- Date: Thu, 21 Oct 2021 23:40:45 GMT
- Title: Sequential Decision-Making for Active Object Detection from Hand
- Authors: Qichen Fu, Xingyu Liu, Kris M. Kitani
- Abstract summary: A key component of understanding hand-object interactions is the ability to identify the active object.
We set up our active object detection method as a sequential decision-making process conditioned on the location and appearance of the hands.
The key innovation of our approach is the design of an active object detection policy that uses an internal representation called the Relational Box Field.
- Score: 43.839322860501596
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A key component of understanding hand-object interactions is the ability to
identify the active object -- the object that is being manipulated by the human
hand -- despite the occlusion induced by hand-object interactions. Based on the
observation that hand appearance is a strong indicator of the location and size
of the active object, we set up our active object detection method as a
sequential decision-making process that is conditioned on the location and
appearance of the hands. The key innovation of our approach is the design of
the active object detection policy that uses an internal representation called
the Relational Box Field, which allows for every pixel to regress an improved
location of an active object bounding box, essentially giving every pixel the
ability to vote for a better bounding box location. The policy is trained using
a hybrid imitation learning and reinforcement learning approach, and at test
time, the policy is used repeatedly to refine the bounding box location of the
active object. We perform experiments on two large-scale datasets: 100DOH and
MECCANO, improving AP50 performance by 8% and 30%, respectively, over the state
of the art.
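As a rough sketch of the pixel-voting idea described in the abstract (not the authors' implementation: the policy interface, tensor shapes, weighted-average aggregation, and number of refinement steps below are all assumptions made for illustration), the snippet shows how a dense field of per-pixel box votes could be collapsed into a single refined box and applied repeatedly at test time.

```python
import numpy as np

def refine_box_with_field(box_field, weights):
    """Aggregate per-pixel box votes into a single refined box.

    box_field: (H, W, 4) array; each pixel stores its vote for the active
               object box as absolute (x1, y1, x2, y2) image coordinates.
    weights:   (H, W) array of per-pixel confidences; higher-confidence
               pixels contribute more to the final box.
    """
    w = weights / (weights.sum() + 1e-8)           # normalize vote weights
    votes = box_field.reshape(-1, 4)               # flatten spatial dims
    refined = (votes * w.reshape(-1, 1)).sum(axis=0)  # weighted average of votes
    return refined                                 # (x1, y1, x2, y2)

def iterative_refinement(image, init_box, policy, n_steps=3):
    """Repeatedly query a policy for a box field and refine the box."""
    box = np.asarray(init_box, dtype=np.float32)
    for _ in range(n_steps):
        # `policy` is a stand-in for the learned network: given the image and
        # the current box (plus hand cues in the paper), it is assumed to
        # return a per-pixel box field and per-pixel confidences.
        box_field, weights = policy(image, box)
        box = refine_box_with_field(box_field, weights)
    return box
```

In the paper, the refinement policy is a learned network trained with a hybrid of imitation learning and reinforcement learning; the simple weighted average above only stands in for whatever aggregation that policy performs.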
Related papers
- Articulated Object Manipulation using Online Axis Estimation with SAM2-Based Tracking [59.87033229815062]
Articulated object manipulation requires precise object interaction, where the object's axis must be carefully considered.
Previous research has employed interactive perception for manipulating articulated objects, but such open-loop approaches often overlook the interaction dynamics.
We present a closed-loop pipeline integrating interactive perception with online axis estimation from segmented 3D point clouds.
arXiv Detail & Related papers (2024-09-24T17:59:56Z)
- Simultaneous Detection and Interaction Reasoning for Object-Centric Action Recognition [21.655278000690686]
We propose an end-to-end object-centric action recognition framework.
It simultaneously performs Detection And Interaction Reasoning in one stage.
We conduct experiments on two datasets, Something-Else and Ikea-Assembly.
arXiv Detail & Related papers (2024-04-18T05:06:12Z)
- Localizing Active Objects from Egocentric Vision with Symbolic World Knowledge [62.981429762309226]
The ability to actively ground task instructions from an egocentric view is crucial for AI agents to accomplish tasks or assist humans virtually.
We propose to improve phrase grounding models' ability to localize active objects by learning the roles of objects undergoing change and extracting them accurately from the instructions.
We evaluate our framework on Ego4D and Epic-Kitchens datasets.
arXiv Detail & Related papers (2023-10-23T16:14:05Z)
- InterTracker: Discovering and Tracking General Objects Interacting with Hands in the Wild [40.489171608114574]
Existing methods rely on frame-based detectors to locate interacting objects.
We propose to leverage hand-object interaction to track interactive objects.
Our proposed method outperforms the state-of-the-art methods.
arXiv Detail & Related papers (2023-08-06T09:09:17Z)
- Weakly-supervised Contrastive Learning for Unsupervised Object Discovery [52.696041556640516]
Unsupervised object discovery is promising due to its ability to discover objects in a generic manner.
We design a semantic-guided self-supervised learning model to extract high-level semantic features from images.
We introduce Principal Component Analysis (PCA) over these features to localize object regions (a rough sketch of this idea appears after the list).
arXiv Detail & Related papers (2023-07-07T04:03:48Z)
- Object-Centric Scene Representations using Active Inference [4.298360054690217]
Representing a scene and its constituent objects from raw sensory data is a core ability for enabling robots to interact with their environment.
We propose a novel approach for scene understanding, leveraging a hierarchical object-centric generative model that enables an agent to infer object category.
For evaluating the behavior of an active vision agent, we also propose a new benchmark where, given a target viewpoint of a particular object, the agent needs to find the best matching viewpoint.
arXiv Detail & Related papers (2023-02-07T06:45:19Z)
- Object Manipulation via Visual Target Localization [64.05939029132394]
Training agents to manipulate objects poses many challenges.
We propose an approach that explores the environment in search for target objects, computes their 3D coordinates once they are located, and then continues to estimate their 3D locations even when the objects are not visible.
Our evaluations show a massive 3x improvement in success rate over a model that has access to the same sensory suite.
arXiv Detail & Related papers (2022-03-15T17:59:01Z)
- Skeleton-Based Mutually Assisted Interacted Object Localization and Human Action Recognition [111.87412719773889]
We propose a joint learning framework for "interacted object localization" and "human action recognition" based on skeleton data.
Our method achieves the best or competitive performance with the state-of-the-art methods for human action recognition.
arXiv Detail & Related papers (2021-10-28T10:09:34Z)
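The PCA-based localization mentioned in the Weakly-supervised Contrastive Learning entry above can be sketched roughly as follows. This is not that paper's implementation: the feature extractor is omitted, and the sign-based thresholding and smaller-area heuristic are assumptions made only for illustration.

```python
import numpy as np

def pca_localize(features):
    """Localize a salient object region in a dense feature map via PCA.

    features: (H, W, C) array of per-pixel features from some backbone
              (the backbone itself is not part of this sketch).
    Returns a binary foreground mask and its bounding box (x1, y1, x2, y2).
    """
    H, W, C = features.shape
    x = features.reshape(-1, C).astype(np.float64)
    x -= x.mean(axis=0, keepdims=True)             # center the features
    # First principal direction via SVD of the centered feature matrix.
    _, _, vt = np.linalg.svd(x, full_matrices=False)
    proj = x @ vt[0]                               # project onto 1st component
    mask = (proj > 0).reshape(H, W)                # split pixels by sign
    # Heuristic: treat the smaller of the two sides as the "object".
    if mask.mean() > 0.5:
        mask = ~mask
    ys, xs = np.nonzero(mask)
    if len(xs) == 0:                               # degenerate case: no pixels
        return mask, (0, 0, W - 1, H - 1)
    return mask, (xs.min(), ys.min(), xs.max(), ys.max())
```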