Beyond Object Recognition: A New Benchmark towards Object Concept Learning
- URL: http://arxiv.org/abs/2212.02710v3
- Date: Sun, 20 Aug 2023 15:44:31 GMT
- Authors: Yong-Lu Li, Yue Xu, Xinyu Xu, Xiaohan Mao, Yuan Yao, Siqi Liu, Cewu Lu
- Abstract summary: We propose a challenging Object Concept Learning task to push the envelope of object understanding.
It requires machines to reason out object affordances and simultaneously give the reason: what attributes make an object possess these affordances.
By analyzing the causal structure of OCL, we present a baseline, Object Concept Reasoning Network (OCRN).
- Score: 57.94446186103925
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Understanding objects is a central building block of artificial intelligence,
especially for embodied AI. Even though object recognition excels with deep
learning, current machines still struggle to learn higher-level knowledge,
e.g., what attributes an object has and what we can do with an object. In this
work, we propose a challenging Object Concept Learning (OCL) task to push the
envelope of object understanding. It requires machines to reason out object
affordances and simultaneously give the reason: what attributes make an object
possess these affordances. To support OCL, we build a densely annotated
knowledge base including extensive labels for three levels of object concept
(category, attribute, affordance), and the causal relations among the three levels. By
analyzing the causal structure of OCL, we present a baseline, Object Concept
Reasoning Network (OCRN). It leverages causal intervention and concept
instantiation to infer the three levels following their causal relations. In
experiments, OCRN effectively infers the object knowledge while following the
causalities well. Our data and code are available at https://mvig-rhos.com/ocl.
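The three concept levels described in the abstract can be pictured as a small record type. The following is only a minimal sketch under stated assumptions: the class name `ObjectConcept`, its fields, the `explain` helper, and the example values are all hypothetical and are not taken from the OCL knowledge base or the OCRN code.

```python
from dataclasses import dataclass, field


@dataclass
class ObjectConcept:
    """One annotated object with three concept levels (hypothetical schema)."""

    category: str
    attributes: set[str] = field(default_factory=set)
    affordances: set[str] = field(default_factory=set)
    # Maps an affordance to the attributes annotated as its cause.
    causes: dict[str, set[str]] = field(default_factory=dict)

    def explain(self, affordance: str) -> list[str]:
        """Return the sorted attributes given as the reason for an affordance."""
        if affordance not in self.affordances:
            return []
        # Only report causes that the object is actually annotated with.
        return sorted(self.causes.get(affordance, set()) & self.attributes)


# Illustrative values only; not taken from the OCL annotations.
knife = ObjectConcept(
    category="knife",
    attributes={"sharp", "metal"},
    affordances={"cut"},
    causes={"cut": {"sharp"}},
)
print(knife.explain("cut"))
```

Here `explain` answers the "give the reason" half of the task for a single object: it returns the attributes recorded as the cause of a given affordance, or an empty list when the object does not have that affordance.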
Related papers
- ConceptFactory: Facilitate 3D Object Knowledge Annotation with Object Conceptualization [41.54457853741178]
ConceptFactory aims at promoting machine intelligence to learn comprehensive object knowledge from both vision and robotics aspects.
It consists of two critical parts: ConceptFactory Suite and ConceptFactory Asset.
arXiv Detail & Related papers (2024-11-01T08:50:04Z)
- Help Me Identify: Is an LLM+VQA System All We Need to Identify Visual Concepts? [62.984473889987605]
We present a zero-shot framework for fine-grained visual concept learning by leveraging a large language model and a Visual Question Answering (VQA) system.
We pose these questions along with the query image to a VQA system and aggregate the answers to determine the presence or absence of an object in the test images.
Our experiments demonstrate comparable performance with existing zero-shot visual classification methods and few-shot concept learning approaches.
arXiv Detail & Related papers (2024-10-17T15:16:10Z)
- Discovering Conceptual Knowledge with Analytic Ontology Templates for Articulated Objects [42.9186628100765]
We aim to endow machine intelligence with an analogous capability through performing at the conceptual level.
The AOT-driven approach yields benefits from three key perspectives.
arXiv Detail & Related papers (2024-09-18T04:53:38Z)
- Entity-Centric Reinforcement Learning for Object Manipulation from Pixels [22.104757862869526]
Reinforcement Learning (RL) offers a general approach to learn object manipulation.
In practice, domains with more than a few objects are difficult for RL agents due to the curse of dimensionality.
We propose a structured approach for visual RL that is suitable for representing multiple objects and their interaction.
arXiv Detail & Related papers (2024-04-01T16:25:08Z)
- Localizing Active Objects from Egocentric Vision with Symbolic World Knowledge [62.981429762309226]
The ability to actively ground task instructions from an egocentric view is crucial for AI agents to accomplish tasks or assist humans virtually.
We propose to improve phrase grounding models' ability to localize the active objects by learning the role of objects undergoing change and extracting them accurately from the instructions.
We evaluate our framework on Ego4D and Epic-Kitchens datasets.
arXiv Detail & Related papers (2023-10-23T16:14:05Z)
- CoTDet: Affordance Knowledge Prompting for Task Driven Object Detection [42.2847114428716]
Task driven object detection aims to detect object instances suitable for affording a task in an image.
The challenge is that the object categories suitable for a task are too diverse to be limited to the closed object vocabulary of traditional object detection.
We propose to explore fundamental affordances rather than object categories, i.e., common attributes that enable different objects to accomplish the same task.
arXiv Detail & Related papers (2023-09-03T06:18:39Z)
- Learning by Asking Questions for Knowledge-based Novel Object Recognition [64.55573343404572]
In real-world object recognition, there are numerous object classes to be recognized. Conventional image recognition based on supervised learning can only recognize object classes that exist in the training data, and thus has limited applicability in the real world.
Inspired by this, we study a framework for acquiring external knowledge through question generation that would help the model instantly recognize novel objects.
Our pipeline consists of two components: the Object-based object recognition, and the Question Generator, which generates knowledge-aware questions to acquire novel knowledge.
arXiv Detail & Related papers (2022-10-12T02:51:58Z)
- PartAfford: Part-level Affordance Discovery from 3D Objects [113.91774531972855]
We present a new task of part-level affordance discovery (PartAfford).
Given only the affordance labels per object, the machine is tasked to (i) decompose 3D shapes into parts and (ii) discover how each part corresponds to a certain affordance category.
We propose a novel learning framework for PartAfford, which discovers part-level representations by leveraging only the affordance set supervision and geometric primitive regularization.
arXiv Detail & Related papers (2022-02-28T02:58:36Z)
- Learning Object Permanence from Video [46.34427538905761]
This paper introduces the setup of learning Object Permanence from data.
We explain why this learning problem should be dissected into four components, where objects are (1) visible, (2) occluded, (3) contained by another object and (4) carried by a containing object.
We then present a unified deep architecture that learns to predict object location under these four scenarios.
arXiv Detail & Related papers (2020-03-23T18:03:01Z)
- A Review on Intelligent Object Perception Methods Combining Knowledge-based Reasoning and Machine Learning [60.335974351919816]
Object perception is a fundamental sub-field of Computer Vision.
Recent works seek ways to integrate knowledge engineering in order to expand the level of intelligence of the visual interpretation of objects.
arXiv Detail & Related papers (2019-12-26T13:26:49Z)