A Dynamic Data Driven Approach for Explainable Scene Understanding
- URL: http://arxiv.org/abs/2206.09089v1
- Date: Sat, 18 Jun 2022 02:41:51 GMT
- Title: A Dynamic Data Driven Approach for Explainable Scene Understanding
- Authors: Zachary A Daniels and Dimitris Metaxas
- Abstract summary: Scene understanding is an important topic in the area of Computer Vision.
We consider the active explanation-driven understanding and classification of scenes.
Our framework is entitled ACUMEN: Active Classification and Understanding Method by Explanation-driven Networks.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Scene understanding is an important topic in the area of Computer Vision and presents computational challenges with applications to a wide range of
domains including remote sensing, surveillance, smart agriculture, robotics,
autonomous driving, and smart cities. We consider the active explanation-driven
understanding and classification of scenes. Suppose that an agent utilizing one
or more sensors is placed in an unknown environment, and based on its sensory
input, the agent needs to assign some label to the perceived scene. The agent
can adjust its sensor(s) to capture additional details about the scene, but
there is a cost associated with sensor manipulation, and as such, it is
important for the agent to understand the scene in a fast and efficient manner.
It is also important that the agent understand not only the global state of a
scene (e.g., the category of the scene or the major events taking place in the
scene) but also the characteristics/properties of the scene that support
decisions and predictions made about the global state of the scene. Finally,
when the agent encounters an unknown scene category, it must be capable of
refusing to assign a label to the scene, requesting aid from a human, and
updating its underlying knowledge base and machine learning models based on
feedback provided by the human. We introduce a dynamic data driven framework
for the active explanation-driven classification of scenes. Our framework is
entitled ACUMEN: Active Classification and Understanding Method by
Explanation-driven Networks. To demonstrate the utility of the proposed ACUMEN
approach and show how it can be adapted to a domain-specific application, we
focus on an example case study involving the classification of indoor scenes
using an active robotic agent with vision-based sensors, i.e., an
electro-optical camera.
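The decision protocol described in the abstract (sense, classify, pay for additional sensing when uncertain, and refuse and request human aid when the scene category is unknown) can be summarized as a short control loop. The Python sketch below is illustrative only, not the ACUMEN implementation; every class and method name (Sensor.capture, Classifier.predict, and so on) is a hypothetical placeholder.

```python
# Minimal sketch of the active, explanation-driven classification loop
# described in the abstract. Hypothetical interfaces; NOT the ACUMEN code.
from dataclasses import dataclass

@dataclass
class Decision:
    label: str | None       # None means the agent refused to classify
    explanation: list[str]  # scene properties supporting the decision
    sensing_cost: float     # total cost paid for sensor manipulation

def classify_scene(sensor, classifier, knowledge_base,
                   confidence_threshold=0.9, cost_budget=5.0):
    total_cost = 0.0
    while True:
        observation = sensor.capture()
        label, confidence, evidence = classifier.predict(observation)

        if confidence >= confidence_threshold:
            # Confident: return the label plus the supporting properties.
            return Decision(label, evidence, total_cost)

        # Uncertain: adjust the sensor only while budget remains. Each
        # manipulation has a positive cost, so the loop terminates.
        action, cost = classifier.next_sensor_action(observation)
        if total_cost + cost <= cost_budget:
            sensor.apply(action)
            total_cost += cost
            continue

        # Unknown scene category: refuse to label, request human aid,
        # and update the underlying models from the feedback.
        human_label = knowledge_base.request_human_feedback(observation)
        classifier.update(observation, human_label)
        return Decision(None, evidence, total_cost)
```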
Related papers
- Interpretable End-to-End Driving Model for Implicit Scene Understanding [3.4248756007722987]
We propose an end-to-end Interpretable Implicit Driving Scene Understanding (II-DSU) model to extract implicit high-dimensional scene features.
Our approach achieves new state-of-the-art performance and obtains scene features that embody richer scene information relevant to driving.
arXiv Detail & Related papers (2023-08-02T14:43:08Z)
- Object-Centric Scene Representations using Active Inference [4.298360054690217]
Representing a scene and its constituent objects from raw sensory data is a core ability for enabling robots to interact with their environment.
We propose a novel approach for scene understanding, leveraging a hierarchical object-centric generative model that enables an agent to infer object categories.
For evaluating the behavior of an active vision agent, we also propose a new benchmark where, given a target viewpoint of a particular object, the agent needs to find the best matching viewpoint.
arXiv Detail & Related papers (2023-02-07T06:45:19Z)
- Embodied Agents for Efficient Exploration and Smart Scene Description [47.82947878753809]
We tackle a setting for visual navigation in which an autonomous agent needs to explore and map an unseen indoor environment.
We propose and evaluate an approach that combines recent advances in visual robotic exploration and image captioning.
Our approach can generate smart scene descriptions that maximize semantic knowledge of the environment and avoid repetitions.
arXiv Detail & Related papers (2023-01-17T19:28:01Z)
- GIMO: Gaze-Informed Human Motion Prediction in Context [75.52839760700833]
We propose a large-scale human motion dataset that delivers high-quality body pose sequences, scene scans, and ego-centric views with eye gaze.
Our data collection is not tied to specific scenes, which further increases the diversity of the motion dynamics observed from our subjects.
To realize the full potential of gaze, we propose a novel network architecture that enables bidirectional communication between the gaze and motion branches.
arXiv Detail & Related papers (2022-04-20T13:17:39Z)
- OG-SGG: Ontology-Guided Scene Graph Generation. A Case Study in Transfer Learning for Telepresence Robotics [124.08684545010664]
Scene graph generation from images is a task of great interest to applications such as robotics.
We propose an initial approximation to a framework called Ontology-Guided Scene Graph Generation (OG-SGG).
arXiv Detail & Related papers (2022-02-21T13:23:15Z)
- Exploiting Scene Graphs for Human-Object Interaction Detection [81.49184987430333]
Human-Object Interaction (HOI) detection is a fundamental visual task aiming at localizing and recognizing interactions between humans and objects.
We propose a novel method, SG2HOI, that exploits this information through the scene graph (SG) for the HOI detection task.
Our method, SG2HOI, incorporates the SG information in two ways: (1) we embed a scene graph into a global context clue, serving as the scene-specific environmental context; and (2) we build a relation-aware message-passing module to gather relationships from objects' neighborhood and transfer them into interactions.
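The two mechanisms in (1) and (2) lend themselves to a compact sketch. The PyTorch fragment below is a hypothetical simplification for illustration; the module names, mean-pooling choice, and tensor shapes are assumptions, not the authors' code.

```python
# Hypothetical sketch of the two SG2HOI ideas; NOT the authors' code.
import torch
import torch.nn as nn

class SceneGraphContext(nn.Module):
    """(1) Embed the scene graph into a single global context vector."""
    def __init__(self, node_dim, ctx_dim):
        super().__init__()
        self.proj = nn.Linear(node_dim, ctx_dim)

    def forward(self, node_feats):               # (num_nodes, node_dim)
        # Mean-pool projected node embeddings into one scene-level clue.
        return self.proj(node_feats).mean(dim=0)  # (ctx_dim,)

class RelationMessagePassing(nn.Module):
    """(2) Gather relations from each object's neighborhood and pass
    them toward the interaction representation."""
    def __init__(self, dim):
        super().__init__()
        self.msg = nn.Linear(2 * dim, dim)

    def forward(self, node_feats, edges):  # edges: list of (src, dst)
        out = node_feats.clone()
        for src, dst in edges:
            # Message from a neighbor, conditioned on both endpoints.
            m = self.msg(torch.cat([node_feats[src], node_feats[dst]]))
            out[dst] = out[dst] + torch.relu(m)
        return out
```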
arXiv Detail & Related papers (2021-08-19T09:40:50Z)
- An Image-based Approach of Task-driven Driving Scene Categorization [7.291979964739049]
This paper proposes a method of task-driven driving scene categorization using weakly supervised data.
A similarity measure is learned to discriminate scenes with different semantic attributes via contrastive learning.
The results of semantic scene similarity learning and driving scene categorization are extensively studied.
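A standard margin-based contrastive loss illustrates the idea; the pairing scheme and margin below are assumptions for illustration rather than the paper's exact formulation.

```python
# Illustrative contrastive loss over scene embeddings; the pairing
# scheme and margin are assumptions, not the paper's exact design.
import torch
import torch.nn.functional as F

def scene_contrastive_loss(z1, z2, same_attribute, margin=1.0):
    """z1, z2: (batch, dim) scene embeddings.
    same_attribute: (batch,) float tensor, 1 if the two scenes share
    the semantic attribute, 0 otherwise. Pulls same-attribute scenes
    together and pushes differing scenes at least `margin` apart."""
    d = F.pairwise_distance(z1, z2)
    pos = same_attribute * d.pow(2)
    neg = (1 - same_attribute) * F.relu(margin - d).pow(2)
    return (pos + neg).mean()
```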
arXiv Detail & Related papers (2021-03-10T08:23:36Z)
- Toward Accurate Person-level Action Recognition in Videos of Crowded Scenes [131.9067467127761]
We focus on improving action recognition by fully utilizing scene information and by collecting new data.
Specifically, we adopt a strong human detector to detect the spatial location of each person in each frame.
We then apply action recognition models to learn the temporal information from video frames, using both the HIE dataset and new data with diverse scenes from the internet.
arXiv Detail & Related papers (2020-10-16T13:08:50Z)
- Towards Embodied Scene Description [36.17224570332247]
Embodiment is an important characteristic for all intelligent agents (creatures and robots).
We propose the Embodied Scene Description, which exploits the embodiment ability of the agent to find an optimal viewpoint in its environment for scene description tasks.
A learning framework with the paradigms of imitation learning and reinforcement learning is established to teach the intelligent agent to generate corresponding sensorimotor activities.
arXiv Detail & Related papers (2020-04-30T08:50:25Z)
- SceneEncoder: Scene-Aware Semantic Segmentation of Point Clouds with A Learnable Scene Descriptor [51.298760338410624]
We propose a SceneEncoder module that imposes scene-aware guidance to enhance the effect of global information.
The module predicts a scene descriptor, which learns to represent the categories of objects existing in the scene.
We also design a region similarity loss to propagate distinguishing features to neighboring points with the same label.
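Both components admit short sketches. The multi-label descriptor head and the neighbor-pulling loss below are illustrative assumptions about shapes and loss form, not the paper's exact design.

```python
# Illustrative sketch of a scene descriptor head and a region
# similarity loss; dimensions and the exact loss form are assumptions.
import torch
import torch.nn as nn

class SceneDescriptorHead(nn.Module):
    """Predicts which object categories exist in the scene (multi-label)."""
    def __init__(self, feat_dim, num_classes):
        super().__init__()
        self.fc = nn.Linear(feat_dim, num_classes)

    def forward(self, global_feat):                  # (batch, feat_dim)
        return torch.sigmoid(self.fc(global_feat))   # per-class presence

def region_similarity_loss(point_feats, labels, neighbors):
    """Pull each point's features toward neighboring points that share
    its label. point_feats: (N, d); labels: (N,) int tensor;
    neighbors: (N, k) indices of each point's k nearest neighbors."""
    nbr_feats = point_feats[neighbors]                         # (N, k, d)
    same = (labels[neighbors] == labels[:, None]).float()      # (N, k)
    dist = (nbr_feats - point_feats[:, None]).pow(2).sum(-1)   # (N, k)
    return (same * dist).sum() / same.sum().clamp(min=1.0)
```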
arXiv Detail & Related papers (2020-01-24T16:53:30Z)
- Contextual Sense Making by Fusing Scene Classification, Detections, and Events in Full Motion Video [0.7348448478819135]
We aim to address the needs of human analysts who must consume and exploit data from aerial full-motion video (FMV).
We have divided the problem into three tasks: (1) context awareness, (2) object cataloging, and (3) event detection.
We have applied our methods on data from different sensors at different resolutions in a variety of geographical areas.
arXiv Detail & Related papers (2020-01-16T18:26:34Z)
This list is automatically generated from the titles and abstracts of the papers on this site.