Reasoning about Actions over Visual and Linguistic Modalities: A Survey
- URL: http://arxiv.org/abs/2207.07568v1
- Date: Fri, 15 Jul 2022 16:15:46 GMT
- Title: Reasoning about Actions over Visual and Linguistic Modalities: A Survey
- Authors: Shailaja Keyur Sampat, Maitreya Patel, Subhasish Das, Yezhou Yang, and Chitta Baral
- Abstract summary: 'Reasoning about Actions & Change' (RAC) has been widely studied in the Knowledge Representation community.
This paper surveys existing tasks, benchmark datasets, various techniques and models, and their respective performance concerning advancements in RAC in the vision and language domain.
- Score: 39.870773512848096
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: 'Actions' play a vital role in how humans interact with the world and enable
them to achieve desired goals. As a result, most common sense (CS) knowledge
for humans revolves around actions. While 'Reasoning about Actions & Change'
(RAC) has been widely studied in the Knowledge Representation community, it has
recently piqued the interest of NLP and computer vision researchers. This paper
surveys existing tasks, benchmark datasets, various techniques and models, and
their respective performance concerning advancements in RAC in the vision and
language domain. Towards the end, we summarize our key takeaways, discuss the
present challenges facing this research area, and outline potential directions
for future research.
Related papers
- The Imperative of Conversation Analysis in the Era of LLMs: A Survey of Tasks, Techniques, and Trends [64.99423243200296]
Conversation Analysis (CA) strives to uncover and analyze critical information from conversation data.
In this paper, we perform a thorough review and systematize the CA task to summarize existing related work.
We derive four key steps of CA: conversation scene reconstruction, in-depth attribution analysis, targeted training, and conversation generation.
arXiv Detail & Related papers (2024-09-21T16:52:43Z) - Transfer Learning in Human Activity Recognition: A Survey [0.13029741239874087]
Sensor-based human activity recognition (HAR) has been an active research area, owing to its applications in smart environments, assisted living, fitness, healthcare, etc.
Recently, deep learning based end-to-end training has resulted in state-of-the-art performance in domains such as computer vision and natural language processing.
We focus on these transfer learning methods in the application domains of smart home and wearables-based HAR.
arXiv Detail & Related papers (2024-01-18T18:12:35Z) - Group Activity Recognition in Computer Vision: A Comprehensive Review,
Challenges, and Future Perspectives [0.0]
Group activity recognition is an active research topic in computer vision.
Modeling relationships among group members plays a vital role in recognizing group activities.
This work examines the progress in technology for recognizing group activities.
arXiv Detail & Related papers (2023-07-25T14:44:41Z) - Interactive Natural Language Processing [67.87925315773924]
Interactive Natural Language Processing (iNLP) has emerged as a novel paradigm within the field of NLP.
This paper offers a comprehensive survey of iNLP, starting by proposing a unified definition and framework of the concept.
arXiv Detail & Related papers (2023-05-22T17:18:29Z) - Learning Action-Effect Dynamics for Hypothetical Vision-Language
Reasoning Task [50.72283841720014]
We propose a novel learning strategy that can improve reasoning about the effects of actions.
We demonstrate the effectiveness of our proposed approach and discuss its advantages over previous baselines in terms of performance, data efficiency, and generalization capability.
arXiv Detail & Related papers (2022-12-07T05:41:58Z) - Learning Action-Effect Dynamics from Pairs of Scene-graphs [50.72283841720014]
We propose a novel method that leverages scene-graph representation of images to reason about the effects of actions described in natural language.
Our proposed approach is effective in terms of performance, data efficiency, and generalization capability compared to existing models.
arXiv Detail & Related papers (2022-12-07T03:36:37Z) - Vision-and-Language Navigation: A Survey of Tasks, Methods, and Future
Directions [23.389491536958772]
Vision-and-Language Navigation (VLN) is a fundamental and interdisciplinary research topic toward building agents that follow natural-language instructions to navigate visual environments.
VLN receives increasing attention from natural language processing, computer vision, robotics, and machine learning communities.
This paper serves as a thorough reference for the VLN research community.
arXiv Detail & Related papers (2022-03-22T16:58:10Z) - Multimodal Research in Vision and Language: A Review of Current and
Emerging Trends [41.07256031348454]
We present a detailed overview of the latest trends in research pertaining to visual and language modalities.
We examine these applications through their task formulations and how various problems related to semantic perception and content generation are addressed.
We shed some light on multi-disciplinary patterns and insights that have emerged in the recent past, directing this field towards more modular and transparent intelligent systems.
arXiv Detail & Related papers (2020-10-19T13:55:10Z) - LEMMA: A Multi-view Dataset for Learning Multi-agent Multi-task
Activities [119.88381048477854]
We introduce the LEMMA dataset, which uses a single, meticulously designed home setting to capture the multi-agent, multi-task dimensions missing from prior activity datasets.
We densely annotate atomic actions with human-object interactions to provide ground truths for the compositionality, scheduling, and assignment of daily activities.
We hope this effort would drive the machine vision community to examine goal-directed human activities and further study the task scheduling and assignment in the real world.
arXiv Detail & Related papers (2020-07-31T00:13:54Z)