Winning the CVPR'2022 AQTC Challenge: A Two-stage Function-centric
Approach
- URL: http://arxiv.org/abs/2206.09597v2
- Date: Wed, 22 Jun 2022 13:07:41 GMT
- Title: Winning the CVPR'2022 AQTC Challenge: A Two-stage Function-centric
Approach
- Authors: Shiwei Wu, Weidong He, Tong Xu, Hao Wang, Enhong Chen
- Abstract summary: Affordance-centric Question-driven Task Completion for Egocentric Assistant (AQTC) is a novel task that helps an AI assistant learn from instructional videos and scripts and guide the user step by step.
We address AQTC with a two-stage Function-centric approach, consisting of a Question2Function Module that grounds the question in the related function and a Function2Answer Module that predicts the action based on the historical steps.
- Score: 51.424201533529114
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Affordance-centric Question-driven Task Completion for Egocentric
Assistant (AQTC) is a novel task that helps an AI assistant learn from
instructional videos and scripts and guide the user step by step. In this
paper, we address AQTC via a two-stage Function-centric approach, which
consists of a Question2Function Module to ground the question with the related
function and a Function2Answer Module to predict the action based on the
historical steps. We evaluated several possible solutions in each module and
obtained significant gains compared to the given baselines. Our code is
available at \url{https://github.com/starsholic/LOVEU-CVPR22-AQTC}.
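To make the two-stage design concrete, the following minimal sketch wires a Question2Function step (grounding the question against candidate function embeddings) into a Function2Answer step (scoring candidate actions given the grounded function and the history of completed steps). The cosine-similarity grounding, mean-pooled history fusion, and random embeddings are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def cosine(a, b):
    # Cosine similarity between two feature vectors.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def question2function(q_feat, function_feats):
    """Stage 1: ground the question in the most related function.

    q_feat: (d,) question embedding; function_feats: list of (d,) embeddings,
    one per function paragraph extracted from the script/video.
    """
    scores = [cosine(q_feat, f) for f in function_feats]
    return int(np.argmax(scores))

def function2answer(func_feat, history_feats, answer_feats):
    """Stage 2: score candidate answers given the grounded function and
    the features of previously completed steps (simple mean fusion here)."""
    context = func_feat + (np.mean(history_feats, axis=0) if history_feats else 0.0)
    scores = [cosine(context, a) for a in answer_feats]
    return int(np.argmax(scores))

# Toy usage with random embeddings standing in for real video/text encoders.
rng = np.random.default_rng(0)
d = 16
q = rng.normal(size=d)
functions = [rng.normal(size=d) for _ in range(4)]
answers = [rng.normal(size=d) for _ in range(3)]
history = [rng.normal(size=d)]

func_idx = question2function(q, functions)
ans_idx = function2answer(functions[func_idx], history, answers)
print(f"grounded function {func_idx}, predicted answer {ans_idx}")
```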
Related papers
- A Solution to CVPR'2023 AQTC Challenge: Video Alignment for Multi-Step
Inference [51.26551806938455]
Affordance-centric Question-driven Task Completion (AQTC) for Egocentric Assistant introduces a groundbreaking scenario.
We present a solution for enhancing video alignment to improve multi-step inference.
Our method secured 2nd place in the CVPR'2023 AQTC challenge.
arXiv Detail & Related papers (2023-06-26T04:19:33Z)
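The solution above is only summarized at a high level; as a generic illustration of aligning instruction steps to video segments for multi-step inference, the sketch below greedily matches each step embedding to its most similar segment under a monotonic (forward-in-time) constraint. Both the embeddings and the greedy alignment rule are assumptions, not the cited method.

```python
import numpy as np

def align_steps_to_segments(step_feats, segment_feats):
    """Greedily align each instruction step to a video segment,
    enforcing that alignments move forward in time (monotonic)."""
    sims = step_feats @ segment_feats.T  # (num_steps, num_segments) similarities
    alignment, start = [], 0
    for s in range(sims.shape[0]):
        seg = start + int(np.argmax(sims[s, start:]))
        alignment.append(seg)
        start = seg  # later steps never map to earlier segments
    return alignment

rng = np.random.default_rng(1)
steps = rng.normal(size=(3, 8))      # stand-ins for step text embeddings
segments = rng.normal(size=(6, 8))   # stand-ins for video segment embeddings
print(align_steps_to_segments(steps, segments))
```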
- First Place Solution to the CVPR'2023 AQTC Challenge: A Function-Interaction
Centric Approach with Spatiotemporal Visual-Language Alignment [15.99008977852437]
Affordance-Centric Question-driven Task Completion (AQTC) has been proposed to acquire knowledge from videos and provide users with comprehensive and systematic instructions.
Existing methods have neglected the necessity of aligning visual and linguistic signals, as well as the crucial interactional information between humans and objects.
We propose to combine large-scale pre-trained vision- and video-language models, which contribute stable and reliable multimodal data.
arXiv Detail & Related papers (2023-06-23T09:02:25Z)
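As a hedged sketch of the "combine large-scale pre-trained vision- and video-language models" idea from the entry above, the snippet below fuses similarity scores from two placeholder encoders with a simple weighted average to rank answer candidates; `image_text_score` and `video_text_score` are hypothetical stand-ins, not real model APIs.

```python
import numpy as np

def image_text_score(frame_feat, text_feat):
    # Placeholder for a similarity from an image-language model (CLIP-style).
    return float(frame_feat @ text_feat)

def video_text_score(clip_feat, text_feat):
    # Placeholder for a similarity from a video-language model.
    return float(clip_feat @ text_feat)

def rank_answers(frame_feat, clip_feat, answer_feats, w=0.5):
    """Fuse the two pre-trained signals with a simple weighted average."""
    scores = [w * image_text_score(frame_feat, a)
              + (1 - w) * video_text_score(clip_feat, a)
              for a in answer_feats]
    return int(np.argmax(scores))

rng = np.random.default_rng(2)
frame, clip = rng.normal(size=32), rng.normal(size=32)
answers = [rng.normal(size=32) for _ in range(4)]
print(rank_answers(frame, clip, answers))
```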
- UniKGQA: Unified Retrieval and Reasoning for Solving Multi-hop Question
Answering Over Knowledge Graph [89.98762327725112]
Multi-hop Question Answering over Knowledge Graph (KGQA) aims to find the answer entities that are multiple hops away from the topic entities mentioned in a natural language question.
We propose UniKGQA, a novel approach for multi-hop KGQA task, by unifying retrieval and reasoning in both model architecture and parameter learning.
arXiv Detail & Related papers (2022-12-02T04:08:09Z)
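To illustrate the multi-hop retrieval-and-reasoning setting that UniKGQA addresses, here is a generic beam-style search (not UniKGQA's actual unified architecture) that expands from the topic entities hop by hop, scoring each relation against the question embedding.

```python
import numpy as np

def multi_hop_search(kg, topic_entities, question_vec, relation_vecs, hops=2, beam=2):
    """Walk the KG from the topic entities for a fixed number of hops,
    following the relations whose embeddings best match the question.

    kg: dict mapping entity -> list of (relation, neighbor_entity).
    """
    frontier = set(topic_entities)
    for _ in range(hops):
        scored = []
        for ent in frontier:
            for rel, nbr in kg.get(ent, []):
                score = float(question_vec @ relation_vecs[rel])
                scored.append((score, nbr))
        scored.sort(reverse=True)
        frontier = {nbr for _, nbr in scored[:beam]}  # keep the best neighbors
        if not frontier:
            break
    return frontier

# Tiny toy KG and random embeddings just to exercise the search.
rng = np.random.default_rng(3)
kg = {"Q": [("directed_by", "A"), ("starring", "B")],
      "A": [("born_in", "C")], "B": [("born_in", "D")]}
rel_vecs = {r: rng.normal(size=8) for r in ["directed_by", "starring", "born_in"]}
print(multi_hop_search(kg, ["Q"], rng.normal(size=8), rel_vecs))
```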
- AssistQ: Affordance-centric Question-driven Task Completion for
Egocentric Assistant [6.379158555341729]
We define a new task called Affordance-centric Question-driven Task Completion.
The AI assistant should learn from instructional videos and scripts to guide the user step-by-step.
To support the task, we constructed AssistQ, a new dataset comprising 529 question-answer samples.
arXiv Detail & Related papers (2022-03-08T17:07:09Z)
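The AssistQ entry above does not specify a data format; purely as an illustration, one AQTC-style question-answer sample might be represented roughly as below. All field names are hypothetical and may differ from the released dataset.

```python
# Hypothetical sketch of one AQTC-style sample; the real AssistQ format may differ.
sample = {
    "video_id": "appliance_demo_001",          # instructional video the sample refers to
    "script": ["Press the power button.",      # textual steps transcribed from the video
               "Turn the dial to level 2."],
    "question": "How do I start the machine at medium heat?",
    "answer_candidates": ["Press power, then set dial to 2.",
                          "Unplug the machine.",
                          "Hold the dial for five seconds."],
    "answer_index": 0,                          # index of the correct candidate
    "step_history": [],                         # steps the user has already completed
}
print(sample["answer_candidates"][sample["answer_index"]])
```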
- Attention-based model for predicting question relatedness on Stack
Overflow [0.0]
We propose an Attention-based Sentence pair Interaction Model (ASIM) to predict the relatedness between questions on Stack Overflow automatically.
ASIM achieves significant improvements over the baseline approaches in Precision, Recall, and Micro-F1 evaluation metrics.
Our model also performs well in the duplicate question detection task of Ask Ubuntu.
arXiv Detail & Related papers (2021-03-19T12:18:03Z)
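As a minimal sketch of an attention-based sentence-pair interaction model in the spirit of ASIM (not its exact architecture), the snippet below cross-attends the token embeddings of two questions with PyTorch's MultiheadAttention and pools the result into a relatedness score.

```python
import torch
import torch.nn as nn

class PairInteraction(nn.Module):
    """Cross-attention between two token sequences, then a relatedness score."""
    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.classifier = nn.Linear(2 * dim, 1)

    def forward(self, q1_tokens, q2_tokens):
        # q1 attends over q2 and vice versa; mean-pool and classify.
        a1, _ = self.attn(q1_tokens, q2_tokens, q2_tokens)
        a2, _ = self.attn(q2_tokens, q1_tokens, q1_tokens)
        pooled = torch.cat([a1.mean(dim=1), a2.mean(dim=1)], dim=-1)
        return torch.sigmoid(self.classifier(pooled))  # probability of relatedness

# Toy usage with random "token embeddings" standing in for an encoder's output.
model = PairInteraction()
q1 = torch.randn(1, 12, 64)   # (batch, tokens, dim)
q2 = torch.randn(1, 9, 64)
print(model(q1, q2).item())
```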
- Few-Shot Complex Knowledge Base Question Answering via Meta
Reinforcement Learning [55.08037694027792]
Complex question-answering (CQA) involves answering complex natural-language questions on a knowledge base (KB).
The conventional neural program induction (NPI) approach exhibits uneven performance when the questions have different types.
This paper proposes a meta-reinforcement learning approach to program induction in CQA to tackle the potential distributional bias in questions.
arXiv Detail & Related papers (2020-10-29T18:34:55Z)
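As a rough, generic sketch of the meta-learning idea above (a first-order, MAML-style inner loop rather than the paper's actual program-induction model), the code below adapts a copy of meta-learned parameters on a few examples of a new question type before evaluation.

```python
import numpy as np

def loss_and_grad(w, X, y):
    """Squared-error loss of a linear 'policy' w on a small support set."""
    pred = X @ w
    err = pred - y
    return float(np.mean(err ** 2)), 2 * X.T @ err / len(y)

def adapt(w_meta, support_X, support_y, inner_lr=0.1, steps=3):
    """Few-shot adaptation: take a few gradient steps from the meta-parameters
    on examples of a new question type, then return the adapted parameters."""
    w = w_meta.copy()
    for _ in range(steps):
        _, g = loss_and_grad(w, support_X, support_y)
        w -= inner_lr * g
    return w

rng = np.random.default_rng(4)
w_meta = rng.normal(size=5)                 # meta-learned initialization
X_support = rng.normal(size=(8, 5))         # few examples of an unseen question type
y_support = X_support @ rng.normal(size=5)  # toy targets
w_adapted = adapt(w_meta, X_support, y_support)
print(loss_and_grad(w_adapted, X_support, y_support)[0])
```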
- Tradeoffs in Sentence Selection Techniques for Open-Domain Question
Answering [54.541952928070344]
We describe two groups of models for sentence selection: QA-based approaches, which run a full-fledged QA system to identify answer candidates, and retrieval-based models, which find parts of each passage specifically related to each question.
We show that very lightweight QA models can do well at this task, but retrieval-based models are faster still.
arXiv Detail & Related papers (2020-09-18T23:39:15Z)
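To illustrate the lighter, retrieval-based end of this tradeoff, the snippet below ranks passage sentences by TF-IDF similarity to the question using scikit-learn; a QA-based selector would instead run a full reader over each sentence, which is more accurate but slower. This is a generic sketch, not the paper's exact models.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def select_sentences(question, sentences, top_k=2):
    """Retrieval-based sentence selection: rank sentences by TF-IDF similarity
    to the question, which is much cheaper than running a full QA reader."""
    vec = TfidfVectorizer().fit(sentences + [question])
    sent_mat, q_vec = vec.transform(sentences), vec.transform([question])
    scores = cosine_similarity(q_vec, sent_mat)[0]
    ranked = sorted(range(len(sentences)), key=lambda i: -scores[i])
    return [sentences[i] for i in ranked[:top_k]]

sentences = ["The Eiffel Tower is in Paris.",
             "It was completed in 1889.",
             "Paris is the capital of France."]
print(select_sentences("When was the Eiffel Tower built?", sentences))
```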
- Fitted Q-Learning for Relational Domains [29.90646258513537]
We develop the first relational fitted Q-learning algorithms by representing the value function and Bellman residuals.
We show how the two steps of the Bellman operator, the application and projection steps, can be performed using a gradient-boosting technique.
arXiv Detail & Related papers (2020-06-10T01:18:47Z)
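To make the "Bellman operator application, then projection via gradient boosting" recipe concrete, here is a generic fitted Q-iteration sketch on a small propositional toy domain using scikit-learn's gradient-boosted trees; the relational representation of states, which is the paper's actual contribution, is not modeled here.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def fitted_q_iteration(transitions, featurize, actions, gamma=0.9, iters=5):
    """transitions: list of (state, action, reward, next_state).
    Application step: compute Bellman targets r + gamma * max_a' Q(s', a').
    Projection step: fit a gradient-boosted regressor to those targets."""
    q_model = None
    X = np.array([featurize(s, a) for s, a, _, _ in transitions])
    for _ in range(iters):
        targets = []
        for _, _, r, s_next in transitions:
            if q_model is None:
                targets.append(r)
            else:
                next_qs = [q_model.predict([featurize(s_next, a)])[0] for a in actions]
                targets.append(r + gamma * max(next_qs))
        q_model = GradientBoostingRegressor(n_estimators=50).fit(X, targets)
    return q_model

# Toy domain: 1-D states, two actions, reward for moving right from state 3.
featurize = lambda s, a: [s, a, s * a]
actions = [0, 1]
transitions = [(s, a, 1.0 if (s == 3 and a == 1) else 0.0, min(s + a, 3))
               for s in range(4) for a in actions]
model = fitted_q_iteration(transitions, featurize, actions)
print(model.predict([featurize(2, 1)])[0])
```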
- Harvesting and Refining Question-Answer Pairs for Unsupervised QA [95.9105154311491]
We introduce two approaches to improve unsupervised Question Answering (QA).
First, we harvest lexically and syntactically divergent questions from Wikipedia to automatically construct a corpus of question-answer pairs (named RefQA).
Second, we take advantage of the QA model to extract more appropriate answers, which iteratively refines data over RefQA.
arXiv Detail & Related papers (2020-05-06T15:56:06Z)
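As a schematic of the harvest-then-refine loop described above (the QA model, confidence threshold, and retraining step are placeholders, not the paper's pipeline), the sketch below lets the current QA model re-answer harvested questions and keeps its high-confidence spans as refined labels.

```python
def refine_qa_pairs(qa_pairs, qa_model, rounds=2, threshold=0.8):
    """qa_pairs: list of dicts with 'context', 'question', 'answer'.
    qa_model(context, question) -> (predicted_answer, confidence); placeholder
    for a reader trained on the current version of the corpus."""
    for _ in range(rounds):
        refined = []
        for pair in qa_pairs:
            pred, conf = qa_model(pair["context"], pair["question"])
            if conf >= threshold:
                # Trust the model's span more than the heuristic harvest.
                pair = {**pair, "answer": pred}
            refined.append(pair)
        qa_pairs = refined
        # In the real setting the QA model would be retrained on `refined` here.
    return qa_pairs

# Dummy model that always "answers" with high confidence, just to run the loop.
dummy_model = lambda context, question: (context.split()[0], 0.9)
pairs = [{"context": "Paris is the capital of France.",
          "question": "What is the capital of France?",
          "answer": "France"}]
print(refine_qa_pairs(pairs, dummy_model))
```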