Every Mistake Counts in Assembly
- URL: http://arxiv.org/abs/2307.16453v1
- Date: Mon, 31 Jul 2023 07:20:31 GMT
- Title: Every Mistake Counts in Assembly
- Authors: Guodong Ding, Fadime Sener, Shugao Ma, Angela Yao
- Abstract summary: We propose a system that can detect ordering mistakes by utilizing a learned knowledge base.
Our framework constructs a knowledge base with spatial and temporal beliefs based on observed mistakes.
We demonstrate experimentally that our inferred spatial and temporal beliefs are capable of identifying incorrect orderings in real-world action sequences.
- Score: 26.903961683742494
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: One promising use case of AI assistants is to help with complex procedures
like cooking, home repair, and assembly tasks. Can we teach the assistant to
interject after the user makes a mistake? This paper targets the problem of
identifying ordering mistakes in assembly procedures. We propose a system that
can detect ordering mistakes by utilizing a learned knowledge base. Our
framework constructs a knowledge base with spatial and temporal beliefs based
on observed mistakes. Spatial beliefs depict the topological relationship of
the assembling components, while temporal beliefs aggregate prerequisite
actions as ordering constraints. With an episodic memory design, our algorithm
can dynamically update and construct the belief sets as more actions are
observed, all in an online fashion. We demonstrate experimentally that our
inferred spatial and temporal beliefs are capable of identifying incorrect
orderings in real-world action sequences. To construct the spatial beliefs, we
collect a new set of coarse-level action annotations for Assembly101 based on
the positioning of the toy parts. Finally, we demonstrate the superior
performance of our belief inference algorithm in detecting ordering mistakes on
the Assembly101 dataset.
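The abstract's temporal-belief idea, that each action accumulates prerequisite actions as ordering constraints, updated online as sequences are observed, can be sketched in a few lines. This is a minimal illustration under assumed simplifications (actions are plain strings, beliefs are hard prerequisite sets intersected across observations); the class and method names are hypothetical and do not come from the paper's implementation.

```python
class TemporalBeliefBase:
    """Toy knowledge base of temporal (prerequisite) beliefs, built online."""

    def __init__(self):
        # action -> set of actions believed to be prerequisites of it
        self.prerequisites = {}

    def observe(self, sequence):
        """Update beliefs from one correctly ordered action sequence."""
        seen = set()
        for action in sequence:
            if action not in self.prerequisites:
                # First encounter: every earlier action is a candidate prerequisite.
                self.prerequisites[action] = set(seen)
            else:
                # Keep only prerequisites consistent with every observation so far.
                self.prerequisites[action] &= seen
            seen.add(action)

    def check(self, sequence):
        """Return indices of actions whose believed prerequisites are unmet."""
        mistakes = []
        done = set()
        for i, action in enumerate(sequence):
            required = self.prerequisites.get(action, set())
            if not required <= done:  # some prerequisite has not happened yet
                mistakes.append(i)
            done.add(action)
        return mistakes
```

For example, after observing `["attach base", "attach wheels", "attach cabin"]`, checking `["attach wheels", "attach base", "attach cabin"]` flags index 0, since "attach wheels" is seen before its believed prerequisite "attach base".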
Related papers
- Supervised Representation Learning towards Generalizable Assembly State Recognition [5.852028557154309]
Assembly state recognition facilitates the execution of assembly procedures, offering feedback to enhance efficiency and minimize errors.
This paper proposes an approach based on representation learning and a novel intermediate-state informed loss function modification (ISIL).
ISIL leverages unlabeled transitions between states and demonstrates significant improvements in clustering and classification performance.
arXiv Detail & Related papers (2024-08-21T15:24:40Z)
- Temporally Grounding Instructional Diagrams in Unconstrained Videos [51.85805768507356]
We study the challenging problem of simultaneously localizing a sequence of queries in instructional diagrams in a video.
Most existing methods focus on grounding one query at a time, ignoring the inherent structures among queries.
We propose composite queries constructed by exhaustively pairing up the visual content features of the step diagrams.
We demonstrate the effectiveness of our approach on the IAW dataset for grounding step diagrams and the YouCook2 benchmark for grounding natural language queries.
arXiv Detail & Related papers (2024-07-16T05:44:30Z)
- Zero-Shot Generalization during Instruction Tuning: Insights from Similarity and Granularity [84.12126298229866]
We show that zero-shot generalization during instruction tuning happens very early.
We also show that encountering highly similar and fine-grained training data earlier during instruction tuning, without the constraints of defined "tasks", enables better generalization.
For the first time, we show that zero-shot generalization during instruction tuning is a form of similarity-based generalization between training and test data at the instance level.
arXiv Detail & Related papers (2024-06-17T16:40:21Z)
- PREGO: online mistake detection in PRocedural EGOcentric videos [49.72812518471056]
We propose PREGO, the first online one-class classification model for mistake detection in egocentric videos.
PREGO is based on an online action recognition component to model the current action, and a symbolic reasoning module to predict the next actions.
We evaluate PREGO on two procedural egocentric video datasets, Assembly101 and Epic-tent, which we adapt for online benchmarking of procedural mistake detection.
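PREGO's core mechanism, flagging a mistake when the currently recognized action disagrees with the action predicted from the history, can be illustrated with a toy sketch. The trivial "most frequent successor" predictor below is a stand-in assumption, not the paper's symbolic reasoning module, and all names here are hypothetical.

```python
from collections import Counter, defaultdict

class NextActionPredictor:
    """Toy next-action model: predicts the most frequent observed successor."""

    def __init__(self):
        # action -> counts of the actions that followed it in training sequences
        self.successors = defaultdict(Counter)

    def fit(self, sequences):
        for seq in sequences:
            for cur, nxt in zip(seq, seq[1:]):
                self.successors[cur][nxt] += 1

    def predict(self, action):
        counts = self.successors.get(action)
        return counts.most_common(1)[0][0] if counts else None

def detect_mistakes(predictor, sequence):
    """Indices where the observed action differs from the predicted next action."""
    flagged = []
    for i in range(1, len(sequence)):
        predicted = predictor.predict(sequence[i - 1])
        if predicted is not None and predicted != sequence[i]:
            flagged.append(i)
    return flagged
```

The one-class flavor is visible here: the predictor is trained only on correct sequences, and anything deviating from its expectation is flagged as a potential mistake.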
arXiv Detail & Related papers (2024-04-02T13:27:28Z)
- Unsupervised Continual Anomaly Detection with Contrastively-learned Prompt [80.43623986759691]
We introduce a novel Unsupervised Continual Anomaly Detection framework called UCAD.
The framework equips the UAD with continual learning capability through contrastively-learned prompts.
We conduct comprehensive experiments and set the benchmark on unsupervised continual anomaly detection and segmentation.
arXiv Detail & Related papers (2024-01-02T03:37:11Z)
- Chain of Thought Imitation with Procedure Cloning [129.62135987416164]
We propose procedure cloning, which applies supervised sequence prediction to imitate the series of expert computations.
We show that imitating the intermediate computations of an expert's behavior enables procedure cloning to learn policies exhibiting significant generalization to unseen environment configurations.
arXiv Detail & Related papers (2022-05-22T13:14:09Z)
- Assembly101: A Large-Scale Multi-View Video Dataset for Understanding Procedural Activities [29.05606394634704]
Assembly101 is a new procedural activity dataset featuring 4321 videos of people assembling and disassembling 101 "take-apart" toy vehicles.
Participants work without fixed instructions, and the sequences feature rich and natural variations in action ordering, mistakes, and corrections.
Sequences are annotated with more than 100K coarse and 1M fine-grained action segments, and 18M 3D hand poses.
arXiv Detail & Related papers (2022-03-28T12:59:50Z) - Continual Learning in Low-rank Orthogonal Subspaces [86.36417214618575]
In continual learning (CL), a learner is faced with a sequence of tasks, arriving one after the other, and the goal is to remember all the tasks once the learning experience is finished.
The prior art in CL uses episodic memory, parameter regularization or network structures to reduce interference among tasks, but in the end, all the approaches learn different tasks in a joint vector space.
We propose to learn tasks in different (low-rank) vector subspaces that are kept orthogonal to each other in order to minimize interference.
arXiv Detail & Related papers (2020-10-22T12:07:43Z) - MS$^2$L: Multi-Task Self-Supervised Learning for Skeleton Based Action
Recognition [36.74293548921099]
We integrate motion prediction, jigsaw puzzle recognition, and contrastive learning to learn skeleton features from different aspects.
Our experiments on the NW-UCLA, NTU RGB+D, and PKUMMD datasets show remarkable performance for action recognition.
arXiv Detail & Related papers (2020-10-12T11:09:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.