Related papers: Data Augmentation for Instruction Following Policies via Trajectory Segmentation

URL: http://arxiv.org/abs/2503.01871v1
Date: Tue, 25 Feb 2025 22:06:01 GMT
Title: Data Augmentation for Instruction Following Policies via Trajectory Segmentation
Authors: Niklas Höpner, Ilaria Tiddi, Herke van Hoof
Abstract summary: We explore methods to extract labelled segments from trajectories. The goal is to improve the performance of an instruction-following policy trained via imitation learning.
Score: 23.15588842738139
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The scalability of instructable agents in robotics or gaming is often hindered by limited data that pairs instructions with agent trajectories. However, large datasets of unannotated trajectories containing sequences of various agent behaviour (play trajectories) are often available. In a semi-supervised setup, we explore methods to extract labelled segments from play trajectories. The goal is to augment a small annotated dataset of instruction-trajectory pairs to improve the performance of an instruction-following policy trained downstream via imitation learning. Assuming little variation in segment length, recent video segmentation methods can effectively extract labelled segments. To address the constraint of segment length, we propose Play Segmentation (PS), a probabilistic model that finds maximum likely segmentations of extended subsegments, while only being trained on individual instruction segments. Our results in a game environment and a simulated robotic gripper setting underscore the importance of segmentation; randomly sampled segments diminish performance, while incorporating labelled segments from PS improves policy performance to the level of a policy trained on twice the amount of labelled data.
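
The abstract describes finding a maximum-likelihood segmentation of a long play trajectory using a model trained only on individual instruction segments. One generic way to search over segmentations is dynamic programming over segment end points, sketched below. This is an illustrative sketch, not the authors' Play Segmentation implementation: the `score` callback (returning a log-probability and a label for a candidate segment), the length bounds, and the toy scorer are all assumptions made for the example.

```python
from typing import Callable, List, Optional, Tuple

Segment = Tuple[int, int, str]  # (start, end_exclusive, label)


def best_segmentation(
    num_steps: int,
    score: Callable[[int, int], Tuple[float, str]],
    min_len: int = 1,
    max_len: int = 10,
) -> Tuple[float, List[Segment]]:
    """Split [0, num_steps) into contiguous segments maximizing the summed score.

    `score(start, end)` is a hypothetical stand-in for a learned model: it
    returns (log-probability, label) for the segment [start, end).
    """
    neg_inf = float("-inf")
    best = [neg_inf] * (num_steps + 1)  # best[t]: best total score for prefix [0, t)
    back: List[Optional[Tuple[int, str]]] = [None] * (num_steps + 1)
    best[0] = 0.0

    for end in range(1, num_steps + 1):
        for length in range(min_len, min(max_len, end) + 1):
            start = end - length
            if best[start] == neg_inf:
                continue  # prefix [0, start) cannot be segmented
            seg_score, label = score(start, end)
            total = best[start] + seg_score
            if total > best[end]:
                best[end] = total
                back[end] = (start, label)

    if best[num_steps] == neg_inf:
        raise ValueError("no valid segmentation for these length bounds")

    # Walk the back-pointers from the end to recover the segments.
    segments: List[Segment] = []
    end = num_steps
    while end > 0:
        start, label = back[end]
        segments.append((start, end, label))
        end = start
    segments.reverse()
    return best[num_steps], segments


# Toy scorer (assumption): a segment is likely only if all its symbols match.
trajectory = "aaabb"


def toy_score(start: int, end: int) -> Tuple[float, str]:
    symbols = set(trajectory[start:end])
    if len(symbols) == 1:
        return -1.0, symbols.pop()
    return -100.0, "mixed"


total, segments = best_segmentation(len(trajectory), toy_score)
print(total, segments)  # -2.0 [(0, 3, 'a'), (3, 5, 'b')]
```

The fixed per-segment cost in the toy scorer discourages over-segmentation; in practice the scores would come from a trained segment model, and the length bounds would reflect the range of segment lengths expected in the data.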

Related papers

Optimizing against Infeasible Inclusions from Data for Semantic Segmentation through Morphology [58.17907376475596]
State-of-the-art semantic segmentation models are typically optimized in a data-driven fashion. InSeIn extracts explicit inclusion constraints that govern spatial class relations from the semantic segmentation training set at hand. It then enforces a morphological yet differentiable loss that penalizes violations of these constraints during training to promote prediction feasibility.
arXiv Detail & Related papers (2024-08-26T22:39:08Z)
Few-shot Multispectral Segmentation with Representations Generated by Reinforcement Learning [0.0]
We propose a novel approach for improving few-shot segmentation performance on multispectral images using reinforcement learning. Our methodology involves training an agent to identify the most informative expressions using a small dataset. Due to the limited length of the expressions, the model receives useful representations without any added risk of overfitting.
arXiv Detail & Related papers (2023-11-20T15:04:16Z)
Tracking Anything with Decoupled Video Segmentation [87.07258378407289]
We develop a decoupled video segmentation approach (DEVA). It is composed of task-specific image-level segmentation and class/task-agnostic bi-directional temporal propagation. We show that this decoupled formulation compares favorably to end-to-end approaches in several data-scarce tasks.
arXiv Detail & Related papers (2023-09-07T17:59:41Z)
A Semi-supervised Approach for Activity Recognition from Indoor Trajectory Data [0.822021749810331]
We consider the task of classifying the activities of moving objects from their noisy indoor trajectory data in a collaborative manufacturing environment. We present a semi-supervised machine learning approach that first applies an information theoretic criterion to partition a long trajectory into a set of segments. The segments are then labelled automatically based on a constrained hierarchical clustering method.
arXiv Detail & Related papers (2023-01-09T01:20:50Z)
LESS: Label-Efficient Semantic Segmentation for LiDAR Point Clouds [62.49198183539889]
We propose a label-efficient semantic segmentation pipeline for outdoor scenes with LiDAR point clouds. Our method co-designs an efficient labeling process with semi/weakly supervised learning. Our proposed method is even highly competitive compared to the fully supervised counterpart with 100% labels.
arXiv Detail & Related papers (2022-10-14T19:13:36Z)
A Closer Look at Temporal Ordering in the Segmentation of Instructional Videos [17.712793578388126]
We take a closer look at Procedure Segmentation and Summarization (PSS) and propose three fundamental improvements over current methods. We propose a new segmentation metric based on dynamic programming that takes into account the order of segments. We propose a matching algorithm that constrains the temporal order of segment mapping, and is also differentiable.
arXiv Detail & Related papers (2022-09-30T14:44:19Z)
Scaling up Multi-domain Semantic Segmentation with Sentence Embeddings [81.09026586111811]
We propose an approach to semantic segmentation that achieves state-of-the-art supervised performance when applied in a zero-shot setting. This is achieved by replacing each class label with a vector-valued embedding of a short paragraph that describes the class. The resulting merged semantic segmentation dataset of over 2 Million images enables training a model that achieves performance equal to that of state-of-the-art supervised methods on 7 benchmark datasets.
arXiv Detail & Related papers (2022-02-04T07:19:09Z)
Unsupervised Action Segmentation with Self-supervised Feature Learning and Co-occurrence Parsing [32.66011849112014]
Temporal action segmentation is the task of classifying each frame in a video with an action label. In this work we explore a self-supervised method that operates on a corpus of unlabeled videos and predicts a likely set of temporal segments across the videos. We develop CAP, a novel co-occurrence action parsing algorithm that can not only capture the correlation among sub-actions underlying the structure of activities, but also estimate the temporal trajectory of the sub-actions in an accurate and general way.
arXiv Detail & Related papers (2021-05-29T00:29:40Z)
STEP: Segmenting and Tracking Every Pixel [107.23184053133636]
We present a new benchmark: Segmenting and Tracking Every Pixel (STEP). Our work is the first that targets this task in a real-world setting that requires dense interpretation in both spatial and temporal domains. To measure performance, we propose a novel evaluation metric, Segmentation and Tracking Quality (STQ).
arXiv Detail & Related papers (2021-02-23T18:43:02Z)
SegGroup: Seg-Level Supervision for 3D Instance and Semantic Segmentation [88.22349093672975]
We design a weakly supervised point cloud segmentation algorithm that only requires clicking on one point per instance to indicate its location for annotation. With over-segmentation for pre-processing, we extend these location annotations into segments as seg-level labels. We show that our seg-level supervised method (SegGroup) achieves comparable results with the fully annotated point-level supervised methods.
arXiv Detail & Related papers (2020-12-18T13:23:34Z)
Self-supervised Sparse to Dense Motion Segmentation [13.888344214818737]
We propose a self-supervised method to learn the densification of sparse motion segmentations from single video frames. We evaluate our method on the well-known motion segmentation datasets FBMS59 and DAVIS16.
arXiv Detail & Related papers (2020-08-18T11:40:18Z)
Weakly Supervised Temporal Action Localization with Segment-Level Labels [140.68096218667162]
Temporal action localization presents a trade-off between test performance and annotation-time cost. We introduce a new segment-level supervision setting: segments are labeled when annotators observe actions happening here. We devise a partial segment loss regarded as a loss sampling to learn integral action parts from labeled segments.
arXiv Detail & Related papers (2020-07-03T10:32:19Z)

This list is automatically generated from the titles and abstracts of the papers on this site.