ExAgt: Expert-guided Augmentation for Representation Learning of Traffic
Scenarios
- URL: http://arxiv.org/abs/2207.08609v2
- Date: Wed, 20 Jul 2022 06:59:45 GMT
- Title: ExAgt: Expert-guided Augmentation for Representation Learning of Traffic
Scenarios
- Authors: Lakshman Balasubramanian, Jonas Wurst, Robin Egolf, Michael Botsch,
Wolfgang Utschick and Ke Deng
- Abstract summary: This paper presents ExAgt, a novel method to include expert knowledge for augmenting traffic scenarios.
The ExAgt method is applied in two state-of-the-art cross-view prediction methods.
Results show that the ExAgt method improves representation learning compared to using only standard augmentations.
- Score: 8.879790406465558
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Representation learning in recent years has been addressed with
self-supervised learning methods. The input data is augmented into two
distorted views and an encoder learns the representations that are invariant to
distortions -- cross-view prediction. Augmentation is one of the key components
in cross-view self-supervised learning frameworks to learn visual
representations. This paper presents ExAgt, a novel method to include expert
knowledge for augmenting traffic scenarios, to improve the learnt
representations without any human annotation. The expert-guided augmentations
are generated in an automated fashion based on the infrastructure, the
interactions between the EGO and the traffic participants and an ideal sensor
model. The ExAgt method is applied in two state-of-the-art cross-view
prediction methods and the representations learnt are tested in downstream
tasks like classification and clustering. Results show that the ExAgt method
improves representation learning compared to using only standard augmentations
and provides better representation-space stability. The code is available
at https://github.com/lab176344/ExAgt.
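To make the cross-view prediction setup described in the abstract concrete, here is a minimal, generic sketch of self-supervised training on two augmented views; it is not the authors' implementation. The two views of the same scenario are embedded and pulled together with a SimCLR-style NT-Xent loss. The names `encoder`, `projector`, `augment_expert`, and `augment_standard` are hypothetical placeholders for a scenario encoder, a projection head, an expert-guided augmentation, and a standard augmentation.

```python
import torch
import torch.nn.functional as F


def nt_xent_loss(z1, z2, temperature=0.5):
    """SimCLR-style contrastive loss between two batches of view embeddings."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    z = torch.cat([z1, z2], dim=0)                   # (2N, D)
    sim = z @ z.t() / temperature                    # pairwise cosine similarities
    n = z1.size(0)
    self_mask = torch.eye(2 * n, dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(self_mask, float("-inf"))  # drop self-similarities
    # Row i's positive is the other view of the same scenario.
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)]).to(z.device)
    return F.cross_entropy(sim, targets)


def train_step(encoder, projector, batch, augment_expert, augment_standard, optimizer):
    """One cross-view prediction step on a batch of traffic scenarios (sketch)."""
    view1 = augment_expert(batch)    # e.g. infrastructure/interaction/sensor-based variant
    view2 = augment_standard(batch)  # e.g. a conventional distortion
    z1 = projector(encoder(view1))
    z2 = projector(encoder(view2))
    loss = nt_xent_loss(z1, z2)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In such a setup, swapping a standard augmentation for an expert-guided one only changes the view-generation functions; the loss and encoder stay untouched, which is why expert-guided augmentations can be dropped into existing cross-view frameworks.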
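The abstract also tests the learnt representations on downstream classification and clustering. A rough, hedged probing sketch, assuming a frozen encoder, labelled data loaders, and a known class count, and using a logistic-regression linear probe plus k-means rather than the paper's exact protocol, could look like this:

```python
import numpy as np
import torch
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, adjusted_rand_score


@torch.no_grad()
def embed(encoder, loader, device="cpu"):
    """Collect frozen-encoder embeddings and labels for a labelled scenario set."""
    encoder.eval()
    feats, labels = [], []
    for x, y in loader:
        feats.append(encoder(x.to(device)).cpu().numpy())
        labels.append(y.numpy())
    return np.concatenate(feats), np.concatenate(labels)


def evaluate(encoder, train_loader, test_loader, n_classes):
    """Linear-probe accuracy and clustering agreement on frozen representations."""
    X_tr, y_tr = embed(encoder, train_loader)
    X_te, y_te = embed(encoder, test_loader)
    probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    probe_acc = accuracy_score(y_te, probe.predict(X_te))
    clusters = KMeans(n_clusters=n_classes, n_init=10).fit_predict(X_te)
    ari = adjusted_rand_score(y_te, clusters)
    return probe_acc, ari
```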
Related papers
- Instructing Prompt-to-Prompt Generation for Zero-Shot Learning [116.33775552866476]
We propose a Prompt-to-Prompt generation methodology (P2P) to distill instructive visual prompts for transferable knowledge discovery.
The core of P2P is to mine semantic-related instruction from prompt-conditioned visual features and text instruction on modal-sharing semantic concepts.
arXiv Detail & Related papers (2024-06-05T07:59:48Z)
- Bidirectional Trained Tree-Structured Decoder for Handwritten Mathematical Expression Recognition [51.66383337087724]
The Handwritten Mathematical Expression Recognition (HMER) task is a critical branch in the field of OCR.
Recent studies have demonstrated that incorporating bidirectional context information significantly improves the performance of HMER models.
We propose the Mirror-Flipped Symbol Layout Tree (MF-SLT) and Bidirectional Asynchronous Training (BAT) structure.
arXiv Detail & Related papers (2023-12-31T09:24:21Z)
- Harnessing Diffusion Models for Visual Perception with Meta Prompts [68.78938846041767]
We propose a simple yet effective scheme to harness a diffusion model for visual perception tasks.
We introduce learnable embeddings (meta prompts) to the pre-trained diffusion models to extract proper features for perception.
Our approach achieves new performance records in depth estimation on NYU Depth V2 and KITTI, and in semantic segmentation on CityScapes.
arXiv Detail & Related papers (2023-12-22T14:40:55Z)
- DiffAug: Enhance Unsupervised Contrastive Learning with Domain-Knowledge-Free Diffusion-based Data Augmentation [48.25619775814776]
This paper proposes DiffAug, a novel unsupervised contrastive learning technique with diffusion model-based positive data generation.
DiffAug consists of a semantic encoder and a conditional diffusion model; the conditional diffusion model generates new positive samples conditioned on the semantic encoding.
Experimental evaluations show that DiffAug outperforms hand-designed and SOTA model-based augmentation methods on DNA sequence, visual, and bio-feature datasets.
arXiv Detail & Related papers (2023-09-10T13:28:46Z)
- Knowledge-Aware Prompt Tuning for Generalizable Vision-Language Models [64.24227572048075]
We propose a Knowledge-Aware Prompt Tuning (KAPT) framework for vision-language models.
Our approach takes inspiration from human intelligence in which external knowledge is usually incorporated into recognizing novel categories of objects.
arXiv Detail & Related papers (2023-08-22T04:24:45Z)
- VS-TransGRU: A Novel Transformer-GRU-based Framework Enhanced by Visual-Semantic Fusion for Egocentric Action Anticipation [33.41226268323332]
Egocentric action anticipation is a challenging task that aims to make advanced predictions of future actions in the first-person view.
Most existing methods focus on improving the model architecture and loss function based on the visual input and recurrent neural network.
We propose a novel visual-semantic fusion-enhanced, Transformer-GRU-based action anticipation framework.
arXiv Detail & Related papers (2023-07-08T06:49:54Z)
- Self-Supervised Image Representation Learning: Transcending Masking with Paired Image Overlay [10.715255809531268]
This paper proposes a novel image augmentation technique, overlaying images, which has not been widely applied in self-supervised learning.
The proposed method is evaluated using contrastive learning, a widely used self-supervised learning method that has shown solid performance in downstream tasks.
arXiv Detail & Related papers (2023-01-23T07:00:04Z)
- Cross-modal Representation Learning for Zero-shot Action Recognition [67.57406812235767]
We present a cross-modal Transformer-based framework that jointly encodes video data and text labels for zero-shot action recognition (ZSAR).
Our model employs a conceptually new pipeline by which visual representations are learned in conjunction with visual-semantic associations in an end-to-end manner.
Experimental results show our model considerably improves upon the state of the art in ZSAR, reaching encouraging top-1 accuracy on the UCF101, HMDB51, and ActivityNet benchmark datasets.
arXiv Detail & Related papers (2022-05-03T17:39:27Z)
- Cross-View-Prediction: Exploring Contrastive Feature for Hyperspectral Image Classification [9.131465469247608]
This paper presents a self-supervised feature learning method for hyperspectral image classification.
Our method constructs two different views of the raw hyperspectral image through a cross-representation learning method, and then learns a semantically consistent representation over the created views with a contrastive learning method.
arXiv Detail & Related papers (2022-03-14T11:07:33Z)
- Billion-Scale Pretraining with Vision Transformers for Multi-Task Visual Representations [9.6221436745451]
We describe how we generate a dataset with over a billion images via large weakly-supervised pretraining.
We leverage Transformers to replace the traditional convolutional backbone.
We show that large-scale Transformer-based pretraining provides significant benefits to industry computer vision applications.
arXiv Detail & Related papers (2021-08-12T17:58:56Z)