Towards Explainable, Safe Autonomous Driving with Language Embeddings
for Novelty Identification and Active Learning: Framework and Experimental
Analysis with Real-World Data Sets
- URL: http://arxiv.org/abs/2402.07320v1
- Date: Sun, 11 Feb 2024 22:53:21 GMT
- Title: Towards Explainable, Safe Autonomous Driving with Language Embeddings
for Novelty Identification and Active Learning: Framework and Experimental
Analysis with Real-World Data Sets
- Authors: Ross Greer and Mohan Trivedi
- Abstract summary: This research explores the integration of language embeddings for active learning in autonomous driving datasets.
Our proposed method employs language-based representations to identify novel scenes, emphasizing the dual purpose of safety takeover responses and active learning.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This research explores the integration of language embeddings for active
learning in autonomous driving datasets, with a focus on novelty detection.
Novelty arises from unexpected scenarios that autonomous vehicles struggle to
navigate, necessitating higher-level reasoning abilities. Our proposed method
employs language-based representations to identify novel scenes, emphasizing
the dual purpose of safety takeover responses and active learning. The research
presents a clustering experiment using Contrastive Language-Image Pretrained
(CLIP) embeddings to organize datasets and detect novelties. We find that the
proposed algorithm effectively isolates novel scenes from a collection of
subsets derived from two real-world driving datasets, one vehicle-mounted and
one infrastructure-mounted. From the generated clusters, we further present
methods for generating textual explanations of elements which differentiate
scenes classified as novel from other scenes in the data pool, presenting
qualitative examples from the clustered results. Our results demonstrate the
effectiveness of language-driven embeddings in identifying novel elements and
generating explanations of data, and we further discuss potential applications
in safe takeovers, data curation, and multi-task active learning.
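As a rough illustration of the clustering idea described in the abstract, the sketch below groups scene embeddings by cosine similarity and flags scenes that end up in very small clusters as candidate novelties. This is not the authors' algorithm. The function names `threshold_clusters` and `novel_indices`, the similarity threshold of 0.8, and the hand-written 3-dimensional vectors standing in for CLIP embeddings are all assumptions made for the example.

```python
import numpy as np


def threshold_clusters(embeddings, sim_threshold=0.8):
    """Group embeddings into connected components of a similarity graph.

    Two scenes are linked when their cosine similarity is at least
    sim_threshold (a hypothetical value, not taken from the paper).
    Returns one integer cluster label per embedding.
    """
    emb = np.asarray(embeddings, dtype=float)
    emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)  # unit vectors
    sim = emb @ emb.T                                       # cosine similarities
    n = len(emb)
    labels = -np.ones(n, dtype=int)                         # -1 means unassigned
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:                                        # depth-first traversal
            i = stack.pop()
            for j in np.flatnonzero(sim[i] >= sim_threshold):
                if labels[j] == -1:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


def novel_indices(labels, max_size=1):
    """Return indices of scenes whose cluster has at most max_size members."""
    counts = np.bincount(labels)
    return [i for i, lab in enumerate(labels) if counts[lab] <= max_size]


# Toy stand-ins for image embeddings: two groups of similar scenes
# plus one scene (the last row) that resembles neither group.
scenes = np.array([
    [1.0, 0.1, 0.0],
    [0.9, 0.2, 0.0],
    [1.0, 0.0, 0.1],
    [0.1, 1.0, 0.0],
    [0.0, 0.9, 0.1],
    [0.2, 1.0, 0.0],
    [0.0, 0.1, 1.0],
])

labels = threshold_clusters(scenes)
print(labels.tolist())        # [0, 0, 0, 1, 1, 1, 2]
print(novel_indices(labels))  # [6]
```

In practice the rows of `scenes` would come from an image encoder such as CLIP, and a scene flagged this way could be routed either to a safety takeover response or to an annotation queue for active learning, the two uses the abstract names.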
Related papers
- Spatio-Temporal Context Prompting for Zero-Shot Action Detection [13.22912547389941]
We propose a method which can effectively leverage the rich knowledge of visual-language models to perform Person-Context Interaction.
To address the challenge of recognizing distinct actions by multiple people at the same timestamp, we design the Interest Token Spotting mechanism.
Our method achieves superior results compared to previous approaches and can be further extended to multi-action videos.
arXiv Detail & Related papers (2024-08-28T17:59:05Z)
- C-ICL: Contrastive In-context Learning for Information Extraction [54.39470114243744]
c-ICL is a novel few-shot technique that leverages both correct and incorrect sample constructions to create in-context learning demonstrations.
Our experiments on various datasets indicate that c-ICL outperforms previous few-shot in-context learning methods.
arXiv Detail & Related papers (2024-02-17T11:28:08Z)
- Text2Data: Low-Resource Data Generation with Textual Control [104.38011760992637]
Natural language serves as a common and straightforward control signal for humans to interact seamlessly with machines.
We propose Text2Data, a novel approach that utilizes unlabeled data to understand the underlying data distribution through an unsupervised diffusion model.
It undergoes controllable finetuning via a novel constraint optimization-based learning objective that ensures controllability and effectively counteracts catastrophic forgetting.
arXiv Detail & Related papers (2024-02-08T03:41:39Z)
- Weakly Supervised Open-Vocabulary Object Detection [31.605276665964787]
We propose a novel weakly supervised open-vocabulary object detection framework, namely WSOVOD, to extend traditional WSOD.
To achieve this, we explore three vital strategies, including dataset-level feature adaptation, image-level salient object localization, and region-level vision-language alignment.
arXiv Detail & Related papers (2023-12-19T18:59:53Z)
- Actively Discovering New Slots for Task-oriented Conversation [19.815466126158785]
We propose a general new slot task in an information extraction fashion to realize human-in-the-loop learning.
We leverage existing language tools to extract value candidates where the corresponding labels are leveraged as weak supervision signals.
We conduct extensive experiments on several public datasets and compare with a bunch of competitive baselines to demonstrate our method.
arXiv Detail & Related papers (2023-05-06T13:33:33Z)
- Revisiting Deep Active Learning for Semantic Segmentation [37.3546941940388]
We show that the data distribution is decisive for the performance of the various active learning objectives proposed in the literature.
We demonstrate that the integration of semi-supervised learning with active learning can improve performance when the two objectives are aligned.
arXiv Detail & Related papers (2023-02-08T14:23:37Z)
- OmDet: Large-scale vision-language multi-dataset pre-training with multimodal detection network [17.980765138522322]
This work introduces OmDet, a novel language-aware object detection architecture.
Leveraging natural language as a universal knowledge representation, OmDet accumulates a "visual vocabulary" from diverse datasets.
We demonstrate superior performance of OmDet over strong baselines in object detection in the wild, open-vocabulary detection, and phrase grounding.
arXiv Detail & Related papers (2022-09-10T14:25:14Z)
- Neuro-Symbolic Representations for Video Captioning: A Case for Leveraging Inductive Biases for Vision and Language [148.0843278195794]
We propose a new model architecture for learning multi-modal neuro-symbolic representations for video captioning.
Our approach uses a dictionary learning-based method of learning relations between videos and their paired text descriptions.
arXiv Detail & Related papers (2020-11-18T20:21:19Z)
- Positioning yourself in the maze of Neural Text Generation: A Task-Agnostic Survey [54.34370423151014]
This paper surveys the components of modeling approaches relaying task impacts across various generation tasks such as storytelling, summarization, translation etc.
We present an abstraction of the imperative techniques with respect to learning paradigms, pretraining, modeling approaches, decoding and the key challenges outstanding in the field in each of them.
arXiv Detail & Related papers (2020-10-14T17:54:42Z)
- Salience Estimation with Multi-Attention Learning for Abstractive Text Summarization [86.45110800123216]
In the task of text summarization, salience estimation for words, phrases or sentences is a critical component.
We propose a Multi-Attention Learning framework which contains two new attention learning components for salience estimation.
arXiv Detail & Related papers (2020-04-07T02:38:56Z)
- Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer [64.22926988297685]
Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP).
In this paper, we explore the landscape of introducing transfer learning techniques for NLP by a unified framework that converts all text-based language problems into a text-to-text format.
arXiv Detail & Related papers (2019-10-23T17:37:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.