VIL-100: A New Dataset and A Baseline Model for Video Instance Lane
Detection
- URL: http://arxiv.org/abs/2108.08482v1
- Date: Thu, 19 Aug 2021 03:57:39 GMT
- Title: VIL-100: A New Dataset and A Baseline Model for Video Instance Lane
Detection
- Authors: Yujun Zhang, Lei Zhu, Wei Feng, Huazhu Fu, Mingqian Wang, Qingxia Li,
Cheng Li and Song Wang
- Abstract summary: We collect a new video instance lane detection dataset, which contains 100 videos with 10,000 frames in total.
All frames in each video are manually annotated with high-quality instance-level lane annotations.
We propose a new baseline model, named multi-level memory aggregation network (MMA-Net) for video instance lane detection.
- Score: 43.11580440256568
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Lane detection plays a key role in autonomous driving. Although car cameras
continuously capture streaming video while driving, current lane detection works mainly
focus on individual images (frames) and ignore the dynamics along the video. In
this work, we collect a new video instance lane detection (VIL-100) dataset,
which contains 100 videos with 10,000 frames in total, acquired from different
real traffic scenarios. All frames in each video are manually annotated with
high-quality instance-level lane annotations, and a set of frame-level and
video-level metrics is included for quantitative performance evaluation.
Moreover, we propose a new baseline model, named multi-level memory aggregation
network (MMA-Net), for video instance lane detection. In our approach, the
representation of the current frame is enhanced by attentively aggregating both
local and global memory features from other frames. Experiments on the newly
collected dataset show that the proposed MMA-Net outperforms state-of-the-art
lane detection methods and video object segmentation methods. We release our
dataset and code at https://github.com/yujun0-0/MMA-Net.
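The abstract only sketches the aggregation step, so the following minimal PyTorch sketch illustrates one way "attentively aggregating local and global memory features" could work: a query is computed from the current frame, keys and values from memory frames, and the local and global attention reads are fused back into the current-frame representation. The module name, 1x1 projections, tensor shapes, and the split into "local" (recent frames) and "global" (frames sampled farther back) memories are illustrative assumptions, not the authors' released MMA-Net code.

```python
# Hedged sketch of attentive memory aggregation; shapes and module design are assumptions.
import torch
import torch.nn as nn


class AttentiveMemoryAggregation(nn.Module):
    """Enhance current-frame features by attending over memory-frame features."""

    def __init__(self, channels: int, key_dim: int = 64):
        super().__init__()
        self.key_dim = key_dim
        self.to_query = nn.Conv2d(channels, key_dim, 1)   # query from the current frame
        self.to_key = nn.Conv2d(channels, key_dim, 1)     # keys from memory frames
        self.to_value = nn.Conv2d(channels, channels, 1)  # values from memory frames
        self.fuse = nn.Conv2d(2 * channels, channels, 1)  # merge local + global reads

    def _read(self, current: torch.Tensor, memory: torch.Tensor) -> torch.Tensor:
        # current: (B, C, H, W); memory: (B, T, C, H, W)
        b, t, c, h, w = memory.shape
        q = self.to_query(current).flatten(2)                          # (B, K, HW)
        mem = memory.flatten(0, 1)                                     # (B*T, C, H, W)
        k = self.to_key(mem).view(b, t, self.key_dim, h * w)
        k = k.permute(0, 2, 1, 3).reshape(b, self.key_dim, t * h * w)  # (B, K, T*HW)
        v = self.to_value(mem).view(b, t, c, h * w)
        v = v.permute(0, 2, 1, 3).reshape(b, c, t * h * w)             # (B, C, T*HW)
        # attention of every current-frame position over all memory positions
        attn = torch.softmax(q.transpose(1, 2) @ k / self.key_dim ** 0.5, dim=-1)
        read = (v @ attn.transpose(1, 2)).view(b, c, h, w)             # (B, C, H, W)
        return read

    def forward(self, current, local_memory, global_memory):
        local_read = self._read(current, local_memory)    # nearby frames (short-term cues)
        global_read = self._read(current, global_memory)  # distant frames (long-term cues)
        return current + self.fuse(torch.cat([local_read, global_read], dim=1))


# Toy usage with 256-channel features at a downsampled resolution.
mma = AttentiveMemoryAggregation(channels=256)
cur = torch.randn(1, 256, 40, 80)
local_mem = torch.randn(1, 3, 256, 40, 80)   # e.g. the 3 most recent frames
global_mem = torch.randn(1, 4, 256, 40, 80)  # e.g. frames sampled farther back
out = mma(cur, local_mem, global_mem)        # (1, 256, 40, 80)
```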
Related papers
- LaneTCA: Enhancing Video Lane Detection with Temporal Context Aggregation [87.71768494466959]
LaneTCA bridges individual video frames and explores how to effectively aggregate temporal context.
We develop an accumulative attention module and an adjacent attention module to abstract the long-term and short-term temporal context.
The two modules are meticulously designed based on the transformer architecture.
arXiv Detail & Related papers (2024-08-25T14:46:29Z)
- ViDSOD-100: A New Dataset and a Baseline Model for RGB-D Video Salient Object Detection [51.16181295385818]
We first collect an annotated RGB-D video salient object detection (ViDSOD-100) dataset, which contains 100 videos with a total of 9,362 frames.
All frames in each video are manually annotated with high-quality saliency annotations.
We propose a new baseline model, named attentive triple-fusion network (ATF-Net) for RGB-D salient object detection.
arXiv Detail & Related papers (2024-06-18T12:09:43Z)
- Matching Anything by Segmenting Anything [109.2507425045143]
We propose MASA, a novel method for robust instance association learning.
MASA learns instance-level correspondence through exhaustive data transformations.
We show that MASA achieves even better performance than state-of-the-art methods trained with fully annotated in-domain video sequences.
arXiv Detail & Related papers (2024-06-06T16:20:07Z)
- Segment Anything Meets Point Tracking [116.44931239508578]
This paper presents a novel method for point-centric interactive video segmentation, empowered by SAM and long-term point tracking.
We highlight the merits of point-based tracking through direct evaluation on the zero-shot open-world Unidentified Video Objects (UVO) benchmark.
Our experiments on popular video object segmentation and multi-object segmentation tracking benchmarks, including DAVIS, YouTube-VOS, and BDD100K, suggest that a point-based segmentation tracker yields better zero-shot performance and efficient interactions.
arXiv Detail & Related papers (2023-07-03T17:58:01Z)
- Glitch in the Matrix: A Large Scale Benchmark for Content Driven Audio-Visual Forgery Detection and Localization [20.46053083071752]
We propose and benchmark a new dataset, Localized Audio Visual DeepFake (LAV-DF).
LAV-DF consists of strategic content-driven audio, visual and audio-visual manipulations.
The proposed baseline method, Boundary Aware Temporal Forgery Detection (BA-TFD), is a 3D Convolutional Neural Network-based architecture.
arXiv Detail & Related papers (2023-05-03T08:48:45Z)
- Robust Online Video Instance Segmentation with Track Queries [15.834703258232002]
We propose a fully online transformer-based video instance segmentation model that performs comparably to top offline methods on the YouTube-VIS 2019 benchmark.
We show that, when combined with a strong enough image segmentation architecture, track queries can exhibit impressive accuracy while not being constrained to short videos.
arXiv Detail & Related papers (2022-11-16T18:50:14Z)
- Two-Level Temporal Relation Model for Online Video Instance Segmentation [3.9349485816629888]
We propose an online method that is on par with the performance of the offline counterparts.
We introduce a message-passing graph neural network that encodes objects and relates them through time.
Our model, trained end-to-end, achieves state-of-the-art performance on the YouTube-VIS dataset.
arXiv Detail & Related papers (2022-10-30T10:01:01Z)
- Tag-Based Attention Guided Bottom-Up Approach for Video Instance Segmentation [83.13610762450703]
Video instance segmentation is a fundamental computer vision task that deals with segmenting and tracking object instances across a video sequence.
We introduce a simple end-to-end trainable bottom-up approach that produces instance mask predictions at pixel-level granularity, instead of the typical region-proposal-based approach.
Our method provides competitive results on the YouTube-VIS and DAVIS-19 datasets, and has the lowest run-time among contemporary state-of-the-art methods.
arXiv Detail & Related papers (2022-04-22T15:32:46Z)