Ev-Layout: A Large-scale Event-based Multi-modal Dataset for Indoor Layout Estimation and Tracking
- URL: http://arxiv.org/abs/2503.08370v1
- Date: Tue, 11 Mar 2025 12:26:39 GMT
- Title: Ev-Layout: A Large-scale Event-based Multi-modal Dataset for Indoor Layout Estimation and Tracking
- Authors: Xucheng Guo, Yiran Shen, Xiaofang Xiao, Yuanfeng Zhou, Lin Wang
- Abstract summary: This paper presents Ev-Layout, a novel large-scale event-based multi-modal dataset designed for indoor layout estimation and tracking. The dataset consists of 2.5K sequences, including over 771.3K RGB images and 10 billion event data points.
- Score: 9.808718117070102
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: This paper presents Ev-Layout, a novel large-scale event-based multi-modal dataset designed for indoor layout estimation and tracking. Ev-Layout makes key contributions to the community by: (1) utilizing a hybrid data collection platform (with a head-mounted display and VR interface) that integrates both RGB and bio-inspired event cameras to capture indoor layouts in motion; and (2) incorporating time-series data from inertial measurement units (IMUs) and ambient lighting conditions recorded during data collection to highlight the potential impact of motion speed and lighting on layout estimation accuracy. The dataset consists of 2.5K sequences, including over 771.3K RGB images and 10 billion event data points. Of these, 39K images are annotated with indoor layouts, enabling research in both event-based and video-based indoor layout estimation. Based on the dataset, we propose an event-based layout estimation pipeline with a novel event-temporal distribution feature module to effectively aggregate the spatio-temporal information from events. Additionally, we introduce a spatio-temporal feature fusion module that can be easily integrated into a transformer module for fusion purposes. Finally, we conduct benchmarking and extensive experiments on the Ev-Layout dataset, demonstrating that our approach significantly improves the accuracy of dynamic indoor layout estimation compared to existing event-based methods.
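The abstract does not specify how the event-temporal distribution feature is computed, but the general idea of aggregating spatio-temporal information from a raw event stream can be sketched as per-pixel temporal histograms. The function name, the fixed number of bins, and the 0/1 polarity encoding below are all assumptions for illustration, not the paper's actual module:

```python
import numpy as np

def event_temporal_histogram(events, H, W, bins=8):
    """Aggregate an event stream into a (2*bins, H, W) grid of
    per-pixel temporal histograms, one stack per polarity.

    A hypothetical stand-in for an event-temporal distribution
    feature; `events` is an (N, 4) array of (x, y, t, p) rows with
    polarity p encoded as 0 or 1.
    """
    x = events[:, 0].astype(int)
    y = events[:, 1].astype(int)
    t = events[:, 2]
    p = events[:, 3].astype(int)
    # Normalize timestamps into [0, bins) within this window.
    t = (t - t.min()) / max(np.ptp(t), 1e-9) * (bins - 1e-6)
    b = t.astype(int)
    grid = np.zeros((2 * bins, H, W), dtype=np.float32)
    # Scatter-add each event into its (polarity, time-bin, y, x) cell.
    np.add.at(grid, (p * bins + b, y, x), 1.0)
    return grid
```

Unlike a single accumulated event frame, this representation keeps the within-window temporal distribution, which is what a downstream spatio-temporal fusion module would consume.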
Related papers
- EvLight++: Low-Light Video Enhancement with an Event Camera: A Large-Scale Real-World Dataset, Novel Method, and More [7.974102031202597]
EvLight++ is a novel event-guided low-light video enhancement approach designed for robust performance in real-world scenarios.
EvLight++ significantly outperforms both single image- and video-based methods by 1.37 dB and 3.71 dB, respectively.
arXiv Detail & Related papers (2024-08-29T04:30:31Z) - TENet: Targetness Entanglement Incorporating with Multi-Scale Pooling and Mutually-Guided Fusion for RGB-E Object Tracking [30.89375068036783]
Existing approaches perform event feature extraction for RGB-E tracking using traditional appearance models.
We propose an Event backbone (Pooler) to obtain a high-quality feature representation that is cognisant of the intrinsic characteristics of the event data.
Our method significantly outperforms state-of-the-art trackers on two widely used RGB-E tracking datasets.
arXiv Detail & Related papers (2024-05-08T12:19:08Z) - ColorMNet: A Memory-based Deep Spatial-Temporal Feature Propagation Network for Video Colorization [62.751303924391564]
How to effectively explore spatial-temporal features is important for video colorization.
We develop a memory-based feature propagation module that can establish reliable connections with features from far-apart frames.
We develop a local attention module to aggregate the features from adjacent frames in a spatial-temporal neighborhood.
arXiv Detail & Related papers (2024-04-09T12:23:30Z) - Long-term Frame-Event Visual Tracking: Benchmark Dataset and Baseline [37.06330707742272]
We first propose a new long-term and large-scale frame-event single object tracking dataset, termed FELT.
It contains 742 videos with 1,594,474 paired RGB frames and event streams, making it the largest frame-event tracking dataset to date.
We propose a novel associative memory Transformer network as a unified backbone by introducing modern Hopfield layers into multi-head self-attention blocks to fuse both RGB and event data.
arXiv Detail & Related papers (2024-03-09T08:49:50Z) - Segment Any Events via Weighted Adaptation of Pivotal Tokens [85.39087004253163]
This paper focuses on the nuanced challenge of tailoring the Segment Anything Models (SAMs) for integration with event data.
We introduce a multi-scale feature distillation methodology to optimize the alignment of token embeddings originating from event data with their RGB image counterparts.
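The multi-scale distillation objective described above can be approximated as a scale-weighted alignment loss between event-derived and RGB-derived token embeddings. The sketch below uses a plain cosine distance with uniform per-token weighting, whereas the paper's method adaptively weights pivotal tokens; the function name and signature are illustrative assumptions:

```python
import numpy as np

def distill_loss(event_feats, rgb_feats, scale_weights=None):
    """Scale-weighted alignment loss between event and RGB token
    embeddings. Each list element is a (num_tokens, dim) array for
    one feature scale.

    Hypothetical sketch: uniform token weighting, cosine distance.
    """
    scale_weights = scale_weights or [1.0] * len(event_feats)
    loss = 0.0
    for w, e, r in zip(scale_weights, event_feats, rgb_feats):
        # L2-normalize tokens, then average cosine distance per scale.
        e_n = e / np.linalg.norm(e, axis=1, keepdims=True)
        r_n = r / np.linalg.norm(r, axis=1, keepdims=True)
        loss += w * (1.0 - (e_n * r_n).sum(axis=1)).mean()
    return loss / sum(scale_weights)
```

Minimizing this loss pulls event token embeddings toward their RGB counterparts at every scale, which is the alignment effect the entry describes.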
arXiv Detail & Related papers (2023-12-24T12:47:08Z) - iBARLE: imBalance-Aware Room Layout Estimation [54.819085005591894]
Room layout estimation predicts layouts from a single panorama.
There are significant imbalances in real-world datasets including the dimensions of layout complexity, camera locations, and variation in scene appearance.
We propose imBalance-Aware Room Layout Estimation (iBARLE) framework to address these issues.
iBARLE consists of (1) an Appearance Variation Generation (AVG) module, (2) a Complex Structure Mix-up (CSMix) module, which enhances generalizability w.r.t. room structure, and (3) a gradient-based layout objective function.
arXiv Detail & Related papers (2023-08-29T06:20:36Z) - On the Generation of a Synthetic Event-Based Vision Dataset for Navigation and Landing [69.34740063574921]
This paper presents a methodology for generating event-based vision datasets from optimal landing trajectories.
We construct sequences of photorealistic images of the lunar surface with the Planet and Asteroid Natural Scene Generation Utility.
We demonstrate that the pipeline can generate realistic event-based representations of surface features by constructing a dataset of 500 trajectories.
arXiv Detail & Related papers (2023-08-01T09:14:20Z) - Dual Memory Aggregation Network for Event-Based Object Detection with Learnable Representation [79.02808071245634]
Event-based cameras are bio-inspired sensors that capture brightness change of every pixel in an asynchronous manner.
Event streams are divided into grids in the x-y-t coordinates for both positive and negative polarity, producing a set of pillars as 3D tensor representation.
Long memory is encoded in the hidden state of adaptive convLSTMs while short memory is modeled by computing spatial-temporal correlation between event pillars.
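The short-memory pathway described above computes spatio-temporal correlation between event pillar tensors from consecutive time steps. A minimal sketch of that idea is a per-location normalized correlation across channels; the function name and the plain Pearson-style formulation are assumptions, not the paper's exact operator:

```python
import numpy as np

def short_memory_correlation(pillars_cur, pillars_prev):
    """Per-location channel correlation between the current and
    previous pillar tensors of shape (C, H, W).

    Simplified stand-in for a short-memory spatio-temporal
    correlation: returns an (H, W) map in [-1, 1].
    """
    # Center each spatial location's channel vector.
    a = pillars_cur - pillars_cur.mean(axis=0, keepdims=True)
    b = pillars_prev - pillars_prev.mean(axis=0, keepdims=True)
    num = (a * b).sum(axis=0)
    den = np.sqrt((a ** 2).sum(axis=0) * (b ** 2).sum(axis=0)) + 1e-9
    return num / den
```

Locations where the event distribution is stable across steps score near 1, while motion or polarity flips push the score down, giving the tracker a cheap short-term change signal alongside the convLSTM's long memory.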
arXiv Detail & Related papers (2023-03-17T12:12:41Z) - A Unified Framework for Event-based Frame Interpolation with Ad-hoc Deblurring in the Wild [72.0226493284814]
We propose a unified framework for event-based frame interpolation that performs deblurring ad-hoc. Our network consistently outperforms previous state-of-the-art methods on frame interpolation, single image deblurring, and the joint task of both.
arXiv Detail & Related papers (2023-01-12T18:19:00Z) - Indoor Layout Estimation by 2D LiDAR and Camera Fusion [3.2387553628943535]
This paper presents an algorithm for indoor layout estimation and reconstruction through the fusion of a sequence of captured images and LiDAR data sets.
In the proposed system, a movable platform collects both intensity images and 2D LiDAR information.
arXiv Detail & Related papers (2020-01-15T16:43:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed papers (including all information) and is not responsible for any consequences.