Related papers: WAFFLE: Multimodal Floorplan Understanding in the Wild

URL: http://arxiv.org/abs/2412.00955v2
Date: Tue, 03 Dec 2024 18:58:44 GMT
Title: WAFFLE: Multimodal Floorplan Understanding in the Wild
Authors: Keren Ganon, Morris Alper, Rachel Mikulinsky, Hadar Averbuch-Elor
Abstract summary: We introduce WAFFLE, a novel dataset of nearly 20K floorplan images and metadata curated from Internet data spanning diverse building types, locations, and data formats. We show that WAFFLE enables progress on new building understanding tasks, both discriminative and generative, which were not feasible using prior datasets. We will publicly release WAFFLE along with our code and trained models, providing the research community with a new foundation for learning the semantics of buildings.
Score: 10.832723844562887
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Buildings are a central feature of human culture and are increasingly being analyzed with computational methods. However, recent works on computational building understanding have largely focused on natural imagery of buildings, neglecting the fundamental element defining a building's structure -- its floorplan. Conversely, existing works on floorplan understanding are extremely limited in scope, often focusing on floorplans of a single semantic category and region (e.g. floorplans of apartments from a single country). In this work, we introduce WAFFLE, a novel multimodal floorplan understanding dataset of nearly 20K floorplan images and metadata curated from Internet data spanning diverse building types, locations, and data formats. By using a large language model and multimodal foundation models, we curate and extract semantic information from these images and their accompanying noisy metadata. We show that WAFFLE enables progress on new building understanding tasks, both discriminative and generative, which were not feasible using prior datasets. We will publicly release WAFFLE along with our code and trained models, providing the research community with a new foundation for learning the semantics of buildings.

Related papers

Building Floor Number Estimation from Crowdsourced Street-Level Images: Munich Dataset and Baseline Method [17.492721759864505]
Large-scale floor-count data are rarely available in cadastral and 3D city databases. This study proposes an end-to-end deep learning framework that infers floor numbers directly from street-level imagery. The proposed classification-regression network attains 81.2% exact accuracy and predicts 97.9% of buildings within +/-1 floor.
arXiv Detail & Related papers (2025-05-23T15:27:46Z)
MSD: A Benchmark Dataset for Floor Plan Generation of Building Complexes [6.9924720592711935]
We develop Modified Swiss Dwellings (MSD) -- the first large-scale floor plan dataset that contains a significant share of layouts of multi-apartment dwellings. MSD features over 5.3K floor plans of medium- to large-scale building complexes, covering over 18.9K distinct apartments.
arXiv Detail & Related papers (2024-07-14T08:51:25Z)
Towards Vision-Language Geo-Foundation Model: A Survey [65.70547895998541]
Vision-Language Foundation Models (VLFMs) have made remarkable progress on various multimodal tasks. This paper thoroughly reviews Vision-Language Geo-Foundation Models (VLGFMs), summarizing and analyzing recent developments in the field.
arXiv Detail & Related papers (2024-06-13T17:57:30Z)
Open-Vocabulary Camouflaged Object Segmentation [66.94945066779988]
We introduce a new task, open-vocabulary camouflaged object segmentation (OVCOS). We construct a large-scale complex scene dataset (OVCamo) containing 11,483 hand-selected images with fine annotations and corresponding object classes. By integrating the guidance of class semantic knowledge and the supplement of visual structure cues from the edge and depth information, the proposed method can efficiently capture camouflaged objects.
arXiv Detail & Related papers (2023-11-19T06:00:39Z)
City Foundation Models for Learning General Purpose Representations from OpenStreetMap [16.09047066527081]
We present CityFM, a framework to train a foundation model within a selected geographical area of interest, such as a city. CityFM relies solely on open data from OpenStreetMap, and produces multimodal representations of entities of different types, incorporating spatial, visual, and textual information. In all the experiments, CityFM achieves performance superior to, or on par with, the baselines.
arXiv Detail & Related papers (2023-10-01T05:55:30Z)
From Isolated Islands to Pangea: Unifying Semantic Space for Human Action Understanding [50.412121156940294]
Action understanding can be formed as the mapping from the physical space to the semantic space. We propose a novel model mapping from the physical space to semantic space to fully use Pangea.
arXiv Detail & Related papers (2023-04-02T15:04:43Z)
Building Floorspace in China: A Dataset and Learning Pipeline [0.32228025627337864]
This paper provides a first milestone in measuring the floorspace of buildings in 40 major Chinese cities. We use Sentinel-1 and -2 satellite images as our main data source. We provide a detailed description of our data, algorithms, and evaluations.
arXiv Detail & Related papers (2023-03-03T21:45:36Z)
FloorLevel-Net: Recognizing Floor-Level Lines with Height-Attention-Guided Multi-task Learning [49.30194762653723]
This work tackles the problem of locating floor-level lines in street-view images, using a supervised deep learning approach. We first compile a new dataset and develop a new data augmentation scheme to synthesize training samples. Next, we design FloorLevel-Net, a multi-task learning network that associates explicit features of building facades and implicit floor-level lines.
arXiv Detail & Related papers (2021-07-06T08:17:59Z)
Where2Act: From Pixels to Actions for Articulated 3D Objects [54.19638599501286]
We extract highly localized actionable information related to elementary actions such as pushing or pulling for articulated objects with movable parts. We propose a learning-from-interaction framework with an online data sampling strategy that allows us to train the network in simulation. Our learned models even transfer to real-world data.
arXiv Detail & Related papers (2021-01-07T18:56:38Z)
Graph-Based Generative Representation Learning of Semantically and Behaviorally Augmented Floorplans [12.488287536032747]
We present a floorplan embedding technique that uses an attributed graph to represent the geometric information as well as design semantics and behavioral features of the inhabitants as node and edge attributes. A Long Short-Term Memory (LSTM) Variational Autoencoder (VAE) architecture is proposed and trained to embed attributed graphs as vectors in a continuous space. A user study is conducted to evaluate the coupling of similar floorplans retrieved from the embedding space with respect to a given input.
arXiv Detail & Related papers (2020-12-08T20:51:56Z)
Campus3D: A Photogrammetry Point Cloud Benchmark for Hierarchical Understanding of Outdoor Scene [76.4183572058063]
We present a richly-annotated 3D point cloud dataset for multiple outdoor scene understanding tasks. The dataset has been point-wisely annotated with both hierarchical and instance-based labels. We formulate a hierarchical learning problem for 3D point cloud segmentation and propose a measurement evaluating consistency across various hierarchies.
arXiv Detail & Related papers (2020-08-11T19:10:32Z)

This list is automatically generated from the titles and abstracts of the papers on this site.