ArchiLense: A Framework for Quantitative Analysis of Architectural Styles Based on Vision Large Language Models
- URL: http://arxiv.org/abs/2506.07739v3
- Date: Sat, 02 Aug 2025 12:10:07 GMT
- Title: ArchiLense: A Framework for Quantitative Analysis of Architectural Styles Based on Vision Large Language Models
- Authors: Jing Zhong, Jun Yin, Peilin Li, Pengyu Zeng, Miao Zang, Ran Luo, Shuai Lu,
- Abstract summary: We construct a professional architectural style dataset named ArchDiffBench, which comprises 1,765 high-quality architectural images and their corresponding style annotations. By integrating advanced computer vision techniques, deep learning, and machine learning, ArchiLense enables automatic recognition, comparison, and precise classification of architectural imagery. ArchiLense achieves strong performance in architectural style recognition, with a 92.4% consistency rate with expert annotations and 84.5% classification accuracy.
- Score: 14.032055369239627
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Architectural cultures across regions are characterized by stylistic diversity, shaped by historical, social, and technological contexts in addition to geographical conditions. Understanding architectural styles requires the ability to describe and analyze the stylistic features of different architects from various regions through visual observations of architectural imagery. However, traditional studies of architectural culture have largely relied on subjective expert interpretations and historical literature reviews, often suffering from regional biases and limited explanatory scope. To address these challenges, this study proposes three core contributions: (1) We construct a professional architectural style dataset named ArchDiffBench, which comprises 1,765 high-quality architectural images and their corresponding style annotations, collected from different regions and historical periods. (2) We propose ArchiLense, an analytical framework grounded in Vision-Language Models and constructed using the ArchDiffBench dataset. By integrating advanced computer vision techniques, deep learning, and machine learning algorithms, ArchiLense enables automatic recognition, comparison, and precise classification of architectural imagery, producing descriptive language outputs that articulate stylistic differences. (3) Extensive evaluations show that ArchiLense achieves strong performance in architectural style recognition, with a 92.4% consistency rate with expert annotations and 84.5% classification accuracy, effectively capturing stylistic distinctions across images. The proposed approach transcends the subjectivity inherent in traditional analyses and offers a more objective and accurate perspective for comparative studies of architectural culture.
Related papers
- A vision-intelligent framework for mapping the genealogy of vernacular architecture [1.6520865430314056]
This study proposes a research framework by which intelligent technologies can be assembled to augment researchers' intuition. We employ this framework to examine the stylistic classification of 1,277 historical shophouses in Singapore's Chinatown. Findings extend beyond the chronological classification established by the Urban Redevelopment Authority of Singapore in the 1980s and 1990s.
arXiv Detail & Related papers (2025-05-24T06:39:28Z) - The Architecture Tradeoff and Risk Analysis Framework (ATRAF): A Unified Approach for Evaluating Software Architectures, Reference Architectures, and Architectural Frameworks [0.0]
We introduce the Architecture Tradeoff and Risk Analysis Framework (ATRAF). ATRAF is a scenario-driven framework for evaluating tradeoffs and risks across architectural levels. It enables the identification of sensitivities, tradeoffs, and risks while supporting continuous refinement of architectural artifacts.
arXiv Detail & Related papers (2025-05-01T17:48:52Z) - ArchSeek: Retrieving Architectural Case Studies Using Vision-Language Models [6.936621948709572]
ArchSeek is an innovative case study search system with recommendation capability. Powered by vision-language models and cross-modal embeddings, it enables text and image queries with fine-grained control.
arXiv Detail & Related papers (2025-03-24T13:50:23Z) - Semi-Automated Design of Data-Intensive Architectures [49.1574468325115]
This paper introduces a development methodology for data-intensive architectures. It guides architects in (i) designing a suitable architecture for their specific application scenario, and (ii) selecting an appropriate set of concrete systems to implement the application. We show that the description languages we adopt can capture the key aspects of data-intensive architectures proposed by researchers and practitioners.
arXiv Detail & Related papers (2025-03-21T16:01:11Z) - Evaluation of Architectural Synthesis Using Generative AI [49.1574468325115]
This paper presents a comparative evaluation of two systems, GPT-4o and Claude 3.5, on the task of architectural 3D synthesis. We conduct a case study on two buildings from Palladio's Four Books of Architecture (1965): Villa Rotonda and Palazzo Porto. We assess the systems' abilities in (1) interpreting 2D and 3D representations of buildings from drawings, (2) encoding the buildings into a CAD software script, and (3) self-improving based on outputs.
arXiv Detail & Related papers (2025-03-04T18:39:28Z) - A Survey of Model Architectures in Information Retrieval [64.75808744228067]
We focus on two key aspects: backbone models for feature extraction and end-to-end system architectures for relevance estimation. We trace the development from traditional term-based methods to modern neural approaches, particularly highlighting the impact of transformer-based models and subsequent large language models (LLMs). We conclude by discussing emerging challenges and future directions, including architectural optimizations for performance and scalability, handling of multimodal and multilingual data, and adaptation to novel application domains beyond traditional search paradigms.
arXiv Detail & Related papers (2025-02-20T18:42:58Z) - GalleryGPT: Analyzing Paintings with Large Multimodal Models [64.98398357569765]
Artwork analysis is an important and fundamental skill for art appreciation, which can enrich personal aesthetic sensibility and foster critical thinking.
Previous works on automatically analyzing artworks mainly focus on classification, retrieval, and other simple tasks, which fall far short of this goal.
We introduce a superior large multimodal model for composing painting analyses, dubbed GalleryGPT, which is slightly modified and fine-tuned based on the LLaVA architecture.
arXiv Detail & Related papers (2024-08-01T11:52:56Z) - Impressions: Understanding Visual Semiotics and Aesthetic Impact [66.40617566253404]
We present Impressions, a novel dataset through which to investigate the semiotics of images.
We show that existing multimodal image captioning and conditional generation models struggle to simulate plausible human responses to images.
This dataset significantly improves their ability to model impressions and aesthetic evaluations of images through fine-tuning and few-shot adaptation.
arXiv Detail & Related papers (2023-10-27T04:30:18Z) - Thoughts on Architecture [0.0]
The term architecture has evolved from its original Greek roots and its application to buildings and computers to its more recent manifestation for minds.
This article considers lessons from this history, in terms of a set of relevant distinctions introduced at each of these stages and a definition of architecture that spans all three.
arXiv Detail & Related papers (2023-06-23T15:47:17Z) - Coarse-to-Fine Contrastive Learning in Image-Text-Graph Space for Improved Vision-Language Compositionality [50.48859793121308]
Contrastively trained vision-language models have achieved remarkable progress in vision and language representation learning.
Recent research has highlighted severe limitations in their ability to perform compositional reasoning over objects, attributes, and relations.
arXiv Detail & Related papers (2023-05-23T08:28:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.