StreetLens: Enabling Human-Centered AI Agents for Neighborhood Assessment from Street View Imagery
- URL: http://arxiv.org/abs/2506.14670v1
- Date: Tue, 17 Jun 2025 16:06:03 GMT
- Title: StreetLens: Enabling Human-Centered AI Agents for Neighborhood Assessment from Street View Imagery
- Authors: Jina Kim, Leeje Jang, Yao-Yi Chiang, Guanyu Wang, Michelle Pasco
- Abstract summary: We present StreetLens, a researcher-configurable AI system for neighborhood studies. StreetLens embeds relevant social science expertise in a vision-language model for scalable neighborhood environmental assessments. It generates a wide spectrum of semantic annotations, from objective features to subjective perceptions.
- Score: 5.987690246378683
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Traditionally, neighborhood studies have employed interviews, surveys, and manual image annotation guided by detailed protocols to identify environmental characteristics, including physical disorder, decay, street safety, and sociocultural symbols, and to examine their impact on developmental and health outcomes. While these methods yield rich insights, they are time-consuming and require intensive expert intervention. Recent technological advances, including vision-language models (VLMs), have begun to automate parts of this process; however, existing efforts are often ad hoc and lack adaptability across research designs and geographic contexts. In this demo paper, we present StreetLens, a human-centered, researcher-configurable workflow that embeds relevant social science expertise in a VLM for scalable neighborhood environmental assessments. StreetLens mimics the process of trained human coders by grounding the analysis in questions derived from established interview protocols, retrieving relevant street view imagery (SVI), and generating a wide spectrum of semantic annotations from objective features (e.g., the number of cars) to subjective perceptions (e.g., the sense of disorder in an image). By enabling researchers to define the VLM's role through domain-informed prompting, StreetLens places domain knowledge at the core of the analysis process. It also supports the integration of prior survey data to enhance robustness and expand the range of characteristics assessed across diverse settings. We provide a Google Colab notebook to make StreetLens accessible and extensible for researchers working with public or custom SVI datasets. StreetLens represents a shift toward flexible, agentic AI systems that work closely with researchers to accelerate and scale neighborhood studies.
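As a rough illustration of the workflow the abstract describes (a protocol-derived question, a retrieved street view image, and a VLM annotation), here is a minimal Python sketch. The protocol question, model name, and use of the OpenAI chat API are illustrative assumptions, not the authors' implementation; the actual workflow is provided in the paper's Colab notebook.

```python
import base64
from openai import OpenAI  # hypothetical VLM client; StreetLens may wrap a different model

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# A question of the kind derived from an established neighborhood-assessment
# interview protocol (illustrative wording, not taken from the paper).
PROTOCOL_QUESTION = (
    "As a trained neighborhood-environment coder, rate the physical disorder "
    "visible in this street view image on a 1-5 scale (litter, graffiti, "
    "broken windows), then justify the rating in one sentence."
)

def annotate_svi(image_path: str) -> str:
    """Ground one street view image in a protocol question and return the VLM's annotation."""
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode("utf-8")
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # any vision-capable model; an assumption, not the authors' choice
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": PROTOCOL_QUESTION},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content

# Example usage with a local street view frame:
# print(annotate_svi("street_view_sample.jpg"))
```

Swapping the question text is the configuration point: the same loop can elicit objective counts (e.g., the number of cars) or subjective ratings (e.g., perceived disorder).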
Related papers
- Visual Hand Gesture Recognition with Deep Learning: A Comprehensive Review of Methods, Datasets, Challenges and Future Research Directions [5.983872847786255]
Vision-based hand gesture recognition (VHGR) supports a wide range of camera-based applications, such as sign language understanding and human-computer interaction. Despite the large volume of research in the field, a structured and complete survey on VHGR is still missing. This review aims to serve as a useful guideline for researchers, helping them choose the right strategy for a given VHGR task.
arXiv Detail & Related papers (2025-07-06T17:03:01Z)
- Self-Supervised Learning for Image Segmentation: A Comprehensive Survey [8.139668811376822]
Self-supervised learning (SSL) has become a powerful machine learning (ML) paradigm for solving several practical downstream computer vision problems. This survey thoroughly investigates over 150 recent image segmentation articles, particularly focusing on SSL. It provides a practical categorization of pretext tasks, downstream tasks, and commonly used benchmark datasets for image segmentation research.
arXiv Detail & Related papers (2025-05-19T17:47:32Z)
- VizCV: AI-assisted visualization of researchers' publications tracks [7.233541652625401]
VizCV is a novel web-based end-to-end visual analytics framework. It incorporates AI-assisted analysis and supports automated reporting of career evolution.
arXiv Detail & Related papers (2025-05-13T15:47:59Z)
- Can a Large Language Model Assess Urban Design Quality? Evaluating Walkability Metrics Across Expertise Levels [0.0]
Urban environments are vital to supporting human activity in public spaces. The emergence of big data such as street view images (SVIs), combined with multimodal large language models (MLLMs), is transforming how researchers and practitioners investigate, measure, and evaluate urban environments. This study explores the extent to which the integration of expert knowledge can influence the performance of MLLMs in evaluating the quality of urban design.
arXiv Detail & Related papers (2025-04-28T09:41:17Z)
- Large Language Model Agent: A Survey on Methodology, Applications and Challenges [88.3032929492409]
Large Language Model (LLM) agents, with goal-driven behaviors and dynamic adaptation capabilities, potentially represent a critical pathway toward artificial general intelligence. This survey systematically deconstructs LLM agent systems through a methodology-centered taxonomy. Our work provides a unified architectural perspective, examining how agents are constructed, how they collaborate, and how they evolve over time.
arXiv Detail & Related papers (2025-03-27T12:50:17Z)
- Retrieval Augmented Generation and Understanding in Vision: A Survey and New Outlook [85.43403500874889]
Retrieval-augmented generation (RAG) has emerged as a pivotal technique in artificial intelligence (AI). This survey reviews recent advancements in RAG for embodied AI, with a particular focus on applications in planning, task execution, multimodal perception, interaction, and specialized domains.
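As background for the retrieve-then-generate pattern this survey covers, the following hypothetical sketch shows the retrieval half using CLIP-style embeddings via open_clip. The corpus paths and query are placeholders; the surveyed systems vary widely in backbone and index.

```python
import torch
import open_clip  # pip install open_clip_torch
from PIL import Image

# Load a CLIP model; retrieval backbones differ across the systems surveyed.
model, _, preprocess = open_clip.create_model_and_transforms("ViT-B-32", pretrained="openai")
tokenizer = open_clip.get_tokenizer("ViT-B-32")
model.eval()

image_paths = ["frame_000.jpg", "frame_001.jpg"]  # placeholder image corpus

with torch.no_grad():
    # Embed the candidate images once; real systems cache these in a vector index.
    images = torch.stack([preprocess(Image.open(p)) for p in image_paths])
    image_emb = model.encode_image(images)
    image_emb = image_emb / image_emb.norm(dim=-1, keepdim=True)

    # Embed the task query, then rank images by cosine similarity.
    query = tokenizer(["a pedestrian crossing with heavy traffic"])
    text_emb = model.encode_text(query)
    text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)

scores = (text_emb @ image_emb.T).squeeze(0)
best = scores.argmax().item()
# The top-ranked image would then be passed, with the query, to a generator model.
print(image_paths[best], scores[best].item())
```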
arXiv Detail & Related papers (2025-03-23T10:33:28Z)
- Vital Insight: Assisting Experts' Context-Driven Sensemaking of Multi-modal Personal Tracking Data Using Visualization and Human-In-The-Loop LLM Agents [29.73055078727462]
Vital Insight is a novel LLM-assisted prototype system that enables human-in-the-loop inference (sensemaking) and visualization of multi-modal passive sensing data from smartphones and wearables. We observe experts' interactions with it and develop an expert sensemaking model that explains how experts move between direct data representations and AI-supported inferences.
arXiv Detail & Related papers (2024-10-18T21:56:35Z)
- A Survey of Stance Detection on Social Media: New Directions and Perspectives [50.27382951812502]
Stance detection has emerged as a crucial subfield within affective computing.
Recent years have seen a surge of research interest in developing effective stance detection methods.
This paper provides a comprehensive survey of stance detection techniques on social media.
arXiv Detail & Related papers (2024-09-24T03:06:25Z)
- From Linguistic Giants to Sensory Maestros: A Survey on Cross-Modal Reasoning with Large Language Models [56.9134620424985]
Cross-modal reasoning (CMR) is increasingly recognized as a crucial capability in the progression toward more sophisticated artificial intelligence systems.
The recent trend of deploying Large Language Models (LLMs) to tackle CMR tasks has marked a new mainstream of approaches for enhancing their effectiveness.
This survey offers a nuanced exposition of current methodologies applied in CMR using LLMs, classifying these into a detailed three-tiered taxonomy.
arXiv Detail & Related papers (2024-09-19T02:51:54Z)
- DISCOVER: A Data-driven Interactive System for Comprehensive Observation, Visualization, and ExploRation of Human Behaviour [6.716560115378451]
We introduce a modular, flexible, yet user-friendly software framework specifically developed to streamline computationally driven data exploration for human behavior analysis.
Our primary objective is to democratize access to advanced computational methodologies, thereby enabling researchers across disciplines to engage in detailed behavioral analysis without the need for extensive technical proficiency.
arXiv Detail & Related papers (2024-07-18T11:28:52Z)
- Towards Generalist Robot Learning from Internet Video: A Survey [56.621902345314645]
We present an overview of the emerging field of Learning from Videos (LfV).
LfV aims to address the robotics data bottleneck by augmenting traditional robot data with large-scale internet video data.
We provide a review of current methods for extracting knowledge from large-scale internet video, addressing key challenges in LfV, and boosting downstream robot and reinforcement learning via the use of video data.
arXiv Detail & Related papers (2024-04-30T15:57:41Z)
- Combatting Human Trafficking in the Cyberspace: A Natural Language Processing-Based Methodology to Analyze the Language in Online Advertisements [55.2480439325792]
This project tackles the pressing issue of human trafficking in online consumer-to-consumer (C2C) marketplaces through advanced Natural Language Processing (NLP) techniques.
We introduce a novel methodology for generating pseudo-labeled datasets with minimal supervision, serving as a rich resource for training state-of-the-art NLP models.
A key contribution is the implementation of an interpretability framework using Integrated Gradients, providing explainable insights crucial for law enforcement.
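Integrated Gradients itself is a standard attribution method; a minimal sketch using the Captum library on a toy classifier is shown below. The toy model, tensor shapes, and target class are assumptions for demonstration and do not reflect the paper's actual NLP models.

```python
import torch
import torch.nn as nn
from captum.attr import IntegratedGradients  # pip install captum

# Toy stand-in classifier; the paper applies attribution to trained NLP models,
# which are not reproduced here.
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 2))
model.eval()

ig = IntegratedGradients(model)
inputs = torch.randn(1, 8)            # one feature vector (hypothetical features)
baseline = torch.zeros_like(inputs)   # the conventional all-zero baseline

# Attribute the class-1 score to each input feature.
attributions, delta = ig.attribute(
    inputs, baselines=baseline, target=1, return_convergence_delta=True
)
print(attributions)  # per-feature contributions; large magnitudes flag influential cues
print(delta)         # completeness check: should be close to zero
```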
arXiv Detail & Related papers (2023-11-22T02:45:01Z)
- Automatic Gaze Analysis: A Survey of Deep Learning based Approaches [61.32686939754183]
Eye gaze analysis is an important research problem in computer vision and human-computer interaction. Several questions remain open, including which cues are important for interpreting gaze direction in an unconstrained environment.
We review the progress across a range of gaze analysis tasks and applications to shed light on these fundamental questions.
arXiv Detail & Related papers (2021-08-12T00:30:39Z)