StreetLens: Enabling Human-Centered AI Agents for Neighborhood Assessment from Street View Imagery
- URL: http://arxiv.org/abs/2506.14670v2
- Date: Sat, 11 Oct 2025 04:18:39 GMT
- Title: StreetLens: Enabling Human-Centered AI Agents for Neighborhood Assessment from Street View Imagery
- Authors: Jina Kim, Leeje Jang, Yao-Yi Chiang, Guanyu Wang, Michelle C. Pasco,
- Abstract summary: We present StreetLens, a user-configurable human-centered workflow that integrates relevant social science expertise into a vision language model. StreetLens places domain knowledge at the core of the analysis process. It also supports the integration of prior survey data to enhance robustness and expand the range of characteristics assessed.
- Score: 8.188916140324139
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Traditionally, neighborhood studies have used interviews, surveys, and manual image annotation guided by detailed protocols to identify environmental characteristics, including physical disorder, decay, street safety, and sociocultural symbols, and to examine their impact on developmental and health outcomes. Although these methods yield rich insights, they are time-consuming and require intensive expert intervention. Recent technological advances, including vision language models (VLMs), have begun to automate parts of this process; however, existing efforts are often ad hoc and lack adaptability across research designs and geographic contexts. In this paper, we present StreetLens, a user-configurable human-centered workflow that integrates relevant social science expertise into a VLM for scalable neighborhood environmental assessments. StreetLens mimics the process of trained human coders by focusing the analysis on questions derived from established interview protocols, retrieving relevant street view imagery (SVI), and generating a wide spectrum of semantic annotations from objective features (e.g., the number of cars) to subjective perceptions (e.g., the sense of disorder in an image). By enabling researchers to define the VLM's role through domain-informed prompting, StreetLens places domain knowledge at the core of the analysis process. It also supports the integration of prior survey data to enhance robustness and expand the range of characteristics assessed in diverse settings. StreetLens represents a shift toward flexible and agentic AI systems that work closely with researchers to accelerate and scale neighborhood studies. StreetLens is publicly available at https://knowledge-computing.github.io/projects/streetlens.
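The abstract describes framing protocol-derived questions as instructions to a VLM, then collecting per-image answers spanning objective counts and subjective ratings. A minimal sketch of that pattern is below; the questions, the coder "role" wording, and the `ask_vlm` callable are illustrative assumptions, not the actual StreetLens implementation or API.

```python
# Hypothetical sketch of domain-informed prompting over street view imagery.
# Questions and role text are placeholders, not StreetLens's real protocol.

PROTOCOL_QUESTIONS = [
    "How many cars are visible in this image?",               # objective feature
    "On a 1-5 scale, how disordered does this street feel?",  # subjective perception
]

def build_prompt(question: str, role: str = "trained neighborhood coder") -> str:
    """Frame a protocol-derived question as a VLM instruction."""
    return (
        f"You are a {role} following an established interview protocol. "
        f"Answer for the attached street view image: {question}"
    )

def annotate(images: list[str], ask_vlm) -> dict[str, dict[str, str]]:
    """Collect one answer per (image, question) pair.

    ask_vlm is any callable wrapping a vision language model:
    it takes (prompt_text, image_path) and returns the model's answer.
    """
    return {
        img: {q: ask_vlm(build_prompt(q), img) for q in PROTOCOL_QUESTIONS}
        for img in images
    }
```

In this framing, swapping research designs or geographic contexts only changes `PROTOCOL_QUESTIONS` and the role text, which is the configurability the abstract emphasizes.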
Related papers
- A Case Study in Responsible AI-Assisted Video Solutions: Multi-Metric Behavioral Insights in a Public Market Setting [4.760683150745747]
The study focuses on generating Multi-Metric Behavioral Insights through the extraction of customer directional flow, dwell duration, and movement patterns. Data collected over 18 days, spanning routine operations and a festival window from May 2-4, reveals a consistently right-skewed dwell-time behavior. Movement analysis indicates uneven circulation, with over 60% of traffic concentrated in approximately 30% of the venue space.
arXiv Detail & Related papers (2026-03-04T21:11:16Z) - Towards Agentic Intelligence for Materials Science [73.4576385477731]
This survey advances a unique pipeline-centric view that spans from corpus curation and pretraining to goal-conditioned agents interfacing with simulation and experimental platforms. To bridge communities and establish a shared frame of reference, we first present an integrated lens that aligns terminology, evaluation, and workflow stages across AI and materials science.
arXiv Detail & Related papers (2026-01-29T23:48:43Z) - How to Build Robust, Scalable Models for GSV-Based Indicators in Neighborhood Research [5.236003339365069]
We show how to select and adapt foundation models for datasets with limited size and labels, while leveraging larger, unlabeled datasets through unsupervised training. Our study includes comprehensive quantitative and visual analyses comparing model performance before and after unsupervised adaptation.
arXiv Detail & Related papers (2026-01-10T06:00:09Z) - StreetWeave: A Declarative Grammar for Street-Overlaid Visualization of Multivariate Data [3.4529078468373706]
Street and pedestrian network visualization is important to urban planners, climate researchers, and health experts. There is no established design framework to guide the creation of these visualizations. We introduce StreetWeave, a declarative grammar for designing custom visualizations of multivariate spatial network data.
arXiv Detail & Related papers (2025-08-10T21:59:20Z) - Urbanite: A Dataflow-Based Framework for Human-AI Interactive Alignment in Urban Visual Analytics [4.107382739138796]
Urban visual analytics has become essential for deriving insights into pressing real-world problems. The need to manage diverse datasets, distill intricate patterns, and integrate various analytical methods presents a high barrier to entry. We propose Urbanite, a framework for human-AI collaboration in urban visual analytics.
arXiv Detail & Related papers (2025-08-10T15:44:37Z) - Visual Hand Gesture Recognition with Deep Learning: A Comprehensive Review of Methods, Datasets, Challenges and Future Research Directions [5.983872847786255]
Vision-based hand gesture recognition (VHGR) delivers a wide range of applications, such as sign language understanding and human-computer interaction using cameras. Despite the large volume of research works in the field, a structured and complete survey on VHGR is still missing. This review aims to constitute a useful guideline for researchers, helping them to choose the right strategy for delving into a certain VHGR task.
arXiv Detail & Related papers (2025-07-06T17:03:01Z) - Self-Supervised Learning for Image Segmentation: A Comprehensive Survey [8.139668811376822]
Self-supervised learning (SSL) has become a powerful machine learning (ML) paradigm for solving several practical downstream computer vision problems. This survey thoroughly investigates over 150 recent image segmentation articles, particularly focusing on SSL. It provides a practical categorization of pretext tasks, downstream tasks, and commonly used benchmark datasets for image segmentation research.
arXiv Detail & Related papers (2025-05-19T17:47:32Z) - Disambiguation in Conversational Question Answering in the Era of LLMs and Agents: A Survey [54.90240495777929]
Ambiguity remains a fundamental challenge in Natural Language Processing (NLP). With the advent of Large Language Models (LLMs), addressing ambiguity has become even more critical due to their expanded capabilities and applications. This paper explores the definition, forms, and implications of ambiguity for language-driven systems.
arXiv Detail & Related papers (2025-05-18T20:53:41Z) - VizCV: AI-assisted visualization of researchers' publications tracks [7.233541652625401]
VizCV is a novel web-based end-to-end visual analytics framework. It incorporates AI-assisted analysis and supports automated reporting of career evolution.
arXiv Detail & Related papers (2025-05-13T15:47:59Z) - Can a Large Language Model Assess Urban Design Quality? Evaluating Walkability Metrics Across Expertise Levels [0.0]
Urban environments are vital to supporting human activity in public spaces. The emergence of big data, such as street view images (SVIs) combined with multimodal large language models (MLLMs), is transforming how researchers and practitioners investigate, measure, and evaluate urban environments. This study explores the extent to which the integration of expert knowledge can influence the performance of MLLMs in evaluating the quality of urban design.
arXiv Detail & Related papers (2025-04-28T09:41:17Z) - Large Language Model Agent: A Survey on Methodology, Applications and Challenges [88.3032929492409]
Large Language Model (LLM) agents, with goal-driven behaviors and dynamic adaptation capabilities, potentially represent a critical pathway toward artificial general intelligence. This survey systematically deconstructs LLM agent systems through a methodology-centered taxonomy. Our work provides a unified architectural perspective, examining how agents are constructed, how they collaborate, and how they evolve over time.
arXiv Detail & Related papers (2025-03-27T12:50:17Z) - Retrieval Augmented Generation and Understanding in Vision: A Survey and New Outlook [85.43403500874889]
Retrieval-augmented generation (RAG) has emerged as a pivotal technique in artificial intelligence (AI). This survey reviews recent advancements in RAG for embodied AI, with a particular focus on applications in planning, task execution, multimodal perception, interaction, and specialized domains.
arXiv Detail & Related papers (2025-03-23T10:33:28Z) - Vital Insight: Assisting Experts' Context-Driven Sensemaking of Multi-modal Personal Tracking Data Using Visualization and Human-In-The-Loop LLM Agents [29.73055078727462]
Vital Insight is a novel, LLM-assisted, prototype system to enable human-in-the-loop inference (sensemaking) and visualizations of multi-modal passive sensing data from smartphones and wearables. We observe experts' interactions with it and develop an expert sensemaking model that explains how experts move between direct data representations and AI-supported inferences.
arXiv Detail & Related papers (2024-10-18T21:56:35Z) - A Survey of Stance Detection on Social Media: New Directions and Perspectives [50.27382951812502]
Stance detection has emerged as a crucial subfield within affective computing.
Recent years have seen a surge of research interest in developing effective stance detection methods.
This paper provides a comprehensive survey of stance detection techniques on social media.
arXiv Detail & Related papers (2024-09-24T03:06:25Z) - From Linguistic Giants to Sensory Maestros: A Survey on Cross-Modal Reasoning with Large Language Models [56.9134620424985]
Cross-modal reasoning (CMR) is increasingly recognized as a crucial capability in the progression toward more sophisticated artificial intelligence systems.
The recent trend of deploying Large Language Models (LLMs) to tackle CMR tasks has marked a new mainstream of approaches for enhancing their effectiveness.
This survey offers a nuanced exposition of current methodologies applied in CMR using LLMs, classifying these into a detailed three-tiered taxonomy.
arXiv Detail & Related papers (2024-09-19T02:51:54Z) - DISCOVER: A Data-driven Interactive System for Comprehensive Observation, Visualization, and ExploRation of Human Behaviour [6.716560115378451]
We introduce a modular, flexible, yet user-friendly software framework specifically developed to streamline computational-driven data exploration for human behavior analysis.
Our primary objective is to democratize access to advanced computational methodologies, thereby enabling researchers across disciplines to engage in detailed behavioral analysis without the need for extensive technical proficiency.
arXiv Detail & Related papers (2024-07-18T11:28:52Z) - Towards Generalist Robot Learning from Internet Video: A Survey [56.621902345314645]
We present an overview of the emerging field of Learning from Videos (LfV).
LfV aims to address the robotics data bottleneck by augmenting traditional robot data with large-scale internet video data.
We provide a review of current methods for extracting knowledge from large-scale internet video, addressing key challenges in LfV, and boosting downstream robot and reinforcement learning via the use of video data.
arXiv Detail & Related papers (2024-04-30T15:57:41Z) - Combatting Human Trafficking in the Cyberspace: A Natural Language Processing-Based Methodology to Analyze the Language in Online Advertisements [55.2480439325792]
This project tackles the pressing issue of human trafficking in online C2C marketplaces through advanced Natural Language Processing (NLP) techniques.
We introduce a novel methodology for generating pseudo-labeled datasets with minimal supervision, serving as a rich resource for training state-of-the-art NLP models.
A key contribution is the implementation of an interpretability framework using Integrated Gradients, providing explainable insights crucial for law enforcement.
arXiv Detail & Related papers (2023-11-22T02:45:01Z) - Automatic Gaze Analysis: A Survey of Deep Learning based Approaches [61.32686939754183]
Eye gaze analysis is an important research problem in the field of computer vision and Human-Computer Interaction.
There are several open questions including what are the important cues to interpret gaze direction in an unconstrained environment.
We review the progress across a range of gaze analysis tasks and applications to shed light on these fundamental questions.
arXiv Detail & Related papers (2021-08-12T00:30:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.