Generative agents in the streets: Exploring the use of Large Language
Models (LLMs) in collecting urban perceptions
- URL: http://arxiv.org/abs/2312.13126v1
- Date: Wed, 20 Dec 2023 15:45:54 GMT
- Authors: Deepank Verma, Olaf Mumm, Vanessa Miriam Carlow
- Abstract summary: This study explores current advancements in Generative agents powered by large language models (LLMs).
The experiment employs Generative agents that interact with urban environments using street view images to plan their journey toward specific goals.
Since LLMs do not possess embodiment, nor have access to the visual realm, and lack a sense of motion or direction, we designed movement and visual modules that help agents gain an overall understanding of surroundings.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Evaluating the surroundings to gain understanding, frame perspectives, and
anticipate behavioral reactions is an inherent human trait. However, these
continuous encounters are diverse and complex, posing challenges to their study
and experimentation. Researchers have been able to isolate environmental
features and study their effect on human perception and behavior. However,
attempts to replicate and study human behaviors through proxies, such as
virtual mediums and interviews, have been inconsistent. Large
language models (LLMs) have recently been unveiled as capable of contextual
understanding and semantic reasoning. These models have been trained on large
amounts of text and have evolved to mimic believable human behavior. This study
explores the current advancements in Generative agents powered by LLMs with the
help of perceptual experiments. The experiment employs Generative agents to
interact with urban environments using street view images to plan their
journey toward specific goals. The agents are given virtual personalities,
which make them distinguishable. They are also provided a memory database to
store their thoughts and essential visual information and retrieve it when
needed to plan their movement. Since LLMs lack embodiment, access to the
visual realm, and any sense of motion or direction, we
designed movement and visual modules that help agents gain an overall
understanding of surroundings. The agents are further employed to rate the
surroundings they encounter based on their perceived sense of safety and
liveliness. As these agents store details in their memory, we query the
findings to get details regarding their thought processes. Overall, this study
experiments with current AI developments and their potential for simulating
human behavior in urban environments.
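The agent architecture described in the abstract (a virtual personality, a memory database for thoughts and visual details, and separate visual and movement modules that compensate for the LLM's lack of embodiment) can be sketched as a minimal loop. All class and method names below are illustrative assumptions, and the keyword heuristic in `rate_scene` merely stands in for the paper's actual LLM prompting:

```python
from dataclasses import dataclass, field

@dataclass
class Memory:
    """Memory database stand-in: store thoughts/observations, retrieve recent ones."""
    entries: list = field(default_factory=list)

    def store(self, entry: str) -> None:
        self.entries.append(entry)

    def retrieve(self, n: int = 50) -> list:
        return self.entries[-n:]

@dataclass
class GenerativeAgent:
    name: str
    personality: str  # virtual personality that makes agents distinguishable
    memory: Memory = field(default_factory=Memory)

    def perceive(self, street_view_caption: str) -> None:
        # Visual module stand-in: a text caption of the current street-view image,
        # stored so it can be retrieved later when planning movement.
        self.memory.store(f"saw: {street_view_caption}")

    def rate_scene(self, caption: str) -> dict:
        # Perceptual rating stand-in: a real agent would prompt an LLM with its
        # personality and retrieved memories; here a trivial keyword heuristic.
        safety = 1 if "well-lit" in caption else 0
        liveliness = 1 if "people" in caption else 0
        return {"safety": safety, "liveliness": liveliness}

    def plan_move(self, options: list) -> str:
        # Movement module stand-in: consult memory and prefer unvisited directions.
        visited = set(self.memory.retrieve())
        for option in options:
            if f"moved: {option}" not in visited:
                self.memory.store(f"moved: {option}")
                return option
        return options[0]
```

Because each observation and move is written to memory, the agent's "thought process" can later be queried by inspecting the stored entries, mirroring how the study retrieves findings from the agents' memory databases.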
Related papers
- Agent AI: Surveying the Horizons of Multimodal Interaction [83.18367129924997]
"Agent AI" is a class of interactive systems that can perceive visual stimuli, language inputs, and other environmentally-grounded data.
We envision a future where people can easily create any virtual reality or simulated scene and interact with agents embodied within the virtual environment.
arXiv Detail & Related papers (2024-01-07T19:11:18Z)
- Sim-to-Real Causal Transfer: A Metric Learning Approach to Causally-Aware Interaction Representations [62.48505112245388]
We take an in-depth look at the causal awareness of modern representations of agent interactions.
We show that recent representations are already partially resilient to perturbations of non-causal agents.
We propose a metric learning approach that regularizes latent representations with causal annotations.
arXiv Detail & Related papers (2023-12-07T18:57:03Z)
- Understanding Your Agent: Leveraging Large Language Models for Behavior Explanation [7.647395374489533]
We propose an approach to generate natural language explanations for an agent's behavior based only on observations of states and actions.
We show that our approach generates explanations as helpful as those produced by a human domain expert.
arXiv Detail & Related papers (2023-11-29T20:16:23Z)
- Embodied Agents for Efficient Exploration and Smart Scene Description [47.82947878753809]
We tackle a setting for visual navigation in which an autonomous agent needs to explore and map an unseen indoor environment.
We propose and evaluate an approach that combines recent advances in visual robotic exploration and image captioning.
Our approach can generate smart scene descriptions that maximize semantic knowledge of the environment and avoid repetitions.
arXiv Detail & Related papers (2023-01-17T19:28:01Z)
- I am Only Happy When There is Light: The Impact of Environmental Changes on Affective Facial Expressions Recognition [65.69256728493015]
We study the impact of different image conditions on the recognition of arousal from human facial expressions.
Our results show how the interpretation of human affective states can differ greatly in either the positive or negative direction.
arXiv Detail & Related papers (2022-10-28T16:28:26Z)
- MECCANO: A Multimodal Egocentric Dataset for Humans Behavior Understanding in the Industrial-like Domain [23.598727613908853]
We present MECCANO, a dataset of egocentric videos for studying human behavior understanding in industrial-like settings.
The multimodality is characterized by the presence of gaze signals, depth maps and RGB videos acquired simultaneously with a custom headset.
The dataset has been explicitly labeled for fundamental tasks in the context of human behavior understanding from a first person view.
arXiv Detail & Related papers (2022-09-19T00:52:42Z)
- What do navigation agents learn about their environment? [39.74076893981299]
We introduce the Interpretability System for Embodied agEnts (iSEE) for Point Goal and Object Goal navigation agents.
We use iSEE to probe the dynamic representations produced by these agents for the presence of information about the agent as well as the environment.
arXiv Detail & Related papers (2022-06-17T01:33:43Z)
- The Introspective Agent: Interdependence of Strategy, Physiology, and Sensing for Embodied Agents [51.94554095091305]
We argue for an introspective agent, which considers its own abilities in the context of its environment.
Just as in nature, we hope to reframe strategy as one tool, among many, to succeed in an environment.
arXiv Detail & Related papers (2022-01-02T20:14:01Z)
- Information is Power: Intrinsic Control via Information Capture [110.3143711650806]
We argue that a compact and general learning objective is to minimize the entropy of the agent's state visitation estimated using a latent state-space model.
This objective induces an agent to both gather information about its environment, corresponding to reducing uncertainty, and to gain control over its environment, corresponding to reducing the unpredictability of future world states.
arXiv Detail & Related papers (2021-12-07T18:50:42Z)
- Imitating Interactive Intelligence [24.95842455898523]
We study how to design artificial agents that can interact naturally with humans using the simplification of a virtual environment.
To build agents that can robustly interact with humans, we would ideally train them while they interact with humans.
We use ideas from inverse reinforcement learning to reduce the disparities between human-human and agent-agent interactive behaviour.
arXiv Detail & Related papers (2020-12-10T13:55:47Z)
- Causal Curiosity: RL Agents Discovering Self-supervised Experiments for Causal Representation Learning [24.163616087447874]
We introduce causal curiosity, a novel intrinsic reward.
We show that it allows our agents to learn optimal sequences of actions.
We also show that the knowledge of causal factor representations aids zero-shot learning for more complex tasks.
arXiv Detail & Related papers (2020-10-07T02:07:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.