Visions of Destruction: Exploring Human Impact on Nature by Navigating the Latent Space of a Diffusion Model via Gaze
- URL: http://arxiv.org/abs/2401.06361v1
- Date: Thu, 28 Dec 2023 15:55:11 GMT
- Title: Visions of Destruction: Exploring Human Impact on Nature by Navigating the Latent Space of a Diffusion Model via Gaze
- Authors: Mar Canet Sola and Varvara Guljajeva
- Abstract summary: This paper discusses the artwork "Visions of Destruction", with a primary conceptual focus on the Anthropocene.
The paper reviews early works in the history of interactive art that deploy eye-tracking as a method of audience interaction, and presents recent AI-aided artworks that demonstrate interactive latent-space navigation.
- Score: 2.7195102129095003
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: This paper discusses the artwork "Visions of Destruction", with a primary
conceptual focus on the Anthropocene, which is communicated through audience
interaction and generative AI as artistic research methods. Gaze-based
interaction transitions the audience from mere observers to agents of landscape
transformation, fostering a profound, on-the-edge engagement with pressing
issues such as climate change and planetary destruction. The paper reviews
early works in the history of interactive art that deploy eye-tracking as a
method of audience interaction, and presents recent AI-aided artworks that
demonstrate interactive latent-space navigation.
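The artwork's implementation is not published with the paper, but its core interaction, steering a diffusion model's latent representation with gaze input, can be illustrated with a minimal sketch. Everything below (the anchor latents, `gaze_to_direction`, `step_latent`, and the frame size) is a hypothetical stand-in, not the artists' code:

```python
import numpy as np

# Minimal sketch: steer a point through a latent space using gaze input.
# All names and the centre-of-screen heuristic are illustrative
# assumptions, not the artwork's published implementation.

LATENT_DIM = 512
rng = np.random.default_rng(0)

# Two anchor latents, e.g. an "intact landscape" and a "destroyed" one.
z_intact = rng.standard_normal(LATENT_DIM)
z_destroyed = rng.standard_normal(LATENT_DIM)

def gaze_to_direction(gaze_xy, frame_size=(1920, 1080)):
    """Map a gaze fixation to a scalar 'transformation pressure'.

    Here, dwelling near the screen centre pushes the latent toward the
    'destroyed' anchor; peripheral gaze relaxes the pressure (an assumption).
    """
    cx, cy = gaze_xy[0] / frame_size[0], gaze_xy[1] / frame_size[1]
    dist_from_centre = np.hypot(cx - 0.5, cy - 0.5)
    return 1.0 - np.clip(dist_from_centre / 0.5, 0.0, 1.0)

def step_latent(z, pressure, rate=0.05):
    """Move the current latent along the intact->destroyed axis."""
    direction = z_destroyed - z_intact
    direction /= np.linalg.norm(direction)
    return z + rate * pressure * direction

# Simulated interaction loop: one gaze sample per generated frame.
z = z_intact.copy()
for gaze_xy in [(960, 540), (970, 530), (1900, 50)]:
    z = step_latent(z, gaze_to_direction(gaze_xy))
    # Decoding z with the diffusion model would render the next frame.
```

In an installation built this way, each updated latent would be decoded into the next landscape frame, so sustained gaze would progressively transform the scene.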
Related papers
- Where Am I and What Will I See: An Auto-Regressive Model for Spatial Localization and View Prediction [60.964512894143475]
We present Generative Spatial Transformer (GST), a novel auto-regressive framework that jointly addresses spatial localization and view prediction.
Our model simultaneously estimates the camera pose from a single image and predicts the view from a new camera pose, effectively bridging the gap between spatial awareness and visual prediction.
arXiv Detail & Related papers (2024-10-24T17:58:05Z)
- Reflections on Disentanglement and the Latent Space [0.0]
The latent space of image generative models is a multi-dimensional space of compressed hidden visual knowledge.
This paper proposes a double view of the latent space, as a multi-dimensional archive of culture and as a multi-dimensional space of potentiality.
arXiv Detail & Related papers (2024-10-08T14:55:07Z)
- Visions of Destruction: Exploring a Potential of Generative AI in Interactive Art [2.3020018305241337]
This paper explores the potential of generative AI within interactive art, employing a practice-based research approach.
It presents the interactive artwork "Visions of Destruction" as a detailed case study, highlighting its innovative use of generative AI to create a dynamic, audience-responsive experience.
arXiv Detail & Related papers (2024-08-26T21:20:45Z)
- Learning from Observer Gaze: Zero-Shot Attention Prediction Oriented by Human-Object Interaction Recognition [13.956664101032006]
We first collect a novel gaze fixation dataset named IG, comprising 530,000 fixation points across 740 diverse interaction categories.
We then introduce the zero-shot interaction-oriented attention prediction task ZeroIA, which challenges models to predict visual cues for interactions not encountered during training.
Thirdly, we present the Interactive Attention model IA, designed to emulate human observers' cognitive processes to tackle the ZeroIA problem.
arXiv Detail & Related papers (2024-05-16T09:34:57Z)
- Gaze-guided Hand-Object Interaction Synthesis: Dataset and Method [63.49140028965778]
We present GazeHOI, the first dataset to capture simultaneous 3D modeling of gaze, hand, and object interactions.
Building on this dataset, we propose a stacked gaze-guided hand-object interaction diffusion model, named GHO-Diffusion.
We also introduce HOI-Manifold Guidance during the sampling stage of GHO-Diffusion, enabling fine-grained control over generated motions.
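The abstract describes HOI-Manifold Guidance only at a high level. A common way to realize such guidance is to inject a gradient-based correction into each reverse-diffusion step; the sketch below shows that generic pattern, with `denoise`, `manifold_distance`, and all tensor shapes as illustrative assumptions rather than the paper's actual method:

```python
import torch

# Generic sketch of guidance injected into a diffusion sampling loop,
# in the spirit of the "HOI-Manifold Guidance" named in the abstract.

def manifold_distance(x):
    # Placeholder "distance to the valid hand-object-interaction manifold":
    # here, simply penalise motions drifting away from the origin.
    return (x ** 2).mean()

def denoise(x_t, t):
    # Placeholder denoiser standing in for the trained diffusion model.
    return x_t * (1.0 - 1.0 / (t + 1))

def guided_sampling(shape=(1, 60, 99), steps=50, guidance_scale=0.1):
    x_t = torch.randn(shape)                     # noisy motion sequence
    for t in reversed(range(steps)):
        x_t = denoise(x_t, t)                    # ordinary reverse step
        with torch.enable_grad():                # guidance correction
            x = x_t.detach().requires_grad_(True)
            grad = torch.autograd.grad(manifold_distance(x), x)[0]
        x_t = x_t - guidance_scale * grad        # push toward the manifold
    return x_t

motion = guided_sampling()
```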
arXiv Detail & Related papers (2024-03-24T14:24:13Z)
- THOR: Text to Human-Object Interaction Diffusion via Relation Intervention [51.02435289160616]
We propose a novel Text-guided Human-Object Interaction diffusion model with Relation Intervention (THOR).
In each diffusion step, we initiate text-guided human and object motion and then leverage human-object relations to intervene in object motion.
We construct Text-BEHAVE, a Text2HOI dataset that seamlessly integrates textual descriptions with the currently largest publicly available 3D HOI dataset.
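The per-step relation intervention is likewise described only at the abstract level. A minimal sketch, assuming the relation is a simple contact constraint pulling the object toward the hand, might interleave the correction with each denoising update (all names and shapes below are hypothetical):

```python
import torch

# Sketch of the per-step "relation intervention" idea: after each
# text-guided denoising step proposes human and object motion, the
# object motion is corrected using a human-object relation.

def intervene_object_motion(human, obj, contact_weight=0.5):
    """Pull the object trajectory toward the human's hand trajectory."""
    hand = human[..., :3]            # assume first 3 pose dims = hand xyz
    return obj + contact_weight * (hand - obj)

def diffusion_step(human, obj, t):
    # Placeholder for one text-guided reverse-diffusion update.
    return human * 0.99, obj * 0.99

human = torch.randn(1, 60, 66)   # (batch, frames, pose dims)
obj = torch.randn(1, 60, 3)      # (batch, frames, object xyz)
for t in reversed(range(50)):
    human, obj = diffusion_step(human, obj, t)
    obj = intervene_object_motion(human, obj)
```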
arXiv Detail & Related papers (2024-03-17T13:17:25Z)
- Disentangled Interaction Representation for One-Stage Human-Object Interaction Detection [70.96299509159981]
Human-Object Interaction (HOI) detection is a core task for human-centric image understanding.
Recent one-stage methods adopt a transformer decoder to collect image-wide cues that are useful for interaction prediction.
Traditional two-stage methods benefit significantly from their ability to compose interaction features in a disentangled and explainable manner.
arXiv Detail & Related papers (2023-12-04T08:02:59Z)
- Embodied Agents for Efficient Exploration and Smart Scene Description [47.82947878753809]
We tackle a setting for visual navigation in which an autonomous agent needs to explore and map an unseen indoor environment.
We propose and evaluate an approach that combines recent advances in visual robotic exploration with image captioning.
Our approach can generate smart scene descriptions that maximize semantic knowledge of the environment and avoid repetitions.
arXiv Detail & Related papers (2023-01-17T19:28:01Z)
- Sonic Interactions in Virtual Environments: the Egocentric Audio Perspective of the Digital Twin [6.639956135839834]
This chapter aims to transform studies related to sonic interactions in virtual environments into a research field equipped with the egocentric perspective of the auditory digital twin.
The guardian of such a locus of agency is the auditory digital twin, which fosters intra-actions between humans and technology.
arXiv Detail & Related papers (2022-04-20T13:17:39Z)
- GIMO: Gaze-Informed Human Motion Prediction in Context [75.52839760700833]
We propose a large-scale human motion dataset that delivers high-quality body pose sequences, scene scans, and ego-centric views with eye gaze.
Our data collection is not tied to specific scenes, which further boosts the motion dynamics observed from our subjects.
To realize the full potential of gaze, we propose a novel network architecture that enables bidirectional communication between the gaze and motion branches.
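"Bidirectional communication between the gaze and motion branches" can be sketched as a pair of cross-attention layers in which each branch queries the other. The module below is one plausible realization under that assumption, not the GIMO architecture itself:

```python
import torch
import torch.nn as nn

# Sketch of bidirectional gaze-motion communication via paired
# cross-attention. Dimensions and layout are assumptions.

class BidirectionalGazeMotion(nn.Module):
    def __init__(self, dim=128, heads=4):
        super().__init__()
        self.gaze_to_motion = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.motion_to_gaze = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, gaze_feats, motion_feats):
        # Motion queries attend to gaze features, and vice versa,
        # so each branch informs the other.
        motion_out, _ = self.gaze_to_motion(motion_feats, gaze_feats, gaze_feats)
        gaze_out, _ = self.motion_to_gaze(gaze_feats, motion_feats, motion_feats)
        return gaze_feats + gaze_out, motion_feats + motion_out

gaze = torch.randn(2, 30, 128)     # (batch, time, feature)
motion = torch.randn(2, 30, 128)
gaze2, motion2 = BidirectionalGazeMotion()(gaze, motion)
```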
arXiv Detail & Related papers (2022-04-20T13:17:39Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and accepts no responsibility for any consequences arising from its use.