Attention is All You Want: Machinic Gaze and the Anthropocene
- URL: http://arxiv.org/abs/2405.09734v1
- Date: Thu, 16 May 2024 00:00:53 GMT
- Title: Attention is All You Want: Machinic Gaze and the Anthropocene
- Authors: Liam Magee, Vanicka Arora
- Abstract summary: This chapter experiments with ways computational vision interprets and synthesises representations of the Anthropocene.
We examine how this emergent machinic gaze both looks out, through its compositions of futuristic landscapes, and looks back, towards an observing and observed human subject.
In its varied assistive, surveillant and generative roles, computational vision not only mirrors human desire but articulates oblique demands of its own.
- Score: 2.4554686192257424
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: This chapter experiments with ways computational vision interprets and synthesises representations of the Anthropocene. Text-to-image systems such as MidJourney and StableDiffusion, trained on large data sets of harvested images and captions, yield often striking compositions that serve, alternately, as banal reproduction, alien imaginary and refracted commentary on the preoccupations of Internet visual culture. While the effects of AI on visual culture may themselves be transformative or catastrophic, we are more interested here in how it has been trained to imagine shared human, technical and ecological futures. Through a series of textual prompts that marry elements of the Anthropocenic and Australian environmental vernacular, we examine how this emergent machinic gaze both looks out, through its compositions of futuristic landscapes, and looks back, towards an observing and observed human subject. In its varied assistive, surveillant and generative roles, computational vision not only mirrors human desire but articulates oblique demands of its own.
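A minimal sketch of the kind of prompting workflow the abstract describes, using the open-source Hugging Face `diffusers` library. The checkpoint and prompt below are illustrative assumptions, not the authors' actual setup: the chapter names MidJourney and StableDiffusion but publishes no code.

```python
# Illustrative text-to-image prompting workflow (assumed, not the authors').
# Requires a CUDA GPU and the `diffusers` and `torch` packages.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# A hypothetical prompt marrying Anthropocenic and Australian environmental vernacular.
prompt = "abandoned open-cut mine reclaimed by eucalypt forest, drought haze, aerial view"
image = pipe(prompt, num_inference_steps=50, guidance_scale=7.5).images[0]
image.save("machinic_gaze.png")
```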
Related papers
- When Does Perceptual Alignment Benefit Vision Representations? [76.32336818860965]
We investigate how aligning vision model representations to human perceptual judgments impacts their usability.
We find that aligning models to perceptual judgments yields representations that improve upon the original backbones across many downstream tasks.
Our results suggest that injecting an inductive bias about human perceptual knowledge into vision models can contribute to better representations.
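A hedged sketch of one way such alignment can be implemented: fine-tuning a backbone with a triplet loss so that the image humans judged more similar to a reference lies closer in embedding space. The backbone, cosine distance, and margin are assumptions for illustration, not the paper's exact recipe.

```python
# Sketch of perceptual alignment via a triplet loss over human judgments.
import torch
import torch.nn.functional as F
import torchvision.models as models

backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()  # use pooled features as the representation

def perceptual_triplet_loss(ref, pos, neg, margin=0.05):
    """`pos` is the image humans judged more similar to `ref` than `neg`."""
    z_ref, z_pos, z_neg = (F.normalize(backbone(x), dim=-1) for x in (ref, pos, neg))
    d_pos = 1 - (z_ref * z_pos).sum(-1)  # cosine distance to preferred image
    d_neg = 1 - (z_ref * z_neg).sum(-1)
    return F.relu(d_pos - d_neg + margin).mean()
```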
arXiv Detail & Related papers (2024-10-14T17:59:58Z)
- For a semiotic AI: Bridging computer vision and visual semiotics for computational observation of large scale facial image archives [3.418398936676879]
This work presents FRESCO, a framework designed to explore the socio-cultural implications of images on social media platforms at scale.
FRESCO deconstructs images into numerical and categorical variables using state-of-the-art computer vision techniques.
The framework analyzes images across three levels: the plastic level, encompassing fundamental visual features like lines and colors; the figurative level, representing specific entities or concepts; and the enunciation level, which focuses particularly on constructing the point of view of the spectator and observer.
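An illustrative sketch of the first two levels under stated assumptions: plastic features as raw colour statistics, and figurative content as entities recognised by a stand-in ImageNet classifier. FRESCO's actual feature extractors are not reproduced here.

```python
# Two-level image deconstruction sketch (plastic + figurative), assumed models.
import torch
import torchvision.models as models
from torchvision.models import ResNet50_Weights
from torchvision.transforms.functional import to_tensor

weights = ResNet50_Weights.DEFAULT
classifier = models.resnet50(weights=weights).eval()
preprocess = weights.transforms()

def analyse(image):  # image: PIL.Image
    raw = to_tensor(image)  # [3, H, W] in [0, 1]
    plastic = {"mean_rgb": raw.mean(dim=(1, 2)).tolist()}  # low-level colour
    with torch.no_grad():
        probs = classifier(preprocess(image).unsqueeze(0)).softmax(-1)[0]
    figurative = [weights.meta["categories"][i] for i in probs.topk(3).indices]
    return {"plastic": plastic, "figurative": figurative}
```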
arXiv Detail & Related papers (2024-07-03T16:57:38Z)
- (Re)framing Built Heritage through the Machinic Gaze [3.683202928838613]
We argue that the proliferation of machine learning and vision technologies create new scopic regimes for heritage.
We introduce the term 'machinic gaze' to conceptualise the reconfiguration of heritage representation via AI models.
arXiv Detail & Related papers (2023-10-06T23:48:01Z)
- Contextually-rich human affect perception using multimodal scene information [36.042369831043686]
We leverage pretrained vision-language models (VLMs) to extract descriptions of foreground context from images.
We propose a multimodal context fusion (MCF) module to combine foreground cues with the visual scene and person-based contextual information for emotion prediction.
We show the effectiveness of our proposed modular design on two datasets associated with natural scenes and TV shows.
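A minimal sketch of a fusion head in this spirit: text features from a VLM-generated foreground description are concatenated with visual scene and person features and projected to emotion logits. All dimensions and the concat-plus-MLP design are assumptions, not the paper's MCF module.

```python
# Assumed multimodal context-fusion head for emotion prediction.
import torch
import torch.nn as nn

class ContextFusion(nn.Module):
    def __init__(self, text_dim=512, scene_dim=2048, person_dim=2048, n_emotions=26):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(text_dim + scene_dim + person_dim, 512),
            nn.ReLU(),
            nn.Linear(512, n_emotions),
        )

    def forward(self, text_feat, scene_feat, person_feat):
        fused = torch.cat([text_feat, scene_feat, person_feat], dim=-1)
        return self.mlp(fused)  # emotion logits
```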
arXiv Detail & Related papers (2023-03-13T07:46:41Z)
- A domain adaptive deep learning solution for scanpath prediction of paintings [66.46953851227454]
This paper focuses on the eye-movement analysis of viewers during the visual experience of a certain number of paintings.
We introduce a new approach to predicting human visual attention, a process that underpins several human cognitive functions.
The proposed new architecture ingests images and returns scanpaths, a sequence of points featuring a high likelihood of catching viewers' attention.
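A toy sketch of that image-to-scanpath interface, assuming a generic CNN encoder and a recurrent head that emits a fixed-length sequence of normalised (x, y) fixations; the paper's domain-adaptive architecture differs.

```python
# Assumed image -> scanpath interface; a stand-in, not the paper's model.
import torch
import torch.nn as nn

class ScanpathPredictor(nn.Module):
    def __init__(self, seq_len=10):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.rnn = nn.GRU(32, 64, batch_first=True)
        self.head = nn.Linear(64, 2)  # (x, y) in [0, 1]
        self.seq_len = seq_len

    def forward(self, image):
        feat = self.encoder(image)  # [B, 32]
        steps = feat.unsqueeze(1).repeat(1, self.seq_len, 1)
        out, _ = self.rnn(steps)
        return torch.sigmoid(self.head(out))  # [B, seq_len, 2] fixation points
```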
arXiv Detail & Related papers (2022-09-22T22:27:08Z)
- Peripheral Vision Transformer [52.55309200601883]
We take a biologically inspired approach and explore modelling peripheral vision in deep neural networks for visual recognition.
We propose to incorporate peripheral position encoding into the multi-head self-attention layers to let the network learn, from training data, to partition the visual field into diverse peripheral regions.
We evaluate the proposed network, dubbed PerViT, on the large-scale ImageNet dataset and systematically investigate the inner workings of the model for machine perception.
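A hedged sketch of the core idea: a small MLP maps query-key distances on the token grid to a per-head additive attention bias, so each head can learn its own "peripheral" region. This illustrates the mechanism only; PerViT's exact parameterisation is not reproduced.

```python
# Assumed peripheral position bias added to self-attention logits.
import torch
import torch.nn as nn

class PeripheralBias(nn.Module):
    def __init__(self, n_heads, grid):
        super().__init__()
        ys, xs = torch.meshgrid(torch.arange(grid), torch.arange(grid), indexing="ij")
        pos = torch.stack([ys, xs], -1).flatten(0, 1).float()  # [N, 2] token coords
        dist = torch.cdist(pos, pos).unsqueeze(-1)  # [N, N, 1] pairwise distances
        self.register_buffer("dist", dist / grid)  # normalise by grid size
        self.mlp = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, n_heads))

    def forward(self, attn_logits):  # attn_logits: [B, heads, N, N]
        bias = self.mlp(self.dist).permute(2, 0, 1)  # [heads, N, N]
        return attn_logits + bias.unsqueeze(0)
```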
arXiv Detail & Related papers (2022-06-14T12:47:47Z)
- GIMO: Gaze-Informed Human Motion Prediction in Context [75.52839760700833]
We propose a large-scale human motion dataset that delivers high-quality body pose sequences, scene scans, and ego-centric views with eye gaze.
Our data collection is not tied to specific scenes, which further boosts the motion dynamics observed from our subjects.
To realize the full potential of gaze, we propose a novel network architecture that enables bidirectional communication between the gaze and motion branches.
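An illustrative sketch of such bidirectional communication, assuming two cross-attention blocks with residual updates so each branch can query the other; the feature dimensions are placeholders, not GIMO's actual design.

```python
# Assumed bidirectional bridge between gaze and motion feature branches.
import torch.nn as nn

class BidirectionalBridge(nn.Module):
    def __init__(self, dim=256, n_heads=4):
        super().__init__()
        self.gaze_from_motion = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.motion_from_gaze = nn.MultiheadAttention(dim, n_heads, batch_first=True)

    def forward(self, gaze, motion):  # each [B, T, dim]
        g, _ = self.gaze_from_motion(gaze, motion, motion)  # gaze queries motion
        m, _ = self.motion_from_gaze(motion, gaze, gaze)    # motion queries gaze
        return gaze + g, motion + m  # residual updates to both branches
```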
arXiv Detail & Related papers (2022-04-20T13:17:39Z)
- Comparing Visual Reasoning in Humans and AI [66.89451296340809]
We created a dataset of complex scenes that contained human behaviors and social interactions.
We used a quantitative metric of similarity between the AI's or a human's scene description and a ground truth formed from five other human descriptions of each scene.
Results show that machine/human agreement on scene descriptions is much lower than human/human agreement for our complex scenes.
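One plausible way to compute such an agreement score, sketched with sentence embeddings and mean cosine similarity against the human references; the paper's actual similarity metric is not reproduced here.

```python
# Assumed description-agreement score using sentence embeddings.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

def agreement(candidate: str, references: list[str]) -> float:
    cand = model.encode(candidate, convert_to_tensor=True)
    refs = model.encode(references, convert_to_tensor=True)
    return util.cos_sim(cand, refs).mean().item()  # mean similarity to references

# Hypothetical usage with made-up descriptions:
score = agreement(
    "two people argue near a parked car",
    ["a couple disputes beside their car", "an argument on the street"],
)
```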
arXiv Detail & Related papers (2021-04-29T04:44:13Z)
- Style and Pose Control for Image Synthesis of Humans from a Single Monocular View [78.6284090004218]
StylePoseGAN extends a non-controllable generator to accept conditioning of pose and appearance separately.
Our network can be trained in a fully supervised way with human images to disentangle pose, appearance and body parts.
StylePoseGAN achieves state-of-the-art image generation fidelity on common perceptual metrics.
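A minimal sketch of the separate-conditioning idea, assuming toy encoders and a toy generator: pose and appearance are encoded independently, so one can be swapped while the other is held fixed. These networks are illustrative stand-ins, not StylePoseGAN's architecture.

```python
# Assumed separate pose/appearance conditioning of a generator.
import torch
import torch.nn as nn

pose_enc = nn.Sequential(nn.Conv2d(3, 64, 4, 2, 1), nn.ReLU(),
                         nn.AdaptiveAvgPool2d(1), nn.Flatten())
app_enc = nn.Sequential(nn.Conv2d(3, 64, 4, 2, 1), nn.ReLU(),
                        nn.AdaptiveAvgPool2d(1), nn.Flatten())
generator = nn.Sequential(nn.Linear(128, 3 * 64 * 64), nn.Tanh())

def synthesise(pose_img, appearance_img):
    # Swap `appearance_img` to re-dress the same pose, or vice versa.
    z = torch.cat([pose_enc(pose_img), app_enc(appearance_img)], dim=-1)
    return generator(z).view(-1, 3, 64, 64)
```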
arXiv Detail & Related papers (2021-02-22T18:50:47Z)
- Learning to See: You Are What You See [3.0709727531116617]
The artwork explores bias in artificial neural networks and provides mechanisms for the manipulation of real-world representations.
The exploration of these representations acts as a metaphor for the process of developing a visual understanding and/or visual vocabulary of the world.
arXiv Detail & Related papers (2020-02-28T07:12:52Z)
This list is automatically generated from the titles and abstracts of the papers on this site.