Optimization and Mobile Deployment for Anthropocene Neural Style Transfer
- URL: http://arxiv.org/abs/2601.21141v1
- Date: Thu, 29 Jan 2026 00:50:03 GMT
- Title: Optimization and Mobile Deployment for Anthropocene Neural Style Transfer
- Authors: Po-Hsun Chen, Ivan C. H. Liu
- Abstract summary: AnthropoCam is a mobile-based neural style transfer system optimized for the visual synthesis of Anthropocene environments. The system integrates a React Native frontend with a Flask-based GPU backend, achieving high-resolution inference within 3-5 seconds on general mobile hardware.
- Score: 0.3867363075280543
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper presents AnthropoCam, a mobile-based neural style transfer (NST) system optimized for the visual synthesis of Anthropocene environments. Unlike conventional artistic NST, which prioritizes painterly abstraction, stylizing human-altered landscapes demands a careful balance between amplifying material textures and preserving semantic legibility. Industrial infrastructures, waste accumulations, and modified ecosystems contain dense, repetitive patterns that are visually expressive yet highly susceptible to semantic erosion under aggressive style transfer. To address this challenge, we systematically investigate the impact of NST parameter configurations on the visual translation of Anthropocene textures, including feature layer selection, style and content loss weighting, training stability, and output resolution. Through controlled experiments, we identify an optimal parameter manifold that maximizes stylistic expression while preventing semantic erasure. Our results demonstrate that appropriate combinations of convolutional depth, loss ratios, and resolution scaling enable the faithful transformation of anthropogenic material properties into a coherent visual language. Building on these findings, we implement a low-latency, feed-forward NST pipeline deployed on mobile devices. The system integrates a React Native frontend with a Flask-based GPU backend, achieving high-resolution inference within 3-5 seconds on general mobile hardware. This enables real-time, in-situ visual intervention at the site of image capture, supporting participatory engagement with Anthropocene landscapes. By coupling domain-specific NST optimization with mobile deployment, AnthropoCam reframes neural style transfer as a practical and expressive tool for real-time environmental visualization in the Anthropocene.
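The abstract's central technical claim concerns tuning the NST loss configuration (feature layer selection, style/content weighting) to balance texture amplification against semantic legibility. The paper's implementation is not included here, but a minimal PyTorch sketch of the standard Gatys-style weighted loss this line of work builds on may help; the VGG-19 layer indices and the alpha/beta defaults below are illustrative assumptions, not the authors' reported configuration.

```python
# Hypothetical sketch of the content/style loss trade-off the paper studies.
# Layer choices and the alpha/beta weights are assumptions for illustration.
import torch
import torch.nn.functional as F
from torchvision.models import vgg19, VGG19_Weights

# Frozen VGG-19 feature extractor, as is conventional for NST.
vgg = vgg19(weights=VGG19_Weights.DEFAULT).features.eval()
for p in vgg.parameters():
    p.requires_grad_(False)

CONTENT_LAYERS = {21}               # conv4_2 (assumed choice)
STYLE_LAYERS = {0, 5, 10, 19, 28}   # conv1_1 .. conv5_1 (assumed)

def features(x: torch.Tensor) -> dict:
    """Collect activations at the chosen layers."""
    feats = {}
    for i, layer in enumerate(vgg):
        x = layer(x)
        if i in CONTENT_LAYERS | STYLE_LAYERS:
            feats[i] = x
    return feats

def gram(f: torch.Tensor) -> torch.Tensor:
    """Gram matrix: channel-wise feature correlations, size-normalized."""
    b, c, h, w = f.shape
    f = f.view(b, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)

def nst_loss(output, content, style, alpha=1.0, beta=1e5):
    """Weighted content + style loss; the alpha/beta ratio is the knob that
    trades texture amplification against semantic erosion."""
    fo, fc, fs = features(output), features(content), features(style)
    l_content = sum(F.mse_loss(fo[i], fc[i]) for i in CONTENT_LAYERS)
    l_style = sum(F.mse_loss(gram(fo[i]), gram(fs[i])) for i in STYLE_LAYERS)
    return alpha * l_content + beta * l_style
```

In a feed-forward setup like the one the paper deploys, a loss of this form would supervise the training of a fast transformation network; at inference time only that network runs, which is what makes the reported 3-5 second mobile round trip plausible.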
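On the deployment side, the abstract describes a React Native client posting captured photos to a Flask GPU backend that runs the trained feed-forward stylizer. A minimal, hypothetical endpoint sketch follows; the route name, checkpoint file, and JPEG response encoding are assumptions rather than details from the paper.

```python
# Hypothetical Flask backend for a feed-forward NST service. The mobile app
# POSTs a photo as multipart form data; the stylized JPEG is returned.
import io

import torch
from flask import Flask, request, send_file
from PIL import Image
from torchvision import transforms

app = Flask(__name__)
device = "cuda" if torch.cuda.is_available() else "cpu"
# Assumed TorchScript checkpoint of the trained transformation network.
model = torch.jit.load("anthropocam_style.pt", map_location=device).eval()

@app.route("/stylize", methods=["POST"])
def stylize():
    # Decode the uploaded photo from the React Native client.
    img = Image.open(request.files["image"].stream).convert("RGB")
    x = transforms.ToTensor()(img).unsqueeze(0).to(device)
    with torch.no_grad():
        y = model(x).clamp(0, 1).squeeze(0).cpu()
    buf = io.BytesIO()
    transforms.ToPILImage()(y).save(buf, format="JPEG", quality=90)
    buf.seek(0)
    return send_file(buf, mimetype="image/jpeg")

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```

Shipping a TorchScript checkpoint keeps the server free of model-definition code; the client only needs to POST an `image` field and render the returned JPEG.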
Related papers
- DAV-GSWT: Diffusion-Active-View Sampling for Data-Efficient Gaussian Splatting Wang Tiles [9.641815204004823]
3D Gaussian Splatting has redefined the capabilities of photorealistic neural rendering. DAV-GSWT is a framework that leverages diffusion priors and active view sampling to synthesize high-fidelity Wang Tiles. Our system significantly reduces the required data volume while maintaining the visual integrity and interactive performance necessary for large-scale virtual environments.
arXiv Detail & Related papers (2026-02-17T04:47:39Z) - Dynamic Avatar-Scene Rendering from Human-centric Context [75.95641456716373]
We propose a Separate-then-Map (StM) strategy to bridge separately defined and optimized models. StM significantly outperforms existing state-of-the-art methods in both visual quality and rendering accuracy.
arXiv Detail & Related papers (2025-11-13T17:39:06Z) - Hybrid CNN-ViT Framework for Motion-Blurred Scene Text Restoration [2.0855516369698845]
We introduce a hybrid deep learning framework that combines convolutional neural networks (CNNs) with vision transformers (ViTs). The architecture employs a CNN-based encoder-decoder to preserve structural details, while a transformer module enhances global awareness through self-attention. We show that the proposed method attains 32.20 dB in PSNR and 0.934 in SSIM, while remaining lightweight with 2.83 million parameters and an average inference time of 61 ms.
arXiv Detail & Related papers (2025-11-08T17:48:58Z) - Vision At Night: Exploring Biologically Inspired Preprocessing For Improved Robustness Via Color And Contrast Transformations [18.437759539809175]
We explore biologically motivated input preprocessing for robust semantic segmentation. By applying Difference-of-Gaussians (DoG) filtering to RGB, grayscale, and opponent-color channels, we enhance local contrast without modifying model architecture or training. We show that such preprocessing maintains in-distribution performance while improving robustness to adverse conditions like night, fog, and snow.
arXiv Detail & Related papers (2025-09-29T14:48:32Z) - HoliGS: Holistic Gaussian Splatting for Embodied View Synthesis [59.25751939710903]
We propose a novel deformable Gaussian splatting framework that addresses embodied view synthesis from long monocular RGB videos. Our method leverages invertible Gaussian Splatting deformation networks to reconstruct large-scale, dynamic environments accurately. Results highlight a practical and scalable solution for EVS in real-world scenarios.
arXiv Detail & Related papers (2025-06-24T03:54:40Z) - Stretching Beyond the Obvious: A Gradient-Free Framework to Unveil the Hidden Landscape of Visual Invariance [9.346027495459039]
Stretch-and-Squeeze (SnS) is an unbiased, model-agnostic, and gradient-free framework to characterize a unit's invariance landscape. SnS seeks perturbations that maximally alter the representation of a reference stimulus in a given processing stage while preserving unit activation. Applied to convolutional neural networks (CNNs), SnS revealed image variations that were further from a reference image in pixel space than those produced by affine transformations.
arXiv Detail & Related papers (2025-06-20T14:49:35Z) - VRS-UIE: Value-Driven Reordering Scanning for Underwater Image Enhancement [104.78586859995333]
State Space Models (SSMs) have emerged as a promising backbone for vision tasks due to their linear complexity and global receptive field. Large, homogeneous, but uninformative oceanic backgrounds can dilute the feature responses of sparse yet valuable targets. We propose a novel Value-Driven Reordering Scanning framework for Underwater Image Enhancement (UIE). Our framework sets a new state of the art, delivering superior enhancement performance (surpassing WMamba by 0.89 dB on average) by effectively suppressing water bias and preserving structural and color fidelity.
arXiv Detail & Related papers (2025-05-02T12:21:44Z) - Neural Neighbor Style Transfer [31.746423262728598]
We propose a pipeline that offers state-of-the-art quality, generalization, and competitive efficiency for artistic style transfer.
Our approach is based on explicitly replacing neural features extracted from the content input with those from a style exemplar, then synthesizing the final output.
arXiv Detail & Related papers (2022-03-24T17:11:31Z) - Wide and Narrow: Video Prediction from Context and Motion [54.21624227408727]
We propose a new framework that integrates these complementary attributes (global context and local motion) to predict complex pixel dynamics through deep networks.
We present global context propagation networks that aggregate the non-local neighboring representations to preserve the contextual information over the past frames.
We also devise local filter memory networks that generate adaptive filter kernels by storing the motion of moving objects in the memory.
arXiv Detail & Related papers (2021-10-22T04:35:58Z) - Controllable Person Image Synthesis with Spatially-Adaptive Warped Normalization [72.65828901909708]
Controllable person image generation aims to produce realistic human images with desirable attributes.
We introduce a novel Spatially-Adaptive Warped Normalization (SAWN), which integrates a learned flow-field to warp modulation parameters.
We propose a novel self-training part replacement strategy to refine the pretrained model for the texture-transfer task.
arXiv Detail & Related papers (2021-05-31T07:07:44Z) - Intriguing Properties of Vision Transformers [114.28522466830374]
Vision transformers (ViT) have demonstrated impressive performance across various machine vision problems.
We systematically study this question via an extensive set of experiments and comparisons with a high-performing convolutional neural network (CNN).
We show that the effective features of ViTs are due to flexible and dynamic receptive fields made possible by the self-attention mechanism.
arXiv Detail & Related papers (2021-05-21T17:59:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.