Generative Adversarial Gumbel MCTS for Abstract Visual Composition Generation
- URL: http://arxiv.org/abs/2512.01242v1
- Date: Mon, 01 Dec 2025 03:38:44 GMT
- Title: Generative Adversarial Gumbel MCTS for Abstract Visual Composition Generation
- Authors: Zirui Zhao, Boye Niu, David Hsu, Wee Sun Lee
- Abstract summary: We study abstract visual composition in which identity is determined by the configuration and relations among a small set of geometric primitives. An AlphaGo-style search enforces feasibility, while a fine-tuned vision-language model scores semantic alignment as reward signals. Inspired by the Generative Adversarial Network, we use the generated instances for adversarial reward refinement.
- Score: 29.755551944026738
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study abstract visual composition, in which identity is primarily determined by the spatial configuration and relations among a small set of geometric primitives (e.g., parts, symmetry, topology). Such compositions are largely invariant to texture and photorealistic detail. Composing such structures from fixed components under geometric constraints and vague goal specifications (such as text) is non-trivial due to combinatorial placement choices, limited data, and discrete feasibility (overlap-free, allowable orientations), which create a sparse solution manifold ill-suited to purely statistical pixel-space generators. We propose a constraint-guided framework that combines explicit geometric reasoning with neural semantics. An AlphaGo-style search enforces feasibility, while a fine-tuned vision-language model scores semantic alignment as reward signals. Our algorithm uses a policy network as a heuristic in Monte-Carlo Tree Search and fine-tunes the network via search-generated plans. Inspired by the Generative Adversarial Network, we use the generated instances for adversarial reward refinement. Over time, generation should approach the actual data more closely, until the reward model cannot distinguish generated instances from the ground truth. In the Tangram Assembly task, our approach yields higher validity and semantic fidelity than diffusion and auto-regressive baselines, especially as constraints tighten.
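The abstract does not spell out how the search picks candidate placements. As a rough illustration only, the sketch below shows the general Gumbel-top-k trick, which samples several distinct actions from policy logits while skipping infeasible ones. It is not the authors' implementation: the function name `gumbel_top_k`, the `feasible` mask, and the toy logits are all hypothetical.

```python
import math
import random


def gumbel_top_k(logits, feasible, k, rng):
    """Sample up to k distinct feasible actions via the Gumbel-top-k trick.

    Hypothetical helper for illustration; not taken from the paper.
    logits   -- policy scores, one per candidate action
    feasible -- booleans, False where a placement violates a constraint
    """
    scored = []
    for action, (logit, ok) in enumerate(zip(logits, feasible)):
        if not ok:
            continue  # infeasible placements are never candidates
        # Clamp the uniform draw so both logarithms stay finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel_noise = -math.log(-math.log(u))
        scored.append((logit + gumbel_noise, action))
    # The k largest perturbed scores form a sample without replacement.
    scored.sort(reverse=True)
    return [action for _, action in scored[:k]]


# Toy usage: four candidate placements, the second one overlaps a piece.
rng = random.Random(0)
candidates = gumbel_top_k([0.1, 2.0, -1.0, 0.5], [True, False, True, True], 2, rng)
```

Masking before sampling means the search never spends simulations on placements that break the geometric constraints, which is one plausible way to read "search enforces feasibility".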
Related papers
- Adaptive Edge Learning for Density-Aware Graph Generation [0.0]
We propose a density-aware conditional graph generation framework using Wasserstein GANs (WGAN). A differentiable edge predictor determines pairwise relationships directly from node embeddings. A density-aware selection mechanism adaptively controls edge density to match class-specific sparsity distributions.
arXiv Detail & Related papers (2026-01-30T15:01:50Z)
- Flatten The Complex: Joint B-Rep Generation via Compositional $k$-Cell Particles [22.846357150067927]
Boundary Representation (B-Rep) is the widely adopted standard in Computer-Aided Design (CAD) and modeling. Previous methods rely on sequences to handle this hierarchy, which fails to fully exploit the geometric relationships between cells.
arXiv Detail & Related papers (2026-01-25T08:00:28Z)
- TangramPuzzle: Evaluating Multimodal Large Language Models with Compositional Spatial Reasoning [104.66714520975837]
We introduce a geometry-grounded benchmark designed to evaluate compositional spatial reasoning through the lens of the classic Tangram game. We propose the Tangram Construction Expression (TCE), a symbolic geometric framework that grounds tangram assemblies in exact, machine-verifiable coordinate specifications. We conduct extensive evaluation experiments on advanced open-source and proprietary models, revealing an interesting insight: MLLMs tend to prioritize matching the target silhouette while neglecting geometric constraints.
arXiv Detail & Related papers (2026-01-23T07:35:05Z)
- Latent Structural Similarity Networks for Unsupervised Discovery in Multivariate Time Series [0.0]
The method learns window-level sequence representations using an unsupervised sequence-to-sequence autoencoder. It induces a sparse similarity network by thresholding a latent-space similarity measure. This network is intended as an analyzable abstraction that compresses the pairwise search space.
arXiv Detail & Related papers (2026-01-15T03:05:17Z)
- SpaceMesh: A Continuous Representation for Learning Manifold Surface Meshes [61.110517195874074]
We present a scheme to directly generate manifold, polygonal meshes of complex connectivity as the output of a neural network. Our key innovation is to define a continuous latent connectivity space at each mesh vertex, which implies the discrete mesh. In applications, this approach not only yields high-quality outputs from generative models, but also enables directly learning challenging geometry processing tasks such as mesh repair.
arXiv Detail & Related papers (2024-09-30T17:59:03Z)
- Generation of Uncertainty-Aware Emergent Concepts in Factorized 3D Scene Graphs via Graph Neural Networks [14.276364545017222]
This paper presents a learning-based method to generate online spatial emergent concepts as optimizable factors within a SLAM backend. In both simulated and real indoor scenarios, our approach improves complex concept detection by 20.7% and 5.3%, trajectory estimation by 19.2%, and map reconstruction by 12.3% and 3.8%, respectively.
arXiv Detail & Related papers (2024-09-18T13:24:44Z)
- SC2GAN: Rethinking Entanglement by Self-correcting Correlated GAN Space [16.040942072859075]
In Generative Adversarial Networks, following an editing direction for one attribute can result in entangled changes to other attributes.
We propose a novel framework, SC$^2$GAN, that achieves disentanglement by re-projecting low-density latent code samples in the original latent space.
arXiv Detail & Related papers (2023-10-10T14:42:32Z)
- GSMFlow: Generation Shifts Mitigating Flow for Generalized Zero-Shot Learning [55.79997930181418]
Generalized Zero-Shot Learning aims to recognize images from both the seen and unseen classes by transferring semantic knowledge from seen to unseen classes.
A promising solution is to use generative models to hallucinate realistic unseen samples based on the knowledge learned from the seen classes.
We propose a novel flow-based generative framework that consists of multiple conditional affine coupling layers for learning unseen data generation.
arXiv Detail & Related papers (2022-07-05T04:04:37Z)
- Temporally-Consistent Surface Reconstruction using Metrically-Consistent Atlases [131.50372468579067]
We propose a method for unsupervised reconstruction of a temporally-consistent sequence of surfaces from a sequence of time-evolving point clouds.
We represent the reconstructed surfaces as atlases computed by a neural network, which enables us to establish correspondences between frames.
Our approach outperforms state-of-the-art ones on several challenging datasets.
arXiv Detail & Related papers (2021-11-12T17:48:25Z)
- IGAN: Inferent and Generative Adversarial Networks [0.0]
IGAN learns both a generative and an inference model on a complex high dimensional data distribution.
It extends the traditional GAN framework with inference by rewriting the adversarial strategy in both the image and the latent space.
arXiv Detail & Related papers (2021-09-27T21:48:35Z)
- Towards Efficient Scene Understanding via Squeeze Reasoning [71.1139549949694]
We propose a novel framework called Squeeze Reasoning.
Instead of propagating information on the spatial map, we first learn to squeeze the input feature into a channel-wise global vector.
We show that our approach can be modularized as an end-to-end trained block and can be easily plugged into existing networks.
arXiv Detail & Related papers (2020-11-06T12:17:01Z)
- Extended Stochastic Block Models with Application to Criminal Networks [3.2211782521637393]
We study covert networks that encode relationships among criminals.
The coexistence of noisy block patterns limits the reliability of routinely-used community detection algorithms.
We develop a new class of extended block models (ESBM) that infer groups of nodes having common connectivity patterns.
arXiv Detail & Related papers (2020-07-16T19:06:16Z)
- Closed-Form Factorization of Latent Semantics in GANs [65.42778970898534]
A rich set of interpretable dimensions has been shown to emerge in the latent space of the Generative Adversarial Networks (GANs) trained for synthesizing images.
In this work, we examine the internal representation learned by GANs to reveal the underlying variation factors in an unsupervised manner.
We propose a closed-form factorization algorithm for latent semantic discovery by directly decomposing the pre-trained weights.
arXiv Detail & Related papers (2020-07-13T18:05:36Z)
- Network Bending: Expressive Manipulation of Deep Generative Models [0.2062593640149624]
We introduce a new framework for manipulating and interacting with deep generative models that we call network bending.
We show how it allows for the direct manipulation of semantically meaningful aspects of the generative process as well as allowing for a broad range of expressive outcomes.
arXiv Detail & Related papers (2020-05-25T21:48:45Z)