Related papers: InterHandGen: Two-Hand Interaction Generation via Cascaded Reverse Diffusion

URL: http://arxiv.org/abs/2403.17422v1
Date: Tue, 26 Mar 2024 06:35:55 GMT
Title: InterHandGen: Two-Hand Interaction Generation via Cascaded Reverse Diffusion
Authors: Jihyun Lee, Shunsuke Saito, Giljoo Nam, Minhyuk Sung, Tae-Kyun Kim
Abstract summary: We present InterHandGen, a novel framework that learns the generative prior of two-hand interaction. For sampling, we combine anti-penetration and classifier-free guidance to enable plausible generation. Our method significantly outperforms baseline generative models in terms of plausibility and diversity.
Score: 53.90516061351706
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: We present InterHandGen, a novel framework that learns the generative prior of two-hand interaction. Sampling from our model yields plausible and diverse two-hand shapes in close interaction with or without an object. Our prior can be incorporated into any optimization or learning methods to reduce ambiguity in an ill-posed setup. Our key observation is that directly modeling the joint distribution of multiple instances imposes high learning complexity due to its combinatorial nature. Thus, we propose to decompose the modeling of joint distribution into the modeling of factored unconditional and conditional single instance distribution. In particular, we introduce a diffusion model that learns the single-hand distribution unconditional and conditional to another hand via conditioning dropout. For sampling, we combine anti-penetration and classifier-free guidance to enable plausible generation. Furthermore, we establish the rigorous evaluation protocol of two-hand synthesis, where our method significantly outperforms baseline generative models in terms of plausibility and diversity. We also demonstrate that our diffusion prior can boost the performance of two-hand reconstruction from monocular in-the-wild images, achieving new state-of-the-art accuracy.
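The decomposition described in the abstract amounts to the chain rule p(left, right) = p(left) p(right | left), with one network serving both factors. The following is a minimal sketch of the two generic ingredients the abstract names, conditioning dropout during training and classifier-free guidance during sampling. It is not the authors' implementation: the function names `drop_condition` and `guided_noise`, the list-of-floats representation, and the guidance convention `eps_uncond + w * (eps_cond - eps_uncond)` are assumptions made for illustration, and the paper's exact formulation may differ. The anti-penetration guidance term is omitted.

```python
import random
from typing import List, Optional, Sequence


def drop_condition(
    cond: Optional[Sequence[float]], p_drop: float, rng: random.Random
) -> Optional[Sequence[float]]:
    """Conditioning dropout (hypothetical helper).

    With probability p_drop the other hand is hidden (returned as None),
    so the same network is trained on both the unconditional and the
    conditional single-hand distribution.
    """
    return None if rng.random() < p_drop else cond


def guided_noise(
    eps_uncond: Sequence[float], eps_cond: Sequence[float], w: float
) -> List[float]:
    """Classifier-free guidance (one common convention, assumed here).

    Extrapolates from the unconditional noise prediction towards the
    conditional one: w = 0 gives the unconditional prediction, w = 1 the
    conditional prediction, and w > 1 strengthens the conditioning.
    """
    return [u + w * (c - u) for u, c in zip(eps_uncond, eps_cond)]


if __name__ == "__main__":
    rng = random.Random(0)
    other_hand = [0.1, 0.2, 0.3]  # placeholder parameters of the other hand
    print(drop_condition(other_hand, p_drop=0.1, rng=rng))
    print(guided_noise([0.0, 1.0], [1.0, 3.0], w=2.0))
```

In a cascaded sampler of this kind, the first hand would be drawn with the condition set to `None`, and the second hand would then be drawn with the first hand as the condition, using `guided_noise` at each reverse diffusion step.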

Related papers

Consistent World Models via Foresight Diffusion [56.45012929930605]
We argue that a key bottleneck in learning consistent diffusion-based world models lies in the suboptimal predictive ability. We propose Foresight Diffusion (ForeDiff), a diffusion-based world modeling framework that enhances consistency by decoupling condition understanding from target denoising.
arXiv Detail & Related papers (2025-05-22T10:01:59Z)
DIDiffGes: Decoupled Semi-Implicit Diffusion Models for Real-time Gesture Generation from Speech [42.663766380488205]
DIDiffGes can synthesize high-quality, expressive gestures from speech using only a few sampling steps. Our method outperforms state-of-the-art approaches in human likeness, appropriateness, and style correctness.
arXiv Detail & Related papers (2025-03-21T11:23:39Z)
Bridging the inference gap in Mutimodal Variational Autoencoders [6.246098300155483]
Multimodal Variational Autoencoders offer versatile and scalable methods for generating unobserved modalities from observed ones. Recent models using mixtures-of-experts aggregation suffer from theoretically grounded limitations that restrict their generation quality on complex datasets. We propose a novel interpretable model able to learn both joint and conditional distributions without introducing mixture aggregation.
arXiv Detail & Related papers (2025-02-06T10:43:55Z)
Constrained Diffusion Models via Dual Training [80.03953599062365]
Diffusion processes are prone to generating samples that reflect biases in a training dataset. We develop constrained diffusion models by imposing diffusion constraints based on desired distributions. We show that our constrained diffusion models generate new data from a mixture data distribution that achieves the optimal trade-off among objective and constraints.
arXiv Detail & Related papers (2024-08-27T14:25:42Z)
Controlling the Fidelity and Diversity of Deep Generative Models via Pseudo Density [70.14884528360199]
We introduce an approach to bias deep generative models, such as GANs and diffusion models, towards generating data with enhanced fidelity or increased diversity. Our approach involves manipulating the distribution of training and generated data through a novel metric for individual samples, named pseudo density.
arXiv Detail & Related papers (2024-07-11T16:46:04Z)
Transfer Learning for Diffusion Models [43.10840361752551]
Diffusion models consistently produce high-quality synthetic samples, but collecting enough training data for them can be impractical in real-world applications due to high collection costs or associated risks. This paper introduces the Transfer Guided Diffusion Process (TGDP), a novel approach distinct from conventional finetuning and regularization methods.
arXiv Detail & Related papers (2024-05-27T06:48:58Z)
Training Class-Imbalanced Diffusion Model Via Overlap Optimization [55.96820607533968]
Diffusion models trained on real-world datasets often yield inferior fidelity for tail classes. Deep generative models, including diffusion models, are biased towards classes with abundant training images. We propose a method based on contrastive learning to minimize the overlap between distributions of synthetic images for different classes.
arXiv Detail & Related papers (2024-02-16T16:47:21Z)
Variance-Preserving-Based Interpolation Diffusion Models for Speech Enhancement [53.2171981279647]
We present a framework that encapsulates both the variance-preserving (VP)- and variance-exploding (VE)-based diffusion methods. To improve performance and ease model training, we analyze the common difficulties encountered in diffusion models. We evaluate our model against several methods using a public benchmark to showcase the effectiveness of our approach.
arXiv Detail & Related papers (2023-06-14T14:22:22Z)
Exploring Compositional Visual Generation with Latent Classifier Guidance [19.48538300223431]
We train latent diffusion models and auxiliary latent classifiers to facilitate non-linear navigation of latent representation generation. We show that such conditional generation achieved by latent classifier guidance provably maximizes a lower bound of the conditional log probability during training. We show that this paradigm based on latent classifier guidance is agnostic to pre-trained generative models, and present competitive results for both image generation and sequential manipulation of real and synthetic images.
arXiv Detail & Related papers (2023-04-25T03:02:58Z)
Learning Data Representations with Joint Diffusion Models [20.25147743706431]
Joint machine learning models that allow synthesizing and classifying data often offer uneven performance between those tasks or are unstable to train. We extend the vanilla diffusion model with a classifier that allows for stable joint end-to-end training with shared parameterization between those objectives. The resulting joint diffusion model outperforms recent state-of-the-art hybrid methods in terms of both classification and generation quality on all evaluated benchmarks.
arXiv Detail & Related papers (2023-01-31T13:29:19Z)
Learning Under Adversarial and Interventional Shifts [36.183840774167756]
We propose a new formulation, RISe, for designing robust models against a set of distribution shifts. We employ the distributionally robust optimization framework to optimize the resulting objective in both supervised and reinforcement learning settings.
arXiv Detail & Related papers (2021-03-29T20:10:51Z)
Improving the Reconstruction of Disentangled Representation Learners via Multi-Stage Modeling [54.94763543386523]
Current autoencoder-based disentangled representation learning methods achieve disentanglement by penalizing the (aggregate) posterior to encourage statistical independence of the latent factors. We present a novel multi-stage modeling approach where the disentangled factors are first learned using a penalty-based disentangled representation learning method. Then, the low-quality reconstruction is improved with another deep generative model that is trained to model the missing correlated latent variables.
arXiv Detail & Related papers (2020-10-25T18:51:15Z)

This list is automatically generated from the titles and abstracts of the papers on this site.