ORES: Open-vocabulary Responsible Visual Synthesis
- URL: http://arxiv.org/abs/2308.13785v1
- Date: Sat, 26 Aug 2023 06:47:34 GMT
- Title: ORES: Open-vocabulary Responsible Visual Synthesis
- Authors: Minheng Ni, Chenfei Wu, Xiaodong Wang, Shengming Yin, Lijuan Wang,
Zicheng Liu, Nan Duan
- Abstract summary: We formalize a new task, Open-vocabulary Responsible Visual Synthesis (ORES), where the synthesis model is able to avoid forbidden visual concepts.
To address this problem, we present a Two-stage Intervention (TIN) framework.
By introducing 1) rewriting with a learnable instruction through a large language model (LLM) and 2) synthesizing with prompt intervention on a diffusion model, TIN can effectively synthesize images that avoid forbidden concepts while following the user's query as closely as possible.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Avoiding synthesizing specific visual concepts is an essential challenge in
responsible visual synthesis. However, the visual concept that needs to be
avoided for responsible visual synthesis tends to be diverse, depending on the
region, context, and usage scenarios. In this work, we formalize a new task,
Open-vocabulary Responsible Visual Synthesis (ORES), where the synthesis model
is able to avoid forbidden visual concepts while allowing users to input any
desired content. To address this problem, we present a Two-stage Intervention
(TIN) framework. By introducing 1) rewriting with a learnable instruction through
a large language model (LLM) and 2) synthesizing with prompt intervention
on a diffusion model, TIN can effectively synthesize images that avoid
forbidden concepts while following the user's query as closely as possible. To evaluate
ORES, we provide a publicly available dataset, baseline models, and benchmark.
Experimental results demonstrate the effectiveness of our method in reducing
risks of image generation. Our work highlights the potential of LLMs in
responsible visual synthesis. Our code and dataset are publicly available.
Related papers
- SynthesizRR: Generating Diverse Datasets with Retrieval Augmentation [55.2480439325792]
We study the synthesis of six datasets, covering topic classification, sentiment analysis, tone detection, and humor.
We find that SynthesizRR greatly improves lexical and semantic diversity, similarity to human-written text, and distillation performance.
arXiv Detail & Related papers (2024-05-16T12:22:41Z) - Visually Dehallucinative Instruction Generation [0.8192907805418583]
This paper presents a novel and scalable method for generating visually dehallucinative instructions, dubbed CAP2QA, that constrains the scope to only image contents.
It shows that our proposed method significantly reduces visual hallucination while consistently improving visual recognition ability and expressiveness.
arXiv Detail & Related papers (2024-02-13T10:25:45Z) - Teaching Language Models to Hallucinate Less with Synthetic Tasks [47.87453655902263]
Large language models (LLMs) frequently hallucinate on abstractive summarization tasks.
We show that reducing hallucination on a synthetic task can also reduce hallucination on real-world downstream tasks.
arXiv Detail & Related papers (2023-10-10T17:57:00Z) - Survey on Controllable Image Synthesis with Deep Learning [15.29961293132048]
We present a survey of some recent works on 3D controllable image synthesis using deep learning.
We first introduce the datasets and evaluation indicators for 3D controllable image synthesis.
Photometrically controllable image synthesis approaches are also reviewed for 3D re-lighting research.
arXiv Detail & Related papers (2023-07-18T07:02:51Z) - ContraNeRF: Generalizable Neural Radiance Fields for Synthetic-to-real
Novel View Synthesis via Contrastive Learning [102.46382882098847]
We first investigate the effects of synthetic data in synthetic-to-real novel view synthesis.
We propose to introduce geometry-aware contrastive learning to learn multi-view consistent features with geometric constraints.
Our method can render images with higher quality and better fine-grained details, outperforming existing generalizable novel view synthesis methods in terms of PSNR, SSIM, and LPIPS.
arXiv Detail & Related papers (2023-03-20T12:06:14Z) - Novel-View Acoustic Synthesis [140.1107768313269]
We introduce the novel-view acoustic synthesis (NVAS) task: given the sight and sound observed at a source viewpoint, can we synthesize the sound of that scene from an unseen target viewpoint?
We propose a neural rendering approach: Visually-Guided Acoustic Synthesis (ViGAS) network that learns to synthesize the sound of an arbitrary point in space.
arXiv Detail & Related papers (2023-01-20T18:49:58Z) - Integrated Speech and Gesture Synthesis [26.267738299876314]
Text-to-speech and co-speech gesture synthesis have until now been treated as separate areas by two different research communities.
We propose to synthesize the two modalities in a single model, a new problem we call integrated speech and gesture synthesis (ISG).
The model achieves this with faster synthesis time and a greatly reduced parameter count compared to the pipeline system.
arXiv Detail & Related papers (2021-08-25T19:04:00Z) - Semantic View Synthesis [56.47999473206778]
We tackle a new problem of semantic view synthesis -- generating free-viewpoint rendering of a synthesized scene using a semantic label map as input.
First, we focus on synthesizing the color and depth of the visible surface of the 3D scene.
We then use the synthesized color and depth to impose explicit constraints on the multiple-plane image (MPI) representation prediction process.
arXiv Detail & Related papers (2020-08-24T17:59:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.