MIGC++: Advanced Multi-Instance Generation Controller for Image Synthesis
- URL: http://arxiv.org/abs/2407.02329v1
- Date: Tue, 2 Jul 2024 14:59:37 GMT
- Title: MIGC++: Advanced Multi-Instance Generation Controller for Image Synthesis
- Authors: Dewei Zhou, You Li, Fan Ma, Zongxin Yang, Yi Yang
- Abstract summary: We introduce the Multi-Instance Generation (MIG) task, which focuses on generating multiple instances within a single image.
MIG faces three main challenges: avoiding attribute leakage between instances, supporting diverse instance descriptions, and maintaining consistency in iterative generation.
We introduce the COCO-MIG and Multimodal-MIG benchmarks to evaluate these methods.
- Score: 33.52454028815209
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce the Multi-Instance Generation (MIG) task, which focuses on generating multiple instances within a single image, each accurately placed at predefined positions with attributes such as category, color, and shape, strictly following user specifications. MIG faces three main challenges: avoiding attribute leakage between instances, supporting diverse instance descriptions, and maintaining consistency in iterative generation. To address attribute leakage, we propose the Multi-Instance Generation Controller (MIGC). MIGC generates multiple instances through a divide-and-conquer strategy, breaking down multi-instance shading into single-instance tasks with singular attributes, later integrated. To provide more types of instance descriptions, we developed MIGC++. MIGC++ allows attribute control through text & images and position control through boxes & masks. Lastly, we introduced the Consistent-MIG algorithm to enhance the iterative MIG ability of MIGC and MIGC++. This algorithm ensures consistency in unmodified regions during the addition, deletion, or modification of instances, and preserves the identity of instances when their attributes are changed. We introduce the COCO-MIG and Multimodal-MIG benchmarks to evaluate these methods. Extensive experiments on these benchmarks, along with the COCO-Position benchmark and DrawBench, demonstrate that our methods substantially outperform existing techniques, maintaining precise control over aspects including position, attribute, and quantity. Project page: https://github.com/limuloo/MIGC.
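The divide-and-conquer strategy described in the abstract (shade each instance independently, then integrate) can be sketched at a conceptual level. The code below is a minimal illustration under simplifying assumptions, not the paper's implementation: the feature map is a toy NumPy array, and `shade_single_instance` is a hypothetical stand-in for per-instance cross-attention. The point it demonstrates is that each box is shaded only from its own attribute, so attributes cannot leak between instances.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class Instance:
    """One user-specified instance: a bounding box plus an attribute vector."""
    box: tuple             # (x0, y0, x1, y1) in feature-map coordinates
    attribute: np.ndarray  # e.g. an embedding for "red apple" (hypothetical)


def shade_single_instance(features, inst):
    """Hypothetical single-instance shading: apply only this instance's
    attribute, restricted to its own box (toy stand-in for attention)."""
    shaded = np.zeros_like(features)
    x0, y0, x1, y1 = inst.box
    shaded[y0:y1, x0:x1] = features[y0:y1, x0:x1] * inst.attribute.mean()
    return shaded


def integrate(features, instances):
    """Divide-and-conquer: shade instances separately, then merge the
    per-instance results, so no instance sees another's attribute."""
    result = features.copy()
    for inst in instances:
        single = shade_single_instance(features, inst)
        x0, y0, x1, y1 = inst.box
        result[y0:y1, x0:x1] = single[y0:y1, x0:x1]
    return result


features = np.ones((8, 8))
insts = [Instance((0, 0, 4, 4), np.array([2.0])),
         Instance((4, 4, 8, 8), np.array([3.0]))]
out = integrate(features, insts)
print(out[0, 0], out[5, 5])  # each box reflects only its own attribute
```

In the real model the merge step would happen in the diffusion U-Net's attention layers rather than by copying array slices, but the isolation-then-integration structure is the same idea the abstract describes.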
Related papers
- One Arrow, Many Targets: Probing LLMs for Multi-Attribute Controllable Text Summarization [7.734726150561089]
Multi-Attribute Controllable Summarization (MACS) is a well-established task within the natural language processing (NLP) community.
This work addresses the gap by examining the MACS task through the lens of large language models.
We propose and evaluate a novel hierarchical adapter fusion technique to integrate learnings from two distinct controllable attributes.
arXiv Detail & Related papers (2024-11-02T11:07:25Z) - Bridge the Points: Graph-based Few-shot Segment Anything Semantically [79.1519244940518]
Recent advancements in pre-training techniques have enhanced the capabilities of vision foundation models.
Recent studies extend the Segment Anything Model (SAM) to few-shot semantic segmentation (FSS).
We propose a simple yet effective approach based on graph analysis.
arXiv Detail & Related papers (2024-10-09T15:02:28Z) - Bring Adaptive Binding Prototypes to Generalized Referring Expression Segmentation [18.806738617249426]
Generalized Referring Expression Segmentation introduces new challenges by allowing expressions to describe multiple objects or to lack a specific object reference.
Existing RES methods usually rely on sophisticated encoder-decoder architectures and feature fusion modules.
We propose a novel Model with Adaptive Binding Prototypes (MABP) that adaptively binds queries to object features in the corresponding region.
arXiv Detail & Related papers (2024-05-24T03:07:38Z) - MIGC: Multi-Instance Generation Controller for Text-to-Image Synthesis [22.27724733876081]
We present a Multi-Instance Generation (MIG) task, simultaneously generating multiple instances with diverse controls in one image.
We introduce an innovative approach named Multi-Instance Generation Controller (MIGC) to address the challenges of the MIG task.
To evaluate how well generation models perform on the MIG task, we provide a COCO-MIG benchmark along with an evaluation pipeline.
arXiv Detail & Related papers (2024-02-08T04:52:36Z) - GSVA: Generalized Segmentation via Multimodal Large Language Models [72.57095903188922]
Generalized Referring Expression (GRES) extends the scope of classic RES to refer to multiple objects in one expression or to identify empty targets absent from the image.
Current solutions to GRES remain unsatisfactory since segmentation MLLMs cannot correctly handle the cases where users might reference multiple subjects in a singular prompt.
We propose Generalized Vision Assistant (GSVA) to address this gap.
arXiv Detail & Related papers (2023-12-15T02:54:31Z) - M$^3$Net: Multi-view Encoding, Matching, and Fusion for Few-shot Fine-grained Action Recognition [80.21796574234287]
M$^3$Net is a matching-based framework for few-shot fine-grained (FS-FG) action recognition.
It incorporates multi-view encoding, multi-view matching, and multi-view fusion to facilitate embedding encoding, similarity matching, and decision making.
Explainable visualizations and experimental results demonstrate the superiority of M$^3$Net in capturing fine-grained action details.
arXiv Detail & Related papers (2023-08-06T09:15:14Z) - HyRSM++: Hybrid Relation Guided Temporal Set Matching for Few-shot Action Recognition [51.2715005161475]
We propose a novel Hybrid Relation guided temporal Set Matching approach for few-shot action recognition.
The core idea of HyRSM++ is to integrate all videos within the task to learn discriminative representations.
We show that our method achieves state-of-the-art performance under various few-shot settings.
arXiv Detail & Related papers (2023-01-09T13:32:50Z) - Robust Domain Adaptive Object Detection with Unified Multi-Granularity Alignment [59.831917206058435]
Domain adaptive detection aims to improve the generalization of detectors on the target domain.
Recent approaches achieve domain adaptation through feature alignment at different granularities via adversarial learning.
We introduce a unified multi-granularity alignment (MGA)-based detection framework for domain-invariant feature learning.
arXiv Detail & Related papers (2023-01-01T08:38:07Z) - A Distributional Lens for Multi-Aspect Controllable Text Generation [17.97374410245602]
Multi-aspect controllable text generation is a more challenging and practical task than single-aspect control.
Existing methods achieve complex multi-aspect control by fusing multiple controllers, each learned for a single aspect.
We propose to directly search for the intersection areas of multiple attribute distributions as their combination for generation.
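The idea of searching for the intersection of multiple attribute distributions can be illustrated with a small sketch. The setup below is entirely hypothetical: each attribute's high-density region is modeled as an isotropic Gaussian in a 2-D latent space (the paper works in a learned attribute space, not this toy one). A candidate point belongs to the intersection only if it scores well under *every* attribute, which the code enforces by taking the minimum log-density across attributes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two hypothetical attribute distributions over a 2-D latent space,
# modeled as isotropic Gaussians purely for illustration.
centers = {"sentiment": np.array([0.0, 0.0]),
           "topic":     np.array([1.0, 0.0])}


def log_density(x, center, sigma=1.0):
    """Log-density of an isotropic Gaussian (up to an additive constant)."""
    return -np.sum((x - center) ** 2, axis=-1) / (2 * sigma ** 2)


# Sample candidate latent points and keep those that lie in the
# high-density region of every attribute: the "intersection area".
candidates = rng.normal(scale=2.0, size=(5000, 2))
scores = np.stack([log_density(candidates, c) for c in centers.values()])
joint = scores.min(axis=0)  # a point is only as good as its worst attribute
intersection = candidates[joint >= np.quantile(joint, 0.99)]

print(intersection.mean(axis=0))  # concentrates between the attribute centers
```

Taking the minimum rather than the sum of scores is one simple way to reward points that satisfy all attributes simultaneously; the retained points cluster between the two centers, which is the intuition behind generating from distribution intersections.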
arXiv Detail & Related papers (2022-10-06T13:08:04Z) - SOIT: Segmenting Objects with Instance-Aware Transformers [16.234574932216855]
This paper presents an end-to-end instance segmentation framework, termed SOIT, that Segments Objects with Instance-aware Transformers.
Inspired by DETR (Carion et al., 2020), our method views instance segmentation as a direct set prediction problem.
Experimental results on the MS COCO dataset demonstrate that SOIT outperforms state-of-the-art instance segmentation approaches significantly.
arXiv Detail & Related papers (2021-12-21T08:23:22Z) - PointINS: Point-based Instance Segmentation [117.38579097923052]
Mask representation in instance segmentation with Point-of-Interest (PoI) features is challenging because learning a high-dimensional mask feature for each instance imposes a heavy computational burden.
We propose an instance-aware convolution, which decomposes this mask representation learning task into two tractable modules.
Along with instance-aware convolution, we propose PointINS, a simple and practical instance segmentation approach.
arXiv Detail & Related papers (2020-03-13T08:24:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.