FOCUS: Towards Universal Foreground Segmentation
- URL: http://arxiv.org/abs/2501.05238v1
- Date: Thu, 09 Jan 2025 13:44:15 GMT
- Title: FOCUS: Towards Universal Foreground Segmentation
- Authors: Zuyao You, Lingyu Kong, Lingchen Meng, Zuxuan Wu
- Abstract summary: Foreground segmentation is a fundamental task in computer vision, encompassing various subdivision tasks.
Previous research has typically designed task-specific architectures for each task, leading to a lack of unification.
We introduce FOCUS, the Foreground ObjeCts Universal Segmentation framework that can handle multiple foreground tasks.
- Score: 32.60315411785438
- Abstract: Foreground segmentation is a fundamental task in computer vision, encompassing various subdivision tasks. Previous research has typically designed task-specific architectures for each task, leading to a lack of unification. Moreover, they primarily focus on recognizing foreground objects without effectively distinguishing them from the background. In this paper, we emphasize the importance of the background and its relationship with the foreground. We introduce FOCUS, the Foreground ObjeCts Universal Segmentation framework that can handle multiple foreground tasks. We develop a multi-scale semantic network using the edge information of objects to enhance image features. To achieve boundary-aware segmentation, we propose a novel distillation method, integrating the contrastive learning strategy to refine the prediction mask in multi-modal feature space. We conduct extensive experiments on a total of 13 datasets across 5 tasks, and the results demonstrate that FOCUS consistently outperforms the state-of-the-art task-specific models on most metrics.
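The abstract mentions integrating a contrastive learning strategy to refine the prediction mask, but does not give the exact objective. As one plausible illustration, here is a standard InfoNCE-style contrastive loss over pixel embeddings (hypothetical function name and shapes; NumPy used for clarity), where a foreground pixel embedding is pulled toward another foreground embedding and pushed away from background embeddings:

```python
import numpy as np

def info_nce(anchor, positive, negatives, temperature=0.07):
    """InfoNCE-style contrastive loss for one anchor embedding.

    anchor:    (d,)   embedding of a foreground pixel
    positive:  (d,)   embedding of another foreground pixel
    negatives: (n, d) embeddings of background pixels
    """
    def unit(v):
        # L2-normalize so dot products are cosine similarities.
        return v / (np.linalg.norm(v, axis=-1, keepdims=True) + 1e-8)

    a, p, n = unit(anchor), unit(positive), unit(negatives)
    # Similarities of the anchor to the positive (index 0) and the negatives.
    logits = np.concatenate([[a @ p], n @ a]) / temperature
    logits -= logits.max()  # subtract max for numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    # Cross-entropy against the positive at index 0.
    return -np.log(probs[0])
```

The loss is small when the anchor is close to the positive and far from the negatives, which is the behavior a boundary-refining contrastive term would rely on; how FOCUS actually constructs anchors and pairs in its multi-modal feature space is not specified in the abstract.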
Related papers
- Panoptic Perception: A Novel Task and Fine-grained Dataset for Universal Remote Sensing Image Interpretation [19.987706084203523]
We propose Panoptic Perception, a novel task and a new fine-grained dataset (FineGrip) to achieve a more thorough and universal interpretation of remote sensing images (RSIs).
The new task integrates pixel-level, instance-level, and image-level information for universal image perception.
FineGrip dataset includes 2,649 remote sensing images, 12,054 fine-grained instance segmentation masks belonging to 20 foreground things categories, 7,599 background semantic masks for 5 stuff classes and 13,245 captioning sentences.
arXiv Detail & Related papers (2024-04-06T12:27:21Z)
- ComPtr: Towards Diverse Bi-source Dense Prediction Tasks via A Simple yet General Complementary Transformer [91.43066633305662]
We propose a novel ComPlementary transformer, ComPtr, for diverse bi-source dense prediction tasks.
ComPtr treats different inputs equally and builds an efficient dense interaction model in the form of sequence-to-sequence on top of the transformer.
arXiv Detail & Related papers (2023-07-23T15:17:45Z)
- A Dynamic Feature Interaction Framework for Multi-task Visual Perception [100.98434079696268]
We devise an efficient unified framework to solve multiple common perception tasks.
These tasks include instance segmentation, semantic segmentation, monocular 3D detection, and depth estimation.
Our proposed framework, termed D2BNet, demonstrates a unique approach to parameter-efficient predictions for multi-task perception.
arXiv Detail & Related papers (2023-06-08T09:24:46Z)
- AIMS: All-Inclusive Multi-Level Segmentation [93.5041381700744]
We propose a new task, All-Inclusive Multi-Level Segmentation (AIMS), which segments visual regions into three levels: part, entity, and relation.
We also build a unified AIMS model through multi-dataset multi-task training to address the two major challenges of annotation inconsistency and task correlation.
arXiv Detail & Related papers (2023-05-28T16:28:49Z)
- Sharp Eyes: A Salient Object Detector Working The Same Way as Human Visual Characteristics [3.222802562733787]
We propose a sharp eyes network (SENet) that first separates the object from the scene and then finely segments it.
The proposed method utilizes the expanded objects to guide the network toward complete predictions.
arXiv Detail & Related papers (2023-01-18T11:00:45Z)
- Progressively Dual Prior Guided Few-shot Semantic Segmentation [57.37506990980975]
The few-shot semantic segmentation task aims to segment query images given only a few annotated support samples.
We propose a progressively dual prior guided few-shot semantic segmentation network.
arXiv Detail & Related papers (2022-11-20T16:19:47Z)
- AF$_2$: Adaptive Focus Framework for Aerial Imagery Segmentation [86.44683367028914]
Aerial imagery segmentation poses some unique challenges, the most critical of which is foreground-background imbalance.
We propose the Adaptive Focus Framework (AF$_2$), which adopts a hierarchical segmentation procedure and focuses on adaptively utilizing multi-scale representations.
AF$_2$ significantly improves accuracy on three widely used aerial benchmarks while running as fast as mainstream methods.
arXiv Detail & Related papers (2022-02-18T10:14:45Z)
- Empirical Study of Multi-Task Hourglass Model for Semantic Segmentation Task [0.7614628596146599]
We propose to use a multi-task approach by complementing the semantic segmentation task with edge detection, semantic contour, and distance transform tasks.
We demonstrate the effectiveness of learning in a multi-task setting for hourglass models on the Cityscapes, CamVid, and Freiburg Forest datasets.
arXiv Detail & Related papers (2021-05-28T01:08:10Z)
- Dynamic Feature Integration for Simultaneous Detection of Salient Object, Edge and Skeleton [108.01007935498104]
In this paper, we solve three low-level pixel-wise vision problems, including salient object segmentation, edge detection, and skeleton extraction.
We first show some similarities shared by these tasks and then demonstrate how they can be leveraged for developing a unified framework.
arXiv Detail & Related papers (2020-04-18T11:10:11Z)
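The multi-task hourglass entry above complements segmentation with a distance-transform task. That auxiliary target can be computed directly from a binary segmentation mask; as a minimal sketch (multi-source BFS with the Manhattan metric; the metric that paper actually uses is not specified in its summary):

```python
from collections import deque

def distance_transform(mask):
    """Manhattan distance from each cell to the nearest foreground cell.

    mask: 2-D list of 0/1 values (1 = foreground).
    Returns a same-shape list of ints, computed by a multi-source BFS
    seeded at every foreground cell (distance 0).
    """
    h, w = len(mask), len(mask[0])
    dist = [[-1] * w for _ in range(h)]  # -1 marks "not yet visited"
    queue = deque()
    for y in range(h):
        for x in range(w):
            if mask[y][x]:
                dist[y][x] = 0
                queue.append((y, x))
    while queue:
        y, x = queue.popleft()
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w and dist[ny][nx] == -1:
                dist[ny][nx] = dist[y][x] + 1
                queue.append((ny, nx))
    return dist
```

In a multi-task setup, a map like this would serve as the regression target for the distance-transform head alongside the segmentation and edge-detection losses; the weighting between the task losses is a design choice not detailed in the summary.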
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences arising from its use.