SAM2-UNeXT: An Improved High-Resolution Baseline for Adapting Foundation Models to Downstream Segmentation Tasks
- URL: http://arxiv.org/abs/2508.03566v1
- Date: Tue, 05 Aug 2025 15:36:13 GMT
- Title: SAM2-UNeXT: An Improved High-Resolution Baseline for Adapting Foundation Models to Downstream Segmentation Tasks
- Authors: Xinyu Xiong, Zihuang Wu, Lei Zhang, Lei Lu, Ming Li, Guanbin Li,
- Abstract summary: We propose SAM2-UNeXT, an advanced framework that builds upon the core principles of SAM2-UNet.<n>We extend the representational capacity of SAM2 through the integration of an auxiliary DINOv2 encoder.<n>Our approach enables more accurate segmentation with a simple architecture, relaxing the need for complex decoder designs.
- Score: 50.97089872043121
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent studies have highlighted the potential of adapting the Segment Anything Model (SAM) for various downstream tasks. However, constructing a more powerful and generalizable encoder to further enhance performance remains an open challenge. In this work, we propose SAM2-UNeXT, an advanced framework that builds upon the core principles of SAM2-UNet while extending the representational capacity of SAM2 through the integration of an auxiliary DINOv2 encoder. By incorporating a dual-resolution strategy and a dense glue layer, our approach enables more accurate segmentation with a simple architecture, relaxing the need for complex decoder designs. Extensive experiments conducted on four benchmarks, including dichotomous image segmentation, camouflaged object detection, marine animal segmentation, and remote sensing saliency detection, demonstrate the superior performance of our proposed method. The code is available at https://github.com/WZH0120/SAM2-UNeXT.
Related papers
- DINOv2-powered Few-Shot Semantic Segmentation: A Unified Framework via Cross-Model Distillation and 4D Correlation Mining [30.564216896513596]
Few-shot semantic segmentation has gained increasing interest due to its generalization capability.<n>Recent approaches have turned to foundation models to enhance representation transferability.<n>We propose FS-DINO, with only DINOv2's encoder and a lightweight segmenter.
arXiv Detail & Related papers (2025-04-22T07:47:06Z) - DC-SAM: In-Context Segment Anything in Images and Videos via Dual Consistency [91.30252180093333]
We propose the Dual Consistency SAM (DCSAM) method based on prompttuning to adapt SAM and SAM2 for in-context segmentation.<n>Our key insights are to enhance the features of the SAM's prompt encoder in segmentation by providing high-quality visual prompts.<n>Although the proposed DC-SAM is primarily designed for images, it can be seamlessly extended to the video domain with the support SAM2.
arXiv Detail & Related papers (2025-04-16T13:41:59Z) - Det-SAM2:Technical Report on the Self-Prompting Segmentation Framework Based on Segment Anything Model 2 [0.0]
This report focuses on the construction of the overall Det-SAM2 framework and the subsequent engineering optimization applied to SAM2.<n>We present a case demonstrating an application built on the Det-SAM2 framework: AI refereeing in a billiards scenario, derived from our business context.
arXiv Detail & Related papers (2024-11-28T07:58:30Z) - SAM2-UNet: Segment Anything 2 Makes Strong Encoder for Natural and Medical Image Segmentation [51.90445260276897]
We prove that the Segment Anything Model 2 (SAM2) can be a strong encoder for U-shaped segmentation models.
We propose a simple but effective framework, termed SAM2-UNet, for versatile image segmentation.
arXiv Detail & Related papers (2024-08-16T17:55:38Z) - MAS-SAM: Segment Any Marine Animal with Aggregated Features [55.91291540810978]
We propose a novel feature learning framework named MAS-SAM for marine animal segmentation.
Our method enables to extract richer marine information from global contextual cues to fine-grained local details.
arXiv Detail & Related papers (2024-04-24T07:38:14Z) - Fantastic Animals and Where to Find Them: Segment Any Marine Animal with Dual SAM [62.85895749882285]
Marine Animal (MAS) involves segmenting animals within marine environments.
We propose a novel feature learning framework, named Dual-SAM for high-performance MAS.
Our proposed method achieves state-of-the-art performances on five widely-used MAS datasets.
arXiv Detail & Related papers (2024-04-07T15:34:40Z) - Deep Instruction Tuning for Segment Anything Model [68.7934961590075]
Segment Anything Model (SAM) has become a research hotspot in the fields of multimedia and computer vision.
SAM can support different types of segmentation prompts, but it performs much worse on text-instructed tasks.
We propose two simple yet effective deep instruction tuning (DIT) methods for SAM, one is end-to-end and the other is layer-wise.
arXiv Detail & Related papers (2024-03-31T11:37:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.