Related papers: BLO-Inst: Bi-Level Optimization Based Alignment of YOLO and SAM for Robust Instance Segmentation

URL: http://arxiv.org/abs/2601.22061v1
Date: Thu, 29 Jan 2026 17:58:55 GMT
Title: BLO-Inst: Bi-Level Optimization Based Alignment of YOLO and SAM for Robust Instance Segmentation
Authors: Li Zhang, Pengtao Xie
Abstract summary: We introduce BLO-Inst, a unified framework that aligns detection and segmentation objectives by bi-level optimization. BLO-Inst achieves superior performance, outperforming standard baselines on tasks in general and biomedical domains.
Score: 26.763780360661965
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The Segment Anything Model has revolutionized image segmentation with its zero-shot capabilities, yet its reliance on manual prompts hinders fully automated deployment. While integrating object detectors as prompt generators offers a pathway to automation, existing pipelines suffer from two fundamental limitations: objective mismatch, where detectors optimized for geometric localization do not correspond to the optimal prompting context required by SAM, and alignment overfitting in standard joint training, where the detector simply memorizes specific prompt adjustments for training samples rather than learning a generalizable policy. To bridge this gap, we introduce BLO-Inst, a unified framework that aligns detection and segmentation objectives by bi-level optimization. We formulate the alignment as a nested optimization problem over disjoint data splits. In the lower level, the SAM is fine-tuned to maximize segmentation fidelity given the current detection proposals on a subset ($D_1$). In the upper level, the detector is updated to generate bounding boxes that explicitly minimize the validation loss of the fine-tuned SAM on a separate subset ($D_2$). This effectively transforms the detector into a segmentation-aware prompt generator, optimizing the bounding boxes not just for localization accuracy, but for downstream mask quality. Extensive experiments demonstrate that BLO-Inst achieves superior performance, outperforming standard baselines on tasks in general and biomedical domains.
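
The nested problem described in the abstract can be written compactly. The following is a sketch under assumed notation, not the paper's own formulation: $\theta$ for the detector parameters, $\phi$ for the SAM parameters, and $\mathcal{L}_{\mathrm{seg}}$ for a generic segmentation loss are labels introduced here for illustration; only the splits $D_1$ and $D_2$ come from the abstract.

```latex
% Assumed notation (not from the abstract):
%   \theta = detector (YOLO) parameters
%   \phi   = SAM parameters
%   \mathcal{L}_{\mathrm{seg}} = a generic segmentation loss
% D_1 and D_2 are the disjoint data splits named in the abstract.
\begin{align}
  \text{Lower level:}\quad
    \phi^{*}(\theta) &= \arg\min_{\phi}\;
      \mathcal{L}_{\mathrm{seg}}\bigl(\phi,\theta;\,D_1\bigr) \\
  \text{Upper level:}\quad
    \min_{\theta}\;& \mathcal{L}_{\mathrm{seg}}\bigl(\phi^{*}(\theta),\theta;\,D_2\bigr)
\end{align}
```

Read this way, the detector is scored on $D_2$ only through the masks produced by the SAM that was fine-tuned on $D_1$, which is how the abstract's "segmentation-aware prompt generator" arises. The exact losses and the way the paper solves the nested problem may differ from this sketch.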

Related papers

Enhancing Zero-Shot Anomaly Detection: CLIP-SAM Collaboration with Cascaded Prompts [5.225009704851243]
This paper proposes a novel two-stage framework for zero-shot anomaly segmentation tasks in industrial anomaly detection. To mitigate SAM's inclination towards object segmentation, we propose the Co-Feature Point Prompt Generation module. To further optimize SAM's segmentation results, we introduce the Cascaded Prompts for SAM (CPS) module.
arXiv Detail & Related papers (2025-10-13T05:53:49Z)
DetectAnyLLM: Towards Generalizable and Robust Detection of Machine-Generated Text Across Domains and Models [60.713908578319256]
We propose Direct Discrepancy Learning (DDL) to optimize the detector with task-oriented knowledge. Built upon this, we introduce DetectAnyLLM, a unified detection framework that achieves state-of-the-art machine-generated text detection (MGTD) performance. MIRAGE samples human-written texts from 10 corpora across 5 text domains, which are then re-generated or revised using 17 cutting-edge LLMs.
arXiv Detail & Related papers (2025-09-15T10:59:57Z)
Bridge the Points: Graph-based Few-shot Segment Anything Semantically [79.1519244940518]
Recent advancements in pre-training techniques have enhanced the capabilities of vision foundation models. Recent studies extend SAM to few-shot semantic segmentation (FSS). We propose a simple yet effective approach based on graph analysis.
arXiv Detail & Related papers (2024-10-09T15:02:28Z)
DATR: Unsupervised Domain Adaptive Detection Transformer with Dataset-Level Adaptation and Prototypical Alignment [7.768332621617199]
We introduce a strong DETR-based detector named Domain Adaptive detection TRansformer (DATR) for unsupervised domain adaptation of object detection. Our proposed DATR incorporates a mean-teacher-based self-training framework, utilizing pseudo-labels generated by the teacher model to further mitigate domain bias. Experiments demonstrate superior performance and generalization capabilities of our proposed DATR in multiple domain adaptation scenarios.
arXiv Detail & Related papers (2024-05-20T03:48:45Z)
Fast One-Stage Unsupervised Domain Adaptive Person Search [17.164485293539833]
Unsupervised person search aims to localize a particular target person from a gallery set of scene images without annotations. We propose a Fast One-stage Unsupervised person Search (FOUS) which integrates complementary domain adaptation with label adaptation. FOUS achieves state-of-the-art (SOTA) performance on two benchmark datasets, CUHK-SYSU and PRW.
arXiv Detail & Related papers (2024-05-05T07:15:47Z)
Semi-DETR: Semi-Supervised Object Detection with Detection Transformers [105.45018934087076]
We analyze the DETR-based framework on semi-supervised object detection (SSOD). We present Semi-DETR, the first transformer-based end-to-end semi-supervised object detector. Our method outperforms all state-of-the-art methods by clear margins.
arXiv Detail & Related papers (2023-07-16T16:32:14Z)
W2N: Switching From Weak Supervision to Noisy Supervision for Object Detection [64.10643170523414]
We propose a novel WSOD framework with a new paradigm that switches from weak supervision to noisy supervision (W2N). In the localization adaptation module, we propose a regularization loss to reduce the proportion of discriminative parts in original pseudo ground-truths. Our W2N outperforms all existing pure WSOD methods and transfer learning methods.
arXiv Detail & Related papers (2022-07-25T12:13:48Z)
Real-Time Scene Text Detection with Differentiable Binarization and Adaptive Scale Fusion [62.269219152425556]
Segmentation-based scene text detection methods have drawn extensive attention in the scene text detection field. We propose a Differentiable Binarization (DB) module that integrates the binarization process into a segmentation network. An efficient Adaptive Scale Fusion (ASF) module is proposed to improve the scale robustness by fusing features of different scales adaptively.
arXiv Detail & Related papers (2022-02-21T15:30:14Z)
Semi-supervised Domain Adaptive Structure Learning [72.01544419893628]
Semi-supervised domain adaptation (SSDA) is a challenging problem requiring methods to overcome both 1) overfitting towards poorly annotated data and 2) distribution shift across domains. We introduce an adaptive structure learning method to regularize the cooperation of SSL and DA.
arXiv Detail & Related papers (2021-12-12T06:11:16Z)
Latent Space Regularization for Unsupervised Domain Adaptation in Semantic Segmentation [14.050836886292869]
We introduce feature-level space-shaping regularization strategies to reduce the domain discrepancy in semantic segmentation. We verify the effectiveness of such methods in the autonomous driving setting.
arXiv Detail & Related papers (2021-04-06T16:07:22Z)

This list is automatically generated from the titles and abstracts of the papers on this site.