Related papers: SAM$^{*}$: Task-Adaptive SAM with Physics-Guided Rewards

SAM$^{*}$: Task-Adaptive SAM with Physics-Guided Rewards

URL: http://arxiv.org/abs/2509.07047v1
Date: Mon, 08 Sep 2025 13:51:20 GMT
Title: SAM$^{*}$: Task-Adaptive SAM with Physics-Guided Rewards
Authors: Kamyar Barakati, Utkarsh Pratiush, Sheryl L. Sanchez, Aditya Raghavan, Delia J. Milliron, Mahshid Ahmadi, Philip D. Rack, Sergei V. Kalinin,
Abstract summary: Image segmentation is a critical task in microscopy, essential for accurately analyzing and interpreting complex visual data.<n>Here, we introduce a reward function-based optimization to fine-tune foundational models.<n>We demonstrate the effectiveness of this approach in microscopy imaging, where precise segmentation is crucial for analyzing cellular structures, material interfaces, and nanoscale features.
Score: 0.5805874695844994
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Image segmentation is a critical task in microscopy, essential for accurately analyzing and interpreting complex visual data. This task can be performed using custom models trained on domain-specific datasets, transfer learning from pre-trained models, or foundational models that offer broad applicability. However, foundational models often present a considerable number of non-transparent tuning parameters that require extensive manual optimization, limiting their usability for real-time streaming data analysis. Here, we introduce a reward function-based optimization to fine-tune foundational models and illustrate this approach for SAM (Segment Anything Model) framework by Meta. The reward functions can be constructed to represent the physics of the imaged system, including particle size distributions, geometries, and other criteria. By integrating a reward-driven optimization framework, we enhance SAM's adaptability and performance, leading to an optimized variant, SAM$^{*}$, that better aligns with the requirements of diverse segmentation tasks and particularly allows for real-time streaming data segmentation. We demonstrate the effectiveness of this approach in microscopy imaging, where precise segmentation is crucial for analyzing cellular structures, material interfaces, and nanoscale features.

Related papers

Understanding the Transfer Limits of Vision Foundation Models [38.99867932557529]
Foundation models leverage large-scale pretraining to capture extensive knowledge, demonstrating generalization in a wide range of language tasks.<n>We postulate that this limitation arises from a mismatch between pretraining objectives and the demands of downstream vision-and-imaging tasks.<n>Pretraining strategies like masked image reconstruction or contrastive learning shape representations for tasks such as recovery of generic visual patterns or global semantic structures.<n>Our findings indicate that better alignment between pretraining and downstream tasks, measured by simple divergence metrics such as maximum-mean-discrepancy (MMD) between the same features before and after fine-tuning, correlates with greater performance improvements and
arXiv Detail & Related papers (2026-01-22T12:07:56Z)
URDF-Anything: Constructing Articulated Objects with 3D Multimodal Language Model [76.08429266631823]
We propose an end-to-end automatic reconstruction framework based on a 3D multimodal large language model (MLLM)<n>URDF-Anything utilizes an autoregressive prediction framework based on point-cloud and text multimodal input to jointly optimize geometric segmentation and kinematic parameter prediction.<n> Experiments on both simulated and real-world datasets demonstrate that our method significantly outperforms existing approaches.
arXiv Detail & Related papers (2025-11-02T13:45:51Z)
Reward driven discovery of the optimal microstructure representations with invariant variational autoencoders [0.015295722752489374]
Variational Autoencoders (VAEs) provide a powerful means of constructing such low-dimensional representations.<n>VAEs are often optimized through trial-and-error and empirical analysis.<n>We investigated reward-based strategies for evaluating latent space representations.
arXiv Detail & Related papers (2025-09-30T20:15:42Z)
Adapting Segment Anything Model (SAM) to Experimental Datasets via Fine-Tuning on GAN-based Simulation: A Case Study in Additive Manufacturing [1.8547557605937304]
Segment Anything Model (SAM) is designed for general-purpose image segmentation.<n>In this work, we explore the application and limitations of SAM for industrial X-ray CT inspection of additive manufacturing components.<n>We propose a fine-tuning strategy utilizing parameter-efficient techniques, specifically Conv-LoRa, to adapt SAM for material-specific datasets.
arXiv Detail & Related papers (2024-12-16T02:11:19Z)
Adapting Segment Anything Model for Unseen Object Instance Segmentation [70.60171342436092]
Unseen Object Instance (UOIS) is crucial for autonomous robots operating in unstructured environments. We propose UOIS-SAM, a data-efficient solution for the UOIS task. UOIS-SAM integrates two key components: (i) a Heatmap-based Prompt Generator (HPG) to generate class-agnostic point prompts with precise foreground prediction, and (ii) a Hierarchical Discrimination Network (HDNet) that adapts SAM's mask decoder.
arXiv Detail & Related papers (2024-09-23T19:05:50Z)
Physics-based reward driven image analysis in microscopy [5.581609660066545]
We present a methodology based on the concept of a Reward Function to optimize image analysis dynamically. The Reward Function is engineered to closely align with the experimental objectives and broader context. We extend the reward function approach towards the identification of partially-disordered regions, creating a physics-driven reward function and action space of high-dimensional clustering.
arXiv Detail & Related papers (2024-04-22T12:55:04Z)
MTP: Advancing Remote Sensing Foundation Model via Multi-Task Pretraining [73.81862342673894]
Foundation models have reshaped the landscape of Remote Sensing (RS) by enhancing various image interpretation tasks. transferring the pretrained models to downstream tasks may encounter task discrepancy due to their formulation of pretraining as image classification or object discrimination tasks. We conduct multi-task supervised pretraining on the SAMRS dataset, encompassing semantic segmentation, instance segmentation, and rotated object detection. Our models are finetuned on various RS downstream tasks, such as scene classification, horizontal and rotated object detection, semantic segmentation, and change detection.
arXiv Detail & Related papers (2024-03-20T09:17:22Z)
Rotated Multi-Scale Interaction Network for Referring Remote Sensing Image Segmentation [63.15257949821558]
Referring Remote Sensing Image (RRSIS) is a new challenge that combines computer vision and natural language processing. Traditional Referring Image (RIS) approaches have been impeded by the complex spatial scales and orientations found in aerial imagery. We introduce the Rotated Multi-Scale Interaction Network (RMSIN), an innovative approach designed for the unique demands of RRSIS.
arXiv Detail & Related papers (2023-12-19T08:14:14Z)
Appearance-Based Refinement for Object-Centric Motion Segmentation [85.2426540999329]
We introduce an appearance-based refinement method that leverages temporal consistency in video streams to correct inaccurate flow-based proposals. Our approach involves a sequence-level selection mechanism that identifies accurate flow-predicted masks as exemplars. Its performance is evaluated on multiple video segmentation benchmarks, including DAVIS, YouTube, SegTrackv2, and FBMS-59.
arXiv Detail & Related papers (2023-12-18T18:59:51Z)
Building Universal Foundation Models for Medical Image Analysis with Spatially Adaptive Networks [5.661631789478932]
We propose a universal foundation model for medical image analysis that processes images with heterogeneous spatial properties using a unified structure. We pre-train a spatial adaptive visual tokenizer (SPAD-VT) and then a spatial adaptive Vision Transformer (SPAD-ViT) via masked image modeling (MIM) on 55 public medical image datasets. The experimental results on downstream medical image classification and segmentation tasks demonstrate the superior performance and label efficiency of our model.
arXiv Detail & Related papers (2023-12-12T08:33:45Z)
RGM: A Robust Generalizable Matching Model [49.60975442871967]
We propose a deep model for sparse and dense matching, termed RGM (Robust Generalist Matching) To narrow the gap between synthetic training samples and real-world scenarios, we build a new, large-scale dataset with sparse correspondence ground truth. We are able to mix up various dense and sparse matching datasets, significantly improving the training diversity.
arXiv Detail & Related papers (2023-10-18T07:30:08Z)
nnSAM: Plug-and-play Segment Anything Model Improves nnUNet Performance [12.169801149021566]
The Segment Anything Model (SAM) has emerged as a versatile tool for image segmentation without specific domain training. Traditional models like nnUNet perform automatic segmentation during inference but need extensive domain-specific training. We propose nnSAM, integrating SAM's robust feature extraction with nnUNet's automatic configuration to enhance segmentation accuracy on small datasets.
arXiv Detail & Related papers (2023-09-29T04:26:25Z)
Topological Data Analysis Guided Segment Anything Model Prompt Optimization for Zero-Shot Segmentation in Biological Imaging [5.795215830149858]
We propose topological data analysis guided prompt optimization for the Segment Anything Model (SAM) Our results show that the TDA optimized point cloud is much better suited for finding small objects and massively reduces computational complexity.
arXiv Detail & Related papers (2023-06-30T05:00:38Z)
Shared Space Transfer Learning for analyzing multi-site fMRI data [83.41324371491774]
Multi-voxel pattern analysis (MVPA) learns predictive models from task-based functional magnetic resonance imaging (fMRI) data. MVPA works best with a well-designed feature set and an adequate sample size. Most fMRI datasets are noisy, high-dimensional, expensive to collect, and with small sample sizes. This paper proposes the Shared Space Transfer Learning (SSTL) as a novel transfer learning approach.
arXiv Detail & Related papers (2020-10-24T08:50:26Z)

This list is automatically generated from the titles and abstracts of the papers in this site.