Related papers: Object segmentation in the wild with foundation models: application to vision assisted neuro-prostheses for upper limbs

Object segmentation in the wild with foundation models: application to vision assisted neuro-prostheses for upper limbs

URL: http://arxiv.org/abs/2507.18517v1
Date: Thu, 24 Jul 2025 15:40:44 GMT
Title: Object segmentation in the wild with foundation models: application to vision assisted neuro-prostheses for upper limbs
Authors: Bolutife Atoki, Jenny Benois-Pineau, Renaud Péteri, Fabien Baldacci, Aymar de Rugy,
Abstract summary: We investigate whether foundation models, trained on a large number and variety of objects, can perform object segmentation without fine-tuning on specific images containing everyday objects.<n>We propose a method for generating prompts based on gaze fixations to guide the Segment Anything Model (SAM) in our segmentation scenario.<n> Evaluation results of our approach show an improvement of the IoU segmentation quality metric by up to 0.51 points on real-world challenging data of Grasping-in-the-Wild corpus.
Score: 2.7554193753662015
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In this work, we address the problem of semantic object segmentation using foundation models. We investigate whether foundation models, trained on a large number and variety of objects, can perform object segmentation without fine-tuning on specific images containing everyday objects, but in highly cluttered visual scenes. The ''in the wild'' context is driven by the target application of vision guided upper limb neuroprostheses. We propose a method for generating prompts based on gaze fixations to guide the Segment Anything Model (SAM) in our segmentation scenario, and fine-tune it on egocentric visual data. Evaluation results of our approach show an improvement of the IoU segmentation quality metric by up to 0.51 points on real-world challenging data of Grasping-in-the-Wild corpus which is made available on the RoboFlow Platform (https://universe.roboflow.com/iwrist/grasping-in-the-wild)

Related papers

Visual Context-Aware Person Fall Detection [52.49277799455569]
We present a segmentation pipeline to semi-automatically separate individuals and objects in images. Background objects such as beds, chairs, or wheelchairs can challenge fall detection systems, leading to false positive alarms. We demonstrate that object-specific contextual transformations during training effectively mitigate this challenge.
arXiv Detail & Related papers (2024-04-11T19:06:36Z)
Explore In-Context Segmentation via Latent Diffusion Models [132.26274147026854]
In-context segmentation aims to segment objects using given reference images.<n>Most existing approaches adopt metric learning or masked image modeling to build the correlation between visual prompts and input image queries.<n>This work approaches the problem from a fresh perspective - unlocking the capability of the latent diffusion model for in-context segmentation.
arXiv Detail & Related papers (2024-03-14T17:52:31Z)
Annotation Free Semantic Segmentation with Vision Foundation Models [11.026377387506216]
We generate free annotations for any semantic segmentation dataset using existing foundation models. We use CLIP to detect objects and SAM to generate high quality object masks. Our approach can bring language-based semantics to any pretrained vision encoder with minimal training.
arXiv Detail & Related papers (2024-03-14T11:57:58Z)
Appearance-Based Refinement for Object-Centric Motion Segmentation [85.2426540999329]
We introduce an appearance-based refinement method that leverages temporal consistency in video streams to correct inaccurate flow-based proposals. Our approach involves a sequence-level selection mechanism that identifies accurate flow-predicted masks as exemplars. Its performance is evaluated on multiple video segmentation benchmarks, including DAVIS, YouTube, SegTrackv2, and FBMS-59.
arXiv Detail & Related papers (2023-12-18T18:59:51Z)
Labeling Indoor Scenes with Fusion of Out-of-the-Box Perception Models [4.157013247909771]
We propose to leverage the recent advancements in state-of-the-art models for bottom-up segmentation (SAM), object detection (Detic), and semantic segmentation (MaskFormer) We aim to develop a cost-effective labeling approach to obtain pseudo-labels for semantic segmentation and object instance detection in indoor environments. We demonstrate the effectiveness of the proposed approach on the Active Vision dataset and the ADE20K dataset.
arXiv Detail & Related papers (2023-11-17T21:58:26Z)
LOCATE: Self-supervised Object Discovery via Flow-guided Graph-cut and Bootstrapped Self-training [13.985488693082981]
We propose a self-supervised object discovery approach that leverages motion and appearance information to produce high-quality object segmentation masks. We demonstrate the effectiveness of our approach, named LOCATE, on multiple standard video object segmentation, image saliency detection, and object segmentation benchmarks.
arXiv Detail & Related papers (2023-08-22T07:27:09Z)
Weakly-supervised Contrastive Learning for Unsupervised Object Discovery [52.696041556640516]
Unsupervised object discovery is promising due to its ability to discover objects in a generic manner. We design a semantic-guided self-supervised learning model to extract high-level semantic features from images. We introduce Principal Component Analysis (PCA) to localize object regions.
arXiv Detail & Related papers (2023-07-07T04:03:48Z)
Salient Objects in Clutter [130.63976772770368]
This paper identifies and addresses a serious design bias of existing salient object detection (SOD) datasets. This design bias has led to a saturation in performance for state-of-the-art SOD models when evaluated on existing datasets. We propose a new high-quality dataset and update the previous saliency benchmark.
arXiv Detail & Related papers (2021-05-07T03:49:26Z)
DyStaB: Unsupervised Object Segmentation via Dynamic-Static Bootstrapping [72.84991726271024]
We describe an unsupervised method to detect and segment portions of images of live scenes that are seen moving as a coherent whole. Our method first partitions the motion field by minimizing the mutual information between segments. It uses the segments to learn object models that can be used for detection in a static image.
arXiv Detail & Related papers (2020-08-16T22:05:13Z)

This list is automatically generated from the titles and abstracts of the papers in this site.