RClicks: Realistic Click Simulation for Benchmarking Interactive Segmentation
- URL: http://arxiv.org/abs/2410.11722v2
- Date: Thu, 24 Oct 2024 15:48:41 GMT
- Title: RClicks: Realistic Click Simulation for Benchmarking Interactive Segmentation
- Authors: Anton Antonov, Andrey Moskalenko, Denis Shepelev, Alexander Krapukhin, Konstantin Soshin, Anton Konushin, Vlad Shakhuro
- Abstract summary: We conduct a large crowdsourcing study of click patterns in an interactive segmentation scenario and collect 475K real-user clicks.
Using our model and dataset, we propose the RClicks benchmark for a comprehensive comparison of existing interactive segmentation methods on realistic clicks.
According to our benchmark, in real-world usage interactive segmentation models may perform worse than reported in the baseline benchmark, and most methods are not robust.
- Score: 37.44155289954746
- Abstract: The emergence of Segment Anything (SAM) sparked research interest in the field of interactive segmentation, especially in the context of image editing tasks and speeding up data annotation. Unlike common semantic segmentation, interactive segmentation methods allow users to directly influence their output through prompts (e.g. clicks). However, click patterns in real-world interactive segmentation scenarios remain largely unexplored. Most methods rely on the assumption that users would click in the center of the largest erroneous area. Nevertheless, recent studies show that this is not always the case. Thus, methods may have poor performance in real-world deployment despite high metrics in a baseline benchmark. To accurately simulate real-user clicks, we conducted a large crowdsourcing study of click patterns in an interactive segmentation scenario and collected 475K real-user clicks. Drawing on ideas from saliency tasks, we develop a clickability model that enables sampling clicks that closely resemble actual user inputs. Using our model and dataset, we propose the RClicks benchmark for a comprehensive comparison of existing interactive segmentation methods on realistic clicks. Specifically, we evaluate not only the average quality of methods, but also their robustness w.r.t. click patterns. According to our benchmark, in real-world usage interactive segmentation models may perform worse than reported in the baseline benchmark, and most of the methods are not robust. We believe that RClicks is a significant step towards creating interactive segmentation methods that provide the best user experience in real-world cases.
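To make the gap concrete, the sketch below contrasts the standard baseline click simulator (center of the largest erroneous region) with sampling from a clickability map. This is a minimal illustration, not the paper's released code: `clickability` stands in for the output of a learned saliency-style model, and the function names are hypothetical.

```python
import numpy as np
from scipy import ndimage

def baseline_click(gt_mask, pred_mask):
    """Baseline simulator: click the point deepest inside the largest erroneous region."""
    error = gt_mask != pred_mask
    labels, n = ndimage.label(error)
    if n == 0:
        return None  # prediction already matches the ground truth
    sizes = ndimage.sum(error, labels, index=np.arange(1, n + 1))
    largest = labels == (np.argmax(sizes) + 1)
    dist = ndimage.distance_transform_edt(largest)
    return np.unravel_index(np.argmax(dist), dist.shape)

def realistic_click(error_region, clickability, rng):
    """Sample a click from a clickability map restricted to the erroneous region."""
    probs = (clickability * error_region).ravel()
    probs = probs / probs.sum()
    idx = rng.choice(probs.size, p=probs)
    return np.unravel_index(idx, error_region.shape)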
Related papers
- Scale Disparity of Instances in Interactive Point Cloud Segmentation [15.865365305312174]
We propose ClickFormer, an innovative interactive point cloud segmentation model that accurately segments instances of both thing and stuff categories.
We employ global attention in the query-voxel transformer to mitigate the risk of generating false positives.
Experiments demonstrate that ClickFormer outperforms existing interactive point cloud segmentation methods across both indoor and outdoor datasets.
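As a rough illustration of the global query-voxel attention idea, here is a generic cross-attention sketch in PyTorch; it is not ClickFormer's actual module, only the standard pattern the summary refers to.

```python
import torch.nn as nn

class QueryVoxelAttention(nn.Module):
    """Instance queries attend globally over flattened voxel features."""
    def __init__(self, dim: int, heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, queries, voxel_feats):
        # queries: (B, Q, C); voxel_feats: (B, N, C), N = number of voxels
        out, _ = self.attn(queries, voxel_feats, voxel_feats)
        return self.norm(queries + out)
```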
arXiv Detail & Related papers (2024-07-19T03:45:48Z)
- TETRIS: Towards Exploring the Robustness of Interactive Segmentation [39.1981941213761]
We propose a methodology for finding extreme user inputs by direct optimization, formulated as a white-box adversarial attack on the interactive segmentation model, as sketched below.
We report the results of an extensive evaluation of dozens of models.
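A minimal sketch of such a white-box search, assuming a differentiable Gaussian click encoding and a hypothetical `model(image, click_map)` interface; the paper's exact optimization procedure may differ.

```python
import torch
import torch.nn.functional as F

def worst_case_click(model, image, gt_mask, steps=50, lr=1.0, sigma=5.0):
    """Gradient search for a worst-case click position (illustrative sketch).

    model: hypothetical callable mapping (image, click_map) -> mask logits.
    image: (1, 3, H, W); gt_mask: (1, 1, H, W) float in {0, 1}.
    """
    H, W = gt_mask.shape[-2:]
    pos = torch.tensor([H / 2.0, W / 2.0], requires_grad=True)
    ys = torch.arange(H, dtype=torch.float32).view(H, 1)
    xs = torch.arange(W, dtype=torch.float32).view(1, W)
    opt = torch.optim.Adam([pos], lr=lr)
    for _ in range(steps):
        # Differentiable click encoding: a Gaussian bump at the current position.
        click_map = torch.exp(-((ys - pos[0]) ** 2 + (xs - pos[1]) ** 2) / (2 * sigma ** 2))
        logits = model(image, click_map[None, None])
        # Ascend the segmentation error w.r.t. the click position.
        loss = -F.binary_cross_entropy_with_logits(logits, gt_mask)
        opt.zero_grad()
        loss.backward()
        opt.step()
        pos.data[0].clamp_(0, H - 1)  # keep the click inside the image
        pos.data[1].clamp_(0, W - 1)
    return pos.detach()
```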
arXiv Detail & Related papers (2024-02-09T01:36:21Z)
- Interactive segmentation in aerial images: a new benchmark and an open access web-based tool [2.729446374377189]
In recent years, interactive semantic segmentation in computer vision has achieved strong human-in-the-loop segmentation results.
This study aims to bridge the gap between interactive segmentation and remote sensing analysis by conducting a benchmark study of various interactive segmentation models.
arXiv Detail & Related papers (2023-08-25T04:49:49Z)
- Segment Anything Meets Point Tracking [116.44931239508578]
This paper presents a novel method for point-centric interactive video segmentation, empowered by SAM and long-term point tracking.
We highlight the merits of point-based tracking through direct evaluation on the zero-shot open-world Unidentified Video Objects (UVO) benchmark.
Our experiments on popular video object segmentation and multi-object tracking and segmentation benchmarks, including DAVIS, YouTube-VOS, and BDD100K, suggest that a point-based segmentation tracker yields better zero-shot performance and more efficient interactions.
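The overall loop can be summarized as follows; `segment_with_points` and `track_points` are hypothetical stand-ins for a promptable segmenter (e.g. SAM) and a long-term point tracker, not this paper's actual API.

```python
def segment_video(frames, init_points, segment_with_points, track_points):
    """Point-centric video segmentation sketch: propagate prompt points with a
    tracker and re-query a promptable segmenter on every frame."""
    masks, points, prev = [], init_points, None
    for frame in frames:
        if prev is not None:
            points = track_points(prev, frame, points)  # long-term point tracking step
        masks.append(segment_with_points(frame, points))  # prompt the segmenter
        prev = frame
    return masks
```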
arXiv Detail & Related papers (2023-07-03T17:58:01Z)
- DynaMITe: Dynamic Query Bootstrapping for Multi-object Interactive Segmentation Transformer [58.95404214273222]
Most state-of-the-art instance segmentation methods rely on large amounts of pixel-precise ground-truth for training.
We introduce a more efficient approach, called DynaMITe, in which we represent user interactions as spatio-temporal queries.
Our architecture also alleviates any need to re-compute image features during refinement, and requires fewer interactions for segmenting multiple instances in a single image.
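A generic way to embed a sequence of interactions as transformer queries might look like the following sketch; this is illustrative only, and DynaMITe's actual query construction is more involved.

```python
import torch.nn as nn

class InteractionQueries(nn.Module):
    """Embed clicks (y, x, timestep, polarity) as query vectors (generic sketch)."""
    def __init__(self, dim: int):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(4, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, clicks):
        # clicks: (B, K, 4) with normalized (y, x, t, polarity) per interaction
        return self.mlp(clicks)
```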
arXiv Detail & Related papers (2023-04-13T16:57:02Z)
- Contour-based Interactive Segmentation [4.164728134421114]
We consider loose contours as a natural form of user interaction and introduce a contour-based interactive segmentation method.
We demonstrate that a single contour provides the same accuracy as multiple clicks, thus reducing the required amount of user interactions.
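One simple way to feed a loose contour to a network is to rasterize it into a filled prior channel, as in this sketch (an assumed encoding, not necessarily the paper's).

```python
import numpy as np
from skimage.draw import polygon

def contour_prior(contour_rc, shape):
    """Rasterize a loose user contour (K, 2) of (row, col) points into a prior mask."""
    rr, cc = polygon(contour_rc[:, 0], contour_rc[:, 1], shape)
    prior = np.zeros(shape, dtype=np.float32)
    prior[rr, cc] = 1.0
    return prior
```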
arXiv Detail & Related papers (2023-02-13T13:35:26Z) - Masked Transformer for Neighhourhood-aware Click-Through Rate Prediction [74.52904110197004]
We propose Neighbor-Interaction based CTR prediction, which puts this task in a Heterogeneous Information Network (HIN) setting.
In order to enhance the representation of the local neighbourhood, we consider four types of topological interaction among the nodes.
We conduct comprehensive experiments on two real-world datasets, and the results show that our proposed method significantly outperforms state-of-the-art CTR models.
arXiv Detail & Related papers (2022-01-25T12:44:23Z) - WeClick: Weakly-Supervised Video Semantic Segmentation with Click
Annotations [64.52412111417019]
We propose an effective weakly-supervised video semantic segmentation pipeline with click annotations, called WeClick.
Since detailed semantic information is not captured by clicks, directly training with click labels leads to poor segmentation predictions.
WeClick outperforms state-of-the-art methods, improves performance by 10.24% mIoU over the baseline, and achieves real-time execution.
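A standard way to train from sparse click labels is a cross-entropy loss restricted to clicked pixels, sketched below; this shows only the generic sparse-supervision part, not necessarily WeClick's exact loss or full pipeline.

```python
import torch.nn.functional as F

def sparse_click_loss(logits, click_labels, ignore_index=255):
    """Cross-entropy evaluated only at clicked pixels.

    logits: (B, C, H, W); click_labels: (B, H, W) long tensor with class ids
    at clicked pixels and ignore_index everywhere else.
    """
    return F.cross_entropy(logits, click_labels, ignore_index=ignore_index)
```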
arXiv Detail & Related papers (2021-07-07T09:12:46Z)
- Localized Interactive Instance Segmentation [24.55415554455844]
We propose a clicking scheme wherein user interactions are restricted to the proximity of the object.
We demonstrate the effectiveness of our proposed clicking scheme and localization strategy through detailed experimentation.
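A locality constraint of this kind can be approximated by accepting only clicks that fall inside a dilated neighborhood of the current mask, as in this sketch (an illustrative check, not the paper's exact scheme).

```python
import numpy as np
from scipy import ndimage

def is_click_local(click, current_mask, margin=20):
    """Accept a click only if it lies within a dilated neighborhood of the object mask."""
    proximity = ndimage.binary_dilation(current_mask.astype(bool), iterations=margin)
    return bool(proximity[tuple(click)])
```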
arXiv Detail & Related papers (2020-10-18T23:24:09Z)
- FAIRS -- Soft Focus Generator and Attention for Robust Object Segmentation from Extreme Points [70.65563691392987]
We present a new approach to generate object segmentation from user inputs in the form of extreme points and corrective clicks.
We demonstrate our method's ability to generate high-quality training data as well as its scalability in incorporating extreme points, guiding clicks, and corrective clicks in a principled manner.
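Extreme points and guiding clicks are commonly encoded as Gaussian heatmap channels; the sketch below shows such an encoding (a common convention, assumed here rather than taken from the FAIRS paper).

```python
import numpy as np

def points_heatmap(points, shape, sigma=10.0):
    """Encode extreme points or corrective clicks as a single Gaussian heatmap channel."""
    H, W = shape
    ys = np.arange(H)[:, None]
    xs = np.arange(W)[None, :]
    heat = np.zeros((H, W), dtype=np.float32)
    for y, x in points:
        heat = np.maximum(heat, np.exp(-((ys - y) ** 2 + (xs - x) ** 2) / (2 * sigma ** 2)))
    return heat
```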
arXiv Detail & Related papers (2020-04-04T22:25:47Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the accuracy of this information and is not responsible for any consequences arising from its use.