Single Point, Full Mask: Velocity-Guided Level Set Evolution for End-to-End Amodal Segmentation
- URL: http://arxiv.org/abs/2508.01661v1
- Date: Sun, 03 Aug 2025 08:36:13 GMT
- Title: Single Point, Full Mask: Velocity-Guided Level Set Evolution for End-to-End Amodal Segmentation
- Authors: Zhixuan Li, Yujia Liu, Chen Hui, Weisi Lin
- Abstract summary: Amodal segmentation aims to recover complete object shapes, including occluded regions with no visual appearance. Existing methods rely on strong prompts, such as visible masks or bounding boxes, which are costly or impractical to obtain in real-world settings. We propose VELA, which performs explicit contour evolution from point-based prompts.
- Score: 41.188891367216804
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Amodal segmentation aims to recover complete object shapes, including occluded regions with no visual appearance, whereas conventional segmentation focuses solely on visible areas. Existing methods typically rely on strong prompts, such as visible masks or bounding boxes, which are costly or impractical to obtain in real-world settings. While recent approaches such as the Segment Anything Model (SAM) support point-based prompts for guidance, they often perform direct mask regression without explicitly modeling shape evolution, limiting generalization in complex occlusion scenarios. Moreover, most existing methods suffer from a black-box nature, lacking geometric interpretability and offering limited insight into how occluded shapes are inferred. To deal with these limitations, we propose VELA, an end-to-end VElocity-driven Level-set Amodal segmentation method that performs explicit contour evolution from point-based prompts. VELA first constructs an initial level set function from image features and the point input, which then progressively evolves into the final amodal mask under the guidance of a shape-specific motion field predicted by a fully differentiable network. This network learns to generate evolution dynamics at each step, enabling geometrically grounded and topologically flexible contour modeling. Extensive experiments on COCOA-cls, D2SA, and KINS benchmarks demonstrate that VELA outperforms existing strongly prompted methods while requiring only a single-point prompt, validating the effectiveness of interpretable geometric modeling under weak guidance. The code will be publicly released.
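The abstract describes the mechanism concretely enough to sketch: build an initial level set function from the click, then repeatedly move the contour along its normals at a speed predicted by a network. Below is a minimal PyTorch sketch of that loop, assuming a generic feature map as input; `VelocityNet`, the disc initialization, the step count, and the step size `dt` are my own stand-ins, not VELA's released architecture.

```python
# Minimal sketch of velocity-guided level set evolution as described in the
# abstract. "VelocityNet", the disc initialisation, the number of steps, and
# the step size dt are stand-in assumptions, not VELA's released architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F

class VelocityNet(nn.Module):
    """Predicts a scalar speed field V from image features and the current phi."""
    def __init__(self, feat_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(feat_dim + 1, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 1, 3, padding=1),
        )

    def forward(self, feats: torch.Tensor, phi: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([feats, phi], dim=1))  # (B, 1, H, W)

def init_phi_from_point(h: int, w: int, point, radius: float = 10.0) -> torch.Tensor:
    """Signed distance to a small disc centred on the click (negative inside)."""
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    dist = torch.sqrt(((ys - point[0]) ** 2 + (xs - point[1]) ** 2).float())
    return (dist - radius)[None, None]  # (1, 1, H, W)

def grad_magnitude(phi: torch.Tensor) -> torch.Tensor:
    """|grad(phi)| via central differences with replicate padding at borders."""
    p = F.pad(phi, (1, 1, 1, 1), mode="replicate")
    gx = (p[..., 1:-1, 2:] - p[..., 1:-1, :-2]) / 2.0
    gy = (p[..., 2:, 1:-1] - p[..., :-2, 1:-1]) / 2.0
    return torch.sqrt(gx**2 + gy**2 + 1e-8)

def evolve(feats, phi, vnet, steps: int = 10, dt: float = 0.5):
    """Level-set update: phi <- phi - dt * V * |grad(phi)| (motion along normals)."""
    for _ in range(steps):
        phi = phi - dt * vnet(feats, phi) * grad_magnitude(phi)
    return phi  # the final amodal mask is the region where phi < 0
```

A single click at (y, x) seeds `init_phi_from_point`; after `evolve`, thresholding phi at zero yields the amodal mask, and the update direction along the contour normal is what makes the evolution geometrically interpretable.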
Related papers
- PMA: Towards Parameter-Efficient Point Cloud Understanding via Point Mamba Adapter [54.33433051500349]
We propose Point Mamba Adapter (PMA), which constructs an ordered feature sequence from all layers of the pre-trained model. We also propose a geometry-constrained gate prompt generator (G2PG) shared across different layers.
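As a rough illustration of the summary above, the sketch below gathers per-layer features from a frozen backbone into one ordered sequence and fuses them with a small shared module. A GRU stands in for the Mamba block so the example runs without the mamba-ssm package; every name, shape, and the gate design are assumptions, not PMA's implementation.

```python
# Rough illustration only: per-layer features from a frozen backbone are
# concatenated into one ordered sequence and fused by a small shared module.
# A GRU stands in for the Mamba block so this runs without mamba-ssm; all
# names, shapes, and the gate design are assumptions, not PMA's code.
import torch
import torch.nn as nn

class PointAdapterSketch(nn.Module):
    def __init__(self, dim: int = 384):
        super().__init__()
        self.seq = nn.GRU(dim, dim, batch_first=True)      # Mamba stand-in
        # Stand-in for a shared geometry-conditioned gate (the G2PG role):
        # a sigmoid gate computed from the mean point coordinates.
        self.gate = nn.Sequential(nn.Linear(3, dim), nn.Sigmoid())

    def forward(self, layer_feats, xyz):
        # layer_feats: list of (B, N, C) tensors, one per frozen layer.
        # xyz: (B, N, 3) point coordinates.
        g = self.gate(xyz.mean(dim=1, keepdim=True))               # (B, 1, C), shared
        ordered = torch.cat([g * f for f in layer_feats], dim=1)   # (B, L*N, C)
        fused, _ = self.seq(ordered)
        return fused.mean(dim=1)                                   # (B, C) global feature
```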
arXiv Detail & Related papers (2025-05-27T09:27:16Z)
- Unveiling the Invisible: Reasoning Complex Occlusions Amodally with AURA [49.10341970643037]
Amodal segmentation aims to infer the complete shape of occluded objects, even when the occluded region's appearance is unavailable. Current amodal segmentation methods lack the capability to interact with users through text input. We propose a novel task named amodal reasoning segmentation, aiming to predict the complete amodal shape of occluded objects.
arXiv Detail & Related papers (2025-03-13T10:08:18Z)
- MFCLIP: Multi-modal Fine-grained CLIP for Generalizable Diffusion Face Forgery Detection [64.29452783056253]
The rapid development of photo-realistic face generation methods has raised significant concerns in society and academia. Although existing approaches mainly capture face forgery patterns using the image modality, other modalities, such as fine-grained noise and text, are not fully explored. We propose a novel multi-modal fine-grained CLIP (MFCLIP) model, which mines comprehensive and fine-grained forgery traces across image-noise modalities.
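A hedged sketch of the image-noise pairing the summary describes: a fixed high-pass filter exposes a noise residual, and image and noise embeddings are aligned with a CLIP-style symmetric contrastive loss. The kernel, encoders, and temperature below are my assumptions, not MFCLIP's design.

```python
# Hedged sketch of an image-noise pairing: a fixed high-pass filter exposes a
# noise residual, and the two embeddings are aligned with a symmetric
# CLIP-style contrastive loss. Kernel, encoders, and temperature are assumed.
import torch
import torch.nn.functional as F

HIGH_PASS = torch.tensor([[[[-1., 2., -1.],
                            [ 2., -4., 2.],
                            [-1., 2., -1.]]]]) / 4.0  # simple SRM-like kernel

def noise_residual(img: torch.Tensor) -> torch.Tensor:
    """img: (B, 3, H, W); filter each channel independently."""
    k = HIGH_PASS.to(img).repeat(3, 1, 1, 1)
    return F.conv2d(img, k, padding=1, groups=3)

def contrastive_loss(img_emb: torch.Tensor, noise_emb: torch.Tensor, t: float = 0.07):
    """Symmetric InfoNCE between matched image and noise embeddings."""
    img_emb = F.normalize(img_emb, dim=-1)
    noise_emb = F.normalize(noise_emb, dim=-1)
    logits = img_emb @ noise_emb.t() / t
    labels = torch.arange(len(img_emb), device=img_emb.device)
    return (F.cross_entropy(logits, labels) + F.cross_entropy(logits.t(), labels)) / 2
```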
arXiv Detail & Related papers (2024-09-15T13:08:59Z)
- BLADE: Box-Level Supervised Amodal Segmentation through Directed Expansion [10.57956193654977]
Box-level supervised amodal segmentation addresses the scarcity of amodal mask annotations by relying solely on ground-truth bounding boxes and instance classes as supervision.
We present a novel solution by introducing a directed expansion approach from visible masks to corresponding amodal masks.
Our approach involves a hybrid end-to-end network based on the overlapping region, i.e., the area where different instances intersect.
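To make "directed expansion" concrete, here is a toy, non-learned version: grow the visible mask one pixel at a time, but only into the occluder's region and never outside the box prompt. BLADE learns this expansion end-to-end, so treat the function below purely as intuition; all names and the step count are hypothetical.

```python
# Toy, non-learned version of directed expansion: grow the visible mask one
# pixel per step, but only into the occluder's region and never outside the
# box. BLADE learns this end-to-end; names and step count are hypothetical.
import torch
import torch.nn.functional as F

def directed_expand(visible: torch.Tensor, occluder: torch.Tensor,
                    box: torch.Tensor, steps: int = 5) -> torch.Tensor:
    """visible, occluder, box: binary float masks of shape (1, 1, H, W)."""
    amodal = visible.clone()
    for _ in range(steps):
        grown = F.max_pool2d(amodal, 3, stride=1, padding=1)  # 1-pixel dilation
        newly = (grown - amodal).clamp(min=0)
        # Expand only where an occluder hides the object, staying in the box.
        amodal = amodal + newly * occluder * box
    return amodal.clamp(max=1)
```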
arXiv Detail & Related papers (2024-01-03T09:37:03Z)
- Understanding and Constructing Latent Modality Structures in Multi-modal Representation Learning [53.68371566336254]
We argue that the key to better performance lies in meaningful latent modality structures instead of perfect modality alignment.
Specifically, we design 1) a deep feature separation loss for intra-modality regularization; 2) a Brownian-bridge loss for inter-modality regularization; and 3) a geometric consistency loss for both intra- and inter-modality regularization.
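Since the three regularizers are named explicitly, simple hedged readings can be written down; the paper's exact formulations differ, and all shapes and weightings below are assumed.

```python
# Simple hedged readings of the three regularisers named above; the paper's
# exact formulations differ, and all shapes and weightings are assumed.
import torch
import torch.nn.functional as F

def feature_separation_loss(shared: torch.Tensor, private: torch.Tensor):
    """Intra-modality: push a private feature head orthogonal to the shared one."""
    s = F.normalize(shared, dim=-1)
    p = F.normalize(private, dim=-1)
    return (s * p).sum(dim=-1).pow(2).mean()

def brownian_bridge_loss(img, mid, txt, t: float = 0.5):
    """Inter-modality: at time t, the bridge between endpoints has mean
    (1 - t) * img + t * txt; pull an intermediate representation towards it."""
    return F.mse_loss(mid, (1 - t) * img + t * txt)

def geometric_consistency_loss(img_a, img_b, txt_a, txt_b):
    """Pairwise distances between samples should agree across modalities."""
    d_img = (img_a - img_b).norm(dim=-1)
    d_txt = (txt_a - txt_b).norm(dim=-1)
    return F.mse_loss(d_img, d_txt)
```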
arXiv Detail & Related papers (2023-03-10T14:38:49Z)
- Growing Instance Mask on Leaf [12.312639923806548]
We present a single-shot method, called VeinMask, that achieves competitive performance with low design complexity.
Motivated by these advantages, VeinMask formulates instance segmentation as a growth process, expanding an instance mask outward much as veins spread across a leaf.
VeinMask performs much better than other contour-based methods while keeping design complexity low.
arXiv Detail & Related papers (2022-11-30T04:50:56Z)
- Exploiting Shape Cues for Weakly Supervised Semantic Segmentation [15.791415215216029]
Weakly supervised semantic segmentation (WSSS) aims to produce pixel-wise class predictions with only image-level labels for training.
We propose to exploit shape information to supplement the texture-biased property of convolutional neural networks (CNNs).
We further refine the predictions in an online fashion with a novel refinement method that takes into account both the class and the color affinities.
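As an illustration of affinity-based refinement in this spirit, the sketch below averages class score maps over each pixel's 3x3 neighbourhood, weighted by RGB similarity, so scores propagate within same-colored regions. The paper's method also uses class affinities and differs in detail; the kernel size and bandwidth are assumptions.

```python
# Illustration of colour-affinity refinement: class scores are repeatedly
# averaged over each pixel's 3x3 neighbourhood, weighted by RGB similarity,
# so scores propagate within same-coloured regions. The paper also uses class
# affinities; kernel size and bandwidth here are assumptions.
import torch
import torch.nn.functional as F

def color_affinity_refine(scores: torch.Tensor, image: torch.Tensor,
                          iters: int = 3, sigma: float = 0.1) -> torch.Tensor:
    """scores: (B, K, H, W) class maps; image: (B, 3, H, W) in [0, 1]."""
    b, _, h, w = image.shape
    patches = F.unfold(image, 3, padding=1).view(b, 3, 9, h * w)
    centre = image.view(b, 3, 1, h * w)
    aff = torch.exp(-((patches - centre) ** 2).sum(1) / (2 * sigma ** 2))
    aff = aff / aff.sum(1, keepdim=True)                     # (B, 9, HW)
    for _ in range(iters):
        sc = F.unfold(scores, 3, padding=1).view(b, -1, 9, h * w)
        scores = (sc * aff.unsqueeze(1)).sum(2).view(b, -1, h, w)
    return scores
```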
arXiv Detail & Related papers (2022-08-08T17:25:31Z)
- Learning Vector Quantized Shape Code for Amodal Blastomere Instance Segmentation [33.558545104711186]
Amodal instance segmentation aims to recover the complete silhouette of an object even when the object is not fully visible.
We propose to classify input features into intermediate shape codes and recover complete object shapes from them.
Our method would enable accurate measurement of blastomeres in in vitro fertilization (IVF) clinics.
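A minimal sketch of the vector-quantized shape code idea: snap an encoder feature to its nearest entry in a learned codebook of shape embeddings, then decode that discrete code into a full mask. The codebook size, dimensions, and decoder below are assumptions, not the paper's design.

```python
# Minimal sketch of vector-quantised shape codes: snap an encoder feature to
# its nearest codebook entry and decode the discrete code into a full mask.
# Codebook size, dimensions, and the decoder are assumptions.
import torch
import torch.nn as nn

class ShapeCodebookSketch(nn.Module):
    def __init__(self, num_codes: int = 512, dim: int = 256, mask_hw: int = 28):
        super().__init__()
        self.codes = nn.Embedding(num_codes, dim)
        self.decoder = nn.Sequential(
            nn.Linear(dim, mask_hw * mask_hw),
            nn.Unflatten(1, (1, mask_hw, mask_hw)),
            nn.Sigmoid(),
        )

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        """feat: (B, dim) region feature from the visible evidence."""
        idx = torch.cdist(feat, self.codes.weight).argmin(dim=1)  # classify
        q = self.codes(idx)
        q = feat + (q - feat).detach()   # straight-through gradient estimator
        return self.decoder(q)           # (B, 1, mask_hw, mask_hw) amodal mask
```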
arXiv Detail & Related papers (2020-12-02T06:17:28Z)
- The Devil is in the Boundary: Exploiting Boundary Representation for Basis-based Instance Segmentation [85.153426159438]
We propose Basis-based Instance Segmentation (B2Inst) to learn a global boundary representation that can complement existing global-mask-based methods.
Our B2Inst leads to consistent improvements and accurately parses out the instance boundaries in a scene.
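Reading "global boundary representation" against standard basis-based segmentation (prototype maps blended by per-instance coefficients), one plausible sketch adds a parallel set of boundary bases that share the instance coefficients. This is only a schematic reading; B2Inst's actual formulation may differ.

```python
# Schematic reading only: alongside the usual global mask bases, keep a
# parallel set of boundary bases blended by the same per-instance
# coefficients. B2Inst's actual formulation may differ.
import torch
import torch.nn as nn

class BasisWithBoundarySketch(nn.Module):
    def __init__(self, num_bases: int = 32, feat_dim: int = 256):
        super().__init__()
        self.mask_bases = nn.Conv2d(feat_dim, num_bases, 1)      # global masks
        self.boundary_bases = nn.Conv2d(feat_dim, num_bases, 1)  # global boundaries

    def forward(self, feats: torch.Tensor, coeffs: torch.Tensor):
        """feats: (B, C, H, W); coeffs: (B, N, num_bases) per instance."""
        masks = torch.einsum("bnk,bkhw->bnhw", coeffs, self.mask_bases(feats))
        bounds = torch.einsum("bnk,bkhw->bnhw", coeffs, self.boundary_bases(feats))
        return masks.sigmoid(), bounds.sigmoid()
```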
arXiv Detail & Related papers (2020-11-26T11:26:06Z)