Boundary-aware Supervoxel-level Iteratively Refined Interactive 3D Image
Segmentation with Multi-agent Reinforcement Learning
- URL: http://arxiv.org/abs/2303.10692v1
- Date: Sun, 19 Mar 2023 15:52:56 GMT
- Title: Boundary-aware Supervoxel-level Iteratively Refined Interactive 3D Image
Segmentation with Multi-agent Reinforcement Learning
- Authors: Chaofan Ma, Qisen Xu, Xiangfeng Wang, Bo Jin, Xiaoyun Zhang, Yanfeng
Wang, Ya Zhang
- Abstract summary: We propose to model interactive image segmentation with a Markov decision process (MDP) and solve it with reinforcement learning (RL).
Considering the large exploration space for voxel-wise prediction, multi-agent reinforcement learning is adopted, where the voxel-level policy is shared among agents.
Experimental results on four benchmark datasets show that the proposed method significantly outperforms state-of-the-art methods.
- Score: 33.181732857907384
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Interactive segmentation has recently been explored to effectively and
efficiently harvest high-quality segmentation masks by iteratively
incorporating user hints. While iterative in nature, most existing interactive
segmentation methods tend to ignore the dynamics of successive interactions and
treat each interaction independently. We here propose to model iterative
interactive image segmentation with a Markov decision process (MDP) and solve
it with reinforcement learning (RL) where each voxel is treated as an agent.
Considering the large exploration space for voxel-wise prediction and the
dependence among neighboring voxels for the segmentation tasks, multi-agent
reinforcement learning is adopted, where the voxel-level policy is shared among
agents. Considering that boundary voxels are more important for segmentation,
we further introduce a boundary-aware reward, which consists of a global reward
in the form of relative cross-entropy gain, to update the policy in a
constrained direction, and a boundary reward in the form of relative weight, to
emphasize the correctness of boundary predictions. To combine the advantages of
different types of interactions, i.e., simple and efficient for point-clicking,
and stable and robust for scribbles, we propose a supervoxel-clicking based
interaction design. Experimental results on four benchmark datasets show that
the proposed method significantly outperforms state-of-the-art methods, with
the advantages of fewer interactions, higher accuracy, and enhanced robustness.
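
The boundary-aware reward described above pairs a global term, the relative cross-entropy gain between two successive predictions, with a boundary term that re-weights voxels near the object boundary. The snippet below is a minimal NumPy sketch of that idea, not the authors' implementation: the function name `boundary_aware_reward`, the `boundary_weight` parameter, and the assumption that a binary boundary mask is precomputed are all illustrative choices.

```python
import numpy as np

def boundary_aware_reward(prob_prev, prob_curr, gt, boundary_mask, boundary_weight=2.0):
    """Illustrative per-voxel boundary-aware reward (a sketch, not the paper's code).

    prob_prev, prob_curr : foreground probabilities before / after an interaction step
    gt                   : binary ground-truth mask of the same shape
    boundary_mask        : binary mask of voxels near the object boundary (assumed precomputed)
    boundary_weight      : hypothetical relative weight emphasizing boundary voxels
    """
    eps = 1e-8
    # Per-voxel binary cross-entropy before and after the refinement step.
    ce_prev = -(gt * np.log(prob_prev + eps) + (1 - gt) * np.log(1 - prob_prev + eps))
    ce_curr = -(gt * np.log(prob_curr + eps) + (1 - gt) * np.log(1 - prob_curr + eps))

    # Global reward: relative cross-entropy gain, positive where the new
    # prediction reduces the error.
    global_reward = ce_prev - ce_curr

    # Boundary reward: up-weight the gain on boundary voxels so their
    # correctness counts more in the shared voxel-level policy update.
    weights = np.where(boundary_mask.astype(bool), boundary_weight, 1.0)
    return weights * global_reward
```

In the paper's multi-agent formulation, each voxel acts as an agent and would receive its own scalar from such a reward map while sharing a single voxel-level policy; this sketch covers only the reward shaping, not the RL training loop or the supervoxel-clicking interaction.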
Related papers
- InterFormer: Towards Effective Heterogeneous Interaction Learning for Click-Through Rate Prediction [72.50606292994341]
We propose a novel module named InterFormer to learn heterogeneous information interaction in an interleaving style.
Our proposed InterFormer achieves state-of-the-art performance on three public datasets and a large-scale industrial dataset.
arXiv Detail & Related papers (2024-11-15T00:20:36Z) - IDRNet: Intervention-Driven Relation Network for Semantic Segmentation [34.09179171102469]
Co-occurrent visual patterns suggest that pixel relation modeling facilitates dense prediction tasks.
Despite the impressive results, existing paradigms often suffer from inadequate or ineffective contextual information aggregation.
We propose a novel Intervention-Driven Relation Network (IDRNet).
arXiv Detail & Related papers (2023-10-16T18:37:33Z) - Interactive segmentation in aerial images: a new benchmark and an open
access web-based tool [2.729446374377189]
In recent years, interactive semantic segmentation in computer vision has achieved an ideal state of human-computer interaction for segmentation.
This study aims to bridge the gap between interactive segmentation and remote sensing analysis by conducting benchmark study on various interactive segmentation models.
arXiv Detail & Related papers (2023-08-25T04:49:49Z) - Feature Decoupling-Recycling Network for Fast Interactive Segmentation [79.22497777645806]
Recent interactive segmentation methods iteratively take the source image, user guidance, and the previously predicted mask as input.
We propose the Feature Decoupling-Recycling Network (FDRN), which decouples the modeling components based on their intrinsic discrepancies.
arXiv Detail & Related papers (2023-08-07T12:26:34Z) - Multi-granularity Interaction Simulation for Unsupervised Interactive
Segmentation [38.08152990071453]
We introduce a Multi-granularity Interaction Simulation (MIS) approach to open up a promising direction for unsupervised interactive segmentation.
Our MIS significantly outperforms non-deep learning unsupervised methods and is even comparable with some previous deep-supervised methods without any annotation.
arXiv Detail & Related papers (2023-03-23T16:19:43Z) - Masked Transformer for Neighbourhood-aware Click-Through Rate Prediction [74.52904110197004]
We propose Neighbor-Interaction based CTR prediction, which puts this task into a Heterogeneous Information Network (HIN) setting.
In order to enhance the representation of the local neighbourhood, we consider four types of topological interaction among the nodes.
We conduct comprehensive experiments on two real world datasets and the experimental results show that our proposed method outperforms state-of-the-art CTR models significantly.
arXiv Detail & Related papers (2022-01-25T12:44:23Z) - Unlimited Neighborhood Interaction for Heterogeneous Trajectory
Prediction [97.40338982628094]
We propose a simple yet effective Unlimited Neighborhood Interaction Network (UNIN), which predicts trajectories of heterogeneous agents in multiple categories.
Specifically, the proposed unlimited neighborhood interaction module generates the fused features of all agents involved in an interaction simultaneously.
A hierarchical graph attention module is proposed to obtain category-to-category interaction and agent-to-agent interaction.
arXiv Detail & Related papers (2021-07-31T13:36:04Z) - Modeling long-term interactions to enhance action recognition [81.09859029964323]
We propose a new approach to understand actions in egocentric videos that exploits the semantics of object interactions at both frame and temporal levels.
We use a region-based approach that takes as input a primary region roughly corresponding to the user hands and a set of secondary regions potentially corresponding to the interacting objects.
The proposed approach outperforms the state-of-the-art in terms of action recognition on standard benchmarks.
arXiv Detail & Related papers (2021-04-23T10:08:15Z) - Asynchronous Interaction Aggregation for Action Detection [43.34864954534389]
We propose the Asynchronous Interaction Aggregation network (AIA) that leverages different interactions to boost action detection.
There are two key designs in it: one is the Interaction Aggregation structure (IA) adopting a uniform paradigm to model and integrate multiple types of interaction; the other is the Asynchronous Memory Update algorithm (AMU) that enables us to achieve better performance.
arXiv Detail & Related papers (2020-04-16T07:03:20Z) - Cascaded Human-Object Interaction Recognition [175.60439054047043]
We introduce a cascade architecture for multi-stage, coarse-to-fine HOI understanding.
At each stage, an instance localization network progressively refines HOI proposals and feeds them into an interaction recognition network.
With our carefully-designed human-centric relation features, these two modules work collaboratively towards effective interaction understanding.
arXiv Detail & Related papers (2020-03-09T17:05:04Z)