DMSA: Dynamic Multi-scale Unsupervised Semantic Segmentation Based on
Adaptive Affinity
- URL: http://arxiv.org/abs/2303.00199v1
- Date: Wed, 1 Mar 2023 03:08:30 GMT
- Title: DMSA: Dynamic Multi-scale Unsupervised Semantic Segmentation Based on
Adaptive Affinity
- Authors: Kun Yang, Jun Lu
- Abstract summary: The framework uses an Atrous Spatial Pyramid Pooling (ASPP) module to enhance feature extraction.
A Pixel-Adaptive Refinement (PAR) module is introduced, which can adaptively refine the initial pseudo labels.
Experiments show that the proposed DMSA framework is superior to the existing methods on the saliency dataset.
- Score: 11.080515677051455
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper proposes DMSA, an end-to-end unsupervised
semantic segmentation architecture based on four loss functions. The
framework uses an Atrous Spatial Pyramid Pooling (ASPP) module to enhance feature
extraction, together with a dynamic dilation strategy designed to better
capture multi-scale context information. A Pixel-Adaptive Refinement
(PAR) module is also introduced, which adaptively refines the initial pseudo
labels after feature fusion to obtain high-quality pseudo labels. Experiments
show that the proposed DMSA framework is superior to the existing methods on
the saliency dataset. On the COCO 80 dataset, the MIoU is improved by 2.0, and
the accuracy is improved by 5.39. On the Pascal VOC 2012 Augmented dataset, the
MIoU is improved by 4.9, and the accuracy is improved by 3.4. In addition, the
convergence speed of the model is also greatly improved after the introduction
of the PAR module.
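The abstract reports its gains in MIoU and accuracy. As a reminder of what those two numbers measure, here is a minimal sketch in plain Python. It is not the authors' evaluation code. It assumes the predicted segment IDs have already been matched to the ground-truth class IDs (a step that unsupervised methods typically need before scoring). The function names and the toy labels are illustrative only.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """Count pixels: rows are ground-truth classes, columns are predictions."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def pixel_accuracy(cm):
    """Fraction of pixels whose predicted class equals the ground truth."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total


def mean_iou(cm):
    """Average of per-class intersection-over-union, skipping absent classes."""
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                        # class c pixels predicted as something else
        fp = sum(cm[r][c] for r in range(n)) - tp   # other pixels predicted as class c
        union = tp + fp + fn
        if union > 0:
            ious.append(tp / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example: six flattened pixels, three classes (hypothetical labels).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(y_true, y_pred, 3)
print(pixel_accuracy(cm), mean_iou(cm))
```

On the toy labels, the per-class IoUs are 1/3, 2/3 and 1/2, so the mean IoU is 0.5, while 4 of the 6 pixels are correct. The two metrics can diverge because MIoU weights every class equally regardless of how many pixels it covers.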
Related papers
- AugmentGest: Can Random Data Cropping Augmentation Boost Gesture Recognition Performance? [49.64902130083662]
This paper proposes a comprehensive data augmentation framework that integrates geometric transformations, random variations, rotation, zooming and intensity-based transformations.
The proposed augmentation strategy is evaluated on three models: multi-stream e2eET, FPPR point cloud-based hand gesture recognition (HGR), and DD-Network.
arXiv Detail & Related papers (2025-06-08T16:43:05Z) - LLaDA 1.5: Variance-Reduced Preference Optimization for Large Language Diffusion Models [76.8317443926908]
Masked Diffusion Models (MDMs) present a promising paradigm for language modeling.
The challenge arises from the high variance in Evidence Lower Bound (ELBO)-based likelihood estimates required for preference optimization.
We propose Variance-Reduced Preference Optimization (VRPO), a framework that formally analyzes the variance of ELBO estimators and derives bounds on both the bias and variance of preference optimization gradients.
arXiv Detail & Related papers (2025-05-25T16:36:20Z) - Distance-aware Self-adaptive Graph Convolution for Fine-grained Hierarchical Recommendation [22.196813133996038]
SAGCN is a distance-based adaptive hierarchical aggregation method.
It refines the aggregation process through differentiated representation metrics.
Extensive experiments conducted on four real-world datasets demonstrate significant improvements.
arXiv Detail & Related papers (2025-05-14T17:39:34Z) - LP-DETR: Layer-wise Progressive Relations for Object Detection [4.632366780742503]
LP-DETR (Layer-wise Progressive DETR) is a novel approach that enhances DETR-based object detection through multi-scale relation modeling.
Our method introduces learnable spatial relationships between object queries through a relation-aware self-attention mechanism.
arXiv Detail & Related papers (2025-02-07T18:25:28Z) - Optimality and Adaptivity of Deep Neural Features for Instrumental Variable Regression [57.40108516085593]
Deep feature instrumental variable (DFIV) regression is a nonparametric approach to IV regression using data-adaptive features learned by deep neural networks.
We prove that the DFIV algorithm achieves the minimax optimal learning rate when the target structural function lies in a Besov space.
arXiv Detail & Related papers (2025-01-09T01:22:22Z) - Structure-Based Molecule Optimization via Gradient-Guided Bayesian Update [16.743187639189976]
Structure-based molecule optimization (SBMO) aims to optimize molecules with both continuous coordinates and discrete types against protein targets.
MolJO is the first gradient-based SBMO framework that facilitates joint guidance signals across different modalities.
MolJO achieves state-of-the-art performance on CrossDocked 2020 benchmark.
arXiv Detail & Related papers (2024-11-20T12:48:29Z) - SZU-AFS Antispoofing System for the ASVspoof 5 Challenge [3.713577625357432]
The SZU-AFS anti-spoofing system was designed for Track 1 of the ASVspoof 5 Challenge under open conditions.
The final fusion system achieves a minDCF of 0.115 and an EER of 4.04% on the evaluation set.
arXiv Detail & Related papers (2024-08-19T12:12:29Z) - Binarized Diffusion Model for Image Super-Resolution [61.963833405167875]
Binarization, an ultra-compression algorithm, offers the potential for effectively accelerating advanced diffusion models (DMs).
Existing binarization methods result in significant performance degradation.
We introduce a novel binarized diffusion model, BI-DiffSR, for image SR.
arXiv Detail & Related papers (2024-06-09T10:30:25Z) - PREM: A Simple Yet Effective Approach for Node-Level Graph Anomaly
Detection [65.24854366973794]
Node-level graph anomaly detection (GAD) plays a critical role in identifying anomalous nodes from graph-structured data in domains such as medicine, social networks, and e-commerce.
We introduce a simple method termed PREprocessing and Matching (PREM for short) to improve the efficiency of GAD.
Our approach streamlines GAD, reducing time and memory consumption while maintaining powerful anomaly detection capabilities.
arXiv Detail & Related papers (2023-10-18T02:59:57Z) - Multi-body SE(3) Equivariance for Unsupervised Rigid Segmentation and
Motion Estimation [49.56131393810713]
We present an SE(3) equivariant architecture and a training strategy to tackle this task in an unsupervised manner.
Our method excels in both model performance and computational efficiency, with only 0.25M parameters and 0.92G FLOPs.
arXiv Detail & Related papers (2023-06-08T22:55:32Z) - Self Correspondence Distillation for End-to-End Weakly-Supervised
Semantic Segmentation [13.623713806739271]
We propose a novel Self Correspondence Distillation (SCD) method to refine pseudo-labels without introducing external supervision.
In addition, we design a Variation-aware Refine Module to enhance the local consistency of pseudo-labels.
Our method significantly outperforms other state-of-the-art methods.
arXiv Detail & Related papers (2023-02-27T13:46:40Z) - Squeezeformer: An Efficient Transformer for Automatic Speech Recognition [99.349598600887]
Conformer is the de facto backbone model for various downstream speech tasks based on its hybrid attention-convolution architecture.
We propose the Squeezeformer model, which consistently outperforms the state-of-the-art ASR models under the same training schemes.
arXiv Detail & Related papers (2022-06-02T06:06:29Z) - Deep Co-supervision and Attention Fusion Strategy for Automatic COVID-19
Lung Infection Segmentation on CT Images [1.898617934078969]
In this paper, a novel segmentation scheme is proposed for the infections of COVID-19 on CT images.
A deep collaborative supervision scheme is proposed to guide the network learning the features of edges and semantics.
The effectiveness of the proposed scheme is demonstrated on four various COVID-19 CT datasets.
arXiv Detail & Related papers (2021-12-20T07:32:39Z) - PAENet: A Progressive Attention-Enhanced Network for 3D to 2D Retinal
Vessel Segmentation [0.0]
3D to 2D retinal vessel segmentation is a challenging problem in Optical Coherence Tomography Angiography (OCTA) images.
We propose a Progressive Attention-Enhanced Network (PAENet) based on attention mechanisms to extract rich feature representation.
Our proposed algorithm achieves state-of-the-art performance compared with previous methods.
arXiv Detail & Related papers (2021-08-26T10:27:25Z) - Polyp-PVT: Polyp Segmentation with Pyramid Vision Transformers [124.01928050651466]
We propose a new type of polyp segmentation method, named Polyp-PVT.
The proposed model, named Polyp-PVT, effectively suppresses noises in the features and significantly improves their expressive capabilities.
arXiv Detail & Related papers (2021-08-16T07:09:06Z) - Inception Convolution with Efficient Dilation Search [121.41030859447487]
Dilation convolution is a critical mutant of standard convolution neural network to control effective receptive fields and handle large scale variance of objects.
We propose a new mutant of dilated convolution, namely inception (dilated) convolution where the convolutions have independent dilation among different axes, channels and layers.
To fit the complex inception convolution to the data in practice, we develop a simple yet effective dilation search algorithm (EDO) based on statistical optimization.
arXiv Detail & Related papers (2020-12-25T14:58:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.