An Efficient MLP-based Point-guided Segmentation Network for Ore Images
with Ambiguous Boundary
- URL: http://arxiv.org/abs/2402.17370v1
- Date: Tue, 27 Feb 2024 10:09:29 GMT
- Title: An Efficient MLP-based Point-guided Segmentation Network for Ore Images
with Ambiguous Boundary
- Authors: Guodong Sun, Yuting Peng, Le Cheng, Mengya Xu, An Wang, Bo Wu,
Hongliang Ren, Yang Zhang
- Abstract summary: This paper proposes a lightweight framework based on Multi-Layer Perceptron (MLP), which focuses on solving the problem of edge burring.
Our approach achieves a remarkable processing speed of over 27 frames per second with a model size of only 73 MB.
Our method delivers a consistently high level of accuracy, with impressive performance scores of 60.4 and 48.9 in $AP_{50}^{box}$ and $AP_{50}^{mask}$, respectively.
- Score: 12.258442550351178
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: The precise segmentation of ore images is critical to the successful
execution of the beneficiation process. Due to the homogeneous appearance of
the ores, which leads to low contrast and unclear boundaries, accurate
segmentation becomes challenging, and recognition becomes problematic. This
paper proposes a lightweight framework based on Multi-Layer Perceptron (MLP),
which focuses on solving the problem of edge burring. Specifically, we
introduce a lightweight backbone better suited for efficiently extracting
low-level features. Besides, we design a feature pyramid network consisting of
two MLP structures that balance local and global information, thus enhancing
detection accuracy. Furthermore, we propose a novel loss function that guides
the prediction points to match the instance edge points to achieve clear object
boundaries. We have conducted extensive experiments to validate the efficacy of
our proposed method. Our approach achieves a remarkable processing speed of
over 27 frames per second (FPS) with a model size of only 73 MB. Moreover, our
method delivers a consistently high level of accuracy, with impressive
performance scores of 60.4 and 48.9 in $AP_{50}^{box}$ and $AP_{50}^{mask}$,
respectively, as compared to the currently available state-of-the-art
techniques, when tested on the ore image dataset. The source code will be
released at https://github.com/MVME-HBUT/ORENEXT.
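
The abstract does not give the form of the proposed loss, only that it guides predicted points toward instance edge points. As a rough illustration of how such point-to-edge agreement can be measured, the sketch below computes a symmetric Chamfer distance between two 2D point sets. This is a swapped-in, generic technique and not the paper's loss; the function name `chamfer_distance` and the toy coordinates are hypothetical.

```python
import math


def chamfer_distance(pred_points, edge_points):
    """Symmetric Chamfer distance between two non-empty sets of 2D points.

    Assumption: points are (x, y) tuples. This is a generic point-set
    distance used for illustration, not the loss defined in the paper.
    """
    if not pred_points or not edge_points:
        raise ValueError("both point sets must be non-empty")

    def mean_nearest(src, dst):
        # Average distance from each source point to its closest target point.
        return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)

    # Sum both directions so that stray predictions and uncovered edge
    # points are both penalized.
    return mean_nearest(pred_points, edge_points) + mean_nearest(edge_points, pred_points)


# Hypothetical toy data: a unit-square contour and predictions with one point off.
edge = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
pred = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 2.0)]
loss = chamfer_distance(pred, edge)
```

The value is zero when the two sets coincide and grows as predicted points drift away from the contour, which is the general behavior a point-guided boundary loss would need.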
Related papers
- Bridge the Points: Graph-based Few-shot Segment Anything Semantically [79.1519244940518]
Recent advancements in pre-training techniques have enhanced the capabilities of vision foundation models.
Recent studies extend SAM to Few-shot Semantic Segmentation (FSS).
We propose a simple yet effective approach based on graph analysis.
arXiv Detail & Related papers (2024-10-09T15:02:28Z)
- Efficient Segmentation with Texture in Ore Images Based on
Box-supervised Approach [6.6773975364173]
A box-supervised technique with texture features is proposed to identify complete and independent ores.
The proposed method achieves over 50 frames per second with a small model size of 21.6 MB.
The method maintains a high level of accuracy compared with the state-of-the-art approaches on ore image dataset.
arXiv Detail & Related papers (2023-11-10T08:28:22Z)
- Pixel Adapter: A Graph-Based Post-Processing Approach for Scene Text
Image Super-Resolution [22.60056946339325]
We propose the Pixel Adapter Module (PAM) based on graph attention to address pixel distortion caused by upsampling.
The PAM effectively captures local structural information by allowing each pixel to interact with its neighbors and update features.
We demonstrate that our proposed method generates high-quality super-resolution images, surpassing existing methods in recognition accuracy.
arXiv Detail & Related papers (2023-09-16T08:12:12Z)
- Improving Pixel-based MIM by Reducing Wasted Modeling Capability [77.99468514275185]
We propose a new method that explicitly utilizes low-level features from shallow layers to aid pixel reconstruction.
To the best of our knowledge, we are the first to systematically investigate multi-level feature fusion for isotropic architectures.
Our method yields significant performance gains, such as 1.2% on fine-tuning, 2.8% on linear probing, and 2.6% on semantic segmentation.
arXiv Detail & Related papers (2023-08-01T03:44:56Z)
- Multiscale Representation for Real-Time Anti-Aliasing Neural Rendering [84.37776381343662]
Mip-NeRF proposes a multiscale representation as a conical frustum to encode scale information.
We propose mip voxel grids (Mip-VoG), an explicit multiscale representation for real-time anti-aliasing rendering.
Our approach is the first to offer multiscale training and real-time anti-aliasing rendering simultaneously.
arXiv Detail & Related papers (2023-04-20T04:05:22Z)
- Efficient Context Integration through Factorized Pyramidal Learning for
Ultra-Lightweight Semantic Segmentation [1.0499611180329804]
We propose a novel Factorized Pyramidal Learning (FPL) module to aggregate rich contextual information in an efficient manner.
We decompose the spatial pyramid into two stages which enables a simple and efficient feature fusion within the module to solve the notorious checkerboard effect.
Based on the FPL module and FIR unit, we propose an ultra-lightweight real-time network, called FPLNet, which achieves state-of-the-art accuracy-efficiency trade-off.
arXiv Detail & Related papers (2023-02-23T05:34:51Z)
- Rethinking Network Design and Local Geometry in Point Cloud: A Simple
Residual MLP Framework [55.40001810884942]
We introduce a pure residual network, called PointMLP, which integrates no sophisticated local geometrical extractors but still performs very competitively.
On the real-world ScanObjectNN dataset, our method even surpasses the prior best method by 3.3% accuracy.
Compared to most recent CurveNet, PointMLP trains 2x faster, tests 7x faster, and is more accurate on ModelNet40 benchmark.
arXiv Detail & Related papers (2022-02-15T01:39:07Z)
- FaPN: Feature-aligned Pyramid Network for Dense Image Prediction [6.613724825924151]
We propose a feature alignment module that learns transformation offsets of pixels to contextually align upsampled features.
We then integrate these two modules in a top-down pyramidal architecture and present the Feature-aligned Pyramid Network (FaPN).
In particular, our FaPN achieves the state-of-the-art of 56.7% mIoU on ADE20K when integrated within Mask-Former.
arXiv Detail & Related papers (2021-08-16T12:52:42Z)
- A Coarse-to-Fine Instance Segmentation Network with Learning Boundary
Representation [10.967299485260163]
Boundary-based instance segmentation has drawn much attention because of its attractive efficiency.
Existing methods suffer from the difficulty in long-distance regression.
We propose a coarse-to-fine module to address the problem.
arXiv Detail & Related papers (2021-06-18T16:37:28Z)
- Progressively Guided Alternate Refinement Network for RGB-D Salient
Object Detection [63.18846475183332]
We aim to develop an efficient and compact deep network for RGB-D salient object detection.
We propose a progressively guided alternate refinement network to refine it.
Our model outperforms existing state-of-the-art approaches by a large margin.
arXiv Detail & Related papers (2020-08-17T02:55:06Z)
- Towards Accurate Pixel-wise Object Tracking by Attention Retrieval [50.06436600343181]
We propose an attention retrieval network (ARN) to perform soft spatial constraints on backbone features.
We set a new state-of-the-art on recent pixel-wise object tracking benchmark VOT 2020 while running at 40 fps.
arXiv Detail & Related papers (2020-08-06T16:25:23Z)