Learning Robust Feature Representations for Scene Text Detection
- URL: http://arxiv.org/abs/2005.12466v1
- Date: Tue, 26 May 2020 01:06:47 GMT
- Title: Learning Robust Feature Representations for Scene Text Detection
- Authors: Sihwan Kim and Taejang Park
- Abstract summary: We present a network architecture derived from the loss to maximize conditional log-likelihood.
By extending the layer of latent variables to multiple layers, the network is able to learn robust features on scale.
In experiments, the proposed algorithm significantly outperforms state-of-the-art methods in terms of both recall and precision.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Scene text detection based on deep neural networks has progressed
substantially over the past years. However, previous state-of-the-art methods
may still fall short when dealing with challenging public benchmarks because
the performance of an algorithm is determined by robust feature extraction
and by the components of the network architecture. To address this issue, we present
a network architecture derived from the loss to maximize conditional
log-likelihood by optimizing the lower bound with a proper approximate
posterior that has shown impressive performance in several generative models.
In addition, by extending the layer of latent variables to multiple layers, the
network is able to learn robust features on scale with no task-specific
regularization or data augmentation. We provide a detailed analysis and show
the results on three public benchmark datasets to confirm the efficiency and
reliability of the proposed algorithm. In experiments, the proposed algorithm
significantly outperforms state-of-the-art methods in terms of both recall and
precision. Specifically, it achieves an H-mean of 95.12 and 96.78 on ICDAR 2011
and ICDAR 2013, respectively.
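The H-mean reported above is the harmonic mean of precision and recall. A minimal sketch of that arithmetic follows, assuming detections have already been matched against ground truth. The counts are hypothetical and are not taken from the paper, and the ICDAR evaluation protocols define their own matching rules, which this sketch does not reproduce.

```python
def h_mean(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (also called F-measure or F1)."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when nothing was detected or matched
    return 2 * precision * recall / (precision + recall)


# Hypothetical match counts, for illustration only (not results from the paper).
true_positives = 90   # detected boxes matched to a ground-truth text region
false_positives = 10  # detected boxes with no matching ground truth
false_negatives = 30  # ground-truth regions that were missed

precision = true_positives / (true_positives + false_positives)
recall = true_positives / (true_positives + false_negatives)
score = h_mean(precision, recall)

print(f"precision={precision:.2f} recall={recall:.2f} h_mean={score:.4f}")
```

Because the harmonic mean is pulled toward the smaller of its two inputs, a high H-mean requires both precision and recall to be high.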
Related papers
- Human Semantic Segmentation using Millimeter-Wave Radar Sparse Point
Clouds [3.3888257250564364]
This paper presents a framework for semantic segmentation on sparse sequential point clouds of millimeter-wave radar.
The sparsity of mmWave data and the difficulty of capturing its temporal-topological features remain open problems.
We introduce graph structure and topological features to the point cloud and propose a semantic segmentation framework.
Our model achieves a mean accuracy of $\mathbf{82.31\%}$ on a custom dataset and outperforms state-of-the-art algorithms.
arXiv Detail & Related papers (2023-04-27T12:28:06Z)
- Quantifying uncertainty for deep learning based forecasting and
flow-reconstruction using neural architecture search ensembles [0.8258451067861933]
We present an automated approach to deep neural network (DNN) discovery and demonstrate how this may also be utilized for ensemble-based uncertainty quantification.
We highlight how the proposed method not only discovers high-performing neural network ensembles for our tasks, but also quantifies uncertainty seamlessly.
We demonstrate the feasibility of this framework for two tasks - forecasting from historical data and flow reconstruction from sparse sensors for the sea-surface temperature.
arXiv Detail & Related papers (2023-02-20T03:57:06Z)
- Real-Time Scene Text Detection with Differentiable Binarization and
Adaptive Scale Fusion [62.269219152425556]
Segmentation-based scene text detection methods have drawn extensive attention in the scene text detection field.
We propose a Differentiable Binarization (DB) module that integrates the binarization process into a segmentation network.
An efficient Adaptive Scale Fusion (ASF) module is proposed to improve the scale robustness by fusing features of different scales adaptively.
arXiv Detail & Related papers (2022-02-21T15:30:14Z)
- Hybridization of Capsule and LSTM Networks for unsupervised anomaly
detection on multivariate data [0.0]
This paper introduces a novel NN architecture which hybridises the Long-Short-Term-Memory (LSTM) and Capsule Networks into a single network.
The proposed method uses an unsupervised learning technique to overcome the issues with finding large volumes of labelled training data.
arXiv Detail & Related papers (2022-02-11T10:33:53Z)
- Optimization-Based Separations for Neural Networks [57.875347246373956]
We show that gradient descent can efficiently learn ball indicator functions using a depth 2 neural network with two layers of sigmoidal activations.
This is the first optimization-based separation result where the approximation benefits of the stronger architecture provably manifest in practice.
arXiv Detail & Related papers (2021-12-04T18:07:47Z)
- Dynamic Iterative Refinement for Efficient 3D Hand Pose Estimation [87.54604263202941]
We propose a tiny deep neural network of which partial layers are iteratively exploited for refining its previous estimations.
We employ learned gating criteria to decide whether to exit from the weight-sharing loop, allowing per-sample adaptation in our model.
Our method consistently outperforms state-of-the-art 2D/3D hand pose estimation approaches in terms of both accuracy and efficiency for widely used benchmarks.
arXiv Detail & Related papers (2021-11-11T23:31:34Z)
- ZARTS: On Zero-order Optimization for Neural Architecture Search [94.41017048659664]
Differentiable architecture search (DARTS) has been a popular one-shot paradigm for NAS due to its high efficiency.
This work turns to zero-order optimization and proposes a novel NAS scheme, called ZARTS, to search without enforcing the above approximation.
In particular, results on 12 benchmarks verify the outstanding robustness of ZARTS, where the performance of DARTS collapses due to its known instability issue.
arXiv Detail & Related papers (2021-10-10T09:35:15Z)
- Analytically Tractable Inference in Deep Neural Networks [0.0]
The Tractable Approximate Gaussian Inference (TAGI) algorithm was shown to be a viable and scalable alternative to backpropagation for shallow fully-connected neural networks.
We demonstrate how TAGI matches or exceeds the performance of backpropagation for training classic deep neural network architectures.
arXiv Detail & Related papers (2021-03-09T14:51:34Z)
- ScalingNet: extracting features from raw EEG data for emotion
recognition [4.047737925426405]
We propose a novel convolutional layer that adaptively extracts effective data-driven spectrogram-like features from raw EEG signals.
The proposed neural network architecture based on the scaling layer, referred to as ScalingNet, achieves state-of-the-art results on the established DEAP benchmark dataset.
arXiv Detail & Related papers (2021-02-07T08:54:27Z)
- Bayesian Optimization with Machine Learning Algorithms Towards Anomaly
Detection [66.05992706105224]
In this paper, an effective anomaly detection framework is proposed that uses the Bayesian Optimization technique.
The performance of the considered algorithms is evaluated using the ISCX 2012 dataset.
Experimental results show the effectiveness of the proposed framework in terms of accuracy, precision, low false-alarm rate, and recall.
arXiv Detail & Related papers (2020-08-05T19:29:35Z)
- Spatial-Spectral Residual Network for Hyperspectral Image
Super-Resolution [82.1739023587565]
We propose a novel spectral-spatial residual network for hyperspectral image super-resolution (SSRNet).
Our method can effectively explore spatial-spectral information by using 3D convolution instead of 2D convolution, which enables the network to better extract potential information.
In each unit, we employ spatial and temporal separable 3D convolution to extract spatial and spectral information, which not only reduces unaffordable memory usage and high computational cost, but also makes the network easier to train.
arXiv Detail & Related papers (2020-01-14T03:34:55Z)
This list is automatically generated from the titles and abstracts of the papers on this site.