Generative Multi-Stream Architecture For American Sign Language
Recognition
- URL: http://arxiv.org/abs/2003.08743v1
- Date: Mon, 9 Mar 2020 21:04:51 GMT
- Title: Generative Multi-Stream Architecture For American Sign Language
Recognition
- Authors: Dom Huh, Sai Gurrapu, Frederick Olson, Huzefa Rangwala, Parth Pathak,
Jana Kosecka
- Abstract summary: Training on datasets with low feature-richness for complex applications limits optimal convergence below human performance.
We propose a generative multi-stream architecture, eliminating the need for additional hardware, with the intent to improve feature richness without risking impracticability.
Our methods have achieved 95.62% validation accuracy with a variance of 1.42% from training, outperforming past models by 0.45% in validation accuracy and 5.53% in variance.
- Score: 15.717424753251674
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With advancements in deep model architectures, tasks in computer vision can
reach optimal convergence provided proper data preprocessing and model
parameter initialization. However, training on datasets with low
feature-richness for complex applications limits optimal convergence and
holds performance below human level. In past works, researchers have
counteracted this limitation by feeding complementary data from external
sources as additional streams, boosting performance at the cost of
supplementary hardware.
We propose a generative multi-stream architecture, eliminating the need for
additional hardware with the intent to improve feature richness without risking
impracticability. We also introduce the compact spatio-temporal residual block
to the standard 3-dimensional convolutional model, C3D. Our rC3D model performs
comparably to the top C3D residual variant architecture, the pseudo-3D
model, on the FASL-RGB dataset. Our methods have achieved 95.62% validation
accuracy with a variance of 1.42% from training, outperforming past models by
0.45% in validation accuracy and 5.53% in variance.
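The compact spatio-temporal residual block builds on the pseudo-3D idea of factorizing a full 3x3x3 convolution into a 1x3x3 spatial convolution followed by a 3x1x1 temporal one. A minimal sketch of why this is "compact" (plain Python, counting weights only; the channel sizes are illustrative, not taken from the paper):

```python
# Parameter counts for a full 3D convolution versus a pseudo-3D
# (spatial + temporal) factorization, ignoring biases.
# Channel sizes below are hypothetical, not values from the paper.

def conv3d_params(c_in, c_out, kt, kh, kw):
    """Number of weights in a 3D convolution with kernel (kt, kh, kw)."""
    return c_in * c_out * kt * kh * kw

def full_3x3x3(c_in, c_out):
    """A standard C3D-style 3x3x3 convolution."""
    return conv3d_params(c_in, c_out, 3, 3, 3)

def pseudo_3d(c_in, c_out):
    """A 1x3x3 spatial convolution followed by a 3x1x1 temporal one."""
    spatial = conv3d_params(c_in, c_out, 1, 3, 3)
    temporal = conv3d_params(c_out, c_out, 3, 1, 1)
    return spatial + temporal

c_in, c_out = 64, 64
full = full_3x3x3(c_in, c_out)      # 64*64*27 = 110592 weights
factored = pseudo_3d(c_in, c_out)   # 64*64*9 + 64*64*3 = 49152 weights
print(full, factored, factored / full)
```

For these channel sizes the factorized block uses well under half the weights of the full 3x3x3 convolution, which is the usual motivation for pseudo-3D-style residual variants.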
Related papers
- Pushing Auto-regressive Models for 3D Shape Generation at Capacity and Scalability [118.26563926533517]
Auto-regressive models have achieved impressive results in 2D image generation by modeling joint distributions in grid space.
We extend auto-regressive models to 3D domains, and seek a stronger ability of 3D shape generation by improving auto-regressive models at capacity and scalability simultaneously.
arXiv Detail & Related papers (2024-02-19T15:33:09Z)
- FILP-3D: Enhancing 3D Few-shot Class-incremental Learning with Pre-trained Vision-Language Models [62.663113296987085]
Few-shot class-incremental learning aims to mitigate the catastrophic forgetting issue when a model is incrementally trained on limited data.
We introduce two novel components: the Redundant Feature Eliminator (RFE) and the Spatial Noise Compensator (SNC).
Considering the imbalance in existing 3D datasets, we also propose new evaluation metrics that offer a more nuanced assessment of a 3D FSCIL model.
arXiv Detail & Related papers (2023-12-28T14:52:07Z)
- Pretrained Deep 2.5D Models for Efficient Predictive Modeling from Retinal OCT [7.8641166297532035]
3D deep learning models play a crucial role in building powerful predictive models of disease progression.
In this paper, we explore 2.5D architectures based on a combination of convolutional neural networks (CNNs), long short-term memory (LSTM), and Transformers.
We demonstrate the effectiveness of architectures and associated pretraining on a task of predicting progression to wet age-related macular degeneration (AMD) within a six-month period.
arXiv Detail & Related papers (2023-07-25T23:46:48Z)
- Robust Category-Level 3D Pose Estimation from Synthetic Data [17.247607850702558]
We introduce SyntheticP3D, a new synthetic dataset for object pose estimation generated from CAD models.
We propose a novel approach (CC3D) for training neural mesh models that perform pose estimation via inverse rendering.
arXiv Detail & Related papers (2023-05-25T14:56:03Z)
- SmoothNets: Optimizing CNN architecture design for differentially private deep learning [69.10072367807095]
DPSGD requires clipping and noising of per-sample gradients.
This introduces a reduction in model utility compared to non-private training.
We distilled a new model architecture termed SmoothNet, which is characterised by increased robustness to the challenges of DP-SGD training.
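The clipping and noising of per-sample gradients that DP-SGD requires can be sketched in a few lines (plain Python on gradient vectors as lists; the clip norm and noise multiplier are illustrative values, not ones from the paper):

```python
import math
import random

def dp_sgd_step(per_sample_grads, clip_norm, noise_multiplier, rng):
    """Clip each per-sample gradient to clip_norm, average, add Gaussian noise."""
    clipped = []
    for g in per_sample_grads:
        norm = math.sqrt(sum(x * x for x in g))
        # Scale down any gradient whose L2 norm exceeds the clip threshold.
        scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
        clipped.append([x * scale for x in g])
    n = len(clipped)
    sigma = noise_multiplier * clip_norm / n
    # Average coordinate-wise, then perturb each coordinate with noise.
    return [sum(col) / n + rng.gauss(0.0, sigma)
            for col in zip(*clipped)]

rng = random.Random(0)
grads = [[3.0, 4.0], [0.5, 0.5]]  # one gradient vector per sample
noisy_mean = dp_sgd_step(grads, clip_norm=1.0, noise_multiplier=1.0, rng=rng)
print(noisy_mean)
```

Both steps bias and perturb the update relative to the plain minibatch mean, which is the utility loss the SmoothNet architecture is designed to tolerate.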
arXiv Detail & Related papers (2022-05-09T07:51:54Z)
- Rethinking Deconvolution for 2D Human Pose Estimation: Light yet Accurate Model for Real-time Edge Computing [0.0]
This system was found to be very accurate, achieving 94.5% of the accuracy of the SOTA HRNet-256x192.
Our model adopts an encoder-decoder architecture and is carefully downsized to improve its efficiency.
arXiv Detail & Related papers (2021-11-08T01:44:46Z)
- Secrets of 3D Implicit Object Shape Reconstruction in the Wild [92.5554695397653]
Reconstructing high-fidelity 3D objects from sparse, partial observation is crucial for various applications in computer vision, robotics, and graphics.
Recent neural implicit modeling methods show promising results on synthetic or dense datasets.
However, they perform poorly on real-world data, which is sparse and noisy.
This paper analyzes the root cause of such deficient performance of a popular neural implicit model.
arXiv Detail & Related papers (2021-01-18T03:24:48Z)
- Point Transformer for Shape Classification and Retrieval of 3D and ALS Roof PointClouds [3.3744638598036123]
This paper proposes a fully attentional model, Point Transformer, for deriving a rich point cloud representation.
The model's shape classification and retrieval performance are evaluated on a large-scale urban dataset - RoofN3D and a standard benchmark dataset ModelNet40.
The proposed method outperforms other state-of-the-art models in the RoofN3D dataset, gives competitive results in the ModelNet40 benchmark, and showcases high robustness to various unseen point corruptions.
arXiv Detail & Related papers (2020-11-08T08:11:02Z)
- Exemplar Fine-Tuning for 3D Human Model Fitting Towards In-the-Wild 3D Human Pose Estimation [107.07047303858664]
Large-scale human datasets with 3D ground-truth annotations are difficult to obtain in the wild.
We address this problem by augmenting existing 2D datasets with high-quality 3D pose fits.
The resulting annotations are sufficient to train, from scratch, 3D pose regressor networks that outperform the current state-of-the-art on in-the-wild benchmarks.
arXiv Detail & Related papers (2020-04-07T20:21:18Z)
- Improving 3D Object Detection through Progressive Population Based Augmentation [91.56261177665762]
We present the first attempt to automate the design of data augmentation policies for 3D object detection.
We introduce the Progressive Population Based Augmentation (PPBA) algorithm, which learns to optimize augmentation strategies by narrowing down the search space and adopting the best parameters discovered in previous iterations.
We find that PPBA may be up to 10x more data efficient than baseline 3D detection models without augmentation, highlighting that 3D detection models may achieve competitive accuracy with far fewer labeled examples.
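The PPBA idea of narrowing the search by adopting the best parameters found in earlier iterations can be sketched as a toy population search (pure Python; the quadratic objective and mutation scheme are illustrative stand-ins, not the paper's 3D detection pipeline):

```python
import random

def toy_score(params):
    """Stand-in for detector validation accuracy; peaks at (0.5, 0.5)."""
    return -((params[0] - 0.5) ** 2 + (params[1] - 0.5) ** 2)

def ppba_like_search(rounds, pop_size, rng):
    # Start from a random population of augmentation parameters in [0, 1]^2.
    pop = [[rng.random(), rng.random()] for _ in range(pop_size)]
    best = max(pop, key=toy_score)
    for r in range(rounds):
        # Narrow the search: mutate around the best parameters found so far,
        # shrinking the mutation range each round.
        spread = 0.5 / (r + 1)
        pop = [[min(1.0, max(0.0, b + rng.uniform(-spread, spread)))
                for b in best] for _ in range(pop_size)]
        # Keep the incumbent, so the best score never regresses.
        best = max(pop + [best], key=toy_score)
    return best

rng = random.Random(0)
best = ppba_like_search(rounds=10, pop_size=8, rng=rng)
print(best, toy_score(best))
```

Carrying the incumbent forward and shrinking the mutation range is what lets this kind of search converge with far fewer evaluations than exhaustive grid search, mirroring PPBA's data-efficiency claim.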
arXiv Detail & Related papers (2020-04-02T05:57:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.