Generative Multi-Stream Architecture For American Sign Language Recognition
- URL: http://arxiv.org/abs/2003.08743v1
- Date: Mon, 9 Mar 2020 21:04:51 GMT
- Title: Generative Multi-Stream Architecture For American Sign Language Recognition
- Authors: Dom Huh, Sai Gurrapu, Frederick Olson, Huzefa Rangwala, Parth Pathak, Jana Kosecka
- Abstract summary: Training on datasets with low feature-richness for complex applications limits optimal convergence below human performance.
We propose a generative multi-stream architecture, eliminating the need for additional hardware, with the intent to improve feature richness without sacrificing practicality.
Our methods have achieved 95.62% validation accuracy with a variance of 1.42% from training, outperforming past models by 0.45% in validation accuracy and 5.53% in variance.
- Score: 15.717424753251674
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With advancements in deep model architectures, tasks in computer vision can
reach optimal convergence provided proper data preprocessing and model
parameter initialization. However, training on datasets with low
feature-richness for complex applications limits and degrades optimal
convergence below human performance. In past works, researchers have provided
external sources of complementary data, fed in as additional streams to
counteract this limitation and boost performance, at the cost of supplementary hardware.
We propose a generative multi-stream architecture, eliminating the need for
additional hardware, with the intent to improve feature richness without
sacrificing practicality. We also introduce the compact spatio-temporal residual block
to the standard 3-dimensional convolutional model, C3D. Our rC3D model performs
comparably to the top C3D residual variant architecture, the pseudo-3D
model, on the FASL-RGB dataset. Our methods have achieved 95.62% validation
accuracy with a variance of 1.42% from training, outperforming past models by
0.45% in validation accuracy and 5.53% in variance.
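To make the abstract's two ideas concrete, below is a minimal, illustrative sketch in PyTorch (the framework choice and all class names here are assumptions, not the authors' released code): a factorized spatio-temporal residual block in the pseudo-3D style, where a 3x3x3 convolution is split into a 1x3x3 spatial and a 3x1x1 temporal convolution, and a two-stream classifier whose second stream is synthesized from the RGB clip by a small generator, so no extra capture hardware is needed. The paper's exact rC3D block, generator, and fusion may differ.

```python
# Illustrative reconstruction under stated assumptions, not the authors' rC3D code.
import torch
import torch.nn as nn

class SpatioTemporalResBlock(nn.Module):
    """Residual block factorizing a 3x3x3 conv into a 1x3x3 spatial conv
    followed by a 3x1x1 temporal conv; padding preserves the clip shape."""
    def __init__(self, channels: int):
        super().__init__()
        self.spatial = nn.Conv3d(channels, channels, kernel_size=(1, 3, 3),
                                 padding=(0, 1, 1))
        self.temporal = nn.Conv3d(channels, channels, kernel_size=(3, 1, 1),
                                  padding=(1, 0, 0))
        self.bn1 = nn.BatchNorm3d(channels)
        self.bn2 = nn.BatchNorm3d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.spatial(x)))
        out = self.bn2(self.temporal(out))
        return self.relu(out + x)  # identity shortcut

class GenerativeTwoStream(nn.Module):
    """Two-stream classifier: a small generator synthesizes a complementary
    one-channel stream (e.g., a depth-like map) from the RGB clip; both
    streams are encoded and their pooled features fused for classification."""
    def __init__(self, num_classes: int):
        super().__init__()
        self.generator = nn.Conv3d(3, 1, kernel_size=3, padding=1)
        self.rgb_stem = nn.Conv3d(3, 64, kernel_size=3, padding=1)
        self.gen_stem = nn.Conv3d(1, 64, kernel_size=3, padding=1)
        self.rgb_block = SpatioTemporalResBlock(64)
        self.gen_block = SpatioTemporalResBlock(64)
        self.head = nn.Linear(128, num_classes)

    def forward(self, rgb):  # rgb: (batch, 3, frames, height, width)
        gen = self.generator(rgb)  # synthesized complementary stream
        f_rgb = self.rgb_block(self.rgb_stem(rgb)).mean(dim=(2, 3, 4))
        f_gen = self.gen_block(self.gen_stem(gen)).mean(dim=(2, 3, 4))
        return self.head(torch.cat([f_rgb, f_gen], dim=1))

clip = torch.randn(2, 3, 16, 112, 112)  # two 16-frame RGB clips
print(GenerativeTwoStream(num_classes=24)(clip).shape)  # torch.Size([2, 24])
```

Generating the complementary stream from RGB is what removes the hardware requirement: the second modality costs only extra compute at training and inference time rather than an additional sensor.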
Related papers
- OccLoff: Learning Optimized Feature Fusion for 3D Occupancy Prediction [5.285847977231642]
3D semantic occupancy prediction is crucial for ensuring safety in autonomous driving.
Existing fusion-based occupancy methods typically involve performing a 2D-to-3D view transformation on image features.
We propose OccLoff, a framework that Learns to optimize Feature Fusion for 3D occupancy prediction.
arXiv Detail & Related papers (2024-11-06T06:34:27Z)
- Robin3D: Improving 3D Large Language Model via Robust Instruction Tuning [55.339257446600634]
We introduce Robin3D, a powerful 3DLLM trained on large-scale instruction-following data.
We construct 1 million instruction-following examples, consisting of 344K Adversarial samples, 508K Diverse samples, and 165K benchmark training set samples.
Robin3D consistently outperforms previous methods across five widely-used 3D multimodal learning benchmarks.
arXiv Detail & Related papers (2024-09-30T21:55:38Z)
- Pushing Auto-regressive Models for 3D Shape Generation at Capacity and Scalability [118.26563926533517]
Auto-regressive models have achieved impressive results in 2D image generation by modeling joint distributions in grid space.
We extend auto-regressive models to 3D domains, and seek a stronger ability of 3D shape generation by improving auto-regressive models at capacity and scalability simultaneously.
arXiv Detail & Related papers (2024-02-19T15:33:09Z)
- FILP-3D: Enhancing 3D Few-shot Class-incremental Learning with Pre-trained Vision-Language Models [62.663113296987085]
Few-shot class-incremental learning aims to mitigate the catastrophic forgetting issue when a model is incrementally trained on limited data.
We introduce two novel components: the Redundant Feature Eliminator (RFE) and the Spatial Noise Compensator (SNC).
Considering the imbalance in existing 3D datasets, we also propose new evaluation metrics that offer a more nuanced assessment of a 3D FSCIL model.
arXiv Detail & Related papers (2023-12-28T14:52:07Z)
- Pretrained Deep 2.5D Models for Efficient Predictive Modeling from Retinal OCT [7.8641166297532035]
3D deep learning models play a crucial role in building powerful predictive models of disease progression.
In this paper, we explore 2.5D architectures based on a combination of convolutional neural networks (CNNs), long short-term memory (LSTM), and Transformers.
We demonstrate the effectiveness of architectures and associated pretraining on a task of predicting progression to wet age-related macular degeneration (AMD) within a six-month period.
arXiv Detail & Related papers (2023-07-25T23:46:48Z)
- Robust Category-Level 3D Pose Estimation from Synthetic Data [17.247607850702558]
We introduce SyntheticP3D, a new synthetic dataset for object pose estimation generated from CAD models.
We propose a novel approach (CC3D) for training neural mesh models that perform pose estimation via inverse rendering.
arXiv Detail & Related papers (2023-05-25T14:56:03Z)
- SmoothNets: Optimizing CNN architecture design for differentially private deep learning [69.10072367807095]
DP-SGD requires clipping and noising of per-sample gradients.
This introduces a reduction in model utility compared to non-private training.
We distilled a new model architecture termed SmoothNet, which is characterised by increased robustness to the challenges of DP-SGD training.
arXiv Detail & Related papers (2022-05-09T07:51:54Z)
- Secrets of 3D Implicit Object Shape Reconstruction in the Wild [92.5554695397653]
Reconstructing high-fidelity 3D objects from sparse, partial observation is crucial for various applications in computer vision, robotics, and graphics.
Recent neural implicit modeling methods show promising results on synthetic or dense datasets.
However, they perform poorly on real-world data that is sparse and noisy.
This paper analyzes the root cause of such deficient performance of a popular neural implicit model.
arXiv Detail & Related papers (2021-01-18T03:24:48Z)
- Point Transformer for Shape Classification and Retrieval of 3D and ALS Roof PointClouds [3.3744638598036123]
This paper proposes a fully attentional model, Point Transformer, for deriving a rich point cloud representation.
The model's shape classification and retrieval performance are evaluated on a large-scale urban dataset - RoofN3D and a standard benchmark dataset ModelNet40.
The proposed method outperforms other state-of-the-art models in the RoofN3D dataset, gives competitive results in the ModelNet40 benchmark, and showcases high robustness to various unseen point corruptions.
arXiv Detail & Related papers (2020-04-07T20:21:18Z)
- Exemplar Fine-Tuning for 3D Human Model Fitting Towards In-the-Wild 3D Human Pose Estimation [107.07047303858664]
Large-scale human datasets with 3D ground-truth annotations are difficult to obtain in the wild.
We address this problem by augmenting existing 2D datasets with high-quality 3D pose fits.
The resulting annotations are sufficient to train 3D pose regressor networks from scratch that outperform the current state-of-the-art on in-the-wild benchmarks.
arXiv Detail & Related papers (2020-04-07T20:21:18Z)
This list is automatically generated from the titles and abstracts of the papers on this site.