Related papers: Generative Multi-Stream Architecture For American Sign Language Recognition

Generative Multi-Stream Architecture For American Sign Language Recognition

URL: http://arxiv.org/abs/2003.08743v1
Date: Mon, 9 Mar 2020 21:04:51 GMT
Title: Generative Multi-Stream Architecture For American Sign Language Recognition
Authors: Dom Huh, Sai Gurrapu, Frederick Olson, Huzefa Rangwala, Parth Pathak, Jana Kosecka
Abstract summary: Training on datasets with low feature-richness for complex applications limits optimal convergence below human performance. We propose a generative multistream architecture, eliminating the need for additional hardware with the intent to improve feature convergence without risking impracticability. Our methods have achieved 95.62% validation accuracy with a variance of 1.42% from training, outperforming past models by 0.45% in validation accuracy and 5.53% in variance.
Score: 15.717424753251674
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: With advancements in deep model architectures, tasks in computer vision can reach optimal convergence provided proper data preprocessing and model parameter initialization. However, training on datasets with low feature-richness for complex applications limits and impairs optimal convergence below human performance. In past works, researchers have provided external sources of complementary data at the cost of supplementary hardware, which are fed in streams to counteract this limitation and boost performance. We propose a generative multi-stream architecture, eliminating the need for additional hardware with the intent to improve feature richness without risking impracticability. We also introduce the compact spatio-temporal residual block to the standard 3-dimensional convolutional model, C3D. Our rC3D model performs comparably to the top C3D residual variant architecture, the pseudo-3D model, on the FASL-RGB dataset. Our methods have achieved 95.62% validation accuracy with a variance of 1.42% from training, outperforming past models by 0.45% in validation accuracy and 5.53% in variance.

Related papers

HDiffTG: A Lightweight Hybrid Diffusion-Transformer-GCN Architecture for 3D Human Pose Estimation [21.823965837699166]
HDiffTG is a novel 3D Human Pose (3DHCN) method that integrates Transformer, Graph Convolutional Network (GCN), and diffusion model into a unified framework. We show that HDiffTG significantly improves pose estimation accuracy and robustness while maintaining a lightweight design.
arXiv Detail & Related papers (2025-05-07T09:26:37Z)
A Light Perspective for 3D Object Detection [46.23578780480946]
This paper introduces a novel approach that incorporates cutting-edge Deep Learning techniques into the feature extraction process. Our model, NextBEV, surpasses established feature extractors like ResNet50 and MobileNetV3. By fusing these lightweight proposals, we have enhanced the accuracy of the VoxelNet-based model by 2.93% and improved the F1-score of the PointPillar-based model by approximately 20%.
arXiv Detail & Related papers (2025-03-10T10:03:23Z)
Factorized Implicit Global Convolution for Automotive Computational Fluid Dynamics Prediction [52.32698071488864]
We propose Factorized Implicit Global Convolution (FIGConv), a novel architecture that efficiently solves CFD problems for very large 3D meshes. FIGConv achieves quadratic complexity $O(N^2)$, a significant improvement over existing 3D neural CFD models. We validate our approach on the industry-standard Ahmed body dataset and the large-scale DrivAerNet dataset.
arXiv Detail & Related papers (2025-02-06T18:57:57Z)
Building Efficient Lightweight CNN Models [0.0]
Convolutional Neural Networks (CNNs) are pivotal in image classification tasks due to their robust feature extraction capabilities. This paper introduces a methodology to construct lightweight CNNs while maintaining competitive accuracy. The proposed model achieved a state-of-the-art accuracy of 99% on the handwritten digit MNIST and 89% on fashion MNIST, with only 14,862 parameters and a model size of 0.17 MB.
arXiv Detail & Related papers (2025-01-26T14:39:01Z)
3D Shape Tokenization via Latent Flow Matching [38.28217561449967]
We introduce a latent 3D representation that models 3D surfaces as probability density functions in 3D, i.e., p(x,y,z), with flow-matching. Our representation is specifically designed for consumption by machine learning models, offering continuity and compactness by construction while requiring only point clouds and minimal data preprocessing.
arXiv Detail & Related papers (2024-12-20T07:22:41Z)
DSplats: 3D Generation by Denoising Splats-Based Multiview Diffusion Models [67.50989119438508]
We introduce DSplats, a novel method that directly denoises multiview images using Gaussian-based Reconstructors to produce realistic 3D assets. Our experiments demonstrate that DSplats not only produces high-quality, spatially consistent outputs, but also sets a new standard in single-image to 3D reconstruction.
arXiv Detail & Related papers (2024-12-11T07:32:17Z)
OccLoff: Learning Optimized Feature Fusion for 3D Occupancy Prediction [5.285847977231642]
3D semantic occupancy prediction is crucial for ensuring the safety in autonomous driving. Existing fusion-based occupancy methods typically involve performing a 2D-to-3D view transformation on image features. We propose OccLoff, a framework that Learns to optimize Feature Fusion for 3D occupancy prediction.
arXiv Detail & Related papers (2024-11-06T06:34:27Z)
Lightweight Deep Learning Framework for Accurate Particle Flow Energy Reconstruction [8.598010350935596]
This paper systematically evaluates a deep learning reconstruction framework. We design a hybrid loss function combining weighted mean squared error with structural similarity index. We enhance the model's capability to capture cross-modaltemporal correlations and energy-displacement nonlinearities.
arXiv Detail & Related papers (2024-10-08T11:49:18Z)
Robin3D: Improving 3D Large Language Model via Robust Instruction Tuning [55.339257446600634]
We introduce Robin3D, a powerful 3DLLM trained on large-scale instruction-following data. We construct 1 million instruction-following data, consisting of 344K Adversarial samples, 508K Diverse samples, and 165K benchmark training set samples. Robin3D consistently outperforms previous methods across five widely-used 3D multimodal learning benchmarks.
arXiv Detail & Related papers (2024-09-30T21:55:38Z)
Pushing Auto-regressive Models for 3D Shape Generation at Capacity and Scalability [118.26563926533517]
Auto-regressive models have achieved impressive results in 2D image generation by modeling joint distributions in grid space. We extend auto-regressive models to 3D domains, and seek a stronger ability of 3D shape generation by improving auto-regressive models at capacity and scalability simultaneously.
arXiv Detail & Related papers (2024-02-19T15:33:09Z)
FILP-3D: Enhancing 3D Few-shot Class-incremental Learning with Pre-trained Vision-Language Models [62.663113296987085]
Few-shot class-incremental learning aims to mitigate the catastrophic forgetting issue when a model is incrementally trained on limited data. We introduce two novel components: the Redundant Feature Eliminator (RFE) and the Spatial Noise Compensator (SNC). Considering the imbalance in existing 3D datasets, we also propose new evaluation metrics that offer a more nuanced assessment of a 3D FSCIL model.
arXiv Detail & Related papers (2023-12-28T14:52:07Z)
Pretrained Deep 2.5D Models for Efficient Predictive Modeling from Retinal OCT [7.8641166297532035]
3D deep learning models play a crucial role in building powerful predictive models of disease progression. In this paper, we explore 2.5D architectures based on a combination of convolutional neural networks (CNNs), long short-term memory (LSTM), and Transformers. We demonstrate the effectiveness of architectures and associated pretraining on a task of predicting progression to wet age-related macular degeneration (AMD) within a six-month period.
arXiv Detail & Related papers (2023-07-25T23:46:48Z)
Robust Category-Level 3D Pose Estimation from Synthetic Data [17.247607850702558]
We introduce SyntheticP3D, a new synthetic dataset for object pose estimation generated from CAD models. We propose a novel approach (CC3D) for training neural mesh models that perform pose estimation via inverse rendering.
arXiv Detail & Related papers (2023-05-25T14:56:03Z)
SmoothNets: Optimizing CNN architecture design for differentially private deep learning [69.10072367807095]
DPSGD requires clipping and noising of per-sample gradients. This introduces a reduction in model utility compared to non-private training. We distilled a new model architecture termed SmoothNet, which is characterised by increased robustness to the challenges of DP-SGD training.
arXiv Detail & Related papers (2022-05-09T07:51:54Z)
Secrets of 3D Implicit Object Shape Reconstruction in the Wild [92.5554695397653]
Reconstructing high-fidelity 3D objects from sparse, partial observation is crucial for various applications in computer vision, robotics, and graphics. Recent neural implicit modeling methods show promising results on synthetic or dense datasets. But, they perform poorly on real-world data that is sparse and noisy. This paper analyzes the root cause of such deficient performance of a popular neural implicit model.
arXiv Detail & Related papers (2021-01-18T03:24:48Z)
Point Transformer for Shape Classification and Retrieval of 3D and ALS Roof PointClouds [3.3744638598036123]
This paper proposes a fully attentional model - Point Transformer, for deriving a rich point cloud representation. The model's shape classification and retrieval performance are evaluated on a large-scale urban dataset - RoofN3D and a standard benchmark dataset ModelNet40. The proposed method outperforms other state-of-the-art models in the RoofN3D dataset, gives competitive results in the ModelNet40 benchmark, and showcases high robustness to various unseen point corruptions.
arXiv Detail & Related papers (2020-11-08T08:11:02Z)
Exemplar Fine-Tuning for 3D Human Model Fitting Towards In-the-Wild 3D Human Pose Estimation [107.07047303858664]
Large-scale human datasets with 3D ground-truth annotations are difficult to obtain in the wild. We address this problem by augmenting existing 2D datasets with high-quality 3D pose fits. The resulting annotations are sufficient to train from scratch 3D pose regressor networks that outperform the current state-of-the-art on in-the-wild benchmarks.
arXiv Detail & Related papers (2020-04-07T20:21:18Z)

This list is automatically generated from the titles and abstracts of the papers in this site.