LwPosr: Lightweight Efficient Fine-Grained Head Pose Estimation
- URL: http://arxiv.org/abs/2202.03544v1
- Date: Mon, 7 Feb 2022 22:12:27 GMT
- Title: LwPosr: Lightweight Efficient Fine-Grained Head Pose Estimation
- Authors: Naina Dhingra
- Abstract summary: This paper presents a lightweight network for the head pose estimation (HPE) task.
The proposed network LwPosr uses a mixture of depthwise separable convolutional (DSC) and transformer encoder layers.
- Score: 2.538209532048867
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: This paper presents a lightweight network for the head pose
estimation (HPE) task. While previous approaches rely on convolutional neural
networks, the proposed network \textit{LwPosr} uses a mixture of depthwise
separable convolutional (DSC) and transformer encoder layers, structured in two
streams and three stages, to provide fine-grained regression for predicting
head poses. Quantitative and qualitative results demonstrate that the proposed
network learns head poses efficiently while using a smaller parameter space.
Extensive ablations are conducted on three open-source datasets, namely
300W-LP, AFLW2000, and BIWI. To our knowledge, (1) \textit{LwPosr} is the
lightest network proposed for estimating head poses among both keypoints-based
and keypoints-free approaches; (2) it sets a new benchmark, outperforming the
previous lightweight network in mean absolute error while reducing the number
of parameters; (3) it is the first of its kind to use a mixture of DSCs and
transformer encoders for HPE. This approach is suitable for mobile devices,
which require lightweight networks.
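Neither the abstract nor this listing includes code; the following is a minimal PyTorch sketch of the two core ingredients, DSC blocks feeding a transformer encoder that regresses yaw, pitch, and roll. The channel sizes, the single-stream layout, and all hyperparameters are illustrative assumptions, not the paper's two-stream, three-stage configuration.

```python
# Minimal PyTorch sketch: depthwise separable convolutions (DSC) feeding a
# transformer encoder, ending in a 3-value regression head (yaw, pitch, roll).
# Channel sizes and layer counts are illustrative assumptions, not LwPosr's.
import torch
import torch.nn as nn


class DepthwiseSeparableConv(nn.Module):
    """Depthwise conv (one filter per channel) followed by a 1x1 pointwise conv."""

    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size=3, stride=stride,
                                   padding=1, groups=in_ch, bias=False)
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(self.bn(self.pointwise(self.depthwise(x))))


class DscTransformerPoseNet(nn.Module):
    """Hypothetical DSC + transformer-encoder head-pose regressor."""

    def __init__(self, embed_dim=128, num_heads=4, num_layers=2):
        super().__init__()
        self.features = nn.Sequential(
            DepthwiseSeparableConv(3, 32, stride=2),
            DepthwiseSeparableConv(32, 64, stride=2),
            DepthwiseSeparableConv(64, embed_dim, stride=2),
        )
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=num_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=num_layers)
        self.head = nn.Linear(embed_dim, 3)  # yaw, pitch, roll

    def forward(self, x):
        f = self.features(x)                   # (B, C, H, W) feature map
        tokens = f.flatten(2).transpose(1, 2)  # (B, H*W, C) token sequence
        z = self.encoder(tokens).mean(dim=1)   # pool over spatial tokens
        return self.head(z)                    # (B, 3) pose angles


if __name__ == "__main__":
    model = DscTransformerPoseNet()
    angles = model(torch.randn(2, 3, 64, 64))
    print(angles.shape)  # torch.Size([2, 3])
```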
Related papers
- FLIM-based Salient Object Detection Networks with Adaptive Decoders [40.26047220842738]
This work proposes flyweight networks, hundreds of times lighter than lightweight models, for Salient Object Detection (SOD).
It combines a FLIM encoder with an adaptive decoder, whose weights are estimated for each input image by a given function.
We compare FLIM models with adaptive decoders for two challenging SOD tasks with three lightweight networks from the state-of-the-art, two FLIM networks with decoders trained by backpropagation, and one FLIM network whose labeled markers define the decoder's weights.
arXiv Detail & Related papers (2025-04-29T15:44:02Z) - Double-Shot 3D Shape Measurement with a Dual-Branch Network [14.749887303860717]
We propose a dual-branch Convolutional Neural Network (CNN)-Transformer network (PDCNet) to process different structured light (SL) modalities.
Within PDCNet, a Transformer branch is used to capture global perception in the fringe images, while a CNN branch is designed to collect local details in the speckle images.
We show that our method can reduce fringe order ambiguity while producing high-accuracy results on a self-made dataset.
arXiv Detail & Related papers (2024-07-19T10:49:26Z) - DM3D: Distortion-Minimized Weight Pruning for Lossless 3D Object Detection [42.07920565812081]
We propose a novel post-training weight pruning scheme for 3D object detection.
It determines redundant parameters in the pretrained model that lead to minimal distortion in both locality and confidence.
This framework aims to minimize detection distortion of network output to maximally maintain detection precision.
arXiv Detail & Related papers (2024-07-02T09:33:32Z) - Active search and coverage using point-cloud reinforcement learning [50.741409008225766]
This paper presents an end-to-end deep reinforcement learning solution for target search and coverage.
We show that deep hierarchical feature learning works for RL and that by using farthest point sampling (FPS) we can reduce the number of points.
We also show that multi-head attention for point clouds helps the agent learn faster but converges to the same outcome.
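Farthest point sampling greedily keeps the point most distant from those already selected; the NumPy sketch below is a generic illustration of FPS, not the paper's RL pipeline.

```python
# Generic NumPy sketch of farthest point sampling (FPS): greedily pick the
# point farthest from everything selected so far. Illustrative only.
import numpy as np

def farthest_point_sampling(points: np.ndarray, k: int) -> np.ndarray:
    """Return indices of k points chosen by FPS from an (N, 3) array."""
    n = points.shape[0]
    selected = np.zeros(k, dtype=np.int64)
    dist = np.full(n, np.inf)   # distance of each point to the selected set
    selected[0] = 0             # arbitrary seed point
    for i in range(1, k):
        # Update distances with the most recently selected point.
        diff = points - points[selected[i - 1]]
        dist = np.minimum(dist, np.einsum("ij,ij->i", diff, diff))
        selected[i] = int(np.argmax(dist))
    return selected

# Example: subsample a random cloud of 1024 points down to 128.
cloud = np.random.rand(1024, 3)
subsampled = cloud[farthest_point_sampling(cloud, 128)]
```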
arXiv Detail & Related papers (2023-12-18T18:16:30Z) - Iterative Soft Shrinkage Learning for Efficient Image Super-Resolution [91.3781512926942]
Image super-resolution (SR) has witnessed extensive neural network designs from CNN to transformer architectures.
This work investigates the potential of network pruning for super-resolution to take advantage of off-the-shelf network designs and reduce the underlying computational overhead.
We propose a novel Iterative Soft Shrinkage-Percentage (ISS-P) method, which optimizes the sparse structure of a randomly initialized network at each iteration and tweaks unimportant weights on-the-fly by a small amount proportional to the magnitude scale.
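The soft-shrinkage idea, attenuating low-magnitude weights by a small amount each iteration instead of hard-zeroing them, can be sketched as below; the sparsity level and shrink factor are assumptions, not the paper's ISS-P schedule.

```python
# Hedged PyTorch sketch of soft shrinkage: instead of zeroing pruned weights,
# shrink the smallest-magnitude fraction by a small factor each iteration.
# The sparsity level and shrink factor are assumptions, not ISS-P's settings.
import torch

@torch.no_grad()
def soft_shrink_(weight: torch.Tensor, sparsity: float, shrink: float = 0.1) -> None:
    """Shrink the `sparsity` fraction of smallest-magnitude weights in place."""
    flat = weight.abs().flatten()
    k = int(sparsity * flat.numel())
    if k == 0:
        return
    threshold = flat.kthvalue(k).values
    mask = weight.abs() <= threshold
    # Shrink proportionally to the weight's own magnitude rather than zeroing.
    weight[mask] -= shrink * weight[mask]

# Example: apply one shrinkage step to a linear layer's weights.
layer = torch.nn.Linear(64, 64)
soft_shrink_(layer.weight, sparsity=0.5, shrink=0.1)
```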
arXiv Detail & Related papers (2023-03-16T21:06:13Z) - SVNet: Where SO(3) Equivariance Meets Binarization on Point Cloud
Representation [65.4396959244269]
The paper tackles the challenge by designing a general framework to construct 3D learning architectures.
The proposed approach can be applied to general backbones like PointNet and DGCNN.
Experiments on ModelNet40, ShapeNet, and the real-world dataset ScanObjectNN demonstrate that the method achieves a great trade-off between efficiency, rotation, and accuracy.
arXiv Detail & Related papers (2022-09-13T12:12:19Z) - Monocular Depth Estimation Primed by Salient Point Detection and
Normalized Hessian Loss [43.950140695759764]
We propose an accurate and lightweight framework for monocular depth estimation based on a self-attention mechanism stemming from salient point detection.
We introduce a normalized Hessian loss term invariant to scaling and shear along the depth direction, which is shown to substantially improve the accuracy.
The proposed method achieves state-of-the-art results on NYU-Depth-v2 and KITTI while using a model that is 3.1-38.4 times smaller in terms of the number of parameters than baseline approaches.
arXiv Detail & Related papers (2021-08-25T07:51:09Z) - PLADE-Net: Towards Pixel-Level Accuracy for Self-Supervised Single-View
Depth Estimation with Neural Positional Encoding and Distilled Matting Loss [49.66736599668501]
We propose a self-supervised single-view pixel-level accurate depth estimation network, called PLADE-Net.
Our method shows unprecedented accuracy levels, exceeding 95% in terms of the $\delta_1$ metric on the KITTI dataset.
arXiv Detail & Related papers (2021-03-12T15:54:46Z) - Hyperspectral Classification Based on Lightweight 3-D-CNN With Transfer
Learning [67.40866334083941]
We propose an end-to-end 3-D lightweight convolutional neural network (CNN) for HSI classification with limited training samples.
Compared with conventional 3-D-CNN models, the proposed 3-D-LWNet has a deeper network structure, fewer parameters, and lower computational cost.
Our model achieves competitive performance for HSI classification compared to several state-of-the-art methods.
arXiv Detail & Related papers (2020-12-07T03:44:35Z) - LiteDepthwiseNet: An Extreme Lightweight Network for Hyperspectral Image
Classification [9.571458051525768]
This paper proposes a new network architecture, LiteDepthwiseNet, for hyperspectral image (HSI) classification.
LiteDepthwiseNet decomposes standard convolution into depthwise convolution and pointwise convolution, which can achieve high classification performance with minimal parameters.
Experimental results on three benchmark hyperspectral datasets show that LiteDepthwiseNet achieves state-of-the-art performance with a very small number of parameters and low computational cost.
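The parameter saving from that depthwise/pointwise decomposition is easy to quantify; the short calculation below uses arbitrary example channel sizes, not LiteDepthwiseNet's actual layer configuration.

```python
# Parameter count of a standard 3x3 convolution vs. its depthwise separable
# decomposition (depthwise 3x3 + pointwise 1x1), ignoring biases. Channel
# sizes are arbitrary examples, not LiteDepthwiseNet's actual configuration.
def conv_params(c_in: int, c_out: int, k: int = 3) -> int:
    return c_in * c_out * k * k

def dsc_params(c_in: int, c_out: int, k: int = 3) -> int:
    depthwise = c_in * k * k   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1x1 conv mixing channels
    return depthwise + pointwise

c_in, c_out = 128, 256
std, dsc = conv_params(c_in, c_out), dsc_params(c_in, c_out)
print(std, dsc, round(std / dsc, 1))  # 294912 33920 8.7
```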
arXiv Detail & Related papers (2020-10-15T13:12:17Z) - SSP-Net: Scalable Sequential Pyramid Networks for Real-Time 3D Human
Pose Regression [27.85790535227085]
We propose a highly scalable convolutional neural network, end-to-end trainable, for real-time 3D human pose regression from still RGB images.
Our network requires a single training procedure and is capable of producing its best predictions at 120 frames per second.
arXiv Detail & Related papers (2020-09-04T03:43:24Z) - Resolution Adaptive Networks for Efficient Inference [53.04907454606711]
We propose a novel Resolution Adaptive Network (RANet), which is inspired by the intuition that low-resolution representations are sufficient for classifying "easy" inputs.
In RANet, the input images are first routed to a lightweight sub-network that efficiently extracts low-resolution representations.
High-resolution paths in the network maintain the capability to recognize the "hard" samples.
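The underlying idea, running a cheap low-resolution pass first and escalating only low-confidence inputs to a high-resolution path, can be sketched as a confidence-gated inference loop; the sub-networks and threshold below are placeholders, not RANet's actual multi-scale architecture.

```python
# Hedged PyTorch sketch of confidence-gated adaptive inference: try a cheap
# low-resolution sub-network first and fall back to a heavier high-resolution
# path only for low-confidence ("hard") inputs. The sub-networks and the
# confidence threshold are placeholders, not RANet's actual design.
import torch
import torch.nn.functional as F

@torch.no_grad()
def adaptive_classify(x, light_net, heavy_net, threshold=0.9, low_res=112):
    """Return per-sample class predictions, escalating only uncertain samples."""
    x_small = F.interpolate(x, size=low_res, mode="bilinear", align_corners=False)
    probs = light_net(x_small).softmax(dim=1)   # cheap low-resolution pass
    conf, preds = probs.max(dim=1)
    hard = conf < threshold                     # samples the light net is unsure about
    if hard.any():
        heavy_probs = heavy_net(x[hard]).softmax(dim=1)  # full-resolution pass
        preds[hard] = heavy_probs.argmax(dim=1)
    return preds
```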
arXiv Detail & Related papers (2020-03-16T16:54:36Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.