TCCT-Net: Two-Stream Network Architecture for Fast and Efficient Engagement Estimation via Behavioral Feature Signals
- URL: http://arxiv.org/abs/2404.09474v2
- Date: Tue, 14 May 2024 13:26:43 GMT
- Title: TCCT-Net: Two-Stream Network Architecture for Fast and Efficient Engagement Estimation via Behavioral Feature Signals
- Authors: Alexander Vedernikov, Puneet Kumar, Haoyu Chen, Tapio Seppänen, Xiaobai Li
- Abstract summary: We present a novel two-stream feature fusion "Tensor-Convolution and Convolution-Transformer Network" (TCCT-Net) architecture.
To better learn the meaningful patterns in the temporal-spatial domain, we design a "CT" stream that integrates a hybrid convolutional-transformer.
In parallel, to efficiently extract rich patterns from the temporal-frequency domain, we introduce a "TC" stream that uses Continuous Wavelet Transform (CWT) to represent information in a 2D tensor form.
- Score: 58.865901821451295
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Engagement analysis finds various applications in healthcare, education, advertising, and services. Deep Neural Networks used for this analysis have complex architectures and require large amounts of input data, computational power, and inference time. These constraints make it challenging to embed such systems into devices for real-time use. To address these limitations, we present a novel two-stream feature fusion "Tensor-Convolution and Convolution-Transformer Network" (TCCT-Net) architecture. To better learn the meaningful patterns in the temporal-spatial domain, we design a "CT" stream that integrates a hybrid convolutional-transformer. In parallel, to efficiently extract rich patterns from the temporal-frequency domain and boost processing speed, we introduce a "TC" stream that uses Continuous Wavelet Transform (CWT) to represent information in a 2D tensor form. Evaluated on the EngageNet dataset, the proposed method outperforms existing baselines, utilizing only two behavioral features (head pose rotations) compared to the 98 used in baseline models. Furthermore, comparative analysis shows TCCT-Net's architecture offers an order-of-magnitude improvement in inference speed compared to state-of-the-art image-based Recurrent Neural Network (RNN) methods. The code will be released at https://github.com/vedernikovphoto/TCCT_Net.
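To make the "TC" stream idea concrete, below is a minimal sketch (not the authors' released code, which is to appear at the linked repository) of how a 1-D behavioral signal, such as a head-pose rotation series, can be mapped to a 2-D time-frequency tensor with the Continuous Wavelet Transform. The wavelet, scale range, and sampling rate are illustrative assumptions.

```python
# Minimal sketch: turn 1-D behavioral signals into a 2-D CWT (scalogram) tensor,
# in the spirit of the "TC" stream described in the abstract.
import numpy as np
import pywt


def signal_to_cwt_tensor(signal: np.ndarray,
                         scales: np.ndarray = np.arange(1, 65),
                         wavelet: str = "morl",
                         sampling_period: float = 1.0 / 30.0) -> np.ndarray:
    """Return a (num_scales, num_timesteps) scalogram for one signal."""
    coeffs, _freqs = pywt.cwt(signal, scales, wavelet,
                              sampling_period=sampling_period)
    return np.abs(coeffs).astype(np.float32)


# Example: two head-pose rotation channels (e.g. pitch and yaw) over 10 s at 30 fps.
t = np.linspace(0.0, 10.0, 300)
pitch = np.sin(2 * np.pi * 0.5 * t)  # synthetic stand-in data
yaw = np.cos(2 * np.pi * 0.2 * t)
tensor = np.stack([signal_to_cwt_tensor(pitch),
                   signal_to_cwt_tensor(yaw)])  # shape: (2, 64, 300)
print(tensor.shape)
```

The resulting stacked scalogram can then be fed to an image-like network branch, while the raw temporal signals feed the convolutional-transformer branch; the exact branch designs and fusion scheme are defined by the paper, not by this sketch.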
Related papers
- Flow reconstruction in time-varying geometries using graph neural networks [1.0485739694839669]
The model incorporates a feature propagation algorithm as a preprocessing step to handle extremely sparse inputs.
A binary indicator is introduced as a validity mask to distinguish between the original and propagated data points.
The model is trained on a unique data set of Direct Numerical Simulations (DNS) of a motored engine at a technically relevant operating condition.
arXiv Detail & Related papers (2024-11-13T16:49:56Z)
- Task-Oriented Real-time Visual Inference for IoVT Systems: A Co-design Framework of Neural Networks and Edge Deployment [61.20689382879937]
Task-oriented edge computing addresses this by shifting data analysis to the edge.
Existing methods struggle to balance high model performance with low resource consumption.
We propose a novel co-design framework to optimize neural network architecture.
arXiv Detail & Related papers (2024-10-29T19:02:54Z)
- Wavelet-Inspired Multiscale Graph Convolutional Recurrent Network for Traffic Forecasting [0.0]
We propose a Wavelet-Inspired Graph Convolutional Recurrent Network (WavGCRN) which combines multiscale analysis (MSA)-based methods with Deep Learning (DL)-based methods.
The proposed method can offer well-defined interpretability, powerful learning capability, and competitive forecasting performance on real-world traffic data sets.
arXiv Detail & Related papers (2024-01-11T16:55:48Z) - Distance Weighted Trans Network for Image Completion [52.318730994423106]
We propose a new architecture that relies on Distance-based Weighted Transformer (DWT) to better understand the relationships between an image's components.
CNNs are used to augment the local texture information of coarse priors.
DWT blocks are used to recover certain coarse textures and coherent visual structures.
arXiv Detail & Related papers (2023-10-11T12:46:11Z)
- Graph-based Multi-ODE Neural Networks for Spatio-Temporal Traffic Forecasting [8.832864937330722]
Long-range traffic forecasting remains a challenging task due to the intricate and extensive spatio-temporal correlations observed in traffic networks.
In this paper, we propose an architecture called Graph-based Multi-ODE Neural Networks (GRAM-ODE) which is designed with multiple connective ODE-GNN modules to learn better representations.
Our extensive set of experiments conducted on six real-world datasets demonstrates the superior performance of GRAM-ODE compared with state-of-the-art baselines.
arXiv Detail & Related papers (2023-05-30T02:10:42Z)
- Learning Frequency-aware Dynamic Network for Efficient Super-Resolution [56.98668484450857]
This paper explores a novel frequency-aware dynamic network for dividing the input into multiple parts according to its coefficients in the discrete cosine transform (DCT) domain.
In practice, the high-frequency part is processed with expensive operations while the lower-frequency part is assigned cheap operations to relieve the computational burden (a minimal illustrative sketch of this DCT-domain split follows this list).
Experiments conducted on benchmark SISR models and datasets show that the frequency-aware dynamic network can be employed for various SISR neural architectures.
arXiv Detail & Related papers (2021-03-15T12:54:26Z)
- Spatio-temporal Modeling for Large-scale Vehicular Networks Using Graph Convolutional Networks [110.80088437391379]
A graph-based framework called SMART is proposed to model and keep track of the statistics of vehicle-to-infrastructure (V2I) communication latency across a large geographical area.
We develop a graph reconstruction-based approach using a graph convolutional network integrated with a deep Q-networks algorithm.
Our results show that the proposed method can significantly improve both the modeling accuracy and efficiency and the latency performance of large vehicular networks.
arXiv Detail & Related papers (2021-03-13T06:56:29Z)
- Densely Connected Recurrent Residual (Dense R2UNet) Convolutional Neural Network for Segmentation of Lung CT Images [0.342658286826597]
We present a synthesis of Recurrent CNN, Residual Network and Dense Convolutional Network based on the U-Net model architecture.
The proposed model, tested on the benchmark Lung Lesion dataset, showed better segmentation performance than its equivalent models.
arXiv Detail & Related papers (2021-02-01T06:34:10Z)
- Deep Cellular Recurrent Network for Efficient Analysis of Time-Series Data with Spatial Information [52.635997570873194]
This work proposes a novel deep cellular recurrent neural network (DCRNN) architecture to process complex multi-dimensional time series data with spatial information.
The proposed architecture achieves state-of-the-art performance while using substantially fewer trainable parameters than comparable methods in the literature.
arXiv Detail & Related papers (2021-01-12T20:08:18Z)
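For the "Learning Frequency-aware Dynamic Network for Efficient Super-Resolution" entry above, the following rough sketch (not the paper's implementation) illustrates the core idea of splitting an input patch into low- and high-frequency parts in the DCT domain so that cheap and expensive branches can process them separately. The cutoff value and branch functions are placeholder assumptions.

```python
# Hedged illustration of a DCT-domain frequency split, not the paper's code.
import numpy as np
from scipy.fft import dctn, idctn


def split_by_dct(patch: np.ndarray, cutoff: int = 8):
    """Return (low_freq_part, high_freq_part) of a 2-D patch via a DCT mask."""
    coeffs = dctn(patch, norm="ortho")
    mask = np.zeros_like(coeffs)
    mask[:cutoff, :cutoff] = 1.0          # keep only the low-frequency corner
    low = idctn(coeffs * mask, norm="ortho")
    high = idctn(coeffs * (1.0 - mask), norm="ortho")
    return low, high


def cheap_branch(x: np.ndarray) -> np.ndarray:
    # Placeholder for an inexpensive operation (e.g. a small convolution).
    return x


def expensive_branch(x: np.ndarray) -> np.ndarray:
    # Placeholder for a heavier operation (e.g. a deep residual block).
    return x


patch = np.random.rand(32, 32).astype(np.float32)
low, high = split_by_dct(patch)
restored = cheap_branch(low) + expensive_branch(high)
print(np.allclose(restored, patch, atol=1e-5))  # True: the split itself is lossless
```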