Related papers: Analysis of Latent-Space Motion for Collaborative Intelligence

Related papers

Langevin Flows for Modeling Neural Latent Dynamics [81.81271685018284]
We introduce LangevinFlow, a sequential Variational Auto-Encoder where the time evolution of latent variables is governed by the underdamped Langevin equation. Our approach incorporates physical priors -- such as inertia, damping, a learned potential function, and forces -- to represent both autonomous and non-autonomous processes in neural systems. Our method outperforms state-of-the-art baselines on synthetic neural populations generated by a Lorenz attractor.
arXiv Detail & Related papers (2025-07-15T17:57:48Z)
Multi-Modal Gesture Recognition from Video and Surgical Tool Pose Information via Motion Invariants [9.77463802740227]
Recognizing surgical gestures in real time is a stepping stone towards automated activity recognition, skill assessment, intra-operative assistance, and eventually surgical automation. While some recent works in multi-modal neural networks learn the relationships between vision and kinematics data, current approaches treat kinematics information as independent signals, with no underlying relation between tool-tip poses. We show that gesture recognition improves when combining invariant signals with tool position, achieving 90.3% frame-wise accuracy on the JIGSAWS suturing dataset.
arXiv Detail & Related papers (2025-03-19T19:02:58Z)
Equivariant Graph Neural Operator for Modeling 3D Dynamics [148.98826858078556]
We propose Equivariant Graph Neural Operator (EGNO) to directly model dynamics as trajectories instead of just next-step prediction. EGNO explicitly learns the temporal evolution of 3D dynamics, where we formulate the dynamics as a function over time and learn neural operators to approximate it. Comprehensive experiments in multiple domains, including particle simulations, human motion capture, and molecular dynamics, demonstrate the significantly superior performance of EGNO against existing methods.
arXiv Detail & Related papers (2024-01-19T21:50:32Z)
Dynamic Tensor Decomposition via Neural Diffusion-Reaction Processes [24.723536390322582]
Tensor decomposition is an important tool for multiway data analysis. We propose Dynamic EMbedIngs fOr dynamic Tensor dEcomposition (DEMOTE). We show the advantage of our approach in both a simulation study and real-world applications.
arXiv Detail & Related papers (2023-10-30T15:49:45Z)
Momentum Diminishes the Effect of Spectral Bias in Physics-Informed Neural Networks [72.09574528342732]
Physics-informed neural network (PINN) algorithms have shown promising results in solving a wide range of problems involving partial differential equations (PDEs). They often fail to converge to desirable solutions when the target function contains high-frequency features, due to a phenomenon known as spectral bias. In the present work, we exploit neural tangent kernels (NTKs) to investigate the training dynamics of PINNs evolving under stochastic gradient descent with momentum (SGDM).
arXiv Detail & Related papers (2022-06-29T19:03:10Z)
Relational Self-Attention: What's Missing in Attention for Video Understanding [52.38780998425556]
We introduce a relational feature transform, dubbed the relational self-attention (RSA). Our experiments and ablation studies show that the RSA network substantially outperforms convolution and self-attention counterparts.
arXiv Detail & Related papers (2021-11-02T15:36:11Z)
EAN: Event Adaptive Network for Enhanced Action Recognition [66.81780707955852]
We propose a unified action recognition framework to investigate the dynamic nature of video content. First, when extracting local cues, we generate the spatial-temporal kernels of dynamic-scale to adaptively fit the diverse events. Second, to accurately aggregate these cues into a global video representation, we propose to mine the interactions only among a few selected foreground objects by a Transformer.
arXiv Detail & Related papers (2021-07-22T15:57:18Z)
Tensor Representations for Action Recognition [54.710267354274194]
Human actions in sequences are characterized by the complex interplay between spatial features and their temporal dynamics. We propose novel tensor representations for capturing higher-order relationships between visual features for the task of action recognition. We use higher-order tensors and so-called Eigenvalue Power Normalization (EPN), which has long been speculated to perform spectral detection of higher-order occurrences.
arXiv Detail & Related papers (2020-12-28T17:27:18Z)
On the eigenvector bias of Fourier feature networks: From regression to solving multi-scale PDEs with physics-informed neural networks [0.0]
We show that physics-informed neural networks (PINNs) struggle in cases where the target functions to be approximated exhibit high-frequency or multi-scale features. We construct novel architectures that employ multi-scale random Fourier features and justify how such coordinate embedding layers can lead to robust and accurate PINN models.
arXiv Detail & Related papers (2020-12-18T04:19:30Z)
MotionSqueeze: Neural Motion Feature Learning for Video Understanding [46.82376603090792]
Motion plays a crucial role in understanding videos and most state-of-the-art neural models for video classification incorporate motion information. In this work, we replace external and heavy computation of optical flows with internal and light-weight learning of motion features. We demonstrate that the proposed method provides a significant gain on four standard benchmarks for action recognition with only a small amount of additional cost.
arXiv Detail & Related papers (2020-07-20T08:30:14Z)
Understanding Recurrent Neural Networks Using Nonequilibrium Response Theory [5.33024001730262]
Recurrent neural networks (RNNs) are brain-inspired models widely used in machine learning for analyzing sequential data. We show how RNNs process input signals using the response theory from nonequilibrium statistical mechanics.
arXiv Detail & Related papers (2020-06-19T10:09:09Z)
Convolutional Tensor-Train LSTM for Spatio-temporal Learning [116.24172387469994]
We propose a higher-order LSTM model that can efficiently learn long-term correlations in the video sequence. This is accomplished through a novel tensor train module that performs prediction by combining convolutional features across time. Our results achieve state-of-the-art performance in a wide range of applications and datasets.
arXiv Detail & Related papers (2020-02-21T05:00:01Z)

This list is automatically generated from the titles and abstracts of the papers on this site.