3D Motion Perception of Binocular Vision Target with PID-CNN
- URL: http://arxiv.org/abs/2511.20332v2
- Date: Mon, 01 Dec 2025 07:41:45 GMT
- Title: 3D Motion Perception of Binocular Vision Target with PID-CNN
- Authors: Jiazhao Shi, Pan Pan, Haotian Shi
- Abstract summary: This article trains a network for perceiving a binocular target's three-dimensional coordinates, velocity, and acceleration, giving it a basic spatiotemporal perception capability. The authors design a relatively small convolutional neural network with a total of 17 layers and 413 thousand parameters.
- Score: 10.329773750968926
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This article trained a network for perceiving three-dimensional motion information of a binocular vision target, which provides real-time three-dimensional coordinates, velocity, and acceleration, giving it a basic spatiotemporal perception capability. The ability of neural networks to fit nonlinear problems is interpreted from the perspective of PID: a single-layer neural network is viewed as a second-order difference equation plus a nonlinearity that describes a local problem, and multilayer networks gradually transform the raw representation into the desired representation through multiple such combinations. Some reference principles for designing neural networks are analysed. A relatively small PID convolutional neural network is designed, with a total of 17 layers and 413 thousand parameters, and a simple but practical feature-reuse method is implemented by concatenation and pooling. The network was trained and tested on datasets of a simulated, randomly moving ball, and the experimental results show that the prediction accuracy is close to the upper limit that the input image resolution can represent. The experimental results and errors are analysed, along with the existing shortcomings and possible directions for improvement. Finally, the article discusses the advantages of high-dimensional convolution in improving computational efficiency and feature-space utilization, as well as the potential advantages of using PID information to implement memory and attention mechanisms.
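The PID view of motion perception in the abstract maps naturally onto finite differences: position is the proportional term, and first- and second-order differences of the coordinate sequence give velocity and acceleration. The sketch below illustrates that discrete relationship in plain Python; the function name and the fixed time step are illustrative assumptions, not the paper's actual implementation (which learns these quantities with a CNN).

```python
def finite_differences(positions, dt=1.0):
    """Estimate per-step velocity and acceleration from a sequence of
    3D coordinates using first- and second-order finite differences.

    positions: list of (x, y, z) tuples sampled at a fixed interval dt.
    Returns (velocities, accelerations) as lists of 3-tuples.
    """
    # First-order difference: v[k] = (p[k+1] - p[k]) / dt
    velocities = [
        tuple((b - a) / dt for a, b in zip(p0, p1))
        for p0, p1 in zip(positions, positions[1:])
    ]
    # Second-order difference of position = first difference of velocity.
    accelerations = [
        tuple((b - a) / dt for a, b in zip(v0, v1))
        for v0, v1 in zip(velocities, velocities[1:])
    ]
    return velocities, accelerations

# Example: a ball on a parabolic path x = t, y = t^2, z = 0, sampled at dt = 1.
pts = [(float(t), float(t * t), 0.0) for t in range(5)]
v, a = finite_differences(pts)
```

For the quadratic path above, the second difference recovers the constant acceleration (0, 2, 0) exactly, which is the upper bound on what a difference scheme of this order can represent; a learned network faces the analogous limit set by the input image resolution.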
Related papers
- Exploring Superposition and Interference in State-of-the-Art Low-Parameter Vision Models [0.0]
We address interference in feature maps, a phenomenon associated with superposition, where neurons simultaneously encode multiple characteristics. Our research suggests that limiting interference can enhance scaling and accuracy in very low-scaled networks (under 1.5M parameters). We propose a proof-of-concept architecture named NoDepth Bottleneck, built on mechanistic insights from our experiments, demonstrating robust scaling accuracy on the ImageNet dataset.
arXiv Detail & Related papers (2025-07-21T16:57:25Z) - 3DPyranet Features Fusion for Spatio-temporal Feature Learning [2.327279581393927]
A 3D pyramidal neural network called 3DPyraNet and a discriminative approach for spatio-temporal feature learning called 3DPyraNet-F are proposed. 3DPyraNet-F extracts the feature maps of the highest layer of the learned network, fuses them into a single vector, and provides it as input to a linear SVM. Results with 3DPyraNet are reported in real-world environments, especially in the presence of camera-induced motion.
arXiv Detail & Related papers (2025-04-26T17:32:37Z) - PiLocNet: Physics-informed neural network on 3D localization with rotating point spread function [3.029152208453665]
We propose a novel enhancement of our previously introduced localization neural network, LocNet. The improved network is a physics-informed neural network (PINN) that we call PiLocNet. Although the paper focuses on the use of a single-lobe rotating PSF to encode the full 3D source location, we expect the method to be widely applicable to other PSFs and imaging problems.
arXiv Detail & Related papers (2024-10-17T07:49:23Z) - N-BVH: Neural ray queries with bounding volume hierarchies [51.430495562430565]
In 3D computer graphics, the bulk of a scene's memory usage is due to polygons and textures.
We devise N-BVH, a neural compression architecture designed to answer arbitrary ray queries in 3D.
Our method provides faithful approximations of visibility, depth, and appearance attributes.
arXiv Detail & Related papers (2024-05-25T13:54:34Z) - SeMLaPS: Real-time Semantic Mapping with Latent Prior Networks and Quasi-Planar Segmentation [53.83313235792596]
We present a new methodology for real-time semantic mapping from RGB-D sequences.
It combines a 2D neural network and a 3D network based on a SLAM system with 3D occupancy mapping.
Our system achieves state-of-the-art semantic mapping quality among 2D-3D network-based systems.
arXiv Detail & Related papers (2023-06-28T22:36:44Z) - NAF: Neural Attenuation Fields for Sparse-View CBCT Reconstruction [79.13750275141139]
This paper proposes a novel and fast self-supervised solution for sparse-view CBCT reconstruction.
The desired attenuation coefficients are represented as a continuous function of 3D spatial coordinates, parameterized by a fully-connected deep neural network.
A learning-based encoder entailing hash coding is adopted to help the network capture high-frequency details.
arXiv Detail & Related papers (2022-09-29T04:06:00Z) - Parameter Convex Neural Networks [13.42851919291587]
We propose the exponential multilayer neural network (EMLP) which is convex with regard to the parameters of the neural network under some conditions.
In later experiments, we use the same architecture to build the exponential graph convolutional network (EGCN) and evaluate it on a graph classification dataset.
arXiv Detail & Related papers (2022-06-11T16:44:59Z) - FuNNscope: Visual microscope for interactively exploring the loss landscape of fully connected neural networks [77.34726150561087]
We show how to explore high-dimensional landscape characteristics of neural networks.
We generalize observations on small neural networks to more complex systems.
An interactive dashboard opens up a number of possible applications.
arXiv Detail & Related papers (2022-04-09T16:41:53Z) - Model-inspired Deep Learning for Light-Field Microscopy with Application to Neuron Localization [27.247818386065894]
We propose a model-inspired deep learning approach to perform fast and robust 3D localization of sources using light-field microscopy images.
This is achieved by developing a deep network that efficiently solves a convolutional sparse coding problem.
Experiments on localization of mammalian neurons from light-fields show that the proposed approach simultaneously provides enhanced performance, interpretability and efficiency.
arXiv Detail & Related papers (2021-03-10T16:24:47Z) - Reinforced Axial Refinement Network for Monocular 3D Object Detection [160.34246529816085]
Monocular 3D object detection aims to extract the 3D position and properties of objects from a 2D input image.
Conventional approaches sample 3D bounding boxes from the space and infer the relationship between the target object and each of them; however, the probability of effective samples is relatively small in the 3D space.
We propose to start with an initial prediction and refine it gradually towards the ground truth, with only one 3D parameter changed in each step.
This requires designing a policy which gets a reward after several steps, and thus we adopt reinforcement learning to optimize it.
arXiv Detail & Related papers (2020-08-31T17:10:48Z) - Cylindrical Convolutional Networks for Joint Object Detection and Viewpoint Estimation [76.21696417873311]
We introduce a learnable module, cylindrical convolutional networks (CCNs), that exploit cylindrical representation of a convolutional kernel defined in the 3D space.
CCNs extract a view-specific feature through a view-specific convolutional kernel to predict object category scores at each viewpoint.
Our experiments demonstrate the effectiveness of the cylindrical convolutional networks on joint object detection and viewpoint estimation.
arXiv Detail & Related papers (2020-03-25T10:24:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.