Using Motion Cues to Supervise Single-Frame Body Pose and Shape
Estimation in Low Data Regimes
- URL: http://arxiv.org/abs/2402.02736v1
- Date: Mon, 5 Feb 2024 05:37:48 GMT
- Title: Using Motion Cues to Supervise Single-Frame Body Pose and Shape
Estimation in Low Data Regimes
- Authors: Andrey Davydov, Alexey Sidnev, Artsiom Sanakoyeu, Yuhua Chen, Mathieu
Salzmann, Pascal Fua
- Abstract summary: When enough annotated training data is available, supervised deep-learning algorithms excel at estimating human body pose and shape using a single camera.
We show that, in such cases, easy-to-obtain unannotated videos can be used instead to provide the required supervisory signals.
- Score: 93.69730589828532
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: When enough annotated training data is available, supervised deep-learning
algorithms excel at estimating human body pose and shape using a single camera.
The effects of too little such data being available can be mitigated by using
other information sources, such as databases of body shapes, to learn priors.
Unfortunately, such sources are not always available either. We show that, in
such cases, easy-to-obtain unannotated videos can be used instead to provide
the required supervisory signals. Given a trained model using too little
annotated data, we compute poses in consecutive frames along with the optical
flow between them. We then enforce consistency between the image optical flow
and the one that can be inferred from the change in pose from one frame to the
next. This provides enough additional supervision to effectively refine the
network weights and to perform on par with methods trained using far more
annotated data.
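The core supervisory signal described above can be sketched as a consistency loss: the displacement of each projected 2D joint between consecutive frames should agree with the dense image optical flow sampled at that joint's location. The function below is a minimal illustrative sketch of that idea using NumPy; the function name, the nearest-pixel flow sampling, and the squared-error penalty are assumptions for illustration, not the authors' actual implementation.

```python
import numpy as np

def pose_flow_consistency_loss(joints_t, joints_t1, image_flow):
    """Illustrative sketch (not the paper's code): penalize disagreement
    between the flow implied by the change in 2D joint positions and the
    image optical flow sampled at those joints.

    joints_t, joints_t1: (J, 2) arrays of 2D joint positions in pixels,
        in frames t and t+1 respectively.
    image_flow: (H, W, 2) dense optical flow field from frame t to t+1.
    """
    # Flow implied by the estimated pose change between the two frames.
    pose_flow = joints_t1 - joints_t                      # (J, 2)

    # Nearest-pixel sampling of the dense flow at each joint in frame t
    # (a real implementation would likely use bilinear interpolation).
    h, w = image_flow.shape[:2]
    xs = np.clip(np.round(joints_t[:, 0]).astype(int), 0, w - 1)
    ys = np.clip(np.round(joints_t[:, 1]).astype(int), 0, h - 1)
    sampled_flow = image_flow[ys, xs]                     # (J, 2)

    # Mean squared discrepancy serves as the self-supervised loss.
    diff = pose_flow - sampled_flow
    return float(np.mean(np.sum(diff ** 2, axis=1)))
```

When the pose-induced joint motion matches the image flow exactly, the loss is zero; any mismatch produces a gradient that can be used to refine the network weights, which is the mechanism the abstract describes.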
Related papers
- Enhancing pretraining efficiency for medical image segmentation via transferability metrics [0.0]
In medical image segmentation tasks, the scarcity of labeled training data poses a significant challenge.
We introduce a novel transferability metric, based on contrastive learning, that measures how robustly a pretrained model is able to represent the target data.
arXiv Detail & Related papers (2024-10-24T12:11:52Z) - EfficientTrain++: Generalized Curriculum Learning for Efficient Visual Backbone Training [79.96741042766524]
We reformulate the training curriculum as a soft-selection function.
We show that exposing the contents of natural images can be readily achieved by the intensity of data augmentation.
The resulting method, EfficientTrain++, is simple, general, yet surprisingly effective.
arXiv Detail & Related papers (2024-05-14T17:00:43Z) - Premonition: Using Generative Models to Preempt Future Data Changes in
Continual Learning [63.850451635362425]
Continual learning requires a model to adapt to ongoing changes in the data distribution.
We show that the combination of a large language model and an image generation model can similarly provide useful premonitions.
We find that the backbone of our pre-trained networks can learn representations useful for the downstream continual learning problem.
arXiv Detail & Related papers (2024-03-12T06:29:54Z) - Learning from One Continuous Video Stream [70.30084026960819]
We introduce a framework for online learning from a single continuous video stream.
This poses great challenges given the high correlation between consecutive video frames.
We employ pixel-to-pixel modelling as a practical and flexible way to switch between pre-training and single-stream evaluation.
arXiv Detail & Related papers (2023-12-01T14:03:30Z) - Less is More: On the Feature Redundancy of Pretrained Models When
Transferring to Few-shot Tasks [120.23328563831704]
Transferring a pretrained model to a downstream task can be as easy as conducting linear probing with target data.
We show that, for linear probing, the pretrained features can be extremely redundant when the downstream data is scarce.
arXiv Detail & Related papers (2023-10-05T19:00:49Z) - Contrastive Learning for Self-Supervised Pre-Training of Point Cloud
Segmentation Networks With Image Data [7.145862669763328]
Self-supervised pre-training on unlabelled data is one way to reduce the amount of manual annotations needed.
We combine image and point cloud modalities by first learning self-supervised image features and then using these features to train a 3D model.
Our pre-training method only requires a single scan of a scene and can be applied to cases where localization information is unavailable.
arXiv Detail & Related papers (2023-01-18T03:14:14Z) - Self-Supervised Pretraining for 2D Medical Image Segmentation [0.0]
Self-supervised learning offers a way to lower the need for manually annotated data by pretraining models for a specific domain on unlabelled data.
We find that self-supervised pretraining on natural images and target-domain-specific images leads to the fastest and most stable downstream convergence.
In low-data scenarios, supervised ImageNet pretraining achieves the best accuracy, requiring less than 100 annotated samples to realise close to minimal error.
arXiv Detail & Related papers (2022-09-01T09:25:22Z) - Self-Supervised Learning as a Means To Reduce the Need for Labeled Data
in Medical Image Analysis [64.4093648042484]
We use a dataset of chest X-ray images with bounding box labels for 13 different classes of anomalies.
We show that it is possible to achieve similar performance to a fully supervised model in terms of mean average precision and accuracy with only 60% of the labeled data.
arXiv Detail & Related papers (2022-06-01T09:20:30Z) - Improving generalization with synthetic training data for deep learning
based quality inspection [0.0]
Supervised deep learning requires a large amount of annotated images for training.
In practice, collecting and annotating such data is costly and laborious.
We show the use of randomly generated synthetic training images can help tackle domain instability.
arXiv Detail & Related papers (2022-02-25T16:51:01Z) - Decoupled Appearance and Motion Learning for Efficient Anomaly Detection
in Surveillance Video [9.80717374118619]
We propose a new neural network architecture that learns the normal behavior in a purely unsupervised fashion.
Our model can process 16 to 45 times more frames per second than related approaches.
arXiv Detail & Related papers (2020-11-10T11:40:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.