Randomized Quantization: A Generic Augmentation for Data Agnostic
Self-supervised Learning
- URL: http://arxiv.org/abs/2212.08663v2
- Date: Wed, 23 Aug 2023 17:59:57 GMT
- Title: Randomized Quantization: A Generic Augmentation for Data Agnostic
Self-supervised Learning
- Authors: Huimin Wu, Chenyang Lei, Xiao Sun, Peng-Shuai Wang, Qifeng Chen,
Kwang-Ting Cheng, Stephen Lin, Zhirong Wu
- Abstract summary: Self-supervised representation learning follows a paradigm of withholding some part of the data and tasking the network to predict it from the remaining part.
Data augmentation lies at the core for creating the information gap.
In this paper, we explore the channel dimension for generic data augmentation by exploiting precision redundancy.
- Score: 89.00646449740606
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Self-supervised representation learning follows a paradigm of withholding
some part of the data and tasking the network to predict it from the remaining
part. Among many techniques, data augmentation lies at the core for creating
the information gap. Towards this end, masking has emerged as a generic and
powerful tool where content is withheld along the sequential dimension, e.g.,
spatial in images, temporal in audio, and syntactic in language. In this paper,
we explore the orthogonal channel dimension for generic data augmentation by
exploiting precision redundancy. The data for each channel is quantized through
a non-uniform quantizer, with the quantized value sampled randomly within
randomly sampled quantization bins. From another perspective, quantization is
analogous to channel-wise masking, as it removes the information within each
bin, but preserves the information across bins. Our approach significantly
surpasses existing generic data augmentation methods, while showing on par
performance against modality-specific augmentations. We comprehensively
evaluate our approach on vision, audio, 3D point clouds, as well as the DABS
benchmark which is comprised of various data modalities. The code is available
at https://github.com/microsoft/random_quantize.
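The augmentation described in the abstract can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: it assumes a single channel, samples the interior bin edges uniformly at random (giving a non-uniform quantizer), and replaces every value in a bin with one random representative drawn from that bin, so information within a bin is removed while ordering across bins is preserved.

```python
import numpy as np

def randomized_quantize(x, num_bins=8, rng=None):
    """Randomized quantization of one channel (illustrative sketch).

    Interior bin edges are sampled uniformly at random over the channel's
    value range, and each value is replaced by a random representative
    drawn from inside its bin.
    """
    rng = np.random.default_rng() if rng is None else rng
    lo, hi = float(x.min()), float(x.max())
    # Randomly sampled interior edges yield non-uniform quantization bins.
    edges = np.sort(rng.uniform(lo, hi, size=num_bins - 1))
    edges = np.concatenate(([lo], edges, [hi]))
    # Assign each value to its bin.
    idx = np.clip(np.searchsorted(edges, x, side="right") - 1, 0, num_bins - 1)
    # One random representative per bin, sampled within that bin.
    reps = rng.uniform(edges[:-1], edges[1:])
    return reps[idx]
```

For multi-channel data (e.g. RGB images), the same function would be applied to each channel independently, which is what makes the augmentation analogous to channel-wise masking.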
Related papers
- Robust estimation of the intrinsic dimension of data sets with quantum cognition machine learning [31.347602507204847]
We propose a new data representation method based on Quantum Cognition Machine Learning and apply it to manifold learning.
We learn a representation of each data point as a quantum state, encoding both local properties of the point as well as its relation with the entire data.
Inspired by ideas from quantum geometry, we then construct from the quantum states a point cloud equipped with a quantum metric.
The proposed estimator is based on the detection of this spectral gap.
arXiv Detail & Related papers (2024-09-19T14:24:35Z)
- Scaling and Masking: A New Paradigm of Data Sampling for Image and Video Quality Assessment [24.545341041444797]
Quality assessment of images and videos emphasizes both local details and global semantics, whereas general data sampling methods fail to catch them simultaneously.
In this work, instead of stacking up models, a more elegant data sampling method is explored, which compacts both the local and global content in a regular input size.
Experiments show that our sampling method can improve the performance of current single-branch models significantly, and achieves competitive performance to the multi-branch models without extra model complexity.
arXiv Detail & Related papers (2024-01-05T03:12:03Z)
- SpATr: MoCap 3D Human Action Recognition based on Spiral Auto-encoder and Transformer Network [1.4732811715354455]
We introduce a novel approach for 3D human action recognition, denoted as SpATr (Spiral Auto-encoder and Transformer Network)
A lightweight auto-encoder, based on spiral convolutions, is employed to extract spatial geometrical features from each 3D mesh.
The proposed method is evaluated on three prominent 3D human action datasets: Babel, MoVi, and BMLrub.
arXiv Detail & Related papers (2023-06-30T11:49:00Z)
- CloudAttention: Efficient Multi-Scale Attention Scheme For 3D Point Cloud Learning [81.85951026033787]
We adopt transformers in this work and incorporate them into a hierarchical framework for shape classification and part and scene segmentation.
We also compute efficient and dynamic global cross attentions by leveraging sampling and grouping at each iteration.
The proposed hierarchical model achieves state-of-the-art shape classification in mean accuracy and yields results on par with the previous segmentation methods.
arXiv Detail & Related papers (2022-07-31T21:39:15Z)
- Qimera: Data-free Quantization with Synthetic Boundary Supporting Samples [8.975667614727652]
We propose Qimera, a method that uses superposed latent embeddings to generate synthetic boundary supporting samples.
The experimental results show that Qimera achieves state-of-the-art performances for various settings on data-free quantization.
arXiv Detail & Related papers (2021-11-04T04:52:50Z)
- Cherry-Picking Gradients: Learning Low-Rank Embeddings of Visual Data via Differentiable Cross-Approximation [53.95297550117153]
We propose an end-to-end trainable framework that processes large-scale visual data tensors by looking at a fraction of their entries only.
The proposed approach is particularly useful for large-scale multidimensional grid data, and for tasks that require context over a large receptive field.
arXiv Detail & Related papers (2021-05-29T08:39:57Z)
- Data Augmentation for Object Detection via Differentiable Neural Rendering [71.00447761415388]
It is challenging to train a robust object detector when annotated data is scarce.
Existing approaches to tackle this problem include semi-supervised learning that interpolates labeled data from unlabeled data.
We introduce an offline data augmentation method for object detection, which semantically interpolates the training data with novel views.
arXiv Detail & Related papers (2021-03-04T06:31:06Z)
- Automatic Curation of Large-Scale Datasets for Audio-Visual Representation Learning [62.47593143542552]
We describe a subset optimization approach for automatic dataset curation.
We demonstrate that our approach finds videos with high audio-visual correspondence and show that self-supervised models trained on our data, despite being automatically constructed, achieve similar downstream performances to existing video datasets with similar scales.
arXiv Detail & Related papers (2021-01-26T14:27:47Z)
- Scribble-Supervised Semantic Segmentation by Random Walk on Neural Representation and Self-Supervision on Neural Eigenspace [10.603823180750446]
This work aims to achieve semantic segmentation supervised directly by scribble labels, without auxiliary information or other intermediate manipulation.
We impose diffusion on neural representation by random walk and consistency on neural eigenspace by self-supervision.
The results demonstrate the superiority of the proposed method and are even comparable to some full-label supervised ones.
arXiv Detail & Related papers (2020-11-11T08:22:25Z)
- Representation Learning for Sequence Data with Deep Autoencoding Predictive Components [96.42805872177067]
We propose a self-supervised representation learning method for sequence data, based on the intuition that useful representations of sequence data should exhibit a simple structure in the latent space.
We encourage this latent structure by maximizing an estimate of predictive information of latent feature sequences, which is the mutual information between past and future windows at each time step.
We demonstrate that our method recovers the latent space of noisy dynamical systems, extracts predictive features for forecasting tasks, and improves automatic speech recognition when used to pretrain the encoder on large amounts of unlabeled data.
arXiv Detail & Related papers (2020-10-07T03:34:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences.