Auto-Encoder based Co-Training Multi-View Representation Learning
- URL: http://arxiv.org/abs/2201.02978v1
- Date: Sun, 9 Jan 2022 10:20:16 GMT
- Title: Auto-Encoder based Co-Training Multi-View Representation Learning
- Authors: Run-kun Lu, Jian-wei Liu, Yuan-fang Wang, Hao-jie Xie, Xin Zuo
- Abstract summary: We propose a novel algorithm called Auto-encoder based Co-training Multi-View Learning (ACMVL).
The algorithm has two stages: the first trains an auto-encoder for each view, and the second trains a supervised network.
According to the experimental results, the algorithm learns a well-performing latent feature representation, and the auto-encoder of each view has stronger reconstruction ability than a traditional auto-encoder.
- Score: 10.120166898507328
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Multi-view learning is a learning problem that utilizes the various
representations of an object to mine valuable knowledge and improve the
performance of learning algorithms; one of its significant directions is
subspace learning. As is well known, the auto-encoder is a deep learning
method that learns the latent features of raw data by reconstructing the
input. Building on this, we propose a novel algorithm called Auto-encoder
based Co-training Multi-View Learning (ACMVL), which exploits both
complementarity and consistency to find a joint latent feature representation
of multiple views. The algorithm has two stages: the first trains an
auto-encoder for each view, and the second trains a supervised network.
Interestingly, the two stages partly share weights and assist each other
through a co-training process. According to the experimental results, the
algorithm learns a well-performing latent feature representation, and the
auto-encoder of each view has stronger reconstruction ability than a
traditional auto-encoder.
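The paper does not include code; as an illustration only, below is a minimal PyTorch-style sketch of the two-stage idea described above: one auto-encoder per view, a supervised network that reads the per-view latent codes, and an alternating (co-training) update in which the supervised loss also reaches the shared encoders. All class names, layer sizes, and the training schedule are assumptions rather than the authors' implementation.

```python
# Illustrative sketch only -- not the authors' code. It mimics the two-stage
# idea from the abstract: (1) one auto-encoder per view, (2) a supervised
# network sharing the encoder weights, with the stages trained alternately.
import torch
import torch.nn as nn

class ViewAutoEncoder(nn.Module):
    """Auto-encoder for a single view; its encoder is reused in stage 2."""
    def __init__(self, in_dim: int, latent_dim: int):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU(),
                                     nn.Linear(128, latent_dim))
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(),
                                     nn.Linear(128, in_dim))

    def forward(self, x):
        z = self.encoder(x)
        return z, self.decoder(z)

class SupervisedHead(nn.Module):
    """Classifier on the joint latent representation (concatenated views)."""
    def __init__(self, latent_dim: int, n_views: int, n_classes: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(latent_dim * n_views, 64), nn.ReLU(),
                                 nn.Linear(64, n_classes))

    def forward(self, zs):
        return self.net(torch.cat(zs, dim=1))

def acmvl_step(aes, head, views, labels, opt_ae, opt_sup):
    """One co-training round: reconstruction updates, then a supervised update.
    opt_ae covers the auto-encoders; opt_sup is assumed to cover the head *and*
    the shared encoder parameters, so the supervised stage also refines them."""
    # Stage 1: per-view reconstruction.
    opt_ae.zero_grad()
    recon_loss = sum(nn.functional.mse_loss(ae(x)[1], x)
                     for ae, x in zip(aes, views))
    recon_loss.backward()
    opt_ae.step()

    # Stage 2: supervised network on the shared latent codes.
    opt_sup.zero_grad()
    zs = [ae.encoder(x) for ae, x in zip(aes, views)]
    sup_loss = nn.functional.cross_entropy(head(zs), labels)
    sup_loss.backward()
    opt_sup.step()
    return recon_loss.item(), sup_loss.item()
```

The design point the abstract emphasizes is the partial weight sharing: in this sketch it is realized by letting the supervised optimizer include the encoder parameters as well as the head's, so both stages keep improving the same latent space.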
Related papers
- Masked Two-channel Decoupling Framework for Incomplete Multi-view Weak Multi-label Learning [21.49630640829186]
In this paper, we focus on the complex yet highly realistic task of incomplete multi-view weak multi-label learning.
We propose a masked two-channel decoupling framework based on deep neural networks to solve this problem.
Our model is fully adaptable to arbitrary view and label absences while also performing well on the ideal full data.
arXiv Detail & Related papers (2024-04-26T11:39:50Z)
- MV2MAE: Multi-View Video Masked Autoencoders [33.61642891911761]
We present a method for self-supervised learning from synchronized multi-view videos.
We use a cross-view reconstruction task to inject geometry information in the model.
Our approach is based on the masked autoencoder (MAE) framework.
arXiv Detail & Related papers (2024-01-29T05:58:23Z)
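The entry names its ingredients (MAE-style masking plus a cross-view reconstruction target) without further detail; the toy sketch below illustrates that combination under assumed token shapes and module names, and is not the MV2MAE model (positional embeddings and other details are omitted).

```python
# Toy sketch of cross-view masked reconstruction in the spirit of MAE-style
# training; shapes and names are assumptions, not the MV2MAE code.
import torch
import torch.nn as nn

class TinyCrossViewMAE(nn.Module):
    def __init__(self, dim: int = 64, mask_ratio: float = 0.75):
        super().__init__()
        self.mask_ratio = mask_ratio
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True),
            num_layers=2)
        self.mask_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.decoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True),
            num_layers=1)
        self.head = nn.Linear(dim, dim)

    def forward(self, tokens_a, tokens_b):
        """tokens_a / tokens_b: (batch, num_tokens, dim) from two synced views.
        Encode the visible tokens of view A, then reconstruct the masked
        positions with view B as the target (the cross-view signal)."""
        b, n, d = tokens_a.shape
        keep = max(1, int(n * (1 - self.mask_ratio)))
        perm = torch.randperm(n, device=tokens_a.device)
        vis, masked = perm[:keep], perm[keep:]

        latent = self.encoder(tokens_a[:, vis, :])            # visible view-A tokens
        mask_tokens = self.mask_token.expand(b, masked.numel(), d)
        decoded = self.decoder(torch.cat([latent, mask_tokens], dim=1))
        pred = self.head(decoded[:, keep:, :])                # predictions at masked slots
        return nn.functional.mse_loss(pred, tokens_b[:, masked, :])
```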
- DVANet: Disentangling View and Action Features for Multi-View Action Recognition [56.283944756315066]
We present a novel approach to multi-view action recognition where we guide learned action representations to be separated from view-relevant information in a video.
Our model and method of training significantly outperform all other uni-modal models on four multi-view action recognition datasets.
arXiv Detail & Related papers (2023-12-10T01:19:48Z)
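The summary only says that action representations are guided to be separated from view-related information; a generic two-branch disentanglement sketch in that spirit is shown below. It is not the DVANet architecture, and it omits any explicit orthogonality or adversarial term a full method would likely add.

```python
# Generic two-branch disentanglement sketch (not the DVANet architecture):
# a shared backbone feeds separate action and view embeddings, each supervised
# by its own classifier so view-specific cues stay out of the action branch.
import torch
import torch.nn as nn

class ActionViewDisentangler(nn.Module):
    def __init__(self, feat_dim: int, n_actions: int, n_views: int, emb_dim: int = 128):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(feat_dim, 256), nn.ReLU())
        self.action_branch = nn.Linear(256, emb_dim)
        self.view_branch = nn.Linear(256, emb_dim)
        self.action_cls = nn.Linear(emb_dim, n_actions)
        self.view_cls = nn.Linear(emb_dim, n_views)

    def forward(self, clip_feat, action_label, view_label):
        h = self.backbone(clip_feat)
        a, v = self.action_branch(h), self.view_branch(h)
        loss = (nn.functional.cross_entropy(self.action_cls(a), action_label)
                + nn.functional.cross_entropy(self.view_cls(v), view_label))
        return a, loss   # action embedding and combined training loss
```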
- CoT-MAE v2: Contextual Masked Auto-Encoder with Multi-view Modeling for Passage Retrieval [34.08763911138496]
This study brings multi-view modeling to the contextual masked auto-encoder.
We refer to this multi-view pretraining method as CoT-MAE v2.
arXiv Detail & Related papers (2023-04-05T08:00:38Z)
- A Study of Autoregressive Decoders for Multi-Tasking in Computer Vision [93.90545426665999]
We take a close look at autoregressive decoders for multi-task learning in multimodal computer vision.
A key finding is that a small decoder learned on top of a frozen pretrained encoder works surprisingly well.
It can be seen as teaching a decoder to interact with a pretrained vision model via natural language.
arXiv Detail & Related papers (2023-03-30T13:42:58Z)
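The key finding quoted above, a small autoregressive decoder learned on top of a frozen pretrained encoder, can be sketched as follows. The module sizes and the assumption that the encoder emits features of the decoder's width are illustrative, not the paper's setup.

```python
# Sketch of the "small decoder on a frozen pretrained encoder" recipe; all
# names and sizes are placeholders rather than the paper's actual models.
import torch
import torch.nn as nn

class FrozenEncoderDecoder(nn.Module):
    def __init__(self, encoder: nn.Module, vocab_size: int, dim: int = 256):
        super().__init__()
        self.encoder = encoder
        for p in self.encoder.parameters():     # keep the pretrained encoder frozen
            p.requires_grad = False
        self.embed = nn.Embedding(vocab_size, dim)
        layer = nn.TransformerDecoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=2)   # the "small" decoder
        self.lm_head = nn.Linear(dim, vocab_size)

    def forward(self, image_feats, prev_tokens):
        """Autoregressively predict the next text token from frozen visual features.
        Assumes the encoder maps image_feats to (batch, n_patches, dim)."""
        with torch.no_grad():
            memory = self.encoder(image_feats)
        tgt = self.embed(prev_tokens)            # (batch, seq_len, dim)
        causal = nn.Transformer.generate_square_subsequent_mask(tgt.size(1)).to(tgt.device)
        out = self.decoder(tgt, memory, tgt_mask=causal)
        return self.lm_head(out)                 # next-token logits
```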
- i-Code: An Integrative and Composable Multimodal Learning Framework [99.56065789066027]
i-Code is a self-supervised pretraining framework where users may flexibly combine the modalities of vision, speech, and language into unified and general-purpose vector representations.
The entire system is pretrained end-to-end with new objectives including masked modality unit modeling and cross-modality contrastive learning.
Experimental results demonstrate how i-Code can outperform state-of-the-art techniques on five video understanding tasks and the GLUE NLP benchmark, improving by as much as 11%.
arXiv Detail & Related papers (2022-05-03T23:38:50Z)
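One of the objectives named above, cross-modality contrastive learning, is commonly implemented as a symmetric InfoNCE loss over paired modality embeddings; the sketch below shows that generic form, not i-Code's actual implementation.

```python
# Generic InfoNCE-style cross-modality contrastive loss between paired
# embeddings (e.g. vision vs. language); a sketch, not i-Code's code.
import torch
import torch.nn.functional as F

def cross_modal_contrastive_loss(emb_a: torch.Tensor, emb_b: torch.Tensor,
                                 temperature: float = 0.07) -> torch.Tensor:
    """emb_a, emb_b: (batch, dim) embeddings of the same samples in two modalities.
    Matching pairs on the diagonal are pulled together, all others pushed apart."""
    a = F.normalize(emb_a, dim=-1)
    b = F.normalize(emb_b, dim=-1)
    logits = a @ b.t() / temperature                 # (batch, batch) similarity matrix
    targets = torch.arange(a.size(0), device=a.device)
    # Symmetric loss: modality A -> B and B -> A.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))
```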
- Distilling Audio-Visual Knowledge by Compositional Contrastive Learning [51.20935362463473]
We learn a compositional embedding that closes the cross-modal semantic gap.
We establish a new, comprehensive multi-modal distillation benchmark on three video datasets.
arXiv Detail & Related papers (2021-04-22T09:31:20Z)
- Interleaving Learning, with Application to Neural Architecture Search [12.317568257671427]
We propose a novel machine learning framework referred to as interleaving learning (IL).
In our framework, a set of models collaboratively learn a data encoder in an interleaving fashion.
We apply interleaving learning to search neural architectures for image classification on CIFAR-10, CIFAR-100, and ImageNet.
arXiv Detail & Related papers (2021-03-12T00:54:22Z)
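As described, a set of task-specific models take turns refining one shared data encoder; a round-robin sketch of that loop is given below, with hypothetical helper names rather than the paper's code.

```python
# Round-robin sketch of interleaving learning: several tasks take turns
# updating a shared encoder through their own heads. Names are hypothetical.
import torch
import torch.nn as nn

def interleaved_training(encoder: nn.Module, heads: list, loaders: list,
                         rounds: int = 3, lr: float = 1e-3):
    """heads[k] and loaders[k] belong to task k; the encoder is shared by all.
    Each round visits the tasks in order, so later tasks start from the encoder
    weights left behind by earlier ones (the 'interleaving')."""
    for _ in range(rounds):
        for head, loader in zip(heads, loaders):
            opt = torch.optim.SGD(list(encoder.parameters()) +
                                  list(head.parameters()), lr=lr)
            for x, y in loader:
                opt.zero_grad()
                loss = nn.functional.cross_entropy(head(encoder(x)), y)
                loss.backward()
                opt.step()
    return encoder
```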
- Memory-augmented Dense Predictive Coding for Video Representation Learning [103.69904379356413]
We propose a new architecture and learning framework, Memory-augmented Dense Predictive Coding (MemDPC), for this task.
We investigate visual-only self-supervised video representation learning from RGB frames, or from unsupervised optical flow, or both.
In all cases, we demonstrate state-of-the-art or comparable performance over other approaches with orders of magnitude less training data.
arXiv Detail & Related papers (2020-08-03T17:57:01Z)
- Provable Meta-Learning of Linear Representations [114.656572506859]
We provide fast, sample-efficient algorithms to address the dual challenges of learning a common set of features from multiple, related tasks, and transferring this knowledge to new, unseen tasks.
We also provide information-theoretic lower bounds on the sample complexity of learning these linear features.
arXiv Detail & Related papers (2020-02-26T18:21:34Z)
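For reference, the shared linear-representation setting that such work typically studies can be written as below; this is the standard formulation and may differ slightly from the paper's exact notation.

```latex
% Shared low-dimensional linear representation across T related tasks.
% B is the common feature matrix; w_t are the task-specific weights.
\[
  y_{t,i} = \langle x_{t,i},\, B\, w_t \rangle + \varepsilon_{t,i},
  \qquad t = 1, \dots, T,
\]
\[
  B \in \mathbb{R}^{d \times k},\ k \ll d \ \text{(shared across tasks)},
  \qquad w_t \in \mathbb{R}^{k} \ \text{(task-specific)}.
\]
```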
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.