A Trainable Optimal Transport Embedding for Feature Aggregation and its
Relationship to Attention
- URL: http://arxiv.org/abs/2006.12065v4
- Date: Tue, 9 Feb 2021 21:24:42 GMT
- Title: A Trainable Optimal Transport Embedding for Feature Aggregation and its
Relationship to Attention
- Authors: Gr\'egoire Mialon, Dexiong Chen, Alexandre d'Aspremont, Julien Mairal
- Abstract summary: We introduce a parametrized representation of fixed size, which embeds and then aggregates elements from a given input set according to the optimal transport plan between the set and a trainable reference.
Our approach scales to large datasets and allows end-to-end training of the reference, while also providing a simple unsupervised learning mechanism with small computational cost.
- Score: 96.77554122595578
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We address the problem of learning on sets of features, motivated by the need
of performing pooling operations in long biological sequences of varying sizes,
with long-range dependencies, and possibly few labeled data. To address this
challenging task, we introduce a parametrized representation of fixed size,
which embeds and then aggregates elements from a given input set according to
the optimal transport plan between the set and a trainable reference. Our
approach scales to large datasets and allows end-to-end training of the
reference, while also providing a simple unsupervised learning mechanism with
small computational cost. Our aggregation technique admits two useful
interpretations: it may be seen as a mechanism related to attention layers in
neural networks, or it may be seen as a scalable surrogate of a classical
optimal transport-based kernel. We experimentally demonstrate the effectiveness
of our approach on biological sequences, achieving state-of-the-art results for
protein fold recognition and detection of chromatin profiles tasks, and, as a
proof of concept, we show promising results for processing natural language
sequences. We provide an open-source implementation of our embedding that can
be used alone or as a module in larger learning models at
https://github.com/claying/OTK.
Related papers
- Self-Supervised Learning with Generative Adversarial Networks for Electron Microscopy [0.0]
We show how self-supervised pretraining facilitates efficient fine-tuning for a spectrum of downstream tasks.
We demonstrate the versatility of self-supervised pretraining across various downstream tasks in the context of electron microscopy.
arXiv Detail & Related papers (2024-02-28T12:25:01Z) - Surprisal Driven $k$-NN for Robust and Interpretable Nonparametric
Learning [1.4293924404819704]
We shed new light on the traditional nearest neighbors algorithm from the perspective of information theory.
We propose a robust and interpretable framework for tasks such as classification, regression, density estimation, and anomaly detection using a single model.
Our work showcases the architecture's versatility by achieving state-of-the-art results in classification and anomaly detection.
arXiv Detail & Related papers (2023-11-17T00:35:38Z) - Self-Supervised Neuron Segmentation with Multi-Agent Reinforcement
Learning [53.00683059396803]
Mask image model (MIM) has been widely used due to its simplicity and effectiveness in recovering original information from masked images.
We propose a decision-based MIM that utilizes reinforcement learning (RL) to automatically search for optimal image masking ratio and masking strategy.
Our approach has a significant advantage over alternative self-supervised methods on the task of neuron segmentation.
arXiv Detail & Related papers (2023-10-06T10:40:46Z) - Inducing Gaussian Process Networks [80.40892394020797]
We propose inducing Gaussian process networks (IGN), a simple framework for simultaneously learning the feature space as well as the inducing points.
The inducing points, in particular, are learned directly in the feature space, enabling a seamless representation of complex structured domains.
We report on experimental results for real-world data sets showing that IGNs provide significant advances over state-of-the-art methods.
arXiv Detail & Related papers (2022-04-21T05:27:09Z) - Learning Operators with Coupled Attention [9.715465024071333]
We propose a novel operator learning method, LOCA, motivated from the recent success of the attention mechanism.
In our architecture the input functions are mapped to a finite set of features which are then averaged with attention weights that depend on the output query locations.
By coupling these attention weights together with an integral transform, LOCA is able to explicitly learn correlations in the target output functions.
arXiv Detail & Related papers (2022-01-04T08:22:03Z) - Learning Neural Causal Models with Active Interventions [83.44636110899742]
We introduce an active intervention-targeting mechanism which enables a quick identification of the underlying causal structure of the data-generating process.
Our method significantly reduces the required number of interactions compared with random intervention targeting.
We demonstrate superior performance on multiple benchmarks from simulated to real-world data.
arXiv Detail & Related papers (2021-09-06T13:10:37Z) - Clustering augmented Self-Supervised Learning: Anapplication to Land
Cover Mapping [10.720852987343896]
We introduce a new method for land cover mapping by using a clustering based pretext task for self-supervised learning.
We demonstrate the effectiveness of the method on two societally relevant applications.
arXiv Detail & Related papers (2021-08-16T19:35:43Z) - Task-Feature Collaborative Learning with Application to Personalized
Attribute Prediction [166.87111665908333]
We propose a novel multi-task learning method called Task-Feature Collaborative Learning (TFCL)
Specifically, we first propose a base model with a heterogeneous block-diagonal structure regularizer to leverage the collaborative grouping of features and tasks.
As a practical extension, we extend the base model by allowing overlapping features and differentiating the hard tasks.
arXiv Detail & Related papers (2020-04-29T02:32:04Z) - Learning What Makes a Difference from Counterfactual Examples and
Gradient Supervision [57.14468881854616]
We propose an auxiliary training objective that improves the generalization capabilities of neural networks.
We use pairs of minimally-different examples with different labels, a.k.a counterfactual or contrasting examples, which provide a signal indicative of the underlying causal structure of the task.
Models trained with this technique demonstrate improved performance on out-of-distribution test sets.
arXiv Detail & Related papers (2020-04-20T02:47:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.