Three approaches to supervised learning for compositional data with
pairwise logratios
- URL: http://arxiv.org/abs/2111.08953v1
- Date: Wed, 17 Nov 2021 07:51:20 GMT
- Title: Three approaches to supervised learning for compositional data with
pairwise logratios
- Authors: Germà Coenders and Michael Greenacre
- Abstract summary: The common approach to compositional data analysis is to transform the data by means of logratios.
Logratios between pairs of compositional parts (pairwise logratios) are the easiest to interpret in many research problems.
We present three alternative stepwise supervised learning methods to select the pairwise logratios that best explain a dependent variable in a generalized linear model.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The common approach to compositional data analysis is to transform the data
by means of logratios. Logratios between pairs of compositional parts (pairwise
logratios) are the easiest to interpret in many research problems. When the
number of parts is large, some form of logratio selection is a must, for
instance by means of an unsupervised learning method based on a stepwise
selection of the pairwise logratios that explain the largest percentage of the
logratio variance in the compositional dataset. In this article we present
three alternative stepwise supervised learning methods to select the pairwise
logratios that best explain a dependent variable in a generalized linear model,
each geared for a specific problem. The first method features unrestricted
search, where any pairwise logratio can be selected. This method has a complex
interpretation if some pairs of parts in the logratios overlap, but it leads to
the most accurate predictions. The second method restricts parts to occur only
once, which makes the corresponding logratios intuitively interpretable. The
third method uses additive logratios, so that $K-1$ selected logratios involve
exactly $K$ parts. This method in fact searches for the subcomposition with the
highest explanatory power. Once the subcomposition is identified, the
researcher's favourite logratio representation may be used in subsequent
analyses, not only pairwise logratios. Our methodology allows logratios or
non-compositional covariates to be forced into the models based on theoretical
knowledge, and various stopping criteria are available based on information
measures or statistical significance with the Bonferroni correction. We present
an illustration of the three approaches on a dataset from a study predicting
Crohn's disease. The first method excels in terms of predictive power, and the
other two in interpretability.
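
The three search strategies described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes an ordinary least-squares fit (the simplest Gaussian case of a generalized linear model), uses the BIC as the information-based stopping rule, and reads the additive-logratio restriction as "all selected pairs share one common part". The names `ols_rss`, `admissible`, `forward_select` and the simulated data are hypothetical and exist only for this illustration.

```python
import itertools
import math
import random


def ols_rss(columns, y):
    """Residual sum of squares of y regressed on an intercept plus `columns`."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    # Augmented normal equations [X'X | X'y], solved by Gauss-Jordan elimination.
    A = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(p)]
         + [sum(X[i][r] * y[i] for i in range(n))] for r in range(p)]
    for k in range(p):
        pivot = max(range(k, p), key=lambda r: abs(A[r][k]))
        A[k], A[pivot] = A[pivot], A[k]
        if abs(A[k][k]) < 1e-9:
            return float("inf")  # collinear predictors: treat the model as inadmissible
        for r in range(p):
            if r != k:
                f = A[r][k] / A[k][k]
                A[r] = [a - f * b for a, b in zip(A[r], A[k])]
    beta = [A[r][p] / A[r][r] for r in range(p)]
    return sum((y[i] - sum(b * x for b, x in zip(beta, X[i]))) ** 2
               for i in range(n))


def admissible(pair, chosen, mode):
    """Decide whether `pair` may enter the model under the given search mode."""
    if pair in chosen:
        return False
    if mode == "unrestricted":      # method 1: any pairwise logratio
        return True
    if mode == "non_overlapping":   # method 2: each part occurs at most once
        used = {part for pr in chosen for part in pr}
        return not (set(pair) & used)
    if mode == "alr":               # method 3: all pairs share one reference part
        return bool(set(pair).intersection(*map(set, chosen)))
    raise ValueError(f"unknown mode: {mode}")


def forward_select(comp, y, mode="unrestricted", max_terms=3):
    """Greedy forward selection of pairwise logratios, stopping on the BIC."""
    n, d = len(y), len(comp[0])
    # Candidate pool: log(x_i / x_j) for every unordered pair of parts.
    cols = {(i, j): [math.log(row[i] / row[j]) for row in comp]
            for i, j in itertools.combinations(range(d), 2)}

    def bic(pairs):
        rss = ols_rss([cols[p] for p in pairs], y)
        return n * math.log(max(rss, 1e-300) / n) + (len(pairs) + 1) * math.log(n)

    chosen, best = [], bic([])
    while len(chosen) < max_terms:
        scored = [(bic(chosen + [p]), p) for p in cols
                  if admissible(p, chosen, mode)]
        if not scored or min(scored)[0] >= best:
            break  # no admissible logratio lowers the BIC
        best, pick = min(scored)
        chosen.append(pick)
    return chosen


# Simulated 5-part compositions; the response depends on log(x0 / x1) only.
rng = random.Random(0)
comp = []
for _ in range(40):
    raw = [rng.uniform(0.1, 1.0) for _ in range(5)]
    total = sum(raw)
    comp.append([v / total for v in raw])
y = [2.0 * math.log(row[0] / row[1]) + rng.gauss(0.0, 0.05) for row in comp]

for mode in ("unrestricted", "non_overlapping", "alr"):
    print(mode, forward_select(comp, y, mode))
```

In this sketch the three modes differ only in the `admissible` filter, which mirrors how the abstract frames the methods as the same stepwise search under increasingly strict restrictions on which parts may appear. The Bonferroni-corrected significance criterion and the forcing of covariates into the model, both mentioned in the abstract, are not covered here.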
Related papers
- D2 Pruning: Message Passing for Balancing Diversity and Difficulty in
Data Pruning [70.98091101459421]
Coreset selection seeks to select a subset of the training data, referred to as a coreset, so as to maximize the performance of models trained on that subset.
We propose a novel pruning algorithm, D2 Pruning, that uses forward and reverse message passing over this dataset graph for coreset selection.
Results show that D2 Pruning improves coreset selection over previous state-of-the-art methods for up to 70% pruning rates.
arXiv Detail & Related papers (2023-10-11T23:01:29Z)
- Ensemble Transfer Learning for Multilingual Coreference Resolution [60.409789753164944]
A problem that frequently occurs when working with a non-English language is the scarcity of annotated training data.
We design a simple but effective ensemble-based framework that combines various transfer learning techniques.
We also propose a low-cost TL method that bootstraps coreference resolution models by utilizing Wikipedia anchor texts.
arXiv Detail & Related papers (2023-01-22T18:22:55Z)
- Context-Aware Ensemble Learning for Time Series [11.716677452529114]
We introduce a new approach in which a meta-learner combines the base model predictions using a superset of features, namely the union of the base models' feature vectors, rather than the predictions themselves.
Our model does not use the predictions of the base models as inputs to a machine learning algorithm, but chooses the best possible combination at each time step based on the state of the problem.
arXiv Detail & Related papers (2022-11-30T10:36:13Z)
- Meta-Learning Approaches for a One-Shot Collective-Decision Aggregation:
Correctly Choosing how to Choose Correctly [0.7874708385247353]
We present two one-shot machine-learning-based aggregation approaches.
The first predicts, given multiple features about the collective's choices, which aggregation method will be best for a given case.
The second directly predicts which decision is optimal, given, among other things, the selection made by each method.
arXiv Detail & Related papers (2022-04-03T15:06:59Z)
- OneRel:Joint Entity and Relation Extraction with One Module in One Step [42.576188878294886]
Joint entity and relation extraction is an essential task in natural language processing and knowledge graph construction.
We propose a novel joint entity and relation extraction model, named OneRel, which casts joint extraction as a fine-grained triple classification problem.
arXiv Detail & Related papers (2022-03-10T15:09:59Z)
- On Modality Bias Recognition and Reduction [70.69194431713825]
We study the modality bias problem in the context of multi-modal classification.
We propose a plug-and-play loss function method, whereby the feature space for each label is adaptively learned.
Our method yields remarkable performance improvements compared with the baselines.
arXiv Detail & Related papers (2022-02-25T13:47:09Z)
- Residual Overfit Method of Exploration [78.07532520582313]
We propose an approximate exploration methodology based on fitting only two point estimates, one tuned and one overfit.
The approach drives exploration towards actions where the overfit model exhibits the most overfitting compared to the tuned model.
We compare ROME against a set of established contextual bandit methods on three datasets and find it to be one of the best performing.
arXiv Detail & Related papers (2021-10-06T17:05:33Z)
- sJIVE: Supervised Joint and Individual Variation Explained [0.0]
Analyzing multi-source data, which are multiple views of data on the same subjects, has become increasingly common in biomedical research.
We propose a method called supervised joint and individual variation explained (sJIVE) that can simultaneously identify shared (joint) and source-specific (individual) underlying structure.
arXiv Detail & Related papers (2021-02-26T02:54:45Z)
- Diverse Knowledge Distillation for End-to-End Person Search [81.4926655119318]
Person search aims to localize and identify a specific person from a gallery of images.
Recent methods can be categorized into two groups, i.e., two-step and end-to-end approaches.
We propose a simple yet strong end-to-end network with diverse knowledge distillation to break the bottleneck.
arXiv Detail & Related papers (2020-12-21T09:04:27Z)
- Learning while Respecting Privacy and Robustness to Distributional
Uncertainties and Adversarial Data [66.78671826743884]
The distributionally robust optimization framework is considered for training a parametric model.
The objective is to endow the trained model with robustness against adversarially manipulated input data.
Proposed algorithms offer robustness with little overhead.
arXiv Detail & Related papers (2020-07-07T18:25:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.