Robust Similarity and Distance Learning via Decision Forests
- URL: http://arxiv.org/abs/2007.13843v2
- Date: Fri, 21 Aug 2020 15:41:38 GMT
- Title: Robust Similarity and Distance Learning via Decision Forests
- Authors: Tyler M. Tomita and Joshua T. Vogelstein
- Abstract summary: We propose a novel decision forest algorithm for the task of distance learning, which we call Similarity and Metric Random Forests (SMERF).
Its ability to approximate arbitrary distances and identify important features is empirically demonstrated on simulated data sets.
- Score: 8.587164648430251
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Canonical distances such as Euclidean distance often fail to capture the
appropriate relationships between items, subsequently leading to subpar
inference and prediction. Many algorithms have been proposed for automated
learning of suitable distances, most of which employ linear methods to learn a
global metric over the feature space. While such methods offer nice theoretical
properties, interpretability, and computationally efficient means for
implementing them, they are limited in expressive capacity. Methods which have
been designed to improve expressiveness sacrifice one or more of the nice
properties of the linear methods. To bridge this gap, we propose a highly
expressive novel decision forest algorithm for the task of distance learning,
which we call Similarity and Metric Random Forests (SMERF). We show that the
tree construction procedure in SMERF is a proper generalization of standard
classification and regression trees. Thus, the mathematical driving forces of
SMERF are examined via its direct connection to regression forests, for which
theory has been developed. Its ability to approximate arbitrary distances and
identify important features is empirically demonstrated on simulated data sets.
Last, we demonstrate that it accurately predicts links in networks.
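The paper's tree construction is a generalization of CART and is not reproduced here, but the core idea of deriving a similarity from a tree ensemble can be illustrated with a deliberately minimal sketch: grow random single-split "trees" (stumps) and score two points by the fraction of trees in which they land in the same leaf. This is the classic forest-proximity notion, not SMERF's actual split criterion; all names below are illustrative.

```python
import random

def build_stump(points, dim, rng):
    # A one-split "tree": pick a random feature and a random threshold
    # drawn between the smallest and largest observed values of that feature.
    f = rng.randrange(dim)
    values = sorted(p[f] for p in points)
    theta = rng.uniform(values[0], values[-1])
    return (f, theta)

def leaf_id(stump, x):
    f, theta = stump
    return 0 if x[f] <= theta else 1

def forest_similarity(x, y, stumps):
    # Proximity: fraction of trees in which x and y fall in the same leaf.
    same = sum(leaf_id(s, x) == leaf_id(s, y) for s in stumps)
    return same / len(stumps)

rng = random.Random(0)
points = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0)]
stumps = [build_stump(points, 2, rng) for _ in range(200)]
print(forest_similarity(points[0], points[1], stumps))  # close pair: near 1
print(forest_similarity(points[0], points[2], stumps))  # distant pair: near 0
```

A distance can then be obtained as one minus this similarity; SMERF instead learns the splits to match a supervised target similarity rather than choosing them at random.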
Related papers
- GSSF: Generalized Structural Sparse Function for Deep Cross-modal Metric Learning [51.677086019209554]
We propose a Generalized Structural Sparse Function to capture powerful relationships across modalities for pair-wise similarity learning.
The distance metric encapsulates two formats, diagonal and block-diagonal terms.
Experiments on cross-modal and two extra uni-modal retrieval tasks have validated its superiority and flexibility.
arXiv Detail & Related papers (2024-10-20T03:45:50Z) - Case-based Explainability for Random Forest: Prototypes, Critics, Counter-factuals and Semi-factuals [1.0485739694839669]
Explainable Case-Based Reasoning (XCBR) stands out as a pragmatic approach that elucidates the output of a model by referencing actual examples.
XCBR has been relatively underexplored for many algorithms such as tree-based models until recently.
arXiv Detail & Related papers (2024-08-13T07:08:54Z) - Optimized Feature Generation for Tabular Data via LLMs with Decision Tree Reasoning [53.241569810013836]
We propose a new framework based on large language models (LLMs) and decision tree reasoning (OCTree).
Our key idea is to leverage LLMs' reasoning capabilities to find good feature generation rules without manually specifying the search space.
Our empirical results demonstrate that this simple framework consistently enhances the performance of various prediction models.
arXiv Detail & Related papers (2024-06-12T08:31:34Z) - Embedding Trajectory for Out-of-Distribution Detection in Mathematical Reasoning [50.84938730450622]
We propose TV score, a trajectory-based method that uses trajectory volatility for OOD detection in mathematical reasoning.
Our method outperforms all traditional algorithms on GLMs under mathematical reasoning scenarios.
Our method can be extended to more applications with high-density features in output spaces, such as multiple-choice questions.
arXiv Detail & Related papers (2024-05-22T22:22:25Z) - Minimal Learning Machine for Multi-Label Learning [0.0]
The minimal learning machine is a distance-based supervised method that constructs a predictive model from data.
In this paper, we evaluate how this technique and its core component, the distance mapping, can be adapted to multi-label learning.
The proposed approach is based on combining the distance mapping with an inverse distance weighting.
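The combination described above, a distance mapping to reference points followed by inverse distance weighting over their label vectors, can be sketched as follows. This is a generic illustration of the idea, not the paper's exact estimator; the function name and the `eps` smoothing constant are assumptions.

```python
import math

def idw_multilabel(query, refs, labels, eps=1e-9):
    """Score each label for `query` by inverse-distance-weighted averaging
    of the binary label vectors attached to the reference points."""
    # Distance mapping: distances from the query to every reference point.
    weights = [1.0 / (math.dist(query, r) + eps) for r in refs]  # closer = heavier
    total = sum(weights)
    n_labels = len(labels[0])
    # Per-label weighted average of the reference label vectors.
    return [sum(w * y[j] for w, y in zip(weights, labels)) / total
            for j in range(n_labels)]

refs = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0)]
labels = [(1, 0), (1, 0), (0, 1)]
print(idw_multilabel((0.2, 0.1), refs, labels))  # label 0 dominates near origin
```

Thresholding the scores (e.g. at 0.5) yields the predicted label set for the query.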
arXiv Detail & Related papers (2023-05-09T15:16:50Z) - Data-driven abstractions via adaptive refinements and a Kantorovich metric [extended version] [56.94699829208978]
We introduce an adaptive refinement procedure for smart and scalable abstraction of dynamical systems.
In order to learn the optimal structure, we define a Kantorovich-inspired metric between Markov chains.
We show that our method yields a much better computational complexity than using classical linear programming techniques.
arXiv Detail & Related papers (2023-03-30T11:26:40Z) - Efficient Parametric Approximations of Neural Network Function Space Distance [6.117371161379209]
It is often useful to compactly summarize important properties of model parameters and training data so that they can be used later without storing and/or iterating over the entire dataset.
We consider estimating the Function Space Distance (FSD) over a training set, i.e. the average discrepancy between the outputs of two neural networks.
We propose a Linearized Activation TRick (LAFTR) and derive an efficient approximation to FSD for ReLU neural networks.
arXiv Detail & Related papers (2023-02-07T15:09:23Z) - Stabilizing Q-learning with Linear Architectures for Provably Efficient Learning [53.17258888552998]
This work proposes an exploration variant of the basic $Q$-learning protocol with linear function approximation.
We show that the performance of the algorithm degrades very gracefully under a novel and more permissive notion of approximation error.
arXiv Detail & Related papers (2022-06-01T23:26:51Z) - MACE: An Efficient Model-Agnostic Framework for Counterfactual Explanation [132.77005365032468]
We propose a novel framework of Model-Agnostic Counterfactual Explanation (MACE).
In our MACE approach, we propose a novel RL-based method for finding good counterfactual examples and a gradient-less descent method for improving proximity.
Experiments on public datasets validate the effectiveness with better validity, sparsity and proximity.
arXiv Detail & Related papers (2022-05-31T04:57:06Z) - Comparative Study Between Distance Measures On Supervised Optimum-Path Forest Classification [0.0]
Optimum-Path Forest (OPF) uses a graph-based methodology and a distance measure to create arcs between nodes and hence sets of trees.
This work proposes a comparative study over a wide range of distance measures applied to the supervised Optimum-Path Forest classification.
arXiv Detail & Related papers (2022-02-08T13:34:09Z) - A Practical Index Structure Supporting Fréchet Proximity Queries Among Trajectories [1.9335262420787858]
We present a scalable approach for range and $k$ nearest neighbor queries under computationally expensive metrics.
Based on clustering for metric indexes, we obtain a dynamic tree structure whose size is linear in the number of trajectories.
We analyze the efficiency and effectiveness of our methods with extensive experiments on diverse synthetic and real-world data sets.
arXiv Detail & Related papers (2020-05-28T04:12:43Z)
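The index structure itself is not reproduced here, but the computationally expensive metric it accelerates, the discrete Fréchet distance between trajectories, can be sketched with the classic Eiter-Mannila dynamic program. This is a textbook baseline offered for context, not the paper's clustering-based index.

```python
import math

def discrete_frechet(P, Q):
    # Dynamic program over the coupling table ca[i][j]: the smallest "leash
    # length" needed to walk curve P up to point i and curve Q up to point j,
    # moving forward only. Runs in O(len(P) * len(Q)) time.
    n, m = len(P), len(Q)
    ca = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = math.dist(P[i], Q[j])
            if i == 0 and j == 0:
                ca[i][j] = d
            elif i == 0:
                ca[i][j] = max(ca[i][j - 1], d)
            elif j == 0:
                ca[i][j] = max(ca[i - 1][j], d)
            else:
                ca[i][j] = max(min(ca[i - 1][j],
                                   ca[i - 1][j - 1],
                                   ca[i][j - 1]), d)
    return ca[-1][-1]

P = [(0, 0), (1, 0), (2, 0)]
Q = [(0, 1), (1, 1), (2, 1)]
print(discrete_frechet(P, Q))  # 1.0: the curves stay one unit apart
```

Because every pairwise query costs quadratic time in trajectory length, a metric index that prunes most candidates, as in the paper above, is what makes range and $k$ nearest neighbor queries practical at scale.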
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.