Robust Similarity and Distance Learning via Decision Forests
- URL: http://arxiv.org/abs/2007.13843v2
- Date: Fri, 21 Aug 2020 15:41:38 GMT
- Title: Robust Similarity and Distance Learning via Decision Forests
- Authors: Tyler M. Tomita and Joshua T. Vogelstein
- Abstract summary: We propose a novel decision forest algorithm for the task of distance learning, which we call Similarity and Metric Random Forests (SMERF).
Its ability to approximate arbitrary distances and identify important features is empirically demonstrated on simulated data sets.
- Score: 8.587164648430251
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Canonical distances such as Euclidean distance often fail to capture the
appropriate relationships between items, subsequently leading to subpar
inference and prediction. Many algorithms have been proposed for automated
learning of suitable distances, most of which employ linear methods to learn a
global metric over the feature space. While such methods offer nice theoretical
properties, interpretability, and computationally efficient means for
implementing them, they are limited in expressive capacity. Methods which have
been designed to improve expressiveness sacrifice one or more of the nice
properties of the linear methods. To bridge this gap, we propose a highly
expressive novel decision forest algorithm for the task of distance learning,
which we call Similarity and Metric Random Forests (SMERF). We show that the
tree construction procedure in SMERF is a proper generalization of standard
classification and regression trees. Thus, the mathematical driving forces of
SMERF are examined via its direct connection to regression forests, for which
theory has been developed. Its ability to approximate arbitrary distances and
identify important features is empirically demonstrated on simulated data sets.
Last, we demonstrate that it accurately predicts links in networks.
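The paper's tree construction is a generalization of CART and is not reproduced here, but the core idea of deriving a similarity from a tree ensemble can be illustrated with a deliberately minimal sketch: grow random single-split "trees" (stumps) and score two points by the fraction of trees in which they land in the same leaf. This is the classic forest-proximity notion, not SMERF's actual split criterion; all names below are illustrative.

```python
import random

def build_stump(points, dim, rng):
    # A one-split "tree": pick a random feature and a random threshold
    # drawn between the smallest and largest observed values of that feature.
    f = rng.randrange(dim)
    values = sorted(p[f] for p in points)
    theta = rng.uniform(values[0], values[-1])
    return (f, theta)

def leaf_id(stump, x):
    f, theta = stump
    return 0 if x[f] <= theta else 1

def forest_similarity(x, y, stumps):
    # Proximity: fraction of trees in which x and y fall in the same leaf.
    same = sum(leaf_id(s, x) == leaf_id(s, y) for s in stumps)
    return same / len(stumps)

rng = random.Random(0)
points = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0)]
stumps = [build_stump(points, 2, rng) for _ in range(200)]
print(forest_similarity(points[0], points[1], stumps))  # close pair: near 1
print(forest_similarity(points[0], points[2], stumps))  # distant pair: near 0
```

A distance can then be obtained as one minus this similarity; SMERF instead learns the splits to match a supervised target similarity rather than choosing them at random.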
Related papers
- GSSF: Generalized Structural Sparse Function for Deep Cross-modal Metric Learning [51.677086019209554]
We propose a Generalized Structural Sparse Function to capture powerful relationships across modalities for pair-wise similarity learning.
The distance metric encapsulates two formats, diagonal and block-diagonal terms.
Experiments on cross-modal and two extra uni-modal retrieval tasks have validated its superiority and flexibility.
arXiv Detail & Related papers (2024-10-20T03:45:50Z) - Case-based Explainability for Random Forest: Prototypes, Critics, Counter-factuals and Semi-factuals [1.0485739694839669]
Explainable Case-Based Reasoning (XCBR) stands out as a pragmatic approach that elucidates the output of a model by referencing actual examples.
XCBR has been relatively underexplored for many algorithms such as tree-based models until recently.
arXiv Detail & Related papers (2024-08-13T07:08:54Z) - Optimized Feature Generation for Tabular Data via LLMs with Decision Tree Reasoning [53.241569810013836]
We propose a new framework based on large language models (LLMs) and decision tree reasoning (OCTree).
Our key idea is to leverage LLMs' reasoning capabilities to find good feature generation rules without manually specifying the search space.
Our empirical results demonstrate that this simple framework consistently enhances the performance of various prediction models.
arXiv Detail & Related papers (2024-06-12T08:31:34Z) - Embedding Trajectory for Out-of-Distribution Detection in Mathematical Reasoning [50.84938730450622]
We propose TV score, a trajectory-based method that uses trajectory volatility for OOD detection in mathematical reasoning.
Our method outperforms all traditional algorithms on GLMs under mathematical reasoning scenarios.
Our method can be extended to more applications with high-density features in output spaces, such as multiple-choice questions.
arXiv Detail & Related papers (2024-05-22T22:22:25Z) - Minimal Learning Machine for Multi-Label Learning [0.0]
The minimal learning machine is a distance-based supervised method that constructs a predictive model from data.
In this paper, we evaluate how this technique and its core component, the distance mapping, can be adapted to multi-label learning.
The proposed approach is based on combining the distance mapping with an inverse distance weighting.
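The combination described above, a distance mapping to reference points followed by inverse distance weighting over their label vectors, can be sketched as follows. This is a generic illustration of the idea, not the paper's exact estimator; the function name and the `eps` smoothing constant are assumptions.

```python
import math

def idw_multilabel(query, refs, labels, eps=1e-9):
    """Score each label for `query` by inverse-distance-weighted averaging
    of the binary label vectors attached to the reference points."""
    # Distance mapping: distances from the query to every reference point.
    weights = [1.0 / (math.dist(query, r) + eps) for r in refs]  # closer = heavier
    total = sum(weights)
    n_labels = len(labels[0])
    # Per-label weighted average of the reference label vectors.
    return [sum(w * y[j] for w, y in zip(weights, labels)) / total
            for j in range(n_labels)]

refs = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0)]
labels = [(1, 0), (1, 0), (0, 1)]
print(idw_multilabel((0.2, 0.1), refs, labels))  # label 0 dominates near origin
```

Thresholding the scores (e.g. at 0.5) yields the predicted label set for the query.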
arXiv Detail & Related papers (2023-05-09T15:16:50Z) - Data-driven abstractions via adaptive refinements and a Kantorovich metric [extended version] [56.94699829208978]
We introduce an adaptive refinement procedure for smart and scalable abstraction of dynamical systems.
In order to learn the optimal structure, we define a Kantorovich-inspired metric between Markov chains.
We show that our method yields a much better computational complexity than using classical linear programming techniques.
arXiv Detail & Related papers (2023-03-30T11:26:40Z) - Efficient Parametric Approximations of Neural Network Function Space Distance [6.117371161379209]
It is often useful to compactly summarize important properties of model parameters and training data so that they can be used later without storing and/or iterating over the entire dataset.
We consider estimating the Function Space Distance (FSD) over a training set, i.e. the average discrepancy between the outputs of two neural networks.
We propose a Linearized Activation TRick (LAFTR) and derive an efficient approximation to FSD for ReLU neural networks.
arXiv Detail & Related papers (2023-02-07T15:09:23Z) - Stabilizing Q-learning with Linear Architectures for Provably Efficient Learning [53.17258888552998]
This work proposes an exploration variant of the basic $Q$-learning protocol with linear function approximation.
We show that the performance of the algorithm degrades very gracefully under a novel and more permissive notion of approximation error.
arXiv Detail & Related papers (2022-06-01T23:26:51Z) - MACE: An Efficient Model-Agnostic Framework for Counterfactual Explanation [132.77005365032468]
We propose a novel framework of Model-Agnostic Counterfactual Explanation (MACE).
In our MACE approach, we propose a novel RL-based method for finding good counterfactual examples and a gradient-less descent method for improving proximity.
Experiments on public datasets validate the effectiveness with better validity, sparsity and proximity.
arXiv Detail & Related papers (2022-05-31T04:57:06Z) - Comparative Study Between Distance Measures On Supervised Optimum-Path Forest Classification [0.0]
Optimum-Path Forest (OPF) uses a graph-based methodology and a distance measure to create arcs between nodes and hence sets of trees.
This work proposes a comparative study over a wide range of distance measures applied to the supervised Optimum-Path Forest classification.
arXiv Detail & Related papers (2022-02-08T13:34:09Z) - A Practical Index Structure Supporting Fréchet Proximity Queries Among Trajectories [1.9335262420787858]
We present a scalable approach for range and $k$ nearest neighbor queries under computationally expensive metrics.
Based on clustering for metric indexes, we obtain a dynamic tree structure whose size is linear in the number of trajectories.
We analyze the efficiency and effectiveness of our methods with extensive experiments on diverse synthetic and real-world data sets.
arXiv Detail & Related papers (2020-05-28T04:12:43Z)
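The index structure itself is not reproduced here, but the computationally expensive metric it accelerates, the discrete Fréchet distance between trajectories, can be sketched with the classic Eiter-Mannila dynamic program. This is a textbook baseline offered for context, not the paper's clustering-based index.

```python
import math

def discrete_frechet(P, Q):
    # Dynamic program over the coupling table ca[i][j]: the smallest "leash
    # length" needed to walk curve P up to point i and curve Q up to point j,
    # moving forward only. Runs in O(len(P) * len(Q)) time.
    n, m = len(P), len(Q)
    ca = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = math.dist(P[i], Q[j])
            if i == 0 and j == 0:
                ca[i][j] = d
            elif i == 0:
                ca[i][j] = max(ca[i][j - 1], d)
            elif j == 0:
                ca[i][j] = max(ca[i - 1][j], d)
            else:
                ca[i][j] = max(min(ca[i - 1][j],
                                   ca[i - 1][j - 1],
                                   ca[i][j - 1]), d)
    return ca[-1][-1]

P = [(0, 0), (1, 0), (2, 0)]
Q = [(0, 1), (1, 1), (2, 1)]
print(discrete_frechet(P, Q))  # 1.0: the curves stay one unit apart
```

Because every pairwise query costs quadratic time in trajectory length, a metric index that prunes most candidates, as in the paper above, is what makes range and $k$ nearest neighbor queries practical at scale.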
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.