Exploring Scale-Measures of Data Sets
- URL: http://arxiv.org/abs/2102.02576v1
- Date: Thu, 4 Feb 2021 12:29:15 GMT
- Title: Exploring Scale-Measures of Data Sets
- Authors: Tom Hanika and Johannes Hirth
- Abstract summary: A profound theory for scaling data is scale-measures, as developed in the field of formal concept analysis.
Recent developments indicate that the set of all scale-measures for a given data set constitutes a lattice.
We propose a novel scale-measure exploration algorithm that is based on the well-known and proven attribute exploration approach.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Measurement is a fundamental building block of numerous scientific models and
their creation. This is in particular true for data driven science. Due to the
high complexity and size of modern data sets, the necessity for the development
of understandable and efficient scaling methods is at hand. A profound theory
for scaling data is scale-measures, as developed in the field of formal concept
analysis. Recent developments indicate that the set of all scale-measures for a
given data set constitutes a lattice and does hence allow efficient exploring
algorithms. In this work we study the properties of said lattice and propose a
novel scale-measure exploration algorithm that is based on the well-known and
proven attribute exploration approach. Our results motivate multiple
applications in scale recommendation, most prominently (semi-)automatic
scaling.
Related papers
- Scaling Up Diffusion and Flow-based XGBoost Models [5.944645679491607]
We investigate a recent proposal to use XGBoost as the function approximator in diffusion and flow-matching models.
With better implementation it can be scaled to datasets 370x larger than previously used.
We present results on large-scale scientific datasets as part of the Fast Calorimeter Simulation Challenge.
arXiv Detail & Related papers (2024-08-28T18:00:00Z) - (Deep) Generative Geodesics [57.635187092922976]
We introduce a newian metric to assess the similarity between any two data points.
Our metric leads to the conceptual definition of generative distances and generative geodesics.
Their approximations are proven to converge to their true values under mild conditions.
arXiv Detail & Related papers (2024-07-15T21:14:02Z) - Scaling Laws for the Value of Individual Data Points in Machine Learning [55.596413470429475]
We introduce a new perspective by investigating scaling behavior for the value of individual data points.
We provide learning theory to support our scaling law, and we observe empirically that it holds across diverse model classes.
Our work represents a first step towards understanding and utilizing scaling properties for the value of individual data points.
arXiv Detail & Related papers (2024-05-30T20:10:24Z) - Scaling Laws For Dense Retrieval [22.76001461620846]
We investigate whether the performance of dense retrieval models follows the scaling law as other neural models.
Results indicate that, under our settings, the performance of dense retrieval models follows a precise power-law scaling related to the model size and the number of annotations.
arXiv Detail & Related papers (2024-03-27T15:27:36Z) - Data Augmentations in Deep Weight Spaces [89.45272760013928]
We introduce a novel augmentation scheme based on the Mixup method.
We evaluate the performance of these techniques on existing benchmarks as well as new benchmarks we generate.
arXiv Detail & Related papers (2023-11-15T10:43:13Z) - Deep networks for system identification: a Survey [56.34005280792013]
System identification learns mathematical descriptions of dynamic systems from input-output data.
Main aim of the identified model is to predict new data from previous observations.
We discuss architectures commonly adopted in the literature, like feedforward, convolutional, and recurrent networks.
arXiv Detail & Related papers (2023-01-30T12:38:31Z) - Intrinsic Dimension for Large-Scale Geometric Learning [0.0]
A naive approach to determine the dimension of a dataset is based on the number of attributes.
More sophisticated methods derive a notion of intrinsic dimension (ID) that employs more complex feature functions.
arXiv Detail & Related papers (2022-10-11T09:50:50Z) - Revisiting Neural Scaling Laws in Language and Vision [43.57394336742374]
We argue for a more rigorous methodology based on the extrapolation loss, instead of reporting the best-fitting parameters.
We present a recipe for estimating scaling law parameters reliably from learning curves.
We demonstrate that it extrapolates more accurately than previous methods in a wide range of architecture families across several domains.
arXiv Detail & Related papers (2022-09-13T09:41:51Z) - Research Trends and Applications of Data Augmentation Algorithms [77.34726150561087]
We identify the main areas of application of data augmentation algorithms, the types of algorithms used, significant research trends, their progression over time and research gaps in data augmentation literature.
We expect readers to understand the potential of data augmentation, as well as identify future research directions and open questions within data augmentation research.
arXiv Detail & Related papers (2022-07-18T11:38:32Z) - Smooth densities and generative modeling with unsupervised random
forests [1.433758865948252]
An important application for density estimators is synthetic data generation.
We propose a new method based on unsupervised random forests for estimating smooth densities in arbitrary dimensions without parametric constraints.
We prove the consistency of our approach and demonstrate its advantages over existing tree-based density estimators.
arXiv Detail & Related papers (2022-05-19T09:50:25Z) - ALT-MAS: A Data-Efficient Framework for Active Testing of Machine
Learning Algorithms [58.684954492439424]
We propose a novel framework to efficiently test a machine learning model using only a small amount of labeled test data.
The idea is to estimate the metrics of interest for a model-under-test using Bayesian neural network (BNN)
arXiv Detail & Related papers (2021-04-11T12:14:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.