Data Understanding Survey: Pursuing Improved Dataset Characterization Via Tensor-based Methods
- URL: http://arxiv.org/abs/2510.14161v1
- Date: Wed, 15 Oct 2025 23:17:54 GMT
- Title: Data Understanding Survey: Pursuing Improved Dataset Characterization Via Tensor-based Methods
- Authors: Matthew D. Merris, Tim Andersen
- Abstract summary: Existing dataset characterization methods often fail to deliver the deep understanding and insights essential for innovation and explainability. We advocate for the adoption of tensor-based characterization, promising a leap forward in understanding complex datasets and paving the way for intelligent, explainable data-driven discoveries.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the evolving domains of Machine Learning and Data Analytics, existing dataset characterization methods such as statistical, structural, and model-based analyses often fail to deliver the deep understanding and insights essential for innovation and explainability. This work surveys state-of-the-art conventional data analytic techniques, examines their limitations, and discusses a variety of tensor-based methods, showing how these may provide a more robust alternative to traditional statistical, structural, and model-based dataset characterization techniques. Through examples, we illustrate how tensor methods unveil nuanced data characteristics, offering enhanced interpretability and actionable intelligence. We advocate for the adoption of tensor-based characterization, promising a leap forward in understanding complex datasets and paving the way for intelligent, explainable data-driven discoveries.
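The abstract does not name a specific decomposition, but one common tensor-based characterization is estimating a dataset's multilinear rank from its mode-n unfoldings (the core idea behind HOSVD). The sketch below is a minimal NumPy illustration on a synthetic 3-way tensor; the function names and tolerance are assumptions, not the survey's own method.

```python
import numpy as np

def mode_n_unfold(tensor, mode):
    """Unfold a tensor along the given mode into a matrix."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def multilinear_rank(tensor, tol=1e-8):
    """Estimate the multilinear rank from the singular spectrum
    of each mode-n unfolding (relative threshold tol)."""
    ranks = []
    for mode in range(tensor.ndim):
        s = np.linalg.svd(mode_n_unfold(tensor, mode), compute_uv=False)
        ranks.append(int(np.sum(s > tol * s[0])))
    return tuple(ranks)

# Synthetic data: a rank-(2, 3, 2) tensor built as a Tucker product
# of a small core with three random factor matrices.
rng = np.random.default_rng(0)
core = rng.standard_normal((2, 3, 2))
A = rng.standard_normal((10, 2))
B = rng.standard_normal((12, 3))
C = rng.standard_normal((8, 2))
X = np.einsum('abc,ia,jb,kc->ijk', core, A, B, C)

print(multilinear_rank(X))  # (2, 3, 2)
```

The per-mode ranks recovered here summarize how much structure the data carries along each axis, which is the kind of nuanced, interpretable characteristic the survey argues tensor methods expose.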
Related papers
- A Theory of the Mechanics of Information: Generalization Through Measurement of Uncertainty (Learning is Measuring) [0.0]
We introduce a model-free framework using surprisal (information-theoretic uncertainty) to analyze and perform inferences from raw data. It eliminates distribution modeling, reducing bias and enabling efficient updates, including direct edits and deletion of training data. It emphasizes traceability, interpretability, and data-driven decision making, offering a unified, human-understandable framework for machine learning.
arXiv Detail & Related papers (2025-10-26T19:45:25Z)
- Provenance Networks: End-to-End Exemplar-Based Explainability [0.0]
We introduce provenance networks, a novel class of neural models designed to provide end-to-end, training-data-driven explainability. Provenance networks learn to link each prediction directly to its supporting training examples as part of the model's normal operation. This addresses critical challenges in modern deep learning, including model opaqueness, hallucination, and the assignment of credit to data contributors.
arXiv Detail & Related papers (2025-10-03T01:48:38Z)
- A review on data-driven constitutive laws for solids [0.0]
This review article highlights state-of-the-art data-driven techniques to discover, encode, surrogate, or emulate constitutive laws.
Our objective is to provide an organized taxonomy to a large spectrum of methodologies developed in the past decades.
arXiv Detail & Related papers (2024-05-06T17:33:58Z)
- Enhancing Explainability in Mobility Data Science through a combination of methods [0.08192907805418582]
This paper introduces a comprehensive framework that harmonizes pivotal XAI techniques.
These include LIME (Local Interpretable Model-agnostic Explanations), SHAP, saliency maps, attention mechanisms, direct trajectory visualization, and Permutation Feature Importance (PFI).
To validate our framework, we undertook a survey to gauge preferences and reception among various user demographics.
arXiv Detail & Related papers (2023-12-01T07:09:21Z)
- Persistence-based operators in machine learning [62.997667081978825]
We introduce a class of persistence-based neural network layers.
Persistence-based layers allow the users to easily inject knowledge about symmetries respected by the data, are equipped with learnable weights, and can be composed with state-of-the-art neural architectures.
arXiv Detail & Related papers (2022-12-28T18:03:41Z)
- Towards a mathematical understanding of learning from few examples with nonlinear feature maps [68.8204255655161]
We consider the problem of data classification where the training set consists of just a few data points.
We reveal key relationships between the geometry of an AI model's feature space, the structure of the underlying data distributions, and the model's generalisation capabilities.
arXiv Detail & Related papers (2022-11-07T14:52:58Z)
- Schema-aware Reference as Prompt Improves Data-Efficient Knowledge Graph Construction [57.854498238624366]
We propose a retrieval-augmented approach, which retrieves schema-aware Reference As Prompt (RAP) for data-efficient knowledge graph construction.
RAP can dynamically leverage schema and knowledge inherited from human-annotated and weak-supervised data as a prompt for each sample.
arXiv Detail & Related papers (2022-10-19T16:40:28Z)
- Latent Properties of Lifelong Learning Systems [59.50307752165016]
We introduce an algorithm-agnostic explainable surrogate-modeling approach to estimate latent properties of lifelong learning algorithms.
We validate the approach for estimating these properties via experiments on synthetic data.
arXiv Detail & Related papers (2022-07-28T20:58:13Z)
- Representations of epistemic uncertainty and awareness in data-driven strategies [0.0]
We present a theoretical model for uncertainty in knowledge representation and its transfer mediated by agents.
We look at inequivalent knowledge representations in terms of inferences, preference relations, and information measures.
We discuss some implications of the proposed model for data-driven strategies.
arXiv Detail & Related papers (2021-10-21T21:18:21Z)
- Towards Open-World Feature Extrapolation: An Inductive Graph Learning Approach [80.8446673089281]
We propose a new learning paradigm with graph representation and learning.
Our framework contains two modules: 1) a backbone network (e.g., a feedforward neural net) as the lower model, which takes features as input and outputs predicted labels; 2) a graph neural network as the upper model, which learns to extrapolate embeddings for new features via message passing over a feature-data graph built from observed data.
arXiv Detail & Related papers (2021-10-09T09:02:45Z)
- A Diagnostic Study of Explainability Techniques for Text Classification [52.879658637466605]
We develop a list of diagnostic properties for evaluating existing explainability techniques.
We compare the saliency scores assigned by the explainability techniques with human annotations of salient input regions to find relations between a model's performance and the agreement of its rationales with human ones.
arXiv Detail & Related papers (2020-09-25T12:01:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.