An Epistemic Approach to the Formal Specification of Statistical Machine Learning
- URL: http://arxiv.org/abs/2004.12734v3
- Date: Sun, 20 Sep 2020 17:51:14 GMT
- Title: An Epistemic Approach to the Formal Specification of Statistical Machine Learning
- Authors: Yusuke Kawamoto
- Abstract summary: We introduce a formal model for supervised learning based on a Kripke model.
We then formalize various notions of the classification performance, robustness, and fairness of statistical classifiers.
- Score: 1.599072005190786
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose an epistemic approach to formalizing statistical properties of
machine learning. Specifically, we introduce a formal model for supervised
learning based on a Kripke model where each possible world corresponds to a
possible dataset and modal operators are interpreted as transformation and
testing on datasets. Then we formalize various notions of the classification
performance, robustness, and fairness of statistical classifiers by using our
extension of statistical epistemic logic (StatEL). In this formalization, we
show relationships among properties of classifiers, and relevance between
classification performance and robustness. As far as we know, this is the first
work that uses epistemic models and logical formulas to express statistical
properties of machine learning, and would be a starting point to develop
theories of formal specification of machine learning.
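The abstract's central idea is that possible worlds are datasets, and modal operators quantify over datasets reachable by a transformation. A toy Python sketch can make this concrete, under heavy assumptions. It is not the paper's StatEL syntax or semantics. The threshold classifier, the "accuracy of at least 0.9" property, the three datasets, and the accessibility relation (worlds reachable by perturbing the inputs) are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical toy encoding, NOT the StatEL semantics from the paper:
# each possible world is a finite dataset of (input, true_label) pairs.
Dataset = Tuple[Tuple[float, int], ...]
Prop = Callable[[Dataset], bool]


def classifier(x: float) -> int:
    """Hypothetical threshold classifier: predicts 1 iff x >= 0.5."""
    return 1 if x >= 0.5 else 0


def accuracy(world: Dataset) -> float:
    """Fraction of the world's data points that the classifier labels correctly."""
    correct = sum(1 for x, y in world if classifier(x) == y)
    return correct / len(world)


def performs_well(world: Dataset) -> bool:
    """Hypothetical atomic statistical property: accuracy of at least 0.9."""
    return accuracy(world) >= 0.9


@dataclass
class KripkeModel:
    worlds: List[Dataset]
    # access[i] holds the indices of worlds reachable from world i; here we
    # assume they are perturbed versions of dataset i (an illustrative choice).
    access: Dict[int, Set[int]]

    def box(self, i: int, prop: Prop) -> bool:
        """Necessity: prop holds in every world accessible from world i."""
        return all(prop(self.worlds[j]) for j in self.access.get(i, set()))

    def diamond(self, i: int, prop: Prop) -> bool:
        """Possibility: prop holds in some world accessible from world i."""
        return any(prop(self.worlds[j]) for j in self.access.get(i, set()))


worlds = [
    ((0.10, 0), (0.90, 1), (0.70, 1), (0.20, 0)),  # original test set
    ((0.15, 0), (0.85, 1), (0.55, 1), (0.25, 0)),  # small input perturbation
    ((0.30, 0), (0.60, 1), (0.45, 1), (0.55, 0)),  # larger input perturbation
]

small = KripkeModel(worlds, {0: {0, 1}})
large = KripkeModel(worlds, {0: {0, 1, 2}})

print(small.box(0, performs_well))      # True: accuracy is 1.0 in worlds 0 and 1
print(large.box(0, performs_well))      # False: world 2 has accuracy 0.5
print(large.diamond(0, performs_well))  # True
```

In this toy reading, `box` says that the performance property survives every admissible perturbation of the dataset. That is a robustness-style statement, and it fails once the accessibility relation admits the larger perturbation.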
Related papers
- Causal Representation Learning from Multimodal Biological Observations [57.00712157758845]
We aim to develop flexible identification conditions for multimodal data.
We establish identifiability guarantees for each latent component, extending the subspace identification results from prior work.
Our key theoretical ingredient is the structural sparsity of the causal connections among distinct modalities.
(2024-11-10T16:40:27Z)
- A process algebraic framework for multi-agent dynamic epistemic systems [55.2480439325792]
We propose a unifying framework for modeling and analyzing multi-agent, knowledge-based, dynamic systems.
On the modeling side, we propose a process algebraic, agent-oriented specification language that makes such a framework easy to use for practical purposes.
(2024-07-24T08:35:50Z)
- The Foundations of Tokenization: Statistical and Computational Concerns [51.370165245628975]
Tokenization is a critical step in the NLP pipeline.
Despite its recognized importance as a standard representation method in NLP, the theoretical underpinnings of tokenization are not yet fully understood.
The present paper contributes to addressing this theoretical gap by proposing a unified formal framework for representing and analyzing tokenizer models.
(2024-07-16T11:12:28Z)
- Towards a Prediction of Machine Learning Training Time to Support Continuous Learning Systems Development [5.207307163958806]
We present an empirical study of the Full Parameter Time Complexity (FPTC) approach by Zheng et al.
We study the formulations proposed for the Logistic Regression and Random Forest classifiers.
We observe that, in the conducted study, the prediction of training time is strictly related to the context.
(2023-09-20T11:35:03Z)
- Simulation-Based Prior Knowledge Elicitation for Parametric Bayesian Models [2.9172603864294024]
We focus on translating domain expert knowledge into corresponding prior distributions over model parameters, a process known as prior elicitation.
A major challenge for existing elicitation methods is how to effectively utilize all of these different formats in order to formulate prior distributions that align with the expert's expectations, regardless of the model structure.
Our results support the claim that our method is largely independent of the underlying model structure and adaptable to various elicitation techniques, including quantile-based, moment-based, and histogram-based methods.
(2023-08-22T10:43:05Z)
- Geometric and Topological Inference for Deep Representations of Complex Networks [13.173307471333619]
We present a class of statistics that emphasize the topology as well as the geometry of representations.
We evaluate these statistics in terms of the sensitivity and specificity that they afford when used for model selection.
These new methods enable brain and computer scientists to visualize the dynamic representational transformations learned by brains and models.
(2022-03-10T17:14:14Z)
- Instance-Based Neural Dependency Parsing [56.63500180843504]
We develop neural models that possess an interpretable inference process for dependency parsing.
Our models adopt instance-based inference, where dependency edges are extracted and labeled by comparing them to edges in a training set.
(2021-09-28T05:30:52Z)
- Model-agnostic multi-objective approach for the evolutionary discovery of mathematical models [55.41644538483948]
In modern data science, it is often more important to understand the properties of a model, such as which of its parts could be replaced to obtain better results.
We use multi-objective evolutionary optimization for composite data-driven model learning to obtain the algorithm's desired properties.
(2021-07-07T11:17:09Z)
- A hybrid model-based and learning-based approach for classification using limited number of training samples [13.60714541247498]
In this paper, a hybrid classification method -- HyPhyLearn -- is proposed that exploits both the physics-based statistical models and the learning-based classifiers.
The proposed solution is based on the conjecture that HyPhyLearn would alleviate the challenges associated with the individual approaches of learning-based and statistical model-based classifiers.
(2021-06-25T05:19:50Z)
- Instance-Based Learning of Span Representations: A Case Study through Named Entity Recognition [48.06319154279427]
We present a method of instance-based learning that learns similarities between spans.
Our method makes it possible to build highly interpretable models without sacrificing performance.
(2020-04-29T23:32:42Z)
- Structural Regularization [0.0]
We propose a novel method for modeling data by using structural models based on economic theory as regularizers for statistical models.
We show that our method can outperform both the (misspecified) structural model and un-structural-regularized statistical models.
(2020-04-27T06:47:07Z)