The Gauss-Markov Adjunction: Categorical Semantics of Residuals in Supervised Learning
- URL: http://arxiv.org/abs/2507.02442v1
- Date: Thu, 03 Jul 2025 08:58:59 GMT
- Title: The Gauss-Markov Adjunction: Categorical Semantics of Residuals in Supervised Learning
- Authors: Moto Kamiura
- Abstract summary: This paper develops a semantic framework for structuring and understanding AI systems. By defining two concrete categories corresponding to parameters and data, along with an adjoint pair of functors between them, we introduce our categorical formulation of supervised learning. We position this formulation as an instance of extended denotational semantics for supervised learning, and propose applying a semantic perspective developed in theoretical computer science as a formal foundation for Explicability in AI.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Enhancing the intelligibility and interpretability of machine learning is a crucial task in responding to the demand for Explicability as an AI principle, and in promoting the better social implementation of AI. The aim of our research is to contribute to this improvement by reformulating machine learning models through the lens of category theory, thereby developing a semantic framework for structuring and understanding AI systems. Our categorical modeling in this paper clarifies and formalizes the structural interplay between residuals and parameters in supervised learning. The present paper focuses on the multiple linear regression model, which represents the most basic form of supervised learning. By defining two concrete categories corresponding to parameters and data, along with an adjoint pair of functors between them, we introduce our categorical formulation of supervised learning. We show that the essential structure of this framework is captured by what we call the Gauss-Markov Adjunction. Within this setting, the dual flow of information can be explicitly described as a correspondence between variations in parameters and residuals. The ordinary least squares estimator for the parameters and the minimum residual are related via the preservation of limits by the right adjoint functor. Furthermore, we position this formulation as an instance of extended denotational semantics for supervised learning, and propose applying a semantic perspective developed in theoretical computer science as a formal foundation for Explicability in AI.
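To make the abstract's central pairing concrete, here is a minimal numerical sketch, in plain numpy, of the two objects the Gauss-Markov Adjunction relates: the ordinary least squares estimator and the minimum residual it leaves behind. The toy data and variable names are hypothetical, and nothing below reproduces the paper's categorical construction.

```python
import numpy as np

# Minimal sketch (not from the paper): the OLS estimator and the
# residual it leaves behind, i.e. the two pieces of data the
# Gauss-Markov Adjunction is said to relate.

rng = np.random.default_rng(0)

# Design matrix X (n observations, p regressors) and response y.
n, p = 100, 3
X = rng.normal(size=(n, p))
beta_true = np.array([2.0, -1.0, 0.5])
y = X @ beta_true + rng.normal(scale=0.1, size=n)

# OLS estimator: beta_hat = (X^T X)^{-1} X^T y,
# computed via least squares for numerical stability.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# Minimum residual: the component of y orthogonal to the column
# space of X; OLS makes this residual as small as possible.
residual = y - X @ beta_hat
print("beta_hat:", beta_hat)
print("residual norm:", np.linalg.norm(residual))

# Sanity check: the residual is orthogonal to every column of X,
# which is the normal-equation characterization of the OLS minimum.
assert np.allclose(X.T @ residual, 0.0, atol=1e-8)
```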
Related papers
- Model Steering: Learning with a Reference Model Improves Generalization Bounds and Scaling Laws
This paper formalizes an emerging learning paradigm that uses a trained model as a reference to guide and enhance the training of a target model through strategic data selection or weighting.
We provide theoretical insights into why this approach improves generalization and data efficiency compared to training without a reference model.
Building on these insights, we introduce a novel method for Contrastive Language-Image Pretraining with a reference model, termed DRRho-CLIP.
arXiv Detail & Related papers (2025-05-10T16:55:03Z)
- Symmetry-Enriched Learning: A Category-Theoretic Framework for Robust Machine Learning Models
We introduce new mathematical constructs, including hyper-symmetry categories and functorial representations, to model complex transformations within machine learning algorithms.
Our contributions include the design of symmetry-enriched learning models, the development of advanced optimization techniques leveraging categorical symmetries, and the theoretical analysis of their implications for model robustness, generalization, and convergence.
arXiv Detail & Related papers (2024-09-18T16:20:57Z)
- A Unifying Framework for Action-Conditional Self-Predictive Reinforcement Learning
Learning a good representation is a crucial challenge for Reinforcement Learning (RL) agents.
Recent work has developed theoretical insights into these self-predictive algorithms.
We take a step towards bridging the gap between theory and practice by analyzing an action-conditional self-predictive objective.
arXiv Detail & Related papers (2024-06-04T07:22:12Z)
- The Buffer Mechanism for Multi-Step Information Reasoning in Language Models
Investigating internal reasoning mechanisms of large language models can help us design better model architectures and training strategies.
In this study, we constructed a symbolic dataset to investigate the mechanisms by which Transformer models employ a vertical thinking strategy.
We proposed a random matrix-based algorithm to enhance the model's reasoning ability, resulting in a 75% reduction in the training time required for the GPT-2 model.
arXiv Detail & Related papers (2024-05-24T07:41:26Z)
- Token Space: A Category Theory Framework for AI Computations
This paper introduces the Token Space framework, a novel mathematical construct designed to enhance the interpretability and effectiveness of deep learning models.
By establishing a categorical structure at the Token level, we provide a new lens through which AI computations can be understood.
arXiv Detail & Related papers (2024-04-11T15:56:06Z)
- A Probabilistic Model Behind Self-Supervised Learning
In self-supervised learning (SSL), representations are learned via an auxiliary task without annotated labels.
We present a generative latent variable model for self-supervised learning.
We show that several families of discriminative SSL, including contrastive methods, induce a comparable distribution over representations.
arXiv Detail & Related papers (2024-02-02T13:31:17Z)
- Surprisal Driven $k$-NN for Robust and Interpretable Nonparametric Learning
We shed new light on the traditional nearest neighbors algorithm from the perspective of information theory.
We propose a robust and interpretable framework for tasks such as classification, regression, density estimation, and anomaly detection using a single model.
Our work showcases the architecture's versatility by achieving state-of-the-art results in classification and anomaly detection.
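Purely as an illustration of the information-theoretic reading, and not the paper's actual algorithm, the sketch below scores points by surprisal: the negative log of a simple k-NN density estimate, where high surprisal flags likely anomalies. All names and constants here are hypothetical.

```python
import numpy as np

def knn_surprisal(train: np.ndarray, query: np.ndarray, k: int = 5) -> np.ndarray:
    """Surprisal of each query point under a k-NN density estimate."""
    # Pairwise distances from each query point to every training point.
    dists = np.linalg.norm(query[:, None, :] - train[None, :, :], axis=-1)
    # Distance to the k-th nearest neighbor.
    r_k = np.sort(dists, axis=1)[:, k - 1]
    n, d = train.shape
    # k-NN density estimate: p(x) ~ k / (n * volume of a d-ball of
    # radius r_k); the unit-ball volume constant is omitted, so the
    # score is correct up to an additive constant.
    eps = 1e-12
    log_density = np.log(k) - np.log(n) - d * np.log(r_k + eps)
    return -log_density  # high surprisal = low density = likely anomaly

rng = np.random.default_rng(1)
train = rng.normal(size=(500, 2))
query = np.vstack([rng.normal(size=(3, 2)), [[8.0, 8.0]]])  # last point is an outlier
print(knn_surprisal(train, query))  # the outlier gets the largest surprisal
```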
arXiv Detail & Related papers (2023-11-17T00:35:38Z)
- Learning with Explanation Constraints
We provide a learning theoretic framework to analyze how explanations can improve the learning of our models.
We demonstrate the benefits of our approach over a large array of synthetic and real-world experiments.
arXiv Detail & Related papers (2023-03-25T15:06:47Z)
- Categorical semantics of compositional reinforcement learning
We develop a knowledge representation framework for a compositional theory of reinforcement learning (RL).
Our approach relies on the theoretical study of the category $\mathsf{MDP}$, whose objects are Markov decision processes (MDPs) acting as models of tasks.
We introduce zig-zag diagrams that rely on the compositional guarantees engendered by the category $\mathsf{MDP}$.
arXiv Detail & Related papers (2022-08-29T15:51:36Z)
- It Takes Two Flints to Make a Fire: Multitask Learning of Neural Relation and Explanation Classifiers
We propose an explainable approach for relation extraction that mitigates the tension between generalization and explainability.
Our approach uses a multi-task learning architecture that jointly trains classifiers for relation extraction and for explaining the extraction decisions.
We convert the model outputs to rules to bring global explanations to this approach.
arXiv Detail & Related papers (2022-04-25T03:53:12Z)
- Understanding Interpretability by generalized distillation in Supervised Classification
Recent interpretation strategies focus on human understanding of the underlying decision mechanisms of the complex Machine Learning models.
We propose an interpretation-by-distillation formulation that is defined relative to other ML models.
We evaluate our proposed framework on the MNIST, Fashion-MNIST and Stanford40 datasets.
arXiv Detail & Related papers (2020-12-05T17:42:50Z)
- Concept Learners for Few-Shot Learning
We propose COMET, a meta-learning method that improves generalization ability by learning to learn along human-interpretable concept dimensions.
We evaluate our model on few-shot tasks from diverse domains, including fine-grained image classification, document categorization and cell type annotation.
arXiv Detail & Related papers (2020-07-14T22:04:17Z)
- Target-Embedding Autoencoders for Supervised Representation Learning
This paper analyzes a framework for improving generalization in a purely supervised setting, where the target space is high-dimensional.
We motivate and formalize the general framework of target-embedding autoencoders (TEA) for supervised prediction, which learn intermediate latent representations jointly optimized to be both predictable from features and predictive of targets.
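A minimal sketch of the TEA objective as the summary describes it, assuming simple linear maps in PyTorch; this illustrates the joint optimization of the latent code, not the authors' implementation, and all sizes and names are hypothetical.

```python
import torch
import torch.nn as nn

x_dim, y_dim, z_dim = 32, 128, 8  # hypothetical sizes

encoder = nn.Linear(y_dim, z_dim)    # embeds the target y into latent z
decoder = nn.Linear(z_dim, y_dim)    # reconstructs y from z
predictor = nn.Linear(x_dim, z_dim)  # predicts z from features x

params = [*encoder.parameters(), *decoder.parameters(), *predictor.parameters()]
opt = torch.optim.Adam(params, lr=1e-3)

x = torch.randn(64, x_dim)  # toy batch of features
y = torch.randn(64, y_dim)  # toy batch of high-dimensional targets

# Joint loss: the latent z must decode back to the target (predictive
# of targets) and be reachable from the features (predictable from
# features); gradients flow into the encoder from both terms.
z = encoder(y)
loss = nn.functional.mse_loss(decoder(z), y) \
     + nn.functional.mse_loss(predictor(x), z)
opt.zero_grad(); loss.backward(); opt.step()

# At test time the target embedding is bypassed: y_hat = decoder(predictor(x)).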
arXiv Detail & Related papers (2020-01-23T02:37:10Z)