Attribution-Scores in Data Management and Explainable Machine Learning
- URL: http://arxiv.org/abs/2308.00184v1
- Date: Mon, 31 Jul 2023 22:41:17 GMT
- Title: Attribution-Scores in Data Management and Explainable Machine Learning
- Authors: Leopoldo Bertossi
- Abstract summary: We describe recent research on the use of actual causality in the definition of responsibility scores in databases.
In the case of databases, useful connections with database repairs are illustrated and exploited.
For classification models, the responsibility score is properly extended and illustrated.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We describe recent research on the use of actual causality in the definition
of responsibility scores as explanations for query answers in databases, and
for outcomes from classification models in machine learning. In the case of
databases, useful connections with database repairs are illustrated and
exploited. Repairs are also used to give a quantitative measure of the
consistency of a database. For classification models, the responsibility score
is properly extended and illustrated. The efficient computation of Shap-score
is also analyzed and discussed. The emphasis is placed on work done by the
author and collaborators.
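To make the central notion concrete, here is a minimal, brute-force sketch of the responsibility score for query answers, under the standard definition used in this line of work: a tuple t is an actual cause of a Boolean query holding if some contingency set Γ of other tuples can be removed so the query still holds, yet removing t as well makes it fail; the responsibility of t is 1/(1 + |Γ|) for a minimum-size such Γ. The toy database, query, and function names below are illustrative, not taken from the paper.

```python
from itertools import combinations

# Toy database instance. Boolean conjunctive query:
#   Q: exists x, y such that R(x, y) and S(y)
R = {("a", "1"), ("b", "2")}
S = {("1",), ("2",)}

def q_holds(db_r, db_s):
    """True iff Q is satisfied: some R-tuple's second attribute appears in S."""
    return any((y,) in db_s for (_, y) in db_r)

def responsibility(t, db_r, db_s):
    """Responsibility of R-tuple t for Q being true: 1 / (1 + |Gamma|) for a
    minimum-size contingency set Gamma of other tuples, or 0 if t is not an
    actual cause.  Brute force, exponential in the database size."""
    others = [u for u in db_r if u != t]  # contingency candidates (R-tuples only, for simplicity)
    for k in range(len(others) + 1):
        for gamma in combinations(others, k):
            rest = db_r - set(gamma)
            # Counterfactual test: Q survives removing Gamma, fails once t also goes.
            if q_holds(rest, db_s) and not q_holds(rest - {t}, db_s):
                return 1.0 / (1 + k)
    return 0.0

for t in sorted(R):
    print(t, responsibility(t, R, S))
```

On this instance each R-tuple gets responsibility 1/2: neither is a counterfactual cause on its own (the other tuple still witnesses the query), but each becomes one after removing the other, i.e., with a contingency set of size one.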
Related papers
- kNN Classification of Malware Data Dependency Graph Features
This study obtains accurate classification by using features tied to program structure and semantics.
Training an accurate model on labeled data shows that this semantic feature representation correlates with the ground-truth labels.
Our results provide evidence that data dependency graphs accurately capture both semantic and structural information for increased explainability in classification results.
arXiv Detail & Related papers (2024-06-04T16:39:02Z)
- Are LLMs Capable of Data-based Statistical and Causal Reasoning? Benchmarking Advanced Quantitative Reasoning with Data
We introduce the Quantitative Reasoning with Data benchmark to evaluate Large Language Models' capability in statistical and causal reasoning with real-world data.
The benchmark comprises a dataset of 411 questions accompanied by data sheets from textbooks, online learning materials, and academic papers.
To compare models' quantitative reasoning abilities on data and text, we enrich the benchmark with an auxiliary set of 290 text-only questions, namely QRText.
arXiv Detail & Related papers (2024-02-27T16:15:03Z)
- Machine Unlearning for Causal Inference
It is important to enable a model to forget some of the information it has captured about a given user (machine unlearning).
This paper introduces the concept of machine unlearning for causal inference, particularly propensity score matching and treatment effect estimation.
The dataset used in the study is the Lalonde dataset, a widely used dataset for evaluating the effectiveness of job training programs.
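The paper's specific unlearning mechanism is not reproduced here; the sketch below shows only the naive exact-unlearning baseline such work is measured against: delete the records to be forgotten, retrain the propensity model, and re-estimate the treatment effect. Synthetic data stands in for Lalonde, and all names are illustrative.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic stand-in for an observational study (the paper uses Lalonde).
n = 500
X = rng.normal(size=(n, 3))                      # covariates
T = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))  # treatment assignment depends on X
Y = 2.0 * T + X @ np.array([1.0, 0.5, 0.0]) + rng.normal(size=n)

def att_via_psm(X, T, Y):
    """Fit a propensity model and estimate the ATT by matching each treated
    unit to its nearest control on the propensity score."""
    ps = LogisticRegression().fit(X, T).predict_proba(X)[:, 1]
    treated, control = np.where(T == 1)[0], np.where(T == 0)[0]
    matches = control[np.abs(ps[treated][:, None] - ps[control][None, :]).argmin(axis=1)]
    return np.mean(Y[treated] - Y[matches])

print("ATT (all data):", att_via_psm(X, T, Y))

# Exact-unlearning baseline: drop the records to be forgotten and retrain,
# so no trace of them remains in the propensity model or the estimate.
forget = rng.choice(n, size=50, replace=False)
keep = np.setdiff1d(np.arange(n), forget)
print("ATT (after unlearning 50 records):", att_via_psm(X[keep], T[keep], Y[keep]))
```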
arXiv Detail & Related papers (2023-08-24T17:27:01Z)
- From Database Repairs to Causality in Databases and Beyond
We describe some recent approaches to score-based explanations for query answers in databases.
Special emphasis is placed on the use of counterfactual reasoning for score specification and computation.
arXiv Detail & Related papers (2023-06-15T04:08:23Z)
- Discover, Explanation, Improvement: An Automatic Slice Detection Framework for Natural Language Processing
Slice detection models (SDMs) automatically identify underperforming groups of datapoints.
This paper proposes a benchmark named "Discover, Explain, Improve (DEIM)" for classification NLP tasks.
Our evaluation shows that Edisa can accurately select error-prone datapoints with informative semantic features.
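Edisa's actual algorithm is not reconstructed here; the following sketch illustrates only the generic idea behind slice detection: bucket validation examples by interpretable metadata features and flag slices whose error rate sits well above the overall rate. The metadata features, thresholds, and function names are invented for illustration.

```python
from collections import defaultdict

# Validation records: (slice-defining metadata features, model correct?).
# The features "length" and "has_negation" are invented examples.
records = [
    ({"length": "short", "has_negation": False}, True),
    ({"length": "short", "has_negation": True},  False),
    ({"length": "long",  "has_negation": True},  False),
    ({"length": "long",  "has_negation": False}, True),
    ({"length": "long",  "has_negation": True},  False),
    ({"length": "short", "has_negation": False}, True),
]

def detect_slices(records, min_support=2, margin=0.2):
    """Flag (feature, value) slices whose error rate exceeds the overall
    error rate by `margin`, requiring at least `min_support` examples."""
    overall = sum(not ok for _, ok in records) / len(records)
    buckets = defaultdict(list)
    for meta, ok in records:
        for feat, val in meta.items():
            buckets[(feat, val)].append(ok)
    return [
        (slc, len(oks), 1 - sum(oks) / len(oks))
        for slc, oks in buckets.items()
        if len(oks) >= min_support and (1 - sum(oks) / len(oks)) >= overall + margin
    ]

for slc, n, err in detect_slices(records):
    print(f"underperforming slice {slc}: n={n}, error rate={err:.2f}")
```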
arXiv Detail & Related papers (2022-11-08T19:00:00Z)
- Amortized Inference for Causal Structure Learning
Learning causal structure poses a search problem that typically involves evaluating structures using a score or independence test.
We train a variational inference model to predict the causal structure from observational/interventional data.
Our models exhibit robust generalization capabilities under substantial distribution shift.
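The amortized variational predictor itself is too large to sketch, but the classical baseline it amortizes, evaluating candidate DAG structures against data with a score, fits in a few lines. The sketch below uses a BIC-style score for linear-Gaussian models on synthetic data; the candidates and names are illustrative, and note that Markov-equivalent DAGs tie under such a score.

```python
import numpy as np

rng = np.random.default_rng(1)

# Ground truth: x -> y -> z, linear-Gaussian.
n = 2000
x = rng.normal(size=n)
y = 1.5 * x + rng.normal(size=n)
z = -0.8 * y + rng.normal(size=n)
data = {"x": x, "y": y, "z": z}

def bic(structure):
    """BIC of a DAG given as {node: [parents]}: per node, the Gaussian
    log-likelihood of an OLS fit minus a complexity penalty."""
    total = 0.0
    for node, parents in structure.items():
        design = np.column_stack([data[p] for p in parents] + [np.ones(n)])
        resid = data[node] - design @ np.linalg.lstsq(design, data[node], rcond=None)[0]
        sigma2 = resid @ resid / n
        loglik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)
        total += loglik - 0.5 * (len(parents) + 2) * np.log(n)  # coefs + intercept + variance
    return total

candidates = {
    "x -> y -> z":    {"x": [], "y": ["x"], "z": ["y"]},
    "x -> y, x -> z": {"x": [], "y": ["x"], "z": ["x"]},
    "independent":    {"x": [], "y": [], "z": []},
}
for name in sorted(candidates, key=lambda k: -bic(candidates[k])):
    print(f"{name}: BIC = {bic(candidates[name]):.1f}")
```

Run on this data, the true chain scores highest; an amortized model aims to skip this per-structure search at inference time.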
arXiv Detail & Related papers (2022-05-25T17:37:08Z)
- An Empirical Investigation of Commonsense Self-Supervision with Knowledge Graphs
Self-supervision based on the information extracted from large knowledge graphs has been shown to improve the generalization of language models.
We study the effect of knowledge sampling strategies and sizes that can be used to generate synthetic data for adapting language models.
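The paper's actual sampling strategies and templates are not reproduced; the sketch below merely contrasts two generic strategies, uniform versus relation-balanced sampling, and verbalizes the sampled triples with toy templates to produce synthetic adaptation data. The graph, relations, and templates are invented.

```python
import random
from collections import defaultdict

random.seed(0)

# Toy commonsense-style knowledge graph as (head, relation, tail) triples.
TRIPLES = [
    ("knife", "UsedFor", "cutting"), ("knife", "AtLocation", "kitchen"),
    ("oven", "UsedFor", "baking"), ("oven", "AtLocation", "kitchen"),
    ("pillow", "AtLocation", "bedroom"), ("alarm", "AtLocation", "bedroom"),
]

def sample_uniform(triples, k):
    """Uniform sampling: frequent relations dominate the synthetic data."""
    return random.sample(triples, k)

def sample_relation_balanced(triples, k):
    """Balanced sampling: draw round-robin across relations so that rare
    relations are still represented in the adaptation data."""
    by_rel = defaultdict(list)
    for t in triples:
        by_rel[t[1]].append(t)
    rels = sorted(by_rel)
    return [random.choice(by_rel[rels[i % len(rels)]]) for i in range(k)]

def verbalize(triple):
    """Turn a triple into a synthetic training sentence (toy templates)."""
    h, r, t = triple
    templates = {"UsedFor": f"A {h} is used for {t}.",
                 "AtLocation": f"You are likely to find a {h} in the {t}."}
    return templates[r]

for t in sample_relation_balanced(TRIPLES, 4):
    print(verbalize(t))
```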
arXiv Detail & Related papers (2022-05-21T19:49:04Z)
- Score-Based Explanations in Data Management and Machine Learning: An Answer-Set Programming Approach to Counterfactual Analysis
We describe some recent approaches to score-based explanations for query answers in databases and outcomes from classification models in machine learning.
Special emphasis is placed on declarative approaches based on answer-set programming to the use of counterfactual reasoning for score specification and computation.
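The answer-set programs themselves (written in this line of work for solvers such as DLV or clingo) are not shown here; the Python brute force below only emulates the counterfactual search such programs declare: find minimum-size sets of feature-value changes that flip a classifier's output, from which responsibility-style scores can be read off. The classifier, features, and domains are invented for illustration.

```python
from itertools import combinations, product

# Toy binary classifier over categorical features (an illustrative rule).
def classify(e):
    return 1 if e["income"] == "low" and e["history"] == "bad" else 0

DOMAINS = {"income": ["low", "high"], "history": ["good", "bad"], "age": ["young", "old"]}
entity = {"income": "low", "history": "bad", "age": "young"}  # classified 1

def minimal_counterfactuals(entity):
    """All minimum-size sets of feature changes that flip the label.
    A responsibility-style score for a feature can then be taken as
    1/|S| for the smallest flipping change set S containing it."""
    orig = classify(entity)
    for k in range(1, len(entity) + 1):
        found = []
        for feats in combinations(entity, k):
            for vals in product(*(DOMAINS[f] for f in feats)):
                cand = dict(entity, **dict(zip(feats, vals)))
                if cand != entity and classify(cand) != orig:
                    found.append(dict(zip(feats, vals)))
        if found:
            return found  # minimal size reached; stop here
    return []

print(minimal_counterfactuals(entity))
```

Here two single-feature changes (raising income or fixing history) each flip the outcome, so both features get the maximum score; an ASP encoding delegates exactly this combinatorial search to the solver.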
arXiv Detail & Related papers (2021-06-19T19:21:48Z)
- Competency Problems: On Finding and Removing Artifacts in Language Data
We argue that for complex language understanding tasks, all simple feature correlations are spurious.
We theoretically analyze the difficulty of creating data for competency problems when human bias is taken into account.
arXiv Detail & Related papers (2021-04-17T21:34:10Z)
- Score-Based Explanations in Data Management and Machine Learning
We consider explanations for query answers in databases, and for results from classification models.
The described approaches are mostly of a causal and counterfactual nature.
arXiv Detail & Related papers (2020-07-24T23:13:27Z)
- Generating Fact Checking Explanations
A crucial missing piece is understanding how to automate the most elaborate part of the fact-checking process: generating explanations for verdicts.
This paper provides the first study of how these explanations can be generated automatically based on available claim context.
Our results indicate that optimising both objectives at the same time, rather than training them separately, improves the performance of a fact checking system.
arXiv Detail & Related papers (2020-04-13T05:23:25Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.