Towards Trustworthy and Aligned Machine Learning: A Data-centric Survey
with Causality Perspectives
- URL: http://arxiv.org/abs/2307.16851v1
- Date: Mon, 31 Jul 2023 17:11:35 GMT
- Title: Towards Trustworthy and Aligned Machine Learning: A Data-centric Survey
with Causality Perspectives
- Authors: Haoyang Liu, Maheep Chaudhary, Haohan Wang
- Abstract summary: The trustworthiness of machine learning has emerged as a critical topic in the field.
This survey presents the background of trustworthy machine learning development using a unified set of concepts.
We provide a unified language with mathematical vocabulary to link these methods across robustness, adversarial robustness, interpretability, and fairness.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The trustworthiness of machine learning has emerged as a critical topic in
the field, encompassing various applications and research areas such as
robustness, security, interpretability, and fairness. The last decade saw the
development of numerous methods addressing these challenges. In this survey, we
systematically review these advancements from a data-centric perspective,
highlighting the shortcomings of traditional empirical risk minimization (ERM)
training in handling challenges posed by the data.
Interestingly, we observe a convergence of these methods, despite their
independent development across trustworthy machine learning subfields. Pearl's
hierarchy of causality offers a unifying framework for these techniques.
Accordingly, this survey presents the background of trustworthy machine
learning development using a unified set of concepts, connects this language to
Pearl's causal hierarchy, and finally discusses methods explicitly inspired by
causality literature. We provide a unified language with mathematical
vocabulary to link these methods across robustness, adversarial robustness,
interpretability, and fairness, fostering a more cohesive understanding of the
field.
Further, we explore the trustworthiness of large pretrained models. After
summarizing dominant techniques like fine-tuning, parameter-efficient
fine-tuning, prompting, and reinforcement learning from human feedback, we draw
connections between them and the standard ERM. This connection allows us to
build upon the principled understanding of trustworthy methods, extending it to
these new techniques in large pretrained models, paving the way for future
methods. Existing methods under this perspective are also reviewed.
Lastly, we offer a brief summary of the applications of these methods and
discuss potential future aspects related to our survey. For more information,
please visit http://trustai.one.
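The survey's data-centric critique takes standard empirical risk minimization (ERM) as its baseline. As a point of reference, the following is a minimal, illustrative sketch of ERM — fitting parameters by minimizing the average loss over the observed training sample. The function names and toy data are assumptions for illustration, not from the paper:

```python
import numpy as np

def empirical_risk(w, X, y):
    """Empirical risk: mean squared error over the training sample."""
    return np.mean((X @ w - y) ** 2)

def erm_gradient_descent(X, y, lr=0.1, steps=500):
    """Plain gradient descent on the empirical risk."""
    w = np.zeros(X.shape[1])
    n = len(y)
    for _ in range(steps):
        grad = (2.0 / n) * X.T @ (X @ w - y)  # gradient of the mean squared error
        w -= lr * grad
    return w

# Toy linear data: ERM recovers whatever correlations the sample contains,
# including spurious ones -- the data-centric shortcoming the survey targets.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = X @ np.array([1.0, -2.0]) + 0.01 * rng.normal(size=200)
w_hat = erm_gradient_descent(X, y)
```

Because the objective averages only over the observed sample, any bias or spurious correlation in the data is baked into the learned parameters, which is the shared failure mode the surveyed methods address.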
Related papers
- Deep Learning-Based Object Pose Estimation: A Comprehensive Survey [73.74933379151419]
We discuss the recent advances in deep learning-based object pose estimation.
Our survey also covers multiple input data modalities, degrees-of-freedom of output poses, object properties, and downstream tasks.
arXiv Detail & Related papers (2024-05-13T14:44:22Z)
- A review on data-driven constitutive laws for solids [0.0]
This review article highlights state-of-the-art data-driven techniques to discover, encode, surrogate, or emulate laws.
Our objective is to provide an organized taxonomy to a large spectrum of methodologies developed in the past decades.
arXiv Detail & Related papers (2024-05-06T17:33:58Z)
- Machine Learning Robustness: A Primer [12.426425119438846]
The discussion begins with a detailed definition of robustness, portraying it as the ability of ML models to maintain stable performance across varied and unexpected environmental conditions.
The chapter delves into the factors that impede robustness, such as data bias, model complexity, and the pitfalls of underspecified ML pipelines.
The discussion progresses to explore amelioration strategies for bolstering robustness, starting with data-centric approaches like debiasing and augmentation.
arXiv Detail & Related papers (2024-04-01T03:49:42Z)
- Heterogeneous Contrastive Learning for Foundation Models and Beyond [73.74745053250619]
In the era of big data and Artificial Intelligence, an emerging paradigm is to utilize contrastive self-supervised learning to model large-scale heterogeneous data.
This survey critically evaluates the current landscape of heterogeneous contrastive learning for foundation models.
arXiv Detail & Related papers (2024-03-30T02:55:49Z)
- Continual Learning with Pre-Trained Models: A Survey [61.97613090666247]
Continual Learning aims to overcome the catastrophic forgetting of previously acquired knowledge when learning new tasks.
This paper presents a comprehensive survey of the latest advancements in PTM-based CL.
arXiv Detail & Related papers (2024-01-29T18:27:52Z)
- Understanding Data Augmentation from a Robustness Perspective [10.063624819905508]
Data augmentation stands out as a pivotal technique to amplify model robustness.
This manuscript takes both a theoretical and empirical approach to understanding the phenomenon.
Our empirical evaluations dissect the intricate mechanisms of emblematic data augmentation strategies.
These insights provide a novel lens through which we can re-evaluate model safety and robustness in visual recognition tasks.
arXiv Detail & Related papers (2023-09-07T10:54:56Z)
- Rethinking Bayesian Learning for Data Analysis: The Art of Prior and Inference in Sparsity-Aware Modeling [20.296566563098057]
Sparse modeling for signal processing and machine learning has been a focus of scientific research for over two decades.
This article reviews some recent advances in incorporating sparsity-promoting priors into three popular data modeling tools.
arXiv Detail & Related papers (2022-05-28T00:43:52Z)
- Causal Reasoning Meets Visual Representation Learning: A Prospective Study [117.08431221482638]
The lack of interpretability, robustness, and out-of-distribution generalization is an emerging challenge for existing visual models.
Inspired by the strong inference ability of human-level agents, recent years have witnessed great effort in developing causal reasoning paradigms.
This paper aims to provide a comprehensive overview of this emerging field, attract attention, encourage discussion, and bring to the forefront the urgency of developing novel causal reasoning methods.
arXiv Detail & Related papers (2022-04-26T02:22:28Z)
- Quantifying Epistemic Uncertainty in Deep Learning [15.494774321257939]
Uncertainty quantification is at the core of the reliability and robustness of machine learning.
We provide a theoretical framework to dissect the uncertainty, especially the epistemic component, in deep learning.
We propose two approaches to estimate these uncertainties, one based on influence functions and one on variability.
arXiv Detail & Related papers (2021-10-23T03:21:10Z)
- Self-Supervised Representation Learning: Introduction, Advances and Challenges [125.38214493654534]
Self-supervised representation learning methods aim to provide powerful deep feature learning without the requirement of large annotated datasets.
This article introduces this vibrant area including key concepts, the four main families of approach and associated state of the art, and how self-supervised methods are applied to diverse modalities of data.
arXiv Detail & Related papers (2021-10-18T13:51:22Z)
- Human Trajectory Forecasting in Crowds: A Deep Learning Perspective [89.4600982169]
We present an in-depth analysis of existing deep learning-based methods for modelling social interactions.
We propose two knowledge-based data-driven methods to effectively capture these social interactions.
We develop a large scale interaction-centric benchmark TrajNet++, a significant yet missing component in the field of human trajectory forecasting.
arXiv Detail & Related papers (2020-07-07T17:19:56Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.