Related papers: Generalizing in the Real World with Representation Learning

Generalizing in the Real World with Representation Learning

URL: http://arxiv.org/abs/2210.09925v1
Date: Tue, 18 Oct 2022 15:11:09 GMT
Title: Generalizing in the Real World with Representation Learning
Authors: Tegan Maharaj
Abstract summary: Machine learning (ML) formalizes the problem of getting computers to learn from experience as optimization of performance according to some metric(s) This is in contrast to requiring behaviour specified in advance (e.g. by hard-coded rules) In this thesis I cover some of my work towards better understanding deep net generalization, identify several ways assumptions and problem settings fail to generalize to the real world, and propose ways to address those failures in practice.
Score: 1.3494312389622642
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Machine learning (ML) formalizes the problem of getting computers to learn from experience as optimization of performance according to some metric(s) on a set of data examples. This is in contrast to requiring behaviour specified in advance (e.g. by hard-coded rules). Formalization of this problem has enabled great progress in many applications with large real-world impact, including translation, speech recognition, self-driving cars, and drug discovery. But practical instantiations of this formalism make many assumptions - for example, that data are i.i.d.: independent and identically distributed - whose soundness is seldom investigated. And in making great progress in such a short time, the field has developed many norms and ad-hoc standards, focused on a relatively small range of problem settings. As applications of ML, particularly in artificial intelligence (AI) systems, become more pervasive in the real world, we need to critically examine these assumptions, norms, and problem settings, as well as the methods that have become de-facto standards. There is much we still do not understand about how and why deep networks trained with stochastic gradient descent are able to generalize as well as they do, why they fail when they do, and how they will perform on out-of-distribution data. In this thesis I cover some of my work towards better understanding deep net generalization, identify several ways assumptions and problem settings fail to generalize to the real world, and propose ways to address those failures in practice.

Related papers

Towards Few-Shot Learning in the Open World: A Review and Beyond [52.41344813375177]
Few-shot learning aims to mimic human intelligence by enabling significant generalizations and transferability. This paper presents a review of recent advancements designed to adapt FSL for use in open-world settings. We categorize existing methods into three distinct types of open-world few-shot learning: those involving varying instances, varying classes, and varying distributions.
arXiv Detail & Related papers (2024-08-19T06:23:21Z)
Can LLMs Separate Instructions From Data? And What Do We Even Mean By That? [60.50127555651554]
Large Language Models (LLMs) show impressive results in numerous practical applications, but they lack essential safety features. This makes them vulnerable to manipulations such as indirect prompt injections and generally unsuitable for safety-critical tasks. We introduce a formal measure for instruction-data separation and an empirical variant that is calculable from a model's outputs.
arXiv Detail & Related papers (2024-03-11T15:48:56Z)
Implicit Bias of Policy Gradient in Linear Quadratic Control: Extrapolation to Unseen Initial States [52.56827348431552]
gradient descent frequently exhibits an implicit bias that leads to excellent performance on unseen data. This paper theoretically studies the implicit bias of policy gradient in terms of extrapolation to unseen initial states.
arXiv Detail & Related papers (2024-02-12T18:41:31Z)
Data Distribution Bottlenecks in Grounding Language Models to Knowledge Bases [9.610231090476857]
Language models (LMs) have already demonstrated remarkable abilities in understanding and generating both natural and formal language. This paper is an experimental investigation aimed at uncovering the challenges that LMs encounter when tasked with knowledge base question answering (KBQA) Our experiments reveal that even when employed with our proposed data augmentation techniques, advanced small and large language models exhibit poor performance in various dimensions.
arXiv Detail & Related papers (2023-09-15T12:06:45Z)
Deep Transfer Learning for Automatic Speech Recognition: Towards Better Generalization [3.6393183544320236]
Speech recognition has become an important challenge when using deep learning (DL) It requires large-scale training datasets and high computational and storage resources. Deep transfer learning (DTL) has been introduced to overcome these issues.
arXiv Detail & Related papers (2023-04-27T21:08:05Z)
Fairness and Accuracy under Domain Generalization [10.661409428935494]
Concerns have arisen that machine learning algorithms may be biased against certain social groups. Many approaches have been proposed to make ML models fair, but they typically rely on the assumption that data distributions in training and deployment are identical. We study the transfer of both fairness and accuracy under domain generalization where the data at test time may be sampled from never-before-seen domains.
arXiv Detail & Related papers (2023-01-30T23:10:17Z)
A Survey of Learning on Small Data: Generalization, Optimization, and Challenge [101.27154181792567]
Learning on small data that approximates the generalization ability of big data is one of the ultimate purposes of AI. This survey follows the active sampling theory under a PAC framework to analyze the generalization error and label complexity of learning on small data. Multiple data applications that may benefit from efficient small data representation are surveyed.
arXiv Detail & Related papers (2022-07-29T02:34:19Z)
On the Generalization Mystery in Deep Learning [15.2292571922932]
We argue that the answer to two questions lies in the interaction of the gradients of different examples during training. We formalize this argument with an easy to compute and interpretable metric for coherence. The theory also explains a number of other phenomena in deep learning, such as why some examples are reliably learned earlier than others.
arXiv Detail & Related papers (2022-03-18T16:09:53Z)
CMW-Net: Learning a Class-Aware Sample Weighting Mapping for Robust Deep Learning [55.733193075728096]
Modern deep neural networks can easily overfit to biased training data containing corrupted labels or class imbalance. Sample re-weighting methods are popularly used to alleviate this data bias issue. We propose a meta-model capable of adaptively learning an explicit weighting scheme directly from data.
arXiv Detail & Related papers (2022-02-11T13:49:51Z)
Out of Distribution Generalization in Machine Learning [0.0]
In everyday situations when models are tested in slightly different data than they were trained on, ML algorithms can fail spectacularly. This research attempts to formally define this problem, what sets of assumptions are reasonable to make in our data. Then, we focus on a certain class of out of distribution problems, their assumptions, and introduce simple algorithms that follow from these assumptions.
arXiv Detail & Related papers (2021-03-03T20:35:19Z)
Underspecification Presents Challenges for Credibility in Modern Machine Learning [95.90009829265297]
Underspecification is common in modern ML pipelines, such as those based on deep learning. We show here that such predictors can behave very differently in deployment domains. This ambiguity can lead to instability and poor model behavior in practice.
arXiv Detail & Related papers (2020-11-06T14:53:13Z)

This list is automatically generated from the titles and abstracts of the papers in this site.