Related papers: Gradient-based inference of abstract task representations for generalization in neural networks


URL: http://arxiv.org/abs/2407.17356v1
Date: Wed, 24 Jul 2024 15:28:08 GMT
Title: Gradient-based inference of abstract task representations for generalization in neural networks
Authors: Ali Hummos, Felipe del Río, Brabeeba Mien Wang, Julio Hurtado, Cristian B. Calderon, Guangyu Robert Yang,
Abstract summary: We show that gradients backpropagated through a neural network to a task representation layer are an efficient way to infer current task demands. We demonstrate that gradient-based inference improves learning efficiency and generalization to novel tasks, and limits forgetting.
Score: 5.794537047184604
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Humans and many animals show remarkably adaptive behavior and can respond differently to the same input depending on their internal goals. The brain not only represents the intermediate abstractions needed to perform a computation but also actively maintains a representation of the computation itself (task abstraction). Such separation of the computation and its abstraction is associated with faster learning, flexible decision-making, and broad generalization capacity. We investigate whether such benefits might extend to neural networks trained with task abstractions. For such benefits to emerge, one needs a task inference mechanism that possesses two crucial abilities: first, the ability to infer abstract task representations when they are no longer explicitly provided (task inference), and second, the ability to manipulate task representations to adapt to novel problems (task recomposition). To tackle this, we cast task inference as an optimization problem from a variational inference perspective and ground our approach in an expectation-maximization framework. We show that gradients backpropagated through a neural network to a task representation layer are an efficient heuristic to infer current task demands, a process we refer to as gradient-based inference (GBI). Further iterative optimization of the task representation layer allows for recomposing abstractions to adapt to novel situations. Using a toy example, a novel image classifier, and a language model, we demonstrate that GBI improves learning efficiency and generalization to novel tasks, and limits forgetting. Moreover, we show that GBI has unique advantages such as preserving information for uncertainty estimation and detecting out-of-distribution samples.
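To make the core idea concrete, here is a minimal sketch of gradient-based inference in plain Python. It assumes a hypothetical toy network whose output is a task-weighted sum of frozen per-task linear heads, with the task representation parameterized as a softmax over logits. The names `infer_task`, `grad_logits`, `head_outputs`, and the toy data are illustrative, not the authors' code, and the paper's variational / expectation-maximization framing is not reproduced. Starting from a uniform task vector, a single gradient step on the logits already points toward the task that generated the data; further steps refine the task vector.

```python
import math
import random


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def head_outputs(weights, x):
    """Output of each frozen per-task linear head on input x (hypothetical toy network)."""
    return [sum(wi * xi for wi, xi in zip(wk, x)) for wk in weights]


def grad_logits(weights, logits, batch):
    """Gradient of the mean squared error w.r.t. the task logits only.

    The head weights stay frozen; the error is backpropagated no further
    than the task-representation layer.
    """
    z = softmax(logits)
    g_z = [0.0] * len(z)
    for x, y in batch:
        heads = head_outputs(weights, x)
        err = sum(zk * hk for zk, hk in zip(z, heads)) - y
        for k, hk in enumerate(heads):
            g_z[k] += 2.0 * err * hk / len(batch)
    # Chain rule through the softmax: dL/dl_j = z_j * (g_j - sum_k z_k g_k).
    dot = sum(zk * gk for zk, gk in zip(z, g_z))
    return [zj * (gj - dot) for zj, gj in zip(z, g_z)]


def infer_task(weights, batch, steps=1, lr=1.0):
    """Infer a task vector from (x, y) pairs by gradient descent on the logits."""
    logits = [0.0] * len(weights)  # uninformed start: uniform task vector
    for _ in range(steps):
        grad = grad_logits(weights, logits, batch)
        logits = [lv - lr * g for lv, g in zip(logits, grad)]
    return softmax(logits)


# Hypothetical toy setup: three frozen heads; the data are generated by head 1.
weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
rng = random.Random(0)
xs = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(50)]
batch = [(x, x[1]) for x in xs]
print(infer_task(weights, batch, steps=1))    # one step already favors task 1
print(infer_task(weights, batch, steps=200))  # iterating lets task 1 dominate
```

If the data instead came from a mixture of the known heads (say y = 0.5 x0 + 0.5 x1), the iterated task vector would settle on a mixture rather than a one-hot code, which is one way to picture the recomposition the abstract describes.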

Related papers

Algorithm Development in Neural Networks: Insights from the Streaming Parity Task [8.188549368578704]
We study the learning dynamics of neural networks trained on a streaming parity task. We show that, with sufficient finite training experience, RNNs exhibit a phase transition to perfect infinite generalization. Our results disclose one mechanism by which neural networks can generalize infinitely from finite training experience.
arXiv (2025-07-14T04:07:43Z)
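For readers unfamiliar with the task, the sketch below generates streaming parity targets, assuming the common formulation in which the target at each step is the parity (XOR) of all bits seen so far. `streaming_parity` and `random_sequences` are hypothetical helper names. Because parity is computed exactly by a two-state automaton, a network that learns such a state-tracking solution would be correct at any sequence length, which is one way to read "infinite generalization" here.

```python
import random


def streaming_parity(bits):
    """Running parity: target[t] = bits[0] XOR ... XOR bits[t]."""
    targets = []
    state = 0  # two-state automaton: 0 = even number of ones so far, 1 = odd
    for b in bits:
        state ^= b
        targets.append(state)
    return targets


def random_sequences(n, max_len, seed=0):
    """Hypothetical finite training set: short random bit strings with their targets."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        length = rng.randint(1, max_len)
        bits = [rng.randint(0, 1) for _ in range(length)]
        data.append((bits, streaming_parity(bits)))
    return data


print(streaming_parity([1, 0, 1, 1]))  # [1, 1, 0, 1]
```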
Learning Task Representations from In-Context Learning [73.72066284711462]
Large language models (LLMs) have demonstrated remarkable proficiency in in-context learning. We introduce an automated formulation for encoding task information in ICL prompts as a function of attention heads. We show that our method's effectiveness stems from aligning the distribution of the last hidden state with that of an optimally performing in-context-learned model.
arXiv (2025-02-08T00:16:44Z)
Human-Guided Complexity-Controlled Abstractions [30.38996929410352]
We train neural models to generate a spectrum of discrete representations and control the complexity. We show that tuning the representation to a task-appropriate complexity level supports the highest finetuning performance. Our results indicate a promising direction for rapid model finetuning by leveraging human insight.
arXiv (2023-10-26T16:45:34Z)
Provable Multi-Task Representation Learning by Two-Layer ReLU Neural Networks [69.38572074372392]
We present the first results proving that feature learning occurs during training with a nonlinear model on multiple tasks. Our key insight is that multi-task pretraining induces a pseudo-contrastive loss that favors representations that align points that typically have the same label across tasks.
arXiv (2023-07-13T16:39:08Z)
Proto-Value Networks: Scaling Representation Learning with Auxiliary Tasks [33.98624423578388]
Auxiliary tasks improve representations learned by deep reinforcement learning agents. We derive a new family of auxiliary tasks based on the successor measure. We show that proto-value networks produce rich features that may be used to obtain performance comparable to established algorithms.
arXiv (2023-04-25T04:25:08Z)
Self-Supervised Learning via Maximum Entropy Coding [57.56570417545023]
We propose Maximum Entropy Coding (MEC) as a principled objective that explicitly optimizes the structure of the representation. MEC learns a more generalizable representation than previous methods based on specific pretext tasks. It achieves state-of-the-art performance consistently on various downstream tasks, including not only ImageNet linear probe, but also semi-supervised classification, object detection, instance segmentation, and object tracking.
arXiv (2022-10-20T17:58:30Z)
Learning Abstract and Transferable Representations for Planning [25.63560394067908]
We propose a framework for autonomously learning state abstractions of an agent's environment. These abstractions are task-independent, and so can be reused to solve new tasks. We show how to combine these portable representations with problem-specific ones to generate a sound description of a specific task.
arXiv (2022-05-04T14:40:04Z)
Active Multi-Task Representation Learning [50.13453053304159]
We give the first formal study on source task sampling by leveraging the techniques from active learning. We propose an algorithm that iteratively estimates the relevance of each source task to the target task and samples from each source task based on the estimated relevance.
arXiv (2022-02-02T08:23:24Z)
Representation Learning Beyond Linear Prediction Functions [33.94130046391917]
We show that diversity can be achieved when source tasks and the target task use different prediction function spaces beyond linear functions. For a general function class, we find that eluder dimension gives a lower bound on the number of tasks required for diversity.
arXiv (2021-05-31T14:21:52Z)
Learning to Relate Depth and Semantics for Unsupervised Domain Adaptation [87.1188556802942]
We present an approach for encoding visual task relationships to improve model performance in an Unsupervised Domain Adaptation (UDA) setting. We propose a novel Cross-Task Relation Layer (CTRL), which encodes task dependencies between the semantic and depth predictions. Furthermore, we propose an Iterative Self-Learning (ISL) training scheme, which exploits semantic pseudo-labels to provide extra supervision on the target domain.
arXiv (2021-05-17T13:42:09Z)
Multi-task Supervised Learning via Cross-learning [102.64082402388192]
We consider a problem known as multi-task learning, consisting of fitting a set of regression functions intended for solving different tasks. In our novel formulation, we couple the parameters of these functions, so that they learn in their task-specific domains while staying close to each other. This facilitates cross-fertilization, in which data collected across different domains help improve learning performance on each of the other tasks.
arXiv (2020-10-24T21:35:57Z)
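A sketch of the coupling idea, assuming a quadratic penalty that pulls each task's parameter toward the mean of all task parameters. The paper's own formulation may differ (the summary only says the parameters are coupled so that they stay close), and `cross_learning`, the scalar linear models, and the penalty weight `lam` are illustrative assumptions. With `lam = 0` the tasks are fit independently; increasing `lam` shrinks the gap between the per-task weights.

```python
def cross_learning(tasks, lam, lr=0.1, steps=500):
    """Fit one scalar weight per task (y ~ w_t * x) with a coupling penalty.

    Objective (assumed): sum_t [ mean((w_t * x - y)^2) + lam * (w_t - mean(w))^2 ].
    tasks: list of (xs, ys) pairs.
    """
    w = [0.0] * len(tasks)
    for _ in range(steps):
        center = sum(w) / len(w)  # shared "consensus" value the tasks are pulled toward
        grads = []
        for wt, (xs, ys) in zip(w, tasks):
            g_fit = sum(2.0 * (wt * x - y) * x for x, y in zip(xs, ys)) / len(xs)
            # The penalty's dependence on `center` contributes zero net gradient,
            # because the deviations (w_s - center) sum to zero.
            g_couple = 2.0 * lam * (wt - center)
            grads.append(g_fit + g_couple)
        w = [wt - lr * g for wt, g in zip(w, grads)]
    return w


xs = [-1.0, -0.5, 0.5, 1.0]
tasks = [(xs, [1.0 * x for x in xs]), (xs, [3.0 * x for x in xs])]
print(cross_learning(tasks, lam=0.0))  # approximately [1.0, 3.0]
print(cross_learning(tasks, lam=1.0))  # pulled together, roughly [1.62, 2.38]
```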
Learning What Makes a Difference from Counterfactual Examples and Gradient Supervision [57.14468881854616]
We propose an auxiliary training objective that improves the generalization capabilities of neural networks. We use pairs of minimally-different examples with different labels, a.k.a. counterfactual or contrasting examples, which provide a signal indicative of the underlying causal structure of the task. Models trained with this technique demonstrate improved performance on out-of-distribution test sets.
arXiv (2020-04-20T02:47:49Z)
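A sketch of a gradient-supervision-style auxiliary loss for one counterfactual pair. It assumes the loss is one minus the cosine similarity between the model's input gradient at x and the difference vector x_cf - x, so the model is encouraged to be sensitive along the direction that actually changes the label. The toy model, the finite-difference gradient, and all names are illustrative assumptions rather than the paper's implementation; a real training loop would obtain the gradient by automatic differentiation instead.

```python
import math


def numerical_input_gradient(f, x, h=1e-5):
    """Central finite-difference estimate of the gradient of scalar f at x."""
    grad = []
    for i in range(len(x)):
        up = list(x)
        up[i] += h
        down = list(x)
        down[i] -= h
        grad.append((f(up) - f(down)) / (2.0 * h))
    return grad


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def gradient_supervision_loss(f, x, x_cf):
    """1 - cos(grad_x f(x), x_cf - x): zero when the model's sensitivity
    points along the minimal edit that flips the label."""
    g = numerical_input_gradient(f, x)
    delta = [b - a for a, b in zip(x, x_cf)]
    return 1.0 - cosine(g, delta)


W = [1.0, 0.0]  # hypothetical weights: the toy score depends only on feature 0


def model(x):
    """Toy ground-truth-class score (hypothetical)."""
    return math.tanh(sum(wi * xi for wi, xi in zip(W, x)))


x = [0.2, 0.3]
print(gradient_supervision_loss(model, x, [1.2, 0.3]))  # near 0: edit along feature 0
print(gradient_supervision_loss(model, x, [0.2, 1.3]))  # 1.0: model ignores feature 1
```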

This list is automatically generated from the titles and abstracts of the papers on this site.