Does Deep Learning Learn to Abstract? A Systematic Probing Framework
- URL: http://arxiv.org/abs/2302.11978v1
- Date: Thu, 23 Feb 2023 12:50:02 GMT
- Title: Does Deep Learning Learn to Abstract? A Systematic Probing Framework
- Authors: Shengnan An, Zeqi Lin, Bei Chen, Qiang Fu, Nanning Zheng, Jian-Guang
Lou
- Abstract summary: Abstraction is a desirable capability for deep learning models: the ability to induce abstract concepts from concrete instances and flexibly apply them beyond the learning context.
We introduce a systematic probing framework to explore the abstraction capability of deep learning models from a transferability perspective.
- Score: 69.2366890742283
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Abstraction is a desirable capability for deep learning models: the
ability to induce abstract concepts from concrete instances and flexibly apply
them beyond the learning context. At the same time, there is little clear
understanding of either the presence or the further characteristics of this
capability in deep learning models. In this paper, we introduce a systematic
probing framework to explore the abstraction capability of deep learning models
from a transferability perspective. A set of controlled experiments is
conducted based on this framework, providing strong evidence that two probed
pre-trained language models (PLMs), T5 and GPT2, have the abstraction
capability. We also conduct an in-depth analysis, shedding further light: (1)
the whole training phase exhibits a "memorize-then-abstract" two-stage process;
(2) the learned abstract concepts are gathered in a few middle-layer attention
heads, rather than being evenly distributed throughout the model; (3) the
probed abstraction capabilities exhibit robustness against concept mutations,
and are more robust to low-level/source-side mutations than
high-level/target-side ones; (4) generic pre-training is critical to the
emergence of abstraction capability, and PLMs exhibit better abstraction with
larger model sizes and data scales.
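As a concrete illustration of finding (2), the sketch below masks one GPT2 attention head at a time and measures how much the loss on a toy symbolic task degrades when that head is silenced. This is a minimal sketch of head-level probing, not the paper's protocol: the toy reversal task and the loss-difference importance score are illustrative assumptions; only the head_mask mechanism of the Hugging Face transformers API is relied on.

    # Minimal head-masking probe (illustrative; not the paper's exact setup).
    # Silencing a head that matters for the probed behavior should raise the loss.
    import torch
    from transformers import GPT2LMHeadModel, GPT2Tokenizer

    model = GPT2LMHeadModel.from_pretrained("gpt2")
    tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
    model.eval()

    # Toy "abstract rule" probe: sequence reversal (an assumed stand-in task).
    text = "a b c -> c b a ; x y z -> z y x"
    inputs = tokenizer(text, return_tensors="pt")
    labels = inputs["input_ids"]

    n_layers, n_heads = model.config.n_layer, model.config.n_head  # 12, 12 for gpt2

    with torch.no_grad():
        base_loss = model(**inputs, labels=labels).loss.item()

    # head_mask has shape (n_layers, n_heads); 0.0 silences a head, 1.0 keeps it.
    importance = torch.zeros(n_layers, n_heads)
    for layer in range(n_layers):
        for head in range(n_heads):
            head_mask = torch.ones(n_layers, n_heads)
            head_mask[layer, head] = 0.0
            with torch.no_grad():
                loss = model(**inputs, labels=labels, head_mask=head_mask).loss.item()
            importance[layer, head] = loss - base_loss  # larger = more critical head

    # If abstraction is concentrated, the top heads should cluster in middle layers.
    top = torch.topk(importance.flatten(), 5).indices
    print([(int(i) // n_heads, int(i) % n_heads) for i in top])

If the paper's finding holds, ranking heads by this kind of importance score on a genuine abstraction probe should surface a small set of middle-layer heads rather than a uniform spread across the model.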
Related papers
- Unified View of Grokking, Double Descent and Emergent Abilities: A Perspective from Circuits Competition [83.13280812128411]
Recent studies have uncovered intriguing phenomena in deep learning, such as grokking, double descent, and emergent abilities in large language models.
We present a comprehensive framework that provides a unified view of these three phenomena, focusing on the competition between memorization and generalization circuits.
arXiv Detail & Related papers (2024-02-23T08:14:36Z)
- Building Minimal and Reusable Causal State Abstractions for Reinforcement Learning [63.58935783293342]
Causal Bisimulation Modeling (CBM) learns the causal relationships in the dynamics and reward functions for each task to derive a minimal, task-specific abstraction.
CBM's learned implicit dynamics models identify the underlying causal relationships and state abstractions more accurately than explicit ones.
arXiv Detail & Related papers (2024-01-23T05:43:15Z)
- Neural Causal Abstractions [63.21695740637627]
We develop a new family of causal abstractions by clustering variables and their domains.
We show that such abstractions are learnable in practical settings through Neural Causal Models.
Our experiments support the theory and illustrate how to scale causal inferences to high-dimensional settings involving image data.
arXiv Detail & Related papers (2024-01-05T02:00:27Z)
- Emergence and Function of Abstract Representations in Self-Supervised Transformers [0.0]
We study the inner workings of small-scale transformers trained to reconstruct partially masked visual scenes.
We show that the network develops intermediate abstract representations, or abstractions, that encode all semantic features of the dataset.
Using precise manipulation experiments, we demonstrate that abstractions are central to the network's decision-making process.
arXiv Detail & Related papers (2023-12-08T20:47:15Z)
- AbsPyramid: Benchmarking the Abstraction Ability of Language Models with a Unified Entailment Graph [62.685920585838616]
Abstraction ability is essential to human intelligence, yet it remains under-explored in language models.
We present AbsPyramid, a unified entailment graph of 221K textual descriptions of abstraction knowledge.
arXiv Detail & Related papers (2023-11-15T18:11:23Z)
- Causal Dynamics Learning for Task-Independent State Abstraction [61.707048209272884]
We introduce Causal Dynamics Learning for Task-Independent State Abstraction (CDL).
CDL learns a theoretically proven causal dynamics model that removes unnecessary dependencies between state variables and the action; a state abstraction can then be derived from the learned dynamics (a toy sketch of this idea follows the list).
arXiv Detail & Related papers (2022-06-27T17:02:53Z)
- A Theory of Abstraction in Reinforcement Learning [18.976500531441346]
In this dissertation, I present a theory of abstraction in reinforcement learning.
I first offer three desiderata for functions that carry out the process of abstraction.
I then present a suite of new algorithms and analyses that clarify how agents can learn to abstract according to these desiderata.
arXiv Detail & Related papers (2022-03-01T12:46:28Z)
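The CDL entry above ends with deriving a state abstraction from learned dynamics. As a toy illustration of that step only, the sketch below keeps the state variables that are causal ancestors of a target variable (here, the reward) and discards the rest. The dependency graph is hand-specified for illustration, whereas CDL learns such structure from a causal dynamics model, and the ancestor criterion is one simple choice, not necessarily CDL's.

    # Toy sketch: state abstraction from a causal dependency graph (hand-specified
    # for illustration; CDL learns such structure from data).
    from collections import deque

    # parents[v] = variables whose current value causally influences v
    parents = {
        "pos":    {"pos", "vel", "action"},
        "vel":    {"vel", "action"},
        "noise":  {"noise"},   # exogenous distractor; influences nothing else
        "reward": {"pos"},
    }

    def abstraction(target="reward"):
        """Keep only state variables that are causal ancestors of the target."""
        keep, frontier = set(), deque([target])
        while frontier:
            v = frontier.popleft()
            for p in parents.get(v, ()):
                if p != "action" and p not in keep:
                    keep.add(p)
                    frontier.append(p)
        return keep

    print(sorted(abstraction()))  # ['pos', 'vel'] -- 'noise' is abstracted away

The design choice mirrors the summary: dependencies that never reach the target (here, "noise") are unnecessary and can be dropped from the state representation.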
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.