Learning Language Representations with Logical Inductive Bias
- URL: http://arxiv.org/abs/2302.09458v1
- Date: Sun, 19 Feb 2023 02:21:32 GMT
- Title: Learning Language Representations with Logical Inductive Bias
- Authors: Jianshu Chen
- Abstract summary: We explore a new logical inductive bias for better language representation learning.
We develop a novel neural architecture named FOLNet to encode this new inductive bias.
We find that the self-attention module in transformers can be expressed as the composition of two of our neural logic operators.
- Score: 19.842271716111153
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Transformer architectures have achieved great success in solving natural language tasks by learning strong language representations from large-scale unlabeled texts. In this paper, we seek to go further and explore a new logical inductive bias for better language representation learning. Logical reasoning is a formal methodology for deriving answers from given knowledge and facts. Inspired by this view, we develop a novel neural architecture named FOLNet (First-Order Logic Network) to encode this new inductive bias. We construct a set of neural logic operators as learnable Horn clauses, which are then forward-chained into a fully differentiable neural architecture (FOLNet). Interestingly, we find that the self-attention module in transformers can be expressed as the composition of two of our neural logic operators, which may explain their strong reasoning performance. Our proposed FOLNet has the same input and output interfaces as other pretrained models and can therefore be pretrained and finetuned with similar losses, allowing it to replace other pretrained models in a plug-and-play manner. With our logical inductive bias, the same set of "logic deduction skills" learned through pretraining is expected to be equally capable of solving diverse downstream tasks. For this reason, FOLNet learns language representations with much stronger transfer capabilities. Experimental results on several language understanding tasks show that our pretrained FOLNet model outperforms existing strong transformer-based approaches.
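To make the claimed decomposition concrete, the following minimal PyTorch sketch shows how a single-head self-attention block can be read as the chaining of two logic-operator-like steps: one deduces a pairwise (binary-predicate-like) relation from token features, and the other propagates token (unary-predicate-like) features along that relation. The class names, shapes, and interfaces are assumptions made for illustration; they are not the actual FOLNet operators.
```python
# Illustrative sketch only: a decomposition of single-head self-attention into
# two "logic-operator-like" steps, inspired by the FOLNet abstract. Names and
# shapes are assumptions for exposition, not the paper's definitions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class KernelCompare(nn.Module):
    """Deduce a soft binary relation r(x, y) from unary token features u(x), u(y)."""

    def __init__(self, dim, head_dim):
        super().__init__()
        self.q = nn.Linear(dim, head_dim)
        self.k = nn.Linear(dim, head_dim)
        self.scale = head_dim ** -0.5

    def forward(self, u):                        # u: (batch, n, dim)
        scores = self.q(u) @ self.k(u).transpose(-2, -1) * self.scale
        return F.softmax(scores, dim=-1)         # (batch, n, n) soft relation


class BooleanJoin(nn.Module):
    """Propagate unary features along a binary relation: v(x) = sum_y r(x, y) u(y)."""

    def __init__(self, dim, head_dim):
        super().__init__()
        self.v = nn.Linear(dim, head_dim)

    def forward(self, r, u):                     # r: (batch, n, n), u: (batch, n, dim)
        return r @ self.v(u)                     # (batch, n, head_dim)


class AttentionAsTwoOperators(nn.Module):
    """Chaining the two operators reproduces a single-head self-attention block."""

    def __init__(self, dim, head_dim):
        super().__init__()
        self.compare = KernelCompare(dim, head_dim)
        self.join = BooleanJoin(dim, head_dim)

    def forward(self, u):
        r = self.compare(u)                      # deduce a pairwise relation
        return self.join(r, u)                   # deduce new unary facts from it


if __name__ == "__main__":
    x = torch.randn(2, 16, 64)                   # (batch, seq_len, dim)
    layer = AttentionAsTwoOperators(dim=64, head_dim=32)
    print(layer(x).shape)                        # torch.Size([2, 16, 32])
```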
Related papers
- Training Neural Networks as Recognizers of Formal Languages [87.06906286950438]
Formal language theory pertains specifically to recognizers.
Yet it is common to instead evaluate neural networks on proxy tasks that are only informally similar to recognition.
We correct this mismatch by training and evaluating neural networks directly as binary classifiers of strings.
arXiv Detail & Related papers (2024-11-11T16:33:25Z)
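As an illustration of the recognizer framing in the entry above, the sketch below trains an LSTM directly as a binary classifier of strings, using membership in the toy language a^n b^n as the target. The language choice, model, and training loop are assumptions for exposition, not the cited paper's setup.
```python
# Illustrative sketch: training a neural network directly as a binary
# recognizer of strings. The toy language (a^n b^n), model, and hyperparameters
# are assumptions for exposition, not the setup used in the cited paper.
import random
import torch
import torch.nn as nn

VOCAB = {"a": 0, "b": 1}

def sample(max_len=12):
    """Return a string over {a, b} and a label: 1 if it is in a^n b^n, else 0."""
    if random.random() < 0.5:
        n = random.randint(1, max_len // 2)
        return "a" * n + "b" * n, 1
    length = random.randint(2, max_len)
    s = "".join(random.choice("ab") for _ in range(length))
    na, nb = s.count("a"), s.count("b")
    return s, int(na == nb and s == "a" * na + "b" * nb)

class Recognizer(nn.Module):
    def __init__(self, dim=32):
        super().__init__()
        self.embed = nn.Embedding(len(VOCAB), dim)
        self.rnn = nn.LSTM(dim, dim, batch_first=True)
        self.out = nn.Linear(dim, 1)

    def forward(self, ids):                      # ids: (1, seq_len)
        h, _ = self.rnn(self.embed(ids))
        return self.out(h[:, -1])                # membership logit from final state

model = Recognizer()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()

for step in range(2000):
    s, y = sample()
    ids = torch.tensor([[VOCAB[c] for c in s]])
    loss = loss_fn(model(ids), torch.tensor([[float(y)]]))
    opt.zero_grad()
    loss.backward()
    opt.step()
```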
- Empower Nested Boolean Logic via Self-Supervised Curriculum Learning [67.46052028752327]
We find that pre-trained language models, including large language models, behave like random selectors when faced with multi-nested Boolean logic.
To empower language models with this fundamental capability, this paper proposes a new self-supervised learning method, Curriculum Logical Reasoning (CLR).
arXiv Detail & Related papers (2023-10-09T06:54:02Z)
- Modeling rapid language learning by distilling Bayesian priors into artificial neural networks [18.752638142258668]
We show that learning from limited naturalistic data is possible with an approach that combines the strong inductive biases of a Bayesian model with the flexible representations of a neural network.
The resulting system can learn formal linguistic patterns from a small number of examples.
It can also learn aspects of English syntax from a corpus of natural language.
arXiv Detail & Related papers (2023-05-24T04:11:59Z)
- Join-Chain Network: A Logical Reasoning View of the Multi-head Attention in Transformer [59.73454783958702]
We propose a symbolic reasoning architecture that chains many join operators together to model output logical expressions.
In particular, we demonstrate that such an ensemble of join-chains can express a broad subset of "tree-structured" first-order logical expressions, named FOET.
We find that the widely used multi-head self-attention module in transformers can be understood as a special neural operator that implements the union bound of the join operator in probabilistic predicate space.
arXiv Detail & Related papers (2022-10-06T07:39:58Z)
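The "union bound of the join operator" mentioned in the entry above has a simple probabilistic reading: if p(x, z) and q(z, y) are soft binary predicates, then Pr[exists z: p(x, z) and q(z, y)] <= sum_z p(x, z) * q(z, y), so a clipped matrix product serves as a differentiable surrogate for the existential join. The sketch below only illustrates this reading; the function name and tensor layout are assumptions, not the paper's implementation.
```python
# Illustrative sketch: a soft relational join of probabilistic predicates and
# its union-bound surrogate. Names and shapes are assumptions for exposition.
import torch

def soft_join(p: torch.Tensor, q: torch.Tensor) -> torch.Tensor:
    """Union-bound surrogate for r(x, y) = exists z . p(x, z) and q(z, y).

    p: (n, n) with p[x, z] ~ Pr[p(x, z)];  q: (n, n) with q[z, y] ~ Pr[q(z, y)].
    Pr[exists z . p(x, z) and q(z, y)] <= sum_z p[x, z] * q[z, y], clipped to [0, 1].
    """
    return torch.clamp(p @ q, max=1.0)

# Example: chaining a "parent" relation with itself approximates "grandparent".
parent = torch.zeros(4, 4)
parent[0, 1] = 0.9   # 0 is (probably) a parent of 1
parent[1, 2] = 0.8   # 1 is (probably) a parent of 2
grandparent = soft_join(parent, parent)
print(grandparent[0, 2])   # ~0.72: soft evidence that 0 is a grandparent of 2
```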
- Pre-Training a Graph Recurrent Network for Language Representation [34.4554387894105]
We consider a graph recurrent network for language model pre-training, which builds a graph structure for each sequence with local token-level communications.
We find that our model can generate more diverse outputs with less contextualized feature redundancy than existing attention-based models.
arXiv Detail & Related papers (2022-09-08T14:12:15Z)
- Is neural language acquisition similar to natural? A chronological probing study [0.0515648410037406]
We present a chronological probing study of transformer English models such as MultiBERT and T5.
We compare the linguistic information learned by the models over the course of training on their corpora.
The results show that (1) linguistic information is acquired in the early stages of training, and (2) both language models demonstrate the capability to capture features from various levels of language.
arXiv Detail & Related papers (2022-07-01T17:24:11Z)
- LogiGAN: Learning Logical Reasoning via Adversarial Pre-training [58.11043285534766]
We present LogiGAN, an unsupervised adversarial pre-training framework for improving logical reasoning abilities of language models.
Inspired by the facilitation effect of reflective thinking in human learning, we simulate the learning-thinking process with an adversarial Generator-Verifier architecture.
Both base-size and large-size language models pre-trained with LogiGAN show clear performance improvements on 12 datasets.
arXiv Detail & Related papers (2022-05-18T08:46:49Z)
- Dynamic Inference with Neural Interpreters [72.90231306252007]
We present Neural Interpreters, an architecture that factorizes inference in a self-attention network as a system of modules.
Inputs to the model are routed through a sequence of functions in a way that is learned end-to-end.
We show that Neural Interpreters perform on par with the vision transformer while using fewer parameters, and transfer to new tasks in a sample-efficient manner.
arXiv Detail & Related papers (2021-10-12T23:22:45Z)
- Leap-Of-Thought: Teaching Pre-Trained Models to Systematically Reason Over Implicit Knowledge [96.92252296244233]
Large pre-trained language models (LMs) acquire some reasoning capacity, but this ability is difficult to control.
We show that LMs can be trained to reliably perform systematic reasoning combining both implicit, pre-trained knowledge and explicit natural language statements.
Our work paves a path towards open-domain systems that constantly improve by interacting with users who can instantly correct a model by adding simple natural language statements.
arXiv Detail & Related papers (2020-06-11T17:02:20Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.