MLPs Compass: What is learned when MLPs are combined with PLMs?
- URL: http://arxiv.org/abs/2401.01667v1
- Date: Wed, 3 Jan 2024 11:06:01 GMT
- Title: MLPs Compass: What is learned when MLPs are combined with PLMs?
- Authors: Li Zhou, Wenyu Chen, Yong Cao, Dingyi Zeng, Wanlong Liu, Hong Qu
- Abstract summary: Multilayer-Perceptron (MLP) modules achieve robust structural capture capabilities, even outperforming Graph Neural Networks (GNNs).
This paper aims to quantify whether simple MLPs can further enhance the already potent ability of PLMs to capture linguistic information.
- Score: 20.003022732050994
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While Transformer-based pre-trained language models (PLMs) and their variants
exhibit strong semantic representation capabilities, understanding the information
gain derived from the additional components of PLMs remains an open question in
this field. Motivated by recent efforts showing that Multilayer-Perceptron (MLP)
modules achieve robust structural capture capabilities, even outperforming Graph
Neural Networks (GNNs), this paper aims to quantify whether simple MLPs can
further enhance the already potent ability
of PLMs to capture linguistic information. Specifically, we design a simple yet
effective probing framework containing MLP components based on the BERT structure
and conduct extensive experiments encompassing 10 probing tasks spanning three
distinct linguistic levels. The experimental results demonstrate that MLPs can
indeed enhance the comprehension of linguistic structure by PLMs. Our research
provides interpretable and valuable insights into crafting variations of PLMs
utilizing MLPs for tasks that emphasize diverse linguistic structures.
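As a concrete illustration of the probing setup the abstract describes, below is a minimal sketch, assuming a frozen bert-base-uncased encoder whose [CLS] representation is passed through an added MLP block before a linear probing head. The layer widths, pooling choice, binary label set, and example sentence are illustrative assumptions, not the authors' exact framework.

```python
# Minimal sketch of an MLP-augmented probing classifier on top of BERT.
# NOTE: the layer widths, [CLS] pooling, frozen encoder, and binary probing
# task below are illustrative assumptions, not the paper's exact setup.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class MLPProbe(nn.Module):
    def __init__(self, plm_name="bert-base-uncased", hidden=768, num_labels=2):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(plm_name)
        # Freeze the PLM so that any change in probing accuracy is
        # attributable to the added MLP block rather than to fine-tuning.
        for p in self.encoder.parameters():
            p.requires_grad = False
        self.mlp = nn.Sequential(              # the simple MLP module under study
            nn.Linear(hidden, hidden),
            nn.GELU(),
            nn.Linear(hidden, hidden),
        )
        self.probe = nn.Linear(hidden, num_labels)   # linear probing head

    def forward(self, input_ids, attention_mask):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        cls = out.last_hidden_state[:, 0]      # [CLS] sentence representation
        return self.probe(self.mlp(cls))

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
batch = tok(["The cat sat on the mat."], return_tensors="pt")
logits = MLPProbe()(batch["input_ids"], batch["attention_mask"])
print(logits.shape)   # torch.Size([1, 2])
```

Training such a probe with and without the intermediate MLP block on each probing task would isolate the contribution of the MLP, which is roughly the comparison the abstract describes.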
Related papers
- MLPs Learn In-Context on Regression and Classification Tasks [28.13046236900491]
In-context learning (ICL) is often assumed to be a unique hallmark of Transformer models.
We demonstrate that multi-layer perceptrons (MLPs) can also learn in-context.
arXiv Detail & Related papers (2024-05-24T15:04:36Z) - If LLM Is the Wizard, Then Code Is the Wand: A Survey on How Code
Empowers Large Language Models to Serve as Intelligent Agents [81.60906807941188]
Large language models (LLMs) are trained on a combination of natural language and formal language (code).
Code translates high-level goals into executable steps, featuring standard syntax, logical consistency, abstraction, and modularity.
arXiv Detail & Related papers (2024-01-01T16:51:20Z) - Let Models Speak Ciphers: Multiagent Debate through Embeddings [84.20336971784495]
We introduce CIPHER (Communicative Inter-Model Protocol Through Embedding Representation) to address this issue.
By deviating from natural language, CIPHER offers an advantage of encoding a broader spectrum of information without any modification to the model weights.
This showcases the superiority and robustness of embeddings as an alternative "language" for communication among LLMs.
arXiv Detail & Related papers (2023-10-10T03:06:38Z) - NTK-approximating MLP Fusion for Efficient Language Model Fine-tuning [40.994306592119266]
Fine-tuning a pre-trained language model (PLM) emerges as the predominant strategy in many natural language processing applications.
Some general approaches (e.g. quantization and distillation) have been widely studied to reduce the compute/memory cost of PLM fine-tuning.
We propose to coin a lightweight PLM through NTK-approximating modules in fusion.
arXiv Detail & Related papers (2023-07-18T03:12:51Z) - How Does Pretraining Improve Discourse-Aware Translation? [41.20896077662125]
We introduce a probing task to interpret the ability of pretrained language models to capture discourse relation knowledge.
We validate three state-of-the-art PLMs across encoder-, decoder-, and encoder-decoder-based models.
Our findings are instructive to understand how and when discourse knowledge in PLMs should work for downstream tasks.
arXiv Detail & Related papers (2023-05-31T13:36:51Z) - Prompting Language Models for Linguistic Structure [73.11488464916668]
We present a structured prompting approach for linguistic structured prediction tasks.
We evaluate this approach on part-of-speech tagging, named entity recognition, and sentence chunking.
We find that while PLMs contain significant prior knowledge of task labels due to task leakage into the pretraining corpus, structured prompting can also retrieve linguistic structure with arbitrary labels.
arXiv Detail & Related papers (2022-11-15T01:13:39Z) - A Survey of Knowledge Enhanced Pre-trained Language Models [78.56931125512295]
We present a comprehensive review of Knowledge Enhanced Pre-trained Language Models (KE-PLMs).
For NLU, we divide the types of knowledge into four categories: linguistic knowledge, text knowledge, knowledge graph (KG) and rule knowledge.
The KE-PLMs for NLG are categorized into KG-based and retrieval-based methods.
arXiv Detail & Related papers (2022-11-11T04:29:02Z) - SA-MLP: Distilling Graph Knowledge from GNNs into Structure-Aware MLP [46.52398427166938]
One promising inference acceleration direction is to distill the GNNs into message-passing-free student multi-layer perceptrons (MLPs).
We introduce a novel structure-mixing knowledge distillation strategy to enhance the learning ability of students for structure information.
Our SA-MLP can consistently outperform the teacher GNNs while maintaining faster inference (see the distillation sketch after this list).
arXiv Detail & Related papers (2022-10-18T05:55:36Z) - ElitePLM: An Empirical Study on General Language Ability Evaluation of
Pretrained Language Models [78.08792285698853]
We present a large-scale empirical study on general language ability evaluation of pretrained language models (ElitePLM).
Our empirical results demonstrate that: (1) PLMs with varying training objectives and strategies are good at different ability tests; (2) fine-tuning PLMs in downstream tasks is usually sensitive to the data size and distribution; and (3) PLMs have excellent transferability between similar tasks.
arXiv Detail & Related papers (2022-05-03T14:18:10Z) - Knowledge Enhanced Pretrained Language Models: A Comprehensive Survey [8.427521246916463]
Pretrained Language Models (PLMs) have established a new paradigm through learning informative representations on large-scale text corpora.
This new paradigm has revolutionized the entire field of natural language processing, and set the new state-of-the-art performance for a wide variety of NLP tasks.
Integrating knowledge into PLMs has recently become a very active research area, and a variety of approaches have been developed.
arXiv Detail & Related papers (2021-10-16T03:27:56Z)
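For the SA-MLP entry above, the following is a minimal, hypothetical sketch of distilling a GNN teacher into a message-passing-free, structure-aware MLP student. The tiny dense GCN teacher, the adjacency-row structure encoding, the random toy graph, and all hyperparameters are assumptions for illustration; the code does not reproduce the authors' structure-mixing distillation strategy.

```python
# Hypothetical sketch: distilling a GNN teacher into a structure-aware MLP student.
# The teacher here is untrained and merely stands in for a trained GNN; the
# structure encoding (raw adjacency row) and hyperparameters are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyGCN(nn.Module):
    """Teacher: one-layer GCN over a dense, row-normalised adjacency."""
    def __init__(self, in_dim, num_classes):
        super().__init__()
        self.lin = nn.Linear(in_dim, num_classes)

    def forward(self, x, adj):
        adj_hat = adj + torch.eye(adj.size(0))            # add self-loops
        adj_hat = adj_hat / adj_hat.sum(dim=1, keepdim=True)
        return self.lin(adj_hat @ x)                      # one round of message passing

class StructureAwareMLP(nn.Module):
    """Student: message-passing-free MLP over node features plus a simple
    structural encoding (here, each node's adjacency row)."""
    def __init__(self, in_dim, num_nodes, hidden, num_classes):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim + num_nodes, hidden), nn.ReLU(),
            nn.Linear(hidden, num_classes),
        )

    def forward(self, x, adj):
        return self.net(torch.cat([x, adj], dim=-1))

# Toy graph: 6 nodes, 8-dim features, 3 classes.
n, d, c = 6, 8, 3
x = torch.randn(n, d)
adj = (torch.rand(n, n) > 0.6).float()
adj = ((adj + adj.T) > 0).float()                         # symmetric adjacency

teacher = TinyGCN(d, c)
student = StructureAwareMLP(d, n, hidden=32, num_classes=c)
opt = torch.optim.Adam(student.parameters(), lr=1e-2)

for step in range(100):
    with torch.no_grad():
        t_logits = teacher(x, adj)                        # soft targets from the teacher
    s_logits = student(x, adj)
    # Soft-label distillation: match the teacher's predicted class distribution.
    loss = F.kl_div(F.log_softmax(s_logits, dim=-1),
                    F.softmax(t_logits, dim=-1), reduction="batchmean")
    opt.zero_grad(); loss.backward(); opt.step()

print("final distillation loss:", round(loss.item(), 4))
```

Because the student never performs message passing, its forward pass is a plain MLP over concatenated features, which is where the inference speed-up over the GNN teacher comes from.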
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.