MLPs Compass: What is learned when MLPs are combined with PLMs?
- URL: http://arxiv.org/abs/2401.01667v1
- Date: Wed, 3 Jan 2024 11:06:01 GMT
- Title: MLPs Compass: What is learned when MLPs are combined with PLMs?
- Authors: Li Zhou, Wenyu Chen, Yong Cao, Dingyi Zeng, Wanlong Liu, Hong Qu
- Abstract summary: Multilayer-Perceptron (MLP) modules achieve robust structural capture capabilities, even outperforming Graph Neural Networks (GNNs).
This paper aims to quantify whether simple MLPs can further enhance the already potent ability of PLMs to capture linguistic information.
- Score: 20.003022732050994
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While Transformer-based pre-trained language models (PLMs) and their variants
exhibit strong semantic representation capabilities, understanding the information
gain derived from the additional components of PLMs remains an open question in
this field. Motivated by recent efforts showing that Multilayer-Perceptron (MLP)
modules achieve robust structural capture capabilities, even outperforming Graph
Neural Networks (GNNs), this paper aims to quantify whether simple MLPs can
further enhance the already potent ability
of PLMs to capture linguistic information. Specifically, we design a simple yet
effective probing framework containing MLP components based on the BERT structure
and conduct extensive experiments encompassing 10 probing tasks spanning three
distinct linguistic levels. The experimental results demonstrate that MLPs can
indeed enhance the comprehension of linguistic structure by PLMs. Our research
provides interpretable and valuable insights into crafting variations of PLMs
utilizing MLPs for tasks that emphasize diverse linguistic structures.
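As a concrete illustration of the probing setup the abstract describes, below is a minimal sketch, assuming a frozen bert-base-uncased encoder whose [CLS] representation is passed through an added MLP block before a linear probing head. The layer widths, pooling choice, binary label set, and example sentence are illustrative assumptions, not the authors' exact framework.

```python
# Minimal sketch of an MLP-augmented probing classifier on top of BERT.
# NOTE: the layer widths, [CLS] pooling, frozen encoder, and binary probing
# task below are illustrative assumptions, not the paper's exact setup.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class MLPProbe(nn.Module):
    def __init__(self, plm_name="bert-base-uncased", hidden=768, num_labels=2):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(plm_name)
        # Freeze the PLM so that any change in probing accuracy is
        # attributable to the added MLP block rather than to fine-tuning.
        for p in self.encoder.parameters():
            p.requires_grad = False
        self.mlp = nn.Sequential(              # the simple MLP module under study
            nn.Linear(hidden, hidden),
            nn.GELU(),
            nn.Linear(hidden, hidden),
        )
        self.probe = nn.Linear(hidden, num_labels)   # linear probing head

    def forward(self, input_ids, attention_mask):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        cls = out.last_hidden_state[:, 0]      # [CLS] sentence representation
        return self.probe(self.mlp(cls))

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
batch = tok(["The cat sat on the mat."], return_tensors="pt")
logits = MLPProbe()(batch["input_ids"], batch["attention_mask"])
print(logits.shape)   # torch.Size([1, 2])
```

Training such a probe with and without the intermediate MLP block on each probing task would isolate the contribution of the MLP, which is roughly the comparison the abstract describes.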
Related papers
- MLPs Learn In-Context on Regression and Classification Tasks [28.13046236900491]
In-context learning (ICL) is often assumed to be a unique hallmark of Transformer models.
We demonstrate that multi-layer perceptrons (MLPs) can also learn in-context.
arXiv Detail & Related papers (2024-05-24T15:04:36Z) - If LLM Is the Wizard, Then Code Is the Wand: A Survey on How Code
Empowers Large Language Models to Serve as Intelligent Agents [81.60906807941188]
Large language models (LLMs) are trained on a combination of natural language and formal language (code).
Code translates high-level goals into executable steps, featuring standard syntax, logical consistency, abstraction, and modularity.
arXiv Detail & Related papers (2024-01-01T16:51:20Z) - Let Models Speak Ciphers: Multiagent Debate through Embeddings [84.20336971784495]
We introduce CIPHER (Communicative Inter-Model Protocol Through Embedding Representation) to address this issue.
By deviating from natural language, CIPHER offers an advantage of encoding a broader spectrum of information without any modification to the model weights.
This showcases the superiority and robustness of embeddings as an alternative "language" for communication among LLMs.
arXiv Detail & Related papers (2023-10-10T03:06:38Z) - NTK-approximating MLP Fusion for Efficient Language Model Fine-tuning [40.994306592119266]
Fine-tuning a pre-trained language model (PLM) emerges as the predominant strategy in many natural language processing applications.
Some general approaches (e.g. quantization and distillation) have been widely studied to reduce the compute/memory cost of PLM fine-tuning.
We propose to coin a lightweight PLM through NTK-approximating modules in fusion.
arXiv Detail & Related papers (2023-07-18T03:12:51Z) - How Does Pretraining Improve Discourse-Aware Translation? [41.20896077662125]
We introduce a probing task to interpret the ability of pretrained language models to capture discourse relation knowledge.
We validate three state-of-the-art PLMs across encoder-, decoder-, and encoder-decoder-based models.
Our findings are instructive to understand how and when discourse knowledge in PLMs should work for downstream tasks.
arXiv Detail & Related papers (2023-05-31T13:36:51Z) - Prompting Language Models for Linguistic Structure [73.11488464916668]
We present a structured prompting approach for linguistic structured prediction tasks.
We evaluate this approach on part-of-speech tagging, named entity recognition, and sentence chunking.
We find that while PLMs contain significant prior knowledge of task labels due to task leakage into the pretraining corpus, structured prompting can also retrieve linguistic structure with arbitrary labels.
arXiv Detail & Related papers (2022-11-15T01:13:39Z) - A Survey of Knowledge Enhanced Pre-trained Language Models [78.56931125512295]
We present a comprehensive review of Knowledge Enhanced Pre-trained Language Models (KE-PLMs).
For NLU, we divide the types of knowledge into four categories: linguistic knowledge, text knowledge, knowledge graph (KG) and rule knowledge.
The KE-PLMs for NLG are categorized into KG-based and retrieval-based methods.
arXiv Detail & Related papers (2022-11-11T04:29:02Z) - SA-MLP: Distilling Graph Knowledge from GNNs into Structure-Aware MLP [46.52398427166938]
One promising inference acceleration direction is to distill the GNNs into message-passing-free student multi-layer perceptrons (MLPs).
We introduce a novel structure-mixing knowledge distillation strategy to enhance the learning ability of students for structure information.
Our SA-MLP can consistently outperform the teacher GNNs while maintaining faster inference (see the distillation sketch after this list).
arXiv Detail & Related papers (2022-10-18T05:55:36Z) - ElitePLM: An Empirical Study on General Language Ability Evaluation of
Pretrained Language Models [78.08792285698853]
We present a large-scale empirical study on general language ability evaluation of pretrained language models (ElitePLM).
Our empirical results demonstrate that: (1) PLMs with varying training objectives and strategies are good at different ability tests; (2) fine-tuning PLMs in downstream tasks is usually sensitive to the data size and distribution; and (3) PLMs have excellent transferability between similar tasks.
arXiv Detail & Related papers (2022-05-03T14:18:10Z) - Knowledge Enhanced Pretrained Language Models: A Comprehensive Survey [8.427521246916463]
Pretrained Language Models (PLMs) have established a new paradigm through learning informative representations on large-scale text corpora.
This new paradigm has revolutionized the entire field of natural language processing, and set the new state-of-the-art performance for a wide variety of NLP tasks.
Integrating knowledge into PLMs has recently become a very active research area, and a variety of approaches have been developed.
arXiv Detail & Related papers (2021-10-16T03:27:56Z)
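For the SA-MLP entry above, the following is a minimal, hypothetical sketch of distilling a GNN teacher into a message-passing-free, structure-aware MLP student. The tiny dense GCN teacher, the adjacency-row structure encoding, the random toy graph, and all hyperparameters are assumptions for illustration; the code does not reproduce the authors' structure-mixing distillation strategy.

```python
# Hypothetical sketch: distilling a GNN teacher into a structure-aware MLP student.
# The teacher here is untrained and merely stands in for a trained GNN; the
# structure encoding (raw adjacency row) and hyperparameters are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyGCN(nn.Module):
    """Teacher: one-layer GCN over a dense, row-normalised adjacency."""
    def __init__(self, in_dim, num_classes):
        super().__init__()
        self.lin = nn.Linear(in_dim, num_classes)

    def forward(self, x, adj):
        adj_hat = adj + torch.eye(adj.size(0))            # add self-loops
        adj_hat = adj_hat / adj_hat.sum(dim=1, keepdim=True)
        return self.lin(adj_hat @ x)                      # one round of message passing

class StructureAwareMLP(nn.Module):
    """Student: message-passing-free MLP over node features plus a simple
    structural encoding (here, each node's adjacency row)."""
    def __init__(self, in_dim, num_nodes, hidden, num_classes):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim + num_nodes, hidden), nn.ReLU(),
            nn.Linear(hidden, num_classes),
        )

    def forward(self, x, adj):
        return self.net(torch.cat([x, adj], dim=-1))

# Toy graph: 6 nodes, 8-dim features, 3 classes.
n, d, c = 6, 8, 3
x = torch.randn(n, d)
adj = (torch.rand(n, n) > 0.6).float()
adj = ((adj + adj.T) > 0).float()                         # symmetric adjacency

teacher = TinyGCN(d, c)
student = StructureAwareMLP(d, n, hidden=32, num_classes=c)
opt = torch.optim.Adam(student.parameters(), lr=1e-2)

for step in range(100):
    with torch.no_grad():
        t_logits = teacher(x, adj)                        # soft targets from the teacher
    s_logits = student(x, adj)
    # Soft-label distillation: match the teacher's predicted class distribution.
    loss = F.kl_div(F.log_softmax(s_logits, dim=-1),
                    F.softmax(t_logits, dim=-1), reduction="batchmean")
    opt.zero_grad(); loss.backward(); opt.step()

print("final distillation loss:", round(loss.item(), 4))
```

Because the student never performs message passing, its forward pass is a plain MLP over concatenated features, which is where the inference speed-up over the GNN teacher comes from.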
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.