Speak It Out: Solving Symbol-Related Problems with Symbol-to-Language
Conversion for Language Models
- URL: http://arxiv.org/abs/2401.11725v2
- Date: Tue, 12 Mar 2024 15:48:17 GMT
- Title: Speak It Out: Solving Symbol-Related Problems with Symbol-to-Language
Conversion for Language Models
- Authors: Yile Wang, Sijie Cheng, Zixin Sun, Peng Li, Yang Liu
- Abstract summary: Symbols play important roles in various tasks such as abstract reasoning, chemical property prediction, and table question answering.
Despite impressive natural language comprehension capabilities, large language models' reasoning abilities for symbols remain inadequate.
We propose symbol-to-language (S2L), a tuning-free method that enables large language models to solve symbol-related problems with information expressed in natural language.
- Score: 16.265409100706584
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Symbols (or more broadly, non-natural language textual representations) such
as numerical sequences, molecular formulas, and table delimiters widely exist,
playing important roles in various tasks such as abstract reasoning, chemical
property prediction, and table question answering. Despite the impressive
natural language comprehension capabilities of large language models (LLMs),
their reasoning abilities for symbols remain inadequate, which could be attributed
to the difference between symbol representations and general natural languages.
We propose symbol-to-language (S2L), a tuning-free method that enables large
language models to solve symbol-related problems with information expressed in
natural language. Specifically, S2L first converts the symbols involved to
language-based representations, which can be implemented by prompting LLMs or
leveraging external tools, then these language-based representations are
integrated into the original problem via direct substitution or concatenation,
serving as useful input information for LLMs. We evaluate the S2L method using
both API-based (GPT-4, ChatGPT) and open-source (OpenChat) models over eight
symbol-related tasks, ranging from symbol-only abstract reasoning to sentiment
analysis in social media. Experimental results show that S2L consistently leads
to superior performance. For example, applying S2L to GPT-4 yields significant
average improvements of +21.9% and +9.5% on subtasks of 1D-ARC and the Dyck
language, respectively. Code and data are available at
https://github.com/THUNLP-MT/symbol2language.
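To make the two-step pipeline concrete, below is a minimal sketch of the S2L idea, assuming an OpenAI-style chat API: one call asks the model to verbalize the symbols in natural language, and the resulting description is then substituted for (or concatenated with) the symbols in the original problem before solving. The prompts, helper names, and use of the `openai` client are illustrative assumptions, not the authors' released implementation (see the repository linked above for that).

```python
# Minimal, illustrative sketch of symbol-to-language (S2L); not the authors' code.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment


def chat(prompt: str, model: str = "gpt-4") -> str:
    """Single-turn chat completion used for both conversion and solving."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content.strip()


def symbol_to_language(symbols: str) -> str:
    """Step 1: ask the LLM to verbalize the symbolic input in natural language."""
    return chat(
        "Describe the following symbolic input in plain natural language, "
        f"making its structure and meaning explicit:\n{symbols}"
    )


def solve_with_s2l(problem: str, symbols: str, mode: str = "concat") -> str:
    """Step 2: integrate the description via substitution or concatenation, then solve."""
    description = symbol_to_language(symbols)
    if mode == "substitute":
        augmented = problem.replace(symbols, description)  # direct substitution
    else:
        augmented = f"{problem}\n\nNatural-language description of the symbols: {description}"
    return chat(augmented)


if __name__ == "__main__":
    # Toy Dyck-language-style example: complete an unbalanced bracket sequence.
    symbols = "( [ { } ]"
    problem = f"Complete the sequence so that every bracket is closed: {symbols}"
    print(solve_with_s2l(problem, symbols, mode="concat"))
```

Whether substitution or concatenation works better is task-dependent; the abstract lists both as integration modes, and the conversion step can also be handled by an external tool (e.g., a chemistry toolkit for molecular formulas) rather than a prompted LLM.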
Related papers
- LangSAMP: Language-Script Aware Multilingual Pretraining [48.16511046793275]
Recent multilingual pretrained language models (mPLMs) often avoid using language embeddings.
LangSAMP incorporates both language and script embeddings to enhance representation learning.
We apply LangSAMP to the continual pretraining of XLM-R on a highly multilingual corpus covering more than 500 languages.
arXiv Detail & Related papers (2024-09-26T18:29:10Z) - MLLM-SR: Conversational Symbolic Regression base Multi-Modal Large Language Models [13.136507215114722]
MLLM-SR is a conversational symbolic regression method that can generate expressions meeting given requirements simply by describing those requirements in natural language instructions.
We experimentally demonstrate that MLLM-SR can well understand the prior knowledge we add to the natural language instructions.
arXiv Detail & Related papers (2024-06-08T09:17:54Z) - MYTE: Morphology-Driven Byte Encoding for Better and Fairer Multilingual Language Modeling [70.34758460372629]
We introduce a new paradigm that encodes the same information with segments of consistent size across diverse languages.
MYTE produces shorter encodings for all 99 analyzed languages.
This, in turn, improves multilingual LM performance and diminishes the perplexity gap across diverse languages.
arXiv Detail & Related papers (2024-03-15T21:21:11Z) - Symbol-LLM: Leverage Language Models for Symbolic System in Visual Human
Activity Reasoning [58.5857133154749]
We propose a new symbolic system with broad-coverage symbols and rational rules.
We leverage the recent advancement of LLMs as an approximation of the two ideal properties.
Our method shows superiority in extensive activity understanding tasks.
arXiv Detail & Related papers (2023-11-29T05:27:14Z) - Symbol-LLM: Towards Foundational Symbol-centric Interface For Large
Language Models [41.91490484827197]
Injecting a collection of symbolic data directly into the training of Large Language Models can be problematic.
In this work, we tackle these challenges from both a data and framework perspective and introduce Symbol-LLM series models.
Extensive experiments on both symbol- and NL-centric tasks demonstrate the balanced and superior performances of Symbol-LLM series models.
arXiv Detail & Related papers (2023-11-15T18:59:56Z) - Soft Language Clustering for Multilingual Model Pre-training [57.18058739931463]
We propose XLM-P, which contextually retrieves prompts as flexible guidance for encoding instances conditionally.
Our XLM-P enables (1) lightweight modeling of language-invariant and language-specific knowledge across languages, and (2) easy integration with other multilingual pre-training methods.
arXiv Detail & Related papers (2023-06-13T08:08:08Z) - CiCo: Domain-Aware Sign Language Retrieval via Cross-Lingual Contrastive
Learning [38.83062453145388]
Sign language retrieval consists of two sub-tasks: text-to-sign-video (T2V) retrieval and sign-video-to-text (V2T) retrieval.
We take into account the linguistic properties of both sign languages and natural languages, and simultaneously identify the fine-grained cross-lingual mappings.
Our framework outperforms the pioneering method by large margins on various datasets.
arXiv Detail & Related papers (2023-03-22T17:59:59Z) - Adapters for Enhanced Modeling of Multilingual Knowledge and Text [54.02078328453149]
Language models have been extended to multilingual language models (MLLMs).
Knowledge graphs contain facts in an explicit triple format, which require careful curation and are only available in a few high-resource languages.
We propose to enhance MLLMs with knowledge from multilingual knowledge graphs (MLKGs) so as to tackle language and knowledge graph tasks across many languages.
arXiv Detail & Related papers (2022-10-24T21:33:42Z) - The Geometry of Multilingual Language Model Representations [25.880639246639323]
We assess how multilingual language models maintain a shared multilingual representation space while still encoding language-sensitive information in each language.
The subspace means differ along language-sensitive axes that are relatively stable throughout middle layers, and these axes encode information such as token vocabularies.
We visualize representations projected onto language-sensitive and language-neutral axes, identifying language family and part-of-speech clusters, along with spirals, toruses, and curves representing token position information.
arXiv Detail & Related papers (2022-05-22T23:58:24Z) - Revisiting Language Encoding in Learning Multilingual Representations [70.01772581545103]
We propose a new approach called Cross-lingual Language Projection (XLP) to replace language embedding.
XLP projects the word embeddings into language-specific semantic space, and then the projected embeddings will be fed into the Transformer model.
Experiments show that XLP can freely and significantly boost the model performance on extensive multilingual benchmark datasets.
arXiv Detail & Related papers (2021-02-16T18:47:10Z) - FILTER: An Enhanced Fusion Method for Cross-lingual Language
Understanding [85.29270319872597]
We propose an enhanced fusion method that takes cross-lingual data as input for XLM finetuning.
During inference, the model makes predictions based on the text input in the target language and its translation in the source language.
We further propose an additional KL-divergence self-teaching loss for model training, based on auto-generated soft pseudo-labels for translated text in the target language.
arXiv Detail & Related papers (2020-09-10T22:42:15Z)
This list is automatically generated from the titles and abstracts of the papers on this site.