Speak It Out: Solving Symbol-Related Problems with Symbol-to-Language
Conversion for Language Models
- URL: http://arxiv.org/abs/2401.11725v2
- Date: Tue, 12 Mar 2024 15:48:17 GMT
- Title: Speak It Out: Solving Symbol-Related Problems with Symbol-to-Language
Conversion for Language Models
- Authors: Yile Wang, Sijie Cheng, Zixin Sun, Peng Li, Yang Liu
- Abstract summary: Symbols play important roles in various tasks such as abstract reasoning, chemical property prediction, and table question answering.
Despite impressive natural language comprehension capabilities, large language models' reasoning abilities for symbols remain inadequate.
We propose symbol-to-language (S2L), a tuning-free method that enables large language models to solve symbol-related problems with information expressed in natural language.
- Score: 16.265409100706584
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Symbols (or more broadly, non-natural language textual representations) such
as numerical sequences, molecular formulas, and table delimiters widely exist,
playing important roles in various tasks such as abstract reasoning, chemical
property prediction, and table question answering. Despite the impressive
natural language comprehension capabilities of large language models (LLMs),
their reasoning abilities for symbols remain inadequate, which could be attributed
to the difference between symbol representations and general natural languages.
We propose symbol-to-language (S2L), a tuning-free method that enables large
language models to solve symbol-related problems with information expressed in
natural language. Specifically, S2L first converts the symbols involved to
language-based representations, which can be implemented by prompting LLMs or
leveraging external tools, then these language-based representations are
integrated into the original problem via direct substitution or concatenation,
serving as useful input information for LLMs. We evaluate the S2L method using
both API-based (GPT-4, ChatGPT) and open-source (OpenChat) models over eight
symbol-related tasks, ranging from symbol-only abstract reasoning to sentiment
analysis in social media. Experimental results show that S2L consistently leads
to superior performance. For example, employing S2L with GPT-4 yields
significant average improvements of +21.9% and +9.5% on subtasks in 1D-ARC and
the Dyck language, respectively. Code and data are available at
https://github.com/THUNLP-MT/symbol2language.
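As a rough illustration of the two-step pipeline described in the abstract, the following Python sketch shows one way the S2L idea could be wired up. It is not the authors' released implementation (see the repository above): `call_llm` is a hypothetical wrapper around any chat-style LLM (GPT-4, ChatGPT, OpenChat, etc.), and the `mode` argument simply mirrors the substitution/concatenation options mentioned in the abstract.

```python
# Minimal sketch of the S2L idea (not the official code from
# https://github.com/THUNLP-MT/symbol2language).

def call_llm(prompt: str) -> str:
    """Hypothetical placeholder: send `prompt` to an LLM and return its reply."""
    raise NotImplementedError("Plug in an actual LLM API or local model here.")

def symbols_to_language(symbols: str) -> str:
    """Step 1: convert a symbolic input (e.g. a numerical sequence or a
    molecular formula) into a natural-language description. Here the LLM
    itself is prompted; the abstract notes external tools can also be used."""
    prompt = (
        "Describe the following symbolic input in plain natural language:\n"
        f"{symbols}"
    )
    return call_llm(prompt)

def solve_with_s2l(question: str, symbols: str, mode: str = "concatenate") -> str:
    """Step 2: integrate the language-based representation into the original
    problem, via direct substitution or concatenation, then ask the LLM to
    solve the augmented problem."""
    description = symbols_to_language(symbols)
    if mode == "substitute":
        augmented = question.replace(symbols, description)
    else:  # "concatenate"
        augmented = (
            f"{question}\n\n"
            f"Natural-language description of the symbols:\n{description}"
        )
    return call_llm(augmented)
```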
Related papers
- Large Language Models Share Representations of Latent Grammatical Concepts Across Typologically Diverse Languages [15.203789021094982]
In large language models (LLMs), how are multiple languages learned and encoded?
We train sparse autoencoders on Llama-3-8B and Aya-23-8B, and demonstrate that abstract grammatical concepts are often encoded in feature directions shared across many languages.
arXiv Detail & Related papers (2025-01-10T21:18:21Z)
- MLLM-SR: Conversational Symbolic Regression base Multi-Modal Large Language Models [13.136507215114722]
MLLM-SR is a conversational symbolic regression method that can generate expressions meeting stated requirements directly from natural language instructions.
We experimentally demonstrate that MLLM-SR can well understand the prior knowledge we add to the natural language instructions.
arXiv Detail & Related papers (2024-06-08T09:17:54Z)
- SignLLM: Sign Language Production Large Language Models [31.036549195000667]
We propose SignLLM, a multilingual Sign Language Production (SLP) large language model.
It includes two novel multilingual SLP modes, MLSF and Prompt2LangGloss, that enable sign language gesture generation.
For SignLLM's training, we introduce Prompt2Sign, a comprehensive multilingual sign language dataset.
arXiv Detail & Related papers (2024-05-17T12:01:43Z)
- MYTE: Morphology-Driven Byte Encoding for Better and Fairer Multilingual Language Modeling [70.34758460372629]
We introduce a new paradigm that encodes the same information with segments of consistent size across diverse languages.
MYTE produces shorter encodings for all 99 analyzed languages.
This, in turn, improves multilingual LM performance and diminishes the perplexity gap throughout diverse languages.
arXiv Detail & Related papers (2024-03-15T21:21:11Z)
- Symbol-LLM: Leverage Language Models for Symbolic System in Visual Human Activity Reasoning [58.5857133154749]
We propose a new symbolic system with broad-coverage symbols and rational rules.
We leverage the recent advancement of LLMs as an approximation of the two ideal properties.
Our method shows superiority in extensive activity understanding tasks.
arXiv Detail & Related papers (2023-11-29T05:27:14Z)
- Symbol-LLM: Towards Foundational Symbol-centric Interface For Large Language Models [41.91490484827197]
Injecting a collection of symbolic data directly into the training of Large Language Models can be problematic.
In this work, we tackle these challenges from both a data and framework perspective and introduce Symbol-LLM series models.
Extensive experiments on both symbol- and NL-centric tasks demonstrate the balanced and superior performances of Symbol-LLM series models.
arXiv Detail & Related papers (2023-11-15T18:59:56Z)
- Soft Language Clustering for Multilingual Model Pre-training [57.18058739931463]
We propose XLM-P, which contextually retrieves prompts as flexible guidance for encoding instances conditionally.
Our XLM-P enables (1) lightweight modeling of language-invariant and language-specific knowledge across languages, and (2) easy integration with other multilingual pre-training methods.
arXiv Detail & Related papers (2023-06-13T08:08:08Z)
- Adapters for Enhanced Modeling of Multilingual Knowledge and Text [54.02078328453149]
Language models have been extended to multilingual language models (MLLMs).
Knowledge graphs contain facts in an explicit triple format, which require careful curation and are only available in a few high-resource languages.
We propose to enhance MLLMs with knowledge from multilingual knowledge graphs (MLKGs) so as to tackle language and knowledge graph tasks across many languages.
arXiv Detail & Related papers (2022-10-24T21:33:42Z)
- The Geometry of Multilingual Language Model Representations [25.880639246639323]
We assess how multilingual language models maintain a shared multilingual representation space while still encoding language-sensitive information in each language.
The subspace means differ along language-sensitive axes that are relatively stable throughout middle layers, and these axes encode information such as token vocabularies.
We visualize representations projected onto language-sensitive and language-neutral axes, identifying language family and part-of-speech clusters, along with spirals, toruses, and curves representing token position information.
arXiv Detail & Related papers (2022-05-22T23:58:24Z)
- Revisiting Language Encoding in Learning Multilingual Representations [70.01772581545103]
We propose a new approach called Cross-lingual Language Projection (XLP) to replace language embedding.
XLP projects the word embeddings into language-specific semantic space, and then the projected embeddings will be fed into the Transformer model.
Experiments show that XLP can freely and significantly boost the model performance on extensive multilingual benchmark datasets.
arXiv Detail & Related papers (2021-02-16T18:47:10Z)
- FILTER: An Enhanced Fusion Method for Cross-lingual Language Understanding [85.29270319872597]
We propose an enhanced fusion method that takes cross-lingual data as input for XLM finetuning.
During inference, the model makes predictions based on the text input in the target language and its translation in the source language.
We further propose an additional KL-divergence self-teaching loss for model training, based on auto-generated soft pseudo-labels for translated text in the target language.
arXiv Detail & Related papers (2020-09-10T22:42:15Z)