Zero-Shot Cross-Lingual Summarization via Large Language Models
- URL: http://arxiv.org/abs/2302.14229v4
- Date: Tue, 24 Oct 2023 15:34:47 GMT
- Title: Zero-Shot Cross-Lingual Summarization via Large Language Models
- Authors: Jiaan Wang, Yunlong Liang, Fandong Meng, Beiqi Zou, Zhixu Li, Jianfeng Qu, Jie Zhou
- Abstract summary: Cross-lingual summarization (CLS) generates a summary in a different target language.
The recent emergence of Large Language Models (LLMs) has attracted wide attention from the computational linguistics community.
In this report, we empirically use various prompts to guide LLMs to perform zero-shot CLS from different paradigms.
- Score: 108.30673793281987
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Given a document in a source language, cross-lingual summarization (CLS) aims
to generate a summary in a different target language. Recently, the emergence
of Large Language Models (LLMs), such as GPT-3.5, ChatGPT and GPT-4, has
attracted wide attention from the computational linguistics community. However,
the performance of LLMs on CLS is not yet known. In this report, we
empirically use various prompts to guide LLMs to perform zero-shot CLS from
different paradigms (i.e., end-to-end and pipeline), and provide a preliminary
evaluation on the generated summaries. We find that ChatGPT and GPT-4
initially tend to produce lengthy summaries with detailed information. These
two LLMs can further balance informativeness and conciseness with the help of
an interactive prompt, significantly improving their CLS performance.
Experimental results on three widely-used CLS datasets show that GPT-4 achieves
state-of-the-art zero-shot CLS performance, and performs competitively compared
with the fine-tuned mBART-50. Moreover, we find that some multi-lingual and
bilingual LLMs (i.e., BLOOMZ, ChatGLM-6B, Vicuna-13B and ChatYuan) have limited
zero-shot CLS ability. Due to the composite nature of CLS, which requires
models to perform summarization and translation simultaneously, accomplishing
this task in a zero-shot manner is challenging even for LLMs. Therefore, we
recommend that future LLM research use CLS as a testbed.
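To make the end-to-end and pipeline paradigms and the interactive prompt concrete, here is a minimal prompting sketch. It assumes the openai Python client (v1 API), an illustrative model name, and example prompt wording; the exact prompts and the interactive follow-up used in the paper are not reproduced here.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment


def chat(prompt: str) -> str:
    """Send a single-turn prompt to the model and return the reply text."""
    response = client.chat.completions.create(
        model="gpt-4",  # model name is illustrative; any chat model works
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content


def cls_end_to_end(document: str, tgt_lang: str = "Chinese") -> str:
    """End-to-end paradigm: ask for a summary directly in the target language."""
    return chat(f"Please summarize the following document in {tgt_lang}:\n\n{document}")


def cls_pipeline(document: str, tgt_lang: str = "Chinese") -> str:
    """Pipeline paradigm: summarize in the source language, then translate the summary."""
    summary = chat(f"Please summarize the following document:\n\n{document}")
    return chat(f"Please translate the following text into {tgt_lang}:\n\n{summary}")


def cls_interactive(document: str, tgt_lang: str = "Chinese") -> str:
    """Interactive prompt (simplified): ask the model to shorten its first, detailed
    summary so that informativeness and conciseness are better balanced."""
    first_attempt = cls_end_to_end(document, tgt_lang)
    return chat(
        f"The following {tgt_lang} summary is too detailed:\n\n{first_attempt}\n\n"
        "Please shorten it while keeping the most important information."
    )
```

A pipeline could also translate the document first and then summarize; the sketch shows only one ordering.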
Related papers
- Investigating Large Language Models for Complex Word Identification in Multilingual and Multidomain Setups [1.8377902806196766]
Complex Word Identification (CWI) is an essential step in the lexical simplification task and has recently become a task on its own.
Large language models (LLMs) have recently become popular in the Natural Language Processing community because of their versatility and their ability to solve unseen tasks in zero/few-shot settings.
Our work investigates LLM usage, specifically open-source models such as Llama 2, Llama 3, and Vicuna v1.5, and closed-source models such as ChatGPT-3.5-turbo and GPT-4o, in the CWI, LCP, and MWE settings.
arXiv Detail & Related papers (2024-11-03T22:31:02Z)
- Think Carefully and Check Again! Meta-Generation Unlocking LLMs for Low-Resource Cross-Lingual Summarization [108.6908427615402]
Cross-lingual summarization (CLS) aims to generate a summary for the source text in a different target language.
Currently, instruction-tuned large language models (LLMs) excel at various English tasks.
Recent studies have shown that LLMs' performance on CLS tasks remains unsatisfactory even in few-shot settings.
arXiv Detail & Related papers (2024-10-26T00:39:44Z)
- ConVerSum: A Contrastive Learning-based Approach for Data-Scarce Solution of Cross-Lingual Summarization Beyond Direct Equivalents [4.029675201787349]
Cross-lingual summarization is a sophisticated branch in Natural Language Processing.
There is no feasible solution for CLS when no high-quality CLS data is available.
We propose a novel data-efficient approach, ConVerSum, for CLS leveraging the power of contrastive learning.
arXiv Detail & Related papers (2024-08-17T19:03:53Z)
- Low-Resource Cross-Lingual Summarization through Few-Shot Learning with Large Language Models [4.9325206373289125]
Cross-lingual summarization (XLS) aims to generate a summary in a target language different from the source language document.
While large language models (LLMs) have shown promising zero-shot XLS performance, their few-shot capabilities on this task remain unexplored.
We investigate the few-shot XLS performance of various models, including Mistral-7B-Instruct-v0.2, GPT-3.5, and GPT-4.
arXiv Detail & Related papers (2024-06-07T04:31:41Z)
- Large Language Models: A Survey [69.72787936480394]
Large Language Models (LLMs) have drawn a lot of attention due to their strong performance on a wide range of natural language tasks.
LLMs' ability to perform general-purpose language understanding and generation is acquired by training billions of model parameters on massive amounts of text data.
arXiv Detail & Related papers (2024-02-09T05:37:09Z)
- Supervised Knowledge Makes Large Language Models Better In-context Learners [94.89301696512776]
Large Language Models (LLMs) exhibit emerging in-context learning abilities through prompt engineering.
The challenge of improving the generalizability and factuality of LLMs in natural language understanding and question answering remains under-explored.
We propose a framework that enhances the reliability of LLMs as it: 1) generalizes out-of-distribution data, 2) elucidates how LLMs benefit from discriminative models, and 3) minimizes hallucinations in generative tasks.
arXiv Detail & Related papers (2023-12-26T07:24:46Z)
- SCALE: Synergized Collaboration of Asymmetric Language Translation Engines [105.8983433641208]
We introduce a collaborative framework that connects compact Specialized Translation Models (STMs) and general-purpose Large Language Models (LLMs) as one unified translation engine.
By introducing translations from the STM into triplet in-context demonstrations, SCALE unlocks the refinement and pivoting abilities of the LLM.
Our experiments show that SCALE significantly outperforms both few-shot LLMs (GPT-4) and specialized models (NLLB) in challenging low-resource settings.
arXiv Detail & Related papers (2023-09-29T08:46:38Z)
- LLM-Pruner: On the Structural Pruning of Large Language Models [65.02607075556742]
Large language models (LLMs) have shown remarkable capabilities in language understanding and generation.
We tackle the compression of LLMs within the bound of two constraints: being task-agnostic and minimizing the reliance on the original training dataset.
Our method, named LLM-Pruner, adopts structural pruning that selectively removes non-critical coupled structures.
arXiv Detail & Related papers (2023-05-19T12:10:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.