Related papers: CT-Eval: Benchmarking Chinese Text-to-Table Performance in Large Language Models

CT-Eval: Benchmarking Chinese Text-to-Table Performance in Large Language Models

URL: http://arxiv.org/abs/2405.12174v1
Date: Mon, 20 May 2024 16:58:02 GMT
Title: CT-Eval: Benchmarking Chinese Text-to-Table Performance in Large Language Models
Authors: Haoxiang Shi, Jiaan Wang, Jiarong Xu, Cen Wang, Tetsuya Sakai,
Abstract summary: Existing text-to-table datasets are typically oriented English. Large language models (LLMs) have shown great success as general task solvers in multi-lingual settings. We propose a Chinese text-to-table dataset, CT-Eval, to benchmark LLMs on this task.
Score: 36.82189550072201
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Text-to-Table aims to generate structured tables to convey the key information from unstructured documents. Existing text-to-table datasets are typically oriented English, limiting the research in non-English languages. Meanwhile, the emergence of large language models (LLMs) has shown great success as general task solvers in multi-lingual settings (e.g., ChatGPT), theoretically enabling text-to-table in other languages. In this paper, we propose a Chinese text-to-table dataset, CT-Eval, to benchmark LLMs on this task. Our preliminary analysis of English text-to-table datasets highlights two key factors for dataset construction: data diversity and data hallucination. Inspired by this, the CT-Eval dataset selects a popular Chinese multidisciplinary online encyclopedia as the source and covers 28 domains to ensure data diversity. To minimize data hallucination, we first train an LLM to judge and filter out the task samples with hallucination, then employ human annotators to clean the hallucinations in the validation and testing sets. After this process, CT-Eval contains 88.6K task samples. Using CT-Eval, we evaluate the performance of open-source and closed-source LLMs. Our results reveal that zero-shot LLMs (including GPT-4) still have a significant performance gap compared with human judgment. Furthermore, after fine-tuning, open-source LLMs can significantly improve their text-to-table ability, outperforming GPT-4 by a large margin. In short, CT-Eval not only helps researchers evaluate and quickly understand the Chinese text-to-table ability of existing LLMs but also serves as a valuable resource to significantly improve the text-to-table performance of LLMs.

Related papers

An Evaluation of Large Language Models on Text Summarization Tasks Using Prompt Engineering Techniques [0.0]
Large Language Models (LLMs) continue to advance natural language processing with their ability to generate human-like text.<n>We present a systematic evaluation of six LLMs across four datasets: CNN/Daily Mail and NewsRoom (news), SAMSum (dialog), and ArXiv (scientific)<n>Our study evaluates the performance using the ROUGE and BERTScore metrics.<n>For Long documents, introduce a sentence-based chunking strategy that enables LLMs with shorter context windows to summarize extended inputs in multiple stages.
arXiv Detail & Related papers (2025-07-07T15:34:05Z)
Understanding LLMs' Cross-Lingual Context Retrieval: How Good It Is And Where It Comes From [61.63091726904068]
We evaluate the cross-lingual context retrieval ability of over 40 large language models (LLMs) across 12 languages. Several small, post-trained open LLMs show strong cross-lingual context retrieval ability. Our results also indicate that larger-scale pretraining cannot improve the xMRC performance.
arXiv Detail & Related papers (2025-04-15T06:35:27Z)
Synthetic Data Generation for Culturally Nuanced Commonsense Reasoning in Low-Resource Languages [5.376127198656944]
We compare three dataset creation strategies: (1) LLM-assisted dataset generation, (2) machine translation, and (3) human-written data by native speakers, to build a culturally nuanced story comprehension dataset. Our findings indicate that LLM-assisted data creation outperforms machine translation.
arXiv Detail & Related papers (2025-02-18T15:14:58Z)
Think Carefully and Check Again! Meta-Generation Unlocking LLMs for Low-Resource Cross-Lingual Summarization [108.6908427615402]
Cross-lingual summarization ( CLS) aims to generate a summary for the source text in a different target language. Currently, instruction-tuned large language models (LLMs) excel at various English tasks. Recent studies have shown that LLMs' performance on CLS tasks remains unsatisfactory even with few-shot settings.
arXiv Detail & Related papers (2024-10-26T00:39:44Z)
CUDRT: Benchmarking the Detection Models of Human vs. Large Language Models Generated Texts [9.682499180341273]
Large language models (LLMs) have greatly enhanced text generation across industries. Their human-like outputs make distinguishing between human and AI authorship challenging. Current benchmarks mainly rely on static datasets, limiting their effectiveness in assessing model-based detectors.
arXiv Detail & Related papers (2024-06-13T12:43:40Z)
Text-Tuple-Table: Towards Information Integration in Text-to-Table Generation via Global Tuple Extraction [36.915250638481986]
We introduce LiveSum, a new benchmark dataset for generating summary tables of competitions based on real-time commentary texts. We evaluate the performances of state-of-the-art Large Language Models on this task in both fine-tuning and zero-shot settings. We additionally propose a novel pipeline called $T3$(Text-Tuple-Table) to improve their performances.
arXiv Detail & Related papers (2024-04-22T14:31:28Z)
UltraLink: An Open-Source Knowledge-Enhanced Multilingual Supervised Fine-tuning Dataset [69.33424532827608]
Open-source large language models (LLMs) have gained significant strength across diverse fields. In this work, we construct an open-source multilingual supervised fine-tuning dataset. The resulting UltraLink dataset comprises approximately 1 million samples across five languages.
arXiv Detail & Related papers (2024-02-07T05:05:53Z)
CulturaX: A Cleaned, Enormous, and Multilingual Dataset for Large Language Models in 167 Languages [86.90220551111096]
Training datasets for large language models (LLMs) are often not fully disclosed. We present CulturaX, a substantial multilingual dataset with 6.3 trillion tokens in 167 languages.
arXiv Detail & Related papers (2023-09-17T23:49:10Z)
Democratizing LLMs for Low-Resource Languages by Leveraging their English Dominant Abilities with Linguistically-Diverse Prompts [75.33019401706188]
Large language models (LLMs) are known to effectively perform tasks by simply observing few exemplars. We propose to assemble synthetic exemplars from a diverse set of high-resource languages to prompt the LLMs to translate from any language into English. Our unsupervised prompting method performs on par with supervised few-shot learning in LLMs of different sizes for translations between English and 13 Indic and 21 African low-resource languages.
arXiv Detail & Related papers (2023-06-20T08:27:47Z)
LLM-powered Data Augmentation for Enhanced Cross-lingual Performance [24.20730298894794]
This paper explores the potential of leveraging Large Language Models (LLMs) for data augmentation in commonsense reasoning datasets. To achieve this, we utilise several LLMs, namely Dolly-v2, StableVicuna, ChatGPT, and GPT-4, to augment three datasets: XCOPA, XWinograd, and XStoryCloze. We evaluate the effectiveness of fine-tuning smaller multilingual models, mBERT and XLMR, using the synthesised data.
arXiv Detail & Related papers (2023-05-23T17:33:27Z)
Understanding Translationese in Cross-Lingual Summarization [106.69566000567598]
Cross-lingual summarization (MS) aims at generating a concise summary in a different target language. To collect large-scale CLS data, existing datasets typically involve translation in their creation. In this paper, we first confirm that different approaches of constructing CLS datasets will lead to different degrees of translationese.
arXiv Detail & Related papers (2022-12-14T13:41:49Z)

This list is automatically generated from the titles and abstracts of the papers in this site.