Fugu-MT paper translation (abstract): Improving Logical-Level Natural Language Generation with Topic-Conditioned Data Augmentation and Logical Form Generation

Paper overview: Improving Logical-Level Natural Language Generation with Topic-Conditioned Data Augmentation and Logical Form Generation

arxiv url: http://arxiv.org/abs/2112.06240v1
Date: Sun, 12 Dec 2021 13:50:18 GMT
Status: translation complete
Last updated in system: 2021-12-14 16:21:31.056504
Title: Improving Logical-Level Natural Language Generation with Topic-Conditioned Data Augmentation and Logical Form Generation
Authors: Ao Liu, Congjian Luo, Naoaki Okazaki
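Abstract summary: This paper proposes topic-conditioned data augmentation (TopicDA), which generates logical forms and textual descriptions directly from tables. Logical form generation (LG) is a dual task of Logic2text that requires generating a valid logical form from a text description of a table. The paper also proposes a semi-supervised learning approach that jointly trains a Logic2text model and an LG model on both labeled and augmented data.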
Reference score (independently computed attention score): 18.93964332724296
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Logical Natural Language Generation, i.e., generating textual descriptions that can be logically entailed by a structured table, has been a challenge due to the low fidelity of the generation. Chen et al. (2020) have addressed this problem by annotating interim logical programs to control the generation contents and semantics, and presented the task of table-aware logical form to text (Logic2text) generation. However, although table instances are abundant in the real world, logical forms paired with textual descriptions require costly human annotation work, which limits the performance of neural models. To mitigate this, we propose topic-conditioned data augmentation (TopicDA), which utilizes GPT-2 to generate unpaired logical forms and textual descriptions directly from tables. We further introduce logical form generation (LG), a dual task of Logic2text that requires generating a valid logical form based on a text description of a table. We also propose a semi-supervised learning approach to jointly train a Logic2text and an LG model with both labeled and augmented data. The two models benefit from each other by providing extra supervision signals through back-translation. Experimental results on the Logic2text dataset and the LG task demonstrate that our approach can effectively utilize the augmented data and outperform supervised baselines by a substantial margin.
