Fugu-MT Paper Translation (Summary): Controlled Text Generation using T5 based Encoder-Decoder Soft Prompt Tuning and Analysis of the Utility of Generated Text in AI

Paper summary: Controlled Text Generation using T5 based Encoder-Decoder Soft Prompt Tuning and Analysis of the Utility of Generated Text in AI

arxiv url: http://arxiv.org/abs/2212.02924v1
Date: Tue, 6 Dec 2022 12:31:53 GMT
Status: Translation complete
Last updated in system: 2022-12-07 15:37:39.423272
Title: Controlled Text Generation using T5 based Encoder-Decoder Soft Prompt Tuning and Analysis of the Utility of Generated Text in AI
Title (reference translation): Controlling text generation with T5-based encoder-decoder soft prompt tuning, and an analysis of the utility of generated text in AI
Authors: Damith Chamalke Senadeera, Julia Ive
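Abstract summary: We propose a novel soft prompt tuning method that uses soft prompts at both the encoder and decoder levels of a T5 model. We also investigate the feasibility of steering the output of this extended soft-prompted T5 model at the decoder level.
Reference score (independently computed attention score): 2.381686610905853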
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Controlled text generation is a very important task in the arena of natural language processing due to its promising applications. In order to achieve this task we mainly introduce the novel soft prompt tuning method of using soft prompts at both encoder and decoder levels together in a T5 model and investigate the performance as the behaviour of an additional soft prompt related to the decoder of a T5 model in controlled text generation remained unexplored. Then we also investigate the feasibility of steering the output of this extended soft prompted T5 model at decoder level and finally analyse the utility of generated text to be used in AI related tasks such as training AI models with an interpretability analysis of the classifier trained with synthetic text, as there is a lack of proper analysis of methodologies in generating properly labelled data to be utilized in AI tasks. Through the performed in-depth intrinsic and extrinsic evaluations of this generation model along with the artificially generated data, we found that this model produced better results compared to the T5 model with a single soft prompt at encoder level and the sentiment classifier trained using this artificially generated data can produce comparable classification results to the results of a classifier trained with real labelled data and also the classifier decision is interpretable with respect to the input text content.
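Abstract (reference translation): Controlled text generation is a very important task in the field of natural language processing. To accomplish this task, we mainly introduce a novel soft prompt tuning method that uses soft prompts at both the encoder and decoder levels together in a T5 model, and we investigate its performance, since the behaviour of an additional soft prompt associated with the decoder of a T5 model in controlled text generation had remained unexplored. We also investigate the feasibility of steering the output of this extended soft-prompted T5 model at the decoder level. Because there is a lack of proper analysis of methodologies for generating properly labelled data for AI tasks, we further analyse the utility of the generated text for AI-related tasks such as training AI models, including an interpretability analysis of a classifier trained on synthetic text. Through the performed in-depth intrinsic and extrinsic evaluations of this generation model along with the artificially generated data, we found that this model produced better results compared to the T5 model with a single soft prompt at encoder level and the sentiment classifier trained using this artificially generated data can produce comparable classification results to the results of a classifier trained with real labelled data and also the classifier decision is interpretable with respect to the input text content.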
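The abstract describes soft prompts used at both the encoder and decoder levels of a T5 model. The following is a minimal pure-Python sketch of that idea under stated assumptions, not the authors' implementation: it assumes each soft prompt is a short sequence of trainable vectors prepended to the corresponding input embeddings, as in generic soft prompt tuning. The abstract does not say how the decoder-level prompt is attached, and the function name, sizes, and values here are all hypothetical.

```python
from typing import List

Vector = List[float]


def prepend_soft_prompt(soft_prompt: List[Vector],
                        token_embeddings: List[Vector]) -> List[Vector]:
    """Return the sequence [soft prompt vectors ; token embeddings]."""
    if soft_prompt and token_embeddings:
        if len(soft_prompt[0]) != len(token_embeddings[0]):
            raise ValueError("prompt and token embeddings must share a size")
    # Copy the vectors so the caller's lists are not aliased.
    return [list(v) for v in soft_prompt] + [list(v) for v in token_embeddings]


# Hypothetical toy sizes: embedding size 4, 2 encoder and 3 decoder prompt vectors.
EMBED_DIM = 4
encoder_prompt = [[0.1] * EMBED_DIM for _ in range(2)]  # would be trainable
decoder_prompt = [[0.2] * EMBED_DIM for _ in range(3)]  # would be trainable

# Stand-ins for the token embeddings produced by the pretrained model.
encoder_tokens = [[1.0] * EMBED_DIM for _ in range(5)]
decoder_tokens = [[2.0] * EMBED_DIM for _ in range(4)]

encoder_inputs = prepend_soft_prompt(encoder_prompt, encoder_tokens)
decoder_inputs = prepend_soft_prompt(decoder_prompt, decoder_tokens)
print(len(encoder_inputs), len(decoder_inputs))
```

In a real setup only the prompt vectors would typically be updated during training while the pretrained model weights stay fixed; that part is omitted here.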

Related papers

Synthetic Data Generation Using Large Language Models: Advances in Text and Code [0.0]
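Large language models (LLMs) are transforming synthetic training data generation in both the natural language and code domains. We highlight key techniques such as prompt-based generation, retrieval-augmented pipelines, and iterative self-refinement. We discuss the associated challenges, including factual inaccuracies in generated text, insufficient stylistic or distributional realism, and the risk of bias amplification.
Paper reference translation (metadata) (2025-03-18T08:34:03Z)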
Sarang at DEFACTIFY 4.0: Detecting AI-Generated Text Using Noised Data and an Ensemble of DeBERTa Models [0.0]
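This paper presents an effective approach for detecting AI-generated text. It was developed for the Defactify 4.0 shared task at the fourth workshop on multimodal fact checking and hate speech detection. Our team (Sarang) ranked first in both tasks, with F1 scores of 1.0 and 0.9531, respectively.
Paper reference translation (metadata) (2025-02-24T05:32:00Z)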
Detecting Document-level Paraphrased Machine Generated Content: Mimicking Human Writing Style and Involving Discourse Features [57.34477506004105]
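Machine-generated content poses challenges such as academic plagiarism and the spread of misinformation. To overcome these challenges, we introduce new methodologies and datasets. We propose MhBART, an encoder-decoder model that emulates human writing style. We also propose DTransformer, a model that integrates discourse analysis through PDTB preprocessing to encode structural features.
Paper reference translation (metadata) (2024-12-17T08:47:41Z)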
A Combined Encoder and Transformer Approach for Coherent and High-Quality Text Generation [5.930799903736776]
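This study proposes a novel text generation model that combines the semantic interpretation strength of BERT with the generative capabilities of GPT-4. The model enhances semantic depth, maintains a smooth, human-like text flow, and overcomes limitations seen in earlier models.
Paper reference translation (metadata) (2024-11-19T01:41:56Z)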
Text2Data: Low-Resource Data Generation with Textual Control [104.38011760992637]
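Natural language serves as a common and direct control signal through which humans interact seamlessly with machines. We propose Text2Data, a novel approach that uses unlabeled data to understand the underlying data distribution through an unsupervised diffusion model. It then performs controllable fine-tuning through a novel constraint-optimization-based learning objective that ensures controllability and effectively prevents catastrophic forgetting.
Paper reference translation (metadata) (2024-02-08T03:41:39Z)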
A Simple yet Efficient Ensemble Approach for AI-generated Text Detection [0.5840089113969194]
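Large language models (LLMs) have demonstrated remarkable capabilities in generating text that closely resembles human writing. It is essential to build automated approaches that can distinguish artificially generated text from human-authored text. This paper proposes a simple yet efficient solution that ensembles the predictions of multiple constituent LLMs.
Paper reference translation (metadata) (2023-11-06T13:11:02Z)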
Exploring Automatic Evaluation Methods based on a Decoder-based LLM for Text Generation [16.78350863261211]
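This paper compares various methods, including tuning with encoder-based models and large language models under the same conditions. Experimental results show that the tuned decoder-based models performed worse than the tuned encoder-based models. We also found that in-context learning with very large decoder-based models such as ChatGPT has difficulty identifying fine-grained semantic differences.
Paper reference translation (metadata) (2023-10-17T06:53:00Z)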
Optimizing Factual Accuracy in Text Generation through Dynamic Knowledge Selection [71.20871905457174]
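Language models (LMs) have revolutionized the way we interact with information, but they often generate non-factual text. Previous methods enhance factuality by using external knowledge as references for text generation, but they often struggle when knowledge from irrelevant references is mixed in. This paper proposes DKGen, which divides the text generation process into an iterative process.
Paper reference translation (metadata) (2023-08-30T02:22:40Z)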
Code-Switching Text Generation and Injection in Mandarin-English ASR [57.57570417273262]
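We investigate text generation and injection to improve the performance of Transformer-Transducer (T-T), a streaming model widely used in industry. We first propose a strategy to generate code-switching text data and then inject the generated text into the T-T model, either explicitly through text-to-speech (TTS) conversion or implicitly by tying the speech and text latent spaces. Experimental results on a T-T model trained with a dataset containing 1,800 hours of real Mandarin-English speech show that injecting the generated code-switching text significantly improves the performance of T-T models.
Paper reference translation (metadata) (2023-03-20T09:13:27Z)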
On the Reliability and Explainability of Language Models for Program Generation [15.569926313298337]
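We study the capabilities and limitations of automated program generation approaches. We employ advanced explainable AI approaches to highlight the tokens that contribute significantly to code transformation. Our analysis shows that language models can recognize code grammar and structural information but exhibit limited robustness to changes in the input sequence.
Paper reference translation (metadata) (2023-02-19T14:59:52Z)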
Attention Is Indeed All You Need: Semantically Attention-Guided Decoding for Data-to-Text NLG [0.913755431537592]
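This paper proposes a novel decoding method that extracts interpretable information from the cross-attention of encoder-decoder models. We show on three datasets that it dramatically reduces semantic errors in the generated outputs.
Paper reference translation (metadata) (2021-09-15T01:42:51Z)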
Improving Text Generation with Student-Forcing Optimal Transport [122.11881937642401]
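We propose applying optimal transport (OT) to the sequences generated in training and testing modes. Extensions are also proposed to improve OT learning based on the structural and contextual information of text sequences. The effectiveness of the proposed method is validated on machine translation, text summarization, and text generation tasks.
Paper reference translation (metadata) (2020-10-12T19:42:25Z)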
POINTER: Constrained Progressive Text Generation via Insertion-based Generative Pre-training [93.79766670391618]
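We propose POINTER, a novel insertion-based approach for hard-constrained text generation. The method operates by progressively inserting new tokens between existing tokens in a parallel manner. The resulting coarse-to-fine hierarchy makes the generation process intuitive and interpretable.
Paper reference translation (metadata) (2020-05-01T18:11:54Z)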
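Improve Variational Autoencoder for Text Generation with Discrete Latent Bottleneck [52.08901549360262]
Variational autoencoders (VAEs) are essential tools for end-to-end representation learning. VAEs tend to ignore latent variables when paired with a strong autoregressive decoder. We propose a principled approach that enforces implicit latent feature matching in a more compact latent space.
Paper reference translation (metadata) (2020-04-22T14:41:37Z)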

The related papers list is generated automatically from the titles and abstracts of the papers on this site.

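This is the information for the specified paper.
The operator of this site does not guarantee the quality of this site (including all information and translations) and accepts no responsibility for any consequences arising from its use (including all information and translations).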