Clinical Prompt Learning with Frozen Language Models

Niall Taylor1∗   Alejo Nevado-Holgado1   Yi Zhang1∗   Dan W Joyce1,2   Andrey Kormilitzin1
1Department of Psychiatry, University of Oxford, Oxford, OX3 7JX, UK 2NIHR Oxford Health Biomedical Research Centre, Oxford, OX3 7JX, UK
{first_name}.{last_name}@psych.ox.ac.uk

∗These authors contributed equally to this work.

Preprint. Under review.
Abstract

Prompt learning is a new paradigm in the Natural Language Processing (NLP) field which has shown impressive performance on a number of natural language tasks with common benchmarking text datasets in full, few-shot, and zero-shot train-evaluation setups. Recently, it has even been observed that large but frozen pre-trained language models (PLMs) with prompt learning outperform smaller but fine-tuned models. However, as with many recent NLP trends, even the largest PLMs such as GPT-3 do not perform well on specialized domains (e.g. medical text), and the common practice to achieve State of the Art (SoTA) results still consists of pre-training and fine-tuning the PLMs on downstream tasks. The reliance on fine-tuning large PLMs is problematic in clinical settings where data is often held in non-GPU environments, and more resource efficient methods of training specialized domain models are crucial. We investigated the viability of prompt learning on clinically meaningful decision tasks and directly compared it with more traditional fine-tuning methods. Results are partially in line with the prompt learning literature, with prompt learning able to match or improve on traditional fine-tuning with substantially fewer trainable parameters and less training data. We argue that prompt learning therefore provides lower computational resource costs applicable to clinical settings, and can serve as an alternative to fine-tuning ever-larger PLMs. Complementary code to reproduce the experiments presented in this work can be found at: https://github.com/NtaylorOX/Public_Clinical_Prompt.
Index terms— Prompt learning, BERT, transfer learning, clinical decision support, few-shot
1 Introduction

The field of Natural Language Processing (NLP) has seen a surge in the use of deep learning in recent years, partly due to the increased capacity and availability of powerful GPUs and cloud computing globally.
Both academic and industry research have subsequently become dominated by the use of large Pretrained Language Models (PLMs), which are typically commercially produced and trained on enormous amounts of text data in a self-supervised manner through language modelling objectives such as Masked Language Modeling (MLM) and next word prediction.
Two major PLM families are the bidirectional encoder representations from transformers (BERT) Devlin et al [2019], which originally had 110 million trainable parameters, and the Generative Pre-trained Transformer 3 (GPT-3)
Radford et al [2019], Brown et al [2020a], and Meta's new Open Pre-trained Transformer Language Model (OPT) Zhang et al [2022], with ∼175 billion parameters.
With these PLMs one can fine-tune on new domains and design downstream tasks with relative ease, often resulting in state-of-the-art results on a number of popular datasets and tasks Devlin et al [2019], Lester et al [2021].
However, "out of the box" PLMs typically do not perform well on out-of-domain texts Han et al [2021]; thus taking a BERT model trained on non-medical texts and applying it to a niche medical text domain often leads to lackluster performance Lee et al [2019], Huang et al [2019].
Instead domain specific PLMs are often created through continued pre-training on domain specific corpora when available Alsentzer et al [2019], Peng et al [2019], Gururangan et al [2020], Senior et al [2020], Vaci et al [2021].
Moreover, leveraging the knowledge of these domain specific PLMs for a downstream task requires further training of a task-specific module, such as a classification head, attached to the end of the PLM Devlin et al [2019], Wolf et al [2020].
Typically downstream task fine-tuning requires further training of all of the PLM's parameters, in addition to the attached task specific head(s).
On top of this, the written language used in clinical text can differ drastically from that found in general written texts, and even between clinical institutions Huang et al [2019], Leaman et al [2015], Kormilitzin et al [2021].
Together this makes creating general purpose clinical PLMs quite difficult.
Additionally, the NLP community has seen a trend of increasing model size to enhance performance; Microsoft recently produced a monolithic 530 billion parameter model named Megatron for state of the art performance on generative tasks Smith et al [2022].
Whilst impressive, utilising such models for specific domains of interest will likely require full or partial fine-tuning, which carries massive computational and financial costs and, of course, environmental impacts Bender et al [2021].
Regardless of the size issues of the PLMs, there is still a real benefit in their application to new domains and downstream tasks through traditional fine-tuning, including the biomedical domain Huang et al [2019], Alsentzer et al [2019], van Aken et al [2021].
The persistent concern is the need to fine-tune both the entire PLM and task specific head to produce viable performance on many tasks.
In the case of the recently produced super large PLMs, this can require the continual training of models that require large suites of high end GPUs, with proportional financial costs.
Further to this, traditional fine-tuning can lead to a very specific fine-tuned model that is now very far from its initial pre-trained state, which may cause catastrophic forgetting of the pretrained knowledge Chen et al [2020].
Fine-tuning has also been reported to exploit spurious correlations of the smaller domain-specific dataset, damaging its generalizability Gururangan et al [2018], Niven and Kao [2019].
We have also observed this lack of generalizability in medical text when fine-tuning and then validating across American and British English Hofer et al [2018].
Considering the limitations outlined above, we recognise there is now a movement in the NLP community back towards resource efficient training regimes and models to avoid the need for full scale domain specific training.
One promising strategy is known as prompt learning, which aims to close the design gap between the PLM's training objectives and downstream tasks by reformulating the downstream tasks as language modelling objectives Li and Liang [2021], Liu et al [2021a].
Prompt learning has evolved from earlier works which reformulated all NLP downstream tasks as text-to-text tasks Raffel et al [2020] and, more recently, from the use of task examples within the input text as a form of prompt in auto-regressive PLMs Brown et al [2020b].
An exciting direction in the prompt learning research space has been its potential in few-shot or low resource settings, relying on frozen PLMs Tsimpoukelli et al [2021] instead of fine-tuning them: The number of parameters to train decreases dramatically when using frozen PLMs and thus reduces computational requirements Lu et al [2021].
The major gap in the literature is in the application of prompt learning to clinical or biomedical datasets, and in particular clinical support tasks.
We explore the suitability and performance of prompt learning applied to clinical classification tasks with a direct comparison to traditional fine-tuning methods in full and few-shot training scenarios.
Our primary focus is on the performance of these approaches when using a frozen PLM, which is desirable for many reasons, but primarily the consequent reduction in training cost and computational resources required to adapt to new domains or downstream tasks.
Conceptually we are not introducing a new methodology, rather exploring different applications of prompt learning to the biomedical
domain and importantly to clinical tasks, rather than simple natural language probing tasks.
We observed that prompt learning strategies can outperform traditional fine-tuning on different clinical tasks in both few-shot and full training scenarios with frozen PLMs.
2 Related Work

Since the summer of 2021 there has been a steady influx of research papers concerning prompt learning for common benchmarking open NLP datasets such as the Stanford Sentiment Treebank v2 (SST2) and the General Language Understanding Evaluation (GLUE) benchmark Liu et al [2021a], Brown et al [2020b], Sanh et al [2022], Lester et al [2021], Liu et al [2021b], Li and Liang [2021].
The datasets and tasks are standard in the field of NLP, and revolve around natural language understanding (NLU) tasks.
The common finding is that prompt learning can reach the performance of traditional fine-tuning, and often outperforms it in few-shot settings.
The ability of prompt learning to match the performance of traditional fine-tuning does, however, appear to scale with PLM size Liu et al [2021b].
One notable paper has investigated the use of GPT-3 for biomedical text datasets in a few-shot setting, finding a decrease in performance when compared to similar tasks in the standard NLU datasets Moradi et al [2021].
This suggests that even the largest PLMs cannot be applied directly to specialised domains and expect good performance, and that domain specific PLMs are still sought for optimal results.
Recently, prompt learning was used to investigate the zero-shot performance on a clinical task using different PLMs and manual prompt templates Sivarajkumar and Wang [2022].
They found that biomedically trained PLMs outperformed general PLMs for one task, and we hope to extend these findings by introducing different prompt learning training strategies and clinical tasks.
3.1 Traditional Fine-tuning

In the case of document classification, the downstream task head is an MLP f_MLP(·) which takes the pooled sentence embedding output by the PLM as input and generates an n-dimensional vector, where n is the number of classes.
That is, given an input text x, we first process the raw input with the PLM to get the m-dimensional embedding of each token.
Then a pooling operation, such as averaging the token embeddings, aggregates these into a whole-sentence embedding, which is passed to the classification head.
[Figure 1: a pretrained tokenizer and encoder (Bio-ClinicalBERT) maps the input ("Patient is complaining of severe chest pain") to a whole-sentence embedding, which is fed to a classification head and softmax; back-propagation through the PLM is optional.]
The MLP block can have any depth of layers d ∈ N, while in our experiments we opted for d = 2.
Since the additional MLP block and PLMs are modular, their respective parameters are stored separately and we can opt to freeze the parameters of one or the other.
An example of processing a short input text sequence using this method is shown in Fig 1.
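To make the traditional fine-tuning setup above concrete, the following is a minimal PyTorch sketch of a frozen PLM encoder with a trainable two-layer MLP classification head. It is an illustration rather than the authors' implementation; the Hugging Face checkpoint name and the mean-pooling choice are assumptions.

```python
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

PLM_NAME = "emilyalsentzer/Bio_ClinicalBERT"  # assumed checkpoint identifier

class FrozenPLMClassifier(nn.Module):
    """Frozen PLM + trainable MLP head f_MLP: R^m -> R^n (depth d = 2)."""

    def __init__(self, n_classes: int = 7, hidden: int = 256):
        super().__init__()
        self.plm = AutoModel.from_pretrained(PLM_NAME)
        for p in self.plm.parameters():      # freeze every PLM parameter
            p.requires_grad = False
        m = self.plm.config.hidden_size      # token embedding dimension m
        self.head = nn.Sequential(nn.Linear(m, hidden), nn.ReLU(), nn.Linear(hidden, n_classes))

    def forward(self, input_ids, attention_mask):
        out = self.plm(input_ids=input_ids, attention_mask=attention_mask)
        mask = attention_mask.unsqueeze(-1).float()
        pooled = (out.last_hidden_state * mask).sum(1) / mask.sum(1)  # mean-pooled sentence embedding
        return self.head(pooled)             # n-dimensional class logits

tokenizer = AutoTokenizer.from_pretrained(PLM_NAME)
model = FrozenPLMClassifier()
batch = tokenizer(["Patient is complaining of severe chest pain."],
                  return_tensors="pt", padding=True, truncation=True)
logits = model(batch["input_ids"], batch["attention_mask"])  # shape (1, n_classes)
```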
3.2 Prompt Learning
Generally, prompt learning can be achieved via the following steps: Given an input text x, we modify it to a prompt format x′ = f_p(x), where f_p, often called a template, will normally prepend, append, or insert a number of additional token embeddings to the original input along with a masked token, denoted by <[MASK]>.
We then feed x′ into the PLM to predict the masked token, which is the same as the Masked Language Modelling (MLM) pre-training objective of most BERT-based models.
The result of the model will be a distribution over the fixed vocabulary V of the tokenizer.
モデルの結果は、トークン化子の固定語彙 V 上の分布となる。
0.68
A second and crucial step is to map tokens or words in the known vocabulary of the PLM to class labels in the downstream task, achieved with a mapping g : V → C, where C is the set of classes.
This is known as answer engineering, or verbalization (we will use the terms verbalizer and verbalization throughout).
The verbalizer can be seen as a mapping between single, or multiple different tokens to distinct class labels.
The embedding or hidden state represented at the <[MASK]> position output by the PLM is then passed through a standard language model head, or classifier, and the probabilities of the verbalizer-derived class label tokens are obtained.
A simple prompt-based clinical classification example could be to determine whether a patient has heart disease, with class labels sick and healthy. A prompt learning setup could be as follows: take the template “<clinical text> <prompt=“Patient is”> <[MASK]>”, where <clinical text> represents the original input text and the <[MASK]> token is the label or class to predict.
The verbalizer will map certain tokens to each class of sick and healthy separately, essentially a dictionary mapping, e.g. {“Healthy”: ‘fine’, “Sick”: ‘unwell’}.
Thus, the sentence “Patient is complaining of severe chest pain.” will first be wrapped by the pre-defined template as “Patient is complaining of severe chest pain. Patient is <[MASK]>”.
The wrapped sentence is then tokenized and fed into the PLM to predict the distribution over the vocabulary at the <[MASK]> token position, although we only care about the probabilities of the tokens (‘fine’ and ‘unwell’) that are mapped to each of the classes contained in V, with “unwell” hopefully having a higher probability under the masked language model predictor and the class “sick” ultimately being predicted.
We offer an illustration of the basic prompt framework in Fig 2.
図2の基本的なプロンプトフレームワークの例を示します。
0.71
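A minimal sketch of the manual template and verbalizer example above, using the Hugging Face masked-LM interface (an illustration, not the authors' code; the checkpoint name is assumed, and the label words are assumed to be single tokens in the PLM vocabulary):

```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

PLM_NAME = "emilyalsentzer/Bio_ClinicalBERT"          # assumed checkpoint identifier
tokenizer = AutoTokenizer.from_pretrained(PLM_NAME)
mlm = AutoModelForMaskedLM.from_pretrained(PLM_NAME)

verbalizer = {"Healthy": "fine", "Sick": "unwell"}     # class label -> label word

def prompt_classify(note: str) -> str:
    # Wrap the input with the manual template "<clinical text> Patient is [MASK]."
    wrapped = f"{note} Patient is {tokenizer.mask_token}."
    enc = tokenizer(wrapped, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = mlm(**enc).logits                     # (1, seq_len, |V|)
    mask_pos = (enc["input_ids"][0] == tokenizer.mask_token_id).nonzero()[0].item()
    probs = logits[0, mask_pos].softmax(-1)            # distribution over the vocabulary V
    # Keep only the probabilities of the verbalizer tokens and choose the highest-scoring class.
    scores = {label: probs[tokenizer.convert_tokens_to_ids(word)].item()
              for label, word in verbalizer.items()}
    return max(scores, key=scores.get)

print(prompt_classify("Patient is complaining of severe chest pain."))  # hopefully "Sick"
```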
Within the broad prompt learning framework there are important decisions to make about the construction of prompt templates and verbalizers.
In its infancy, templates were manually created, often based on human knowledge of the task domain, with massive variance in performance arising from even subtle perturbations of the template and verbalizer Lester et al [2021], Hu et al [2021].
To enable a standardised framework for prompt learning, a team has developed OpenPrompt, which enables reproducible prompt-based research through an open source and unified code-base Ding et al [2021].
We shall first define the templates and verbalizers used in the framework and our experiments.
We refer to the classical prompt learning strategy with handcrafted templates and verbalizers as manual templates and manual verbalizers respectively.
This strategy was first proposed as the Pattern-Exploiting Training (PET) Schick and Schütze [2021].
We denote the set of words in the verbalizer for each class y ∈ C to be V^y.
The probability of each class given the input x and its prompt x′ is thus:

P(y | x) = exp(s(V^y, x′)) / Σ_{i=1}^{|C|} exp(s(V^{y_i}, x′)),

where s(V^y, x′) is the masked-language-model score (averaged over the words in V^y) at the <[MASK]> position. Manual templates and verbalizers are discrete and bounded to the PLM's vocabulary, so there are no extra parameters to train, although fine-tuning the PLM is possible.
Figure 2: Illustration of manual template and verbalizer in prompt learning.
The engineering of the manual components of prompt learning is not straightforward, with large variations in performance emerging from small changes to the tokens, and typically domain expertise is required. Soft prompt learning operates in the same manner as the manual approach, but replaces the fixed manual components with trainable embeddings (continuous vectors) of the same dimension as the original PLM's token embeddings.
The error from the downstream task can then be backpropagated to tune only the embeddings for the template and verbalizer Lester et al [2021].
Normally, a manual template has the form x′ = {[P0, P1, . . . , Pj], x, [Pj+1, Pj+2, . . . , Pk], [MASK]}, where for i ∈ {0, 1, . . . , k}, Pi denotes a token of the template.
And since x′ is fed to the PLM to get h(x′), the prompt tokens Pi are also mapped to the embedding space, where we can assume h(Pi) to be optimized during training, and such tokens are denoted as <[soft]> in the template format.
A template where all tokens are <[soft]> is called a soft template, while a template with a mixture of manual and <[soft]> tokens is called a mixed template.
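The following is a minimal PyTorch sketch of the idea behind soft and mixed templates (illustrative only, not the authors' or OpenPrompt's implementation; the checkpoint name and number of soft tokens are assumptions): the <[soft]> tokens are trainable vectors of the same dimension as the PLM token embeddings and are concatenated with the frozen embeddings of the manual prompt and input text.

```python
import torch
import torch.nn as nn
from transformers import AutoModelForMaskedLM, AutoTokenizer

PLM_NAME = "emilyalsentzer/Bio_ClinicalBERT"           # assumed checkpoint identifier
tokenizer = AutoTokenizer.from_pretrained(PLM_NAME)
plm = AutoModelForMaskedLM.from_pretrained(PLM_NAME)
for p in plm.parameters():
    p.requires_grad = False                             # the PLM stays frozen

embedding = plm.get_input_embeddings()                  # maps token ids to R^m
m, n_soft = embedding.embedding_dim, 4

# Trainable soft-prompt vectors h(P_i); they can also be initialised from the
# embedding of a known vocabulary token instead of randomly.
soft_prompt = nn.Parameter(torch.randn(n_soft, m) * 0.02)

def mixed_template_inputs(note: str) -> torch.Tensor:
    # Mixed template: [CLS] + n_soft soft tokens + note + manual prompt "Patient is" + [MASK]
    ids = tokenizer(f"{note} Patient is {tokenizer.mask_token}.", return_tensors="pt")["input_ids"]
    token_embeds = embedding(ids)                        # frozen embeddings of the real tokens
    return torch.cat([token_embeds[:, :1],               # keep [CLS] first
                      soft_prompt.unsqueeze(0),          # trainable continuous prompt vectors
                      token_embeds[:, 1:]], dim=1)

inputs_embeds = mixed_template_inputs("Patient is complaining of severe chest pain.")
out = plm(inputs_embeds=inputs_embeds)                   # only soft_prompt would receive gradients
```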
[Figure: the prompt input is built from the input text sequence, a fixed manual prompt, soft prompt pseudo tokens, and a <[MASK]> token, and is fed to the pretrained language model encoder; the soft verbalizer applies an inner product and softmax to the masked language model output.]
Therefore, when using the soft verbalizer, there is no need to build the mapping from
vocabulary V to class labels C as the trainable vectors do not have semantic meaning.
The resulting verbalizer then becomes a matrix operator Θ ∈ R^{n×m}, where n represents the number of classes and m represents the dimension of the generated hidden embeddings.
For better understanding, we denote the i-th row of Θ as θ_i, the trainable vector for class i.
To be compatible with the soft verbalizer, which takes hidden embeddings from the PLM as input, the original decoder head of the PLM is removed.
We denote the resulting mapping from h(x′) ∈ R^{l×m} to the prediction of the hidden representation of <[MASK]> as f_mask : R^{l×m} → R^m, where l is the sequence length of x′.
The class probabilities are then given by a softmax over inner products with the rows of Θ:

P(y | x) = exp(⟨θ_y, f_mask(h(x′))⟩) / Σ_{i=1}^{n} exp(⟨θ_i, f_mask(h(x′))⟩).
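A minimal sketch of such a soft verbalizer, under the notation above (illustrative, not the OpenPrompt implementation): Θ is a trainable n×m matrix whose rows θ_i act as class vectors, and class probabilities are a softmax over inner products with the <[MASK]> hidden representation f_mask(h(x′)).

```python
import torch
import torch.nn as nn

class SoftVerbalizer(nn.Module):
    """Θ ∈ R^{n×m}; P(y | x) ∝ exp(⟨θ_y, f_mask(h(x'))⟩)."""

    def __init__(self, n_classes: int, hidden_dim: int):
        super().__init__()
        self.theta = nn.Parameter(torch.randn(n_classes, hidden_dim) * 0.02)

    def forward(self, mask_hidden: torch.Tensor) -> torch.Tensor:
        # mask_hidden: (batch, m) hidden state predicted at the <[MASK]> position
        logits = mask_hidden @ self.theta.T     # inner products with each class vector θ_i
        return logits.softmax(dim=-1)           # distribution over the n classes

# e.g. the 7 triage classes and a 768-dimensional BERT hidden state
verbalizer = SoftVerbalizer(n_classes=7, hidden_dim=768)
probs = verbalizer(torch.randn(2, 768))         # shape (2, 7)
```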
For further details and origins of prompt learning see: P-tuning Liu et al [2021c], prefix tuning Li and Liang [2021] and WARP Hambardzumyan et al [2021].
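Using the OpenPrompt framework cited above, a mixed template with a soft verbalizer and a frozen PLM (the configuration adopted later in this paper) can be assembled roughly as follows. This is a hedged sketch: the template text is illustrative, and the API details (load_plm, MixedTemplate, SoftVerbalizer, PromptForClassification, freeze_plm) are assumed from OpenPrompt's documentation rather than taken from this paper.

```python
from openprompt.plms import load_plm
from openprompt.prompts import MixedTemplate, SoftVerbalizer
from openprompt import PromptForClassification

# Frozen clinical PLM (assumed checkpoint identifier)
plm, tokenizer, model_config, WrapperClass = load_plm("bert", "emilyalsentzer/Bio_ClinicalBERT")

# Mixed template: the input text, two trainable soft tokens, a manual prompt and the mask
template = MixedTemplate(
    model=plm, tokenizer=tokenizer,
    text='{"placeholder":"text_a"} {"soft"} {"soft"} Patient is {"mask"}',
)

# Soft verbalizer: one trainable vector per class, no hand-crafted label words
verbalizer = SoftVerbalizer(tokenizer, plm, num_classes=7)

# freeze_plm=True keeps the PLM fixed; only template and verbalizer parameters are trained
prompt_model = PromptForClassification(plm=plm, template=template,
                                       verbalizer=verbalizer, freeze_plm=True)

n_trainable = sum(p.numel() for p in prompt_model.parameters() if p.requires_grad)
print(f"Trainable parameters: {n_trainable}")
```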
3.3 Pre-trained Language Model
As we wanted to compare the performance of prompt learning and traditional fine-tuning in a best case scenario, we chose Bio-ClinicalBERT Alsentzer et al [2019].
Bio-ClinicalBERT was pre-trained on all MIMIC-III notes, having been initialized from the weights of another biomedical BERT model, BioBERT Lee et al [2019], which was itself trained on a large collection of PubMed abstracts and full articles.
Whilst we appreciate this may be an overly optimized model for the dataset used in this paper, we argue the point of the experiments presented here is to compare and contrast the ability of the different modelling frameworks to leverage what has been learned by a PLM for clinical tasks.
As has already been shown extensively, PLMs benefit from domain specific pre-training Gururangan et al [2020]; what is less known is whether current pre-prompt-learning approaches are fully utilising these language models.
3.4 Clinical Dataset

We use the Medical Information Mart for Intensive Care III (MIMIC-III) Johnson et al [2016], an open source medical dataset developed by the MIT Lab for Computational Physiology.
It comprises de-identified health data associated with 38,597 critical care patients and 58,976 intensive care unit (ICU) admissions at the Beth Israel Deaconess Medical Center between 2001 and 2012.
Data includes demographics, vital signs, laboratory tests, medications, caregiver notes, imaging reports, and mortality in and out of hospital.
Moreover, to allow comparisons with other baselines we derive clinical task datasets used in previous research van Aken et al [2021], Pellegrini et al [2022], Wang et al [2020], Boag et al [2018] as well as deriving our own triage task, described below.
An important note is that whilst some of the derived clinical tasks may benefit from utilising the multi-modal data available for each patient, we focus purely on the free text clinical notes.
Full details and code for reproducing these datasets and experiments are provided by the authors.2
4 Experiments - Clinical tasks
ICD-9 50 Within the MIMIC-III data and other EHRs are standardised International Classification of Diseases version 9 (ICD-9) codes, which are used to record diagnosis and procedures.
A common task is to classify the ICD-9 diagnosis code based on a patient's data and automate the whole process, and one can do so from the free text notes alone.
There are approximately 2,000 diagnosis codes present in the MIMIC-III dataset, with a very skewed distribution, and a resulting extreme multi-class problem which is beyond the scope of this paper.
Thus for our classification task we opt to subset the top 50 most frequent ICD-9 diagnosis codes that have a corresponding set of clinical notes, as has been done before Yuan et al [2022], Wang et al [2020], van Aken et al [2021].
2 Complementary code to reproduce experiments is provided at: https://github.com/NtaylorOX/Public_Clinical_Prompt
ICD-9 Triage task A potential concern with the ICD-9 diagnosis code classification is that the codes themselves may be mentioned explicitly in the notes van Aken et al [2021], and further, simply classifying patients’ ICU discharge notes by ICD-9 code lacks ecological validity as a clinical decision support task.
For example, within a hospital setting, patients admitted to an ICU will be treated and then “stepped down” (discharged) to another ward or team to progress their treatment when they no longer require ICU.
With assistance from clinicians, we therefore designed a novel task that aims to make the classification task more similar to the decision making process of arranging patient flow on discharge from the ICU.
Similarly, a patient admitted to ICU with obstetric complications will likely be stepped-down to a maternity ward.
In essence we grouped together the ICD-9 diagnosis codes into “teams” that reflect the triage or patient-flow decision making found in hospital settings.
For this task we selected the top 20 most frequent ICD-9 diagnosis codes in MIMIC-III and a clinician derived triage groups based on which team would likely continue the patient’s care on being stepped down from ICU.
The training classes are therefore many-to-one mappings of ICD9 codes to discharge teams and we derived the following seven post-ICU discharge destination teams: Cardiology, Obstetrics, Respiratory Medicine, Neurology, Gastroenterology, Acute or Internal Medicine, and Oncology.
The resultant dataset consists of 15,000 clinical notes across the 7 triage categories.
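The triage labels are therefore produced by a many-to-one lookup from ICD-9 code to discharge team, as in the illustrative sketch below (the codes shown are placeholders; the clinician-derived mapping of the top 20 codes is not reproduced here).

```python
# Illustrative many-to-one mapping from ICD-9 diagnosis codes to post-ICU discharge teams.
# The codes below are placeholders, not the clinician-derived mapping used in the paper.
ICD9_TO_TEAM = {
    "410.71": "Cardiology",             # hypothetical example: an acute myocardial infarction code
    "486": "Respiratory Medicine",      # hypothetical example: a pneumonia code
    "431": "Neurology",                 # hypothetical example: an intracerebral haemorrhage code
    # ... the remaining top-20 codes mapped onto the seven triage teams ...
}

def triage_label(icd9_code: str) -> str:
    """Map an ICD-9 diagnosis code to its post-ICU discharge destination team."""
    return ICD9_TO_TEAM[icd9_code]
```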
In hospital mortality One of the most frequently used benchmark clinical support tasks with the MIMIC-III dataset is the prediction of whether a patient will survive their hospital episode.
Within the MIMIC-III database are structured data relating to the mortality status of a patient, which paired with a date and timestamp allows for easy labelling of the data.
Only notes prior to the mortality flag are considered, and some simple regular expression rules were used to filter any notes that had explicit mentions of a patient's death, similar to previous work Boag et al [2018], van Aken et al [2021].
Length of stay in ICU Predicting how long a patient will require ICU is of significant value to hospitals who aim to optimise the flow of patients in resource-limited settings (that is, there are usually very few ICU beds compared to the hospital’s overall bed capacity).
We model this as a four-way classification task, binning length of stay into the following categories: under 3 days, 3 to 7 days, 1 week to 2 weeks, and more than 2 weeks van Aken et al [2021].
Full and few-shot training We will be comparing the performance of models in full and few-shot training setups.
An important note for our few-shot experiments is that sample size will refer to the number of samples per class, i.e. N = s × c where N is the total training samples, s is the sample size per class and c is the number of unique classes.
Note in some instances not all classes can fill the sample size, so for some few-shot experiments there will remain a class imbalance.
All results presented are on held-out test sets for each task.
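A sketch of the per-class few-shot sampling described above, assuming the training notes are held in a pandas DataFrame with a label column (not the authors' exact sampling code):

```python
import pandas as pd

def few_shot_sample(df: pd.DataFrame, s: int, seed: int = 42) -> pd.DataFrame:
    """Sample up to s examples per class, giving N = s × c in the balanced case;
    classes with fewer than s examples keep all of them (the residual imbalance noted above)."""
    return (df.groupby("label", group_keys=False)
              .apply(lambda g: g.sample(n=min(s, len(g)), random_state=seed)))

# e.g. 16 notes per class for the 7-class triage task gives at most N = 112 training examples
# train_few_shot = few_shot_sample(train_df, s=16)
```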
5 Results

5.1 Different prompt learning setups
The number of possible combinations of templates and verbalizers in the prompt learning framework is vast, and as such we have opted to utilise previous research to derive the most suitable for our use case.
To this end we conducted an initial experiment comparing the performance of four prompt learning combinations on one clinical task to establish the best performing combination.
We chose the ICD-9 Triage task as the baseline due to it being a relatively straight forward multi-class classification problem and with a reasonably balanced distribution of classes when compared to the other tasks.
The results are summarised in Table 1. The performance across the different prompt combinations is very similar in the setting where the PLM is fine-tuned; however, there is greater variance when the PLM is frozen.
The frozen PLM setting is of most interest, and whilst the soft template and soft verbalizer combination performs the best overall, we opt to use the more interpretable combination of mixed template and soft verbalizer as our prompt learning benchmark going forward.
The mixed template is a mixture of manual prompting and prefix tuning, whereby both discrete tokens known to the PLM and newly introduced, trainable continuous vectors of the same dimension as the PLM token embeddings are combined.
In the case of the frozen PLM, only the parameters introduced by traditional fine-tuning or prompt learning are updated during training.
We found that prompt learning can match or improve on traditional fine-tuning, with a much smaller gap in performance between the frozen and fine-tuned PLM setting across few-shot and full training setups, see Fig 4.
There are considerable variations in any neural network's performance with changes to hyperparameters, in particular learning rates and hidden layer dimensions.
Our initial experiments used sensible hyperparameters based on previous research using traditional fine-tuning and prompt learning, where prompt learning and traditional fine-tuning achieved similar performance when the PLM was fine-tuned, see Fig 4.
However, when freezing the PLMs, performance differences arose between the two frameworks, especially for few-shot settings in favor of prompt learning.
The hyperparameter search space is provided below in Table 2, with results of the subsequent optimized training runs for the ICD-9 Triage task presented in Table 3.
Further details of the hyperparameter search and results are presented in supplementary materials, see Appendix A.
5.4 Sensitivity analyses

Results suggested that on certain tasks prompt learning outperformed the traditional fine-tuning model when using a frozen PLM (Fig 4). We will focus on the triage task again, for which we optimized each of the frameworks.
There is a risk that the performance drop for the traditional fine-tuning classification head is due to over- or under-fitting with its larger number of trainable parameters in the original setting, see Fig 5. Adjusting the number of trainable parameters for traditional fine-tuning involves adjusting the number of layers and hidden dimension size of the classification head, whilst adjusting the number of trainable parameters for prompt learning requires
just changing the number of soft template tokens and whether to include a soft verbalizer (manual templates and verbalizers have no trainable parameters).

Table 3: Hyperparameter optimized model comparison with frozen PLM for ICD-9 triage.

Paradigm                  Balanced accuracy  F1 weighted  AUC
Traditional fine-tuning   0.8162             0.8919       0.9811
Prompt learning           0.8698             0.9246       0.9889
Note that prompt learning with the fewest trainable parameters (N params = 1,536) achieves comparable performance to the traditional fine-tuning model with 1000 times the number of trainable parameters (N params = 1,552,007).
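Trainable-parameter counts like those quoted above can be read off a model with a generic PyTorch one-liner (a sketch, not tied to the authors' code):

```python
import torch.nn as nn

def count_trainable(model: nn.Module) -> int:
    """Number of parameters that receive gradient updates, e.g. the soft prompt and
    verbalizer with a frozen PLM versus a full classification head."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)
```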
The variability in prompt learning performance based on the template and verbalizer has been well established Liu et al [2021a], Li and Liang [2021], Ding et al [2021].
We opted to focus on the use
私たちはその使用に焦点をあてた
0.73
Figure 5: Balanced accuracy for prompt learning versus traditional fine-tuning across increasing number of trainable parameters with frozen PLM.
For readability, logarithmic scale is used for x-axis.
We opted to focus on the use of a mixed template format, which is based around designing a common sense manual template for the task alongside soft and trainable tokens or embeddings. Moreover, these soft tokens can be initialised from a known token of the PLM's vocabulary.
To determine whether mixed templates benefit from a common sense or domain specific manual template, we compared performance of different templates including one with a mix of unrelated and random tokens.
Results are shown in Table 4 and we can see that having just one soft token or a set of random and unrelated manual tokens leads to a drop in performance.
6 Discussion

The experiments presented here have attempted to directly compare the prompt learning paradigm with the traditional fine-tuning paradigm across a number of clinical tasks that frame classification as a clinical decision support task.
The objective was to ascertain whether the literature describing promising performance for prompt learning in general domain text datasets can be leveraged on a more niche biomedical domain.
In the full training scenario, prompt learning can typically match the performance of traditional fine-tuning, and prompt learning outperforms traditional fine-tuning in the few-shot setting.
Of particular interest was the performance of each model with the PLM frozen, where only parameters added to the PLM after pre-training are tuned for downstream classification tasks. This is where prompt learning appears to prove superior, out-performing traditional fine-tuning with considerably fewer trainable parameters, see Fig 5.
Moreover, the use of a mixed template appears to allow the intuitive common sense approach to domain derived prompts, whilst maintaining a trainable soft embedding that can reduce the difficulty in finding optimal manual prompts.
Understanding how models arrive at a decision is especially important in high-stake applications, such as medicine Taylor et al [2021], Rajpurkar et al [2022].
Future work should focus on the utility of interpretable prompts for helping clinicians understand a model’s decision making.
6.1 Limitations

Pre-training data leakage A notable limitation was the choice of PLM, which is arguably too well suited to the clinical tasks presented, with probable data leakage between initial pre-training and the subsequent downstream tasks.
It must be stated that this would have benefited both paradigms, but there is the possibility that the reformulation of the downstream tasks as a masked language modelling style objective may allow easier "remembering" for prompt learning when compared to traditional fine-tuning.
However, we include results for the ICD-9 Triage task using biomedical BERT (trained only on biomedical literature) and this yielded a similar pattern of results, see Appendix D.
Task performance variance We presented four clinical tasks derived from MIMIC-III notes data, and whilst we achieved results in line with previous research, the relative performance on the length of stay and mortality prediction tasks was quite poor regardless of the framework.
Similarly we did find that using hyperparameter search for the ICD-9 Triage task improved the frozen PLM performance of the traditional fine-tuning approach by a reasonable margin and a more extensive hyperparameter search may shift this further.
6.2 Conclusion

The key finding was that prompt learning outperforms the traditional fine-tuning approach when PLMs are frozen during training on the downstream task.
This is in line with previous prompt learning research and may offer a useful framework for building clinical support tools in low compute resource settings, as well as enabling a faster, flexible, modular training pipeline for new downstream tasks and novel data.
The ability to utilise a single, frozen PLM and share or reuse these embeddings across a number of task specific modules, each with their own trainable prompt is very desirable for specialised domains.
Whilst using smaller PLMs and prompts may not achieve the state-of-the-art performance on certain tasks, it can approach similar levels of performance with a fraction of the model size and training time.
In the field of clinical support tools, a computationally efficient and interpretable model with good enough performance that can run on a CPU is arguably more desirable than a trillion parameter model that requires high-performance computing clusters with arrays of GPUs.
The prompt learning framework is an evolving paradigm with variants being introduced regularly, thus we cannot claim to have fully covered prompt learning in this work.
This work can act as a basis for further clinical prompt learning work, and may encourage the use of relatively small domain specific PLMs rather than relying on the giant PLMs produced by commercial enterprises.
We suggest that it is more efficient to train a small BERT model on a specialised domain and apply prompt learning, than to attempt to apply prompt learning directly to models such as GPT-3, which often lack the domain knowledge required.
Acknowledgement NT is supported by the EPSRC Center for Doctoral Training in Health Data Science (EP/S02428X/1).
AK, ANH, YZ and DWJ were supported in part by the NIHR AI Award for Health and Social Care (NIHR-AI-AWARD0-2183 ); AK and ANH declare a research grant from GlaxoSmithKline.
DWJ is supported by the NIHR Oxford Health Biomedical Research Centre (grant BRC-1215-20005).
The views expressed are those of the authors and not necessarily those of the UK National Health Service, the NIHR, the UK Department of Health, or the University of Oxford.
References Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova.
jacob devlin, ming-wei chang, kenton lee, kristina toutanovaを参照。
0.67
BERT: Pre-training of deep bidirectional transformers for language understanding.
BERT: 言語理解のための双方向トランスフォーマーの事前トレーニング。
0.76
In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4171–4186, Minneapolis, Minnesota, June 2019.
The 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), page 4171–4186, Minneapolis, Minnesota, June 2019 訳抜け防止モード: 計算言語学会北米支部2019年大会の成果 : ヒューマン・ランゲージ・テクノロジー 第1巻(長文・短文)、4171-4186頁。 ミネアポリス、ミネソタ、2019年6月。
0.47
Association for Computational Linguistics. doi: 10.18653/v1/N19-1423 .
計算言語学会会員。 doi: 10.18653/v1/n19-1423 。
0.35
URL https: //aclanthology.org/N 19-1423.
URL https: //aclanthology.org/N 19-1423。
0.45
Alec Radford, Jeff Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever.
Language models are unsupervised multitask learners.
言語 モデルは教師なしマルチタスク学習者です
0.72
2019. Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al Language models are few-shot learners.
2019. tom brown, benjamin mann, nick ryder, melanie subbiah, jared d kaplan, prafulla dhariwal, arvind neelakantan, pranav shyam, girish sastry, amanda askell, et al language modelsは、わずかなショット学習モデルである。
0.57
Advances in neural information processing systems, 33:1877–1901, 2020a.
神経情報処理システムの進歩 33:1877–1901, 2020a
0.80
Susan Zhang, Stephen Roller, Naman Goyal, Mikel Artetxe, Moya Chen, Shuohui Chen, Christopher Dewan, Mona Diab, Xian Li, Xi Victoria Lin, et al Opt: Open pre-trained transformer language models.
Susan Zhang, Stephen Roller, Naman Goyal, Mikel Artetxe, Moya Chen, Shuohui Chen, Christopher Dewan, Mona Diab, Xian Li, Xi Victoria Lin, et al Opt: 事前訓練されたトランスフォーマー言語モデル。
0.87
arXiv preprint arXiv:2205.01068, 2022.
arXiv preprint arXiv:2205.01068, 2022
0.40
Brian Lester, Rami Al-Rfou, and Noah Constant.
ブライアン・レスター、ラミ・アル=ルフー、ノア・コンスタン。
0.40
The power of scale for parameter-efficient prompt In Proceedings of the 2021 Conference on Empirical Methods in Natural Language tuning.
Robust transfer learning with pretrained language models through adapters.
アダプターによる事前学習言語モデルによるロバスト変換学習
0.73
In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers), pages 854–861, Online, August 2021.
Identifying predictors of suicide in severe mental illness: a feasibility study of a clinical prediction rule (oxford mental illness and suicide tool or oxmis).
Real-world effectiveness, its predictors and onset of action of cholinesterase inhibitors and memantine in dementia: retrospective health record study.
The British Journal of Psychiatry, 218(5):261–267, 2021.
the british journal of psychiatry, 218(5):261–267, 2021 (英語)
0.80
doi: 10.1192/bjp.2020.136 .
doi: 10.1192/bjp.2020.136 。
0.44
Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Clement Delangue, Anthony Moi, Pierric Cistac, Tim Rault, Remi Louf, Morgan Funtowicz, Joe Davison, Sam Shleifer, Patrick von Platen, Clara Ma, Yacine Jernite, Julien Plu, Canwen Xu, Teven Le Scao, Sylvain Gugger, Mariama Drame, Quentin Lhoest, and Alexander Rush.
Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Clement Delangue, Anthony Moi, Pierric Cistac, Tim Rault, Remi Louf, Morgan Funtowicz, Joe Davison, Sam Shleifer, Patrick von Platen, Clara Ma, Yacine Jernite, Julien Plu, Canwen Xu, Teven Le Scao, Sylvain Gugger, Mariama Drame, Quentin Lhoest, Alexander Rush 訳抜け防止モード: トーマス・ウルフ、lysandre、victor sanh、julien chaumond。 clement delangue, anthony moi, pierric cistac, tim rault, remi louf, モーガン・ファントウィッツ ジョー・デイヴィソン サム・シュライファー パトリック・フォン・プラトン clara ma, yacine jernite, julien plu, canwen xu, teven le scao, sylvain gugger, mariama drame, quentin lhoestなど。 アレキサンダー・ラッシュ
0.57
Transformers: State-of-theart natural language processing.
Transformers: 最先端の自然言語処理。
0.78
In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pages 38–45, Online, October 2020.
Clinical outcome prediction from admission notes using self-supervised In Proceedings of the 16th Conference of the European Chapter of knowledge integration.
第16回欧州知識統合章紀要における自己監督による入試ノートによる臨床成績予測
0.65
the Association for Computational Linguistics: Main Volume, pages 881–893, Online, April 2021.
The Association for Computational Linguistics: Main Volume, page 881–893, Online, April 2021
0.42
Association for Computational Linguistics. doi: 10.18653/v1/2021.eac l-main.75.
計算言語学会会員。 doi: 10.18653/v1/2021.eac l-main.75
0.34
URL https://aclanthology .org/2021.eacl-main. 75.
URL https://aclanthology .org/2021.eacl-main. 75
0.20
Sanyuan Chen, Yutai Hou, Yiming Cui, Wanxiang Che, Ting Liu, and Xiangzhan Yu.
サンユアンチェン、ユタイ・ウー、イミング・キュイ、ワンチャン・チェ、ティン・リウ、チャン・ユ。
0.41
Recall and learn: Fine-tuning deep pretrained language models with less forgetting.
リコールと学習: 忘れることが少なく、訓練済みの深い言語モデルを微調整する。
0.60
In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 7870–7881, Online, November 2020.
Suchin Gururangan, Swabha Swayamdipta, Omer Levy, Roy Schwartz, Samuel Bowman, and Noah A. Smith.
例えば、gururangan、swabha swayamdipta、omer levy、roy schwartz、samuel bowman、noah a. smithである。
0.63
Annotation artifacts in natural language inference data.
自然言語推論データにおけるアノテーションアーティファクト。
0.81
In Proceedings of the 13
訴訟の手続において 13
0.68
英語(論文から抽出)
日本語訳
スコア
2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers), pages 107–112, New Orleans, Louisiana, June 2018.
2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers), page 107-112, New Orleans, Louisiana, June 2018 訳抜け防止モード: 2018 conference of the north american chapter of the association for computational linguistics: human language technologies (英語) 第2巻(短い論文)、107-112頁、ルイジアナ州ニューオーリンズ。 2018年6月。
0.77
Association for Computational Linguistics. doi: 10.18653/v1/N18-2017 .
計算言語学会会員。 doi: 10.18653/v1/n18-2017 。
0.45
URL https: //aclanthology.org/N 18-2017.
URL https: //aclanthology.org/N 18-2017
0.24
Timothy Niven and Hung-Yu Kao.
ティモシー・ニヴェンとハングユ・カオ。
0.39
Probing neural network comprehension of natural language arguments.
自然言語引数のニューラルネットワーク理解の探索
0.69
In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 4658–4664, Florence, Italy, July 2019.
第57回計算言語学会年次総会では、2019年7月、フィレンツェで4658-4664頁が開催された。
0.61
Association for Computational Linguistics. doi: 10.18653/v1/P19-1459 .
計算言語学会会員。 doi: 10.18653/v1/p19-1459 。
0.35
URL https://aclanthology .org/P19-1459.
URL https://aclanthology .org/P19-1459。
0.46
Maximilian Hofer, Andrey Kormilitzin, Paul Goldberg, and Alejo Nevado-Holgado.
Few-shot learning for named entity recognition in medical text.
一部 医学テキストで名付けられたエンティティ認識の学習
0.56
arXiv preprint arXiv:1811.05468, 2018.
arXiv preprint arXiv:1811.05468, 2018
0.40
Xiang Lisa Li and Percy Liang.
xiang lisa li と percy liang。
0.61
Prefix-tuning: Optimizing continuous prompts for generation.
Prefix-tuning: 生成のための継続的プロンプトの最適化。
0.57
In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 4582–4597, Online, August 2021.
Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing, 2021a.
プレトレイン,プロンプト,予測:自然言語処理におけるプロンプト手法の体系的調査,2021a。
0.75
URL https://arxiv.org/ab s/2107.13586.
URL https://arxiv.org/ab s/2107.13586
0.46
Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J. Liu.
コリン・ラフェル、ノーム・シャザー、アダム・ロバーツ、キャサリン・リー、シャラン・ナラン、マイケル・マテナ、ヤンチー・周、ウェイ・リー、ピーター・j・リュー。 訳抜け防止モード: Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li ピーター・J・リュー(Peter J. Liu)。
0.92
Exploring the limits of transfer learning with a unified text-to-text transformer.
統一テキスト-テキストトランスフォーマによるトランスファー学習の限界の検討
0.82
J. Mach. Learn. Res., 21:140:1–140:67, 2020.
j・マッハ 学ぶ。 21:140:1–140:67、2020年。
0.48
URL http://jmlr.org/pape rs/ v21/20-074.html.
URL http://jmlr.org/pape rs/ v21/20-074.html
0.20
Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel Ziegler, Jeffrey Wu, Clemens Winter, Chris Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, and Dario Amodei.
Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel Ziegler, Jeffrey Wu, Clemens Winter, Chris Hesse, Mark Chen, Eric Sigler, Mateuss Litwin, Scott Gray, Benjamin Chesss, Jack Clark, Christopher Berner, McCandlish, Alec Radford, Ia Sutsk, Dario D. ^ 訳抜け防止モード: Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry Amanda Askell, Sandhini Agarwal, Ariel Herbert - Voss, Gretchen Krueger, Tom Henighan Rewon Child, Aditya Ramesh, Daniel Ziegler, Jeffrey Wu クレメンス・ウィンター、クリス・ヘッセン、マーク・チェン、エリック・シグラー。 Mateusz Litwin、Scott Gray、Benjamin Chess、Jack Clark Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever とDario Amodei。
0.80
Language models are few-shot learners.
言語モデルはわずかな学習者です。
0.69
In H. Larochelle, M. Ranzato, R. Hadsell, M.F. Balcan, and H. Lin, editors, Advances in Neural Information Processing Systems, volume 33, pages 1877–1901.
H. Larochelle, M. Ranzato, R. Hadsell, M.F. Balcan, and H. Lin, editors, Advances in Neural Information Processing Systems, volume 33, pages 1877–1901. ^ 訳抜け防止モード: H. Larochelle, M. Ranzato, R. Hadsell, M.F. Balcan とH. Lin, 編集者, ニューラル情報処理システムの発展 第33巻、1877-1901頁。
Victor Sanh, Albert Webson, Colin Raffel, Stephen Bach, Lintang Sutawika, Zaid Alyafeai, Antoine Chaffin, Arnaud Stiegler, Arun Raja, Manan Dey, M Saiful Bari, Canwen Xu, Urmish Thakker, Shanya Sharma Sharma, Eliza Szczechla, Taewoon Kim, Gunjan Chhablani, Nihal Nayak, Debajyoti Datta, Jonathan Chang, Mike Tian-Jian Jiang, Han Wang, Matteo Manica, Sheng Shen, Zheng Xin Yong, Harshit Pandey, Rachel Bawden, Thomas Wang, Trishala Neeraj, Jos Rozen, Abheesht Sharma, Andrea Santilli, Thibault Fevry, Jason Alan Fries, Ryan Teehan, Teven Le Scao, Stella Biderman, Leo Gao, Thomas Wolf, and Alexander M Rush.
Victor Sanh, Albert Webson, Colin Raffel, Stephen Bach, Lintang Sutawika, Zaid Alyafeai, Antoine Chaffin, Arnaud Stiegler, Arun Raja, Manan Dey, M Saiful Bari, Canwen Xu, Urmish Thakker, Shanya Sharma Sharma, Eliza Szczechla, Taewoon Kim, Gunjan Chhablani, Nihal Nayak, Debajyoti Datta, Jonathan Chang, Mike Tian-Jian Jiang, Han Wang, Matteo Manica, Sheng Shen, Zheng Xin Yong, Harshit Pandey, Rachel Bawden, Thomas Wang, Trishala Neeraj, Jos Rozen, Abheesht Sharma, Andrea Santilli, Thibault Fevry, Jason Alan Fries, Ryan Teehan, Teven Le Scao, Stella Biderman, Leo Gao, Thomas Wolf, and Alexander M Rush. 訳抜け防止モード: ヴィクター・サン アルバート・ウェブソン コリン・ラフフェル スティーブン・バッハ lintang sutawika, zaid alyafeai, antoine chaffin, arnaud stiegler, arun raja manan dey, m saiful bari, canwen xu, urmish thakker, shanya sharma sharma, eliza szczechla, taewoon kim, gunjan chhablani。 nihal nayak, debajyoti datta, jonathan chang, mike tian - jian jiang, ハン・ワン マッテオ・マニカ シェン・シェン・シン・ヨン 過酷なパンディー レイチェル・バウデン トーマス・ワン トリシャラ・ネラジ ジョス・ローゼン abheesht sharma, andrea santilli, thibault fevry, jason alan fries。 ライアン・ティーハン ティブン・ル・スカオ ステラ・ビダーマン レオ・ガオ トーマス・ウルフとアレクサンダー・m・ラッシュ。
0.57
Multitask prompted training enables zero-shot task generalization.
マルチタスク起動トレーニングは、ゼロショットタスクの一般化を可能にする。
0.45
In International Conference on Learning Representations, 2022.
英語) international conference on learning representations, 2022年。
0.80
URL https://openreview.n et/forum?
URL https://openreview.n et/forum?
0.29
id=9Vrb9D0WI4.
id=9Vrb9D0WI4。
0.15
Xiao Liu, Kaixuan Ji, Yicheng Fu, Weng Lam Tam, Zhengxiao Du, Zhilin Yang, and Jie Tang.
Xiao Liu, Kaixuan Ji, Yicheng Fu, Weng Lam Tam, Zhengxiao Du, Zhilin Yang, Jie Tang
0.34
P-tuning v2: Prompt tuning can be comparable to fine-tuning universally across scales and tasks, 2021b.
Ning Ding, Shengding Hu, Weilin Zhao, Yulin Chen, Zhiyuan Liu, Hai-Tao Zheng, and Maosong Sun. OpenPrompt: An open-source framework for prompt-learning, 2021. URL https://arxiv.org/abs/2111.01998.
Timo Schick and Hinrich Schütze. Exploiting cloze-questions for few-shot text classification and natural language inference. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, pages 255–269, Online, April 2021. Association for Computational Linguistics. doi: 10.18653/v1/2021.eacl-main.20. URL https://aclanthology.org/2021.eacl-main.20.
Xiao Liu, Yanan Zheng, Zhengxiao Du, Ming Ding, Yujie Qian, Zhilin Yang, and Jie Tang. GPT understands, too, 2021c. URL https://arxiv.org/abs/2103.10385.
Karen Hambardzumyan, Hrant Khachatrian, and Jonathan May. WARP: Word-level adversarial reprogramming. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 4921–4933, Online, August 2021. Association for Computational Linguistics. doi: 10.18653/v1/2021.acl-long.381. URL https://aclanthology.org/2021.acl-long.381.
Alistair E.W. Johnson, Tom J. Pollard, Lu Shen, Li-wei H. Lehman, Mengling Feng, Mohammad Ghassemi, Benjamin Moody, Peter Szolovits, Leo Anthony Celi, and Roger G. Mark. MIMIC-III, a freely accessible critical care database. Scientific Data, 3, 5 2016. ISSN 2052-4463. doi: 10.1038/sdata.2016.35.
Chantal Pellegrini, Anees Kazi, and Nassir Navab. Unsupervised pre-training on patient population graphs for patient-level predictions, 2022. URL https://arxiv.org/abs/2203.12616.
Shirly Wang, Matthew B. A. McDermott, Geeticka Chauhan, Marzyeh Ghassemi, Michael C. Hughes, and Tristan Naumann. MIMIC-Extract: A data extraction, preprocessing, and representation pipeline for MIMIC-III. In Proceedings of the ACM Conference on Health, Inference, and Learning, CHIL '20, page 222–235, New York, NY, USA, 2020. Association for Computing Machinery. ISBN 9781450370462. doi: 10.1145/3368555.3384469. URL https://doi.org/10.1145/3368555.3384469.
Willie Boag, Dustin Doss, Tristan Naumann, and Peter Szolovits. What's in a note? Unpacking predictive value in clinical note representations, 2018.
Noam Shazeer and Mitchell Stern.
Adafactor: Adaptive learning rates with sublinear memory cost.
In Jennifer Dy and Andreas Krause, editors, Proceedings of the 35th International Conference on Machine Learning, volume 80 of Proceedings of Machine Learning Research, pages 4596–4604. PMLR, 2018.
Ilya Loshchilov and Frank Hutter. Decoupled weight decay regularization. In 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019. OpenReview.net, 2019. URL https://openreview.net/forum?id=Bkg6RiCqY7.
Yu Gu, Robert Tinn, Hao Cheng, Michael Lucas, Naoto Usuyama, Xiaodong Liu, Tristan Naumann, Jianfeng Gao, and Hoifung Poon. Domain-specific language model pretraining for biomedical natural language processing, 2020.
Appendix
A Training details
We implement our experiments using a combination of the OpenPrompt framework Ding et al [2021] and PyTorch.
For prompt learning, we use the Adafactor optimizer Shazeer and Stern [2018] for soft and mixed templates, and the AdamW optimizer Loshchilov and Hutter [2019] for the language model and soft verbalizers.
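As a minimal sketch of this optimizer split (illustrative only: the template text, checkpoint, and learning rates below are assumptions, not the exact configuration used in this work), a prompt-learning setup in OpenPrompt might pair Adafactor with the soft/mixed template parameters and AdamW with the soft verbalizer while the PLM stays frozen:

```python
# Illustrative sketch only: template wording, checkpoint and learning rates are
# placeholder assumptions, not the exact configuration used in this work.
import torch
from transformers.optimization import Adafactor
from openprompt.plms import load_plm
from openprompt.prompts import MixedTemplate, SoftVerbalizer
from openprompt import PromptForClassification

# Load a (frozen) PLM and its tokenizer.
plm, tokenizer, model_config, WrapperClass = load_plm("bert", "emilyalsentzer/Bio_ClinicalBERT")

# Mixed template: hand-written text plus trainable soft tokens.
template = MixedTemplate(
    model=plm, tokenizer=tokenizer,
    text='{"placeholder":"text_a"} {"soft":"This patient should be triaged to"} {"mask"}.',
)
verbalizer = SoftVerbalizer(tokenizer, plm, num_classes=7)

prompt_model = PromptForClassification(
    plm=plm, template=template, verbalizer=verbalizer, freeze_plm=True
)

# Adafactor updates only the trainable template parameters ...
template_params = [
    p for name, p in prompt_model.template.named_parameters() if "raw_embedding" not in name
]
template_optimizer = Adafactor(
    template_params, lr=1e-2, relative_step=False, scale_parameter=False, warmup_init=False
)
# ... while AdamW updates the soft verbalizer head.
verbalizer_optimizer = torch.optim.AdamW(prompt_model.verbalizer.parameters(), lr=5e-3)
```

Keeping two optimizers lets the soft prompt and the verbalizer head use very different learning rates, which is the usual motivation for this split.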
Due to relatively limited computational resources, hyperparameter search was performed only for the ICD-9 Triage task, using a sub-sample of the training data similar to our few-shot experiments with 128 samples per class.
[Optimal hyperparameters for traditional fine-tuning and prompt learning; the hyperparameter labels were not recoverable from the extracted table. Reported values, in original order: 0.0121, 4, 3, 0.1536, adafactor, 0.007 and 0.0048, 4, 4, 0.382, adamw, n/a.]
B Dataset details
Mortality and Length of Stay For all clinical tasks, a combination of the available clinical notes pertaining to the outcome of interest was used, including admission and discharge summaries. Each task dataset was created separately, and a 70-10-20 split into training, validation, and test sets was used.
We followed the data engineering steps outlined in the clinical outcomes paper van Aken et al [2021].
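A minimal sketch of such a 70-10-20 partition is shown below; the dataframe, column name, and use of stratification are assumptions for illustration, not the exact preprocessing code of this work:

```python
# Minimal sketch of a 70-10-20 split; the dataframe and label column are hypothetical.
from sklearn.model_selection import train_test_split

def split_70_10_20(df, label_col="label", seed=42):
    """Partition a task dataframe into 70% train, 10% validation, 20% test."""
    train_df, rest_df = train_test_split(
        df, test_size=0.30, stratify=df[label_col], random_state=seed
    )
    # The remaining 30% is split 1/3 vs 2/3, i.e. 10% validation and 20% test overall.
    val_df, test_df = train_test_split(
        rest_df, test_size=2 / 3, stratify=rest_df[label_col], random_state=seed
    )
    return train_df, val_df, test_df
```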
ICD-9 50 and ICD-9 Triage The ICD-9 50 task used all clinical notes corresponding to the top 50 most frequently occurring ICD-9 diagnosis codes. The ICD-9 Triage task was derived from the top 20 ICD-9 diagnosis codes.
From this subsample, a clinician derived suitable groups representing the destination team on discharge from ICU: Cardiology, Obstetrics, Respiratory Medicine, Neurology, Gastroenterology, Acute or Internal Medicine, and Oncology.
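A minimal sketch of how such a grouping could be applied is given below; the ICD-9 codes and their team assignments are purely hypothetical placeholders, since the clinician-derived mapping itself is not reproduced here:

```python
# Illustrative sketch: the ICD-9 codes and team assignments below are hypothetical
# placeholders, not the clinician-derived mapping used in this work.
from typing import Optional

TRIAGE_MAP = {
    "428": "Cardiology",            # e.g. heart failure codes
    "486": "Respiratory Medicine",  # e.g. pneumonia
    # ... remaining top-20 ICD-9 codes, each assigned to a destination team ...
}

def assign_triage_team(icd9_code: str) -> Optional[str]:
    """Map a (truncated) ICD-9 diagnosis code to a discharge destination team."""
    return TRIAGE_MAP.get(icd9_code[:3])

# notes_df["triage_team"] = notes_df["icd9_code"].map(assign_triage_team)
# notes_df = notes_df.dropna(subset=["triage_team"])   # keep only mapped notes
```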
D Prompt learning versus traditional fine-tuning with PubMedBERT
The PLM used for all results presented in the main body of the paper was Bio-ClinicalBERT Alsentzer et al [2019], which was itself further pre-trained on MIMIC-III notes.
Whilst this was arguably advantageous for both traditional fine-tuning and prompt learning, it may have overly favoured prompt learning due to the reformulation of the classification task as a Masked Language Modelling (MLM) objective.
Therefore, in Table D.1 we present results for another biomedical BERT model from Microsoft, PubMedBERT, which was pre-trained from scratch on PubMed abstracts Gu et al [2020].
It can be seen that prompt learning still outperforms traditional fine-tuning by a large margin on the ICD-9 Triage task, in line with our other results.
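As a minimal sketch of this robustness check (the Hugging Face checkpoint identifiers are assumptions about the publicly available releases, not necessarily the exact checkpoints used here), swapping the underlying PLM only requires changing the checkpoint passed to OpenPrompt's loader; the templates, verbalizer, and training loop stay unchanged:

```python
# Illustrative sketch: the Hugging Face checkpoint names are assumptions about
# publicly available releases, not necessarily the exact checkpoints used here.
from openprompt.plms import load_plm

CHECKPOINTS = {
    "bio_clinical_bert": "emilyalsentzer/Bio_ClinicalBERT",
    "pubmed_bert": "microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract",
}

def load_backbone(name: str):
    """Load the chosen biomedical BERT backbone; templates and verbalizers are reused as-is."""
    return load_plm("bert", CHECKPOINTS[name])

# plm, tokenizer, model_config, WrapperClass = load_backbone("pubmed_bert")
```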