The extreme multi-label classification (XMC) task aims at tagging content
with a subset of labels from an extremely large label set. The label vocabulary
is typically defined in advance by domain experts and assumed to capture all
necessary tags. However, in real-world scenarios this label set, although large,
is often incomplete and experts frequently need to refine it. To develop
systems that simplify this process, we introduce the task of open vocabulary
XMC (OXMC): given a piece of content, predict a set of labels, some of which
may be outside of the known tag set. Hence, in addition to not having training
data for some labels - as is the case in zero-shot classification - models need
to invent some labels on-the-fly. We propose GROOV, a fine-tuned seq2seq model
for OXMC that generates the set of labels as a flat sequence and is trained
using a novel loss independent of predicted label order. We show the efficacy
of the approach, experimenting with popular XMC datasets for which GROOV is
able to predict meaningful labels outside the given vocabulary while performing
on par with state-of-the-art solutions for known labels.
1 Introduction
Extreme multi-label classification (XMC) aims at predicting a set of labels for a given input instance from an extremely large label set (Yen et al., 2016, 2017; Babbar and Schölkopf, 2017, 2019).
Examples of applications of extreme classification include labeling a new article with Wikipedia’s categories, classifying a product with catalog labels, and classifying a resume into a collection of pertinent job titles.
Despite the scale of the label space, it is challenging to capture a priori all the possible ways in which an input instance can be categorized, especially at industrial scale.
In this work we introduce the open vocabulary XMC task, where we measure the ability of models to go beyond the given vocabulary and automatically propose new labels that might complement the existing ones and fill gaps in the vocabulary.
Note that this differs from a zero-shot formulation of the XMC problem (Gupta et al., 2021) where, although no training instance is available for some labels, they are still present in the given vocabulary.
To tackle the problem we propose GROOV, an autoregressive model that maps input sequences to a set of sequences.
Inputs are documents/text, and outputs are collections of textual labels from an open vocabulary.
We investigate multiple sequence-to-set-of-sequences instantiations, in particular an off-the-shelf approach based on an encoder-decoder language model (T5, Raffel et al. (2019)) and a variant that uses a modified softmax function (i.e., multi-softmax) that does not penalize the model for assigning high probability to any gold label.
This latter version inherently treats the target as a set of sequences (instead of a flat sequence) and outperforms the off-the-shelf approach.
Differently from previous works, we assume models are unaware of such labels (i.e., they don’t appear in the given label vocabulary) and need to find them with open-ended text generation.
One-vs-all methods such as DiSMEC (Babbar and Schölkopf, 2017), ProXML (Babbar and Schölkopf, 2019), PDSparse (Yen et al., 2016), and PPDSparse (Yen et al., 2017), which treat each label as a binary classification problem, can achieve acceptable performance.
One-vs-all methods suffer from high computational cost and large model size.
Also, the classification tasks are independent of each other, and label dependency is not directly modeled.
The high computational complexity of one-vs-all methods can be reduced by incorporating different partitioning techniques on the label space.
For instance, Parabel (Prabhu et al., 2018) partitions the labels through a balanced 2-means label tree using label features constructed from the instances.
Recently, several approaches have been proposed to improve Parabel.
Bonsai (Khandagale et al., 2019) relaxes two main constraints in Parabel: it allows multi-way instead of binary partitionings of the label set at each intermediate node, and it removes the strict balancing constraints on the partitions.
SLICE (Jain et al., 2019) considers building an approximate nearest neighbor (ANN) graph as an indexing structure over the labels.
For a given instance, the relevant labels can be found quickly from the nearest neighbors of the instance via the ANN graph.
These models rely on sparse features engineered from the text, which is cumbersome and, most importantly, doesn’t benefit from the knowledge of pre-trained LMs.
Moreover, the partitioning of the label space is done independently from the classifier’s training.
In this paper, we leverage pre-trained language models and show that generative models efficiently partition the label space, token by token, and there is no need for crafting a tree of labels separate from the classifier.
Deep learning models have improved extreme multi-label classification by learning better text representations from raw text. The main challenge for these methods, however, is how to cope with millions of labels.
AttentionXML (You et al., 2019) shows success in extreme multi-label classification, surpassing all traditional machine learning methods and demonstrating the superiority of raw text over sparse features.
AttentionXML uses a label tree, and a new classification model is trained for each layer of this tree, which makes inference slow.
X-Transformer (Chang et al., 2020) only uses pre-trained LMs to match the label clusters for a given raw text and then ranks these labels with linear classifiers over the sparse features.
X-Transformer is the first method to use pre-trained LMs in extreme multi-label classification.
Due to the high computational complexity of transformer models, it only fine-tunes them as the label cluster matcher, which cannot fully exploit their power.
Recently, GENRE (Cao et al., 2021) showed that seq2seq autoregressive models built on pre-trained models can effectively partition and traverse a large set of labels by generating tokens incrementally.
In extreme multi-label classification, the output is a set of labels.
Turning the set to a sequence of labels requires an ordering among labels, which might not be straightforward in many applications.
Another advantage of our work over other set-output methods is that we model multi-label classification as a set of sequences of tokens instead of a set of label identifiers.
Therefore, we leverage more effectively the LM’s knowledge in understanding each label.
Gupta et al. (2021) tackle the problem of zero-shot learning in extreme multi-label classification, tagging each input with a set of labels that includes both labels seen and labels unseen during training.
Not only do we build an effective and efficient zero- to few-shot learner, but we also go beyond that and tackle the problem of open vocabulary classification, in which the taxonomy is not entirely known to us.
Related to open vocabulary extreme classification is Open Set Recognition (Geng et al., 2021) in the computer vision community.
Models proposed to solve open set recognition have a different signature from our work.
They define novel classes only in terms of sets of data points and do not generate names for classes that could then be compared against the true labels in a test set.
Also, they operate only on images, and the methods’ generalization to other modalities is not examined.
Similar in spirit, Wang et al. (2019) generate hashtags for microblogs and measure the ability of their model to generate new hashtags.
The authors use a GRU-based dual encoder to generate hashtags.
While there are similarities, our work is the first to study large generative pre-trained LMs for open vocabulary extreme tagging by jointly modeling all golden labels using a novel loss (multi-softmax).
3 Open-Vocabulary Tagging
Consider N training data points $\{(X_i, Y_i)\}_{i=1..N}$, where $X_i$ is the text corresponding to the i-th instance and $Y_i \subseteq Y^*$ is the set of tags $X_i$ was annotated with.
Importantly, we consider the set of all possible tags $Y^*$ to be unknown both at training and inference time.
We do assume, however, that each tag $l_k \in Y^*$ can be described by natural language, that is, by a sequence of tokens $\mathrm{tok}(l_k) = \{t_{k,j}\}_{j=1..\mathrm{len}(l_k)}$. Lastly, let $Y_{seen} = \bigcup_{i=1}^{N} Y_i$ denote the set of labels encountered at training time.
Throughout this work we will pay special attention to labels outside of this set, which we refer to as unseen labels.
The formulation of the topic tagging task presented above is incompatible with currently prevalent XMC paradigms in several ways. First, most traditional classifiers not only require $Y^*$ to be known in advance, but also assume that for each label $l_k$ there are some examples tagged with $l_k$ so that a classifier can be learned for that particular label.
These methods often don’t rely on the label representation $\mathrm{tok}(l_k)$ itself.
Second, more recent zero-shot work (Gupta et al., 2021) makes tagging possible even for previously unencountered labels $y_k \notin Y_{seen}$.
To the best of our knowledge, all of these methods rely on access to $Y^*$ in order to build some kind of index using label features.
Finally, current datasets have their limitations too: Jain et al. and Schultheis et al. highlight that as the set of possible labels grows it is unrealistic to expect human annotators to consider every single possible label in $Y^*$ when annotating a document; thus we can expect all extreme classification datasets to be generally under-annotated.
As we will see in Section 7 this hinders our ability to measure the precision of any open vocabulary tagging system.
In the following section we introduce a novel class of models that is particularly well-suited for exploring the whole label space $Y^*$ while maintaining good performance on the set of known labels $Y_{seen}$.
4 Model
Below we illustrate how to frame OXMC as a seq2seq problem, propose a loss that captures the set nature of label sets more directly, and then show how individual labels in the sets can be scored.
Open-vocabulary topic tagging can also be formulated as such a sequence-to-sequence problem: given text $X_i$, a set of tags $Y_i$ and a permutation $\pi$ that returns an ordered list of the elements of $Y_i$, we ask the model to predict the concatenation of the appropriate tags in the order defined by $\pi$.
Formally, the target output can be defined as
$$T(Y_i, \pi) = \mathrm{Concat}\left(\left[\mathrm{tok}(\pi(Y_i)[k])\right]_{k=1}^{|Y_i|}\right) \quad (1)$$
The need for the extra permutation input $\pi$ in $T$ reflects the fact that we are attempting to use a sequential model that produces an ordered list of tokens to predict an unordered set of labels.
This has a number of practical implications that we need to address.
At training time one needs to decide which ordering of the labels to feed to the model as target.
At inference time, the model might assign different probabilities to different orderings of the very same set of labels (as opposed to traditional classifiers that would assign a well defined probability to a particular set of labels).
We then split the produced output text by the separator token, resulting in a set of strings - these will be our predicted tags.
Note that there’s no guarantee that the tags generated this way will be part of the labels used in the dataset, but our hope is that the model will learn what constitutes a good tag.
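The target construction and the output parsing described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the separator string `SEP` and the helper names `build_target` and `parse_output` are hypothetical, and the separator token actually used with T5 is not specified in this text.

```python
import random

SEP = ";"  # hypothetical separator; the real separator token is an assumption


def build_target(labels, rng=None):
    """Concatenate a set of labels into one flat target string.

    The ordering plays the role of the permutation pi: sorted by default,
    shuffled when a random generator is supplied.
    """
    ordered = sorted(labels)
    if rng is not None:
        rng.shuffle(ordered)
    return f" {SEP} ".join(ordered)


def parse_output(text):
    """Split generated text on the separator and return a set of label strings."""
    return {part.strip() for part in text.split(SEP) if part.strip()}


labels = {"table tennis", "sports", "london"}
target = build_target(labels, rng=random.Random(0))
# Whatever order was chosen, parsing recovers the same unordered label set.
recovered = parse_output(target)
```

Because parsing returns a set, two generations that differ only in label order map to the same prediction, which is exactly the mismatch between ordered outputs and unordered label sets that the text points out.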
4.2 Multi-Softmax Loss
Assume a training example has gold labels A, B, and C and that in a particular training step we feed the permutation B, A, C to the model as the target.
Let the logits corresponding to the first tokens of labels A, B, C be $z_A$, $z_B$, $z_C$.
The softmax function inside the Cross-Entropy loss will be as follows:
$$\sigma_B(z) = \frac{e^{z_B}}{\sum_{i=1}^{N} e^{z_i}} \quad (2)$$
The sum in the denominator also includes terms for the logits $z_A$, $z_C$ and thus the loss will eventually increase if the model assigns higher probabilities to tokens corresponding to labels A and C, even though those predictions would be completely reasonable.
Unfortunately, the more labels an example has on average, the more prevalent this problem will become.
In order to overcome this issue, we propose a modified softmax function dubbed MultiSoftmax (MSM) that does not penalize the model for assigning a high probability to any token that could lead to decoding a gold label that has not been produced yet.
At a given decoding step let $G$ be the set of token indices that could lead to producing a gold label (in our example A, B or C).
Then the multi-softmax function is defined as:
$$\sigma_G(z) = \frac{\sum_{i \in G} e^{z_i}}{\sum_{i=1}^{N} e^{z_i}} \quad (3)$$
We experiment with replacing the softmax term in the Cross-Entropy loss of T5 with this newly proposed version, in the hope that the model will learn more efficiently.
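To make the difference between the two normalizations concrete, the following pure-Python sketch compares the standard softmax probability of a single target token with the multi-softmax probability mass over a set G of acceptable tokens. It covers a single decoding step on a toy four-token vocabulary; the function names and the logit values are illustrative assumptions, and how G is tracked across decoding steps (for example, removing labels that were already produced) is not shown.

```python
import math


def softmax_prob(logits, target):
    """Standard softmax probability of one target index."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    return exps[target] / sum(exps)


def multi_softmax_prob(logits, gold):
    """Multi-softmax: total probability mass on every index in `gold`."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    return sum(exps[i] for i in gold) / sum(exps)


# Toy vocabulary of 4 tokens; indices 0, 1, 2 stand for the first tokens of
# labels A, B, C, and the model spreads its mass evenly over those three.
logits = [2.0, 2.0, 2.0, 0.0]

ce_loss = -math.log(softmax_prob(logits, 1))                 # target is B only
msm_loss = -math.log(multi_softmax_prob(logits, {0, 1, 2}))  # A, B, C all acceptable
```

With these logits the standard cross-entropy term is much larger than the multi-softmax term, because mass placed on A and C counts against the model in the first case and not in the second. When G contains a single index the two functions coincide.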
4.3 Scoring Labels
With the proposed sequential approach there is no simple way to compute a score for an individual label: at decoding time we can only access the probability of the next label given the previously decoded labels.
Computing the exact marginal probability of a label would require summing over every possible label sequence that contains it, which is computationally intractable, so in practice we run a beam search with beam size $B$ and sum up the probabilities of the beams that contain a particular label in order to approximate its marginal probability.
Let $b_1, \ldots, b_B$ be the label sequences resulting from such a beam search.
Our approximation to the marginal probability of label li can be written as:
$$P(l_i) = \sum_{k=1}^{B} \mathbb{1}(l_i \in b_k)\, P(b_k) \quad (4)$$
5 Experimental Setting
5.1 Datasets
In order to focus on the ability of models to tag text with previously unseen labels, one might consider using the same datasets that are used to benchmark traditional zero-shot XMC.
We evaluate our models on the two topic tagging datasets (footnote 2) that Gupta et al. (2021) report results on.
EURLex-4.3K (Chalkidis et al., 2019) is a collection of roughly 50K EU legal documents annotated
Footnote 2: Other datasets in that work are focused on item similarity-based recommendation rather than real tagging.
To that end, we evaluate our models using the following metrics:
Propensity-Scored Precision@K (PSP@K) is a variant of the commonly used Precision@K metric, introduced by Jain et al., that assigns higher rewards for getting infrequent labels right (and by extrapolation, even higher reward for previously unseen labels).
The scoring function is motivated by the observation that less frequent tags are more likely to be under-labeled as well as by the intuition that tagging with more granular, less frequent tags is likely of more value.
We refer to the original paper for the implementation details of this metric.
Code for computing this metric is provided by the Extreme Classification Repository (Bhatia et al., 2016).
Metrics on unseen labels. For a data point with model predictions $\tilde{Y}_i$ and gold labels $Y_i$, let $Y_{unseen,i} = Y_i \setminus Y_{seen}$ and $\tilde{Y}_{unseen,i} = \tilde{Y}_i \setminus Y_{seen}$.
We calculate the standard Precision@K and Recall@K metrics considering these two sets, $\tilde{Y}_{unseen,i}$ and $Y_{unseen,i}$.
On top of these instance-wise metrics we also define a metric on the entire test set that measures how many of the unseen labels in the test set the model has produced at least once.
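A minimal sketch of these unseen-label metrics is given below. Several details are assumptions rather than the paper's definitions: predictions are taken to be a ranked list, Precision@K divides by K even when fewer than K unseen predictions exist, and the test-set-level metric is reported as a fraction without any top-K restriction. The function names are hypothetical.

```python
def unseen_precision_recall_at_k(predicted, gold, seen, k):
    """Precision@K and Recall@K restricted to labels outside the seen set.

    predicted: ranked list of label strings; gold: set of gold labels;
    seen: set of labels encountered during training.
    """
    pred_unseen = [label for label in predicted if label not in seen][:k]
    gold_unseen = gold - seen
    hits = sum(1 for label in pred_unseen if label in gold_unseen)
    precision = hits / k  # assumed convention: always divide by K
    recall = hits / len(gold_unseen) if gold_unseen else 0.0
    return precision, recall


def unseen_label_coverage(all_predictions, all_gold, seen):
    """Fraction of unseen test-set labels produced at least once by the model."""
    gold_unseen = set().union(*all_gold) - seen
    produced = set().union(*all_predictions) & gold_unseen
    return len(produced) / len(gold_unseen) if gold_unseen else 0.0


seen = {"x", "y"}
p, r = unseen_precision_recall_at_k(
    ["x", "new1", "y", "new2"], {"x", "new1", "new3"}, seen, k=2
)
coverage = unseen_label_coverage(
    [["x", "new1"], ["new2"]], [{"x", "new1", "new3"}, {"new2"}], seen
)
```

In the example, two unseen labels are predicted and one of them is correct, out of two unseen gold labels, so both precision and recall at K=2 are one half.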
Wikipedia-1M (Gupta et al., 2021) is a large collection of Wikipedia articles associated with 1M+ Wiki categories.
The above two datasets both contain some unseen labels (see Table 1) but sit at the two extremes of the spectrum: EURLex-4.3K only contains 163 unseen labels, whereas most of the labels in the test set of Wikipedia-1M are in fact not present in the training set.
In order to effectively study the open-vocabulary tagging properties of this new class of models, we construct a third dataset motivated by a real world scenario that aims to be in the middle of this spectrum.
The original AmazonCat-13K dataset does not contain unseen labels in its test set, so we create a new dataset by
1) randomly choosing 1000 labels from the set of labels that appear in the training split and
2) moving all examples in the training set that contain any of these 1000 labels to the test set.
We refer to this newly introduced version of the AmazonCat-13K dataset as AmazonCat-OV, as it enables measuring the Open Vocabulary performance of models.
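The two-step construction of an open-vocabulary split can be sketched as below. The data representation (lists of pairs of text and label set), the function name and the seeding scheme are assumptions for illustration, not the procedure's actual implementation.

```python
import random


def make_open_vocab_split(train, test, num_held_out, seed=0):
    """Hold out labels from training to create unseen labels in the test set.

    train, test: lists of (text, set_of_labels) pairs.
    Returns (new_train, new_test, held_out_labels).
    """
    # Step 1: randomly choose labels among those that appear in the training split.
    train_labels = sorted(set().union(*(labels for _, labels in train)))
    held_out = set(random.Random(seed).sample(train_labels, num_held_out))
    # Step 2: move every training example carrying a held-out label to the test set.
    new_train = [ex for ex in train if not (ex[1] & held_out)]
    moved = [ex for ex in train if ex[1] & held_out]
    return new_train, test + moved, held_out


train = [("d1", {"a", "b"}), ("d2", {"b"}), ("d3", {"c"})]
test = [("d4", {"a"})]
new_train, new_test, held_out = make_open_vocab_split(train, test, num_held_out=1)
```

One side effect worth noting: labels that only co-occur with a held-out label also disappear from the training split, so the number of unseen labels can end up larger than the number explicitly held out.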
Table 1: Basic statistics of datasets used in this work
5.2 Evaluation Metrics
We expect two basic properties from the proposed new class of models:
• Irrespective of the new labels, these models need to perform just as well as other XMC models on the overall dataset (including the more frequent tags).
• Additionally, we expect our proposed models to produce some of the labels that they have never seen and have no knowledge of, demonstrating some understanding of the structure of the label space and the ability to generalize beyond a predefined taxonomy.
Soft-matching based metrics. Since the model has no knowledge of what the gold labels might look like, it is possible that it would produce some labels that are semantically equivalent to a gold label but would have a slightly different surface form.
On datasets where a validation set is not provided, we train for a fixed number of epochs (100 and 1 for EURLex and AmazonCat-OV respectively) and use beam size 15 for decoding.
Then, we look at the out of vocabulary performance by relaxing the definition of label matching to account for semantically similar labels with different surface forms.
Table 2 contains our results on the entire label set, as measured by the PSP@K metric introduced above.
Given the large number of XMC models available today, we only show the top-few best performing models from each family of models that we referenced in Section 2.
Our simplest method that uses T5 as-is outperforms many of the XMC models developed in the past years.
Using the methods described in Section 4 we established a system that performs on par with the best available model on EURLex-4.3K and is the second-best model on Wikipedia-1M, only 2 percentage points below the model designed explicitly for the zero-shot setting.
Our scoring by marginalization improves the performance on the Wikipedia-1M dataset, especially at the top 3 and 5 tags, showing that it effectively builds a calibrated score for labels.
We conjecture the generative model learns to output the more confident tags first and then moves to the less confident ones.
Our MultiSoftmax loss consistently improves the performance in comparison to the base model.
6.2 Out-Of-Vocabulary Performance
What distinguishes our model from previous zero-shot approaches is that it is able to generate previously unseen labels without being told about their existence in advance.
Table 3 shows our measurements of recall and precision when only considering unseen labels.
For this section, we use the two larger datasets with a reasonably large set of unseen labels.
To the best of our knowledge, no other XMC system can achieve a non-zero score in this setting.
Recall@K metrics on both of these datasets demonstrate that the model can generalize beyond the labels it has seen and produce correct, novel labels in some percentage of the cases - although there is room for significant improvements still.
A highlight is that on the AmazonCat-OV dataset, nearly one-quarter of the labels that we removed from the training set were generated as the top out-of-vocabulary prediction at least once in the test set.
Due to the ambiguous nature of evaluating open-vocabulary tags produced by generative models, recall and precision measurements based on exact label match are merely a lower bound on the practical performance of the model.
We investigate this further in the following sections and find that these numbers are underestimating our model’s true ability to produce previously unseen but valid tags.
Additionally, the mismatch can be due to related terms or synonyms being generated instead of the exact label (for example "Kids’ books" instead of "Childrens’ books").
Metrics like precision and recall would count all such generations as false positives, and this may not accurately describe the generative model’s performance.
To tackle this, we also measure soft precision and soft recall.
We introduce Soft Lexical Recall/Precision, which addresses the lexical differences.
These metrics work in exactly the same way as normal precision and recall, with the difference that any generated label $\hat{Y}$ is matched with a label from the golden set $Y$ if their edit distance is smaller than $|\hat{Y}|/DF + 1$, where $DF$ is the division factor used to regulate the flexibility and accuracy of this matching.

Table 2 (numeric entries missing): PSP@1, PSP@3 and PSP@5 on EUR-Lex 4.3K and Wikipedia-1M. Rows: GROOV; + sorted by marginal probabilities; + MSM; + T5-large; ZestXML-tuned (Gupta et al., 2021); AttentionXML (You et al., 2019); XReg (Prabhu et al., 2020); Parabel (Prabhu et al., 2018); DiSMEC (Babbar and Schölkopf, 2017); Bonsai (Khandagale et al., 2019); PfastreXML (Jain et al., 2016); FastText ANNS (Joulin et al., 2017); BERT ANNS (Reimers and Gurevych, 2019).

Table 3: Performance of our best performing models on the set of unseen labels
In our measurements we set DF = 10.
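The soft lexical matching rule can be illustrated with the sketch below. It assumes that the edit distance is the character-level Levenshtein distance and that $|\hat{Y}|$ is the character length of the generated label; neither choice is stated explicitly in the text, and the function names are hypothetical.

```python
def edit_distance(a, b):
    """Character-level Levenshtein distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def soft_lexical_match(predicted, gold, df=10):
    """Match if the edit distance is smaller than len(predicted) / df + 1."""
    return edit_distance(predicted, gold) < len(predicted) / df + 1


plural_ok = soft_lexical_match("kitchen sink", "kitchen sinks")  # distance 1 < 2.2
unrelated = soft_lexical_match("london", "paris")
```

With DF = 10 a label shorter than ten characters tolerates only a single edit, while longer labels tolerate proportionally more, so a larger DF makes the matching stricter.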
We also introduce Soft Semantic Recall/Precision to address the problem with slightly different formulations of the same label or synonym words in the labels.
Similar to the Soft Lexical metrics described above, we change the matching criterion between $\hat{Y}$ and $Y$ from exact lexical match to a BertScore (Zhang et al., 2020) based metric.
We check the F1 score generated by BertScore and use a threshold of 0.94.
This threshold is selected to make sure soft semantic matches correlate highly with sensibleness in our human evaluation.
Still, we observe significant improvement in our precision/recall compared to the exact match, confirming that the model generates some correct tags with slight surface differences.
7.1 In our experiment with the AmazonCat-OV dataset, our model correctly generated more than 400 different, novel categories that only appeared in the test set as ground truth labels.
In order to qualitatively understand what type of model behavior led to producing these labels, we manually compared the input texts and the generated novel labels.
We found that in most cases (89%) the model effectively employs a very simple two-step strategy.
First it identifies an n-gram in the input text that could be a meaningful category.
Then the model decides if it makes sense to generate a label that is the verbatim copy of this n-gram ("London", "Table Tennis", "Bartending") or, alternatively, it converts the n-gram into its plural form ("Kitchen Sinks", "Sleeping Pads").
In the rest of the cases (11%), however, we found evidence that the model is able to creatively compose information from across the item description in order to produce a label that does not appear verbatim in the text.
The labels that are predicted correctly (potentially with soft lexical matching) are colored green.
Those predicted falsely from the label set are colored red.
The labels that could not be matched with any labels from the known label set are colored blue.
In Figure 1a we see that the model generates several completely novel labels: "eyebrow pencils", "eyebrow treatments" and both singular and plural forms of "eyebrow".
Taxonomists could use such a prediction to improve the taxonomy and potentially the training dataset itself.
In contrast, the novel label generated in Figure 1b is not related to the input text at all and is just a false positive.
7.2 Sensibleness and Informativeness of Novel Labels
Sometimes the model generates completely new terms that do not appear as a ground truth label in the test set.
Even though these could indeed be false positives - as no taxonomy is ever complete - they could also be sensible, and informative new tags that could help the taxonomists expand the known label set.
Due to this, our quantitative precision results might significantly underestimate the usefulness of the generated novel labels.
We inspected a random sample of 100 model predictions containing out-of-vocabulary labels (142 novel labels) and manually assessed their sensibleness and informativeness.
This is similar to the work of Shuster et al. (2021), where the Consistency, Engagingness, and Knowledgeability of the responses of generative models in a conversational setting were measured manually.
We focus on sensibleness and informativeness because a new tag in the taxonomy needs to have both characteristics.
It needs to make sense while being sufficiently different from existing labels.
In Figure 1 we present two examples of novel generated labels that are entirely out of vocabulary.
The color-map denotes the lexical similarity of generated predictions to the golden set, with gold meaning a perfect match and black being a complete mismatch.
The Y-Axis of the color map corresponds to the golden set labels, and the individual labels in the golden set are colored gold when they are missing from the training set.
We also want to evaluate the semantic soft matching introduced in Section 6.3 against the newly introduced sensible and informative framework.
Table 5 shows that semantic matching with the mentioned threshold detects sensibleness with 96% precision and also improves the precision of detecting informativeness.
Some more examples of these novel labels generated by the model and their evaluation based on the sensible and informative characteristics can be found in Appendix A. Note that as this manual labeling process is expensive and time-consuming, our initial sample sets have been small.
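To make the precision figure concrete, the sketch below shows one way to score a matcher as a detector of a human-judged property such as sensibleness: among the novel labels the matcher flags, count the fraction that reviewers also judged positive. The function name `detection_precision` and the toy inputs are hypothetical and are not taken from the study's annotations.

```python
import math
from typing import Sequence


def detection_precision(flagged: Sequence[bool], judged: Sequence[bool]) -> float:
    """Precision of a detector against human judgments.

    flagged[i] is True when the matcher accepted novel label i;
    judged[i] is True when reviewers rated label i as having the property.
    """
    if len(flagged) != len(judged):
        raise ValueError("flagged and judged must have the same length")
    n_flagged = sum(1 for f in flagged if f)
    if n_flagged == 0:
        return math.nan  # precision is undefined when nothing is flagged
    true_positives = sum(1 for f, j in zip(flagged, judged) if f and j)
    return true_positives / n_flagged


if __name__ == "__main__":
    # Toy data for illustration only.
    flagged = [True, True, False, True]
    judged = [True, False, True, True]
    print(detection_precision(flagged, judged))
```

With small manually labeled samples, such an estimate has wide uncertainty, so it is best read as indicative.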
of the Twelfth ACM International Conference on Web Search and Data Mining, WSDM ’19, page 528–536, New York, NY, USA. Association for Computing Machinery.
Himanshu Jain, Yashoteja Prabhu, and Manik Varma. 2016. Extreme multi-label loss functions for recommendation, tagging, ranking & other missing label applications.
Armand Joulin, Edouard Grave, Piotr Bojanowski, and Tomas Mikolov. 2017. Bag of tricks for efficient text classification. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers, pages 427–431, Valencia, Spain.
2018. A deep reinforced sequence-to-set model for multi-label text classification.
Ian E.H. Yen, Xiangru Huang, Wei Dai, Pradeep Ravikumar, Inderjit Dhillon, and Eric Xing. 2017. Ppdsparse: A parallel primal-dual sparse method for extreme classification. KDD ’17, New York, NY, USA. Association for Computing Machinery.
Ian En-Hsu Yen, Xiangru Huang, Pradeep Ravikumar, Kai Zhong, and Inderjit Dhillon. 2016. Pd-sparse: A primal and dual sparse approach to extreme multiclass and multilabel classification. In Proceedings of The 33rd International Conference on Machine Learning, volume 48 of Proceedings of Machine Learning Research, pages 3069–3077, New York, New York, USA.
2020. Extreme regression for dynamic search advertising. WSDM ’20, page 456–464, New York, NY, USA. Association for Computing Machinery.
Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J. Liu. 2019. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer. Journal of Machine Learning Research, 21:1–67.
Nils Reimers and Iryna Gurevych. 2019. Sentence-BERT: Sentence embeddings using Siamese BERT-networks. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 3982–3992, Hong Kong, China. Association for Computational Linguistics.
Erik Schultheis, Mohammadreza Qaraei, Priyanshu Gupta, and Rohit Babbar. Unbiased Loss Functions for Extreme Classification With Missing Labels.
Kurt Shuster, Spencer Poff, Moya Chen, Douwe Kiela, and Jason Weston. 2021. Retrieval augmentation reduces hallucination in conversation.
Oriol Vinyals, Samy Bengio, and Manjunath Kudlur. 2016. Order matters: Sequence to sequence for sets.
Yue Wang, Jing Li, Irwin King, Michael R. Lyu, and Shuming Shi. 2019. Microblog hashtag generation via encoding conversation contexts. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 1624–1633, Minneapolis, Minnesota. Association for Computational Linguistics.
Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Clement Delangue, Anthony Moi, Pierric Cistac, Tim Rault, Rémi Louf, Morgan Funtowicz, Joe Davison, Sam Shleifer, Patrick von Platen, Clara Ma, Yacine Jernite, Julien Plu, Canwen Xu, Teven Le Scao, Sylvain Gugger, Mariama Drame, Quentin Lhoest, and Alexander M. Rush. 2020. Transformers: State-of-the-art natural language processing. In Proceedings of the 2020 Conference on
A Appendix: Detailed Summary of Novel Generated Labels or Unseen Labels in Gold Set

In this appendix we list a subset of the instances studied in Section 7 for which the model generated novel labels or whose gold set contains unseen labels.
The color-map denotes the lexical similarity of generated predictions to the golden set with gold meaning a perfect match and black being a complete mismatch.
For this lexical similarity we use the Levenshtein distance similar to section 6.3.
The Y-Axis of the color map corresponds to the golden set labels and the individual labels in the golden set are colored gold when they are missing from the training set.
The labels that are predicted correctly are colored green, those predicted falsely from the label set are colored red and the labels that could not be matched with any labels from the known label set are colored blue.
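The lexical similarity map described above can be sketched as a matrix with one row per golden set label (the Y-axis) and one column per generated prediction. The normalization used here, one minus the edit distance divided by the longer string's length, is an assumption; the text only states that the Levenshtein distance is used. The names `levenshtein`, `similarity`, and `similarity_map` are illustrative.

```python
from typing import List, Sequence


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertions, deletions, and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity in [0, 1]: 1.0 is a perfect match, 0.0 a complete mismatch."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def similarity_map(gold: Sequence[str], predicted: Sequence[str]) -> List[List[float]]:
    """Rows follow the golden set labels, columns follow the predictions."""
    return [[similarity(g, p) for p in predicted] for g in gold]


if __name__ == "__main__":
    rows = similarity_map(["bass guitars"], ["bass guitars", "electric basses"])
    for row in rows:
        print([round(value, 2) for value in row])
```

A plotting library can then render this matrix with a colormap that runs from black (0.0) to gold (1.0).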
In the left column, we discuss each such novel generated label and evaluate it based on our sensible and informative framework.
Table 6: A sample of predictions where the model generated novel labels on the AmazonCat dataset
Novel Labels Lexical Similarity Map & Input Text
”air intake kits”: sensible but not informative, as there is another very similar label in the gold set that could have been generated. ”intake system”: sensible but not informative.
K&N 57-9014-1 Fuel Injection Performance Kit Gen2 Air Intake Kit The kit replaces your vehicle’s restrictive factory air filter and air intake housing.
This tag seems to be missing from the label set, and the closest matching ones, ”electric basses” and ”bass guitars”, are missing from the golden set. The other forms with ”/” and ”and” are similarly sensible and informative.
1/2 Carat Sterling Silver CZ Cross Stud Earrings The look of white gold at a silver price!
Rhodium is a metal that is part of the platinum family.
High-end silver and gold are rhodium treated to prevent oxidation and to have the white shiny look associated with platinum and white gold.
These earrings’ rhodium finish will prevent them from tarnishing.
Dean Acoustic-Electric Bass Cutaway Satin Finish Offering a large body with deep, full tone, this Dean acoustic-electric bass guitar (model EABC) also looks great on stage with a handsome satin-finished top made of select spruce wood and an abalone sound hole accent.
It also features Dean’s passive pre-amp electronics, a 34-inch scale, and a rosewood fingerboard with pearl dotted inlays.
Specifications Top: Select spruce Body: Mahogany Neck: Mahogany Fingerboard: Rosewood with pearl dot inlays Bridge: Rosewood Scale: 34 inches Tuners: Die cast Electronics: Dean passive pre-amp Finish: Satin natural Dean EABC Electric Acoustic Bass is a Large Body, Big Sounding Acoustic Bass.
Dean EABC comes with passive pre amp and is available in satin natural.
Dean EABC is the BEST VALUE in a acoustic/electric bass on the market today.
EABC Select Spruce Top 34” scale Mahogany bound neck Rosewood fingerboard Pearl DOT Inlayes Die Cast Tuners Set Neck Celluliod Binding/Rosette R...
”ni-cad nails”: Not sensible and not informative. The input text is about nailers and not nails.
”straight nails”: Not sensible and not informative for similar reasons as above.
The only difference you’ll feel between this and a traditional pneumatic is that you’re not tethered to an air hose.
It’s just as fast and fires just as powerfully into both soft and hard joints.
We love that you can choose bump or sequential mode for precision or speed, something most nailers don’t offer, and the integrated headlight is another impressive addition, really lighting up your workpiece even in the worst conditions.
The topic being discussed is Usability Inspection for UIs.
The labels seem to be missing from both the label set and the golden set.
”mono microphones”: Not sensible and not informative, as mono microphones are not mentioned in the text. ”single microphones”: Not sensible and not informative for similar reasons as above.
Usability Inspection Methods Considered the founder of this research area, Nielsen presents a contributed exposition written by the foremost experts in this rapidly growing and important field.
Devised for user interface practitioners searching for cost-effective ways of improving their designs, the book begins with descriptions of simple discount usability engineering methods such as heuristic evaluation which can be learned quickly and immediately applied to the reader’s current project.
Later chapters cover more formal inspection techniques offering additional benefits and discuss practical aspects of comparing the methods and user testing along with suggestions for when to use what techniques.
The last few years have seen the emergence of usability inspection (UI) as an important new tool to help user interface designers and software developers guarantee that their products meet the highest standards of usability.
Everywhere UI methods have been implemented they have proven to be f...
Audio Technica ATM8010 ATM10a Artist Series Fixed-Charge ’Omni’ Condenser Microphone Ideal for group vocals, strings, cymbal overheads, acoustic guitar and piano.
Omni pattern provides maximum ambient pickup.
Extremely smooth, extended response on- and off-axis.
Low sensitivity to popping and overload.
Operates on battery or phantom power.
”wrench holders”: Not sensible and not informative.
”martini boxes”: Not sensible and not informative.
This mistake is perhaps due to the term ”Martin” being mentioned multiple times in another context in the input
DEWALT DW2050 Quick Change 3-Inch Magnetic Bit Tip Holder DeWalt DW2050 Quick Change 3-Inch Magnetic Bit Tip Holder 115-DW2050 Magnetic Holder Quick Change Magnetic Holder Unit Sold is in measure of 1 Box
We’ve earned our reputation for excellence from over three decades of experience in providing automotive replacement parts, fasteners and service line products primarily for the automotive aftermarket.
Our prestigious position stems from a unique combination of application expertise, innovative product design, and breadth of product offerings, many of which are not conveniently or economically available elsewhere.
At Dorman, we take pride in the quality of our products and in your satisfaction.
”kids’ books”: sensible but not informative as we have a similar known label ”childrens’ books”
Science in Seconds at the Beach: Exciting Experiments You Can Do in Ten Minutes or Less Science in Seconds at the Beach teaches children dozens of activities that investigate the mysteries of animals, plants, sand, shells, sun and water.
Easy step-by-step instructions and illustrations are provided for each activity.
”–Asbury Park Press Surf’s up for science fun with these quick and easy activities.
This book offers over 150 quick and easy experiments that will help children investigate the mysteries of animals, plants, sand, shells, sun, and water.
Each activity takes ten minutes or less to complete, and answers a provocative question like: Do fish close their eyes?
Can you hold your breath longer than a whale?
How is sand made? How can seaweed forecast the weather?
Do all snail shells coil in the same direction?
And why do we seem to hear the ocean in empty sea shells?
Do fish close their eyes?
Can you hold your breath longer than a whale?
How is sand made? Why do we hear the ocean in e...
Table 7: A sample of predictions where the model generated novel labels on the Wiki dataset
Novel Labels Lexical Similarity Map & Input Text
”Events in the United States”: sensible but not informative. ”Events in Washington DS”: sensible but not informative. ”Dinners in the United States”: sensible but not informative.
White House Iftar dinner use American English date June 2017 use mdy dates date June 2017 The White House Iftar dinner is an annual reception held at the White House and hosted by the President of the United States U S President and the First Lady of the United States First Lady to celebrate the Muslim month of Ramadan The annual tradition started in 1996 when Hillary Clinton hosted a Ramadan Eid al Fitr Eid celebration Iftar dinner The modern iteration of the reception is attended by prominent members of the Muslim American community including politicians community leaders and students Thomas Jefferson held the first White House dinner with a Muslim while hosting Sidi Soliman Mellimelli an envoy of Beylik of Tunis on December 9 1805 during the First Barbary War lt ref gt cite web last Shellnutt first Kate date August 4 2011 title Thomas Jefferson held first White House Ramadan celebration website IIP Digital publisher blog chron com url http blog chron com believeitornot 2011 08 thoma...
”People’s Democratic Party Turkey Politicians”: sensible but not informative, as there is another very similar label in the gold set that could have been generated. ”MEPs for Turkey 2014-19”: sensible and informative.
Feleknas Uca Use dmy dates date October 2013 Infobox officeholder name Feleknas Uca office Grand National Assembly of Turkey Composition Member of the Grand National Assembly honorific suffix Member of Parliament Turkey MP image Feleknas Uca jpg constituency Diyarbak r electoral district Diyarbak r June 2015 Turkish general election June 2015 November 2015 Turkish general election Nov 2015 lt br gt Batman electoral district Batman 2018 Turkish general election 2018 signature signature alt party Peoples Democratic Party Turkey Peoples Democratic Party lt br gt lt br gt otherparty Party of Democratic Socialism Germany Party of Democratic Socialism 1999 2007 lt br gt The Left Germany Die Linke 2007 2009 office1 Member of the European Parliament for Germany birth date Birth date and age 1976 09 17 birth place Celle Lower Saxony West Germany death date lt Death date and age YYYY MM DD YYYY MM DD gt death place resting place nationality alma mater occupation website awards image size 220px t...
”Valhalla Enterteinment films”: sensible and informative, as there is another very similar label in the gold set that could have been generated.
Armageddon (1998 film) use mdy dates date June 2012 Infobox film name Armageddon image Armageddon poster06 jpg alt caption Theatrical release poster director Michael Bay producer Plainlist Jerry Bruckheimer Gale Anne Hurd Michael Bay screenplay Plainlist Jonathan Hensleigh J J Abrams story Plainlist Robert Roy Pool Jonathan Hensleigh starring plainlist Bruce Willis Billy Bob Thornton Liv Tyler Ben Affleck Will Patton Peter Stormare Keith David Steve Buscemi narrator lt Used in documentaries only gt music Plainlist Trevor Rabin cinematography John Schwartzman editing Plainlist Mark Goldblatt Chris Lebenzon Glen Scantlebury studio Plainlist Touchstone Pictures Jerry Bruckheimer Films Valhalla Entertainment Valhalla Motion Pictures distributor Buena Vista Pictures released Film date 1998 07 01 runtime 151 minutes lt Theatrical runtime 150 20 gt lt ref gt cite web url https bbfc co uk releases armageddon 1970 6 title ARMAGEDDON 12 work British Board of Film Classification date July 7 1998 ...
”Bulgaria Under-20 international footballers”: sensible and informative
Todor Kolev (footballer, born 1980) Other people Todor Kolev Use dmy dates date August 2012 Infobox football biography name Todor Kolev image Kolev todor jpg caption Kolev playing for Ludogorets in 2011 fullname Todor Aleksandrov Kolev birth date Birth date and age 1980 2 8 df y birth place Veliko Tarnovo Bulgaria height convert 1 86 m ftin 0 abbr on currentclub SFC Etar Veliko Tarnovo Etar II Etar Veliko Tarnovo II clubnumber 10 position Forward association football Forward youthyears1 youthclubs1 F C Etar Etar Veliko Tarnovo years1 1997 1999 clubs1 F C Etar Etar Veliko Tarnovo caps1 goals1 years2 1999 2005 clubs2 PFC Levski Sofia Levski Sofia caps2 55 goals2 16 years3 2000 2002 clubs3 PFC Spartak Pleven Spartak Pleven loan caps3 49 goals3 57 years4 2005 clubs4 PFC Marek Dupnitsa Marek Dupnitsa loan caps4 4 goals4 1 years5 2005 2007 clubs5 PFC Slavia Sofia Slavia Sofia caps5 55 goals5 32 years6 2007 2008 clubs6 Alemannia Aachen caps6 20 goals6 5 years7 2008 2010 clubs7 PFC Slavia Sofi...
John Borstlap John Borstlap 4 November 1950 Rotterdam is a Dutch composer lt ref gt cite book title Entartete Musik publisher Emanuel Overbeeke amp Leo Samama url https books google com id NydqmVZUhlEC amp pg PA175 amp lpg PA175 amp dq john borstlap v onepage amp q john 20borstlap amp f false isbn 9789053567159 year 2004 lt ref gt and author on cultural subjects related to music and the visual arts He claims to be rooted in German musical traditions and is a proponent of a revival of tonal and classical traditions
”Artists from Changzhou”: sensible and informative. ”Qianlong people”: sensible and informative.
Yun Bing Infobox artist name Yun Bing native name native name lang zh birth place Wujin District Changzhou known for notable works Hairpin Scroll 1735 1796 lt br gt Quiet Provisions of the Studio 1735 1796 style Bird and flower painting quot Boneless quot technique movement spouse Mao Hongtiao module Infobox Chinese child yes t s p Y n B ng w Y n Ping altname Qingyu c2 linktext p2 Q ngy w2 Ch ing y patrons memorials Yun Bing zh c dates unknown courtesy names Qingyu zh c and Haoru zh c was a Chinese painter during the Qianlong era She is well known for her bird and flower painting s executing the quot boneless quot technique and became the most famed of the Yun family s female artists lt ref name lu gt cite title trans title Discussion of the achievements of the influential family near the mound the Yun clan language Chinese author Lu Haiyang journal Changzhou gong xueyuan xuebao shekeban volume 31 issue 1 date 2013 pages 1 7 lt ref gt