Accuracy Estimation

Richard Correro, Department of Statistics, Stanford University

Motivated by the desire to generate labels for real-time data, we develop a method to estimate the dependency structure and accuracy of weak supervision sources incrementally. Our method first estimates the dependency structure associated with the supervision sources and then uses this structure to iteratively update the estimated source accuracies as new data is received. Using both off-the-shelf classification models trained on publicly-available datasets and heuristic functions as supervision sources, we show that our method generates probabilistic labels with an accuracy matching that of existing off-line methods.
By combining multiple supervision sources and modeling their dependency structure we may infer the true labels based on the outputs of the supervision sources [2].
Using this we may estimate the conditional density

f(y, λ)    (1)

These sources may take many forms, but we restrict ourselves to the case in which λi ∈ {0, . . . , k} and thus the label functions generate labels belonging to the same domain as Y.
II. RELATED WORK

Varma et al [3] and Ratner et al [4] model the joint distribution of λ1, . . . , λm, Y in the classification setting as a Markov Random Field

fG(λ1, . . . , λm, y) = (1/Z) exp( Σ_{λi∈V} θi λi + Σ_{(λi,λj)∈E} θi,j λi λj + θY y + Σ_{λi∈V} θY,i y λi )
associated with graph G = (V, E), where θi,j, 1 ≤ i, j ≤ m + 1, denote the canonical parameters associated with the supervision sources and Y, and Z is a partition function [here V = {λ1, . . . , λm} ∪ {Y}].
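To make the parameterization concrete, the following is a small illustrative Python sketch (not from [3] or [4]; all parameter values are arbitrary) that evaluates fG for m = 2 binary sources with a single dependency edge:

```python
import numpy as np
from itertools import product

# Toy MRF with m = 2 binary sources, binary y, and edge set E = {(λ1, λ2)}.
# All parameter values below are arbitrary illustrations.
theta = [0.5, -0.2]          # θi: per-source canonical parameters
theta_pair = {(0, 1): 0.3}   # θi,j: one parameter per dependency edge in E
theta_Y = 0.1                # θY
theta_Yl = [0.8, 0.6]        # θY,i: source-accuracy parameters

def unnormalized(l, y):
    """exp of the sum inside fG, before dividing by Z."""
    s = sum(theta[i] * l[i] for i in range(2))
    s += sum(t * l[i] * l[j] for (i, j), t in theta_pair.items())
    s += theta_Y * y + sum(theta_Yl[i] * y * l[i] for i in range(2))
    return np.exp(s)

# The partition function Z sums the potential over all configurations.
Z = sum(unnormalized((l1, l2), y) for l1, l2, y in product([0, 1], repeat=3))
f_G = lambda l1, l2, y: unnormalized((l1, l2), y) / Z
```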
If λi is not independent of λj conditional on Y and all sources λk, k ∈ {1, . . . , m} \ {i, j}, then (λi, λj) is an edge in E.
Let Σ denote the covariance matrix of the supervision sources and Y .
Without the ground truth labels, Varma et al assume that G is sparse and therefore that the inverse covariance matrix Σ^{-1} associated with λ1, . . . , λm, Y is graph-structured.
Since Y is a latent variable the full covariance matrix Σ is unobserved.
We may write the covariance matrix in block-matrix form as follows:
Cov[O ∪ S] := Σ = [ ΣO      ΣOS
                    ΣOS^T   ΣS  ]

where O = {λ1, . . . , λm} denotes the set of observed source outputs and S = {Y}.
ΣO may be estimated empirically:

ˆΣO = ΛΛ^T/n − νν^T

where Λ = [λ1, λ2, . . . , λn] denotes the m × n matrix of labels generated by the sources and ν = ˆE[O] ∈ Rm denotes the observed labeling rates. Inverting Σ, we write

Σ^{-1} = [ KO      KOS
           KOS^T   KS  ]
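As a concrete illustration, ˆΣO can be computed with a few lines of numpy (the label matrix here is randomly generated stand-in data, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 3, 1000
Lambda = rng.integers(0, 2, size=(m, n)).astype(float)  # m x n source labels

nu = Lambda.mean(axis=1)                   # ν = ˆE[O], observed labeling rates
Sigma_O_hat = Lambda @ Lambda.T / n - np.outer(nu, nu)  # ˆΣO = ΛΛ^T/n − νν^T
```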
Using the block-matrix inversion formula, Varma et al show that

KO = ΣO^{-1} + c ΣO^{-1} ΣOS ΣOS^T ΣO^{-1}

where c = (ΣS − ΣOS^T ΣO^{-1} ΣOS)^{-1} ∈ R+. Letting z = √c ΣO^{-1} ΣOS, they write

ΣO^{-1} = KO − zz^T

where KO is sparse and zz^T is low-rank positive semi-definite. Because ˆΣO^{-1} is thus the difference of a sparse matrix and a low-rank matrix, we may use Robust Principal Components Analysis [5] to separate the two components.
Varma et al then show that we may learn the structure of G from KO, and the accuracies of the sources from z, using the following algorithm:

Algorithm 1: Weak Supervision Structure Learning and Source Estimation Using Robust PCA (From [3])
Result: ˆG = (V, ˆE), ˆL
Inputs: Estimate of covariance matrix ˆΣO; parameter γ; threshold T
Solve: (ˆS, ˆL) ←− argmin_{S,L} ||S||_1 + γ||L||_*  s.t.  S − L = ˆΣO^{-1}
ˆE ←− {(i, j) : i < j, ˆSi,j > T}

Note that ˆL = zz^T.
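A sketch of Algorithm 1 using cvxpy is given below. The objective ||S||_1 + γ||L||_* is the standard convex Robust PCA surrogate, which we assume here; the implementation in [3] may differ:

```python
import cvxpy as cp
import numpy as np

def structure_learning(Sigma_O_hat: np.ndarray, gamma: float, T: float):
    """Sketch of Algorithm 1: decompose the inverse covariance into
    sparse (S) minus low-rank (L) parts via convex Robust PCA, then
    threshold S to recover dependency edges among the sources."""
    d = Sigma_O_hat.shape[0]
    Sigma_inv = np.linalg.inv(Sigma_O_hat)
    S = cp.Variable((d, d), symmetric=True)
    L = cp.Variable((d, d), symmetric=True)
    prob = cp.Problem(cp.Minimize(cp.sum(cp.abs(S)) + gamma * cp.normNuc(L)),
                      [S - L == Sigma_inv])
    prob.solve()
    S_hat, L_hat = S.value, L.value
    E_hat = [(i, j) for i in range(d) for j in range(i + 1, d)
             if S_hat[i, j] > T]   # ˆE ← {(i, j) : i < j, ˆSi,j > T}
    return E_hat, L_hat            # note ˆL = zz^T
```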
Ratner et al [4] show that we may estimate the source accuracies ˆµ from z, and they propose a simpler algorithm for estimating z if the graph structure is already known: if E is already known we may construct a dependency mask Ω = {(i, j) : (λi, λj) ∉ E}.
They use this in the following algorithm:

Algorithm 2: Source Estimation for Weak Supervision (From [4])
Result: ˆµ
Inputs: Observed labeling rates ˆE[O] and covariance ˆΣO; class balance ˆE[Y] and variance ˆΣS; dependency mask Ω
ˆz ←− argmin_z ||(ˆΣO^{-1} + zz^T)_Ω||
ˆc ←− ˆΣS^{-1}(1 + ˆz^T ˆΣO ˆz)
ˆΣOS ←− ˆΣO ˆz / √ˆc
ˆµ ←− ˆΣOS + ˆE[Y]ˆE[O]

Snorkel, an open-source Python package, provides an implementation of algorithm 2 [6].
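For concreteness, a minimal numpy/scipy sketch of Algorithm 2 follows. Solving the masked objective with a generic numerical optimizer is our own simplification (Snorkel's implementation [6] differs), and ˆΣS is taken to be the scalar variance of Y:

```python
import numpy as np
from scipy.optimize import minimize

def source_estimation(E_O, Sigma_O, E_Y, Sigma_S, Omega):
    """Sketch of Algorithm 2: recover ˆz from the masked entries of
    ˆΣO^{-1} + zz^T, then map it to source accuracies ˆµ."""
    m = Sigma_O.shape[0]
    Sigma_O_inv = np.linalg.inv(Sigma_O)
    mask = np.zeros((m, m), dtype=bool)
    for i, j in Omega:                 # Ω: conditionally independent pairs
        mask[i, j] = mask[j, i] = True

    # ˆz ← argmin_z ||(ˆΣO^{-1} + zz^T)_Ω|| (squared Frobenius over the mask)
    obj = lambda z: np.sum((Sigma_O_inv + np.outer(z, z))[mask] ** 2)
    z_hat = minimize(obj, x0=np.ones(m)).x

    c_hat = (1.0 + z_hat @ Sigma_O @ z_hat) / Sigma_S  # ˆc ← ˆΣS^{-1}(1 + ˆz^T ˆΣO ˆz)
    Sigma_OS_hat = Sigma_O @ z_hat / np.sqrt(c_hat)    # ˆΣOS ← ˆΣO ˆz / √ˆc
    return Sigma_OS_hat + E_Y * E_O                    # ˆµ ← ˆΣOS + ˆE[Y]ˆE[O]
```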
Although the algorithm proposed by Varma et al may be used to determine the source dependency structure and source accuracy, it requires a robust principal components decomposition of the matrix ˆΣO^{-1}, which is equivalent to a convex Principal Components Pursuit (PCP) problem [5].
Both algorithms, however, require the observed labeling rates and covariance estimates of the supervision sources over the entire dataset and therefore cannot be used in an on-line setting.
We therefore develop an on-line approach which estimates the structure of G using algorithm 1 on an initial "minibatch" of unlabeled examples and then iteratively updates the source accuracy estimate ˆµ using a modified implementation of algorithm 2.
IV. METHODS

Given an initial batch b1 of unlabeled examples Xb1 = {x1, . . . , xk} we estimate G by first soliciting labels λ1, . . . , λk for Xb1 from the sources.
Using the fact that ˆL = zz^T we recover ˆz by first calculating

|ˆz| = √(diag(ˆL))
We then break the symmetry using the method in [4].
Note that if a source λi is conditionally independent of the others then the sign of zi determines the sign of all other elements of z.
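A brief sketch of this recovery step follows; the sign-fixing heuristic below assumes the first source is conditionally independent of the others, and is illustrative rather than the exact procedure of [4]:

```python
import numpy as np

def recover_z(L_hat):
    """Recover ˆz from ˆL = zz^T: |ˆz| = √(diag(ˆL)), with relative signs
    read off the first row, since L[0, j] = z0 * zj."""
    abs_z = np.sqrt(np.diag(L_hat))
    signs = np.sign(L_hat[0, :])
    signs[0] = 1.0                # fix the overall sign of z arbitrarily
    return abs_z * signs
```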
Using ˆz, ˆE[O], ˆΣOb1, the class balance prior ˆE[Y] and the class variance prior ˆΣS we calculate ˆµ, an estimate of the source accuracies [if we have no prior beliefs about the class distribution then we simply substitute uninformative priors for ˆE[Y] and ˆΣS].
For each subsequent batch bp of unlabeled examples Xbp we estimate ˆΣObp and ˆE[O]bp.
Using these along with ˆE[O] and ˆΣOb1 we calculate ˆµbp, an estimate of the source accuracies over the batch.
We then update ˆµ using the following update rule:

ˆµ := (1 − α)ˆµ + αˆµbp

where α ∈ [0, 1] is a step-size parameter controlling the weight placed on the most recent batch.
Using the estimated source accuracies and dependency structure we may estimate p(y, λ), which we may then use to generate probabilistic labels for incoming examples.
Our incremental procedure is summarized in the following algorithm:

Algorithm 3: Incremental Source Accuracy Estimation
Result: ˆµ
Inputs: Observed labeling rates ˆE[O]b and covariance ˆΣOb for each batch b; class balance ˆE[Y] and variance ˆΣS
for each batch b do
    ˆµb ←− Algorithm 2(ˆE[O]b, ˆΣOb, ˆE[Y], ˆΣS, Ω)
    ˆµ ←− (1 − α)ˆµ + αˆµb
end
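A compact sketch of Algorithm 3, reusing the source_estimation function sketched above (the batching interface is our own illustrative assumption):

```python
import numpy as np

def incremental_estimation(batches, E_Y, Sigma_S, Omega, alpha=0.05):
    """Sketch of Algorithm 3: fold per-batch accuracy estimates into a
    running estimate ˆµ via ˆµ := (1 − α)ˆµ + αˆµb."""
    mu = None
    for Lambda_b in batches:               # Lambda_b: m x n_b label matrix
        n_b = Lambda_b.shape[1]
        E_O_b = Lambda_b.mean(axis=1)      # batch labeling rates ˆE[O]b
        Sigma_O_b = Lambda_b @ Lambda_b.T / n_b - np.outer(E_O_b, E_O_b)
        mu_b = source_estimation(E_O_b, Sigma_O_b, E_Y, Sigma_S, Omega)
        mu = mu_b if mu is None else (1 - alpha) * mu + alpha * mu_b
    return mu
```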
Supervision Sources: We use three supervision sources: two pre-trained Naive Bayes text classifiers and one heuristic function. The first model was trained using a subset of the IMDB movie reviews dataset, which consists of a corpus of texts labeled by perceived sentiment ["positive" or "negative"].
Because the labels associated with this dataset are binary, the classifier generates binary labels.
The second classifier was trained using another openly-available dataset, this one consisting of a corpus of text extracted from tweets associated with air carriers in the United States and labeled according to sentiment. The labels in this dataset belong to three separate classes ["positive", "neutral", and "negative"], and therefore the model trained using this dataset classifies examples according to these classes.
The final supervision source is the Textblob Pattern Analyzer.
This is a heuristic function which classifies text by polarity and subjectivity using a lookup table consisting of strings mapped to polarity/subjectivity estimates.
• If polarity is greater than 0.33 we generate a positive label.
• If polarity is less than or equal to 0.33 but greater than −0.33 we generate a neutral label.
• If polarity is less than or equal to −0.33 we generate a negative label.
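A sketch of this labeling rule using TextBlob's default PatternAnalyzer (polarity scores lie in [−1, 1]; the function name is ours):

```python
from textblob import TextBlob

def pattern_label(text: str) -> str:
    """Threshold TextBlob's polarity score into the three label classes."""
    polarity = TextBlob(text).sentiment.polarity
    if polarity > 0.33:
        return "positive"
    elif polarity > -0.33:
        return "neutral"
    return "negative"
```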
Test Data: We test our incremental model using a set of temporally-ordered text data extracted from tweets associated with a 2016 GOP primary debate, labeled by sentiment ["positive", "neutral", or "negative"].
We do so by soliciting labels λ1, . . . , λn associated with the n examples from the three supervision sources.
Weak Supervision as Transfer Learning
Note that this setting is an example of a transfer learning problem [7].
Specifically, since we are using models pre-trained on datasets similar to the target dataset we may view the Naive Bayes models as transferring knowledge from those two domains [Tweets associated with airlines and movie reviews, respectively] to provide supervision signal in the target domain [7].
Data Folding Procedure: We split the text corpus into five folds.
The examples are not shuffled, in order to preserve temporal order within folds.
Using these folds we perform 5 separate tests, each using four of the five folds in order.
For example, the fifth test uses fold 5 and folds 1 through 3, in that order.
Partition Tests: For each set of folds we further partition the data into k = 100 batches of size q, which we refer to as "minibatches" [as they are subsets of the folds].
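A sketch of this folding and partitioning scheme (the function and its interface are illustrative, not the authors' code):

```python
import numpy as np

def make_minibatches(corpus, n_folds=5, k=100):
    """Split the corpus into ordered folds; each test uses four folds
    starting at a different fold, further split into k minibatches."""
    folds = np.array_split(np.asarray(corpus), n_folds)
    tests = []
    for start in range(n_folds):
        used = [folds[(start + i) % n_folds] for i in range(n_folds - 1)]
        tests.append(np.array_split(np.concatenate(used), k))
    return tests  # tests[t] is the list of k minibatches for test t
```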
Note that both pretrained classifiers first transform the text by tokenizing the strings and then calculating the term frequency-inverse document frequency (Tf-idf) for each token.
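For illustration, such a classifier source might be assembled with scikit-learn as below (toy stand-in training data; the actual sources were fit on the IMDB and airline-tweet corpora):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy stand-in corpus; the real sources were fit on much larger datasets.
train_texts = ["great movie", "terrible film", "loved it", "awful plot"]
train_labels = [1, 0, 1, 0]   # 1 = positive, 0 = negative

source = make_pipeline(TfidfVectorizer(), MultinomialNB())
source.fit(train_texts, train_labels)
print(source.predict(["a great film"]))   # source label for a new example
```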
We assess the accuracy of our method by comparing the generated labels ˆy with the ground-truth labels y:

accuracy(y, ˆy) = (1/q) Σ_{i=0}^{q−1} 1(ˆyi = yi)

We then average the accuracy scores associated with each minibatch over the number of minibatches used in each test to calculate the average per-test accuracy [calculated using four of the five folds of the overall dataset].
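Equivalently, in numpy:

```python
import numpy as np

def minibatch_accuracy(y, y_hat):
    """accuracy(y, ˆy) = (1/q) Σ 1(ˆyi = yi) over a minibatch of size q."""
    return float(np.mean(np.asarray(y_hat) == np.asarray(y)))
```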
We then compare the average accuracies of the labels produced using our incremental method to the accuracies of the labels produced by an existing off-line source accuracy estimation method based on algorithm 2 [6].
This method generates its own set of labels ˆybaseline, with which we then calculate the baseline accuracy using the accuracy metric above.
Finally, we compare the accuracy of the labels generated by our method with the accuracy of the labels generated by each of the supervision sources.

Comparing Values of α: We then follow the same procedure as above to generate labels for our method, except this time we use different values of α.
Our tests demonstrate the following: 1) Our model generates labels which are more accurate than those generated by the baseline [when averaged over all 5 tests].
This result is not surprising, as we would expect our source accuracy estimate to approach the true accuracy (ˆµ −→ µ) as the number of examples seen increases.
This implies that the incremental approach we propose generates more accurate labels as a function of the number of examples seen, unlike the supervision sources which are pre-trained and therefore do not generate more accurate labels as the number of labeled examples grows.
These tests also suggest that an optimal value of α for this problem is approximately 0.05, which is in the interior of the set of values tested for α.
Since we used 100 minibatches in each test of the incremental model this implies that choosing an α which places greater weight on more recent examples yields better performance, although more tests are necessary to make any stronger claims.
This is not unexpected, as the supervision sources were intentionally chosen to be "off-the-shelf" models and no feature engineering was performed on the underlying text data, neither for the datasets used in pre-training the two classifier supervision sources nor for the test set [besides Tf-idf vectorization].
We develop an incremental approach for estimating weak supervision source accuracies.
We show that our method generates labels for unlabeled data which are more accurate than those generated by pre-existing non-incremental approaches.
We frame our specific test case, in which we use pre-trained models and heuristic functions as supervision sources, as a transfer learning problem, and we show that our method generates labels which are more accurate than those generated by the supervision sources themselves.