Paper overview and license

# (Reference translation) Incremental Beam Manipulation for Natural Language Generation [full translation available]

Incremental Beam Manipulation for Natural Language Generation ( http://arxiv.org/abs/2102.02574v1 )

License: CC BY 4.0
James Hargreaves, Andreas Vlachos, Guy Emerson

The performance of natural language generation systems has improved substantially with modern neural networks. At test time they typically employ beam search to avoid locally optimal but globally suboptimal predictions. However, due to model errors, a larger beam size can lead to deteriorating performance according to the evaluation metric. For this reason, it is common to rerank the output of beam search, but this relies on beam search to produce a good set of hypotheses, which limits the potential gains. Other alternatives to beam search require changes to the training of the model, which restricts their applicability compared to beam search. This paper proposes incremental beam manipulation, i.e. reranking the hypotheses in the beam during decoding instead of only at the end. This way, hypotheses that are unlikely to lead to a good final output are discarded, and in their place hypotheses that would have been ignored will be considered instead. Applying incremental beam manipulation leads to an improvement of 1.93 and 5.82 BLEU points over vanilla beam search for the test sets of the E2E and WebNLG challenges respectively. The proposed method also outperformed a strong reranker by 1.04 BLEU points on the E2E challenge, while being on par with it on the WebNLG dataset.
Published: Thu, 4 Feb 2021 12:26:47 GMT

Note: the translation results are shown in the table. The PDF is the original paper. The translation results are licensed under CC BY-SA 4.0; see the top page for details.

Translation results

Incremental Beam Manipulation for Natural Language Generation
James Hargreaves (The Trade Desk & University of Cambridge) james.hargreaves@thetradedesk.com
Andreas Vlachos (University of Cambridge) av308@cam.ac.uk
Guy Emerson (University of Cambridge) gete2@cam.ac.uk

## 1 Introduction

In natural language generation (NLG), the goal is to generate text representing structured information (e.g. a database record or a meaning representation) that is both fluent and contains the right information. Sequence-to-sequence models (seq2seq) have been effective on many tasks in NLG (for example: Wen et al., 2015; Dušek and Jurčíček, 2016). These systems first create an embedding for the input information. This embedding is used incrementally during decoding, generating one token at a time. Seq2seq models are generally decoded using beam search, to mitigate the effect of locally optimal but globally suboptimal decisions made by greedy search.

The performance of NLG systems can plateau or even decrease when beam sizes larger than 10 are used, which is counter-intuitive since larger beams produce more likely sequences according to the model. For example, Dušek and Jurčíček (2016) used a beam size of 10, and Asghar et al. (2017) found a size of 5 to be optimal. Decreasing performance has been found across a range of tasks (Cohen and Beck, 2019). Moreover, it was given by Koehn and Knowles (2017) as one of the six main challenges facing neural machine translation. To investigate this, Stahlberg and Byrne (2019) presented an exact search algorithm to find the most likely output according to a seq2seq model. However, this performed poorly compared to beam search, demonstrating that search errors (from beam search) can mask model errors (from the seq2seq model).

To mitigate the limitations of beam search, it is common practice to apply a reranker to the final set of hypotheses. This can be done by defining a reranking criterion (for example: Kumar and Byrne, 2004; Blain et al., 2017; Borgeaud and Emerson, 2020) or by training a reranker to predict the best hypothesis in a beam (for example: Dušek and Jurčíček, 2016; Agarwal et al., 2018). Training a reranker allows us to take into account information from outside the model and mitigate model errors. However, rerankers can only choose a hypothesis from the final beam, which limits their potential. To quantify this, we trained the seq2seq model proposed by Dušek and Jurčíček (2016), and applied it to the E2E validation set (Novikova et al., 2017b). For each instance, we recorded the point at which
all gold-standard references fell out of the beam, meaning that none of the partial hypotheses in the beam could be extended to a gold reference. A final beam containing at least one of the references would score optimally with an oracle reranker (providing an upper bound on performance). Figure 1 shows the results for beam size 3 (for larger beam sizes, the same general trends were observed; see Appendix 1.1 for beam size 10). The final beam contained a reference in only 60 out of 547 cases (11%). For the remaining 89% of the cases, even an oracle reranker would be unable to give optimal results. The figure also shows that in over half of the cases, all references fell out in the first 6 steps. In contrast, references that were still in the beam at step 15 were almost certain to stay in the beam until the end. These observations suggest that an early manipulation of the beam has a strong potential to improve performance.

Figure 1: The percentage of beams which contain a reference (orange), or which could still lead to a reference (blue), using the model of Dušek and Jurčíček (2016) with beam size 3 on the E2E validation set.

In this paper, we propose a method for manipulating which items are pruned from the beam at each stage of decoding. We then present evidence that this is a successful approach: it led to an improvement of 1.93 and 5.82 BLEU points over vanilla beam search on the E2E and WebNLG challenges, respectively. When comparing to a strong reranker, the performance of incremental beam manipulation was similar on the WebNLG dataset, whilst increasing the performance on the E2E challenge by 1.04 points. We also applied beam manipulation on top of length normalisation (Murray and Chiang, 2018), and incremental beam manipulation was able to improve its performance.

## 2 Related Work

This paper is far from the first to try to improve beam search for natural language generation. One modification is to use a variable beam size instead of a fixed one (Freitag and Al-Onaizan, 2017). However, this can only improve decoding speed, as the ranking of the hypotheses in the beam remains unchanged, and thus model errors are exposed by the reduction of search errors.

Length normalisation (Murray and Chiang, 2018) is a widely used strategy that often improves the performance of a beam search decoder, by mitigating the fact that seq2seq models are biased towards generating shorter sequences. Rather than directly using model probabilities to order the hypotheses in the beam, each probability is normalised according to the length of the hypothesis, so that shorter hypotheses are penalised. However, this only has an impact once the hypotheses within the beam have different lengths. This only occurs towards the end of the decoding process, and we showed in the previous section that the reference hypotheses often fall out of the beam relatively early. Furthermore, Stahlberg and Byrne (2019) showed that the biases causing the deteriorating model performance are more complex than a simple length bias.

Wiseman and Rush (2016) modified the training procedure for seq2seq models. They ran beam search and introduced a loss each time the gold standard sequence fell out of the beam. Goyal et al. (2018) and Collobert et al. (2019) also modified the training procedure. They added a term to the loss function that approximated the loss that the model would receive when generating using a beam search method for each example in the training set. However, one of the reasons that beam search has been so widely used is that it can be applied on top of a language model without changing the training procedure, and this is lost with these approaches.

Gu et al. (2017) manipulated the hidden state of the language model at each step of the decoding. This was achieved via a multi-output regressor that produced a vector that is added to the hidden state used in decoding. The regressor was trained via reinforcement learning, and the training signal was gathered by injecting unstructured noise to the hidden state. Chen et al. (2018) also manipulated the hidden state. For each training instance, they apply beam search and take the hypothesis with the
highest BLEU score. The manipulator network is trained to encourage a greedy decoder to produce this output. Both of these approaches rely on inferring a better hidden state to be used in the decoding, which is not straightforward to define. We instead manipulate the hypotheses in the beam directly.

Finally, Negrinho et al. (2018) presented a framework for learning a beam search framework via imitation learning. This resulted in a beam aware algorithm which was proved to have no regret guarantees. While this paper makes a compelling argument for this method in theory, putting it into practice requires a number of further engineering decisions. Our work can be seen as a way of applying this general framework using a simple and computationally efficient roll-out strategy.

## 3 Incremental Beam Manipulation

In order to describe our method for incremental beam search, we first introduce terminology to describe a standard beam search decoder. The decoder produces a sequence iteratively, token by token. At each iteration it performs 3 actions: expand, rank and prune. The expand step generates all possible next step hypotheses. The rank step orders these hypotheses from most likely to least likely. The pruning step then removes the hypotheses that are near the end of this order.

This formulation of the beam search algorithm enables us to view beam manipulation as a ranking problem, since expand is determined by the (fixed) decoder and the size of the beam chosen determines the pruning. The rank step determines which hypotheses will not be kept in the next iteration and hence discarded. By modifying the ranking method used, we can choose the partial hypotheses expanded during beam search, taking into account the current state of the beam as well as signals beyond model scores. It is worth noting that while this paper applies beam manipulation on top of a seq2seq model, the techniques used could be applied without change to any conditional or unconditional neural language model that can be decoded using beam search.

## 3.1 Ranking via Roll-out

Partial hypotheses are more difficult to rank than complete hypotheses since the rest of the generated text is unknown. For example, consider the following partial hypotheses:

Loch Fyne is a restaurant located...
There is a family friendly...

Both of these convey some information about a family-friendly restaurant named 'Loch Fyne'. It is hard to know which partial sequence will lead to a better complete sentence, which is what we would like a ranker to tell us. Existing rerankers often rely on detecting missing information, but some information may still be to come for partial hypotheses. What we need is a way to rank partial hypotheses based on how the seq2seq model is likely to complete them.

We propose ranking partial hypotheses based on a greedy roll-out. This is a computationally efficient approximation of how the seq2seq model might complete the partial hypothesis. In the existing literature, roll-outs are generally used at training time (Chang et al., 2015), for the situation where the model's subsequent decisions influence the loss function for an individual decision. The roll-outs are used to produce an approximation to the final sequence that would be reached if the original action was taken. This enables a value for the loss of the original decision to be predicted. On the other hand, incremental beam manipulation aims to predict which partial hypotheses will lead to good completed sequences. Similar to traditional roll-outs, this is impacted by the generating model's subsequent decisions. In this case, the difference is that roll-outs are used to provide features in addition to obtaining training signal. It is also worth noting that for incremental beam manipulation we use roll-outs at test time as well as at training time.

Beam manipulation can be applied after any step in the beam search decoding. Figure 2 illustrates a single manipulation. The roll-outs are used to produce approximations to the completed hypotheses that would be produced if the partial hypothesis remained in the beam. These completed sequences are then ranked to define an order of the partial sequences. Since this may result in different hypotheses remaining in the beam, the area of the search space considered during the decoding has been manipulated.
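To make the expand/rank/prune view concrete, here is a minimal Python sketch (not the authors' released implementation) of one beam search iteration with a pluggable rank function, together with the greedy roll-out described above. The names `log_prob_fn`, `rank_key` and `greedy_rollout` are illustrative assumptions: `log_prob_fn` maps a prefix to per-token log-probabilities, and swapping `rank_key` is what turns vanilla beam search into incremental beam manipulation.

```python
from typing import Callable, Dict, List, Tuple

END = "<E>"  # end-of-sequence token, as used in the paper's examples

def greedy_rollout(prefix: Tuple[str, ...],
                   log_prob_fn: Callable[[Tuple[str, ...]], Dict[str, float]],
                   max_len: int = 60) -> Tuple[str, ...]:
    """Approximate how the model would complete a (non-empty) partial
    hypothesis by always taking the most likely next token (Section 3.1)."""
    seq = prefix
    while seq[-1] != END and len(seq) < max_len:
        log_probs = log_prob_fn(seq)
        seq = seq + (max(log_probs, key=log_probs.get),)
    return seq

def beam_search_step(beam: List[Tuple[Tuple[str, ...], float]],
                     log_prob_fn: Callable[[Tuple[str, ...]], Dict[str, float]],
                     beam_size: int,
                     rank_key: Callable[[Tuple[Tuple[str, ...], float]], float]
                     ) -> List[Tuple[Tuple[str, ...], float]]:
    """One decoding iteration: expand, rank, prune."""
    # Expand: generate all possible next-step hypotheses.
    candidates = []
    for seq, log_p in beam:
        if seq[-1] == END:  # completed hypotheses are carried through
            candidates.append((seq, log_p))
            continue
        for tok, tok_log_p in log_prob_fn(seq).items():
            candidates.append((seq + (tok,), log_p + tok_log_p))
    # Rank: order the candidates, best first.
    candidates.sort(key=rank_key, reverse=True)
    # Prune: remove the hypotheses near the end of the order.
    return candidates[:beam_size]
```

Vanilla beam search would pass `rank_key=lambda h: h[1]` (the model score); at a manipulation step, the key would instead greedily roll out each candidate with `greedy_rollout` and score the completion with the trained reranker of Section 3.2.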
Figure 2: Incremental beam manipulation, for a beam size of 2, when generating the 4th token. Each element of the beam is expanded, and each partial hypothesis is greedily rolled out to a complete hypothesis. The complete hypotheses are ranked and then pruned. The partial hypotheses are then used in the next step of beam search.

## 3.2 Reranker Architecture

Incremental beam manipulation requires a method to rank completed hypotheses. There are many existing rerankers designed for this task, such as the TGEN reranker (Dušek and Jurčíček, 2016). However, these rerankers are unlikely to be effective when used in incremental beam manipulation. In the latter, the partial hypotheses need to be ranked according to their potential to produce good completed hypotheses. This is a related but different task to that of a traditional reranker that aims to identify the hypotheses that will score best against some metric such as BLEU score. Rerankers of completed hypotheses typically rely on input information missing from the output as the signal; however, this is not necessarily useful when reranking partial hypotheses. For example, it is more useful to identify partial hypotheses which are indicative of model failure at an early stage of decoding.

To rank the partial hypotheses via roll-outs as introduced in the previous section, we explored two commonly used techniques in the field of information retrieval: pointwise ranking and pairwise ranking (Liu, 2009). Pointwise approaches predict a numerical value for each item, and the items are ordered by sorting them according to this value. Pairwise approaches, given a pair of hypotheses, output which of them would rank more highly, and techniques such as the Copeland method (Saari and Merlin, 1996) are then used to produce a total ordering from these pairwise comparisons. In preliminary experiments, the pointwise approach outperformed the pairwise approach. These results are summarised in Appendix 1.2. For the remainder of this paper, we will focus on the pointwise approach.

The inputs to the reranker were:

- The meaning representation (i.e. the structured information input for NLG). This was passed as a sequence.
- The generated text produced by the roll-out of the partial hypothesis. This was passed as a sequence surrounded by the start token `<S>` and end token `<E>`.
- The rank, according to the seq2seq model probability, of the completed sequence in the beam. Since this was a categorical variable, it was passed as a one-hot vector.

The architecture of the reranker is summarised in Figure 3. This model is used to assign a value to each hypothesis individually. The meaning representation and the rolled-out text are each passed through an LSTM, and then all inputs are concatenated and passed through two fully connected layers. (For efficient batching, it is common practice to add padding tokens to make sequences the same length. We prepended padding tokens rather than appending them, as this led to better performance in preliminary experiments, presumably because prepended tokens have a smaller impact on the final hidden state.)

At inference time the reranker is used to identify poorly performing hypotheses so that these can be pruned. This enabled the task of the reranker to be simplified to distinguishing between those hypotheses near the bottom of the beam from the hypotheses in the rest of the beam. Therefore, we used the reranker's scores to split the hypotheses into two groups: those with scores in the bottom quartile and those with scores in the top three quartiles. The hypotheses within each group were ordered by the seq2seq model's probability. In preliminary experiments, we also tried using the reranker to provide a total ordering. However, using it to provide a coarse partial ordering (as described above) gave more promising results.
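As a rough illustration of Figure 3, the following PyTorch sketch encodes the meaning representation and the rolled-out text with one LSTM each, concatenates their final states with the one-hot beam rank, and applies two fully connected layers. It is an assumption-laden reconstruction from the description above, not the authors' implementation; the embedding size, hidden size N and the choice of ReLU are placeholders.

```python
import torch
import torch.nn as nn

class PointwiseReranker(nn.Module):
    def __init__(self, vocab_size: int, beam_size: int,
                 emb_dim: int = 64, hidden: int = 128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        # One LSTM reads the meaning representation (as a sequence),
        # the other reads the greedy roll-out text <S> ... <E>.
        self.mr_lstm = nn.LSTM(emb_dim, hidden, batch_first=True)
        self.text_lstm = nn.LSTM(emb_dim, hidden, batch_first=True)
        # All inputs (two final LSTM states + one-hot beam rank) are
        # concatenated and passed through two fully connected layers.
        self.fc1 = nn.Linear(2 * hidden + beam_size, hidden)
        self.fc2 = nn.Linear(hidden, 1)

    def forward(self, mr_ids, text_ids, rank_one_hot):
        # mr_ids, text_ids: (batch, seq_len) padded token ids;
        # rank_one_hot: (batch, beam_size) one-hot model-score rank.
        _, (mr_h, _) = self.mr_lstm(self.embed(mr_ids))
        _, (txt_h, _) = self.text_lstm(self.embed(text_ids))
        feats = torch.cat([mr_h[-1], txt_h[-1], rank_one_hot], dim=-1)
        return self.fc2(torch.relu(self.fc1(feats))).squeeze(-1)
```

At inference time, scores from such a model would only be used to split the beam into a bottom quartile and the rest, with each group then ordered by the seq2seq model's probability, as described above.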
Figure 3: Architecture of the reranker. N is the number of hidden units for the LSTMs and the number of output nodes for the fully connected layers. For the fully connected layers, the activation function is shown.

## 3.3 Training the Reranker

We trained the reranker on completed hypotheses that were ranked from best to worst. The sequences were produced by generating text sequences using the seq2seq model. A beam search decoding, with a large beam size, was applied to each instance in the training set. (In preliminary experiments, we held back a portion of the training set from the seq2seq model's training phase, so that the beam manipulation ranker was trained on outputs that the seq2seq model had not seen. However, the system was unable to recover from the lower performance of the seq2seq model.) The set of hypotheses present in the final beam of the search was ranked from best to worst and recorded. The notion of the best hypothesis was simplified to the one that received the highest BLEU score (Papineni et al., 2002) against the manually written references. BLEU was chosen due to its wide adoption as an automatic evaluation measure, but any automatic metric could have been used in its place.

As discussed at the end of the previous section, we need the reranker to distinguish between hypotheses that should be pruned from those that should be kept. Furthermore, for the purpose of reranking, only relative differences in BLEU score matter, not absolute values. Therefore, when generating training data, hypotheses in the bottom section of the beam (according to BLEU) were assigned the target value -1, and the rest were assigned the target value 1. Similarly, it is only the differences in the reranker's scores that matter, and not the absolute values. Therefore, after applying the reranker to each hypothesis in a beam, we normalise the scores to have a mean of 0. Using the normalised BLEU scores (-1 or 1) and normalised reranker scores (with a mean of 0), we use relative mean absolute error (RMAE) as the training objective, as shown in Equation (1), where $b$ is the set of hypotheses in the beam, $x$ is a hypothesis, $\hat{x}_{\mathrm{ranker}}$ is the normalised score predicted by the reranker, and $\hat{x}_{\mathrm{BLEU}}$ is the normalised target derived from the BLEU score ordering of the beam.

$$\mathrm{RMAE}(b) = \sum_{x \in b} \left| \hat{x}_{\mathrm{ranker}} - \hat{x}_{\mathrm{BLEU}} \right| \tag{1}$$

Several other relative loss functions have been shown to be successful in other situations (Zhang et al., 2019). In preliminary experiments, we evaluated a number of these including log cosh error and mean squared error, but they did not outperform RMAE.

## 3.4 Choosing when to Manipulate

In theory, it would be desirable to manipulate the beam at every step of hypothesis generation, but in practice, the difficulty of ranking partial hypotheses could limit its benefits. While manipulating the beam can avoid certain model errors, it might also introduce other errors, either from the greedy roll-out strategy or the reranker. Reranking at every step may compound such errors. Empirically, we found it was more effective to apply beam manipulation to some rather than all steps.

Choosing when to manipulate is thus an important decision. It is advisable to avoid manipulating the beam too early: not only is it harder to rank hypotheses with very few tokens, but it is also less likely to be beneficial. As shown in Figure 1, in the first few steps even a relatively small beam size can keep hypotheses that could lead to the reference outputs. On the other hand, it is also advisable not to manipulate too late: once hypotheses have fallen out of the beam, they cannot be put back in. As the optimal choice of when to manipulate the beam is dependent on the dataset and the model, we treat this as a hyperparameter to be tuned on the validation set.
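A minimal sketch of Equation (1) as a training objective, assuming `raw_scores` holds the reranker's outputs for the hypotheses of a single beam and `targets` holds the ±1 labels derived from the BLEU ordering (both names are illustrative):

```python
import torch

def rmae_loss(raw_scores: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
    """raw_scores: (beam_size,) reranker scores for one beam.
    targets: (beam_size,) values in {-1.0, 1.0} from the BLEU ordering.
    Scores are normalised to mean 0 within the beam, since only
    relative differences matter (Section 3.3)."""
    normalised = raw_scores - raw_scores.mean()
    return torch.abs(normalised - targets).sum()
```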
## 4 Experiments

In this section we will present results on the E2E (Novikova et al., 2017b) and WebNLG challenges (Gardent et al., 2017). We evaluate the systems using the BLEU implementation used in the original E2E challenge (https://github.com/tuetschek/e2e-metrics). For all the experiments reported we use the seq2seq architecture from Dušek and Jurčíček (2016) for the underlying model that we are trying to manipulate (the source code for this paper is at https://github.com/jamesHargreaves12/incremental_beam_manipulation).

It is well known that BLEU is not a completely reliable method for predicting human perceptions of the quality of individual NLG outputs (for example: Callison-Burch et al., 2006; Novikova et al., 2017a). However, in this case, we are comparing outputs from variants of the same system, and thus BLEU is more likely to provide reasonable estimates of their felicity to the references, as argued by both Callison-Burch et al. and Novikova et al.

To support the idea behind our approach, i.e. manipulating the beam during decoding instead of only at the end, we compare against existing rerankers applied to the beam at the end of decoding. These include the TGEN reranker, proposed by Dušek and Jurčíček (2016), that has achieved state-of-the-art BLEU scores on NLG tasks, as well as the same reranker architecture defined in Section 3.2. Both architectures are trained to perform reranking of the final beam only. When comparing between these two methods of reranking the final beam, no significant difference in performance was found. In this section, we report results for the architecture defined in Section 3.2. For completeness, results for both rerankers are included in Appendix 1.3. By using the same architecture both for the final-beam reranker and for beam manipulation, we can be more confident that any difference in results is due to the beam manipulation strategy (reranking during decoding, not just at the end).

In our experiments, we also consider length normalisation (Murray and Chiang, 2018), which is a well-known technique that often increases the BLEU score of the resulting sequences since it addresses the bias of language models to favour short sequences, as discussed in Section 2. Although the values assigned to a sequence when using length normalisation are no longer probabilities, it still performs the expand, rank and prune steps at each iteration of the decoding. Hence, we can apply beam manipulation in tandem with length normalisation. Finally, we also considered nucleus sampling (Holtzman et al., 2020) as a baseline. However, it was found to decrease performance even when compared to vanilla beam search.

In what follows, we will not only comment on test results when the beam size is tuned on the validation sets, but we will also comment on test results across all beam sizes. The reason for doing this is that considering all beam sizes assesses whether the technique is robust to changes in beam size. In our opinion, this makes for a more convincing result than just indicating a difference in performance at a single beam size. This is especially pertinent since a well-documented issue of beam search is that larger beam sizes can lead to deteriorating performance. The results table in Appendix 1.3 indicates the validation set's optimal beam size for each of the systems.

## 4.1 E2E BLEU Results

Figure 4: Results on the test set of the E2E challenge. LN = Length Normalisation.

Figure 4 indicates the results on the E2E test set. The first thing to note is that increasing the beam size did not lead to any considerable gain in performance for the vanilla beam search strategy. For all beam sizes except 10, the performance is worse
than greedy decoding, while using a beam size of 10 only increased performance by 0.06 points. Compared to vanilla beam search, reranking was an effective strategy, increasing the performance at all beam sizes. Similarly, applying incremental beam manipulation was able to outperform both methods at all beam sizes. Using the validation set to tune the beam size, the BLEU scores are 0.89 and 1.93 BLEU points higher for the reranker and incremental beam manipulation, respectively. The difference in BLEU scores between incremental beam manipulation and reranker methods was found to be significant (using a permutation test with significance level 0.01).

Length normalisation was the strongest baseline, increasing the BLEU score of vanilla by 1.69 points. Adding the reranker on top of length normalisation decreases performance for all beam sizes less than 30. The strong performance of length normalisation is likely due to the fact that the E2E test set contained longer, and more complex, inputs (and hence references) than the training and validation set (Dušek et al., 2020). Nevertheless, applying incremental beam manipulation on top of length normalisation was able to increase the BLEU score for all beam sizes except 5.

It is worth pointing out that while incremental beam manipulation improved both vanilla beam search and length normalisation, the overall BLEU score for the combination with the latter was lower for all sizes other than size 3. This is surprising considering that vanilla beam search performed worse than length normalisation when not combined with incremental beam manipulation. This could be due to the fact that the greedy roll-out approximation is less accurate for length normalisation than vanilla beam search, since length normalisation only has an impact once some items in the beam have been completed.

## 4.2 WebNLG BLEU Results

Figure 5: Results on the test set of the WebNLG challenge. LN = Length Normalisation.

Figure 5 indicates the results on the WebNLG test set. As in the results for E2E, we can see that increasing the beam size of vanilla beam search was not an effective way to increase BLEU score. A greedy decode outperformed it at all beam sizes. Reranking the final beam was more effective, increasing the BLEU score by 5.83 points. Applying incremental beam manipulation had a very similar performance to reranking, increasing the performance at beam sizes 3 and 10 but reducing it at size 5.

The length normalisation baseline improved upon the vanilla baseline, increasing the BLEU score by 5.01 points. Reranking the final beam of the length normalised beam search was more effective on the WebNLG dataset than the E2E dataset; applying the reranker outperformed length normalisation at every beam size. Focusing on the beam sizes that performed optimally on the validation set, the BLEU score on the test set was increased by 0.43 points. Applying incremental beam manipulation on top of length normalisation received a yet higher BLEU score than reranked length normalisation for all beam sizes, increasing the BLEU score by 1.33 points compared to length normalisation. The improvement in BLEU scores achieved by applying incremental beam manipulation to the length normalised beam search was found to be significant when compared to length normalisation (with or without final beam reranking). Unlike the E2E dataset, beam manipulation had higher performance when applied on top of length normalisation rather than vanilla beam search, outperforming it for all beam sizes except 3. The BLEU score was 0.52 points higher when taking the values at the beam sizes with the highest performances on the validation set.

## 4.3 Fallout with Beam Manipulation

In Section 1, we explained that references often fall out of a beam relatively early during decoding, and reported results on the E2E task. We repeated the same experiment when applying incremental beam manipulation.
Figure6:Thepercentag eofbeamswhichcontain areference(orange),o rwhichcouldstilllead toaref-erence(blue), usingIncrementalBeam Manipulationwithbeam size3ontheE2Evalidat ionset.Thisim-proves onvanillabeamsearch, showninFig.1.beamman ipulation.Abeamsizeo f3wasusedsothatthere sultscouldbedirectly comparedtothoseforva nillabeamsearchinFig ure1.Theresultsaresh owninFigure6.Thegrap hindicatesthatbeamma nipulationindeedamel io-ratesthisissue.Th efinalbeamcontainsa(cor rect)referencein100/ 547cases(approx18%), alargeincreasefromth e60ofvanillabeamsear ch.Thisismainlydueto reducingthenumberofr eferencesthatfallout insteps5to15,whichis consistentwiththefac tthatwearemanipulati ngatsteps5,10,15and2 0.Wealsoobservedthat mostoftheretentionga inisduetoearliermani pulationsteps.4.4Hum anEvaluationTofurthe rinvestigatethediffe rencesbetweenthesyst ems,weconductedahuma nevaluationcompar-in gtheincrementalbeamm anipulationsystem’soutputagainsttheout putofthestrongestbas elineontheE2Edataset –lengthnormalisation. Thehumanevaluationwa sperformedbythesec-o ndandthirdauthorsoft hepaper.Whiletheanno -tatorshadbeeninvolv edinthedesignofthesy stem,theyhadnotseent extualoutputsfromthe systempriortotheanno tation.Theoutputswer epresentedinarandomo rder,withoutindicati ngwhichout-putcamefr omwhichsystem.Thesys temswerecomparedinte rmsofbothfluencyandadequacy.For fluency,eachannotatorc omparedthesystemoutp utsforthesamemeaning representationinput( withoutseeingit)andi ndicatedtheirprefere nce.Bothannotatorsan notated50examplesfro mtheE2Etestset.Littl edifferencebetweenth eoutputswasfound.The systemswerelabelleda sequallyfluentin76%ofcases(inc rementalbeammanipula tionwaspre-ferredin1 2%ofallcases).Tojudg etheadequacyofthegen erations,humanraters werepresentedwiththe meaningrepresen-tati oninputandthetextgen eratedbyasystem.They wereaskedtolabelanyh allucinationsandanyr epetitions.Theywerea skedtoignoremiss-ing informationasthehuma nreferencesthatthesy stemhadbeentrainedon containedfrequentex- amplesofmissinginfor mation(Duˇseketal.,2020),sofor thisdataset,missingi nformationisbetterse enascontentselection .Betweentheannotator s,acombinedtotalof52 4exampleswerelabelle dforbothforhallucina tionandrepetition.On ceagain,theresultswe renotconclusiveinsup -portofeithersystem, withnostatisticallys ignif-icantdifferenc ebetweenthem.Theover allper-formancewasve ryhigh:for95%ofthein puts,bothsystemsexhi bitednosignsofhalluc inationorrepetition. Thiserroranalysisdid ,however,high-lightt hatsomeerrorsarerepe atedmultipletimes,al mostwordforword.Fore xample,all5casesofre petitionfortheincrem entalbeammanipulatio nsystemhadthefollowi ngform:“ThereisapubcalledXlo catednearY.Itisapub. ”Itisworthre-iteratin gthatthesystemwasopt i-misedforBLEU,andno tfluencyoradequacy.Thef actthatanimprovement inBLEUhasnotledtoani mprovementinahumanev aluationsuggeststhat BLEUmaynotbeanaccura teenoughmetricforthi stask,evenwhencompar ingsimilarsystems.Th erefore,BLEUmaybeeve nmorelimitedinuseful nessthanCallison-Bur chetal.(2006)andNovi kovaetal. 
Figure6:Thepercentag eofbeamswhichcontain areference(orange),o rwhichcouldstilllead toaref-erence(blue), usingIncrementalBeam Manipulationwithbeam size3ontheE2Evalidat ionset.Thisim-proves onvanillabeamsearch, showninFig.1.beamman ipulation.Abeamsizeo f3wasusedsothatthere sultscouldbedirectly comparedtothoseforva nillabeamsearchinFig ure1.Theresultsaresh owninFigure6.Thegrap hindicatesthatbeamma nipulationindeedamel io-ratesthisissue.Th efinalbeamcontainsa(cor rect)referencein100/ 547cases(approx18%), alargeincreasefromth e60ofvanillabeamsear ch.Thisismainlydueto reducingthenumberofr eferencesthatfallout insteps5to15,whichis consistentwiththefac tthatwearemanipulati ngatsteps5,10,15and2 0.Wealsoobservedthat mostoftheretentionga inisduetoearliermani pulationsteps.4.4Hum anEvaluationTofurthe rinvestigatethediffe rencesbetweenthesyst ems,weconductedahuma nevaluationcompar-in gtheincrementalbeamm anipulationsystem’soutputagainsttheout putofthestrongestbas elineontheE2Edataset –lengthnormalisation. Thehumanevaluationwa sperformedbythesec-o ndandthirdauthorsoft hepaper.Whiletheanno -tatorshadbeeninvolv edinthedesignofthesy stem,theyhadnotseent extualoutputsfromthe systempriortotheanno tation.Theoutputswer epresentedinarandomo rder,withoutindicati ngwhichout-putcamefr omwhichsystem.Thesys temswerecomparedinte rmsofbothfluencyandadequacy.For fluency,eachannotatorc omparedthesystemoutp utsforthesamemeaning representationinput( withoutseeingit)andi ndicatedtheirprefere nce.Bothannotatorsan notated50examplesfro mtheE2Etestset.Littl edifferencebetweenth eoutputswasfound.The systemswerelabelleda sequallyfluentin76%ofcases(inc rementalbeammanipula tionwaspre-ferredin1 2%ofallcases).Tojudg etheadequacyofthegen erations,humanraters werepresentedwiththe meaningrepresen-tati oninputandthetextgen eratedbyasystem.They wereaskedtolabelanyh allucinationsandanyr epetitions.Theywerea skedtoignoremiss-ing informationasthehuma nreferencesthatthesy stemhadbeentrainedon containedfrequentex- amplesofmissinginfor mation(Duˇseketal.,2020),sofor thisdataset,missingi nformationisbetterse enascontentselection .Betweentheannotator s,acombinedtotalof52 4exampleswerelabelle dforbothforhallucina tionandrepetition.On ceagain,theresultswe renotconclusiveinsup -portofeithersystem, withnostatisticallys ignif-icantdifferenc ebetweenthem.Theover allper-formancewasve ryhigh:for95%ofthein puts,bothsystemsexhi bitednosignsofhalluc inationorrepetition. Thiserroranalysisdid ,however,high-lightt hatsomeerrorsarerepe atedmultipletimes,al mostwordforword.Fore xample,all5casesofre petitionfortheincrem entalbeammanipulatio nsystemhadthefollowi ngform:“ThereisapubcalledXlo catednearY.Itisapub. ”Itisworthre-iteratin gthatthesystemwasopt i-misedforBLEU,andno tfluencyoradequacy.Thef actthatanimprovement inBLEUhasnotledtoani mprovementinahumanev aluationsuggeststhat BLEUmaynotbeanaccura teenoughmetricforthi stask,evenwhencompar ingsimilarsystems.Th erefore,BLEUmaybeeve nmorelimitedinuseful nessthanCallison-Bur chetal.(2006)andNovi kovaetal. 0.10
(2017a) suggested.

4.5 Example outputs

We now present a couple of examples where manipulating the beam during decoding led to an improvement in the quality of the output. These were selected from the set of examples for which the output of the beam manipulator was preferred by the human annotators in terms of adequacy. The examples are given in Figure 7.
Input: name=The Cricketers | eattype=restaurant | food=English | pricerange=cheap | rating=average | area=city centre | familyfriendly=yes | near=Café Rouge
BM: The Cricketers serves cheap English food in the city centre near Café Rouge. It has an average customer rating and is family-friendly.
LN: The Cricketers is a cheap, family-friendly, English restaurant with an average customer rating. It is located in the city centre near Café Rouge and has an average customer rating.
RR: The Cricketers is a cheap, family-friendly restaurant located in city centre near Café Rouge.

Input: name=The Phoenix | eattype=pub | food=French | pricerange=£20-25 | rating=3 out of 5 | area=riverside | familyfriendly=no | near=Café Sicilia
BM: The Phoenix is a French restaurant in riverside near Café Sicilia. It has a moderate price range and a customer rating of 3 out of 5. It is not kid friendly.
LN: The Phoenix is a restaurant providing French food in the £20-25 price range. It is located in the riverside. It is near Café Sicilia. Its customer rating is 5 out of 5.
RR: The Phoenix is a restaurant providing French food in the £20-25 price range. It is located in the riverside. It is near Café Sicilia. Its customer rating is high.

Figure 7: Example outputs for different systems. BM = incremental beam manipulation system, LN = vanilla beam search with length normalisation, RR = vanilla beam search with reranking applied to the final beam.

In the first example, we can see that length normalisation leads to a repetition of the fact that The Cricketers had an 'average customer rating', showing the downsides of a technique that just favours longer outputs. Neither the output from the beam manipulator nor the reranked approach contains repetitions, although we can see that more of the input information is realised in the case of the beam manipulator. The second example contains a hallucination for both the length normalised and reranked systems: the input clearly states that the customer rating was '3 out of 5', whereas these systems claim that it was '5 out of 5' and 'high' respectively. The beam manipulation system avoided this issue.

5 Conclusions

Rerankers are commonly used to increase the performance of NLG systems decoded by beam search, by modifying which hypothesis from the final beam is chosen. This means that rerankers are dependent on good hypotheses reaching the final beam. However, this is often not the case; on the validation set of the E2E challenge, only 11% of references were present in the final beam when the seq2seq model from Dušek and Jurčíček (2016) was decoded with a beam size of 3. To address this limitation, we proposed incremental beam manipulation, which modifies the ranking of partial hypotheses within the beam at intermediate steps of the decoding, and hence chooses which are pruned. We evaluated this method on both the E2E and WebNLG challenges. The results showed that applying beam manipulation, instead of a reranker, was able to increase the BLEU score by 1.04 on the E2E challenge. We further showed that incremental beam manipulation was able to increase performance when applied on top of length normalisation.

The optimal reranker for incremental beam manipulation may differ at each step of generation (for example, token 5 vs. token 20). In future work, we intend to refine our method further by conditioning the reranker on how far through the beam search we are.
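To make the procedure concrete, here is a minimal sketch of the decode loop summarised above, under stated assumptions rather than the authors' implementation: `expand_beam` stands in for one ordinary beam search expansion step under the seq2seq model, `ranker` for the trained hypothesis ranker, and the bottom-quarter pruning follows the beam split described in Section 3.3.

```python
# Minimal sketch of incremental beam manipulation (assumptions: `expand_beam`
# performs one ordinary beam search step and returns the top-k partial
# hypotheses by model score; `ranker` scores a partial hypothesis; hypotheses
# are token lists ending in "<eos>" when complete).

def decode_with_manipulation(x, expand_beam, ranker, beam_size=3,
                             manipulation_steps=(5, 10, 15, 20),
                             rerank_final=True, max_steps=40):
    beam = [[]]  # start from the empty hypothesis
    for step in range(1, max_steps + 1):
        beam = expand_beam(x, beam, beam_size)
        if step in manipulation_steps:
            # Rerank the partial hypotheses and prune the bottom quarter, so
            # hypotheses unlikely to lead to a good final output are discarded
            # and the next expansion considers replacements instead.
            beam.sort(key=lambda hyp: ranker(x, hyp), reverse=True)
            beam = beam[: max(1, (3 * len(beam)) // 4)]
        if all(hyp and hyp[-1] == "<eos>" for hyp in beam):
            break
    # Manipulating the final step performs no rollouts, so it amounts to
    # plain reranking of the final beam with the same ranker.
    if rerank_final:
        return max(beam, key=lambda hyp: ranker(x, hyp))
    return beam[0]
```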
References

Shubham Agarwal, Marc Dymetman, and Eric Gaussier. 2018. Char2char generation with reranking for the E2E NLG challenge. Pages 451–456.

Nabiha Asghar, Pascal Poupart, Xin Jiang, and Hang Li. 2017. Deep active learning for dialogue generation. In Proceedings of the 6th Joint Conference on Lexical and Computational Semantics (*SEM 2017), pages 78–83.

Frédéric Blain, Lucia Specia, and Pranava Madhyastha. 2017. Exploring hypotheses spaces in neural machine translation. In Proceedings of the 16th Machine Translation Summit (MT Summit XVI). Asia-Pacific Association for Machine Translation (AAMT).

Sebastian Borgeaud and Guy Emerson. 2020. Leveraging sentence similarity in natural language generation: Improving beam search using range voting. In Proceedings of the 4th Workshop on Neural Generation and Translation (WNGT), pages 97–109. Association for Computational Linguistics.
Chris Callison-Burch, Miles Osborne, and Philipp Koehn. 2006. Re-evaluating the role of BLEU in machine translation research. In Proceedings of the 11th Conference of the European Chapter of the Association for Computational Linguistics (EACL).

Kai-Wei Chang, Akshay Krishnamurthy, Alekh Agarwal, Hal Daumé III, and John Langford. 2015. Learning to search better than your teacher. In Proceedings of the 32nd International Conference on Machine Learning (ICML), volume 37 of JMLR Proceedings, pages 2058–2066. JMLR.org.

Yun Chen, Kyunghyun Cho, Samuel R. Bowman, and Victor O. K. Li. 2018. Stable and effective trainable greedy decoding for sequence to sequence learning. In Proceedings of the 6th International Conference on Learning Representations (ICLR 2018).

Eldan Cohen and Christopher Beck. 2019. Empirical analysis of beam search performance degradation in neural sequence models. In Proceedings of the 36th International Conference on Machine Learning (ICML), pages 1290–1299.

Ronan Collobert, Awni Hannun, and Gabriel Synnaeve. 2019. A fully differentiable beam search decoder. In Proceedings of the 36th International Conference on Machine Learning (ICML), pages 1341–1350.

Ondřej Dušek and Filip Jurčíček. 2016. Sequence-to-sequence generation for spoken dialogue via deep syntax trees and strings. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 45–51, Berlin, Germany. Association for Computational Linguistics.

Ondřej Dušek, Jekaterina Novikova, and Verena Rieser. 2020. Evaluating the state-of-the-art of end-to-end natural language generation: The E2E NLG challenge. Computer Speech & Language, 59:123–156.

Markus Freitag and Yaser Al-Onaizan. 2017. Beam search strategies for neural machine translation. In Proceedings of the First Workshop on Neural Machine Translation, pages 56–60, Vancouver. Association for Computational Linguistics.

Claire Gardent, Anastasia Shimorina, Shashi Narayan, and Laura Perez-Beltrachini. 2017. The WebNLG challenge: Generating text from RDF data. In Proceedings of the 10th International Conference on Natural Language Generation, pages 124–133, Santiago de Compostela, Spain. Association for Computational Linguistics.

Kartik Goyal, Graham Neubig, Chris Dyer, and Taylor Berg-Kirkpatrick. 2018. A continuous relaxation of beam search for end-to-end training of neural sequence models. In Proceedings of the 32nd AAAI Conference on Artificial Intelligence, pages 3045–3052.

Jiatao Gu, Kyunghyun Cho, and Victor O. K. Li. 2017. Trainable greedy decoding for neural machine translation. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1968–1978.

Ari Holtzman, Jan Buys, Leo Du, Maxwell Forbes, and Yejin Choi. 2020. The curious case of neural text degeneration.

Philipp Koehn and Rebecca Knowles. 2017. Six challenges for neural machine translation. In Proceedings of the First Workshop on Neural Machine Translation, pages 28–39.

Shankar Kumar and William Byrne. 2004. Minimum Bayes-Risk decoding for statistical machine translation. In Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics: HLT-NAACL 2004, pages 169–176.

Tie-Yan Liu. 2009. Learning to rank for information retrieval. Foundations and Trends in Information Retrieval, 3(3):225–331.

Kenton Murray and David Chiang. 2018. Correcting length bias in neural machine translation. In Proceedings of the Third Conference on Machine Translation: Research Papers, pages 212–223.

Renato Negrinho, Matthew R. Gormley, and Geoffrey J. Gordon. 2018. Learning beam search policies via imitation learning.

Jekaterina Novikova, Ondřej Dušek, Amanda Cercas Curry, and Verena Rieser. 2017a. Why we need new evaluation metrics for NLG. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 2241–2252. Association for Computational Linguistics.

Jekaterina Novikova, Ondřej Dušek, and Verena Rieser. 2017b. The E2E dataset: New challenges for end-to-end generation. In Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue, pages 201–206.

Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. BLEU: a method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, pages 311–318, Philadelphia, Pennsylvania, USA. Association for Computational Linguistics.

Donald G. Saari and Vincent R. Merlin. 1996. The Copeland method. Economic Theory, 8(1):51–76.

Felix Stahlberg and Bill Byrne. 2019. On NMT search errors and model errors: Cat got your tongue? In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 3347–3353.

Tsung-Hsien Wen, Milica Gasic, Dongho Kim, Nikola Mrkšić, Pei-Hao Su, David Vandyke, and Steve Young. 2015. Stochastic language generation in dialogue using recurrent neural networks with convolutional sentence reranking. In Proceedings of the 16th Annual Meeting of the Special Interest Group on Discourse and Dialogue, pages 275–284.
Sam Wiseman and Alexander M. Rush. 2016. Sequence-to-sequence learning as beam-search optimization. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1296–1306.

Ning Zhang, Shui-Long Shen, Annan Zhou, and Ye-Shuang Xu. 2019. Investigation on performance of neural networks using quadratic relative error cost function. IEEE Access, 7:106642–106652.

6 Appendices

6.1 Fallout experiment with larger beam size

Section 1 contains a graph which indicates the step at which the reference sentences drop out of the beam (for a beam size of 3). Figure 8 indicates the same results for a larger beam size of 10.

Figure 8: The percentage of beams that contain a reference sentence after each step of beam search. A beam size of 10 was used to decode the model proposed in Dušek and Jurčíček (2016). Results are for the E2E validation dataset. The orange bars indicate the number of completed references within the beam.

The figure indicates that the number of references that were contained in the final beam was higher for a beam size of 10. For the early iterations of the decoding, the number of references that fell out of the beam was far lower for a beam size of 10. A larger beam size meant that the beam contained more hypotheses, and so had more chances to match against a reference. However, the shape of the graphs is very similar. The majority of references that fell out did so relatively early in the process: 54% of references fell out by step 7, increasing to 79% by step 9. At step 21 the last reference fell out of the beam, despite the fact that the beam contained partial references up to step 40.

6.2 Pointwise vs pairwise rerankers

This paper required a method of ranking completed hypotheses from worst to best. During preliminary experiments we implemented rerankers based on the pairwise and pointwise strategies from the information retrieval field. See Section 3.2 for more details.

Figure 9: Comparison between the performance of the pointwise and pairwise rankers when used as rerankers on the E2E validation set.

To evaluate the performance of the different rankers, we applied each of them as a reranker of the final beam of a vanilla beam search over the E2E validation set. The BLEU scores for each of the rerankers were calculated for each beam size. The results are shown in Figure 9. We can see that there was very little difference in performance between the two methods of reranking for beam sizes up to 10. However, for beam size 30 the pointwise reranker significantly outperforms the pairwise reranker. The larger the beam size, the greater the number of hypotheses that the reranker can pick as top, and hence the greater the impact of the reranker.

The pointwise reranker requires O(k) runs of the reranker to produce a total ordering. On the other hand, the Copeland method requires O(k²) pairwise comparisons to produce a total ordering. These factors led us to choose the pointwise ranker over the pairwise ranker for the experiments in the results section.
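The complexity difference can be made concrete with a small sketch. This is not the paper's implementation: `point_score` and `prefer` are hypothetical stand-ins for a pointwise scorer and a pairwise preference model respectively.

```python
# Minimal sketch contrasting the two ordering strategies (assumptions:
# `point_score(hyp)` returns a scalar; `prefer(a, b)` returns True when a
# pairwise model prefers hypothesis a over b).

def pointwise_order(beam, point_score):
    # O(k) scorer calls: score each hypothesis once, then sort best-first.
    return sorted(beam, key=point_score, reverse=True)

def copeland_order(beam, prefer):
    # O(k^2) comparisons: rank by Copeland score, i.e. the number of pairwise
    # contests each hypothesis wins (Saari and Merlin, 1996).
    wins = [0] * len(beam)
    for i in range(len(beam)):
        for j in range(i + 1, len(beam)):
            if prefer(beam[i], beam[j]):
                wins[i] += 1
            else:
                wins[j] += 1
    order = sorted(range(len(beam)), key=lambda i: wins[i], reverse=True)
    return [beam[i] for i in order]
```

For a beam of size k = 30 the Copeland ordering needs 435 pairwise calls against 30 pointwise calls, which is consistent with the pointwise approach scaling better to large beams.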
6.3 Numerical results

This section presents the numerical results for the E2E and WebNLG datasets so that they can be more readily compared in future works. The results are given in Table 1 and Table 2.

6.4 Hyper-parameters

Throughout this paper a number of hyperparameters were introduced. The values used for each of the models in this paper are summarised below. Note that the search for these values was far from exhaustive, so there is a good chance that the results of this paper could be improved upon through a better optimisation procedure.

In Section 3.3, the beam is split into two sections, bottom and rest. For all beam manipulation models, the bottom of the beam was set to the bottom (i.e. lowest scoring) quarter of the beam. We also said that a large beam is used to generate the training data; for all experiments in this paper we used a beam size of 50.

A key hyperparameter for the performance of incremental beam manipulation was the steps of the beam search at which the beam was manipulated. This hyperparameter varied for the 4 separate beam manipulation models. The values are summarised as follows:

• E2E, incremental beam manipulation on top of vanilla beam search: 5, 10, 15, 20 and final.
• E2E, incremental beam manipulation on top of length normalised beam search: 5, 7 and 10.
• WebNLG, incremental beam manipulation on top of vanilla beam search: 4, 12 and final.
• WebNLG, incremental beam manipulation on top of length normalised beam search: 5 and 12.

It is worth noting that manipulating the final step is the same as reranking the beam according to the ranker used in beam manipulation (i.e. no rollouts are performed).
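For reference, these schedules map directly onto the arguments of the decode sketch given after the conclusions; the encoding below is purely illustrative, with "final" expressed as the hypothetical `rerank_final` flag since, as noted above, final-step manipulation is plain reranking.

```python
# Hypothetical encoding of the manipulation-step schedules listed above,
# matching the parameters of the earlier decode_with_manipulation sketch.
SCHEDULES = {
    ("E2E", "vanilla"):              {"steps": (5, 10, 15, 20), "rerank_final": True},
    ("E2E", "length-normalised"):    {"steps": (5, 7, 10),      "rerank_final": False},
    ("WebNLG", "vanilla"):           {"steps": (4, 12),         "rerank_final": True},
    ("WebNLG", "length-normalised"): {"steps": (5, 12),         "rerank_final": False},
}
```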
英語(論文から抽出)日本語訳スコア
| Beam size | Vanilla | Rerank | TGEN | LN | LN+Rerank | BM | LN+BM |
|---|---|---|---|---|---|---|---|
| 1 | 64.72 | 64.72 | 64.72 | 64.72 | 64.72 | 64.72 | 64.72 |
| 3 | 64.47 | 64.94 | 65.33 | 65.93 | 65.26 | 65.73 | 66.06 |
| 5 | 64.69 | 65.36 | 65.47* | 66.40 | 66.19* | 66.40 | 66.27 |
| 10 | 64.78* | 65.67* | 65.58 | 66.47* | 66.19 | 66.71* | 66.61 |
| 30 | 63.65 | 65.25 | 65.44 | 65.58 | 66.05 | 66.40 | 66.25* |

Table 1: BLEU scores for each of the different systems on the E2E test set. * indicates the beam size which scored highest on the respective validation sets. Bold indicates the highest scoring system for each beam size.

| Beam size | Vanilla | Rerank | TGEN | LN | LN+Rerank | BM | LN+BM |
|---|---|---|---|---|---|---|---|
| 1 | 42.14 | 42.14 | 42.14 | 42.14 | 42.14 | 42.14 | 42.14 |
| 3 | 42.10* | 47.93* | 47.28* | 47.02 | 47.37 | 48.38 | 47.78 |
| 5 | 41.77 | 48.13 | 47.41 | 47.49 | 47.54* | 47.92* | 48.39 |
| 10 | 41.33 | 47.33 | 46.50 | 47.11* | 47.70 | 47.66 | 48.21 |
| 30 | 41.20 | 47.42 | 46.61 | 47.18 | 47.81 | 47.41 | 48.44* |

Table 2: BLEU scores for each of the different systems on the WebNLG test set. * indicates the beam size which scored highest on the respective validation sets. Bold indicates the highest scoring system for each beam size.

Translations are produced with Fugu-Machine Translator.