Paper overview and license

# (Reference translation) S3E-GNN: Sparse Spatial Scene Embedding with Graph Neural Networks for Camera Relocalization [full translation available]

S3E-GNN: Sparse Spatial Scene Embedding with Graph Neural Networks for Camera Relocalization ( http://arxiv.org/abs/2205.05861v1 )

License: CC BY 4.0
Ran Cheng, Xinyu Jiang, Yuan Chen, Lige Liu, Tao Sun

Camera relocalization is the key component of simultaneous localization and mapping (SLAM) systems. This paper proposes a learning-based approach, named Sparse Spatial Scene Embedding with Graph Neural Networks (S3E-GNN), as an end-to-end framework for efficient and robust camera relocalization. S3E-GNN consists of two modules. In the encoding module, a trained S3E network encodes RGB images into embedding codes that implicitly represent spatial and semantic information. With the embedding codes and the associated poses obtained from a SLAM system, each image is represented as a graph node in a pose graph. In the GNN query module, the pose graph is transformed into an embedding-aggregated reference graph for camera relocalization. We collect various scene datasets in challenging environments to perform experiments. Our results demonstrate that the S3E-GNN method outperforms the traditional bag-of-words (BoW) approach for camera relocalization, thanks to the learning-based embedding and the GNN-powered scene matching mechanism.
Published: Thu, 12 May 2022 03:21:45 GMT

Note: The translation results are shown below. The PDF is the original paper. The translated text is licensed under CC BY-SA 4.0; see the top page for details.

Full text

S3E-GNN: Sparse Spatial Scene Embedding with Graph Neural Networks for Camera Relocalization

Ran Cheng1‡*, Xinyu Jiang1‡, Yuan Chen1, Lige Liu1,2, Tao Sun1,2*

1 Lab2030, Midea Robozone Inc, Shanghai, 201704, China (e-mail: chengran1, jiangxy77, chenyuan, liulg12, tsun@midea.com). 2 School of Engineering, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA (e-mail: xtllg, taosun@mit.edu). *Corresponding authors. ‡ Indicates equal contribution.
I. INTRODUCTION

SLAM is the core technology for mobile robots to map their environments and localize themselves for navigation planning. It has been intensively studied in the robotics community in the past two decades [1], [2], [3], [4], [5], [6].

Camera relocalization plays an important role in SLAM systems. It recovers the camera pose when the robot is in the state of tracking lost. It can also be applied in loop closure detection to correct the accumulated drift in the trajectory and obtain a consistent map. Traditional camera relocalization methods rely on handcrafted features such as SIFT [7], SURF [8], ORB [9], etc., and the bag-of-words (BoW) matching approaches [10], [11], [12]. Due to the point-based characteristics, the handcrafted feature matching process may not be robust in challenging environments with low texture, repetitive patterns, or illumination variation. In recent years, deep learning-based methods have become popular because they remove the need to extract and match handcrafted features. Kendall et al. [13] proposed PoseNet, which can regress the 6-DOF camera pose quickly in an end-to-end fashion. However, there are issues of over-fitting and lack of generalization outside training datasets [14]. To overcome these limitations, researchers combined partial geometric constraints with deep learning-based methods, such as [15], [16], [17]. These algorithms achieve state-of-the-art performance on existing indoor and outdoor datasets, but they only regress the initial camera pose estimation and rely on the results of sequence estimation. If the previous estimation is not accurate, errors will accumulate, resulting in a system crash.

To address the above issues, in this work, we propose an end-to-end learning framework named Sparse Spatial Scene Embedding with Graph Neural Networks (S3E-GNN) for efficient and robust camera relocalization, as shown in Fig. 1. The S3E-GNN pipeline has two modules. In the encoding module, the S3E network encodes images into embedding codes and generates a pose graph with the estimated poses. In the graph query module, a GNN transforms the pose graph into a reference graph where the features of the pose graph nodes are aggregated. To relocate the camera pose of a query image, we develop an efficient graph node querying algorithm based on calculating the inverse entropy multiplication between the embedding of the query image and the reference graph. To summarize, the main contributions of this paper are:

- A learning-based embedding network, S3E, that encodes sensor measurements as graph nodes.
- A GNN that transforms a pose graph into a reference graph with aggregated features for each pose node.
- A fast node query algorithm that locates the pose node in the reference graph closest to the query image.

To the best of our knowledge, this is the first work to combine a learning network for image embedding with a GNN for pose querying to perform camera relocalization. Our method does not need to build a tree structure [10] or use the nearest neighbors [13], [18] to search the BoW. In addition, we use a GNN query method rather than CNN-based pose regression methods [13], [19], [20] to improve the robustness of the SLAM system in challenging environments.

II. RELATED WORKS

A. Handcrafted feature-based methods

In 2D vision, an image is represented by the BoW method.
Fig. 1: S3E-GNN pipeline for camera relocalization.

The camera pose of the query image can be estimated by matching it with the most similar image in the image database. In 3D vision, a 3D scene map composed of point clouds can be generated from Structure from Motion (SfM) [21] with RGB-D images. Relocalizing 2D images in the 3D map can be achieved by finding the 2D-3D correspondences between 2D image pixels and 3D points. Shotton et al. [22] developed scene coordinate regression forests to predict the correspondences between image pixels and 3D points in the 3D world frame. Recently, Nadeem et al. [23] trained a Descriptor-Matcher to directly find the correspondences between the 2D descriptors in the RGB query images and the 3D descriptors in the dense 3D point clouds to relocate the pose of the 6-DOF camera.

B. Learning feature-based methods

Gao and Zhang [24] first used the stacked denoising auto-encoder to automatically learn compressed representations from raw images in an unsupervised manner. Although no manual design of visual features is required, the network's effectiveness depends heavily on the tuning of hyperparameters. Kendall et al. [13] proposed PoseNet to regress the 6-DOF camera pose. Although the image embeddings from CNNs contain high-level information, the pose information, which is significant for data association, is not stored. GNNs benefit from their graph structure, which can pass information between nodes through edges, making them more suitable for handling relational data. Recently, the use of GNNs in loop closure detection [25], [26], camera relocalization [20], [27], etc. has been developing rapidly. For camera relocalization, Xue et al. [20] extracted image features using a CNN and used the extracted feature maps as graph nodes, building the graph through joint GNN and CNN iterations. Elmoogy et al. [27] used ResNet50 for feature extraction of the input image, used the vectors flattened from the extracted feature maps as graph nodes, and then obtained the camera pose by a GNN.

III. METHODS

Our S3E-GNN framework is composed of two modules: the S3E encoding module and the GNN query module.
A. S3E Encoding Module

The S3E module encodes an image into its embedding code. For encoding efficiency, we do not directly use the entire image as the input. Instead, we first use the VINS-RGBD [28] SLAM system to randomly select 128 ORB feature points in the image. Then, we slice a 16x16 patch around each feature point. The 128 patches, together with their pixel coordinates, which contain spatial information, are fed into the encoder as the representation of the original image. We develop two types of S3E encoder backbones. One uses a Visual Transformer (ViT) [29] and the other, as shown in Fig. 2, is based on sparse convolution (SparseConv) [30]. The SparseConv encoder can take an arbitrary data size, while the ViT encoder requires a fixed number of patches. As shown in Fig. 2, the RGB patches taken from the image are stacked and fed into the SparseConv encoder built from the Minkowski Engine [30]. The input RGB patches and the corresponding 2D pixel coordinates are converted into a sparse tensor. Note that the feature points on the image are too sparse for the sparse convolution kernel to capture a neighborhood, so we down-sample the tracked feature points' pixel coordinates by the patch size. The encoder consists of three sparse convolutions (with batch norm and ReLU) followed by a global average pooling layer that aggregates all the patch information into a single embedding code representing the image.
Fig. 2: Encoder backbone network based on sparse convolution; ME denotes the Minkowski Engine [30].
Next, we build the embedding dataset to train the encoder. The ground truth data for training is generated by the following steps, shown in Fig. 3:

1) Estimate the camera keyframe pose graph using VINS-RGBD [28]. The estimated camera poses form a pose graph, in which each node is a keyframe.
2) Build the fully connected graph from the keyframe nodes and calculate the reprojection Intersection over Union (IoU) for every pair of keyframes.
3) Establish the similarity matrix as the ground truth data for training the encoder.

Fig. 3: Embedding dataset build pipeline.
Fig. 4: Reprojection IoU calculation illustration.

s(p_i, p_j) = ( Σ_{n∈N} π( K ξ_j ξ_i^{-1} ( K^{-1}(x_n, z_n) ) ) ) / N   (1)

To calculate the IoU between two keyframes, the source and the target, we count the number of projected pixels from the source keyframe that land inside the target keyframe and divide it by the keyframe resolution to obtain the normalized similarity score. Fig. 4 and Equation (1) describe the reprojection IoU calculation process. The function π is the validation function that detects whether the input coordinate is inside the target keyframe.
Here ξ is the camera pose, N is the image resolution, K is the camera intrinsics, x_n is the n-th pixel coordinate, and z_n is the depth taken from the depth map. Since the reprojection IoU implicitly incorporates the camera intrinsics and depth information, the encoder can learn geometric information. This property benefits the robustness of the GNN query in the next section. The similarity scores of every pair of keyframes form the similarity matrix as shown in Fig. 3, which quantitatively evaluates the similarity across keyframes. With the similarity matrix from the IoU calculation as the ground truth data, we can train the encoder. Fig. 5 shows how the encoder converts the input RGB patches of two keyframes into their latent embeddings. The cosine similarity is applied to the two latent embeddings to output the similarity score from the encoder. The cost function for backpropagation is defined as the shrinkage loss [31] between the similarity scores from the encoder output and the ground truth data. After the loss converges, the trained encoder can be applied to a new query keyframe to generate its embedding code.
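Equation (1) can be read as: warp every pixel of the source keyframe into the target keyframe using its depth, the intrinsics K, and the relative pose, then count the fraction of warped pixels that land inside the target image. A compact NumPy sketch of that computation is given below; the variable names are hypothetical, a pinhole model and world-to-camera convention for ξ are assumed, and occlusion is ignored here (the z-buffer step described in Section IV handles it).

```python
import numpy as np

def reprojection_iou(depth_i, K, xi_i, xi_j):
    """Similarity s(p_i, p_j) of Eq. (1): fraction of source pixels that, back-projected
    with their depth and re-projected into the target view, fall inside the target image.
    xi_i, xi_j: 4x4 world-to-camera poses of the source and target keyframes."""
    h, w = depth_i.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth_i.ravel()
    has_depth = z > 0

    # K^-1 (x_n, z_n): back-project pixels to 3D points in the source camera frame
    pix = np.stack([u.ravel() * z, v.ravel() * z, z], axis=0)   # [3, N]
    pts_i = np.linalg.inv(K) @ pix                              # [3, N]

    # xi_j * xi_i^-1: relative transform from the source camera to the target camera
    T_ji = xi_j @ np.linalg.inv(xi_i)                           # [4, 4]
    pts_j = T_ji[:3, :3] @ pts_i + T_ji[:3, 3:4]

    # K: project into the target image plane
    proj = K @ pts_j
    zj = np.maximum(proj[2], 1e-9)
    up, vp = proj[0] / zj, proj[1] / zj

    # pi(.): validation function, 1 if the coordinate is inside the target keyframe
    inside = (up >= 0) & (up < w) & (vp >= 0) & (vp < h) & (proj[2] > 1e-6) & has_depth
    return inside.sum() / float(h * w)
```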
Fig. 5: Embedding network training pipeline.
B. GNN Query Module

The GNN query module transforms the embedding pose graph into a feature-aggregated reference graph, which is used for querying with a sub-graph composed of successive keyframes to find the best-matching nodes for camera relocalization. Once we have the embedding code of every keyframe, we concatenate the embeddings of two nodes connected by an edge as an edge embedding. The node and edge embeddings are used to build an embedding pose graph to feed into a GNN (as shown in Fig. 6a), which is comprised of three SAGEConv layers [32] followed by ReLU and finally clipped with a Sigmoid function.

To train the GNN, we use the similarity matrix from the reprojection IoU as the ground truth label for the GNN query results. First, we take two runs of loop trajectories in the same scene. One loop is used as the reference loop. The other is used as the query loop.
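A minimal PyTorch Geometric version of the three-layer SAGEConv aggregation network might look like the following. The 32, 128, 64, 32 widths and mean aggregation follow the SAGEConv annotations in the extracted figures; everything else (module name, toy graph, how edge features would be injected) is an assumption, since SAGEConv itself only consumes node features and connectivity.

```python
# Sketch of the GNN that turns the embedding pose graph into the aggregated
# reference graph (assumed PyTorch Geometric implementation).
import torch
from torch_geometric.nn import SAGEConv


class ReferenceGraphGNN(torch.nn.Module):
    def __init__(self, embed_dim=32):
        super().__init__()
        self.conv1 = SAGEConv(embed_dim, 128, aggr="mean")
        self.conv2 = SAGEConv(128, 64, aggr="mean")
        self.conv3 = SAGEConv(64, embed_dim, aggr="mean")

    def forward(self, x, edge_index):
        # x: [M, 32] S3E embedding codes of the keyframe nodes
        # edge_index: [2, E] pose-graph connectivity
        x = torch.relu(self.conv1(x, edge_index))
        x = torch.relu(self.conv2(x, edge_index))
        x = self.conv3(x, edge_index)
        return torch.sigmoid(x)  # aggregated node features of the reference graph


# Example: a pose graph with 100 keyframes connected sequentially.
x = torch.rand(100, 32)
edge_index = torch.stack([torch.arange(99), torch.arange(1, 100)])
ref_graph = ReferenceGraphGNN()(x, edge_index)  # [100, 32]
```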
Fig. 6: (a) GNN for the map graph and the query sub-graph; (b) ground truth label preparation pipeline for the GNN.
As shown in Fig. 6b, the ground truth match logits for frame i in the query sub-graph and the reference map are determined by selecting the corresponding columns (the reference map loop region) in the i-th row of the reprojection IoU matrix. The cost function is defined as the cross entropy between the predicted probability of the graph query network and the corresponding row of the reprojection IoU map. With a trained GNN, a feature-aggregated reference graph is obtained from the embedding pose graph. This reference graph replaces the BoW scheme for camera relocalization. To improve the precision and robustness of relocalization, we use a group of successive keyframes rather than a single keyframe to form a query sub-graph. Since the query sub-graph and the reference graph have the same embedding dimension, we can apply an inverse cross-entropy matrix multiplication between the embedding matrices of the query sub-graph and the reference graph to produce a similarity matrix, and detect a loop closure by finding the maximum value in the similarity matrix. The inverse cross-entropy multiplication is defined as below:

U_ij = 1 / ( η − Σ_{k∈K} q_ik log(m_kj) )   (2)

which applies a row-column-wise inverse cross-entropy instead of aggregating the row-column multiplication result directly. Here U is the output of the inverse cross-entropy multiplication, K is the embedding dimension, and q and m are the query sub-graph embedding matrix and the reference graph embedding matrix. For each element of U, U_ij is calculated from the inverse cross-entropy of the i-th row of the query embedding matrix and the j-th column of the reference graph embedding matrix.
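Equation (2) replaces the usual dot-product similarity between the query sub-graph embedding matrix q (N x K) and the transposed reference graph embedding matrix m (K x M) with a row-column inverse cross-entropy. A small NumPy sketch under those shape assumptions is shown below, followed by the argmax that picks the loop closure candidate; η is treated here as a small stabilizing constant and the embeddings are assumed to lie in (0, 1) after the sigmoid, neither of which is specified in the paper.

```python
import numpy as np

def inverse_cross_entropy_similarity(q, m_t, eta=1e-3):
    """Eq. (2): U[i, j] = 1 / (eta - sum_k q[i, k] * log(m_t[k, j])).
    q:   [N, K] query sub-graph embedding matrix (sigmoid outputs in (0, 1)).
    m_t: [K, M] transposed reference graph embedding matrix."""
    ce = -q @ np.log(np.clip(m_t, 1e-8, 1.0))   # [N, M] cross-entropy per pair
    return 1.0 / (eta + ce)                      # low cross-entropy -> high similarity


# Loop closure detection: for each query frame, pick the reference node with the
# highest similarity (the maximum over the corresponding row of U).
q = np.random.rand(8, 32)       # 8 successive query keyframes
m = np.random.rand(200, 32)     # 200 reference graph nodes
U = inverse_cross_entropy_similarity(q, m.T)
best_match = U.argmax(axis=1)   # matched reference node index per query frame
```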
With the multiplication results, we use graph optimization algorithms [33] to optimize the global poses. For a loop closure detection pair, query frame p_i and reference frame p_j, the graph optimization process is as below:

T_iw = argmin_{T*_w} r_ij^T Ω_ij r_ij   (3)
r_ij = log( T_ji T_wj T_iw )   (4)

where T_iw is the camera pose of frame p_i. We initialize it with its previously tracked pose T_(i-1)w, and Ω = θ̂ × I is the information matrix prior, which is scaled by the embedding similarity vector θ̂. All the poses in the graph are adjusted to minimize the residual r_ij between the reference frame and the query frame. Since we sparsify the keyframes by cropping patches around the tracked feature points, and use the inverse cross-entropy matrix multiplication to replace the BoW database and matching procedure, the overall performance impact of this method on a SLAM system is marginal.

IV. EXPERIMENTS

A. Dataset Collection and Processing

Scene data collection: We record multiple scenes in our office, with several sequences each, to ensure at least two complete cycles, using an Intel Realsense D455i. Note that we do not use open-source datasets, since we find that none of them have multiple loops, a ground-truth trajectory for computing the IoU, and high-precision depth. Our dataset provides RGB images, depth images, and IMU data.

Localization data generation: We first use the VINS-RGBD system [28] to reconstruct the scene using the recorded video data. After generating the scene map, we save the camera pose trajectory, the keyframes, and the extracted patches around feature points. To increase the scalability, we extract patches at multiple scales: 16×16, 32×32, and 64×64.

Low texture data augmentation: During the scene data collection, we may encounter some low-texture environments. We propose a feature completion strategy to supplement the features in these environments. If the number of features is below 128, we find all the lines from each keypoint to the image center, then randomly select points on the lines and extract patches around them. Fig. 7 shows the process of constructing lines using existing feature points and random patch selection.

Fig. 7: Line extraction and random patch selection overview.
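The feature completion strategy (draw a line from each existing keypoint to the image center and sample extra patch centers along it whenever fewer than 128 features are tracked) could be implemented along the following lines; the sampling density, the random range along the line, and the clipping that keeps patches inside the image are assumptions not given in the paper.

```python
import numpy as np

def complete_features(keypoints, image_shape, target=128, patch=16, rng=None):
    """Supplement sparse keypoints in low-texture frames: for each tracked keypoint,
    sample a random point on the line joining it to the image center and use it as an
    extra patch center, until `target` points are available."""
    rng = rng or np.random.default_rng()
    h, w = image_shape
    center = np.array([w / 2.0, h / 2.0])
    pts = [np.asarray(kp, dtype=float) for kp in keypoints]
    while len(pts) < target and keypoints:
        kp = np.asarray(keypoints[rng.integers(len(keypoints))], dtype=float)
        t = rng.uniform(0.1, 0.9)            # position along the line keypoint -> center
        p = (1.0 - t) * kp + t * center
        # keep the patch fully inside the image
        p = np.clip(p, patch // 2, [w - patch // 2 - 1, h - patch // 2 - 1])
        pts.append(p)
    return np.stack(pts)[:target]
```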
Hidden surface removal: We calculate the IoU by projecting keyframe A onto keyframe B. If the projected points are behind the scene in keyframe B, they should not be visible. However, without incorporating the depth information, these points can be mistakenly projected onto keyframe B. To correct this hidden surface issue, we apply the z-buffer algorithm [34] to handle the depth relation and remove the hidden surfaces from the projected image.
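One simple way to realize the z-buffer check is to keep, for every target pixel, only the smallest projected depth and mark deeper projections as hidden. The sketch below assumes integer-pixel splatting without interpolation and hypothetical input names; it is meant only to illustrate the visibility test applied before counting pixels in Eq. (1).

```python
import numpy as np

def zbuffer_visibility(uv_tgt, z_tgt, image_shape):
    """Hidden-surface removal for projected points.
    uv_tgt: [N, 2] projected pixel coordinates in keyframe B.
    z_tgt:  [N] depths of the projected points in keyframe B's camera frame.
    Returns a boolean mask: True where a point wins the z-buffer test."""
    h, w = image_shape
    u = np.round(uv_tgt[:, 0]).astype(int)
    v = np.round(uv_tgt[:, 1]).astype(int)
    inside = (u >= 0) & (u < w) & (v >= 0) & (v < h) & (z_tgt > 0)

    zbuf = np.full((h, w), np.inf)
    idx = np.flatnonzero(inside)
    # visit points nearest-first so the first write per pixel is the closest depth
    for i in idx[np.argsort(z_tgt[idx])]:
        if zbuf[v[i], u[i]] == np.inf:
            zbuf[v[i], u[i]] = z_tgt[i]

    visible = np.zeros(len(z_tgt), dtype=bool)
    visible[idx] = z_tgt[idx] <= zbuf[v[idx], u[idx]] + 1e-6
    return visible
```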
B. Quantitative Evaluation

We perform our experiments on several challenging environments, including low texture, dynamic objects, and strong illumination variations. The total dataset contains more than 164,380K keyframes and 3,230K loop frames. We apply SMOTE up-sampling [35] to balance the loop and non-loop frames. We also incorporate random flips on the patch images. We collect 5 different scenes and use 2 for the training sequences, 2 for the test sequences, and 1 for the validation sequence. Due to the hardware limitation of the camera, we only perform our experiments in an indoor environment. We design the trajectory path by collecting the same scene with two reversed loops so that our S3E-GNN method can generate extra edges as constraints compared with the traditional BoW methods. We train our two S3E backbones, ViT and SparseConv, with the same hyper-parameters: 0.001 learning rate, 10^-5 weight decay rate, and 0.95 momentum; we adopt a cyclic scheduler with a maximum learning rate of 0.1 and the SGD optimizer for 140 epochs. We compare the run-time speed and performance of these two backbones and the effect of the GNN in loop closure detection using VINS-Mono SLAM [36].
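The stated training recipe (SGD with learning rate 0.001, weight decay 1e-5, momentum 0.95, a cyclic schedule peaking at 0.1, 140 epochs) maps onto standard PyTorch utilities roughly as below; the cycle length, the per-batch stepping, and the loss function name are assumptions, since the paper does not specify them.

```python
import torch

def make_optimizer(model):
    # Optimization recipe only; `model` is assumed to be the S3E encoder.
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-3,
                                momentum=0.95, weight_decay=1e-5)
    scheduler = torch.optim.lr_scheduler.CyclicLR(
        optimizer, base_lr=1e-3, max_lr=0.1, mode="triangular")
    return optimizer, scheduler

# Typical loop (140 epochs, scheduler stepped per batch as CyclicLR expects):
# for epoch in range(140):
#     for batch in train_loader:                              # hypothetical loader
#         loss = shrinkage_loss(model(batch), batch.target)   # hypothetical loss fn
#         optimizer.zero_grad(); loss.backward()
#         optimizer.step(); scheduler.step()
```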
Table I demonstrates the performance boost after incorporating the GNN query module. For the GNN, since the network is prone to diverging, we use a lower learning rate of 10^-5; the rest of the hyper-parameters are the same as for the S3E backbones. LC is the loop closure graph optimization backend. The pose estimation error drops drastically with the ViT-powered S3E-GNN, while the run-time speed drops as well. The SparseConv-powered S3E-GNN achieves less improvement yet maintains a higher run-time speed. This is because the SparseConv network only convolves the immediate neighborhood patches, resulting in a lack of global context compared to ViT.

TABLE I: Ablation study of the proposed method vs. the baseline (VINS-Mono); LC means loop closure.

| Architecture | Patch size | LC | GNN+LC | RMSE (σRMSE) | FPS |
|---|---|---|---|---|---|
| Baseline (VINS-Mono) | - | | | 3.73 (0.45%) | 15.3 |
| S3E (ViT) | 16x16 | X | | 3.77 (0.38%) | 13.9 |
| S3E (ViT) | 32x32 | X | | 3.73 (0.53%) | 9.1 |
| S3E (ViT) | 64x64 | X | | 3.37 (0.54%) | 5.7 |
| S3E (ViT) | - | X | X | 3.71 (0.61%) | 14.3 |
| S3E (SparseConv) | 16x16 | X | | 3.89 (0.38%) | 12.1 |
| S3E (SparseConv) | 32x32 | X | | 3.15 (0.41%) | 8.3 |
| S3E (SparseConv) | 64x64 | X | | 2.74 (0.56%) | 3.8 |
| S3E (SparseConv) | - | X | X | 3.13 (0.31%) | 13.7 |

To benchmark against the traditional BoW loop closure detection, we evaluate five state-of-the-art SLAM methods with the ViT-powered S3E-GNN, as shown in Table II. All the methods with S3E-GNN outperform the traditional BoW loop closure methods by a large margin. We evaluate the RMSE (Root Mean Square Error) and σRMSE (standard deviation of the RMSE) of the Absolute Translational Error compared with the ground truth trajectory, averaged over 10 runs to ensure repeatability.

TABLE II: Pose error compared with the ground truth trajectory. The RMSE of the ATE (Absolute Translational Error) is in cm and is averaged over all test datasets (Office02, Office03, Home03).

| Method | IMU | RGBD | S3E/BoW | RMSE | σRMSE | Max RMSE |
|---|---|---|---|---|---|---|
| ORB-SLAM3 [37] | X | | S3E-GNN | 1.98 | 0.82% | 12.33 |
| | | | BoW | 3.15 | 0.8% | 20.32 |
| VINS-Mono [36] | X | | S3E-GNN | 1.33 | 0.31% | 11.71 |
| | | | BoW | 3.19 | 1.0% | 17.80 |
| VINS-RGBD [28] | X | X | S3E-GNN | 1.19 | 0.4% | 11.13 |
| | | | BoW | 3.73 | 0.4% | 25.43 |
| S-MSCKF [38] | X | | S3E-GNN | 2.82 | 8.8% | 13.21 |
| | | | BoW | 6.32 | 9.2% | 32.05 |
| RGBDTAM [39] | | X | S3E-GNN | 3.35 | 0.2% | 9.78 |
| | | | BoW | 9.98 | 0.6% | 11.21 |

To evaluate the generalization of our method, we test the S3E-GNN trained on our office dataset on the Microsoft 7-Scenes dataset [22] directly. As shown in Table III, our work is on par with the state-of-the-art methods, which demonstrates reasonable out-of-dataset generalizability. Since S3E-GNN is trained on our office dataset, we get the best result on the office scene of the 7-Scenes dataset.

Fig. 8: CPU runtime of the different components and the breakdown of the proposed S3E-GNN system.

Runtime evaluation: We run the full system (VINS-Mono + S3E-GNN) on a PC with an i7 10700k CPU and an RTX 2080 GPU. Fig. 8 shows the CPU single-round runtime of the full system running on a testing sequence. The time consumption of the S3E and GNN components is marginal compared to the tracking and loop closure modules. As additional edges are introduced from the GNN query results, graph optimization in the loop closure thread consumes the majority of the computation time.
C. Qualitative Evaluation

In this section, we visualize the predicted similarity scores from the ViT-powered S3E encoding module in comparison with the ground truth reprojection IoU similarity scores on an untrained sequence. To better illustrate the performance of the scene embedding module, we also provide the error map of the predicted similarity map.
TABLE III: Generalization experiment using only the inference module trained on our own office dataset: translation (m) and rotation (°) error comparison on the Microsoft 7-Scenes dataset [22].
| Method | Chess | Fire | Heads | Office | Pumpkin | Kitchen | Stairs | Avg |
|---|---|---|---|---|---|---|---|---|
| PoseNet15 [13] | 0.32, 8.12 | 0.47, 14.4 | 0.29, 12.0 | 0.48, 7.68 | 0.47, 8.42 | 0.59, 8.64 | 0.47, 13.8 | 0.44, 10.4 |
| Hourglass [40] | 0.15, 6.17 | 0.27, 10.84 | 0.19, 11.63 | 0.21, 8.48 | 0.25, 7.01 | 0.27, 10.15 | 0.29, 12.46 | 0.23, 9.53 |
| LSTM-Pose [41] | 0.24, 5.77 | 0.34, 11.9 | 0.21, 13.7 | 0.30, 8.08 | 0.33, 7.00 | 0.37, 8.83 | 0.40, 13.7 | 0.31, 9.85 |
| ANNet [42] | 0.12, 4.30 | 0.27, 11.60 | 0.16, 12.40 | 0.19, 6.80 | 0.21, 5.20 | 0.25, 6.00 | 0.28, 8.40 | 0.21, 7.90 |
| BranchNet [43] | 0.18, 5.17 | 0.34, 8.99 | 0.20, 14.15 | 0.30, 7.05 | 0.27, 5.10 | 0.33, 7.40 | 0.38, 10.26 | 0.29, 8.30 |
| GPoseNet [44] | 0.20, 7.11 | 0.38, 12.3 | 0.21, 13.8 | 0.28, 8.83 | 0.37, 6.94 | 0.35, 8.15 | 0.37, 12.5 | 0.31, 9.95 |
| MLFBPPose [45] | 0.12, 5.82 | 0.26, 11.99 | 0.14, 13.54 | 0.18, 8.24 | 0.21, 7.05 | 0.22, 8.14 | 0.38, 10.26 | 0.22, 9.29 |
| VidLoc [46] | 0.18, NA | 0.26, NA | 0.14, NA | 0.26, NA | 0.36, NA | 0.31, NA | 0.26, NA | 0.25, NA |
| MapNet [47] | 0.08, 3.25 | 0.27, 11.69 | 0.18, 13.25 | 0.17, 5.15 | 0.22, 4.02 | 0.23, 4.93 | 0.30, 12.08 | 0.21, 7.77 |
| LsG [48] | 0.09, 3.28 | 0.26, 10.92 | 0.17, 12.70 | 0.18, 5.45 | 0.20, 3.69 | 0.23, 4.92 | 0.23, 11.3 | 0.19, 7.47 |
| S3E-GNN | 0.21, 6.12 | 0.45, 15.27 | 0.21, 11.7 | 0.19, 5.56 | 0.33, 5.12 | 0.28, 5.63 | 0.32, 13.1 | 0.28, 8.92 |

As shown in Fig. 9, the reprojection IoU heat map describes how similar pairs of images are in a tracked pose graph. The diagonal of the heat map is high because of self-projection. On the other hand, the off-diagonal highlighted parts indicate the reversed loop. Since we have applied the z-buffer method to remove occlusion, the similarity matrix is mostly symmetric. The error map shows that most activated similar areas align with the ground truth heat map (where error exists), whereas the maximum absolute similarity error is less than 0.1. This means the predicted similarity, in spite of numeric errors, has implicitly learned the relative transformation and depth information for each pair of images.
Fig. 9: Evaluation of similarity prediction. (a) Ground truth similarity heat map from the reprojection IoU; (b) predicted similarity heat map from S3E; (c) prediction error heat map.
Moreover, we demonstrate the corresponding 3D scene with the estimated pose trajectory. Fig. 10a shows the 3D point cloud map built from VINS-Mono [36] with S3E-GNN. With the additional loop pairs detected by S3E-GNN, the estimated map is precise: the point cloud depicting the walls is as thin as 2 cm. In particular, the pose comparison in Fig. 10b shows that S3E-GNN improves the pose estimation and optimizes the globally tracked pose graph. The trajectory estimation error of VINS-Mono with S3E-GNN is reduced by 41% compared to the one without S3E-GNN.
Fig. 10: Experiment environment 3D map and trajectory comparison.

V. FUTURE WORK

Our S3E-GNN framework is currently trained on a dataset of indoor scenes and has demonstrated better performance than traditional BoW methods. In the future, we can extend the dataset to outdoor scenes, which will pave the way for applying our method to autonomous driving, drones, etc. Also, thanks to the flexible node addition and deletion strategies of our graph structure, we can use this method to deploy multiple cooperative robots. This application will play a crucial role in quickly collecting information over large areas, building environment models, and gathering situational awareness.

VI. CONCLUSION

In this paper, we introduce S3E-GNN, an end-to-end learning framework to improve the efficiency and robustness of camera relocalization. We propose a sparse spatial scene embedding module to encode the spatial and semantic features of a keyframe into a latent embedding code. By aggregating the embedding codes of all the keyframes into a reference pose graph, we can relocalize camera scenes using a GNN query algorithm. Our experiments demonstrate a notable improvement over the traditional BoW methods by implicitly incorporating the multi-view geometrical information with the additional scene matching constraints. We also show that our S3E-GNN framework can be employed as a plug-and-play module in any state-of-the-art SLAM system.

REFERENCES

[1] J. Leonard and H. Durrant-Whyte, "Simultaneous map building and localization for an autonomous mobile robot," in Proceedings IROS '91: IEEE/RSJ International Workshop on Intelligent Robots and Systems, 1991, pp. 1442-1447 vol. 3.
[2] A. J. Davison, I. D. Reid, N. D. Molton, and O. Stasse, "MonoSLAM: Real-time single camera SLAM," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, no. 6, pp. 1052–1067, 2007.
[3] R. Mur-Artal, J. M. M. Montiel, and J. D. Tardós, "ORB-SLAM: A versatile and accurate monocular SLAM system," IEEE Transactions on Robotics, vol. 31, no. 5, pp. 1147–1163, 2015.
[4] C. Cadena, L. Carlone, H. Carrillo, Y. Latif, D. Scaramuzza, J. Neira, I. Reid, and J. J. Leonard, "Past, present, and future of simultaneous localization and mapping: Toward the robust-perception age," IEEE Transactions on Robotics, vol. 32, no. 6, pp. 1309–1332, 2016.
[5] R. Mur-Artal and J. D. Tardós, "ORB-SLAM2: An open-source SLAM system for monocular, stereo, and RGB-D cameras," IEEE Transactions on Robotics, vol. 33, no. 5, pp. 1255–1262, Oct. 2017. [Online]. Available: http://dx.doi.org/10.1109/TRO.2017.2705103
[6] A. Rosinol, M. Abate, Y. Chang, and L. Carlone, "Kimera: An open-source library for real-time metric-semantic localization and mapping," in 2020 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2020, pp. 1689–1696.
[7] D. G. Lowe, "Distinctive image features from scale-invariant keypoints," International Journal of Computer Vision, vol. 60, pp. 91–110, 2004.
[8] H. Bay, A. Ess, T. Tuytelaars, and L. Van Gool, "Speeded-up robust features (SURF)," Computer Vision and Image Understanding, vol. 110, no. 3, pp. 346–359, 2008.
[9] E. Rublee, V. Rabaud, K. Konolige, and G. Bradski, "ORB: An efficient alternative to SIFT or SURF," in 2011 International Conference on Computer Vision, 2011, pp. 2564–2571.
[10] A. Angeli, D. Filliat, S. Doncieux, and J.-A. Meyer, "Fast and incremental method for loop-closure detection using bags of visual words," IEEE Transactions on Robotics, vol. 24, no. 5, pp. 1027–1037, 2008.
[11] D. Gálvez-López and J. D. Tardós, "Real-time loop detection with bags of binary words," in 2011 IEEE/RSJ International Conference on Intelligent Robots and Systems, 2011, pp. 51–58.
[12] D. Gálvez-López and J. D. Tardós, "Bags of binary words for fast place recognition in image sequences," IEEE Transactions on Robotics, vol. 28, no. 5, pp. 1188–1197, 2012.
[13] A. Kendall, M. Grimes, and R. Cipolla, "PoseNet: A convolutional network for real-time 6-DOF camera relocalization," in Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 2938–2946.
[14] Y. Shavit and R. Ferens, "Introduction to camera pose estimation with deep learning," arXiv preprint arXiv:1907.05272, 2019.
[15] A. Kendall and R. Cipolla, "Geometric loss functions for camera pose regression with deep learning," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 5974–5983.
[16] A. Valada, N. Radwan, and W. Burgard, "Deep auxiliary learning for visual localization and odometry," in 2018 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2018, pp. 6939–6946.
[17] N. Radwan, A. Valada, and W. Burgard, "VLocNet++: Deep multitask learning for semantic visual localization and odometry," IEEE Robotics and Automation Letters, vol. 3, no. 4, pp. 4407–4414, 2018.
[18] S. Kang, "k-nearest neighbor learning with graph neural networks," Mathematics, vol. 9, no. 8, p. 830, 2021.
[19] T. Sattler, Q. Zhou, M. Pollefeys, and L. Leal-Taixe, "Understanding the limitations of CNN-based absolute camera pose regression," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 3302–3312.
[20] F. Xue, X. Wu, S. Cai, and J. Wang, "Learning multi-view camera relocalization with graph neural networks," in 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2020, pp. 11372–11381.
[21] T. Jebara, A. Azarbayejani, and A. Pentland, "3D structure from 2D motion," IEEE Signal Processing Magazine, vol. 16, no. 3, pp. 66–84, 1999.
[22] J. Shotton, B. Glocker, C. Zach, S. Izadi, A. Criminisi, and A. Fitzgibbon, "Scene coordinate regression forests for camera relocalization in RGB-D images," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2013, pp. 2930–2937.
[23] U. Nadeem, M. A. Jalwana, M. Bennamoun, R. Togneri, and F. Sohel, "Direct image to point cloud descriptors matching for 6-DOF camera localization in dense 3D point clouds," in International Conference on Neural Information Processing. Springer, 2019, pp. 222–234.
[24] X. Gao and T. Zhang, "Unsupervised learning to detect loops using deep neural networks for visual SLAM system," Autonomous Robots, vol. 41, no. 1, pp. 1–18, 2017.
[25] S. Cascianelli, G. Costante, E. Bellocchio, P. Valigi, M. L. Fravolini, and T. A. Ciarfuglia, "Robust visual semi-semantic loop closure detection by a covisibility graph and CNN features," Robotics and Autonomous Systems, vol. 92, pp. 53–65, 2017.
[26] H. Yue, J. Miao, Y. Yu, W. Chen, and C. Wen, "Robust loop closure detection based on bag of superpoints and graph verification," in 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2019, pp. 3787–3793.
[27] A. Elmoogy, X. Dong, T. Lu, R. Westendorp, and K. Reddy, "Pose-GNN: Camera pose estimation system using graph neural networks," arXiv preprint arXiv:2103.09435, 2021.
[28] Z. Shan, R. Li, and S. Schwertfeger, "RGBD-inertial trajectory estimation and mapping for ground robots," Sensors, vol. 19, no. 10, p. 2251, 2019.
[29] A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, et al., "An image is worth 16x16 words: Transformers for image recognition at scale," arXiv preprint arXiv:2010.11929, 2020.
[30] C. Choy, J. Gwak, and S. Savarese, "4D spatio-temporal ConvNets: Minkowski convolutional neural networks," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 3075–3084.
[31] X. Lu, C. Ma, B. Ni, X. Yang, I. Reid, and M.-H. Yang, "Deep regression tracking with shrinkage loss," in Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 353–369.
[32] W. L. Hamilton, R. Ying, and J. Leskovec, "Inductive representation learning on large graphs," in Proceedings of the 31st International Conference on Neural Information Processing Systems, 2017, pp. 1025–1035.
[33] G. Grisetti, R. Kümmerle, H. Strasdat, and K. Konolige, "g2o: A general framework for (hyper)graph optimization," in Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Shanghai, China, 2011, pp. 9–13.
[34] N. Greene, M. Kass, and G. Miller, "Hierarchical Z-buffer visibility," in Proceedings of the 20th Annual Conference on Computer Graphics and Interactive Techniques, ser. SIGGRAPH '93. New York, NY, USA: Association for Computing Machinery, 1993, pp. 231–238. [Online]. Available: https://doi.org/10.1145/166117.166147
[35] N. Chawla, K. Bowyer, L. Hall, and W. P. Kegelmeyer, "SMOTE: Synthetic minority over-sampling technique," J. Artif. Intell. Res., vol. 16, pp. 321–357, 2002.
[36] T. Qin, P. Li, and S. Shen, "VINS-Mono: A robust and versatile monocular visual-inertial state estimator," IEEE Transactions on Robotics, vol. 34, no. 4, pp. 1004–1020, 2018.
[37] C. Campos, R. Elvira, J. J. G. Rodríguez, J. M. Montiel, and J. D. Tardós, "ORB-SLAM3: An accurate open-source library for visual, visual-inertial, and multimap SLAM," IEEE Transactions on Robotics, 2021.
[38] P. Geneva, K. Eckenhoff, and G. Huang, "A linear-complexity EKF for visual-inertial navigation with loop closures," in 2019 International Conference on Robotics and Automation (ICRA). IEEE, 2019, pp. 3535–3541.
[39] A. Concha and J. Civera, "RGBDTAM: A cost-effective and accurate RGB-D tracking and mapping system," in 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2017, pp. 6756–6763.
[40] I. Melekhov, J. Ylioinas, J. Kannala, and E. Rahtu, "Image-based localization using hourglass networks," in Proceedings of the IEEE International Conference on Computer Vision Workshops, 2017, pp. 879–886.
[41] F. Walch, C. Hazirbas, L. Leal-Taixe, T. Sattler, S. Hilsenbeck, and D. Cremers, "Image-based localization using LSTMs for structured feature correlation," in Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 627–637.
[42] M. Bui, C. Baur, N. Navab, S. Ilic, and S. Albarqouni, "Adversarial networks for camera pose regression and refinement," in Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops, 2019, pp. 0–0.
[43] J. Wu, L. Ma, and X. Hu, "Delving deeper into convolutional neural networks for camera relocalization," in 2017 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2017, pp. 5644–5651.
[44] M. Cai, C. Shen, and I. Reid, "A hybrid probabilistic model for camera relocalization," 2019.
[45] X. Wang, X. Wang, C. Wang, X. Bai, J. Wu, and E. R. Hancock, "Discriminative features matter: Multi-layer bilinear pooling for camera localization," in British Machine Vision Conference, York, 2019.
[46] R. Clark, S. Wang, A. Markham, N. Trigoni, and H. Wen, "VidLoc: A deep spatio-temporal model for 6-DOF video-clip relocalization," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 6856–6864.
[47] H. Gao and S. Ji, "Graph U-Nets," in International Conference on Machine Learning. PMLR, 2019, pp. 2083–2092.
[48] F. Xue, X. Wang, Z. Yan, Q. Wang, J. Wang, and H. Zha, "Local supports global: Deep camera relocalization with sequence enhancement," in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019, pp. 2841–2850.
Attention-based Representations in Deep Reinforcement Learning for Autonomous Driving

I. APPENDIX A: EXPERIMENT DETAILS

A. Proposed Model Details
Additional details of our proposed model, which includes the attention pipeline and the reinforcement learning pipeline, and their corresponding training schemes are outlined below.

Attention Pipeline. The attention pipeline is used to construct an informative attention map from a single color image in four main steps: semantic segmentation via a U-Net segmentation network [?], boundary extraction, boundary diffusion [?], and inverse depth fusion (from ground-truth depth maps).
Of these components, the U-Net segmentation network is the only learning-based module and is pre-trained in the simulator with the following scheme: 0.001 learning rate, 0.001 weight decay, 2e-5 decay rate, step scheduler, and cross-entropy loss function. Note that no gradient updates are made to the segmentation network during the reinforcement learning process, rendering the attention pipeline a fixed state representation module. However, we enable the RL agent to augment the state representations (i.e., attention maps) according to the task through attention augmentation as described in the paper. The unsupervised variant of the attention pipeline is shown in Fig. 1; it relies on a classical edge detection algorithm for locating object boundaries rather than segmentation. Note that this pipeline contains no learning-based components and uses a fixed Laplacian of Gaussian (LoG) kernel (3x3) to produce the boundary map. Although the LoG edge detection algorithm is more susceptible to environmental noise (e.g., leaves, snow), our experimental results indicate that the unsupervised pipeline still strongly improves upon the end-to-end RL baselines and is comparable to our full model.

Reinforcement Learning Pipeline. Our proposed reinforcement learning model follows the DQN [?] setup and features several key extensions, the most prominent of them being the integration with a fixed state attention module, but also a unique configuration of neural networks that promotes fast, stable, and robust self-driving behaviour learning. Network architecture specifics and hyperparameters are presented in Table II.
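For readers who want a concrete picture of the unsupervised variant, the following minimal Python sketch follows the four-step description above (boundary extraction with a small LoG-style kernel, boundary diffusion, inverse depth fusion). It is not the authors' code: the function name, the use of a Gaussian blur as the diffusion step, and the element-wise product fusion are assumptions made for illustration.

```python
# Minimal sketch of an unsupervised attention-map pipeline (assumptions noted above).
import cv2
import numpy as np

def attention_map(gray, depth, diffusion_ksize=15):
    # Boundary extraction: Gaussian smoothing followed by a 3x3 Laplacian
    # approximates a small Laplacian-of-Gaussian (LoG) boundary detector.
    smoothed = cv2.GaussianBlur(gray.astype(np.float32), (3, 3), 0)
    boundary = np.abs(cv2.Laplacian(smoothed, cv2.CV_32F, ksize=3))
    boundary /= boundary.max() + 1e-8

    # Boundary diffusion: spread the boundary response to nearby pixels
    # (a plain Gaussian blur stands in for the diffusion operator here).
    diffused = cv2.GaussianBlur(boundary, (diffusion_ksize, diffusion_ksize), 0)
    diffused /= diffused.max() + 1e-8

    # Inverse depth fusion: emphasize boundaries that are close to the camera.
    inv_depth = 1.0 / (depth.astype(np.float32) + 1e-3)
    inv_depth /= inv_depth.max() + 1e-8
    return diffused * inv_depth
```

A grayscale image and a ground-truth depth map of the same resolution are the only inputs, which matches the claim that this variant contains no learning-based components.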
B. Baseline Model Details
Since our model is based on the DQN framework, we benchmark its performance against the traditional DQN [?] framework and DDPG [?], which features actor and critic networks. Both of these models learn directly from color images (end-to-end) and are updated by the same loss function and sparse rewards as our proposed method for consistency; see Table III for architecture and training details. We extend these two baselines by integrating them with the attention pipeline and employing attention augmentation with the same unsupervised encoder/decoder network; thus, the comparison highlights the effect of our multi-scale latent encoding and LSTM Q-network in our method. We observe a significant improvement to the lower-bound convergence of our baselines after incorporating attention augmentation. Also, a substantial increase in data efficiency and reduced variance is observed when training our method. The results of all experiments are averaged over 7 runs.

TABLE I: Unsupervised encoder-decoder model architecture.
Layer (type) | Output Shape | Param #
Conv2d-1 | [-1, 64, 256, 256] | 1,792
Conv2d-4 | [-1, 64, 256, 256] | 36,928
Conv2d-10 | [-1, 128, 128, 128] | 73,856
Conv2d-13 | [-1, 128, 128, 128] | 147,584
Conv2d-19 | [-1, 256, 64, 64] | 295,168
Conv2d-22 | [-1, 256, 64, 64] | 590,080
Conv2d-28 | [-1, 512, 32, 32] | 1,180,160
Conv2d-31 | [-1, 512, 32, 32] | 2,359,808
Conv2d-37 | [-1, 512, 16, 16] | 2,359,808
Conv2d-40 | [-1, 512, 16, 16] | 2,359,808
ConvTranspose2d-45 | [-1, 512, 32, 32] | 1,049,088
Conv2d-46 | [-1, 256, 32, 32] | 2,359,552
Conv2d-49 | [-1, 256, 32, 32] | 590,080
ConvTranspose2d-54 | [-1, 256, 64, 64] | 262,400
Conv2d-55 | [-1, 128, 64, 64] | 589,952
Conv2d-58 | [-1, 128, 64, 64] | 147,584
ConvTranspose2d-63 | [-1, 128, 128, 128] | 65,664
Conv2d-64 | [-1, 64, 128, 128] | 147,520
Conv2d-67 | [-1, 64, 128, 128] | 36,928
ConvTranspose2d-72 | [-1, 64, 256, 256] | 16,448
Conv2d-73 | [-1, 64, 256, 256] | 73,792
Conv2d-76 | [-1, 64, 256, 256] | 36,928
Conv2d-81 | [-1, 1, 256, 256] | 65
Total params: 14,788,929; Trainable params: 14,788,929

TABLE II: LSTM DQN architecture and hyper-parameters.
Conv (input shape 84x84): conv1(1,32,8,1,1), conv2(32,64,5,1,1), conv3(64,64,3,1,1), output [64,4,4]
Conv (input shape 32x32): conv1(1,32,5,1,1), conv2(32,64,3,1,1), conv3(64,64,1,1,1), output [64,4,4]
Conv (input shape 16x16): conv1(1,32,3,1,1), conv2(32,64,1,1,1), conv3(64,64,1,1,1), output [64,4,4]
LSTM: input [32,192,16] (window size, concatenated features, feature vector), LSTM(16,128), output [32,192,128]
Linear: Linear(24576,512), Linear(512,2), output [32,2]
Learning rate: 0.0001; play interval: 900; target update interval: 1000
Replay memory: 50000; epsilon start: 1; epsilon end: 0.01; epsilon decay: 100000
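To make the structure of Table II easier to follow, here is a heavily simplified PyTorch sketch of a multi-scale conv + LSTM Q-network in the same spirit: one convolutional branch per input scale, features concatenated over a temporal window, an LSTM, and a linear head over two actions. The class name, branch kernels, pooled feature size, and LSTM width are illustrative stand-ins, not the published configuration.

```python
# Illustrative multi-scale conv + LSTM Q-network (not the exact Table II layers).
import torch
import torch.nn as nn

class MultiScaleLSTMQNet(nn.Module):
    def __init__(self, n_actions=2, hidden=128):
        super().__init__()
        # One small conv branch per input scale (84x84, 32x32, 16x16).
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(1, 32, k, stride=1, padding=1), nn.ReLU(),
                nn.Conv2d(32, 64, 3, stride=1, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(4),          # every branch ends at [64, 4, 4]
            )
            for k in (8, 5, 3)
        ])
        self.lstm = nn.LSTM(input_size=3 * 64 * 4 * 4, hidden_size=hidden,
                            batch_first=True)
        self.head = nn.Linear(hidden, n_actions)

    def forward(self, scales):
        # scales: list of three tensors, each [B, T, 1, H_i, W_i].
        feats = []
        for branch, x in zip(self.branches, scales):
            b, t = x.shape[:2]
            f = branch(x.flatten(0, 1)).flatten(1)  # [B*T, 1024]
            feats.append(f.view(b, t, -1))
        seq = torch.cat(feats, dim=-1)               # [B, T, 3072]
        out, _ = self.lstm(seq)
        return self.head(out[:, -1])                 # Q-values for the last step

# Example call with a window of 8 frames at the three scales:
q = MultiScaleLSTMQNet()([torch.randn(4, 8, 1, s, s) for s in (84, 32, 16)])
```

The key design point this sketch tries to convey is that per-scale spatial features are fused before the temporal model, so the LSTM reasons over one concatenated feature vector per time step.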
Fig. 1. Unsupervised Attention Pipeline (stages: RGB image and depth image, LoG edge detection, boundary map, boundary diffusion, diffused boundary map, inverse depth fusion, attention map). Constructs an attention map from RGB image and depth map inputs using a Laplacian of Gaussian edge detector as opposed to semantic segmentation. Note that the boundaries are slightly noisier due to artifacts on the road, which generates attention in those regions.

TABLE III: DQN and DDPG model architecture and hyper-parameters.
DQN: Layer (type) | Output Shape | Param #
Conv2d(4,32,8,4,0) | [-1,32,20,20] | 8,224
Conv2d(32,64,4,2,0) | [-1,64,9,9] | 32,832
Conv2d(64,64,3,1,0) | [-1,64,7,7] | 36,928
Linear(3236,512) | [-1,512] | 1,606,144
Linear(512,2) | [-1,2] | 1,026
Total params: 1,685,154; Trainable params: 1,685,154
Hyper-parameters: please refer to Table II
Reward: 0.915

DDPG: Layer (type) | Output Shape | Param #
Conv2d(4,32,8,4,0) | [-1,32,40,40] | 3,232
Conv2d(32,64,4,2,0) | [-1,64,38,38] | 18,496
Conv2d(64,64,3,1,0) | [-1,64,38,38] | 4,160
Linear(92416+1,512) | [-1,512] | 47,318,016
Linear(512,2) | [-1,2] | 1,026
Total params: 47,344,930; Trainable params: 47,344,930
Hyper-parameters: buffer size: 100000, batch size: 32, γ: 0.99, τ: 0.001, LRA: 0.0001, LRC: 0.001, explore: 1,000,000
Reward: 1.357

τ: target network hyper-parameter; LRA: learning rate for actor network; LRC: learning rate for critic network.
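For concreteness, the DQN column of Table III can be realized with the short PyTorch sketch below. This is our reading of the table, not released code: the flattened feature size 64*7*7 = 3136 is inferred from the listed parameter count (3136*512 + 512 = 1,606,144, which suggests the table's "3236" is a transcription slip), and the ReLU activations are assumed.

```python
# Sketch of the DQN baseline Q-network implied by the DQN column of Table III.
import torch
import torch.nn as nn

dqn_baseline = nn.Sequential(
    nn.Conv2d(4, 32, kernel_size=8, stride=4),   # -> [-1, 32, 20, 20], 8,224 params
    nn.ReLU(),
    nn.Conv2d(32, 64, kernel_size=4, stride=2),  # -> [-1, 64, 9, 9], 32,832 params
    nn.ReLU(),
    nn.Conv2d(64, 64, kernel_size=3, stride=1),  # -> [-1, 64, 7, 7], 36,928 params
    nn.ReLU(),
    nn.Flatten(),
    nn.Linear(64 * 7 * 7, 512),                  # 1,606,144 params
    nn.ReLU(),
    nn.Linear(512, 2),                           # 1,026 params, one Q-value per action
)

q_values = dqn_baseline(torch.zeros(1, 4, 84, 84))  # shape [1, 2]
```

The layer-wise parameter counts of this sketch sum to 1,685,154, matching the total reported in the table.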

Translations are generated with the Fugu-Machine Translator.