ABSTRACT
Semantic segmentation is applied extensively in autonomous driving and
intelligent transportation with methods that highly demand spatial and semantic
information. Here, an STDC-MA network is proposed to meet these demands. First,
the STDC-Seg structure is employed in STDC-MA to ensure a lightweight and
efficient structure. Subsequently, the feature alignment module (FAM) is
applied to learn the offset between high-level and low-level features,
solving the problem of pixel offset related to upsampling on the high-level
feature map. Our approach implements the effective fusion between high-level
features and low-level features. A hierarchical multiscale attention mechanism
is adopted to reveal the relationship among attention regions from two
different input sizes of one image. Through this relationship, regions
receiving much attention are integrated into the segmentation results, thereby
reducing the unfocused regions of the input image and improving the effective
utilization of multiscale features. STDC-MA maintains the segmentation speed of the STDC-Seg network while improving the segmentation accuracy of small objects. STDC-MA was evaluated on the Cityscapes validation set. The segmentation result of STDC-MA attained 76.81% mIOU with a 0.5x scale input, 3.61% higher than STDC-Seg.
Xiaochun Lei, Linjun Lu, Zetao Jiang∗, Chang Lu, Zhaoting Gong, Jiaming Liang
School of Computer Science and Information Security
Guilin University of Electronic Technology
GuiLin 541010, China
lxc8125@guet.edu.cn, linjunlu@zerorains.top, zetaojiang@guet.edu.cn, Changlu@keter.top, gavin@gong.host, me@puqing.work
1 Introduction
Semantic segmentation is a classic computer vision task adopted widely in autonomous driving, video surveillance, robot perception, etc.
1) Cropping or adjusting the size of the input image to reduce the computational cost of the image segmentation.
However, this approach results in the loss of spatial information [1, 2].
2) Increasing the speed of model inference by reducing the number of channels for semantic segmentation, which in turn reduces the spatial capacity of the model [3–5].
3) In pursuit of a compact framework, part of the downsampling layers may be abandoned, which reduces the receptive field of the model and makes it insufficient to cover large objects.
Notably, this approach may be associated with poor discrimination ability [5].
Researchers developed a U-shape network structure to compensate for the loss of spatial details, which gradually improves spatial information [3, 5–7].
BiSeNet [8] employs a two-stream structure to replace the U-shaped structure and encodes spatial features and semantic information separately to produce excellent segmentation effects.
However, the independent semantic encoding branch of BiSeNet introduces time-consuming computation.
Furthermore, pre-trained models from other tasks (including image classification) in the semantic branch of BiSeNet are inefficient for semantic segmentation tasks.
STDC-Seg [9] eliminates feature redundancy on the branches and utilizes edge detail information from the ground truth to guide spatial feature learning.
The STDC-Seg network has achieved satisfactory results in accuracy and speed; however, it does not consider the effect of different scale images on the network.
The segmentation accuracy of small objects is low in small-scale images but excellent in large-scale images.
On the other hand, the segmentation effect of large objects (especially background) is poor in large-scale images but can be distinguished well in small-scale images.
Therefore, we integrate the hierarchical multiscale attention mechanism into the STDC-Seg network to allow the model to learn the relationship of regions between different scales through attention.
Simultaneously, STDC-Seg does not consider the problem of feature alignment during feature aggregation in the ARM module.
A direct correspondence between the pixels of the local feature map and the upsampled feature map causes context inconsistency, further decreasing the classification accuracy of the prediction.
This approach improves the effective use of multiscale features and solves the problem of rough segmentation in some regions that arises when a single-scale image is used.
At the same time, we employed the feature alignment module (FAM) and feature selection module (FSM) described previously [11] to replace the original ARM module.
The backbone network mainly extracts the main features in the image, and its structure significantly impacts the performance of the segmentation network.
MobileNetV1 [14] uses depthwise separable convolution to reduce FLOPs (floating-point operations, used to measure the complexity of algorithms/models) in the inference stage.
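As a rough sense of the saving (a generic back-of-the-envelope calculation, not a figure reported in [14]), the per-pixel multiply–accumulate count of a 3×3 layer with 256 input and 256 output channels compares as follows:

```python
# Per-pixel multiply-accumulates for one 3x3 layer, 256 -> 256 channels
k, c_in, c_out = 3, 256, 256
standard = k * k * c_in * c_out                      # 589,824
depthwise_separable = k * k * c_in + c_in * c_out    # 2,304 + 65,536 = 67,840
print(standard / depthwise_separable)                # ~8.7x fewer operations
```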
GhostNet [17] adopts a few primitive convolution operations plus a series of simple linear changes to generate more features to reduce the overall parameters and calculations.
The two strategies to ensure segmentation accuracy and speed in real-time semantic segmentation include
1) Lightweight backbone network.
LRNet [18] adopts factorized convolution block (FCB) to establish long-distance relationships and implement a lightweight and efficient feature extraction network.
In [20], the high-level features of the encoder integrate all channel maps through dense channel relationships learned by a channel correlation coefficient attention module to refine the output mask.
Hierarchical multiscale attention mechanism [10] realizes dense feature aggregation between any two scales by learning the attention relationship between images of different scales.
We integrated the hierarchical multiscale attention mechanism based on the STDC-Seg work to solve the impact of different scales on the segmentation work.
In the SegNet [3] network, the encoder stores the position information of the maximum pooling and employs the index of the maximum pooling in the decoder for upsampling.
RoI Align [25] avoids quantization, and the value of each RoI is calculated by bilinear interpolation, solving the problem of feature misalignment associated with quantization in RoI Pooling.
3 Proposed Methods
3.1 Short-Term Dense Concatenate with Multiscale Attention and Alignment Network
Our work applies the feature alignment module [11] and the hierarchical multiscale attention mechanism [10] to the STDC-Seg network and designs a short-term dense concatenate with multiscale attention and alignment (STDC-MA) network.
The Feature Alignment Module learns the offset between high-level and low-level features and introduces a feature selection module to generate low-level feature maps with rich spatial information.
This method combines the offset with enhanced low-level features.
It solves the problem of pixel offset during the fusion of high-level and low-level features, fully utilizing the high-level and low-level image features.
The hierarchical multiscale attention mechanism learns the relationship of attention regions from two different input sizes of one image to compound the attention from different receptive fields.
Investigation of hierarchical multiscale attention demonstrated that the output masks for different scale inputs differ even if the input is derived from the same image [10].
As such, hierarchical multiscale attention proposes to learn the relationship between the attention regions at different scales of one image to integrate the attention regions in different receptive fields.
Although a larger receptive field was obtained in these designs, different areas of interest corresponding to different scales were not recognized clearly.
Figure 2: Multiscale fusion of any two scales of inputs.
The structure of the Spatial Attention module is shown in Fig 1.
Figure 3: The structure of the STDC-Align network.
FAM denotes the Feature Alignment Module.
FFM denotes the Feature Fusion Module in STDC-Seg [9].
single feature maps.
Hierarchical multiscale attention learns the relationship between any two input scales, effectively reducing the consumption of excessive attention mechanism calculations.
Let S = {S1, S2, ..., SN} denote the collection of images at N different scales.
Si (1 ≤ i ≤ N) denotes the i-th scale of the image, and the scale of Si is smaller than that of Si+1.
The fusion of hierarchical multiscale attention modules involves a series of fusions between any higher-level feature map and the corresponding lower-level feature map (Fig 2).
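To make this pairwise fusion concrete, the sketch below (an illustrative PyTorch approximation; the function and tensor names are assumptions, not the released implementation) upsamples the lower-scale logits and their attention map to the higher-scale resolution and combines them as a convex weighting:

```python
import torch.nn.functional as F

def fuse_two_scales(logits_low, attn_low, logits_high):
    """Fuse predictions from a lower-scale and a higher-scale pass of the same image.

    logits_low  : semantic logits from the smaller input scale, shape (N, C, h, w)
    attn_low    : single-channel attention map predicted at that scale, shape (N, 1, h, w)
    logits_high : semantic logits from the larger input scale, shape (N, C, H, W)
    """
    size = logits_high.shape[-2:]
    logits_low = F.interpolate(logits_low, size=size, mode="bilinear", align_corners=False)
    attn_low = F.interpolate(attn_low, size=size, mode="bilinear", align_corners=False)
    # Regions the attention focuses on keep the low-scale prediction;
    # the remainder falls back to the high-scale prediction.
    return attn_low * logits_low + (1.0 - attn_low) * logits_high
```

Chaining this operation pairwise over the scale set S realizes the hierarchical fusion illustrated in Fig 2.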
The hierarchical multiscale attention is integrated into the STDC-Align network to determine the feature relationship between different scales, guiding the extraction of different regions of interest to refine the segmentation mask.
The ARM module of STDC-Seg is a feature aggregation module that does not consider the problem of pixel offset during feature aggregation between different feature maps; this problem is solved by a practical feature alignment module.
In SegNet [3], the encoder employs the position of maximum pooling to enhance upsampling.
Of note, the problem of pixel shift is solved but part of the feature information is lost in the image after max pooling, which cannot be compensated for by upsampling.
In Feature Alignment Module (FAM) [11], the feature selection module (FSM) is applied to enhance the rich spatial information of low-level feature maps, ensuring that the final alignment result is as close to the ground truth as possible.
Figure 4: The structure of the feature selection module.
The upper branch denotes channel attention.
Mul denotes multiplication.
3.4 Feature Alignment and Feature Selection Module
3.4.1 Feature Selection Module
The feature selection module (FSM) utilizes channel attention (corresponding to the upper branch of Fig 4) to enhance the spatial information in the low-level features.
In Eq. (3), Pselected denotes the feature map after feature selection; Plow denotes the low-level feature map; φ(·) denotes the feature selection process corresponding to the FSM, which successively selects the features of the current feature map; Conv denotes a 1×1 convolution; σ(·) denotes the sigmoid function; and Wselection denotes the learnable parameters.
In the implementation, the learned parameters Wselection and Plow are constructed into channel attention to realize the selection function of the feature selection module.
The structure of the feature selection module is outlined in Fig 4.
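A minimal PyTorch sketch of this selection step is given below. The module layout follows the Avg Pool → 1×1 Conv → Sigmoid branch of Fig 4; the channel widths, the plain multiplication, and the final 1×1 projection are assumptions and may differ from the exact arrangement of Eq. (3):

```python
import torch.nn as nn

class FeatureSelectionModule(nn.Module):
    """Channel attention over the low-level feature map (upper branch of Fig 4)."""

    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.attention = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                 # global average pooling
            nn.Conv2d(in_channels, in_channels, 1),  # 1x1 conv producing channel scores
            nn.Sigmoid(),                            # sigma(.), per-channel weights in (0, 1)
        )
        self.project = nn.Conv2d(in_channels, out_channels, 1)  # final 1x1 conv

    def forward(self, p_low):
        weights = self.attention(p_low)   # (N, C, 1, 1) channel importance
        selected = p_low * weights        # emphasize informative channels of Plow
        return self.project(selected)     # Pselected
```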
3.4.2 Feature Alignment Module
Feature alignment module (FAM) employs deformable convolution (DCN) [27] to learn the offset between the high-level feature map and the FSM-derived feature map.
Paligned = f([Conv([Pselected, Phigh]), Phigh]) + Pselected    (5)
where Paligned denotes the aligned feature map; f(·) denotes the deformable convolution (corresponding to DCN in Fig. 5); Conv denotes a 1×1 convolution; and [·,·] denotes the channel-wise concatenation of two feature maps.
In implementing the feature alignment module, the high-level feature map is upsampled to the same size as the feature map selected by the feature selection module before concatenating.
At the same time, the deformable convolution is applied to the concatenated result to achieve the effect of feature alignment.
Lastly, the selected feature map and the aligned feature map are added pixel-wise.
The structure of the feature alignment module is shown in Fig 5.
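The sketch below shows one way to realize Eq. (5), assuming PyTorch and torchvision's DeformConv2d; the offset-channel count of 18 corresponds to a single-group 3×3 deformable kernel, and the channel widths are assumptions rather than the exact configuration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.ops import DeformConv2d

class FeatureAlignmentModule(nn.Module):
    """Align the upsampled high-level map to the selected low-level map."""

    def __init__(self, channels):
        super().__init__()
        # Offsets predicted from the concatenation [Pselected, Phigh]: Conv([.,.])
        self.offset_conv = nn.Conv2d(2 * channels, 18, kernel_size=1)
        # f(.): deformable convolution that resamples Phigh with the learned offsets
        self.dcn = DeformConv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, p_selected, p_high):
        p_high = F.interpolate(p_high, size=p_selected.shape[-2:],
                               mode="bilinear", align_corners=False)
        offset = self.offset_conv(torch.cat([p_selected, p_high], dim=1))
        aligned = self.dcn(p_high, offset)   # f([Conv([Pselected, Phigh]), Phigh])
        return aligned + p_selected          # Paligned
```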
Figure 5: Structure of the feature alignment module.
DCN denotes the deformable convolution.
4 Experimental Results and Discussion
4.1 Dataset
The presently established method is implemented using the Cityscapes [12] dataset, a widely used semantic scene analysis dataset containing scenes from different cities captured from the perspective of a vehicle-mounted camera.
The dataset comprises 30 classes of labels, 19 of which are utilized for semantic segmentation tasks; the images have a high resolution of 2048x1024.
The Cityscapes dataset is commonly used to pre-train vision models for autonomous driving and therefore poses a challenge for semantic segmentation tasks.
A total of 60,000 training iterations were run with a batch size of 8, and mIOU was applied for validation, measuring the degree of overlap between the segmentation results and the ground truth.
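For reference, mIOU can be computed from a per-class confusion matrix as in the generic sketch below (not the exact evaluation script used here; the 19 Cityscapes training classes are assumed):

```python
import numpy as np

def mean_iou(conf_matrix):
    """conf_matrix[i, j] counts pixels with ground-truth class i predicted as class j."""
    tp = np.diag(conf_matrix).astype(np.float64)
    fp = conf_matrix.sum(axis=0) - tp
    fn = conf_matrix.sum(axis=1) - tp
    iou = tp / np.maximum(tp + fp + fn, 1)   # per-class IoU = TP / (TP + FP + FN)
    present = (tp + fn) > 0                  # ignore classes absent from the ground truth
    return iou[present].mean()

# e.g. accumulate a 19x19 matrix over the Cityscapes validation set, then call mean_iou(m)
```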
In this section, the effectiveness of each part of the STDC-MA network is verified step by step.
In the following experiments, we make improvements based on the STDC-Seg [9] network and evaluate the models on the Cityscapes [12] validation set.
Notably, because the ARM module does not account for feature alignment, it was substituted with the feature alignment module (FAM) [11], yielding our STDC-Align network.
4.3.2 Ablation for Hierarchical Multi-scale Attention
Here, the hierarchical multiscale attention mechanism [10] is utilized in the STDC-Seg network [9], with the view that this method can identify different parts of interest between different scales and achieve complementary advantages.
Subsequently, the results of different scale combinations (the scale can be chosen in [0.25x, 0.5x, 1.0x, 1.5x, 2.0x]) are tested on the Cityscapes [12] validation data set.
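Assuming the pairwise fusion sketched earlier (fuse_two_scales) and a model that returns per-scale logits together with an attention map, one scale combination can be evaluated roughly as follows; the helper below and the model interface are hypothetical:

```python
import torch.nn.functional as F

def multi_scale_inference(model, image, scales=(0.25, 0.5, 1.0)):
    """Chain pairwise attention fusion from the smallest to the largest scale."""
    fused_logits, fused_attn = None, None
    for s in sorted(scales):
        scaled = F.interpolate(image, scale_factor=s, mode="bilinear", align_corners=False)
        logits_s, attn_s = model(scaled)      # per-scale prediction and attention (assumed API)
        if fused_logits is None:
            fused_logits, fused_attn = logits_s, attn_s
        else:
            # the lower-scale attention weights the already-fused lower-scale result
            fused_logits = fuse_two_scales(fused_logits, fused_attn, logits_s)
            fused_attn = attn_s
    return fused_logits
```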
In the second and third rows, the STDC-Seg network mistakenly predicted the railings.
In the fourth and fifth rows, our STDC-MA network demonstrates a smoother result in predicting the pedestrian, similar to the ground truth, and better than the STDC-Seg network.
5 Conclusions
The STDC-MA network, integrating the hierarchical multiscale attention mechanism [10] and the feature selection module [11], is proposed for the semantic segmentation task.
The hierarchical multiscale attention mechanism is employed to learn the relationship of attention regions from two different input sizes of one image.
Through this relationship, the different regions that the attention is focused on are integrated into the segmentation results.
The present method compensates for the shortcoming of the STDC-Seg network on the multiscale attention problem and improves the accuracy of small-object segmentation.
References
[9] Rethinking BiSeNet for real-time semantic segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 9716–9725, June 2021.
[10] Andrew Tao, Karan Sapra, and Bryan Catanzaro. Hierarchical multi-scale attention for semantic segmentation. ArXiv, abs/2005.10821, 2020.
[11] Shihua Huang, Zhichao Lu, Ran Cheng, and Cheng He. FaPN: Feature-aligned pyramid network for dense image prediction. ArXiv, abs/2108.07058, 2021.
[12] Marius Cordts, Mohamed Omran, Sebastian Ramos, Timo Rehfeld, Markus Enzweiler, Rodrigo Benenson, Uwe Franke, Stefan Roth, and Bernt Schiele. The Cityscapes dataset for semantic urban scene understanding. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2016.
[13] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2016.
[14] Andrew G. Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, and Hartwig Adam. MobileNets: Efficient convolutional neural networks for mobile vision applications.
[15] Forrest N. Iandola, Matthew W. Moskewicz, Khalid Ashraf, Song Han, William J. Dally, and Kurt Keutzer. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size. ArXiv, abs/1602.07360, 2016.
[16] Xiangyu Zhang, Xinyu Zhou, Mengxiao Lin, and Jian Sun. ShuffleNet: An extremely efficient convolutional neural network for mobile devices. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2018.
[17] Kai Han, Yunhe Wang, Qi Tian, Jianyuan Guo, Chunjing Xu, and Chang Xu. GhostNet: More features from cheap operations. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2020.
[18] In 2020 IEEE International Conference on Multimedia Expo Workshops (ICMEW), pages 1–6, 2020.
[19] Hanchao Li, Pengfei Xiong, Haoqiang Fan, and Jian Sun. DFANet: Deep feature aggregation for real-time semantic segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2019.
[20] Dongli Wang, Nanjun Li, Yan Zhou, and Jinzhen Mu. Bilateral attention network for semantic segmentation.
pyramid. IET Image Processing, 15(13):3142–3152, 2021.
[25] Kaiming He, Georgia Gkioxari, Piotr Dollar, and Ross Girshick. Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Oct 2017.
[26] Xintao Wang, Kelvin C.K. Chan, Ke Yu, Chao Dong, and Chen Change Loy. EDVR: Video restoration with enhanced deformable convolutional networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, June 2019.
[27] Jifeng Dai, Haozhi Qi, Yuwen Xiong, Yi Li, Guodong Zhang, Han Hu, and Yichen Wei.