Fugu-MT: arxivの論文翻訳

このサイトではarxivの論文のうち、30ページ以下でCreative Commonsライセンス（CC 0, CC BY, CC BY-SA）の論文を日本語訳しています。本文がCCでない論文、長すぎる論文はメタデータのみを翻訳しています。（arxivのメタデータは CC 0です。）翻訳文のライセンスはCC BY-SA 4.0です。翻訳にはFugu-Machine Translatorを利用しています。

本サイトの運営者は本サイト（すべての情報・翻訳含む）の品質を保証せず、本サイト（すべての情報・翻訳含む）を使用して発生したあらゆる結果について一切の責任を負いません。

公開日が20191226となっている論文です。

Title	Authors	Abstract	論文公表日・翻訳日
# 雑音耐性学習のための特徴注意グラフ畳み込みネットワーク Feature-Attention Graph Convolutional Networks for Noise Resilient Learning ( http://arxiv.org/abs/1912.11755v1 ) ライセンス: Link先を確認	Min Shi, Yufei Tang, Xingquan Zhu and Jianxun Liu	(参考訳) ノイズと不整合は、人間のプライバシーやユーザーのプライバシーに固有のエラーが発生するため、現実世界の情報ネットワークに一般的に存在する。これまで、ノードの内容とトポロジ構造を統合することで、最新のGraph Convolutional Networks(GCN)や注目GCNなど、機能学習をネットワークから進めるための大きな努力が続けられてきた。しかし、既存の手法はすべてネットワークをエラーのないソースとみなし、各ノードの機能内容は独立であり、ノード関係のモデル化に等しく重要であるとして扱う。誤ったノードコンテンツとスパース機能を組み合わせることで、実世界のノイズの多いネットワークで使用される既存のメソッドに不可欠な課題を提供する。本稿では,ノイズの多いノード内容のネットワークを扱うための特徴注意グラフ畳み込み学習フレームワークであるFA-GCNを提案する。各ノードのノイズやスパースコンテンツに対処するため、fa-gcnはまずlong short-term memory (lstm) ネットワークを使用して、各特徴の密表現を学ぶ。隣接ノード間の相互作用をモデル化するために、隣接ノードが接続に関して特徴の重要性を学習し、変化させることができる機能アテンション機構が導入された。スペクトルベースのグラフ畳み込み集約プロセスを用いることで、各ノードは、対応する学習課題に対応する最も決定的な近傍特徴に集中することができる。実験と検証、すなわち異なるノイズレベルは、FA-GCNがノイズのないネットワークとノイズのないネットワークの両方で最先端の手法よりも優れた性能を発揮することを示した。 Noise and inconsistency commonly exist in real-world information networks, due to inherent error-prone nature of human or user privacy concerns. To date, tremendous efforts have been made to advance feature learning from networks, including the most recent Graph Convolutional Networks (GCN) or attention GCN, by integrating node content and topology structures. However, all existing methods consider networks as error-free sources and treat feature content in each node as independent and equally important to model node relations. The erroneous node content, combined with sparse features, provide essential challenges for existing methods to be used on real-world noisy networks. In this paper, we propose FA-GCN, a feature-attention graph convolution learning framework, to handle networks with noisy and sparse node content. To tackle noise and sparse content in each node, FA-GCN first employs a long short-term memory (LSTM) network to learn dense representation for each feature. To model interactions between neighboring nodes, a feature-attention mechanism is introduced to allow neighboring nodes learn and vary feature importance, with respect to their connections. By using spectral-based graph convolution aggregation process, each node is allowed to concentrate more on the most determining neighborhood features aligned with the corresponding learning task. Experiments and validations, w.r.t. different noise levels, demonstrate that FA-GCN achieves better performance than state-of-the-art methods on both noise-free and noisy networks.	翻訳日:2023-06-10 08:39:01 公開日:2019-12-26
# 相関コスト付き多視点ステレオの逆深さ回帰学習 Learning Inverse Depth Regression for Multi-View Stereo with Correlation Cost Volume ( http://arxiv.org/abs/1912.11746v1 ) ライセンス: Link先を確認	Qingshan Xu and Wenbing Tao	(参考訳) 深層学習は多視点ステレオ(MVS)の深部推論に有効であることが示されている。しかし、この領域ではスケーラビリティと正確性は依然として未解決の問題である。これはメモリ消費コストのボリューム表現と不適切な深さ推論に起因する。ステレオマッチングにおけるグループワイド相関に着想を得て,軽量なコストボリュームを構築するための平均グループワイド相関類似度尺度を提案する。これにより、メモリ消費を削減できるだけでなく、コストボリュームフィルタリングの計算負担を軽減できる。実効的なコスト容積表現に基づいて,コスト容積を正規化して性能をさらに向上するカスケード3次元U-Netモジュールを提案する。多視点深度推論を深度回帰問題や逆深度分類問題として扱う従来の手法とは異なり、多視点深度推論を逆深度回帰問題として再放送する。これにより,サブピクセル推定が可能となり,大規模シーンに適用できる。 DTUデータセットとタンク・アンド・テンプルデータセットに関する広範な実験を通して、我々の提案する相関コストボリュームと逆深さ回帰(CIDER)によるネットワークが最先端の結果を達成し、スケーラビリティと精度に優れた性能を示すことを示す。 Deep learning has shown to be effective for depth inference in multi-view stereo (MVS). However, the scalability and accuracy still remain an open problem in this domain. This can be attributed to the memory-consuming cost volume representation and inappropriate depth inference. Inspired by the group-wise correlation in stereo matching, we propose an average group-wise correlation similarity measure to construct a lightweight cost volume. This can not only reduce the memory consumption but also reduce the computational burden in the cost volume filtering. Based on our effective cost volume representation, we propose a cascade 3D U-Net module to regularize the cost volume to further boost the performance. Unlike the previous methods that treat multi-view depth inference as a depth regression problem or an inverse depth classification problem, we recast multi-view depth inference as an inverse depth regression task. This allows our network to achieve sub-pixel estimation and be applicable to large-scale scenes. Through extensive experiments on DTU dataset and Tanks and Temples dataset, we show that our proposed network with Correlation cost volume and Inverse DEpth Regression (CIDER), achieves state-of-the-art results, demonstrating its superior performance on scalability and accuracy.	翻訳日:2023-06-10 08:37:49 公開日:2019-12-26
# 平面前処理型パッチマッチマルチビューステレオ Planar Prior Assisted PatchMatch Multi-View Stereo ( http://arxiv.org/abs/1912.11744v1 ) ライセンス: Link先を確認	Qingshan Xu and Wenbing Tao	(参考訳) 3次元モデルの完全性は、低テクスチャ領域における信頼性の低い光度一貫性のため、マルチビューステレオ(MVS)では依然として難しい問題である。低テクスチャ領域は通常強い平面性を示すため、平面モデルは低テクスチャ領域の深さ推定に有利である。一方、PatchMatchのマルチビューステレオはサンプリングおよび伝搬方式において非常に効率的である。本稿では,平面モデルとパッチマッチ多視点ステレオを利用して,平面事前支援パッチマッチ多視点ステレオフレームワークを提案する。詳細は確率的グラフィカルモデルを用いて、平面モデルをPatchMatchマルチビューステレオに埋め込み、新しいマルチビュー集約マッチングコストを貢献する。この新しいコストは、フォトメトリック一貫性と平面互換性の両方を考慮しており、非平面領域と平面領域の両方の深さ推定に適している。実験結果から,本手法は極めて低テクスチャ領域の深度情報を効率よく回収し,高精度な3Dモデルと最先端性能を実現することができることがわかった。 The completeness of 3D models is still a challenging problem in multi-view stereo (MVS) due to the unreliable photometric consistency in low-textured areas. Since low-textured areas usually exhibit strong planarity, planar models are advantageous to the depth estimation of low-textured areas. On the other hand, PatchMatch multi-view stereo is very efficient for its sampling and propagation scheme. By taking advantage of planar models and PatchMatch multi-view stereo, we propose a planar prior assisted PatchMatch multi-view stereo framework in this paper. In detail, we utilize a probabilistic graphical model to embed planar models into PatchMatch multi-view stereo and contribute a novel multi-view aggregated matching cost. This novel cost takes both photometric consistency and planar compatibility into consideration, making it suited for the depth estimation of both non-planar and planar regions. Experimental results demonstrate that our method can efficiently recover the depth information of extremely low-textured areas, thus obtaining high complete 3D models and achieving state-of-the-art performance.	翻訳日:2023-06-10 08:37:31 公開日:2019-12-26
# HTTP上の動的適応ストリーミングのためのアンサンブルレート適応フレームワーク An Ensemble Rate Adaptation Framework for Dynamic Adaptive Streaming Over HTTP ( http://arxiv.org/abs/1912.11822v1 ) ライセンス: Link先を確認	Hui Yuan, Xiaoqian Hu, Junhui Hou, Xuekai Wei, and Sam Kwong	(参考訳) HTTP(DASH)上の動的適応ストリーミングにおいて、レート適応は最も重要な問題のひとつである。ネットワーク帯域幅の頻繁な変動とビデオコンテンツの複雑な変動のため、単一レート適応法を用いて、ネットワーク条件や動画コンテンツを完璧に扱うことは困難である。本稿では,DASHのためのアンサンブルレート適応フレームワークを提案する。このフレームワークに関係する複数の手法の利点を活用し,ユーザ体験の質(QoE)を向上させることを目的とする。提案されたフレームワークはシンプルだが、非常に効果的である。具体的には,提案するフレームワークは,メソッドプールとメソッドコントローラという2つのモジュールから構成される。メソッド・プールでは、いくつかのレート・アダプ・テイション・メソッドが統合される。各決定時刻に最適なqoeを達成する方法のみを選択し、要求されたビデオセグメントのビットレートを決定する。また,最も優れたqoeを提供する方法を決定するための方法コントローラに対して,切替方式,即席切換方式,間欠的切換方式の2つの戦略を提案する。シミュレーションの結果,提案するフレームワークは,常にチャネル環境やビデオの複雑さの変化に対して高いQoEを達成していることがわかった。 Rate adaptation is one of the most important issues in dynamic adaptive streaming over HTTP (DASH). Due to the frequent fluctuations of the network bandwidth and complex variations of video content, it is difficult to deal with the varying network conditions and video content perfectly by using a single rate adaptation method. In this paper, we propose an ensemble rate adaptation framework for DASH, which aims to leverage the advantages of multiple methods involved in the framework to improve the quality of experience (QoE) of users. The proposed framework is simple yet very effective. Specifically, the proposed framework is composed of two modules, i.e., the method pool and method controller. In the method pool, several rate adap tation methods are integrated. At each decision time, only the method that can achieve the best QoE is chosen to determine the bitrate of the requested video segment. Besides, we also propose two strategies for switching methods, i.e., InstAnt Method Switching, and InterMittent Method Switching, for the method controller to determine which method can provide the best QoEs. Simulation results demonstrate that, the proposed framework always achieves the highest QoE for the change of channel environment and video complexity, compared with state-of-the-art rate adaptation methods.	翻訳日:2023-06-10 08:32:13 公開日:2019-12-26
# 分解圧縮空間におけるロバスト辞書学習によるハイブリッド表現の学習 Learning Hybrid Representation by Robust Dictionary Learning in Factorized Compressed Space ( http://arxiv.org/abs/1912.11785v1 ) ライセンス: Link先を確認	Jiahuan Ren, Zhao Zhang, Sheng Li, Yang Wang, Guangcan Liu, Shuicheng Yan, Meng Wang	(参考訳) 本稿では,頑健な辞書学習 (dl) について検討し,因子化圧縮空間における有価な低ランク表現とスパース表現のハイブリッドを探索する。共同ロバスト因子化と投影辞書学習(J-RFDL)モデルを提案する。 J-RFDLの設定は、データ内の外れ値やノイズに対するロバスト性を向上し、再構成誤差をより正確に符号化し、正確な復元能力を有するハイブリッド唾液係数を得ることにより、データ表現を改善することを目的としている。特に、J-RFDLは、分解された圧縮空間におけるDLによるロバスト表現を行い、結果に対するノイズや外れ値の影響を排除し、DLプロセスの効率も向上する。符号化プロセスをデータのノイズに頑健にするため、j-rfdlはスパースl2、1-ノルムを使用しており、リコンストラクションエラーの行をゼロにすることで、ファクタライゼーションとリコンストラクションエラーを最小化することができる。 J-RFDLは、与えられたデータを適切に再構成するために優れた構造を持つ健全な係数を提供するために、埋め込み係数に結合した低ランクかつスパースな制約を合成辞書で課す。また,J-RFDLを結合分類用として拡張し,その分類誤差を最小化することにより,学習者の識別能力を向上する識別的J-RFDLモデルを提案する。公開データセットに関する広範な実験は、私たちの定式化が、他の最先端の方法よりも優れたパフォーマンスを提供できることを証明します。 In this paper, we investigate the robust dictionary learning (DL) to discover the hybrid salient low-rank and sparse representation in a factorized compressed space. A Joint Robust Factorization and Projective Dictionary Learning (J-RFDL) model is presented. The setting of J-RFDL aims at improving the data representations by enhancing the robustness to outliers and noise in data, encoding the reconstruction error more accurately and obtaining hybrid salient coefficients with accurate reconstruction ability. Specifically, J-RFDL performs the robust representation by DL in a factorized compressed space to eliminate the negative effects of noise and outliers on the results, which can also make the DL process efficient. To make the encoding process robust to noise in data, J-RFDL clearly uses sparse L2, 1-norm that can potentially minimize the factorization and reconstruction errors jointly by forcing rows of the reconstruction errors to be zeros. To deliver salient coefficients with good structures to reconstruct given data well, J-RFDL imposes the joint low-rank and sparse constraints on the embedded coefficients with a synthesis dictionary. Based on the hybrid salient coefficients, we also extend J-RFDL for the joint classification and propose a discriminative J-RFDL model, which can improve the discriminating abilities of learnt coeffi-cients by minimizing the classification error jointly. Extensive experiments on public datasets demonstrate that our formulations can deliver superior performance over other state-of-the-art methods.	翻訳日:2023-06-10 08:30:53 公開日:2019-12-26
# ロボットエレベータ・ボタン認識のための視点歪みの自動除去 Autonomous Removal of Perspective Distortion for Robotic Elevator Button Recognition ( http://arxiv.org/abs/1912.11774v1 ) ライセンス: Link先を確認	Delong Zhu, Jianbang Liu, Nachuan Ma, Zhe Min, and Max Q.-H. Meng	(参考訳) エレベータボタン認識は,移動ロボットの自律エレベータ操作を実現する上で欠かせない機能であると考えられる。しかし、好ましくない画像条件と様々な画像歪みにより、認識精度は向上していない。本稿では,エレベーターパネル画像の視点歪みを自律的に補正するアルゴリズムを提案する。このアルゴリズムはまずGaussian Mixture Model(GMM)を用いてボタン認識結果に基づいてグリッドフィッティング処理を行い、次に推定されたグリッドセンターを基準としてカメラの動きを推定して視点歪みを補正する。このアルゴリズムは、1つの画像を自律的に実行し、明示的な特徴検出や特徴マッチングの手順を必要としない。このアルゴリズムの有効性を検証するために,異なる角度から撮影された50画像のエレベータパネルデータセットを収集した。実験の結果,提案アルゴリズムはカメラの動きを正確に推定し,視点歪みを効果的に除去できることがわかった。 Elevator button recognition is considered an indispensable function for enabling the autonomous elevator operation of mobile robots. However, due to unfavorable image conditions and various image distortions, the recognition accuracy remains to be improved. In this paper, we present a novel algorithm that can autonomously correct perspective distortions of elevator panel images. The algorithm first leverages the Gaussian Mixture Model (GMM) to conduct a grid fitting process based on button recognition results, then utilizes the estimated grid centers as reference features to estimate camera motions for correcting perspective distortions. The algorithm performs on a single image autonomously and does not need explicit feature detection or feature matching procedure, which is much more robust to noises and outliers than traditional feature-based geometric approaches. To verify the effectiveness of the algorithm, we collect an elevator panel dataset of 50 images captured from different angles of view. Experimental results show that the proposed algorithm can accurately estimate camera motions and effectively remove perspective distortions.	翻訳日:2023-06-10 08:30:23 公開日:2019-12-26
# チームスポーツにおける結果を予測するための機械学習技術の応用:レビュー The Application of Machine Learning Techniques for Predicting Results in Team Sport: A Review ( http://arxiv.org/abs/1912.11762v1 ) ライセンス: Link先を確認	Rory Bunker (1), Teo Susnjak (2) ((1) Nagoya Institute of Technology, Japan, (2) Massey University, Auckland, New Zealand)	(参考訳) 過去20年間で、スポーツの結果を予測するために機械学習(ML)技術がますます利用されるようになった。本稿では,チームスポーツの結果を予測するためにMLを用いた研究のレビューを行い,1996年から2019年までの研究を取り上げる。我々はこの分野の論文を幅広く調査しながら、5つの重要な研究課題に答えようとした。本稿は、この分野でMLアルゴリズムが使われる傾向にあること、および、成功した結果が出現し始めていることを考察する。本研究は,本アプリケーション領域における精度評価のための頑健な戦略を明らかにする。本研究は,様々なスポーツで達成されたアキュラティティーを考察し,チームスポーツの結果は本質的に他のスポーツよりも予測が困難であると考える。最後に、この研究は、すべての調査論文における将来の研究の方向性に関する共通のテーマを明らかにし、ギャップと機会を探しながら、この分野の将来の研究者への勧告を提案している。 Over the past two decades, Machine Learning (ML) techniques have been increasingly utilized for the purpose of predicting outcomes in sport. In this paper, we provide a review of studies that have used ML for predicting results in team sport, covering studies from 1996 to 2019. We sought to answer five key research questions while extensively surveying papers in this field. This paper offers insights into which ML algorithms have tended to be used in this field, as well as those that are beginning to emerge with successful outcomes. Our research highlights defining characteristics of successful studies and identifies robust strategies for evaluating accuracy results in this application domain. Our study considers accuracies that have been achieved across different sports and explores the notion that outcomes of some team sports could be inherently more difficult to predict than others. Finally, our study uncovers common themes of future research directions across all surveyed papers, looking for gaps and opportunities, while proposing recommendations for future researchers in this domain.	翻訳日:2023-06-10 08:30:05 公開日:2019-12-26
# マルチラベルグラフ畳み込みネットワーク表現学習 Multi-Label Graph Convolutional Network Representation Learning ( http://arxiv.org/abs/1912.11757v1 ) ライセンス: Link先を確認	Min Shi, Yufei Tang, Xingquan Zhu and Jianxun Liu	(参考訳) グラフベースのシステムの知識表現は多くの分野において基本的なものである。しかし、現実世界のオブジェクト(ノード)は本質的に複雑であり、しばしばリッチな意味論やラベルを含んでいる。例えば、ユーザはソーシャルネットワークの様々な関心グループに属し、多くのアプリケーションでマルチラベルネットワークとなる。マルチラベルネットワークノードは、各ノードに複数のラベルを持つだけでなく、これらのラベルは、しばしば高い相関関係にあり、既存の手法がノード表現学習において非効率であるか、あるいはそのような相関を処理できない。本稿では,マルチラベルネットワークのためのノード表現学習のための新しいマルチラベルグラフ畳み込みネットワーク(ML-GCN)を提案する。本稿では,ラベル-ラベル相関とネットワークトポロジ構造について,ノード-ラベルグラフとラベル-ラベル-ノードグラフという2つのSiamese GCNとしてモデル化する。 2つのGCNはそれぞれノードとラベルの表現学習の1つの側面を扱い、1つの目的関数の下でシームレスに統合される。学習されたラベル表現は、インナーラベルインタラクションとノードラベルプロパティを効果的に保存することができ、統合トレーニングフレームワークの下でノード表現学習を強化するために集約される。マルチラベルノード分類の実験と比較により,提案手法の有効性が検証された。 Knowledge representation of graph-based systems is fundamental across many disciplines. To date, most existing methods for representation learning primarily focus on networks with simplex labels, yet real-world objects (nodes) are inherently complex in nature and often contain rich semantics or labels, e.g., a user may belong to diverse interest groups of a social network, resulting in multi-label networks for many applications. The multi-label network nodes not only have multiple labels for each node, such labels are often highly correlated making existing methods ineffective or fail to handle such correlation for node representation learning. In this paper, we propose a novel multi-label graph convolutional network (ML-GCN) for learning node representation for multi-label networks. To fully explore label-label correlation and network topology structures, we propose to model a multi-label network as two Siamese GCNs: a node-node-label graph and a label-label-node graph. The two GCNs each handle one aspect of representation learning for nodes and labels, respectively, and they are seamlessly integrated under one objective function. The learned label representations can effectively preserve the inner-label interaction and node label properties, and are then aggregated to enhance the node representation learning under a unified training framework. Experiments and comparisons on multi-label node classification validate the effectiveness of our proposed approach.	翻訳日:2023-06-10 08:29:14 公開日:2019-12-26
# スペクトル変動する空間的ぼかし下におけるハイパースペクトル・マルチスペクトル画像融合 -高次元赤外画像への応用- Hyperspectral and multispectral image fusion under spectrally varying spatial blurs -- Application to high dimensional infrared astronomical imaging ( http://arxiv.org/abs/1912.11868v1 ) ライセンス: Link先を確認	Claire Guilloteau, Thomas Oberlin, Olivier Bern\'e and Nicolas Dobigeon	(参考訳) ハイパースペクトルイメージングは、過去数十年間、天文学者にとって貴重なデータ源となっている。現在の計器と観測時間の制約により、高い空間的かつ低いスペクトル分解能を持つマルチスペクトル画像と、低い空間的かつ高いスペクトル分解能を持つハイパースペクトル画像の直接取得が可能になる。データの科学的解釈を向上させるために,各画像の利点を組み合わせて高スペクトル分解能データキューブを復元するデータ融合法を提案する。提案された逆問題は、スペクトル変動のぼかしのような天文学機器の特異性を説明する。周波数領域と低次元部分空間の問題を解くことで高速な実装を提供し、畳み込み演算子とデータの高次元性を効率的に扱う。我々は、ジェームズ・ウェッブ宇宙望遠鏡のシミュレーション観測のリアルな合成データセットの実験を行い、この融合アルゴリズムは地球観測のためのリモートセンシングで一般的に用いられる最先端の手法よりも優れていることを示す。 Hyperspectral imaging has become a significant source of valuable data for astronomers over the past decades. Current instrumental and observing time constraints allow direct acquisition of multispectral images, with high spatial but low spectral resolution, and hyperspectral images, with low spatial but high spectral resolution. To enhance scientific interpretation of the data, we propose a data fusion method which combines the benefits of each image to recover a high spatio-spectral resolution datacube. The proposed inverse problem accounts for the specificities of astronomical instruments, such as spectrally variant blurs. We provide a fast implementation by solving the problem in the frequency domain and in a low-dimensional subspace to efficiently handle the convolution operators as well as the high dimensionality of the data. We conduct experiments on a realistic synthetic dataset of simulated observation of the upcoming James Webb Space Telescope, and we show that our fusion algorithm outperforms state-of-the-art methods commonly used in remote sensing for Earth observation.	翻訳日:2023-06-10 08:20:43 公開日:2019-12-26
# 神経ファジィ推論システムによるソフトウェア活動推定:過去と現在 Software Effort Estimation using Neuro Fuzzy Inference System: Past and Present ( http://arxiv.org/abs/1912.11855v1 ) ライセンス: Link先を確認	Aditi Sharma, Ravi Ranjan	(参考訳) プロジェクト失敗の最も重要な理由は、努力の少ない見積もりです。ソフトウェア開発の労力見積は、開発に適切なチームメンバを割り当てたり、ソフトウェア開発にリソースを割り当てたり、結合したりするために必要です。不正確なソフトウェア見積は、プロジェクトの遅延、予算過剰、あるいはプロジェクトのキャンセルにつながる可能性がある。しかし、労力推定モデルはそれほど効率的ではない。本稿では,ニューロファジィ推論システム(NFIS)の新たな評価手法について検討する。人工知能のコンポーネントとファジィ論理を融合した混合モデルであり、より良い推定を行うことができる。 Most important reason for project failure is poor effort estimation. Software development effort estimation is needed for assigning appropriate team members for development, allocating resources for software development, binding etc. Inaccurate software estimation may lead to delay in project, over-budget or cancellation of the project. But the effort estimation models are not very efficient. In this paper, we are analyzing the new approach for estimation i.e. Neuro Fuzzy Inference System (NFIS). It is a mixture model that consolidates the components of artificial neural network with fuzzy logic for giving a better estimation.	翻訳日:2023-06-10 08:19:56 公開日:2019-12-26
# 対人ロバストネスのベンチマーク Benchmarking Adversarial Robustness ( http://arxiv.org/abs/1912.11852v1 ) ライセンス: Link先を確認	Yinpeng Dong, Qi-An Fu, Xiao Yang, Tianyu Pang, Hang Su, Zihao Xiao, Jun Zhu	(参考訳) ディープニューラルネットワークは敵の例に弱いため、ディープラーニングの開発において最も重要な研究課題の1つとなっている。近年、多くの努力がなされているが、敵攻撃と防御アルゴリズムの正確かつ完全な評価を行うことは極めて重要である。本稿では,画像分類タスクにおける敵対的ロバスト性を評価するために,包括的で厳密でコヒーレントなベンチマークを確立する。代表的な攻撃法と防御法を簡潔に検討した後,2つのロバスト性曲線を公正な評価基準として大規模実験を行い,その性能を完全に把握した。評価結果に基づいて,いくつかの重要な知見を導き,今後の研究への洞察を提供する。 Deep neural networks are vulnerable to adversarial examples, which becomes one of the most important research problems in the development of deep learning. While a lot of efforts have been made in recent years, it is of great significance to perform correct and complete evaluations of the adversarial attack and defense algorithms. In this paper, we establish a comprehensive, rigorous, and coherent benchmark to evaluate adversarial robustness on image classification tasks. After briefly reviewing plenty of representative attack and defense methods, we perform large-scale experiments with two robustness curves as the fair-minded evaluation criteria to fully understand the performance of these methods. Based on the evaluation results, we draw several important findings and provide insights for future research.	翻訳日:2023-06-10 08:19:30 公開日:2019-12-26
# ラベルプロパゲーションとリファインメントを用いた効率的なビデオセマンティックセマンティックセグメンテーション Efficient Video Semantic Segmentation with Labels Propagation and Refinement ( http://arxiv.org/abs/1912.11844v1 ) ライセンス: Link先を確認	Matthieu Paul, Christoph Mayer, Luc Van Gool, Radu Timofte	(参考訳) 本稿では,ハイブリッドGPU/CPUを用いた高精細ビデオのリアルタイムセマンティックセマンティックセマンティック化の問題に取り組む。我々は,効率的なビデオセグメンテーション(evs)パイプラインを提案する。 i)CPU上では,映像の時間的側面を利用して,あるフレームから次のフレームへ意味情報を伝達する,非常に高速な光フロー法が用いられる。 GPUと並行して動作する。 (ii)GPUでは、2つの畳み込みニューラルネットワーク:スクラッチから密接なセマンティックラベルを予測するために使用される主セグメンテーションネットワークと、高速不整合注意モジュール(IAM)の助けを借りて、以前のフレームからの予測を改善するように設計されたRefinerである。後者は、正確に伝播できない領域を識別することができる。所望のフレームレートと精度に応じて,いくつかの操作点を提案する。我々のパイプラインは、既存のリアルタイム画像分割法(mIoU 60%以上)と競合する精度を達成し、フレームレートをはるかに高めている。高解像度フレーム(2048 x 1024)を持つ一般的なCityscapesデータセットでは、単一のGPUとCPU上で80から1000Hzの動作ポイントが提案されている。 This paper tackles the problem of real-time semantic segmentation of high definition videos using a hybrid GPU / CPU approach. We propose an Efficient Video Segmentation(EVS) pipeline that combines: (i) On the CPU, a very fast optical flow method, that is used to exploit the temporal aspect of the video and propagate semantic information from one frame to the next. It runs in parallel with the GPU. (ii) On the GPU, two Convolutional Neural Networks: A main segmentation network that is used to predict dense semantic labels from scratch, and a Refiner that is designed to improve predictions from previous frames with the help of a fast Inconsistencies Attention Module (IAM). The latter can identify regions that cannot be propagated accurately. We suggest several operating points depending on the desired frame rate and accuracy. Our pipeline achieves accuracy levels competitive to the existing real-time methods for semantic image segmentation(mIoU above 60%), while achieving much higher frame rates. On the popular Cityscapes dataset with high resolution frames (2048 x 1024), the proposed operating points range from 80 to 1000 Hz on a single GPU and CPU.	翻訳日:2023-06-10 08:19:00 公開日:2019-12-26
# スパースオートエンコーダを用いたIoTネットワークにおける異常通信検出 Anomalous Communications Detection in IoT Networks Using Sparse Autoencoders ( http://arxiv.org/abs/1912.11831v1 ) ライセンス: Link先を確認	Mustafizur Rahman Shahid (SAMOVAR), Gregory Blanc (SAMOVAR), Zonghua Zhang (SAMOVAR), Herv\'e Debar (SAMOVAR)	(参考訳) 今日では、スマートホームやeヘルスケアなど、さまざまなスマートサービスを実現するために、IoTデバイスが広くデプロイされている。しかし、多くのIoTデバイスが脆弱であるため、セキュリティは依然として最重要課題の1つだ。さらに、IoTマルウェアは常に進化し、洗練されています。 IoTデバイスは非常に特殊なタスクを実行することを意図しているため、ネットワークの動作は合理的に安定し、予測可能であることが期待される。通常のパターンからの重要な行動偏差は異常事象を示す。本稿では,スパースオートエンコーダを用いて,IoTネットワークにおける異常なネットワーク通信を検出する手法を提案する。提案手法により、悪意のある通信を正当な通信と区別することができる。そのため、デバイスが侵害された場合、デバイスが提供するサービスが完全に中断されることなく、悪意のある通信のみを削除できる。ネットワークの振舞いを特徴付けるため,最初のNパケットの大きさの統計を用いて双方向TCPフローを抽出し,それに対応するパケット間時間に関する統計を用いて記述する。次に、スパースオートエンコーダのセットを訓練し、実験的なスマートホームネットワークによって生成された正当な通信のプロファイルを学習する。 Nの値に依存すると、開発モデルは86.9%から91.2%の攻撃検出率と0.1%から0.5%の偽陽性率を達成する。 Nowadays, IoT devices have been widely deployed for enabling various smart services, such as, smart home or e-healthcare. However, security remains as one of the paramount concern as many IoT devices are vulnerable. Moreover, IoT malware are constantly evolving and getting more sophisticated. IoT devices are intended to perform very specific tasks, so their networking behavior is expected to be reasonably stable and predictable. Any significant behavioral deviation from the normal patterns would indicate anomalous events. In this paper, we present a method to detect anomalous network communications in IoT networks using a set of sparse autoencoders. The proposed approach allows us to differentiate malicious communications from legitimate ones. So that, if a device is compromised only malicious communications can be dropped while the service provided by the device is not totally interrupted. To characterize network behavior, bidirectional TCP flows are extracted and described using statistics on the size of the first N packets sent and received, along with statistics on the corresponding inter-arrival times between packets. A set of sparse autoencoders is then trained to learn the profile of the legitimate communications generated by an experimental smart home network. Depending on the value of N, the developed model achieves attack detection rates ranging from 86.9% to 91.2%, and false positive rates ranging from 0.1% to 0.5%.	翻訳日:2023-06-10 08:18:23 公開日:2019-12-26
# 準ニュートン信頼地域政策最適化 Quasi-Newton Trust Region Policy Optimization ( http://arxiv.org/abs/1912.11912v1 ) ライセンス: Link先を確認	Devesh Jha, Arvind Raghunathan, Diego Romeres	(参考訳) 本稿では,ヘシアンに対して準ニュートン近似を用いた信頼領域最適化手法である準ニュートン信頼領域最適化qntrpoを提案する。勾配降下は連続制御による強化学習タスクのためのデファクトアルゴリズムである。このアルゴリズムは、幅広いタスクにわたる強化学習において、最先端のパフォーマンスを達成した。しかし、アルゴリズムには多くの欠点がある: ステップの欠如選択基準の欠如、収束の遅さ。政策最適化のために,ドレグステップと準ニュートン近似を用いた信頼領域法について検討した。我々は, サンプル数の観点から, 選択が効率的で, 性能が向上する, 幅広い難解な連続制御タスクについて, 数値実験により実証する。 We propose a trust region method for policy optimization that employs Quasi-Newton approximation for the Hessian, called Quasi-Newton Trust Region Policy Optimization QNTRPO. Gradient descent is the de facto algorithm for reinforcement learning tasks with continuous controls. The algorithm has achieved state-of-the-art performance when used in reinforcement learning across a wide range of tasks. However, the algorithm suffers from a number of drawbacks including: lack of stepsize selection criterion, and slow convergence. We investigate the use of a trust region method using dogleg step and a Quasi-Newton approximation for the Hessian for policy optimization. We demonstrate through numerical experiments over a wide range of challenging continuous control tasks that our particular choice is efficient in terms of number of samples and improves performance	翻訳日:2023-06-10 08:11:57 公開日:2019-12-26
# 有限温度における多体系の変分的アプローチ A variational approach for many-body systems at finite temperature ( http://arxiv.org/abs/1912.11907v1 ) ライセンス: Link先を確認	Tao Shi, Eugene Demler, J. Ignacio Cirac	(参考訳) 密度行列に対する非線形微分方程式を導入し,自由エネルギーの単調な減少とギブス熱状態の一定点に達する。この方程式を用いて多体系の平衡状態を分析するための変分的アプローチを構築し、電子-フォノン系におけるポラロン変換のようなユニタリ変換によって得られる一般化と同様に、全てのボソニックおよびフェルミオンガウス状態を含む幅広い状態に適用可能であることを証明した。我々は、この方法をBCS格子ハミルトンでベンチマークし、2次元のホルシュタインモデルに適用する。後者では,BCS対流の弱い相互作用における遷移と強い相互作用における極性状態の遷移を再現し,超伝導と電荷密度波の位相分離を示す。 We introduce a non-linear differential flow equation for density matrices that provides a monotonic decrease of the free energy and reaches a fixed point at the Gibbs thermal state. We use this equation to build a variational approach for analyzing equilibrium states of many-body systems and demonstrate that it can be applied to a broad class of states, including all bosonic and fermionic Gaussian states, as well as their generalizations obtained by unitary transformations, such as polaron transformations, in electron-phonon systems. We benchmark this method with a BCS lattice Hamiltonian and apply it to the Holstein model in two dimensions. For the latter, our approach reproduces the transition between the BCS pairing regime at weak interactions and the polaronic regime at stronger interactions, displaying phase separation between superconducting and charge-density wave phases.	翻訳日:2023-06-10 08:11:43 公開日:2019-12-26
# 回転予測を用いたドメイン適応のための簡易ベースライン A simple baseline for domain adaptation using rotation prediction ( http://arxiv.org/abs/1912.11903v1 ) ライセンス: Link先を確認	Ajinkya Tejankar and Hamed Pirsiavash	(参考訳) 近年、ドメイン適応は、多くの応用でホットな研究領域となっている。目標は、アノテーション付きデータが少ないドメインでトレーニングされたモデルを別のドメインに適応させることだ。そこで本研究では,自己教師あり学習に基づく単純かつ効果的な手法を提案する。本手法は,対象領域におけるランダムな回転(自己監督)と,対象領域に対する正しいラベル(教師付き)と,対象領域における自己蒸留の2段階を含む。提案手法は,DomainNetデータセット上での半教師付きドメイン適応の最先端化を実現する。さらに、人気のあるドメイン適応ベンチマークのラベルなしのターゲットデータセットは、テストカテゴリとは別にカテゴリを含まないことを観察する。これは、多くの実際のアプリケーションに存在しないバイアスをもたらすと信じています。このバイアスをラベルのないデータから除去すると、最先端の手法の性能が大幅に低下するのに対し、単純な手法は比較的堅牢であることを示す。 Recently, domain adaptation has become a hot research area with lots of applications. The goal is to adapt a model trained in one domain to another domain with scarce annotated data. We propose a simple yet effective method based on self-supervised learning that outperforms or is on par with most state-of-the-art algorithms, e.g. adversarial domain adaptation. Our method involves two phases: predicting random rotations (self-supervised) on the target domain along with correct labels for the source domain (supervised), and then using self-distillation on the target domain. Our simple method achieves state-of-the-art results on semi-supervised domain adaptation on DomainNet dataset. Further, we observe that the unlabeled target datasets of popular domain adaptation benchmarks do not contain any categories apart from testing categories. We believe this introduces a bias that does not exist in many real applications. We show that removing this bias from the unlabeled data results in a large drop in performance of state-of-the-art methods, while our simple method is relatively robust.	翻訳日:2023-06-10 08:11:27 公開日:2019-12-26
# 3DFR: シーン独立変更検出のためのSwift 3D機能削減フレームワーク 3DFR: A Swift 3D Feature Reductionist Framework for Scene Independent Change Detection ( http://arxiv.org/abs/1912.11891v1 ) ライセンス: Link先を確認	Murari Mandal, Vansh Dhar, Abhishek Mishra, Santosh Kumar Vipparthi	(参考訳) 本稿では,シーン独立型変化検出のための3次元特徴量削減フレームワーク(3DFR)を提案する。 3DFRフレームワークは、3つの機能ストリームで構成されている: 迅速な3D機能リダミストストリーム(AvFeat)、現代機能ストリーム(ConFeat)、時間中央機能マップ。これらの多面的フォアグラウンド/バックグラウンド機能はエンコーダ/デコーダネットワークによってさらに洗練される。その結果,提案フレームワークは時間変化を検知するだけでなく,高レベルの外観特徴を学習する。したがって、オブジェクトセマンティクスを組み込んで、効果的な変更検出を行う。さらに,ネットワークの堅牢性と一般化能力を示すために,シーン独立評価方式を用いて提案手法の有効性を検証した。提案手法の性能はベンチマークcdnet 2014データセットで評価される。実験の結果,提案した3DFRネットワークは最先端のアプローチよりも優れていた。 In this paper we propose an end-to-end swift 3D feature reductionist framework (3DFR) for scene independent change detection. The 3DFR framework consists of three feature streams: a swift 3D feature reductionist stream (AvFeat), a contemporary feature stream (ConFeat) and a temporal median feature map. These multilateral foreground/background features are further refined through an encoder-decoder network. As a result, the proposed framework not only detects temporal changes but also learns high-level appearance features. Thus, it incorporates the object semantics for effective change detection. Furthermore, the proposed framework is validated through a scene independent evaluation scheme in order to demonstrate the robustness and generalization capability of the network. The performance of the proposed method is evaluated on the benchmark CDnet 2014 dataset. The experimental results show that the proposed 3DFR network outperforms the state-of-the-art approaches.	翻訳日:2023-06-10 08:10:36 公開日:2019-12-26
# マイクロトロイダル共振器によるキラル量子ネットワークの長距離動的絡み合い生成 Microtoroidal resonators enhance long-distance dynamical entanglement generation in chiral quantum networks ( http://arxiv.org/abs/1912.11886v1 ) ライセンス: Link先を確認	Wai-Keong Mok, Davit Aghamalyan, Jia-Bin You, Leong-Chuan Kwek	(参考訳) カイラル量子ネットワークは、量子情報処理と量子通信を実現するための有望な経路を提供する。ここでは、カイラル量子ネットワークの2つの遠い量子ノードが、共通の1次元カイラル導波路を介して光子移動によって動的に絡み合う様子を述べる。キラル結合単モードリング共振器の指向性非対称性を利用して2つの原子間の絡み合い状態を生成する。 Refでは0.736よりも0.969の精度で大きな改善が提案され、分析された。 [1]. この大きな拡張は、光と物質の間の効率的なフォトニックインタフェースとして機能するマイクロトロイダル共振器の導入によって達成される。本プロトコルのノイズ間距離の変動,不完全なキラリティ,様々な変形,原子自然崩壊などの実験的不完全性に対するロバスト性を示す。本提案は,量子コンピューティングや量子情報処理の多くの応用において重要な要素である量子ネットワークにおける長距離絡み合い生成に活用できる。 Chiral quantum networks provide a promising route for realising quantum information processing and quantum communication. Here, we describe how two distant quantum nodes of chiral quantum network become dynamically entangled by a photon transfer through a common 1D chiral waveguide. We harness the directional asymmetry in chirally-coupled single-mode ring resonators to generate entangled state between two atoms. We report a concurrence of up to 0.969, a huge improvement over the 0.736 which was suggested and analyzed in great detail in Ref. [1]. This significant enhancement is achieved by introducing microtoroidal resonators which serve as efficient photonic interface between light and matter. Robustness of our protocol to experimental imperfections such as fluctuations in inter-nodal distance, imperfect chirality, various detunings and atomic spontaneous decay is demonstrated. Our proposal can be utilised for long-distance entanglement generation in quantum networks which is a key ingredient for many applications in quantum computing and quantum information processing.	翻訳日:2023-06-10 08:10:02 公開日:2019-12-26
# 視覚と言語: 視覚的知覚からコンテンツ創造へ Vision and Language: from Visual Perception to Content Creation ( http://arxiv.org/abs/1912.11872v1 ) ライセンス: Link先を確認	Tao Mei, Wei Zhang, Ting Yao	(参考訳) 視覚と言語は人間の知能の2つの基本的な能力である。人間は視覚と言語の間の相互作用を通じて日常的にタスクを実行し、自然言語記述で何を見たか、あるいは絵を幻想するユニークな人間の能力をサポートする。言語が視覚とどのように相互作用するかという有効な質問は、コンピュータビジョン領域の地平線を広げるために研究者を動機付けます。特に、「言語へのビジョン」は、おそらく過去5年間で最も人気のあるトピックの1つであり、出版物の量と、キャプション、視覚的質問応答、視覚的対話、言語ナビゲーションなどの広範囲のアプリケーションの両方で顕著に伸びている。このようなタスクは、より包括的な理解と多様な言語表現によって視覚認知を促進する。言語へのビジョン」の進歩を超えて、言語は視覚理解に寄与し、視覚コンテンツの作成の新たな可能性、すなわち「言語から言語への」可能性を提供する。このプロセスはプリズムとして機能し、言語入力に基づいて視覚コンテンツ条件を作成する。本稿では,この2つの側面,すなわち「言語へのビジョン」と「視覚への言語」の最近の進歩を概観する。より具体的には、前者は画像/ビデオキャプションの開発と、典型的なエンコーダ-デコーダ構造とベンチマークに焦点を当て、後者はビジュアルコンテンツ作成の技術を要約している。現実のデプロイメントやビジョンや言語のサービスについても詳しく説明されている。 Vision and language are two fundamental capabilities of human intelligence. Humans routinely perform tasks through the interactions between vision and language, supporting the uniquely human capacity to talk about what they see or hallucinate a picture on a natural-language description. The valid question of how language interacts with vision motivates us researchers to expand the horizons of computer vision area. In particular, "vision to language" is probably one of the most popular topics in the past five years, with a significant growth in both volume of publications and extensive applications, e.g., captioning, visual question answering, visual dialog, language navigation, etc. Such tasks boost visual perception with more comprehensive understanding and diverse linguistic representations. Going beyond the progresses made in "vision to language," language can also contribute to vision understanding and offer new possibilities of visual content creation, i.e., "language to vision." The process performs as a prism through which to create visual content conditioning on the language inputs. This paper reviews the recent advances along these two dimensions: "vision to language" and "language to vision." More concretely, the former mainly focuses on the development of image/video captioning, as well as typical encoder-decoder structures and benchmarks, while the latter summarizes the technologies of visual content creation. The real-world deployment or services of vision and language are elaborated as well.	翻訳日:2023-06-10 08:09:33 公開日:2019-12-26
# エネルギーに基づく弱測定 Energy-Based Weak Measurement ( http://arxiv.org/abs/1912.11937v1 ) ライセンス: Link先を確認	Mordecai Waegell, Cyril Elouard, Andrew N. Jordan	(参考訳) うまく局在した光子が空間的に重ねられた吸収体に入射するが吸収されないとき、光子は吸収体にエネルギーを供給できる。移動エネルギーが光子のエネルギーの不確かさに対して小さい場合、吸収器のエネルギー分布が測定装置として作用し、吸収器の強い乱れ状態が効果的な事前選択となるような、吸収器のエネルギーの異常なタイプの弱い測定となることが示されている。吸収器の最終状態をポスト選択として処理すると、吸収器のエネルギー増加はその遷移ハミルトニアンの弱い値であり、光子のエネルギー分布は反対の量で変化することが示された。非散乱の基本的な場合、次いで相互作用のないエネルギー移動の場合について検討する。結果の詳細と解釈について述べる。 When a well-localized photon is incident on a spatially superposed absorber but is not absorbed, the photon can still deliver energy to the absorber. It is shown that when the transferred energy is small relative to the energy uncertainty of the photon, this constitutes an unusual type of weak measurement of the absorber's energy, where the energy distribution of the unabsorbed photon acts as the measurement device, and the strongly disturbed state of the absorber becomes the effective pre-selection. Treating the final state of the absorber as the post-selection, it is shown that the absorber's energy increase is the weak value of its translational Hamiltonian, and the energy distribution of the photon shifts by the opposite amount. The basic case of non-scattering is examined, followed by the case of interaction-free energy transfer. Details and interpretations of the results are discussed.	翻訳日:2023-06-10 08:00:55 公開日:2019-12-26
# 物体を部品に分解した3次元点雲からの骨格抽出 Skeleton Extraction from 3D Point Clouds by Decomposing the Object into Parts ( http://arxiv.org/abs/1912.11932v1 ) ライセンス: Link先を確認	Vijai Jayadevan, Edward Delp, and Zygmunt Pizlo	(参考訳) 点雲をその成分に分解し、点雲から曲線骨格を抽出することは、関連する2つの問題である。形状をその成分に分解することは、しばしば骨格抽出の副産物として得られる。本研究では, 対象物をその部分へ分解し, 部分骨格を同定し, それらの部分骨格を連結して完全な骨格を得ることにより, 不整点雲から曲線骨格を抽出することを提案する。これは、人間がこの問題にアプローチする方法であるという意味で、骨格を抽出する最も自然な方法だと考えています。我々の部品は一般化シリンダ(gcs)です。 GCの軸はその定義の不可欠な部分なので、その部分は自然な骨格表現を持つ。我々はGCの基本特性である翻訳対称性を用いて点雲から部品を抽出する。本稿では,この手法が多種多様な形状を扱えることを示す。本手法と工法の現状を比較し,パートベースアプローチが他の手法の限界にどう対処できるかを示す。本稿では,既存のポイントセット登録アルゴリズムの改良版を示し,ポイントクラウドから部品を抽出する際の有用性を示す。また, この手法を用いて, ノイズの多い点群から骨格を抽出し, 同定する方法を示す。部分ベースのアプローチは、ユーザインタラクションの自然な直感的なインターフェースも提供する。グラフィカルユーザインタフェースの助けを借りて,ユーザインタラクションを最小限に抑えることで,ミスの修正が容易になることを示す。 Decomposing a point cloud into its components and extracting curve skeletons from point clouds are two related problems. Decomposition of a shape into its components is often obtained as a byproduct of skeleton extraction. In this work, we propose to extract curve skeletons, from unorganized point clouds, by decomposing the object into its parts, identifying part skeletons and then linking these part skeletons together to obtain the complete skeleton. We believe it is the most natural way to extract skeletons in the sense that this would be the way a human would approach the problem. Our parts are generalized cylinders (GCs). Since, the axis of a GC is an integral part of its definition, the parts have natural skeletal representations. We use translational symmetry, the fundamental property of GCs, to extract parts from point clouds. We demonstrate how this method can handle a large variety of shapes. We compare our method with state of the art methods and show how a part based approach can deal with some of the limitations of other methods. We present an improved version of an existing point set registration algorithm and demonstrate its utility in extracting parts from point clouds. We also show how this method can be used to extract skeletons from and identify parts of noisy point clouds. A part based approach also provides a natural and intuitive interface for user interaction. We demonstrate the ease with which mistakes, if any, can be fixed with minimal user interaction with the help of a graphical user interface.	翻訳日:2023-06-10 08:00:23 公開日:2019-12-26
# 一般原子集合のスパース最適化:GreedyとForward-Backwardアルゴリズム Sparse Optimization on General Atomic Sets: Greedy and Forward-Backward Algorithms ( http://arxiv.org/abs/1912.11931v1 ) ライセンス: Link先を確認	Thomas Zhang	(参考訳) スパース原子最適化の問題を考えると、「スパーシティ」という概念は、少数の原子の線形結合を意味するために一般化される。原子集合の定義は非常に広く、一般的な例としては、標準基底、低ランク行列、過剰完全辞書、置換行列、直交行列などがある。したがって、スパース原子最適化のモデルは、統計学、信号処理、機械学習、コンピュータビジョンなど、多くの分野から生じる問題を含む。具体的には、制限された強い凸(または凹凸)を最大化する問題を考える。我々は、制限された強凹凸上のグリーディアルゴリズムの線形収束率、スパースベクトル上の滑らかな関数を一般原子集合の領域に確立する最近の研究を拡張し、収束率は新しい量を含む:「スパース原子状態数」である。このことは、スパース原子最適化のための様々なグリーディアルゴリズムのフレーバーに対する最強の乗法近似を保証することにつながるが、特に、多くの興味のある設定において、このグリーディアルゴリズムは疎性を維持しながら強い近似を保証することができることを示す。さらに,同じ近似保証を実現する前向きアルゴリズムのスキームを導入する。第二に, 弱部分モジュラリティの代替概念を定め, 従来の線形収束率の証明に用いられてきたより親しみやすいバージョンと密接な関係があることを示した。この代替的な弱部分モジュラリティを用いて類似の乗法近似の保証を証明し、その特異性と応用性を確立する。 We consider the problem of sparse atomic optimization, where the notion of "sparsity" is generalized to meaning some linear combination of few atoms. The definition of atomic set is very broad; popular examples include the standard basis, low-rank matrices, overcomplete dictionaries, permutation matrices, orthogonal matrices, etc. The model of sparse atomic optimization therefore includes problems coming from many fields, including statistics, signal processing, machine learning, computer vision and so on. Specifically, we consider the problem of maximizing a restricted strongly convex (or concave), smooth function restricted to a sparse linear combination of atoms. We extend recent work that establish linear convergence rates of greedy algorithms on restricted strongly concave, smooth functions on sparse vectors to the realm of general atomic sets, where the convergence rate involves a novel quantity: the "sparse atomic condition number". This leads to the strongest known multiplicative approximation guarantees for various flavors of greedy algorithms for sparse atomic optimization; in particular, we show that in many settings of interest the greedy algorithm can attain strong approximation guarantees while maintaining sparsity. Furthermore, we introduce a scheme for forward-backward algorithms that achieves the same approximation guarantees. Secondly, we define an alternate notion of weak submodularity, which we show is tightly related to the more familiar version that has been used to prove earlier linear convergence rates. We prove analogous multiplicative approximation guarantees using this alternate weak submodularity, and establish its distinct identity and applications.	翻訳日:2023-06-10 07:59:59 公開日:2019-12-26
# cluster catch digraphsを用いたパラメータフリークラスタリング(技術報告) Parameter Free Clustering with Cluster Catch Digraphs (Technical Report) ( http://arxiv.org/abs/1912.11926v1 ) ライセンス: Link先を確認	Art\"ur Manukyan and Elvan Ceyhan	(参考訳) 本研究では,最近開発されたクラスタキャッチダイアグラム(ccds)に基づくクラスタリングアルゴリズムを提案する。これらのグラフは密度ベースのクラスタリング法とグラフベースのクラスタリング法のハイブリッドであるクラスタリング法を考案するために使用される。 CCDはクラスタの数を推定するため、クラスタリングのダイグラフをアピールするが、CCD(および密度に基づく一般的な方法)はデータセット内の仮定されたクラスタの 'emph{intensity}' を表すパラメータに関するいくつかの情報を必要とする。我々は, ccdアルゴリズムのパラメータフリーバージョンであるアルゴリズムを提案し, データセットの最適分割を求める際に, 選択が重要となるインテンシティパラメータの仕様を必要としない。空間データ解析からツールを借りて凸クラスタの数を推定し,ripleyの$k$関数を推定した。我々は、RK-CCDとして$K$関数を利用する新しいダイグラフを呼ぶ。 RK-CCDの最小支配集合は、データセット内のノイズクラスタからクラスタを推定し、区別することにより、正しいクラスタ数を推定できることを示す。我々のロバストクラスタリングアルゴリズムはクラスタ数と強度パラメータの両方を推定する手法で構成されており、完全にパラメータフリーである。我々はモンテカルロシミュレーションを行い、実生活データセットを用いてRK-CCDと一般的な密度ベースおよびプロトタイプベースのクラスタリング手法を比較した。 We propose clustering algorithms based on a recently developed geometric digraph family called cluster catch digraphs (CCDs). These digraphs are used to devise clustering methods that are hybrids of density-based and graph-based clustering methods. CCDs are appealing digraphs for clustering, since they estimate the number of clusters; however, CCDs (and density-based methods in general) require some information on a parameter representing the \emph{intensity} of assumed clusters in the data set. We propose algorithms that are parameter free versions of the CCD algorithm and does not require a specification of the intensity parameter whose choice is often critical in finding an optimal partitioning of the data set. We estimate the number of convex clusters by borrowing a tool from spatial data analysis, namely Ripley's $K$ function. We call our new digraphs utilizing the $K$ function as RK-CCDs. We show that the minimum dominating sets of RK-CCDs estimate and distinguish the clusters from noise clusters in a data set, and hence allow the estimation of the correct number of clusters. Our robust clustering algorithms are comprised of methods that estimate both the number of clusters and the intensity parameter, making them completely parameter free. We conduct Monte Carlo simulations and use real life data sets to compare RK-CCDs with some commonly used density-based and prototype-based clustering methods.	翻訳日:2023-06-10 07:59:19 公開日:2019-12-26
# 散乱に基づく光子-光子相互作用の幾何形状形成 Scattering-based geometric shaping of photon-photon interactions ( http://arxiv.org/abs/1912.11925v1 ) ライセンス: Link先を確認	Shahaf Asban and Shaul Mukamel	(参考訳) 我々は,設計した分子アーキテクチャの振動モードからの散乱放射に基づいて,相互作用するボソンの効果的なハミルトニアンを構築する。光散乱を表す空間モードの無限に可算な集合を利用することで、この基底で可変光子-光子相互作用を得る。有効ハミルトニアンハーミシティは、空間モードの重なりによって設定された幾何学的因子によって制御される。このマッピングを用いて、光の強度測定と相互作用するボソンの相関関数を、実効ハミルトニアンに従って発展させ、局所的にも非局所的に観測可能であることを示す。このアーキテクチャは、相互作用するボソンのダイナミクスをシミュレートしたり、量子コンピューティングアプリケーションにおけるマルチキュービットフォトニックゲートの設計ツールとして利用することができる。モデル系において、ボソンの活性空間の可変ホッピング、相互作用、閉じ込めを実演する。 We construct an effective Hamiltonian of interacting bosons, based on scattered radiation off vibrational modes of designed molecular architectures. Making use of the infinite yet countable set of spatial modes representing the scattering of light, we obtain a variable photon-photon interaction in this basis. The effective Hamiltonian hermiticity is controlled by a geometric factor set by the overlaps of spatial modes. Using this mapping, we relate intensity measurements of the light to correlation functions of the interacting bosons evolving according to the effective Hamiltonian, rendering local as well as nonlocal observables accessible. This architecture may be used to simulate the dynamics of interacting bosons, as well as designing tool for multi-qubit photonic gates in quantum computing applications. Variable hopping, interaction and confinement of the active space of the bosons are demonstrated on a model system.	翻訳日:2023-06-10 07:58:55 公開日:2019-12-26
# PI-GAN:多面顔合成のためのポーズ独立表現学習 PI-GAN: Learning Pose Independent representations for multiple profile face synthesis ( http://arxiv.org/abs/2001.00645v1 ) ライセンス: Link先を確認	Hamed Alqahtani	(参考訳) 複数の顔のポーズビューを単一のポーズから合成できるポーズ不変表現の生成は依然として難しい問題である。このソリューションは、マルチメディアセキュリティ、コンピュータビジョン、ロボティクスなど、さまざまな分野で要求される。 GAN(Generative Adversarial Network)は、現実的な顔合成のために識別器ネットワークに組み込まれたポーズ非依存表現を学習する能力を有するエンコーダ・デコーダ構造を持つ。本稿では,この問題を解決するために,循環型共有エンコーダ・デコーダフレームワーク pigan を提案する。従来のGANと比較して、プライマリ構造から重みを共有し、元のポーズで顔を再構築する二次エンコーダ・デコーダフレームワークで構成されている。主要なフレームワークは、アンタングル表現の作成に焦点を当てており、セカンダリフレームワークは、元の顔の復元を目指している。 CFPの高解像度でリアルなデータセットを使用して、パフォーマンスをチェックします。 Generating a pose-invariant representation capable of synthesizing multiple face pose views from a single pose is still a difficult problem. The solution is demanded in various areas like multimedia security, computer vision, robotics, etc. Generative adversarial networks (GANs) have encoder-decoder structures possessing the capability to learn pose-independent representation incorporated with discriminator network for realistic face synthesis. We present PIGAN, a cyclic shared encoder-decoder framework, in an attempt to solve the problem. As compared to traditional GAN, it consists of secondary encoder-decoder framework sharing weights from the primary structure and reconstructs the face with the original pose. The primary framework focuses on creating disentangle representation, and secondary framework aims to restore the original face. We use CFP high-resolution, realistic dataset to check the performance.	翻訳日:2023-06-10 07:52:58 公開日:2019-12-26
# 注意型グラフ畳み込みネットワークを用いたアカデミックパフォーマンス推定 Academic Performance Estimation with Attention-based Graph Convolutional Networks ( http://arxiv.org/abs/2001.00632v1 ) ライセンス: Link先を確認	Qian Hu, Huzefa Rangwala	(参考訳) 学生の学業成績予測は、学術的軌跡や学位計画、コース推薦システム、早期警告システム、助言システムを含む教育技術を強化する。学生の過去のデータ(前科の成績など)を踏まえると、学生のパフォーマンス予測の課題は将来のコースにおける生徒の成績を予測することである。アカデミックプログラムは、事前コースが将来のコースの基礎となるように構成されている。コースに必要な知識は、グラフ構造によってモデル化された複雑な関係を示す複数の事前のコースを受講することで得られる。学生のパフォーマンス予測のための伝統的な方法は、通常、複数のコース間の基礎的な関係を無視し、生徒がそれらの間の知識を取得する方法である。加えて、従来の方法は意思決定に必要な予測の解釈を提供していない。本研究では,生徒のパフォーマンス予測のための注意に基づくグラフ畳み込みネットワークモデルを提案する。大規模な公立大学から得られた実世界のデータセットについて広範な実験を行った。実験の結果,提案モデルが段階予測の面で最先端のアプローチを上回っていることがわかった。提案モデルでは,失敗や脱落のリスクがある学生を識別することで,生徒にタイムリーな介入やフィードバックを提供することができる。 Student's academic performance prediction empowers educational technologies including academic trajectory and degree planning, course recommender systems, early warning and advising systems. Given a student's past data (such as grades in prior courses), the task of student's performance prediction is to predict a student's grades in future courses. Academic programs are structured in a way that prior courses lay the foundation for future courses. The knowledge required by courses is obtained by taking multiple prior courses, which exhibits complex relationships modeled by graph structures. Traditional methods for student's performance prediction usually neglect the underlying relationships between multiple courses; and how students acquire knowledge across them. In addition, traditional methods do not provide interpretation for predictions needed for decision making. In this work, we propose a novel attention-based graph convolutional networks model for student's performance prediction. We conduct extensive experiments on a real-world dataset obtained from a large public university. The experimental results show that our proposed model outperforms state-of-the-art approaches in terms of grade prediction. The proposed model also shows strong accuracy in identifying students who are at-risk of failing or dropping out so that timely intervention and feedback can be provided to the student.	翻訳日:2023-06-10 07:52:44 公開日:2019-12-26
# 減電圧FPGAにおけるディープラーニングのレジリエンスについて On the Resilience of Deep Learning for Reduced-voltage FPGAs ( http://arxiv.org/abs/2001.00053v1 ) ライセンス: Link先を確認	Kamyar Givaki, Behzad Salami, Reza Hojabr, S. M. Reza Tayaranian, Ahmad Khonsari, Dara Rahmati, Saeid Gorgin, Adrian Cristal, Osman S. Unsal	(参考訳) ディープニューラルネットワーク(DNN)は本質的に計算集約的であり、パワーハングリーでもある。 Field Programmable Gate Arrays (FPGA) のようなハードウェアアクセラレータは、組み込みおよびハイパフォーマンスコンピューティング(HPC)システムの両方の要件を満たす、有望なソリューションである。 fpgaやcpuやgpuでは、名目レベル以下のアグレッシブ電圧スケーリングは、電力散逸を最小化する効果的な手法である。残念ながら、電圧がタイミングの問題によりトランジスタしきい値に近づくとビットフリップの故障が出現し始め、レジリエンスの問題が発生する。本稿では、FPGAの電圧アンダスケーリング関連故障、特にオンチップメモリにおけるDNNのトレーニングフェーズのレジリエンスを実験的に評価する。この目的に向けて、我々はLeNet-5のレジリエンスと、Rectified Linear Unit(Relu)とHyperbolic Tangent(Tanh)の異なるアクティベーション機能を持つCIFAR-10データセットのための特別設計ネットワークを実験的に評価した。最新のFPGAは、極低電圧レベルで十分に堅牢であり、低電圧関連の故障をトレーニングイテレーション中に自動的に隠蔽できるため、ECCのようなコストのかかるソフトウェアやハードウェア指向の故障軽減技術は不要である。精度のギャップを埋めるために、およそ10%のトレーニングイテレーションが必要である。この観測は、実際のFPGAファブリック上で測定された低電圧断層の比較的低い速度、すなわち <0.1\%の結果である。また,ランダムに発生した故障注入キャンペーンにより,LeNet-5ネットワークの故障率を有意に向上させ,トレーニング精度が低下し始めた。フォールトレートが増加すると、tangアクティベーション関数を持つネットワークは、精度の点でreluのネットワークを上回る。例えば、フォールトレートが30%である場合、精度の差は4.92%である。 Deep Neural Networks (DNNs) are inherently computation-intensive and also power-hungry. Hardware accelerators such as Field Programmable Gate Arrays (FPGAs) are a promising solution that can satisfy these requirements for both embedded and High-Performance Computing (HPC) systems. In FPGAs, as well as CPUs and GPUs, aggressive voltage scaling below the nominal level is an effective technique for power dissipation minimization. Unfortunately, bit-flip faults start to appear as the voltage is scaled down closer to the transistor threshold due to timing issues, thus creating a resilience issue. This paper experimentally evaluates the resilience of the training phase of DNNs in the presence of voltage underscaling related faults of FPGAs, especially in on-chip memories. Toward this goal, we have experimentally evaluated the resilience of LeNet-5 and also a specially designed network for CIFAR-10 dataset with different activation functions of Rectified Linear Unit (Relu) and Hyperbolic Tangent (Tanh). We have found that modern FPGAs are robust enough in extremely low-voltage levels and that low-voltage related faults can be automatically masked within the training iterations, so there is no need for costly software- or hardware-oriented fault mitigation techniques like ECC. Approximately 10% more training iterations are needed to fill the gap in the accuracy. This observation is the result of the relatively low rate of undervolting faults, i.e., <0.1\%, measured on real FPGA fabrics. We have also increased the fault rate significantly for the LeNet-5 network by randomly generated fault injection campaigns and observed that the training accuracy starts to degrade. When the fault rate increases, the network with Tanh activation function outperforms the one with Relu in terms of accuracy, e.g., when the fault rate is 30% the accuracy difference is 4.92%.	翻訳日:2023-06-10 07:51:40 公開日:2019-12-26
# 機械学習と埋め込みを用いたアゼルバイジャン語のテキスト分類 Text Classification for Azerbaijani Language Using Machine Learning and Embedding ( http://arxiv.org/abs/1912.13362v1 ) ライセンス: Link先を確認	Umid Suleymanov, Behnam Kiani Kalejahi, Elkhan Amrahov, Rashid Badirkhanli	(参考訳) テキスト分類システムは、アゼルバイジャン語のテキストクラスタリング問題を解決するのに役立つだろう。外国語にはいくつかのテキスト分類アプリケーションがあるが、我々はアゼルバイジャン語でこの問題を解決するために新しく開発されたシステムを構築しようとした。まず、潜在的な実践領域を見つけようとした。このシステムは、多くの分野で役に立つだろう。主にニュースフィードのカテゴリー化に使用される。ニュースサイトは自動的にスポーツ、ビジネス、教育、科学などのクラスに分類することができる。このシステムは製品レビューの感情分析にも使われている。例えば、同社はfacebookで新製品の写真を共有し、新しいプロダクトに対して1000のコメントを受け取る。システムはコメントを肯定的または否定的なカテゴリに分類する。このシステムは、推奨システムやスパムフィルタリングなどにも適用できる。アゼルバイジャン語のテキスト分類問題を解決するために,naive bayes, svm, decision treeなどの機械学習手法が考案されている。 Text classification systems will help to solve the text clustering problem in the Azerbaijani language. There are some text-classification applications for foreign languages, but we tried to build a newly developed system to solve this problem for the Azerbaijani language. Firstly, we tried to find out potential practice areas. The system will be useful in a lot of areas. It will be mostly used in news feed categorization. News websites can automatically categorize news into classes such as sports, business, education, science, etc. The system is also used in sentiment analysis for product reviews. For example, the company shares a photo of a new product on Facebook and the company receives a thousand comments for new products. The systems classify the comments into categories like positive or negative. The system can also be applied in recommended systems, spam filtering, etc. Various machine learning techniques such as Naive Bayes, SVM, Decision Trees have been devised to solve the text classification problem in Azerbaijani language.	翻訳日:2023-06-10 07:50:47 公開日:2019-12-26
# amharic-arabic neural machine translation(英語) Amharic-Arabic Neural Machine Translation ( http://arxiv.org/abs/1912.13161v1 ) ライセンス: Link先を確認	Ibrahim Gashaw and H L Shashirekha	(参考訳) 大規模な並列コーパスを活用して、ヨーロッパの主要言語ペア間で多くの自動翻訳作業が行われているが、並列データの不足のため、アムハラ・アラビア語ペアに関する研究はほとんど行われていない。 2つのLong Short-Term Memory (LSTM) と Gated Recurrent Units (GRU) ベースのNeural Machine Translation (NMT) モデルを開発した。実験を行うために、タンジールで利用可能な既存の単言語アラビア語のテキストと、それと同等のアムハラ語テキストコーパスを修飾して、小さな並列のクルニックテキストコーパスを構築する。 LSTMとGRUベースのNMTモデルとGoogle翻訳システムを比較し,LSTMベースのOpenNMTはGRUベースのOpenNMTとGoogle翻訳システムを上回っ,BLEUスコアは12%,11%,6%であった。 Many automatic translation works have been addressed between major European language pairs, by taking advantage of large scale parallel corpora, but very few research works are conducted on the Amharic-Arabic language pair due to its parallel data scarcity. Two Long Short-Term Memory (LSTM) and Gated Recurrent Units (GRU) based Neural Machine Translation (NMT) models are developed using Attention-based Encoder-Decoder architecture which is adapted from the open-source OpenNMT system. In order to perform the experiment, a small parallel Quranic text corpus is constructed by modifying the existing monolingual Arabic text and its equivalent translation of Amharic language text corpora available on Tanzile. LSTM and GRU based NMT models and Google Translation system are compared and found that LSTM based OpenNMT outperforms GRU based OpenNMT and Google Translation system, with a BLEU score of 12%, 11%, and 6% respectively.	翻訳日:2023-06-10 07:50:33 公開日:2019-12-26
# 拡張畳み込みを伴うu-netによる大腸ポリープの分画 Colorectal Polyp Segmentation by U-Net with Dilation Convolution ( http://arxiv.org/abs/1912.11947v1 ) ライセンス: Link先を確認	Xinzi Sun, Pengfei Zhang, Dechun Wang, Yu Cao, Benyuan Liu	(参考訳) 大腸癌 (crc) は、アメリカ合衆国で最も一般的に診断されたがんの1つであり、最も多いがんの死因である。大腸または直腸の intima で増殖する大腸ポリープは,CRC にとって重要な前駆体である。現在最も一般的な大腸ポリープの検出方法と先天的な病理学は大腸内視鏡である。したがって,大腸内視鏡検査における大腸ポリープの正確な分画はcrc早期発見と予防において大きな臨床的意義を有する。本稿では,大腸ポリープセグメント化のためのエンド・ツー・エンドのディープラーニングフレームワークを提案する。我々が設計したモデルでは,マルチスケールな意味的特徴を抽出するエンコーダと,特徴写像をポリプセグメンテーションマップに拡張するデコーダから構成される。エンコーダの特徴表現能力は拡張畳み込みを導入して向上し、高レベルな意味的特徴を解像度を低下させることなく学習する。さらに、従来のアーキテクチャよりも少ないパラメータでマルチスケールのセマンティック機能を組み合わせた簡易デコーダを設計する。さらに,出力セグメンテーションマップに3つのポストプロセッシング手法を適用し,大腸ポリープ検出性能を向上させる。本手法はCVC-ClinicDBとETIS-Larib Polyp DBの最先端結果を実現する。 Colorectal cancer (CRC) is one of the most commonly diagnosed cancers and a leading cause of cancer deaths in the United States. Colorectal polyps that grow on the intima of the colon or rectum is an important precursor for CRC. Currently, the most common way for colorectal polyp detection and precancerous pathology is the colonoscopy. Therefore, accurate colorectal polyp segmentation during the colonoscopy procedure has great clinical significance in CRC early detection and prevention. In this paper, we propose a novel end-to-end deep learning framework for the colorectal polyp segmentation. The model we design consists of an encoder to extract multi-scale semantic features and a decoder to expand the feature maps to a polyp segmentation map. We improve the feature representation ability of the encoder by introducing the dilated convolution to learn high-level semantic features without resolution reduction. We further design a simplified decoder which combines multi-scale semantic features with fewer parameters than the traditional architecture. Furthermore, we apply three post processing techniques on the output segmentation map to improve colorectal polyp detection performance. Our method achieves state-of-the-art results on CVC-ClinicDB and ETIS-Larib Polyp DB.	翻訳日:2023-06-10 07:50:12 公開日:2019-12-26
# 人工知能の道徳について On the Morality of Artificial Intelligence ( http://arxiv.org/abs/1912.11945v1 ) ライセンス: Link先を確認	Alexandra Luccioni and Yoshua Bengio	(参考訳) 人工知能の社会的および倫理的影響に関する既存の研究の多くは、機械学習(ML)と他の人工知能(AI)アルゴリズム(IEEE, 2017, Jobin et al., 2019)を取り巻く倫理的原則とガイドラインの定義に焦点が当てられている。これは、AIの適切な社会的規範を定義するのに非常に有用であるが、我々はMLの可能性とリスクを議論し、コミュニティに有益な目的のためにMLを使うよう促すことが同様に重要であると信じている。本稿では、特にML実践者を対象としており、後者に重点を置いて、既存の高度な倫理的枠組みとガイドラインの概要を概観するが、それ以上に、ML研究と展開のための概念的および実践的原則とガイドラインを提案し、実践者がより倫理的かつ道徳的なMLの実践を社会的な目的に活用するための具体的な行動を主張している。 Much of the existing research on the social and ethical impact of Artificial Intelligence has been focused on defining ethical principles and guidelines surrounding Machine Learning (ML) and other Artificial Intelligence (AI) algorithms [IEEE, 2017, Jobin et al., 2019]. While this is extremely useful for helping define the appropriate social norms of AI, we believe that it is equally important to discuss both the potential and risks of ML and to inspire the community to use ML for beneficial objectives. In the present article, which is specifically aimed at ML practitioners, we thus focus more on the latter, carrying out an overview of existing high-level ethical frameworks and guidelines, but above all proposing both conceptual and practical principles and guidelines for ML research and deployment, insisting on concrete actions that can be taken by practitioners to pursue a more ethical and moral practice of ML aimed at using AI for social good.	翻訳日:2023-06-10 07:49:51 公開日:2019-12-26

Title

Authors

Abstract

論文公表日・翻訳日

# 雑音耐性学習のための特徴注意グラフ畳み込みネットワーク

Feature-Attention Graph Convolutional Networks for Noise Resilient Learning ( http://arxiv.org/abs/1912.11755v1 )

ライセンス: Link先を確認

Min Shi, Yufei Tang, Xingquan Zhu and Jianxun Liu

(参考訳) ノイズと不整合は、人間のプライバシーやユーザーのプライバシーに固有のエラーが発生するため、現実世界の情報ネットワークに一般的に存在する。これまで、ノードの内容とトポロジ構造を統合することで、最新のGraph Convolutional Networks(GCN)や注目GCNなど、機能学習をネットワークから進めるための大きな努力が続けられてきた。しかし、既存の手法はすべてネットワークをエラーのないソースとみなし、各ノードの機能内容は独立であり、ノード関係のモデル化に等しく重要であるとして扱う。誤ったノードコンテンツとスパース機能を組み合わせることで、実世界のノイズの多いネットワークで使用される既存のメソッドに不可欠な課題を提供する。本稿では,ノイズの多いノード内容のネットワークを扱うための特徴注意グラフ畳み込み学習フレームワークであるFA-GCNを提案する。各ノードのノイズやスパースコンテンツに対処するため、fa-gcnはまずlong short-term memory (lstm) ネットワークを使用して、各特徴の密表現を学ぶ。隣接ノード間の相互作用をモデル化するために、隣接ノードが接続に関して特徴の重要性を学習し、変化させることができる機能アテンション機構が導入された。スペクトルベースのグラフ畳み込み集約プロセスを用いることで、各ノードは、対応する学習課題に対応する最も決定的な近傍特徴に集中することができる。実験と検証、すなわち異なるノイズレベルは、FA-GCNがノイズのないネットワークとノイズのないネットワークの両方で最先端の手法よりも優れた性能を発揮することを示した。

Noise and inconsistency commonly exist in real-world information networks, due to inherent error-prone nature of human or user privacy concerns. To date, tremendous efforts have been made to advance feature learning from networks, including the most recent Graph Convolutional Networks (GCN) or attention GCN, by integrating node content and topology structures. However, all existing methods consider networks as error-free sources and treat feature content in each node as independent and equally important to model node relations. The erroneous node content, combined with sparse features, provide essential challenges for existing methods to be used on real-world noisy networks. In this paper, we propose FA-GCN, a feature-attention graph convolution learning framework, to handle networks with noisy and sparse node content. To tackle noise and sparse content in each node, FA-GCN first employs a long short-term memory (LSTM) network to learn dense representation for each feature. To model interactions between neighboring nodes, a feature-attention mechanism is introduced to allow neighboring nodes learn and vary feature importance, with respect to their connections. By using spectral-based graph convolution aggregation process, each node is allowed to concentrate more on the most determining neighborhood features aligned with the corresponding learning task. Experiments and validations, w.r.t. different noise levels, demonstrate that FA-GCN achieves better performance than state-of-the-art methods on both noise-free and noisy networks.

翻訳日:2023-06-10 08:39:01 公開日:2019-12-26

# 相関コスト付き多視点ステレオの逆深さ回帰学習

Learning Inverse Depth Regression for Multi-View Stereo with Correlation Cost Volume ( http://arxiv.org/abs/1912.11746v1 )

ライセンス: Link先を確認

Qingshan Xu and Wenbing Tao

(参考訳) 深層学習は多視点ステレオ(MVS)の深部推論に有効であることが示されている。しかし、この領域ではスケーラビリティと正確性は依然として未解決の問題である。これはメモリ消費コストのボリューム表現と不適切な深さ推論に起因する。ステレオマッチングにおけるグループワイド相関に着想を得て,軽量なコストボリュームを構築するための平均グループワイド相関類似度尺度を提案する。これにより、メモリ消費を削減できるだけでなく、コストボリュームフィルタリングの計算負担を軽減できる。実効的なコスト容積表現に基づいて,コスト容積を正規化して性能をさらに向上するカスケード3次元U-Netモジュールを提案する。多視点深度推論を深度回帰問題や逆深度分類問題として扱う従来の手法とは異なり、多視点深度推論を逆深度回帰問題として再放送する。これにより,サブピクセル推定が可能となり,大規模シーンに適用できる。 DTUデータセットとタンク・アンド・テンプルデータセットに関する広範な実験を通して、我々の提案する相関コストボリュームと逆深さ回帰(CIDER)によるネットワークが最先端の結果を達成し、スケーラビリティと精度に優れた性能を示すことを示す。

Deep learning has shown to be effective for depth inference in multi-view stereo (MVS). However, the scalability and accuracy still remain an open problem in this domain. This can be attributed to the memory-consuming cost volume representation and inappropriate depth inference. Inspired by the group-wise correlation in stereo matching, we propose an average group-wise correlation similarity measure to construct a lightweight cost volume. This can not only reduce the memory consumption but also reduce the computational burden in the cost volume filtering. Based on our effective cost volume representation, we propose a cascade 3D U-Net module to regularize the cost volume to further boost the performance. Unlike the previous methods that treat multi-view depth inference as a depth regression problem or an inverse depth classification problem, we recast multi-view depth inference as an inverse depth regression task. This allows our network to achieve sub-pixel estimation and be applicable to large-scale scenes. Through extensive experiments on DTU dataset and Tanks and Temples dataset, we show that our proposed network with Correlation cost volume and Inverse DEpth Regression (CIDER), achieves state-of-the-art results, demonstrating its superior performance on scalability and accuracy.

翻訳日:2023-06-10 08:37:49 公開日:2019-12-26

# 平面前処理型パッチマッチマルチビューステレオ

Planar Prior Assisted PatchMatch Multi-View Stereo ( http://arxiv.org/abs/1912.11744v1 )

ライセンス: Link先を確認

Qingshan Xu and Wenbing Tao

(参考訳) 3次元モデルの完全性は、低テクスチャ領域における信頼性の低い光度一貫性のため、マルチビューステレオ(MVS)では依然として難しい問題である。低テクスチャ領域は通常強い平面性を示すため、平面モデルは低テクスチャ領域の深さ推定に有利である。一方、PatchMatchのマルチビューステレオはサンプリングおよび伝搬方式において非常に効率的である。本稿では,平面モデルとパッチマッチ多視点ステレオを利用して,平面事前支援パッチマッチ多視点ステレオフレームワークを提案する。詳細は確率的グラフィカルモデルを用いて、平面モデルをPatchMatchマルチビューステレオに埋め込み、新しいマルチビュー集約マッチングコストを貢献する。この新しいコストは、フォトメトリック一貫性と平面互換性の両方を考慮しており、非平面領域と平面領域の両方の深さ推定に適している。実験結果から,本手法は極めて低テクスチャ領域の深度情報を効率よく回収し,高精度な3Dモデルと最先端性能を実現することができることがわかった。

The completeness of 3D models is still a challenging problem in multi-view stereo (MVS) due to the unreliable photometric consistency in low-textured areas. Since low-textured areas usually exhibit strong planarity, planar models are advantageous to the depth estimation of low-textured areas. On the other hand, PatchMatch multi-view stereo is very efficient for its sampling and propagation scheme. By taking advantage of planar models and PatchMatch multi-view stereo, we propose a planar prior assisted PatchMatch multi-view stereo framework in this paper. In detail, we utilize a probabilistic graphical model to embed planar models into PatchMatch multi-view stereo and contribute a novel multi-view aggregated matching cost. This novel cost takes both photometric consistency and planar compatibility into consideration, making it suited for the depth estimation of both non-planar and planar regions. Experimental results demonstrate that our method can efficiently recover the depth information of extremely low-textured areas, thus obtaining high complete 3D models and achieving state-of-the-art performance.

翻訳日:2023-06-10 08:37:31 公開日:2019-12-26

# HTTP上の動的適応ストリーミングのためのアンサンブルレート適応フレームワーク

An Ensemble Rate Adaptation Framework for Dynamic Adaptive Streaming Over HTTP ( http://arxiv.org/abs/1912.11822v1 )

ライセンス: Link先を確認

Hui Yuan, Xiaoqian Hu, Junhui Hou, Xuekai Wei, and Sam Kwong

(参考訳) HTTP(DASH)上の動的適応ストリーミングにおいて、レート適応は最も重要な問題のひとつである。ネットワーク帯域幅の頻繁な変動とビデオコンテンツの複雑な変動のため、単一レート適応法を用いて、ネットワーク条件や動画コンテンツを完璧に扱うことは困難である。本稿では,DASHのためのアンサンブルレート適応フレームワークを提案する。このフレームワークに関係する複数の手法の利点を活用し,ユーザ体験の質(QoE)を向上させることを目的とする。提案されたフレームワークはシンプルだが、非常に効果的である。具体的には,提案するフレームワークは,メソッドプールとメソッドコントローラという2つのモジュールから構成される。メソッド・プールでは、いくつかのレート・アダプ・テイション・メソッドが統合される。各決定時刻に最適なqoeを達成する方法のみを選択し、要求されたビデオセグメントのビットレートを決定する。また,最も優れたqoeを提供する方法を決定するための方法コントローラに対して,切替方式,即席切換方式,間欠的切換方式の2つの戦略を提案する。シミュレーションの結果,提案するフレームワークは,常にチャネル環境やビデオの複雑さの変化に対して高いQoEを達成していることがわかった。

Rate adaptation is one of the most important issues in dynamic adaptive streaming over HTTP (DASH). Due to the frequent fluctuations of the network bandwidth and complex variations of video content, it is difficult to deal with the varying network conditions and video content perfectly by using a single rate adaptation method. In this paper, we propose an ensemble rate adaptation framework for DASH, which aims to leverage the advantages of multiple methods involved in the framework to improve the quality of experience (QoE) of users. The proposed framework is simple yet very effective. Specifically, the proposed framework is composed of two modules, i.e., the method pool and method controller. In the method pool, several rate adap tation methods are integrated. At each decision time, only the method that can achieve the best QoE is chosen to determine the bitrate of the requested video segment. Besides, we also propose two strategies for switching methods, i.e., InstAnt Method Switching, and InterMittent Method Switching, for the method controller to determine which method can provide the best QoEs. Simulation results demonstrate that, the proposed framework always achieves the highest QoE for the change of channel environment and video complexity, compared with state-of-the-art rate adaptation methods.

翻訳日:2023-06-10 08:32:13 公開日:2019-12-26

# 分解圧縮空間におけるロバスト辞書学習によるハイブリッド表現の学習

Learning Hybrid Representation by Robust Dictionary Learning in Factorized Compressed Space ( http://arxiv.org/abs/1912.11785v1 )

ライセンス: Link先を確認

Jiahuan Ren, Zhao Zhang, Sheng Li, Yang Wang, Guangcan Liu, Shuicheng Yan, Meng Wang

(参考訳) 本稿では,頑健な辞書学習 (dl) について検討し,因子化圧縮空間における有価な低ランク表現とスパース表現のハイブリッドを探索する。共同ロバスト因子化と投影辞書学習(J-RFDL)モデルを提案する。 J-RFDLの設定は、データ内の外れ値やノイズに対するロバスト性を向上し、再構成誤差をより正確に符号化し、正確な復元能力を有するハイブリッド唾液係数を得ることにより、データ表現を改善することを目的としている。特に、J-RFDLは、分解された圧縮空間におけるDLによるロバスト表現を行い、結果に対するノイズや外れ値の影響を排除し、DLプロセスの効率も向上する。符号化プロセスをデータのノイズに頑健にするため、j-rfdlはスパースl2、1-ノルムを使用しており、リコンストラクションエラーの行をゼロにすることで、ファクタライゼーションとリコンストラクションエラーを最小化することができる。 J-RFDLは、与えられたデータを適切に再構成するために優れた構造を持つ健全な係数を提供するために、埋め込み係数に結合した低ランクかつスパースな制約を合成辞書で課す。また,J-RFDLを結合分類用として拡張し,その分類誤差を最小化することにより,学習者の識別能力を向上する識別的J-RFDLモデルを提案する。公開データセットに関する広範な実験は、私たちの定式化が、他の最先端の方法よりも優れたパフォーマンスを提供できることを証明します。

In this paper, we investigate the robust dictionary learning (DL) to discover the hybrid salient low-rank and sparse representation in a factorized compressed space. A Joint Robust Factorization and Projective Dictionary Learning (J-RFDL) model is presented. The setting of J-RFDL aims at improving the data representations by enhancing the robustness to outliers and noise in data, encoding the reconstruction error more accurately and obtaining hybrid salient coefficients with accurate reconstruction ability. Specifically, J-RFDL performs the robust representation by DL in a factorized compressed space to eliminate the negative effects of noise and outliers on the results, which can also make the DL process efficient. To make the encoding process robust to noise in data, J-RFDL clearly uses sparse L2, 1-norm that can potentially minimize the factorization and reconstruction errors jointly by forcing rows of the reconstruction errors to be zeros. To deliver salient coefficients with good structures to reconstruct given data well, J-RFDL imposes the joint low-rank and sparse constraints on the embedded coefficients with a synthesis dictionary. Based on the hybrid salient coefficients, we also extend J-RFDL for the joint classification and propose a discriminative J-RFDL model, which can improve the discriminating abilities of learnt coeffi-cients by minimizing the classification error jointly. Extensive experiments on public datasets demonstrate that our formulations can deliver superior performance over other state-of-the-art methods.

翻訳日:2023-06-10 08:30:53 公開日:2019-12-26

# ロボットエレベータ・ボタン認識のための視点歪みの自動除去

Autonomous Removal of Perspective Distortion for Robotic Elevator Button Recognition ( http://arxiv.org/abs/1912.11774v1 )

ライセンス: Link先を確認

Delong Zhu, Jianbang Liu, Nachuan Ma, Zhe Min, and Max Q.-H. Meng

(参考訳) エレベータボタン認識は,移動ロボットの自律エレベータ操作を実現する上で欠かせない機能であると考えられる。しかし、好ましくない画像条件と様々な画像歪みにより、認識精度は向上していない。本稿では,エレベーターパネル画像の視点歪みを自律的に補正するアルゴリズムを提案する。このアルゴリズムはまずGaussian Mixture Model(GMM)を用いてボタン認識結果に基づいてグリッドフィッティング処理を行い、次に推定されたグリッドセンターを基準としてカメラの動きを推定して視点歪みを補正する。このアルゴリズムは、1つの画像を自律的に実行し、明示的な特徴検出や特徴マッチングの手順を必要としない。このアルゴリズムの有効性を検証するために,異なる角度から撮影された50画像のエレベータパネルデータセットを収集した。実験の結果,提案アルゴリズムはカメラの動きを正確に推定し,視点歪みを効果的に除去できることがわかった。

Elevator button recognition is considered an indispensable function for enabling the autonomous elevator operation of mobile robots. However, due to unfavorable image conditions and various image distortions, the recognition accuracy remains to be improved. In this paper, we present a novel algorithm that can autonomously correct perspective distortions of elevator panel images. The algorithm first leverages the Gaussian Mixture Model (GMM) to conduct a grid fitting process based on button recognition results, then utilizes the estimated grid centers as reference features to estimate camera motions for correcting perspective distortions. The algorithm performs on a single image autonomously and does not need explicit feature detection or feature matching procedure, which is much more robust to noises and outliers than traditional feature-based geometric approaches. To verify the effectiveness of the algorithm, we collect an elevator panel dataset of 50 images captured from different angles of view. Experimental results show that the proposed algorithm can accurately estimate camera motions and effectively remove perspective distortions.

翻訳日:2023-06-10 08:30:23 公開日:2019-12-26

# チームスポーツにおける結果を予測するための機械学習技術の応用:レビュー

The Application of Machine Learning Techniques for Predicting Results in Team Sport: A Review ( http://arxiv.org/abs/1912.11762v1 )

ライセンス: Link先を確認

Rory Bunker (1), Teo Susnjak (2) ((1) Nagoya Institute of Technology, Japan, (2) Massey University, Auckland, New Zealand)

(参考訳) 過去20年間で、スポーツの結果を予測するために機械学習(ML)技術がますます利用されるようになった。本稿では,チームスポーツの結果を予測するためにMLを用いた研究のレビューを行い,1996年から2019年までの研究を取り上げる。我々はこの分野の論文を幅広く調査しながら、5つの重要な研究課題に答えようとした。本稿は、この分野でMLアルゴリズムが使われる傾向にあること、および、成功した結果が出現し始めていることを考察する。本研究は,本アプリケーション領域における精度評価のための頑健な戦略を明らかにする。本研究は,様々なスポーツで達成されたアキュラティティーを考察し,チームスポーツの結果は本質的に他のスポーツよりも予測が困難であると考える。最後に、この研究は、すべての調査論文における将来の研究の方向性に関する共通のテーマを明らかにし、ギャップと機会を探しながら、この分野の将来の研究者への勧告を提案している。

Over the past two decades, Machine Learning (ML) techniques have been increasingly utilized for the purpose of predicting outcomes in sport. In this paper, we provide a review of studies that have used ML for predicting results in team sport, covering studies from 1996 to 2019. We sought to answer five key research questions while extensively surveying papers in this field. This paper offers insights into which ML algorithms have tended to be used in this field, as well as those that are beginning to emerge with successful outcomes. Our research highlights defining characteristics of successful studies and identifies robust strategies for evaluating accuracy results in this application domain. Our study considers accuracies that have been achieved across different sports and explores the notion that outcomes of some team sports could be inherently more difficult to predict than others. Finally, our study uncovers common themes of future research directions across all surveyed papers, looking for gaps and opportunities, while proposing recommendations for future researchers in this domain.

翻訳日:2023-06-10 08:30:05 公開日:2019-12-26

# マルチラベルグラフ畳み込みネットワーク表現学習

Multi-Label Graph Convolutional Network Representation Learning ( http://arxiv.org/abs/1912.11757v1 )

ライセンス: Link先を確認

Min Shi, Yufei Tang, Xingquan Zhu and Jianxun Liu

(参考訳) グラフベースのシステムの知識表現は多くの分野において基本的なものである。しかし、現実世界のオブジェクト(ノード)は本質的に複雑であり、しばしばリッチな意味論やラベルを含んでいる。例えば、ユーザはソーシャルネットワークの様々な関心グループに属し、多くのアプリケーションでマルチラベルネットワークとなる。マルチラベルネットワークノードは、各ノードに複数のラベルを持つだけでなく、これらのラベルは、しばしば高い相関関係にあり、既存の手法がノード表現学習において非効率であるか、あるいはそのような相関を処理できない。本稿では,マルチラベルネットワークのためのノード表現学習のための新しいマルチラベルグラフ畳み込みネットワーク(ML-GCN)を提案する。本稿では,ラベル-ラベル相関とネットワークトポロジ構造について,ノード-ラベルグラフとラベル-ラベル-ノードグラフという2つのSiamese GCNとしてモデル化する。 2つのGCNはそれぞれノードとラベルの表現学習の1つの側面を扱い、1つの目的関数の下でシームレスに統合される。学習されたラベル表現は、インナーラベルインタラクションとノードラベルプロパティを効果的に保存することができ、統合トレーニングフレームワークの下でノード表現学習を強化するために集約される。マルチラベルノード分類の実験と比較により,提案手法の有効性が検証された。

Knowledge representation of graph-based systems is fundamental across many disciplines. To date, most existing methods for representation learning primarily focus on networks with simplex labels, yet real-world objects (nodes) are inherently complex in nature and often contain rich semantics or labels, e.g., a user may belong to diverse interest groups of a social network, resulting in multi-label networks for many applications. The multi-label network nodes not only have multiple labels for each node, such labels are often highly correlated making existing methods ineffective or fail to handle such correlation for node representation learning. In this paper, we propose a novel multi-label graph convolutional network (ML-GCN) for learning node representation for multi-label networks. To fully explore label-label correlation and network topology structures, we propose to model a multi-label network as two Siamese GCNs: a node-node-label graph and a label-label-node graph. The two GCNs each handle one aspect of representation learning for nodes and labels, respectively, and they are seamlessly integrated under one objective function. The learned label representations can effectively preserve the inner-label interaction and node label properties, and are then aggregated to enhance the node representation learning under a unified training framework. Experiments and comparisons on multi-label node classification validate the effectiveness of our proposed approach.

翻訳日:2023-06-10 08:29:14 公開日:2019-12-26

# スペクトル変動する空間的ぼかし下におけるハイパースペクトル・マルチスペクトル画像融合 -高次元赤外画像への応用-

Hyperspectral and multispectral image fusion under spectrally varying spatial blurs -- Application to high dimensional infrared astronomical imaging ( http://arxiv.org/abs/1912.11868v1 )

ライセンス: Link先を確認

Claire Guilloteau, Thomas Oberlin, Olivier Bern\'e and Nicolas Dobigeon

(参考訳) ハイパースペクトルイメージングは、過去数十年間、天文学者にとって貴重なデータ源となっている。現在の計器と観測時間の制約により、高い空間的かつ低いスペクトル分解能を持つマルチスペクトル画像と、低い空間的かつ高いスペクトル分解能を持つハイパースペクトル画像の直接取得が可能になる。データの科学的解釈を向上させるために,各画像の利点を組み合わせて高スペクトル分解能データキューブを復元するデータ融合法を提案する。提案された逆問題は、スペクトル変動のぼかしのような天文学機器の特異性を説明する。周波数領域と低次元部分空間の問題を解くことで高速な実装を提供し、畳み込み演算子とデータの高次元性を効率的に扱う。我々は、ジェームズ・ウェッブ宇宙望遠鏡のシミュレーション観測のリアルな合成データセットの実験を行い、この融合アルゴリズムは地球観測のためのリモートセンシングで一般的に用いられる最先端の手法よりも優れていることを示す。

Hyperspectral imaging has become a significant source of valuable data for astronomers over the past decades. Current instrumental and observing time constraints allow direct acquisition of multispectral images, with high spatial but low spectral resolution, and hyperspectral images, with low spatial but high spectral resolution. To enhance scientific interpretation of the data, we propose a data fusion method which combines the benefits of each image to recover a high spatio-spectral resolution datacube. The proposed inverse problem accounts for the specificities of astronomical instruments, such as spectrally variant blurs. We provide a fast implementation by solving the problem in the frequency domain and in a low-dimensional subspace to efficiently handle the convolution operators as well as the high dimensionality of the data. We conduct experiments on a realistic synthetic dataset of simulated observation of the upcoming James Webb Space Telescope, and we show that our fusion algorithm outperforms state-of-the-art methods commonly used in remote sensing for Earth observation.

翻訳日:2023-06-10 08:20:43 公開日:2019-12-26

# 神経ファジィ推論システムによるソフトウェア活動推定:過去と現在

Software Effort Estimation using Neuro Fuzzy Inference System: Past and Present ( http://arxiv.org/abs/1912.11855v1 )

ライセンス: Link先を確認

Aditi Sharma, Ravi Ranjan

(参考訳) プロジェクト失敗の最も重要な理由は、努力の少ない見積もりです。ソフトウェア開発の労力見積は、開発に適切なチームメンバを割り当てたり、ソフトウェア開発にリソースを割り当てたり、結合したりするために必要です。不正確なソフトウェア見積は、プロジェクトの遅延、予算過剰、あるいはプロジェクトのキャンセルにつながる可能性がある。しかし、労力推定モデルはそれほど効率的ではない。本稿では,ニューロファジィ推論システム(NFIS)の新たな評価手法について検討する。人工知能のコンポーネントとファジィ論理を融合した混合モデルであり、より良い推定を行うことができる。

Most important reason for project failure is poor effort estimation. Software development effort estimation is needed for assigning appropriate team members for development, allocating resources for software development, binding etc. Inaccurate software estimation may lead to delay in project, over-budget or cancellation of the project. But the effort estimation models are not very efficient. In this paper, we are analyzing the new approach for estimation i.e. Neuro Fuzzy Inference System (NFIS). It is a mixture model that consolidates the components of artificial neural network with fuzzy logic for giving a better estimation.

翻訳日:2023-06-10 08:19:56 公開日:2019-12-26

# 対人ロバストネスのベンチマーク

Benchmarking Adversarial Robustness ( http://arxiv.org/abs/1912.11852v1 )

ライセンス: Link先を確認

Yinpeng Dong, Qi-An Fu, Xiao Yang, Tianyu Pang, Hang Su, Zihao Xiao, Jun Zhu

(参考訳) ディープニューラルネットワークは敵の例に弱いため、ディープラーニングの開発において最も重要な研究課題の1つとなっている。近年、多くの努力がなされているが、敵攻撃と防御アルゴリズムの正確かつ完全な評価を行うことは極めて重要である。本稿では,画像分類タスクにおける敵対的ロバスト性を評価するために,包括的で厳密でコヒーレントなベンチマークを確立する。代表的な攻撃法と防御法を簡潔に検討した後,2つのロバスト性曲線を公正な評価基準として大規模実験を行い,その性能を完全に把握した。評価結果に基づいて,いくつかの重要な知見を導き,今後の研究への洞察を提供する。

Deep neural networks are vulnerable to adversarial examples, which becomes one of the most important research problems in the development of deep learning. While a lot of efforts have been made in recent years, it is of great significance to perform correct and complete evaluations of the adversarial attack and defense algorithms. In this paper, we establish a comprehensive, rigorous, and coherent benchmark to evaluate adversarial robustness on image classification tasks. After briefly reviewing plenty of representative attack and defense methods, we perform large-scale experiments with two robustness curves as the fair-minded evaluation criteria to fully understand the performance of these methods. Based on the evaluation results, we draw several important findings and provide insights for future research.

翻訳日:2023-06-10 08:19:30 公開日:2019-12-26

# ラベルプロパゲーションとリファインメントを用いた効率的なビデオセマンティックセマンティックセグメンテーション

Efficient Video Semantic Segmentation with Labels Propagation and Refinement ( http://arxiv.org/abs/1912.11844v1 )

ライセンス: Link先を確認

Matthieu Paul, Christoph Mayer, Luc Van Gool, Radu Timofte

(参考訳) 本稿では,ハイブリッドGPU/CPUを用いた高精細ビデオのリアルタイムセマンティックセマンティックセマンティック化の問題に取り組む。我々は,効率的なビデオセグメンテーション(evs)パイプラインを提案する。 i)CPU上では,映像の時間的側面を利用して,あるフレームから次のフレームへ意味情報を伝達する,非常に高速な光フロー法が用いられる。 GPUと並行して動作する。 (ii)GPUでは、2つの畳み込みニューラルネットワーク:スクラッチから密接なセマンティックラベルを予測するために使用される主セグメンテーションネットワークと、高速不整合注意モジュール(IAM)の助けを借りて、以前のフレームからの予測を改善するように設計されたRefinerである。後者は、正確に伝播できない領域を識別することができる。所望のフレームレートと精度に応じて,いくつかの操作点を提案する。我々のパイプラインは、既存のリアルタイム画像分割法(mIoU 60%以上)と競合する精度を達成し、フレームレートをはるかに高めている。高解像度フレーム(2048 x 1024)を持つ一般的なCityscapesデータセットでは、単一のGPUとCPU上で80から1000Hzの動作ポイントが提案されている。

This paper tackles the problem of real-time semantic segmentation of high definition videos using a hybrid GPU / CPU approach. We propose an Efficient Video Segmentation(EVS) pipeline that combines: (i) On the CPU, a very fast optical flow method, that is used to exploit the temporal aspect of the video and propagate semantic information from one frame to the next. It runs in parallel with the GPU. (ii) On the GPU, two Convolutional Neural Networks: A main segmentation network that is used to predict dense semantic labels from scratch, and a Refiner that is designed to improve predictions from previous frames with the help of a fast Inconsistencies Attention Module (IAM). The latter can identify regions that cannot be propagated accurately. We suggest several operating points depending on the desired frame rate and accuracy. Our pipeline achieves accuracy levels competitive to the existing real-time methods for semantic image segmentation(mIoU above 60%), while achieving much higher frame rates. On the popular Cityscapes dataset with high resolution frames (2048 x 1024), the proposed operating points range from 80 to 1000 Hz on a single GPU and CPU.

翻訳日:2023-06-10 08:19:00 公開日:2019-12-26

# スパースオートエンコーダを用いたIoTネットワークにおける異常通信検出

Anomalous Communications Detection in IoT Networks Using Sparse Autoencoders ( http://arxiv.org/abs/1912.11831v1 )

ライセンス: Link先を確認

Mustafizur Rahman Shahid (SAMOVAR), Gregory Blanc (SAMOVAR), Zonghua Zhang (SAMOVAR), Herv\'e Debar (SAMOVAR)

(参考訳) 今日では、スマートホームやeヘルスケアなど、さまざまなスマートサービスを実現するために、IoTデバイスが広くデプロイされている。しかし、多くのIoTデバイスが脆弱であるため、セキュリティは依然として最重要課題の1つだ。さらに、IoTマルウェアは常に進化し、洗練されています。 IoTデバイスは非常に特殊なタスクを実行することを意図しているため、ネットワークの動作は合理的に安定し、予測可能であることが期待される。通常のパターンからの重要な行動偏差は異常事象を示す。本稿では,スパースオートエンコーダを用いて,IoTネットワークにおける異常なネットワーク通信を検出する手法を提案する。提案手法により、悪意のある通信を正当な通信と区別することができる。そのため、デバイスが侵害された場合、デバイスが提供するサービスが完全に中断されることなく、悪意のある通信のみを削除できる。ネットワークの振舞いを特徴付けるため,最初のNパケットの大きさの統計を用いて双方向TCPフローを抽出し,それに対応するパケット間時間に関する統計を用いて記述する。次に、スパースオートエンコーダのセットを訓練し、実験的なスマートホームネットワークによって生成された正当な通信のプロファイルを学習する。 Nの値に依存すると、開発モデルは86.9%から91.2%の攻撃検出率と0.1%から0.5%の偽陽性率を達成する。

Nowadays, IoT devices have been widely deployed for enabling various smart services, such as, smart home or e-healthcare. However, security remains as one of the paramount concern as many IoT devices are vulnerable. Moreover, IoT malware are constantly evolving and getting more sophisticated. IoT devices are intended to perform very specific tasks, so their networking behavior is expected to be reasonably stable and predictable. Any significant behavioral deviation from the normal patterns would indicate anomalous events. In this paper, we present a method to detect anomalous network communications in IoT networks using a set of sparse autoencoders. The proposed approach allows us to differentiate malicious communications from legitimate ones. So that, if a device is compromised only malicious communications can be dropped while the service provided by the device is not totally interrupted. To characterize network behavior, bidirectional TCP flows are extracted and described using statistics on the size of the first N packets sent and received, along with statistics on the corresponding inter-arrival times between packets. A set of sparse autoencoders is then trained to learn the profile of the legitimate communications generated by an experimental smart home network. Depending on the value of N, the developed model achieves attack detection rates ranging from 86.9% to 91.2%, and false positive rates ranging from 0.1% to 0.5%.

翻訳日:2023-06-10 08:18:23 公開日:2019-12-26

# 準ニュートン信頼地域政策最適化

Quasi-Newton Trust Region Policy Optimization ( http://arxiv.org/abs/1912.11912v1 )

ライセンス: Link先を確認

Devesh Jha, Arvind Raghunathan, Diego Romeres

(参考訳) 本稿では,ヘシアンに対して準ニュートン近似を用いた信頼領域最適化手法である準ニュートン信頼領域最適化qntrpoを提案する。勾配降下は連続制御による強化学習タスクのためのデファクトアルゴリズムである。このアルゴリズムは、幅広いタスクにわたる強化学習において、最先端のパフォーマンスを達成した。しかし、アルゴリズムには多くの欠点がある: ステップの欠如選択基準の欠如、収束の遅さ。政策最適化のために,ドレグステップと準ニュートン近似を用いた信頼領域法について検討した。我々は, サンプル数の観点から, 選択が効率的で, 性能が向上する, 幅広い難解な連続制御タスクについて, 数値実験により実証する。

We propose a trust region method for policy optimization that employs Quasi-Newton approximation for the Hessian, called Quasi-Newton Trust Region Policy Optimization QNTRPO. Gradient descent is the de facto algorithm for reinforcement learning tasks with continuous controls. The algorithm has achieved state-of-the-art performance when used in reinforcement learning across a wide range of tasks. However, the algorithm suffers from a number of drawbacks including: lack of stepsize selection criterion, and slow convergence. We investigate the use of a trust region method using dogleg step and a Quasi-Newton approximation for the Hessian for policy optimization. We demonstrate through numerical experiments over a wide range of challenging continuous control tasks that our particular choice is efficient in terms of number of samples and improves performance

翻訳日:2023-06-10 08:11:57 公開日:2019-12-26

# 有限温度における多体系の変分的アプローチ

A variational approach for many-body systems at finite temperature ( http://arxiv.org/abs/1912.11907v1 )

ライセンス: Link先を確認

Tao Shi, Eugene Demler, J. Ignacio Cirac

(参考訳) 密度行列に対する非線形微分方程式を導入し,自由エネルギーの単調な減少とギブス熱状態の一定点に達する。この方程式を用いて多体系の平衡状態を分析するための変分的アプローチを構築し、電子-フォノン系におけるポラロン変換のようなユニタリ変換によって得られる一般化と同様に、全てのボソニックおよびフェルミオンガウス状態を含む幅広い状態に適用可能であることを証明した。我々は、この方法をBCS格子ハミルトンでベンチマークし、2次元のホルシュタインモデルに適用する。後者では,BCS対流の弱い相互作用における遷移と強い相互作用における極性状態の遷移を再現し,超伝導と電荷密度波の位相分離を示す。

We introduce a non-linear differential flow equation for density matrices that provides a monotonic decrease of the free energy and reaches a fixed point at the Gibbs thermal state. We use this equation to build a variational approach for analyzing equilibrium states of many-body systems and demonstrate that it can be applied to a broad class of states, including all bosonic and fermionic Gaussian states, as well as their generalizations obtained by unitary transformations, such as polaron transformations, in electron-phonon systems. We benchmark this method with a BCS lattice Hamiltonian and apply it to the Holstein model in two dimensions. For the latter, our approach reproduces the transition between the BCS pairing regime at weak interactions and the polaronic regime at stronger interactions, displaying phase separation between superconducting and charge-density wave phases.

翻訳日:2023-06-10 08:11:43 公開日:2019-12-26

# 回転予測を用いたドメイン適応のための簡易ベースライン

A simple baseline for domain adaptation using rotation prediction ( http://arxiv.org/abs/1912.11903v1 )

ライセンス: Link先を確認

Ajinkya Tejankar and Hamed Pirsiavash

(参考訳) 近年、ドメイン適応は、多くの応用でホットな研究領域となっている。目標は、アノテーション付きデータが少ないドメインでトレーニングされたモデルを別のドメインに適応させることだ。そこで本研究では,自己教師あり学習に基づく単純かつ効果的な手法を提案する。本手法は,対象領域におけるランダムな回転(自己監督)と,対象領域に対する正しいラベル(教師付き)と,対象領域における自己蒸留の2段階を含む。提案手法は,DomainNetデータセット上での半教師付きドメイン適応の最先端化を実現する。さらに、人気のあるドメイン適応ベンチマークのラベルなしのターゲットデータセットは、テストカテゴリとは別にカテゴリを含まないことを観察する。これは、多くの実際のアプリケーションに存在しないバイアスをもたらすと信じています。このバイアスをラベルのないデータから除去すると、最先端の手法の性能が大幅に低下するのに対し、単純な手法は比較的堅牢であることを示す。

Recently, domain adaptation has become a hot research area with lots of applications. The goal is to adapt a model trained in one domain to another domain with scarce annotated data. We propose a simple yet effective method based on self-supervised learning that outperforms or is on par with most state-of-the-art algorithms, e.g. adversarial domain adaptation. Our method involves two phases: predicting random rotations (self-supervised) on the target domain along with correct labels for the source domain (supervised), and then using self-distillation on the target domain. Our simple method achieves state-of-the-art results on semi-supervised domain adaptation on DomainNet dataset. Further, we observe that the unlabeled target datasets of popular domain adaptation benchmarks do not contain any categories apart from testing categories. We believe this introduces a bias that does not exist in many real applications. We show that removing this bias from the unlabeled data results in a large drop in performance of state-of-the-art methods, while our simple method is relatively robust.

翻訳日:2023-06-10 08:11:27 公開日:2019-12-26

# 3DFR: シーン独立変更検出のためのSwift 3D機能削減フレームワーク

3DFR: A Swift 3D Feature Reductionist Framework for Scene Independent Change Detection ( http://arxiv.org/abs/1912.11891v1 )

ライセンス: Link先を確認

Murari Mandal, Vansh Dhar, Abhishek Mishra, Santosh Kumar Vipparthi

(参考訳) 本稿では,シーン独立型変化検出のための3次元特徴量削減フレームワーク(3DFR)を提案する。 3DFRフレームワークは、3つの機能ストリームで構成されている: 迅速な3D機能リダミストストリーム(AvFeat)、現代機能ストリーム(ConFeat)、時間中央機能マップ。これらの多面的フォアグラウンド/バックグラウンド機能はエンコーダ/デコーダネットワークによってさらに洗練される。その結果,提案フレームワークは時間変化を検知するだけでなく,高レベルの外観特徴を学習する。したがって、オブジェクトセマンティクスを組み込んで、効果的な変更検出を行う。さらに,ネットワークの堅牢性と一般化能力を示すために,シーン独立評価方式を用いて提案手法の有効性を検証した。提案手法の性能はベンチマークcdnet 2014データセットで評価される。実験の結果,提案した3DFRネットワークは最先端のアプローチよりも優れていた。

In this paper we propose an end-to-end swift 3D feature reductionist framework (3DFR) for scene independent change detection. The 3DFR framework consists of three feature streams: a swift 3D feature reductionist stream (AvFeat), a contemporary feature stream (ConFeat) and a temporal median feature map. These multilateral foreground/background features are further refined through an encoder-decoder network. As a result, the proposed framework not only detects temporal changes but also learns high-level appearance features. Thus, it incorporates the object semantics for effective change detection. Furthermore, the proposed framework is validated through a scene independent evaluation scheme in order to demonstrate the robustness and generalization capability of the network. The performance of the proposed method is evaluated on the benchmark CDnet 2014 dataset. The experimental results show that the proposed 3DFR network outperforms the state-of-the-art approaches.

翻訳日:2023-06-10 08:10:36 公開日:2019-12-26

# マイクロトロイダル共振器によるキラル量子ネットワークの長距離動的絡み合い生成

Microtoroidal resonators enhance long-distance dynamical entanglement generation in chiral quantum networks ( http://arxiv.org/abs/1912.11886v1 )

ライセンス: Link先を確認

Wai-Keong Mok, Davit Aghamalyan, Jia-Bin You, Leong-Chuan Kwek

(参考訳) カイラル量子ネットワークは、量子情報処理と量子通信を実現するための有望な経路を提供する。ここでは、カイラル量子ネットワークの2つの遠い量子ノードが、共通の1次元カイラル導波路を介して光子移動によって動的に絡み合う様子を述べる。キラル結合単モードリング共振器の指向性非対称性を利用して2つの原子間の絡み合い状態を生成する。 Refでは0.736よりも0.969の精度で大きな改善が提案され、分析された。 [1]. この大きな拡張は、光と物質の間の効率的なフォトニックインタフェースとして機能するマイクロトロイダル共振器の導入によって達成される。本プロトコルのノイズ間距離の変動,不完全なキラリティ,様々な変形,原子自然崩壊などの実験的不完全性に対するロバスト性を示す。本提案は,量子コンピューティングや量子情報処理の多くの応用において重要な要素である量子ネットワークにおける長距離絡み合い生成に活用できる。

Chiral quantum networks provide a promising route for realising quantum information processing and quantum communication. Here, we describe how two distant quantum nodes of chiral quantum network become dynamically entangled by a photon transfer through a common 1D chiral waveguide. We harness the directional asymmetry in chirally-coupled single-mode ring resonators to generate entangled state between two atoms. We report a concurrence of up to 0.969, a huge improvement over the 0.736 which was suggested and analyzed in great detail in Ref. [1]. This significant enhancement is achieved by introducing microtoroidal resonators which serve as efficient photonic interface between light and matter. Robustness of our protocol to experimental imperfections such as fluctuations in inter-nodal distance, imperfect chirality, various detunings and atomic spontaneous decay is demonstrated. Our proposal can be utilised for long-distance entanglement generation in quantum networks which is a key ingredient for many applications in quantum computing and quantum information processing.

翻訳日:2023-06-10 08:10:02 公開日:2019-12-26

# 視覚と言語: 視覚的知覚からコンテンツ創造へ

Vision and Language: from Visual Perception to Content Creation ( http://arxiv.org/abs/1912.11872v1 )

ライセンス: Link先を確認

Tao Mei, Wei Zhang, Ting Yao

(参考訳) 視覚と言語は人間の知能の2つの基本的な能力である。人間は視覚と言語の間の相互作用を通じて日常的にタスクを実行し、自然言語記述で何を見たか、あるいは絵を幻想するユニークな人間の能力をサポートする。言語が視覚とどのように相互作用するかという有効な質問は、コンピュータビジョン領域の地平線を広げるために研究者を動機付けます。特に、「言語へのビジョン」は、おそらく過去5年間で最も人気のあるトピックの1つであり、出版物の量と、キャプション、視覚的質問応答、視覚的対話、言語ナビゲーションなどの広範囲のアプリケーションの両方で顕著に伸びている。このようなタスクは、より包括的な理解と多様な言語表現によって視覚認知を促進する。言語へのビジョン」の進歩を超えて、言語は視覚理解に寄与し、視覚コンテンツの作成の新たな可能性、すなわち「言語から言語への」可能性を提供する。このプロセスはプリズムとして機能し、言語入力に基づいて視覚コンテンツ条件を作成する。本稿では,この2つの側面,すなわち「言語へのビジョン」と「視覚への言語」の最近の進歩を概観する。より具体的には、前者は画像/ビデオキャプションの開発と、典型的なエンコーダ-デコーダ構造とベンチマークに焦点を当て、後者はビジュアルコンテンツ作成の技術を要約している。現実のデプロイメントやビジョンや言語のサービスについても詳しく説明されている。

Vision and language are two fundamental capabilities of human intelligence. Humans routinely perform tasks through the interactions between vision and language, supporting the uniquely human capacity to talk about what they see or hallucinate a picture on a natural-language description. The valid question of how language interacts with vision motivates us researchers to expand the horizons of computer vision area. In particular, "vision to language" is probably one of the most popular topics in the past five years, with a significant growth in both volume of publications and extensive applications, e.g., captioning, visual question answering, visual dialog, language navigation, etc. Such tasks boost visual perception with more comprehensive understanding and diverse linguistic representations. Going beyond the progresses made in "vision to language," language can also contribute to vision understanding and offer new possibilities of visual content creation, i.e., "language to vision." The process performs as a prism through which to create visual content conditioning on the language inputs. This paper reviews the recent advances along these two dimensions: "vision to language" and "language to vision." More concretely, the former mainly focuses on the development of image/video captioning, as well as typical encoder-decoder structures and benchmarks, while the latter summarizes the technologies of visual content creation. The real-world deployment or services of vision and language are elaborated as well.

翻訳日:2023-06-10 08:09:33 公開日:2019-12-26

# エネルギーに基づく弱測定

Energy-Based Weak Measurement ( http://arxiv.org/abs/1912.11937v1 )

ライセンス: Link先を確認

Mordecai Waegell, Cyril Elouard, Andrew N. Jordan

(参考訳) うまく局在した光子が空間的に重ねられた吸収体に入射するが吸収されないとき、光子は吸収体にエネルギーを供給できる。移動エネルギーが光子のエネルギーの不確かさに対して小さい場合、吸収器のエネルギー分布が測定装置として作用し、吸収器の強い乱れ状態が効果的な事前選択となるような、吸収器のエネルギーの異常なタイプの弱い測定となることが示されている。吸収器の最終状態をポスト選択として処理すると、吸収器のエネルギー増加はその遷移ハミルトニアンの弱い値であり、光子のエネルギー分布は反対の量で変化することが示された。非散乱の基本的な場合、次いで相互作用のないエネルギー移動の場合について検討する。結果の詳細と解釈について述べる。

When a well-localized photon is incident on a spatially superposed absorber but is not absorbed, the photon can still deliver energy to the absorber. It is shown that when the transferred energy is small relative to the energy uncertainty of the photon, this constitutes an unusual type of weak measurement of the absorber's energy, where the energy distribution of the unabsorbed photon acts as the measurement device, and the strongly disturbed state of the absorber becomes the effective pre-selection. Treating the final state of the absorber as the post-selection, it is shown that the absorber's energy increase is the weak value of its translational Hamiltonian, and the energy distribution of the photon shifts by the opposite amount. The basic case of non-scattering is examined, followed by the case of interaction-free energy transfer. Details and interpretations of the results are discussed.

翻訳日:2023-06-10 08:00:55 公開日:2019-12-26

# 物体を部品に分解した3次元点雲からの骨格抽出

Skeleton Extraction from 3D Point Clouds by Decomposing the Object into Parts ( http://arxiv.org/abs/1912.11932v1 )

ライセンス: Link先を確認

Vijai Jayadevan, Edward Delp, and Zygmunt Pizlo

(参考訳) 点雲をその成分に分解し、点雲から曲線骨格を抽出することは、関連する2つの問題である。形状をその成分に分解することは、しばしば骨格抽出の副産物として得られる。本研究では, 対象物をその部分へ分解し, 部分骨格を同定し, それらの部分骨格を連結して完全な骨格を得ることにより, 不整点雲から曲線骨格を抽出することを提案する。これは、人間がこの問題にアプローチする方法であるという意味で、骨格を抽出する最も自然な方法だと考えています。我々の部品は一般化シリンダ(gcs)です。 GCの軸はその定義の不可欠な部分なので、その部分は自然な骨格表現を持つ。我々はGCの基本特性である翻訳対称性を用いて点雲から部品を抽出する。本稿では,この手法が多種多様な形状を扱えることを示す。本手法と工法の現状を比較し,パートベースアプローチが他の手法の限界にどう対処できるかを示す。本稿では,既存のポイントセット登録アルゴリズムの改良版を示し,ポイントクラウドから部品を抽出する際の有用性を示す。また, この手法を用いて, ノイズの多い点群から骨格を抽出し, 同定する方法を示す。部分ベースのアプローチは、ユーザインタラクションの自然な直感的なインターフェースも提供する。グラフィカルユーザインタフェースの助けを借りて,ユーザインタラクションを最小限に抑えることで,ミスの修正が容易になることを示す。

Decomposing a point cloud into its components and extracting curve skeletons from point clouds are two related problems. Decomposition of a shape into its components is often obtained as a byproduct of skeleton extraction. In this work, we propose to extract curve skeletons, from unorganized point clouds, by decomposing the object into its parts, identifying part skeletons and then linking these part skeletons together to obtain the complete skeleton. We believe it is the most natural way to extract skeletons in the sense that this would be the way a human would approach the problem. Our parts are generalized cylinders (GCs). Since, the axis of a GC is an integral part of its definition, the parts have natural skeletal representations. We use translational symmetry, the fundamental property of GCs, to extract parts from point clouds. We demonstrate how this method can handle a large variety of shapes. We compare our method with state of the art methods and show how a part based approach can deal with some of the limitations of other methods. We present an improved version of an existing point set registration algorithm and demonstrate its utility in extracting parts from point clouds. We also show how this method can be used to extract skeletons from and identify parts of noisy point clouds. A part based approach also provides a natural and intuitive interface for user interaction. We demonstrate the ease with which mistakes, if any, can be fixed with minimal user interaction with the help of a graphical user interface.

翻訳日:2023-06-10 08:00:23 公開日:2019-12-26

# 一般原子集合のスパース最適化:GreedyとForward-Backwardアルゴリズム

Sparse Optimization on General Atomic Sets: Greedy and Forward-Backward Algorithms ( http://arxiv.org/abs/1912.11931v1 )

ライセンス: Link先を確認

Thomas Zhang

(参考訳) スパース原子最適化の問題を考えると、「スパーシティ」という概念は、少数の原子の線形結合を意味するために一般化される。原子集合の定義は非常に広く、一般的な例としては、標準基底、低ランク行列、過剰完全辞書、置換行列、直交行列などがある。したがって、スパース原子最適化のモデルは、統計学、信号処理、機械学習、コンピュータビジョンなど、多くの分野から生じる問題を含む。具体的には、制限された強い凸(または凹凸)を最大化する問題を考える。我々は、制限された強凹凸上のグリーディアルゴリズムの線形収束率、スパースベクトル上の滑らかな関数を一般原子集合の領域に確立する最近の研究を拡張し、収束率は新しい量を含む:「スパース原子状態数」である。このことは、スパース原子最適化のための様々なグリーディアルゴリズムのフレーバーに対する最強の乗法近似を保証することにつながるが、特に、多くの興味のある設定において、このグリーディアルゴリズムは疎性を維持しながら強い近似を保証することができることを示す。さらに,同じ近似保証を実現する前向きアルゴリズムのスキームを導入する。第二に, 弱部分モジュラリティの代替概念を定め, 従来の線形収束率の証明に用いられてきたより親しみやすいバージョンと密接な関係があることを示した。この代替的な弱部分モジュラリティを用いて類似の乗法近似の保証を証明し、その特異性と応用性を確立する。

We consider the problem of sparse atomic optimization, where the notion of "sparsity" is generalized to meaning some linear combination of few atoms. The definition of atomic set is very broad; popular examples include the standard basis, low-rank matrices, overcomplete dictionaries, permutation matrices, orthogonal matrices, etc. The model of sparse atomic optimization therefore includes problems coming from many fields, including statistics, signal processing, machine learning, computer vision and so on. Specifically, we consider the problem of maximizing a restricted strongly convex (or concave), smooth function restricted to a sparse linear combination of atoms. We extend recent work that establish linear convergence rates of greedy algorithms on restricted strongly concave, smooth functions on sparse vectors to the realm of general atomic sets, where the convergence rate involves a novel quantity: the "sparse atomic condition number". This leads to the strongest known multiplicative approximation guarantees for various flavors of greedy algorithms for sparse atomic optimization; in particular, we show that in many settings of interest the greedy algorithm can attain strong approximation guarantees while maintaining sparsity. Furthermore, we introduce a scheme for forward-backward algorithms that achieves the same approximation guarantees. Secondly, we define an alternate notion of weak submodularity, which we show is tightly related to the more familiar version that has been used to prove earlier linear convergence rates. We prove analogous multiplicative approximation guarantees using this alternate weak submodularity, and establish its distinct identity and applications.

翻訳日:2023-06-10 07:59:59 公開日:2019-12-26

# cluster catch digraphsを用いたパラメータフリークラスタリング(技術報告)

Parameter Free Clustering with Cluster Catch Digraphs (Technical Report) ( http://arxiv.org/abs/1912.11926v1 )

ライセンス: Link先を確認

Art\"ur Manukyan and Elvan Ceyhan

(参考訳) 本研究では,最近開発されたクラスタキャッチダイアグラム(ccds)に基づくクラスタリングアルゴリズムを提案する。これらのグラフは密度ベースのクラスタリング法とグラフベースのクラスタリング法のハイブリッドであるクラスタリング法を考案するために使用される。 CCDはクラスタの数を推定するため、クラスタリングのダイグラフをアピールするが、CCD(および密度に基づく一般的な方法)はデータセット内の仮定されたクラスタの 'emph{intensity}' を表すパラメータに関するいくつかの情報を必要とする。我々は, ccdアルゴリズムのパラメータフリーバージョンであるアルゴリズムを提案し, データセットの最適分割を求める際に, 選択が重要となるインテンシティパラメータの仕様を必要としない。空間データ解析からツールを借りて凸クラスタの数を推定し,ripleyの$k$関数を推定した。我々は、RK-CCDとして$K$関数を利用する新しいダイグラフを呼ぶ。 RK-CCDの最小支配集合は、データセット内のノイズクラスタからクラスタを推定し、区別することにより、正しいクラスタ数を推定できることを示す。我々のロバストクラスタリングアルゴリズムはクラスタ数と強度パラメータの両方を推定する手法で構成されており、完全にパラメータフリーである。我々はモンテカルロシミュレーションを行い、実生活データセットを用いてRK-CCDと一般的な密度ベースおよびプロトタイプベースのクラスタリング手法を比較した。

We propose clustering algorithms based on a recently developed geometric digraph family called cluster catch digraphs (CCDs). These digraphs are used to devise clustering methods that are hybrids of density-based and graph-based clustering methods. CCDs are appealing digraphs for clustering, since they estimate the number of clusters; however, CCDs (and density-based methods in general) require some information on a parameter representing the \emph{intensity} of assumed clusters in the data set. We propose algorithms that are parameter free versions of the CCD algorithm and does not require a specification of the intensity parameter whose choice is often critical in finding an optimal partitioning of the data set. We estimate the number of convex clusters by borrowing a tool from spatial data analysis, namely Ripley's $K$ function. We call our new digraphs utilizing the $K$ function as RK-CCDs. We show that the minimum dominating sets of RK-CCDs estimate and distinguish the clusters from noise clusters in a data set, and hence allow the estimation of the correct number of clusters. Our robust clustering algorithms are comprised of methods that estimate both the number of clusters and the intensity parameter, making them completely parameter free. We conduct Monte Carlo simulations and use real life data sets to compare RK-CCDs with some commonly used density-based and prototype-based clustering methods.

翻訳日:2023-06-10 07:59:19 公開日:2019-12-26

# 散乱に基づく光子-光子相互作用の幾何形状形成

Scattering-based geometric shaping of photon-photon interactions ( http://arxiv.org/abs/1912.11925v1 )

ライセンス: Link先を確認

Shahaf Asban and Shaul Mukamel

(参考訳) 我々は,設計した分子アーキテクチャの振動モードからの散乱放射に基づいて,相互作用するボソンの効果的なハミルトニアンを構築する。光散乱を表す空間モードの無限に可算な集合を利用することで、この基底で可変光子-光子相互作用を得る。有効ハミルトニアンハーミシティは、空間モードの重なりによって設定された幾何学的因子によって制御される。このマッピングを用いて、光の強度測定と相互作用するボソンの相関関数を、実効ハミルトニアンに従って発展させ、局所的にも非局所的に観測可能であることを示す。このアーキテクチャは、相互作用するボソンのダイナミクスをシミュレートしたり、量子コンピューティングアプリケーションにおけるマルチキュービットフォトニックゲートの設計ツールとして利用することができる。モデル系において、ボソンの活性空間の可変ホッピング、相互作用、閉じ込めを実演する。

We construct an effective Hamiltonian of interacting bosons, based on scattered radiation off vibrational modes of designed molecular architectures. Making use of the infinite yet countable set of spatial modes representing the scattering of light, we obtain a variable photon-photon interaction in this basis. The effective Hamiltonian hermiticity is controlled by a geometric factor set by the overlaps of spatial modes. Using this mapping, we relate intensity measurements of the light to correlation functions of the interacting bosons evolving according to the effective Hamiltonian, rendering local as well as nonlocal observables accessible. This architecture may be used to simulate the dynamics of interacting bosons, as well as designing tool for multi-qubit photonic gates in quantum computing applications. Variable hopping, interaction and confinement of the active space of the bosons are demonstrated on a model system.

翻訳日:2023-06-10 07:58:55 公開日:2019-12-26

# PI-GAN:多面顔合成のためのポーズ独立表現学習

PI-GAN: Learning Pose Independent representations for multiple profile face synthesis ( http://arxiv.org/abs/2001.00645v1 )

ライセンス: Link先を確認

Hamed Alqahtani

(参考訳) 複数の顔のポーズビューを単一のポーズから合成できるポーズ不変表現の生成は依然として難しい問題である。このソリューションは、マルチメディアセキュリティ、コンピュータビジョン、ロボティクスなど、さまざまな分野で要求される。 GAN(Generative Adversarial Network)は、現実的な顔合成のために識別器ネットワークに組み込まれたポーズ非依存表現を学習する能力を有するエンコーダ・デコーダ構造を持つ。本稿では,この問題を解決するために,循環型共有エンコーダ・デコーダフレームワーク pigan を提案する。従来のGANと比較して、プライマリ構造から重みを共有し、元のポーズで顔を再構築する二次エンコーダ・デコーダフレームワークで構成されている。主要なフレームワークは、アンタングル表現の作成に焦点を当てており、セカンダリフレームワークは、元の顔の復元を目指している。 CFPの高解像度でリアルなデータセットを使用して、パフォーマンスをチェックします。

Generating a pose-invariant representation capable of synthesizing multiple face pose views from a single pose is still a difficult problem. The solution is demanded in various areas like multimedia security, computer vision, robotics, etc. Generative adversarial networks (GANs) have encoder-decoder structures possessing the capability to learn pose-independent representation incorporated with discriminator network for realistic face synthesis. We present PIGAN, a cyclic shared encoder-decoder framework, in an attempt to solve the problem. As compared to traditional GAN, it consists of secondary encoder-decoder framework sharing weights from the primary structure and reconstructs the face with the original pose. The primary framework focuses on creating disentangle representation, and secondary framework aims to restore the original face. We use CFP high-resolution, realistic dataset to check the performance.

翻訳日:2023-06-10 07:52:58 公開日:2019-12-26

# 注意型グラフ畳み込みネットワークを用いたアカデミックパフォーマンス推定

Academic Performance Estimation with Attention-based Graph Convolutional Networks ( http://arxiv.org/abs/2001.00632v1 )

ライセンス: Link先を確認

Qian Hu, Huzefa Rangwala

(参考訳) 学生の学業成績予測は、学術的軌跡や学位計画、コース推薦システム、早期警告システム、助言システムを含む教育技術を強化する。学生の過去のデータ(前科の成績など)を踏まえると、学生のパフォーマンス予測の課題は将来のコースにおける生徒の成績を予測することである。アカデミックプログラムは、事前コースが将来のコースの基礎となるように構成されている。コースに必要な知識は、グラフ構造によってモデル化された複雑な関係を示す複数の事前のコースを受講することで得られる。学生のパフォーマンス予測のための伝統的な方法は、通常、複数のコース間の基礎的な関係を無視し、生徒がそれらの間の知識を取得する方法である。加えて、従来の方法は意思決定に必要な予測の解釈を提供していない。本研究では,生徒のパフォーマンス予測のための注意に基づくグラフ畳み込みネットワークモデルを提案する。大規模な公立大学から得られた実世界のデータセットについて広範な実験を行った。実験の結果,提案モデルが段階予測の面で最先端のアプローチを上回っていることがわかった。提案モデルでは,失敗や脱落のリスクがある学生を識別することで,生徒にタイムリーな介入やフィードバックを提供することができる。

Student's academic performance prediction empowers educational technologies including academic trajectory and degree planning, course recommender systems, early warning and advising systems. Given a student's past data (such as grades in prior courses), the task of student's performance prediction is to predict a student's grades in future courses. Academic programs are structured in a way that prior courses lay the foundation for future courses. The knowledge required by courses is obtained by taking multiple prior courses, which exhibits complex relationships modeled by graph structures. Traditional methods for student's performance prediction usually neglect the underlying relationships between multiple courses; and how students acquire knowledge across them. In addition, traditional methods do not provide interpretation for predictions needed for decision making. In this work, we propose a novel attention-based graph convolutional networks model for student's performance prediction. We conduct extensive experiments on a real-world dataset obtained from a large public university. The experimental results show that our proposed model outperforms state-of-the-art approaches in terms of grade prediction. The proposed model also shows strong accuracy in identifying students who are at-risk of failing or dropping out so that timely intervention and feedback can be provided to the student.

翻訳日:2023-06-10 07:52:44 公開日:2019-12-26

# 減電圧FPGAにおけるディープラーニングのレジリエンスについて

On the Resilience of Deep Learning for Reduced-voltage FPGAs ( http://arxiv.org/abs/2001.00053v1 )

ライセンス: Link先を確認

Kamyar Givaki, Behzad Salami, Reza Hojabr, S. M. Reza Tayaranian, Ahmad Khonsari, Dara Rahmati, Saeid Gorgin, Adrian Cristal, Osman S. Unsal

(参考訳) ディープニューラルネットワーク(DNN)は本質的に計算集約的であり、パワーハングリーでもある。 Field Programmable Gate Arrays (FPGA) のようなハードウェアアクセラレータは、組み込みおよびハイパフォーマンスコンピューティング(HPC)システムの両方の要件を満たす、有望なソリューションである。 fpgaやcpuやgpuでは、名目レベル以下のアグレッシブ電圧スケーリングは、電力散逸を最小化する効果的な手法である。残念ながら、電圧がタイミングの問題によりトランジスタしきい値に近づくとビットフリップの故障が出現し始め、レジリエンスの問題が発生する。本稿では、FPGAの電圧アンダスケーリング関連故障、特にオンチップメモリにおけるDNNのトレーニングフェーズのレジリエンスを実験的に評価する。この目的に向けて、我々はLeNet-5のレジリエンスと、Rectified Linear Unit(Relu)とHyperbolic Tangent(Tanh)の異なるアクティベーション機能を持つCIFAR-10データセットのための特別設計ネットワークを実験的に評価した。最新のFPGAは、極低電圧レベルで十分に堅牢であり、低電圧関連の故障をトレーニングイテレーション中に自動的に隠蔽できるため、ECCのようなコストのかかるソフトウェアやハードウェア指向の故障軽減技術は不要である。精度のギャップを埋めるために、およそ10%のトレーニングイテレーションが必要である。この観測は、実際のFPGAファブリック上で測定された低電圧断層の比較的低い速度、すなわち <0.1\%の結果である。また,ランダムに発生した故障注入キャンペーンにより,LeNet-5ネットワークの故障率を有意に向上させ,トレーニング精度が低下し始めた。フォールトレートが増加すると、tangアクティベーション関数を持つネットワークは、精度の点でreluのネットワークを上回る。例えば、フォールトレートが30%である場合、精度の差は4.92%である。

Deep Neural Networks (DNNs) are inherently computation-intensive and also power-hungry. Hardware accelerators such as Field Programmable Gate Arrays (FPGAs) are a promising solution that can satisfy these requirements for both embedded and High-Performance Computing (HPC) systems. In FPGAs, as well as CPUs and GPUs, aggressive voltage scaling below the nominal level is an effective technique for power dissipation minimization. Unfortunately, bit-flip faults start to appear as the voltage is scaled down closer to the transistor threshold due to timing issues, thus creating a resilience issue. This paper experimentally evaluates the resilience of the training phase of DNNs in the presence of voltage underscaling related faults of FPGAs, especially in on-chip memories. Toward this goal, we have experimentally evaluated the resilience of LeNet-5 and also a specially designed network for CIFAR-10 dataset with different activation functions of Rectified Linear Unit (Relu) and Hyperbolic Tangent (Tanh). We have found that modern FPGAs are robust enough in extremely low-voltage levels and that low-voltage related faults can be automatically masked within the training iterations, so there is no need for costly software- or hardware-oriented fault mitigation techniques like ECC. Approximately 10% more training iterations are needed to fill the gap in the accuracy. This observation is the result of the relatively low rate of undervolting faults, i.e., <0.1\%, measured on real FPGA fabrics. We have also increased the fault rate significantly for the LeNet-5 network by randomly generated fault injection campaigns and observed that the training accuracy starts to degrade. When the fault rate increases, the network with Tanh activation function outperforms the one with Relu in terms of accuracy, e.g., when the fault rate is 30% the accuracy difference is 4.92%.

翻訳日:2023-06-10 07:51:40 公開日:2019-12-26

# 機械学習と埋め込みを用いたアゼルバイジャン語のテキスト分類

Text Classification for Azerbaijani Language Using Machine Learning and Embedding ( http://arxiv.org/abs/1912.13362v1 )

ライセンス: Link先を確認

Umid Suleymanov, Behnam Kiani Kalejahi, Elkhan Amrahov, Rashid Badirkhanli

(参考訳) テキスト分類システムは、アゼルバイジャン語のテキストクラスタリング問題を解決するのに役立つだろう。外国語にはいくつかのテキスト分類アプリケーションがあるが、我々はアゼルバイジャン語でこの問題を解決するために新しく開発されたシステムを構築しようとした。まず、潜在的な実践領域を見つけようとした。このシステムは、多くの分野で役に立つだろう。主にニュースフィードのカテゴリー化に使用される。ニュースサイトは自動的にスポーツ、ビジネス、教育、科学などのクラスに分類することができる。このシステムは製品レビューの感情分析にも使われている。例えば、同社はfacebookで新製品の写真を共有し、新しいプロダクトに対して1000のコメントを受け取る。システムはコメントを肯定的または否定的なカテゴリに分類する。このシステムは、推奨システムやスパムフィルタリングなどにも適用できる。アゼルバイジャン語のテキスト分類問題を解決するために,naive bayes, svm, decision treeなどの機械学習手法が考案されている。

Text classification systems will help to solve the text clustering problem in the Azerbaijani language. There are some text-classification applications for foreign languages, but we tried to build a newly developed system to solve this problem for the Azerbaijani language. Firstly, we tried to find out potential practice areas. The system will be useful in a lot of areas. It will be mostly used in news feed categorization. News websites can automatically categorize news into classes such as sports, business, education, science, etc. The system is also used in sentiment analysis for product reviews. For example, the company shares a photo of a new product on Facebook and the company receives a thousand comments for new products. The systems classify the comments into categories like positive or negative. The system can also be applied in recommended systems, spam filtering, etc. Various machine learning techniques such as Naive Bayes, SVM, Decision Trees have been devised to solve the text classification problem in Azerbaijani language.

翻訳日:2023-06-10 07:50:47 公開日:2019-12-26

# amharic-arabic neural machine translation(英語)

Amharic-Arabic Neural Machine Translation ( http://arxiv.org/abs/1912.13161v1 )

ライセンス: Link先を確認

Ibrahim Gashaw and H L Shashirekha

(参考訳) 大規模な並列コーパスを活用して、ヨーロッパの主要言語ペア間で多くの自動翻訳作業が行われているが、並列データの不足のため、アムハラ・アラビア語ペアに関する研究はほとんど行われていない。 2つのLong Short-Term Memory (LSTM) と Gated Recurrent Units (GRU) ベースのNeural Machine Translation (NMT) モデルを開発した。実験を行うために、タンジールで利用可能な既存の単言語アラビア語のテキストと、それと同等のアムハラ語テキストコーパスを修飾して、小さな並列のクルニックテキストコーパスを構築する。 LSTMとGRUベースのNMTモデルとGoogle翻訳システムを比較し,LSTMベースのOpenNMTはGRUベースのOpenNMTとGoogle翻訳システムを上回っ,BLEUスコアは12%,11%,6%であった。

Many automatic translation works have been addressed between major European language pairs, by taking advantage of large scale parallel corpora, but very few research works are conducted on the Amharic-Arabic language pair due to its parallel data scarcity. Two Long Short-Term Memory (LSTM) and Gated Recurrent Units (GRU) based Neural Machine Translation (NMT) models are developed using Attention-based Encoder-Decoder architecture which is adapted from the open-source OpenNMT system. In order to perform the experiment, a small parallel Quranic text corpus is constructed by modifying the existing monolingual Arabic text and its equivalent translation of Amharic language text corpora available on Tanzile. LSTM and GRU based NMT models and Google Translation system are compared and found that LSTM based OpenNMT outperforms GRU based OpenNMT and Google Translation system, with a BLEU score of 12%, 11%, and 6% respectively.

翻訳日:2023-06-10 07:50:33 公開日:2019-12-26

# 拡張畳み込みを伴うu-netによる大腸ポリープの分画

Colorectal Polyp Segmentation by U-Net with Dilation Convolution ( http://arxiv.org/abs/1912.11947v1 )

ライセンス: Link先を確認

Xinzi Sun, Pengfei Zhang, Dechun Wang, Yu Cao, Benyuan Liu

(参考訳) 大腸癌 (crc) は、アメリカ合衆国で最も一般的に診断されたがんの1つであり、最も多いがんの死因である。大腸または直腸の intima で増殖する大腸ポリープは,CRC にとって重要な前駆体である。現在最も一般的な大腸ポリープの検出方法と先天的な病理学は大腸内視鏡である。したがって,大腸内視鏡検査における大腸ポリープの正確な分画はcrc早期発見と予防において大きな臨床的意義を有する。本稿では,大腸ポリープセグメント化のためのエンド・ツー・エンドのディープラーニングフレームワークを提案する。我々が設計したモデルでは,マルチスケールな意味的特徴を抽出するエンコーダと,特徴写像をポリプセグメンテーションマップに拡張するデコーダから構成される。エンコーダの特徴表現能力は拡張畳み込みを導入して向上し、高レベルな意味的特徴を解像度を低下させることなく学習する。さらに、従来のアーキテクチャよりも少ないパラメータでマルチスケールのセマンティック機能を組み合わせた簡易デコーダを設計する。さらに,出力セグメンテーションマップに3つのポストプロセッシング手法を適用し,大腸ポリープ検出性能を向上させる。本手法はCVC-ClinicDBとETIS-Larib Polyp DBの最先端結果を実現する。

Colorectal cancer (CRC) is one of the most commonly diagnosed cancers and a leading cause of cancer deaths in the United States. Colorectal polyps that grow on the intima of the colon or rectum is an important precursor for CRC. Currently, the most common way for colorectal polyp detection and precancerous pathology is the colonoscopy. Therefore, accurate colorectal polyp segmentation during the colonoscopy procedure has great clinical significance in CRC early detection and prevention. In this paper, we propose a novel end-to-end deep learning framework for the colorectal polyp segmentation. The model we design consists of an encoder to extract multi-scale semantic features and a decoder to expand the feature maps to a polyp segmentation map. We improve the feature representation ability of the encoder by introducing the dilated convolution to learn high-level semantic features without resolution reduction. We further design a simplified decoder which combines multi-scale semantic features with fewer parameters than the traditional architecture. Furthermore, we apply three post processing techniques on the output segmentation map to improve colorectal polyp detection performance. Our method achieves state-of-the-art results on CVC-ClinicDB and ETIS-Larib Polyp DB.

翻訳日:2023-06-10 07:50:12 公開日:2019-12-26

# 人工知能の道徳について

On the Morality of Artificial Intelligence ( http://arxiv.org/abs/1912.11945v1 )

ライセンス: Link先を確認

Alexandra Luccioni and Yoshua Bengio

(参考訳) 人工知能の社会的および倫理的影響に関する既存の研究の多くは、機械学習(ML)と他の人工知能(AI)アルゴリズム(IEEE, 2017, Jobin et al., 2019)を取り巻く倫理的原則とガイドラインの定義に焦点が当てられている。これは、AIの適切な社会的規範を定義するのに非常に有用であるが、我々はMLの可能性とリスクを議論し、コミュニティに有益な目的のためにMLを使うよう促すことが同様に重要であると信じている。本稿では、特にML実践者を対象としており、後者に重点を置いて、既存の高度な倫理的枠組みとガイドラインの概要を概観するが、それ以上に、ML研究と展開のための概念的および実践的原則とガイドラインを提案し、実践者がより倫理的かつ道徳的なMLの実践を社会的な目的に活用するための具体的な行動を主張している。

Much of the existing research on the social and ethical impact of Artificial Intelligence has been focused on defining ethical principles and guidelines surrounding Machine Learning (ML) and other Artificial Intelligence (AI) algorithms [IEEE, 2017, Jobin et al., 2019]. While this is extremely useful for helping define the appropriate social norms of AI, we believe that it is equally important to discuss both the potential and risks of ML and to inspire the community to use ML for beneficial objectives. In the present article, which is specifically aimed at ML practitioners, we thus focus more on the latter, carrying out an overview of existing high-level ethical frameworks and guidelines, but above all proposing both conceptual and practical principles and guidelines for ML research and deployment, insisting on concrete actions that can be taken by practitioners to pursue a more ethical and moral practice of ML aimed at using AI for social good.

翻訳日:2023-06-10 07:49:51 公開日:2019-12-26

PDF登録状況（公開日: 20191226）