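Fugu-MT: arXiv Paper Translations

This site provides Japanese translations of arXiv papers that are 30 pages or fewer and released under a Creative Commons license (CC 0, CC BY, CC BY-SA). For papers whose body text is not under a CC license, or that are too long, only the metadata is translated. (arXiv metadata is CC 0.) The translations are licensed under CC BY-SA 4.0. Translation is performed with Fugu-Machine Translator.

The operator of this site makes no guarantee as to the quality of this site (including all information and translations) and accepts no liability for any consequences arising from its use.

Papers published on 2023-03-24.

Title	Authors	Abstract	Publication date / translation date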
# Video Pre-trained Transformer: A Multimodal Mixture of Pre-trained Experts ( http://arxiv.org/abs/2304.10505v1 ) License: see linked page	Kastan Day, Daniel Christl, Rohan Salvi, Pranav Sriram	We present Video Pre-trained Transformer. VPT uses four SOTA encoder models from prior work to convert a video into a sequence of compact embeddings. Our backbone, based on a reference Flan-T5-11B architecture, learns a universal representation of the video that is a non-linear sum of the encoder models. It learns using an autoregressive causal language modeling loss by predicting the words spoken in YouTube videos. Finally, we evaluate on standard downstream benchmarks by training fully connected prediction heads for each task. To the best of our knowledge, this is the first use of multiple frozen SOTA models as encoders in an "embedding -> backbone -> prediction head" design pattern - all others have trained their own joint encoder models. Additionally, we include more modalities than the current SOTA, Merlot Reserve, by adding explicit Scene Graph information. For these two reasons, we believe it could combine the world's best open-source models to achieve SOTA performance. Initial experiments demonstrate the model is learning appropriately, but more experimentation and compute are necessary, and are already in progress, to realize our loftier goals. Alongside this work, we build on the YT-20M dataset, reproducing it and adding 25,000 personally selected YouTube videos to its corpus. All code and model checkpoints are open sourced under a standard MIT license.	Translated: 2023-04-23 03:58:52 Published: 2023-03-24
# Fault diagnosis for PV arrays considering dust impact based on transformed graphical feature of characteristic curves and convolutional neural network with CBAM modules ( http://arxiv.org/abs/2304.06493v1 ) License: see linked page	Jiaqi Qu, Lu Wei, Qiang Sun, Hamidreza Zareipour, Zheng Qian	Various faults can occur during the operation of PV arrays, and both the dust-affected operating conditions and various diode configurations make the faults more complicated. However, current methods for fault diagnosis based on I-V characteristic curves only utilize partial feature information and often rely on calibrating the field characteristic curves to standard test conditions (STC). Such methods are difficult to apply in practice and struggle to accurately identify multiple complex faults with similarities in different blocking-diode configurations of PV arrays under the influence of dust. Therefore, a novel fault diagnosis method for PV arrays considering dust impact is proposed. In the preprocessing stage, the Isc-Voc normalized Gramian angular difference field (GADF) method is presented, which normalizes and transforms the resampled PV array characteristic curves from the field, including I-V and P-V, to obtain the transformed graphical feature matrices. Then, in the fault diagnosis stage, a convolutional neural network (CNN) model with convolutional block attention modules (CBAM) is designed to extract fault differentiation information from the transformed graphical matrices containing full feature information and to classify faults. Different graphical feature transformation methods are compared through simulation cases, and different CNN-based classification methods are also analyzed. The results indicate that the developed method for PV arrays with different blocking-diode configurations under various operating conditions has high fault diagnosis accuracy and reliability.	Translated: 2023-04-16 21:48:28 Published: 2023-03-24
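The abstract above names a Gramian angular difference field (GADF) as the transform that turns a characteristic curve into a matrix. As a minimal sketch of the standard GADF definition only: each scaled sample x is mapped to an angle phi = arccos(x), and entry (i, j) of the field is sin(phi_i - phi_j). The helper `normalize_by_isc` and the sample currents are assumptions made for illustration; the abstract does not spell out the paper's exact Isc-Voc normalization or resampling, so this should not be read as the authors' implementation.

```python
import math

def normalize_by_isc(currents):
    """Scale currents into [0, 1] by the short-circuit current.

    Assumption: the first sample (voltage = 0) is Isc and is positive.
    """
    isc = currents[0]
    return [i / isc for i in currents]

def gadf(series):
    """Gramian angular difference field of values scaled to [-1, 1]."""
    # Clip to guard against rounding slightly outside the arccos domain.
    clipped = [max(-1.0, min(1.0, v)) for v in series]
    phi = [math.acos(v) for v in clipped]
    # Entry (i, j) is sin(phi_i - phi_j); the matrix is antisymmetric.
    return [[math.sin(a - b) for b in phi] for a in phi]

# Hypothetical I-V samples, from short circuit (8 A) to open circuit (0 A).
currents = [8.0, 7.9, 7.6, 6.0, 0.0]
field = gadf(normalize_by_isc(currents))
```

The resulting square matrix can be treated as a single-channel image, which is what makes a CNN classifier applicable to a one-dimensional curve.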
# PromptORE -- A Novel Approach Towards Fully Unsupervised Relation Extraction ( http://arxiv.org/abs/2304.01209v1 ) License: see linked page	Pierre-Yves Genest (Alteca, DRIM), Pierre-Edouard Portier (DRIM), Elöd Egyed-Zsigmond (DRIM), Laurent-Walter Goix (Alteca)	Unsupervised Relation Extraction (RE) aims to identify relations between entities in text, without having access to labeled data during training. This setting is particularly relevant for domain-specific RE where no annotated dataset is available and for open-domain RE where the types of relations are a priori unknown. Although recent approaches achieve promising results, they heavily depend on hyperparameters whose tuning would most often require labeled data. To mitigate the reliance on hyperparameters, we propose PromptORE, a "Prompt-based Open Relation Extraction" model. We adapt the novel prompt-tuning paradigm to work in an unsupervised setting, and use it to embed sentences expressing a relation. We then cluster these embeddings to discover candidate relations, and we experiment with different strategies to automatically estimate an adequate number of clusters. To the best of our knowledge, PromptORE is the first unsupervised RE model that does not need hyperparameter tuning. Results on three general and specific domain datasets show that PromptORE consistently outperforms state-of-the-art models with a relative gain of more than 40% in B³, V-measure and ARI. Qualitative analysis also indicates PromptORE's ability to identify semantically coherent clusters that are very close to true relations.	Translated: 2023-04-09 05:34:10 Published: 2023-03-24
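The abstract above reports gains in B³, one of the standard scores for comparing discovered clusters with gold relation labels. The following is a minimal sketch of the usual B³ (B-cubed) definition, not code from the paper: per-item precision is the share of the item's cluster that has the item's gold label, per-item recall is the share of the item's gold class that landed in its cluster, and both are averaged over items. The function name `b_cubed` and the toy labels are illustrative assumptions.

```python
from collections import Counter

def b_cubed(pred, gold):
    """Return (precision, recall, F1) under the B-cubed clustering metric."""
    n = len(pred)
    cluster_size = Counter(pred)          # items per predicted cluster
    class_size = Counter(gold)            # items per gold relation
    overlap = Counter(zip(pred, gold))    # items per (cluster, relation) pair
    precision = sum(overlap[(p, g)] / cluster_size[p] for p, g in zip(pred, gold)) / n
    recall = sum(overlap[(p, g)] / class_size[g] for p, g in zip(pred, gold)) / n
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical example: five sentences, two discovered clusters, two gold relations.
pred = [0, 0, 1, 1, 1]
gold = ["born_in", "born_in", "born_in", "works_for", "works_for"]
precision, recall, f1 = b_cubed(pred, gold)
```

Because the score is computed per item, it does not require knowing which cluster corresponds to which relation, which suits the unsupervised setting described above.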
# Machine learning-based spin structure detection ( http://arxiv.org/abs/2303.16905v1 ) License: see linked page	Isaac Labrie-Boulay, Thomas Brian Winkler, Daniel Franzen, Alena Romanova, Hans Fangohr, Mathias Kläui	One of the most important magnetic spin structures is the topologically stabilised skyrmion quasi-particle. Its interesting physical properties make it a candidate for memory and efficient neuromorphic computation schemes. For device operation, detection of the position, shape, and size of skyrmions is required, and magnetic imaging is typically employed. A frequently used technique is magneto-optical Kerr microscopy, where, depending on the sample's material composition, temperature, material growing procedures, etc., the measurements suffer from noise, low contrast, intensity gradients, or other optical artifacts. Conventional image analysis packages require manual treatment, and a more automatic solution is required. We report a convolutional neural network specifically designed for segmentation problems to detect the position and shape of skyrmions in our measurements. The network is tuned using selected techniques to optimize predictions, and in particular the number of detected classes is found to govern the performance. The results of this study show that a well-trained network is a viable method of automating data pre-processing in magnetic microscopy. The approach is easily extendable to other spin structures and other magnetic imaging methods.	Translated: 2023-04-02 18:16:22 Published: 2023-03-24
# Data-driven multiscale modeling of subgrid parameterizations in climate models ( http://arxiv.org/abs/2303.17496v1 ) License: see linked page	Karl Otness, Laure Zanna, Joan Bruna	Subgrid parameterizations, which represent physical processes occurring below the resolution of current climate models, are an important component in producing accurate, long-term predictions for the climate. A variety of approaches have been tested to design these components, including deep learning methods. In this work, we evaluate a proof of concept illustrating a multiscale approach to this prediction problem. We train neural networks to predict subgrid forcing values on a testbed model and examine improvements in prediction accuracy that can be obtained by using additional information in both fine-to-coarse and coarse-to-fine directions.	Translated: 2023-04-02 18:10:57 Published: 2023-03-24
# Computationally Efficient Labeling of Cancer Related Forum Posts by Non-Clinical Text Information Retrieval ( http://arxiv.org/abs/2303.16766v1 ) License: see linked page	Jimmi Agerskov, Kristian Nielsen, Christian Marius Lillelund, Christian Fischer Pedersen	An abundance of information about cancer exists online, but categorizing and extracting useful information from it is difficult. Almost all research within healthcare data processing is concerned with formal clinical data, but there is valuable information in non-clinical data too. The present study combines methods within distributed computing, text retrieval, clustering, and classification into a coherent and computationally efficient system that can clarify cancer patient trajectories based on non-clinical and freely available information. We produce a fully functional prototype that can retrieve, cluster and present information about cancer trajectories from non-clinical forum posts. We evaluate three clustering algorithms (MR-DBSCAN, DBSCAN, and HDBSCAN) and compare them in terms of Adjusted Rand Index and total run time as a function of the number of posts retrieved and the neighborhood radius. Clustering results show that neighborhood radius has the most significant impact on clustering performance. For small values, the data set is split accordingly, but high values produce a large number of possible partitions, and searching for the best partition is therefore time-consuming. With a properly estimated radius, MR-DBSCAN can cluster 50000 forum posts in 46.1 seconds, compared to 143.4 seconds for DBSCAN and 282.3 seconds for HDBSCAN. We conduct an interview with the Danish Cancer Society and present our software prototype. The organization sees potential in software that can democratize online information about cancer and foresees that such systems will be required in the future.	Translated: 2023-03-31 15:51:42 Published: 2023-03-24
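The abstract above compares clusterings by Adjusted Rand Index (ARI). As a hedged illustration of what that score measures, here is a small standard-library sketch of the usual pair-counting formula: the number of item pairs grouped together in both labelings, corrected for the agreement expected by chance. The function name and the toy labels are assumptions for illustration and do not come from the paper's software; the sketch also assumes at least two items.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand Index between two labelings of the same items (n >= 2)."""
    n = len(labels_a)
    # Pairs of items that share a cluster in both labelings.
    pairs_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    # Pairs of items that share a cluster in each labeling separately.
    pairs_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    pairs_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = pairs_a * pairs_b / comb(n, 2)   # chance-level agreement
    max_index = (pairs_a + pairs_b) / 2
    if max_index == expected:
        # Degenerate case (e.g. both labelings put everything in one cluster).
        return 1.0
    return (pairs_both - expected) / (max_index - expected)

same_partition = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])
crossed_partition = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
```

A value of 1 means the two partitions are identical up to renaming of clusters, values near 0 mean chance-level agreement, and negative values mean less agreement than chance.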
# LLM for Patient-Trial Matching: Privacy-Aware Data Augmentation Towards Better Performance and Generalizability ( http://arxiv.org/abs/2303.16756v1 ) License: see linked page	Jiayi Yuan, Ruixiang Tang, Xiaoqian Jiang, Xia Hu	The process of matching patients with suitable clinical trials is essential for advancing medical research and providing optimal care. However, current approaches face challenges such as data standardization, ethical considerations, and a lack of interoperability between Electronic Health Records (EHRs) and clinical trial criteria. In this paper, we explore the potential of large language models (LLMs) to address these challenges by leveraging their advanced natural language generation capabilities to improve compatibility between EHRs and clinical trial descriptions. We propose an innovative privacy-aware data augmentation approach for LLM-based patient-trial matching (LLM-PTM), which balances the benefits of LLMs while ensuring the security and confidentiality of sensitive patient data. Our experiments demonstrate a 7.32% average improvement in performance using the proposed LLM-PTM method, and the generalizability to new data is improved by 12.12%. Additionally, we present case studies to further illustrate the effectiveness of our approach and provide a deeper understanding of its underlying principles.	Translated: 2023-03-31 15:50:19 Published: 2023-03-24
# Facial Expression Recognition using Squeeze and Excitation-powered Swin Transformers ( http://arxiv.org/abs/2301.10906v3 ) License: see linked page	Arpita Vats, Aman Chadha	We present a facial emotion recognition framework built upon Swin vision Transformers jointly with a squeeze and excitation (SE) block. A transformer model based on an attention mechanism has been presented recently to address vision tasks. Our method uses a vision transformer with a squeeze and excitation (SE) block and a sharpness-aware minimizer (SAM). We used a hybrid dataset to train our model and the AffectNet dataset to evaluate its results.	Translated: 2023-03-29 18:50:35 Published: 2023-03-24
# Mathematical Challenges in Deep Learning ( http://arxiv.org/abs/2303.15464v1 ) License: see linked page	Vahid Partovi Nia, Guojun Zhang, Ivan Kobyzev, Michael R. Metel, Xinlin Li, Ke Sun, Sobhan Hemati, Masoud Asgharian, Linglong Kong, Wulong Liu, Boxing Chen	Deep models have dominated the artificial intelligence (AI) industry since the ImageNet challenge in 2012. The size of deep models has been increasing ever since, which brings new challenges to this field with applications in cell phones, personal computers, autonomous cars, and wireless base stations. Here we list a set of problems, ranging from training, inference, generalization bounds, and optimization, with some formalism, to communicate these challenges to mathematicians, statisticians, and theoretical computer scientists. This is a subjective view of the research questions in deep learning that benefit the tech industry in the long run.	Translated: 2023-03-29 18:05:06 Published: 2023-03-24
# Utilizing Network Properties to Detect Erroneous Inputs ( http://arxiv.org/abs/2002.12520v3 ) License: see linked page	Matt Gorbett, Nathaniel Blanchard	Neural networks are vulnerable to a wide range of erroneous inputs such as adversarial, corrupted, out-of-distribution, and misclassified examples. In this work, we train a linear SVM classifier to detect these four types of erroneous data using hidden and softmax feature vectors of pre-trained neural networks. Our results indicate that these faulty data types generally exhibit linearly separable activation properties from correct examples, giving us the ability to reject bad inputs with no extra training or overhead. We experimentally validate our findings across a diverse range of datasets, domains, pre-trained models, and adversarial attacks.	Translated: 2023-03-29 05:10:16 Published: 2023-03-24
# Rethinking Learning-based Demosaicing, Denoising, and Super-Resolution Pipeline ( http://arxiv.org/abs/1905.02538v3 ) License: see linked page	Guocheng Qian, Yuanhao Wang, Jinjin Gu, Chao Dong, Wolfgang Heidrich, Bernard Ghanem, Jimmy S. Ren	Imaging is usually a mixture problem of incomplete color sampling, noise degradation, and limited resolution. This mixture problem is typically solved by a sequential solution that applies demosaicing (DM), denoising (DN), and super-resolution (SR) sequentially in a fixed and predefined pipeline (execution order of tasks), DM$\to$DN$\to$SR. The most recent work on image processing focuses on developing more sophisticated architectures to achieve higher image quality. Little attention has been paid to the design of the pipeline, and it is still not clear how significant the pipeline is to image quality. In this work, we comprehensively study the effects of pipelines on the mixture problem of learning-based DN, DM, and SR, in both sequential and joint solutions. On the one hand, in sequential solutions, we find that the pipeline has a non-trivial effect on the resulting image quality. Our suggested pipeline DN$\to$SR$\to$DM yields consistently better performance than other sequential pipelines in various experimental settings and benchmarks. On the other hand, in joint solutions, we propose an end-to-end Trinity Pixel Enhancement NETwork (TENet) that achieves state-of-the-art performance for the mixture problem. We further present a novel and simple method that can integrate a certain pipeline into a given end-to-end network by providing intermediate supervision using a detachable head. Extensive experiments show that an end-to-end network with the proposed pipeline can attain only a consistent but insignificant improvement. Our work indicates that the investigation of pipelines is applicable in sequential solutions, but is not very necessary in end-to-end networks. Code, models, and our contributed PixelShift200 dataset are available at https://github.com/guochengqian/TENet	Translated: 2023-03-29 05:09:57 Published: 2023-03-24
# Attention for Image Registration (AiR): an unsupervised Transformer approach ( http://arxiv.org/abs/2105.02282v2 ) License: see linked page	Zihao Wang, Hervé Delingette	Image registration is a crucial task in signal processing, but it often encounters issues with stability and efficiency. Non-learning registration approaches rely on optimizing similarity metrics between fixed and moving images, which can be expensive in terms of time and space complexity. This problem can be exacerbated when the images are large or there are significant deformations between them. Recently, deep learning, specifically convolutional neural network (CNN)-based methods, has been explored as an effective solution to the weaknesses of non-learning approaches. To further advance learning approaches in image registration, we introduce an attention mechanism in the deformable image registration problem. Our proposed approach is based on a Transformer framework called AiR, which can be efficiently trained on GPGPU devices. We treat the image registration problem as a language translation task and use the Transformer to learn the deformation field. The method learns an unsupervised generated deformation map and is tested on two benchmark datasets. In summary, our approach shows promising effectiveness in addressing stability and efficiency issues in image registration tasks. The source code of AiR is available on GitHub.	Translated: 2023-03-29 05:06:23 Published: 2023-03-24
# The Best Thresholds for Rapid Identification of Episodic and Chronic Homeless Shelter Use ( http://arxiv.org/abs/2105.01042v3 ) License: see linked page	Geoffrey Guy Messier, Leslie Tutty, Caleb John	This paper explores how to best identify clients for housing services based on their homeless shelter access patterns. We focus on counting the number of shelter stays and episodes of shelter use for a client within a time window. Thresholds are then applied to these values to determine if that individual is a good candidate for housing support. Using new housing referral impact metrics, we explore a range of threshold and time window values to determine which combination both maximizes impact and identifies good candidates for housing as soon as possible. New insights are also provided regarding the characteristics of the "under-the-radar" client group who are typically not identified for housing support.	Translated: 2023-03-29 05:06:07 Published: 2023-03-24
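The counting-and-threshold rule described in the abstract above can be sketched in a few lines. This is only an illustration under stated assumptions: a "stay" is taken to be one calendar day with a shelter access record, an "episode" is a run of stays separated from the previous stay by at least `episode_gap_days` (30 here, a hypothetical value), and the function names and threshold values are invented for the example. The abstract does not give the paper's actual definitions or thresholds.

```python
from datetime import date, timedelta

def count_stays_and_episodes(stay_dates, window_start, window_days, episode_gap_days=30):
    """Count stays and episodes inside [window_start, window_start + window_days)."""
    window_end = window_start + timedelta(days=window_days)
    in_window = sorted(d for d in set(stay_dates) if window_start <= d < window_end)
    episodes = 0
    previous = None
    for d in in_window:
        # A new episode starts at the first stay or after a long enough gap.
        if previous is None or (d - previous).days >= episode_gap_days:
            episodes += 1
        previous = d
    return len(in_window), episodes

def flag_for_housing(stays, episodes, stay_threshold, episode_threshold):
    """Flag a client when either count reaches its threshold."""
    return stays >= stay_threshold or episodes >= episode_threshold

# Hypothetical client: five nights in early January, two nights in early March.
history = [date(2023, 1, day) for day in range(1, 6)] + [date(2023, 3, 1), date(2023, 3, 2)]
stays, episodes = count_stays_and_episodes(history, date(2023, 1, 1), window_days=90)
flagged = flag_for_housing(stays, episodes, stay_threshold=10, episode_threshold=2)
```

In this toy case the client is flagged through the episode count rather than the stay count, which is the kind of trade-off the threshold and window search in the abstract explores.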
# Exploiting Shared Representations for Personalized Federated Learning ( http://arxiv.org/abs/2102.07078v3 ) License: see linked page	Liam Collins, Hamed Hassani, Aryan Mokhtari, Sanjay Shakkottai	Deep neural networks have shown the ability to extract universal feature representations from data such as images and text that have been useful for a variety of learning tasks. However, the fruits of representation learning have yet to be fully realized in federated settings. Although data in federated settings is often non-i.i.d. across clients, the success of centralized deep learning suggests that data often shares a global feature representation, while the statistical heterogeneity across clients or tasks is concentrated in the labels. Based on this intuition, we propose a novel federated learning framework and algorithm for learning a shared data representation across clients and unique local heads for each client. Our algorithm harnesses the distributed computational power across clients to perform many local updates with respect to the low-dimensional local parameters for every update of the representation. We prove that this method obtains linear convergence to the ground-truth representation with near-optimal sample complexity in a linear setting, demonstrating that it can efficiently reduce the problem dimension for each client. This result is of interest beyond federated learning to a broad class of problems in which we aim to learn a shared low-dimensional representation among data distributions, for example in meta-learning and multi-task learning. Further, extensive experimental results show the empirical improvement of our method over alternative personalized federated learning approaches in federated environments with heterogeneous data.	Translated: 2023-03-29 05:04:19 Published: 2023-03-24
# Scene Uncertainty and the Wellington Posterior of Deterministic Image Classifiers ( http://arxiv.org/abs/2106.13870v2 ) License: see linked page	Stephanie Tsuei, Aditya Golatkar, Stefano Soatto	We propose a method to estimate the uncertainty of the outcome of an image classifier on a given input datum. Deep neural networks commonly used for image classification are deterministic maps from an input image to an output class. As such, their outcome on a given datum involves no uncertainty, so we must specify what variability we are referring to when defining, measuring and interpreting uncertainty, and attributing "confidence" to the outcome. To this end, we introduce the Wellington Posterior, which is the distribution of outcomes that would have been obtained in response to data that could have been generated by the same scene that produced the given image. Since there are infinitely many scenes that could have generated any given image, the Wellington Posterior involves inductive transfer from scenes other than the one portrayed. We explore the use of data augmentation, dropout, ensembling, single-view reconstruction, and model linearization to compute a Wellington Posterior. Additional methods include the use of conditional generative models such as generative adversarial networks, neural radiance fields, and conditional prior networks. We test these methods against the empirical posterior obtained by performing inference on multiple images of the same underlying scene. These developments are only a small step towards assessing the reliability of deep network classifiers in a manner that is compatible with safety-critical applications and human interpretation.	Translated: 2023-03-29 04:09:45 Published: 2023-03-24
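To make the idea in the abstract above concrete, the sketch below estimates a distribution of outcomes for a deterministic classifier by running it on perturbed copies of one input. It is a toy illustration only: the classifier `classify`, the pixel-noise `augment` function, and all parameter values are assumptions made for the example, and random pixel noise is at best a crude stand-in for the data-augmentation route the abstract lists. It is not the authors' procedure.

```python
import random
from collections import Counter

def classify(pixels):
    """Toy deterministic classifier: 'bright' if the mean intensity exceeds 0.5."""
    return "bright" if sum(pixels) / len(pixels) > 0.5 else "dark"

def augment(pixels, rng, noise=0.1):
    """Perturb each pixel by uniform noise, keeping values inside [0, 1]."""
    return [min(1.0, max(0.0, p + rng.uniform(-noise, noise))) for p in pixels]

def empirical_outcome_distribution(pixels, n_samples=1000, seed=0):
    """Relative frequency of each predicted label over perturbed copies."""
    rng = random.Random(seed)
    counts = Counter(classify(augment(pixels, rng)) for _ in range(n_samples))
    return {label: count / n_samples for label, count in counts.items()}

confident = empirical_outcome_distribution([0.9] * 16)   # far from the decision boundary
borderline = empirical_outcome_distribution([0.5] * 16)  # sits on the decision boundary
```

The first input always yields the same label, so its estimated distribution is concentrated on one class, while the borderline input splits its mass across both labels even though the classifier itself is deterministic.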
# Predicting Chronic Homelessness: The Importance of Comparing Algorithms using Client Histories ( http://arxiv.org/abs/2105.15080v2 ) License: see linked page	Geoffrey G. Messier, Caleb John, Ayush Malik	This paper investigates how to best compare algorithms for predicting chronic homelessness for the purpose of identifying good candidates for housing programs. Predictive methods can rapidly refer potentially chronic shelter users to housing but also sometimes incorrectly identify individuals who will not become chronic (false positives). We use shelter access histories to demonstrate that these false positives are often still good candidates for housing. Using this approach, we compare a simple threshold method for predicting chronic homelessness to the more complex logistic regression and neural network algorithms. While traditional binary classification performance metrics show that the machine learning algorithms perform better than the threshold technique, an examination of the shelter access histories of the cohorts identified by the three algorithms shows that they select groups with very similar characteristics. This has important implications for resource-constrained not-for-profit organizations since the threshold technique can be implemented using much simpler information technology infrastructure than the machine learning algorithms.	Translated: 2023-03-29 04:08:25 Published: 2023-03-24
# SQUID: Deep Feature In-Painting for Unsupervised Anomaly Detection ( http://arxiv.org/abs/2111.13495v3 ) License: see linked page	Tiange Xiang, Yixiao Zhang, Yongyi Lu, Alan L. Yuille, Chaoyi Zhang, Weidong Cai, Zongwei Zhou	Radiography imaging protocols focus on particular body regions, therefore producing images of great similarity and yielding recurrent anatomical structures across patients. To exploit this structured information, we propose the use of Space-aware Memory Queues for In-painting and Detecting anomalies from radiography images (abbreviated as SQUID). We show that SQUID can taxonomize the ingrained anatomical structures into recurrent patterns, and that at inference it can identify anomalies (unseen/modified patterns) in the image. SQUID surpasses 13 state-of-the-art methods in unsupervised anomaly detection by at least 5 points on two chest X-ray benchmark datasets, measured by the Area Under the Curve (AUC). Additionally, we have created a new dataset (DigitAnatomy), which synthesizes the spatial correlation and consistent shape in chest anatomy. We hope DigitAnatomy can prompt the development, evaluation, and interpretability of anomaly detection methods.	Translated: 2023-03-29 03:59:59 Published: 2023-03-24
# 教師付き学習のための情報理論枠組み An Information-Theoretic Framework for Supervised Learning ( http://arxiv.org/abs/2203.00246v6 ) ライセンス: Link先を確認	Hong Jun Jeon and Yifan Zhu and Benjamin Van Roy	(参考訳) ディープラーニングは毎年、より深く広いニューラルネットワークを使って、新しい、そして改善された経験的な結果を示す。一方、既存の理論的枠組みでは、パラメータ数の数え上げに頼ったり、深さに対して指数的なサンプル複雑性境界に遭遇したりすることなく、2層より深いネットワークを解析することは困難である。異なるレンズの下で現代の機械学習を分析することは実りあるかもしれない。本稿では,機械学習のデータ要求を分析するために,後悔とサンプルの複雑さという独自の概念を持つ新しい情報理論フレームワークを提案する。このフレームワークでは,まずスカラー推定や線形回帰といった古典的な例を通して直観を構築し,一般的な手法を導入する。次に,このフレームワークを用いて,ReLUアクティベーションユニットを用いたディープニューラルネットワークが生成するデータから学習する際のサンプル複雑性を調べる。重みに関する特定の事前分布に対して、幅に依存せず、かつ深さに対して線形なサンプル複雑性境界を確立する。この事前分布は、高い確率で合理的に正確な低次元近似を許容する高次元の潜在表現をもたらす。我々は、ランダム単一隠れ層ニューラルネットワークの実験解析により、理論結果を裏付ける。 Each year, deep learning demonstrates new and improved empirical results with deeper and wider neural networks. Meanwhile, with existing theoretical frameworks, it is difficult to analyze networks deeper than two layers without resorting to counting parameters or encountering sample complexity bounds that are exponential in depth. Perhaps it may be fruitful to try to analyze modern machine learning under a different lens. In this paper, we propose a novel information-theoretic framework with its own notions of regret and sample complexity for analyzing the data requirements of machine learning. With our framework, we first work through some classical examples such as scalar estimation and linear regression to build intuition and introduce general techniques. Then, we use the framework to study the sample complexity of learning from data generated by deep neural networks with ReLU activation units. For a particular prior distribution on weights, we establish sample complexity bounds that are simultaneously width independent and linear in depth. This prior distribution gives rise to high-dimensional latent representations that, with high probability, admit reasonably accurate low-dimensional approximations. We conclude by corroborating our theoretical results with experimental analysis of random single-hidden-layer neural networks.	翻訳日:2023-03-29 03:52:12 公開日:2023-03-24
# PGMax:離散確率グラフィカルモデルのための因子グラフとJAXにおけるループ的信念伝播 PGMax: Factor Graphs for Discrete Probabilistic Graphical Models and Loopy Belief Propagation in JAX ( http://arxiv.org/abs/2202.04110v4 ) ライセンス: Link先を確認	Guangyao Zhou, Antoine Dedieu, Nishanth Kumar, Wolfgang Lehrach, Miguel Lázaro-Gredilla, Shrinu Kushagra, Dileep George	(参考訳) PGMaxはオープンソースのPythonパッケージであり、(a)離散確率グラフィカルモデル(PGM)を因子グラフとして容易に特定し、(b)JAXで効率的かつスケーラブルなループ的信念伝播(LBP)を自動的に実行する。 PGMaxは扱いやすい因子を持つ一般的な因子グラフをサポートし、GPUのような現代的なアクセラレータを推論に活用している。 PGMaxは既存の代替手法と比較して、推論時間を最大3桁高速化しつつ、より高品質な推論結果が得られる。 PGMaxは急速に成長しているJAXエコシステムとシームレスに相互作用し、新しい研究可能性を開く。ソースコード、例、ドキュメントは https://github.com/deepmind/PGMax で公開されています。 PGMax is an open-source Python package for (a) easily specifying discrete Probabilistic Graphical Models (PGMs) as factor graphs; and (b) automatically running efficient and scalable loopy belief propagation (LBP) in JAX. PGMax supports general factor graphs with tractable factors, and leverages modern accelerators like GPUs for inference. Compared with existing alternatives, PGMax obtains higher-quality inference results with up to three orders-of-magnitude inference time speedups. PGMax additionally interacts seamlessly with the rapidly growing JAX ecosystem, opening up new research possibilities. Our source code, examples and documentation are available at https://github.com/deepmind/PGMax.	翻訳日:2023-03-29 03:50:36 公開日:2023-03-24
# PARC: エネルギー材料のメソスケール反応力学を同化するための物理対応リカレント畳み込みニューラルネットワーク PARC: Physics-Aware Recurrent Convolutional Neural Networks to Assimilate Meso-scale Reactive Mechanics of Energetic Materials ( http://arxiv.org/abs/2204.07234v3 ) ライセンス: Link先を確認	Phong C.H. Nguyen, Yen-Thi Nguyen, Joseph B. Choi, Pradeep K. Seshadri, H.S. Udaykumar, and Stephen Baek	(参考訳) 衝撃開始エネルギー材料(EM)の熱力学的応答は、そのミクロ構造の影響を強く受け、「材料・バイ・デザイン」フレームワークでEMマイクロ構造を設計する機会を与える。しかし、複雑なEM構造-プロパティー-パフォーマンスリンクを構築するには、多くのシミュレーションが必要となるため、現在の設計プラクティスは限られている。本稿では,少数の高分解能直接数値シミュレーション(DNS)からEMのメソスケール熱力学を学習できるディープラーニングアルゴリズムである,物理認識型リカレント畳み込み(PARC)ニューラルネットワークを提案する。検証結果によると、PARCは衝撃を受けたEMの熱力学的応答をDNSに匹敵する精度で予測できるが、計算時間は明らかに少ない。 PARCの物理学的認識は、特に未知の予測シナリオにおいて、モデリング能力と一般化性を高める。また, PARCにおける人工ニューロンの可視化は, EM熱力学の重要な面に光を当てることや, EMを概念化するための追加レンズを提供することも実証した。 The thermo-mechanical response of shock-initiated energetic materials (EM) is highly influenced by their microstructures, presenting an opportunity to engineer EM microstructure in a "materials-by-design" framework. However, the current design practice is limited, as a large ensemble of simulations is required to construct the complex EM structure-property-performance linkages. We present the Physics-Aware Recurrent Convolutional (PARC) Neural Network, a deep-learning algorithm capable of learning the mesoscale thermo-mechanics of EM from a modest number of high-resolution direct numerical simulations (DNS). Validation results demonstrated that PARC could predict the thermo-mechanical response of shocked EM with a comparable accuracy to DNS but with notably less computation time. The physics awareness of PARC enhances its modeling capabilities and generalizability, especially when challenged in unseen prediction scenarios. We also demonstrate that visualizing the artificial neurons at PARC can shed light on important aspects of EM thermo-mechanics and provide an additional lens for conceptualizing EM.	翻訳日:2023-03-29 03:42:32 公開日:2023-03-24
# SunStage: ライトステージとしての太陽を用いたポートレート再構築とリライティング SunStage: Portrait Reconstruction and Relighting using the Sun as a Light Stage ( http://arxiv.org/abs/2204.03648v2 ) ライセンス: Link先を確認	Yifan Wang, Aleksander Holynski, Xiuming Zhang and Xuaner Zhang	(参考訳) ライトステージは一連のキャリブレーションされたカメラとライトを使用して、さまざまな照明と視点の下で被写体の顔の外観をキャプチャする。この捉えた情報は、顔の復元とリライトに欠かせない。残念なことに、ライトステージは高価であり、建設と運用に重要な技術的専門知識を必要とする。本稿では、スマートフォンカメラと太陽のみを使用して、同等のデータをキャプチャする、ライトステージの軽量な代替手段であるSunStageを紹介する。提案手法では, 自撮り動画を屋外で撮影し, 位置を回転させ, 顔形状, 反射率, カメラポーズ, 照明パラメータの同時再構成の指針として, 太陽と顔の角度の異なる角度を用いる。本手法は,未校正環境にもかかわらず,顔の外観や形状を再現し,リライティング,新しいビュー合成,リフレクタンス編集などの魅力的な効果を期待できる。結果とインタラクティブなデモはhttps://sunstage.cs.washington.edu/で見ることができる。 A light stage uses a series of calibrated cameras and lights to capture a subject's facial appearance under varying illumination and viewpoint. This captured information is crucial for facial reconstruction and relighting. Unfortunately, light stages are often inaccessible: they are expensive and require significant technical expertise for construction and operation. In this paper, we present SunStage: a lightweight alternative to a light stage that captures comparable data using only a smartphone camera and the sun. Our method only requires the user to capture a selfie video outdoors, rotating in place, and uses the varying angles between the sun and the face as guidance in joint reconstruction of facial geometry, reflectance, camera pose, and lighting parameters. Despite the in-the-wild un-calibrated setting, our approach is able to reconstruct detailed facial appearance and geometry, enabling compelling effects such as relighting, novel view synthesis, and reflectance editing. Results and interactive demos are available at https://sunstage.cs.washington.edu/.	翻訳日:2023-03-29 03:42:12 公開日:2023-03-24
# 緊急ホームレスシェルターの慢性的利用者の早期同定のためのルール検索フレームワーク A Rule Search Framework for the Early Identification of Chronic Emergency Homeless Shelter Clients ( http://arxiv.org/abs/2205.09883v2 ) ライセンス: Link先を確認	Caleb John and Geoffrey G. Messier	(参考訳) 本稿では,長期ないし慢性的なシェルターユーザになるリスクのある緊急ホームレスシェルターの利用者の早期識別にルールサーチ手法を用いる。 4万人以上の個人との12年間のサービスインタラクションを含む、北米の主要シェルターのデータセットを使用して、OPUS(optimized pruning for unordered search)アルゴリズムにより、直感的かつ効果的なルールを開発する。ルールは、リスクの高いクライアントを支援的な住宅に移行するための住宅プログラムのリアルタイム配信と互換性のあるフレームワーク内で評価される。その結果, 本研究の手法を適用した場合, 慢性シェルター使用リスクのクライアント識別までの時間の中央値が297日から162日に低下することが認められた。 This paper uses rule search techniques for the early identification of emergency homeless shelter clients who are at risk of becoming long term or chronic shelter users. Using a data set from a major North American shelter containing 12 years of service interactions with over 40,000 individuals, the optimized pruning for unordered search (OPUS) algorithm is used to develop rules that are both intuitive and effective. The rules are evaluated within a framework compatible with the real-time delivery of a housing program meant to transition high risk clients to supportive housing. Results demonstrate that the median time to identification of clients at risk of chronic shelter use drops from 297 days to 162 days when the methods in this paper are applied.	翻訳日:2023-03-29 03:33:47 公開日:2023-03-24
# ランダム特徴回帰モデルのための最適活性化関数 Optimal Activation Functions for the Random Features Regression Model ( http://arxiv.org/abs/2206.01332v3 ) ライセンス: Link先を確認	Jianxin Wang and José Bento	(参考訳) 近年,ランダム特徴回帰モデル(RFR)の漸近的平均二乗検定誤差と感度が研究されている。我々はこの研究に基づいて、異なる関数parsimonyの概念の下でRFRのテストエラーと感度の組み合わせを最小化するアクティベーション関数ファミリー(AFs)をクローズドフォームで特定する。最適AFsが線形、飽和線型関数、あるいはエルミート多項式を用いて表現可能なシナリオを見いだす。最後に, 最適AFsの利用が, 二重降下曲線などのRFRモデルの確立した特性や, 観測騒音レベルに対する最適正規化パラメータの依存性にどのように影響するかを示す。 The asymptotic mean squared test error and sensitivity of the Random Features Regression model (RFR) have been recently studied. We build on this work and identify in closed-form the family of Activation Functions (AFs) that minimize a combination of the test error and sensitivity of the RFR under different notions of functional parsimony. We find scenarios under which the optimal AFs are linear, saturated linear functions, or expressible in terms of Hermite polynomials. Finally, we show how using optimal AFs impacts well-established properties of the RFR model, such as its double descent curve, and the dependency of its optimal regularization parameter on the observation noise level.	翻訳日:2023-03-29 03:22:56 公開日:2023-03-24
# ジェネリックイベント境界キャプション用デュアルストリームトランス Dual-Stream Transformer for Generic Event Boundary Captioning ( http://arxiv.org/abs/2207.03038v3 ) ライセンス: Link先を確認	Xin Gu, Hanhua Ye, Guang Chen, Yufei Wang, Libo Zhang, Longyin Wen	(参考訳) 本稿では,CVPR2022ジェネリックイベント境界キャプショニング(GEBC)コンペティションのチャンピオンソリューションについて述べる。 GEBCは、キャプションモデルに対して、所定のビデオ境界付近の即時的なステータス変更の理解を必要とするため、従来のビデオキャプションタスクよりもはるかに難しい。本稿では,映像コンテンツエンコーディングとキャプション生成の両面で改善したデュアルストリームトランスを提案する。 (1) 3つの事前学習済みモデルを用いて,異なる粒度の映像特徴を抽出する。さらに,境界の型をヒントとして活用し,モデルによるキャプション生成を支援する。 (2) 境界キャプションの識別表現を学習するために,特にDual-Stream Transformerと呼ばれるモデルの設計を行う。 (3) 内容関連文や人間ライクなキャプションの作成に向けて, 単語レベルのアンサンブル戦略をデザインし, 記述品質の向上を図る。 GEBCテストスプリットの有望な結果は,提案モデルの有効性を示すものである。 This paper describes our champion solution for the CVPR2022 Generic Event Boundary Captioning (GEBC) competition. GEBC requires the captioning model to have a comprehension of instantaneous status changes around the given video boundary, which makes it much more challenging than the conventional video captioning task. In this paper, a Dual-Stream Transformer with improvements on both video content encoding and caption generation is proposed: (1) We utilize three pre-trained models to extract the video features from different granularities. Moreover, we exploit the types of boundary as hints to help the model generate captions. (2) We particularly design a model, termed as Dual-Stream Transformer, to learn discriminative representations for boundary captioning. (3) Towards generating content-relevant and human-like captions, we improve the description quality by designing a word-level ensemble strategy. The promising results on the GEBC test split demonstrate the efficacy of our proposed model.	翻訳日:2023-03-29 03:15:06 公開日:2023-03-24
# どこから始める? フェデレーション学習における事前学習と初期化の影響について Where to Begin? On the Impact of Pre-Training and Initialization in Federated Learning ( http://arxiv.org/abs/2206.15387v3 ) ライセンス: Link先を確認	John Nguyen, Jianyu Wang, Kshitiz Malik, Maziar Sanjabi, Michael Rabbat	(参考訳) 連合学習でよく引用される課題は不均一性の存在である。データの不均一性(data heterogeneity)は、異なるクライアントのデータが全く異なる分布に従う可能性があるという事実を指す。システムの不均一性(system heterogeneity)は、異なるシステム能力を持つクライアントデバイスを指す。かなりの数のフェデレーション最適化手法がこの課題に対処する。文献では、経験的評価は通常ランダム初期化から連邦訓練を開始する。しかし、フェデレーション学習の多くの実用的な応用において、サーバーは、フェデレーショントレーニングを開始する前にモデルの事前トレーニングに使用できるトレーニングタスクのプロキシデータにアクセスすることができる。 4つの標準フェデレーション学習ベンチマークデータセットを用いて、フェデレーション学習における事前学習モデルからの開始の影響を実証的に検討する。当然ながら、事前訓練されたモデルから始めると、目標エラー率に達するのに必要なトレーニング時間を短縮し、ランダム初期化から始める場合よりも正確なモデルのトレーニング(最大40%)を可能にする。驚くべきことに、事前訓練された初期化からフェデレート学習を始めることで、データとシステムの不均一性の両方の影響が低減される。我々は,フェデレーション最適化手法を提案・評価する今後の研究に対し,ランダム初期化と事前学習初期化の両方から始めた場合の性能を評価することを推奨する。本研究は,連帯最適化における不均一性の役割を理解するためのさらなる研究として,いくつかの疑問を提起する。コードは https://github.com/facebookresearch/where_to_begin で公開されている。 An oft-cited challenge of federated learning is the presence of heterogeneity. Data heterogeneity refers to the fact that data from different clients may follow very different distributions. System heterogeneity refers to client devices having different system capabilities. A considerable number of federated optimization methods address this challenge. In the literature, empirical evaluations usually start federated training from random initialization. However, in many practical applications of federated learning, the server has access to proxy data for the training task that can be used to pre-train a model before starting federated training. Using four standard federated learning benchmark datasets, we empirically study the impact of starting from a pre-trained model in federated learning. Unsurprisingly, starting from a pre-trained model reduces the training time required to reach a target error rate and enables the training of more accurate models (up to 40%) than is possible when starting from random initialization. Surprisingly, we also find that starting federated learning from a pre-trained initialization reduces the effect of both data and system heterogeneity. We recommend future work proposing and evaluating federated optimization methods to evaluate the performance when starting from random and pre-trained initializations. This study raises several questions for further work on understanding the role of heterogeneity in federated optimization. Our code is available at: https://github.com/facebookresearch/where_to_begin	翻訳日:2023-03-29 03:14:03 公開日:2023-03-24
# 強電界電離におけるサブバリアリコライジョンとトンネル時間遅延の3つのクラス Sub-barrier recollisions and the three classes of tunneling time delays in strong-field ionization ( http://arxiv.org/abs/2208.10946v2 ) ライセンス: Link先を確認	Michael Klaiber, Daniel Bakucz Canário, and Karen Z. Hatsagortsyan	(参考訳) トンネルイオン化は、サブバリア再衝突経路と直接イオン化経路の干渉によって引き起こされる光電子運動量分布の特定のシフトとして漸近的に観察される負の時間遅延によって特徴づけられる。対照的に、波動関数のピークを追跡する思考実験(Gedankenexperiment)は、直接イオン化経路のみを考慮してトンネル出口で正のトンネル時間遅延を示す。本稿では,トンネル出口の時間遅延パターンに対するサブバリア再衝突の影響について検討する。その結果、直接軌道と再衝突軌道の干渉により、出口でのトンネル時間遅延は漸近時間遅延と等しい値だけ減少するが、なお相当な大きさの正の値を保つ。最後に, 改良型2色アト秒時計(attoclock)でトンネル時間を扱った最近の実験 [Light: Science & Applications 11, 1 (2022)] について検討する。理論モデルを用いた実験結果の解析は,トンネルイオン化の新しい時間特性であるトンネル波パケットの開始を記述した時間遅延を導入する物理的必要性を示している。 Tunneling ionization is characterized by a negative time delay, observed asymptotically as a specific shift of the photoelectron momentum distribution, which is caused by the interference of the sub-barrier recolliding and direct ionization paths. In contrast, a Gedankenexperiment following the peak of the wavefunction shows a positive tunneling time delay at the tunnel exit, considering only the direct ionization path. In this paper, we investigate the effects of sub-barrier recollisions on the time delay pattern at the tunnel exit. We conclude that the interference of the direct and recolliding trajectories decreases the tunneling time delay at the exit by the value equal to the asymptotic time delay maintaining, however, its sizeable positive value. Finally, we discuss the recent experiment [Light: Science & Applications 11, 1 (2022)] addressing the tunneling time in a modified two-color attoclock setup. The analysis of the experimental findings with our theoretical model indicates the physical necessity to introduce a new time characteristic for tunneling ionization -- the time delay describing the initiation of the tunneling wave packet.	翻訳日:2023-03-29 02:54:26 公開日:2023-03-24
# 自然言語処理の効率的な手法に関する研究 Efficient Methods for Natural Language Processing: A Survey ( http://arxiv.org/abs/2209.00099v2 ) ライセンス: Link先を確認	Marcos Treviso, Ji-Ung Lee, Tianchu Ji, Betty van Aken, Qingqing Cao, Manuel R. Ciosici, Michael Hassid, Kenneth Heafield, Sara Hooker, Colin Raffel, Pedro H. Martins, André F. T. Martins, Jessica Zosa Forde, Peter Milder, Edwin Simpson, Noam Slonim, Jesse Dodge, Emma Strubell, Niranjan Balasubramanian, Leon Derczynski, Iryna Gurevych, Roy Schwartz	(参考訳) 自然言語処理(NLP)における最近の研究は、モデルパラメータのスケーリングとトレーニングデータから魅力的な結果を得ているが、性能向上のためにスケールのみを使用することで、資源消費も増大している。そのようなリソースには、データ、時間、ストレージ、エネルギーが含まれており、それらは自然に制限され、不均等に分散している。これにより、同様の結果を得るのに少ないリソースを必要とする効率的な方法の研究が動機となる。本研究は, 効率的なNLPにおける現在の手法と知見を合成し, 関連づけるものである。我々は,限られた資源下でNLPを実施するためのガイダンスと,より効率的な手法を開発するための有望な研究方向性の両立を目指す。 Recent work in natural language processing (NLP) has yielded appealing results from scaling model parameters and training data; however, using only scale to improve performance means that resource consumption also grows. Such resources include data, time, storage, or energy, all of which are naturally limited and unevenly distributed. This motivates research into efficient methods that require fewer resources to achieve similar results. This survey synthesizes and relates current methods and findings in efficient NLP. We aim to provide both guidance for conducting NLP under limited resources, and point towards promising research directions for developing more efficient methods.	翻訳日:2023-03-29 02:44:33 公開日:2023-03-24
# Augraphy: ドキュメントイメージのためのデータ拡張ライブラリ Augraphy: A Data Augmentation Library for Document Images ( http://arxiv.org/abs/2208.14558v2 ) ライセンス: Link先を確認	Alexander Groleau, Kok Wei Chee, Stefan Larson, Samay Maini, Jonathan Boarman	(参考訳) 本稿では,実際の文書画像データセットによく見られる歪みを生成するデータ拡張パイプラインを構築するためのPythonライブラリであるAugraphyを紹介する。 Augraphyは、印刷、スキャン、古いマシンや汚れたマシンによるファックス化、時間の経過とともにインクの劣化、手書きのマーキングなど、標準的なオフィス操作によって変更されているように見えるクリーンなドキュメントイメージの強化版を作成するための多くの戦略を提供することによって、他のデータ拡張ツールとは異なっている。本稿では,Augraphyツールについて論じ,文書のノイズ除去などのタスクのための多様なトレーニングデータを生成するためのデータ拡張ツールや,文書画像モデリングタスクにおけるモデルロバスト性を評価するための挑戦的なテストデータを生成するツールとしての利用方法を紹介する。 This paper introduces Augraphy, a Python library for constructing data augmentation pipelines which produce distortions commonly seen in real-world document image datasets. Augraphy stands apart from other data augmentation tools by providing many different strategies to produce augmented versions of clean document images that appear as if they have been altered by standard office operations, such as printing, scanning, and faxing through old or dirty machines, degradation of ink over time, and handwritten markings. This paper discusses the Augraphy tool, and shows how it can be used both as a data augmentation tool for producing diverse training data for tasks such as document denoising, and also for generating challenging test data to evaluate model robustness on document image modeling tasks.	翻訳日:2023-03-29 02:44:07 公開日:2023-03-24
# オンラインmin-sumセットカバーの改良アルゴリズム An Improved Algorithm For Online Min-Sum Set Cover ( http://arxiv.org/abs/2209.04870v3 ) ライセンス: Link先を確認	Marcin Bienkowski, Marcin Mucha	(参考訳) 我々は、アルゴリズムが$n$個の要素からなる順序付きリストを維持する、オンライン嗜好集約の基本的なモデルについて研究する。入力は望ましい集合 $R_1, R_2, \dots, R_t, \dots$ のストリームである。 $R_t$を見た後、将来の集合の知識なしに、アルゴリズムは要素を再ランクし(リストの順序を変更し)、$R_t$の少なくとも1つの要素がリストの先頭付近にあるようにしなければならない。発生したコストは、リスト更新コスト(隣接するリスト要素のスワップ数)とアクセスコスト(リスト上での$R_t$の最初の要素の位置)の合計である。このシナリオは、オンラインショップにおける商品の注文のような、ショップ顧客の選好を集約したアプリケーションで自然に発生する。この問題の理論的基盤はMin-Sum Set Coverとして知られている。オンラインアルゴリズムALGの静的最適解(単一最適リスト順序付け)に対する性能を主に研究した以前の研究 (Fotakis et al., ICALP 2020, NIPS 2020) とは異なり、本論文では、ベンチマークが証明可能なより強力な最適動的解 OPT (リスト順序付けも変更できる) である、明らかに難しい変種について検討する。オンラインショップの観点では、ユーザーベース全体の嗜好が時間とともに進化することを意味している。我々は、競争比(ALGとOPTのコスト比)が$O(r^2)$である計算効率の良いランダム化アルゴリズムを構築し、決定論的な$O(r^4)$-競合アルゴリズムの存在を証明する。ここで、$r$は集合$R_t$の最大濃度である。これは競争比が$n$に依存しない最初のアルゴリズムである。この問題に対する従来の最良のアルゴリズムは$O(r^{3/2} \cdot \sqrt{n})$-競合であり、$\Omega(r)$は任意の決定論的オンラインアルゴリズムの性能に対する下界である。 We study a fundamental model of online preference aggregation, where an algorithm maintains an ordered list of $n$ elements. An input is a stream of preferred sets $R_1, R_2, \dots, R_t, \dots$. Upon seeing $R_t$ and without knowledge of any future sets, an algorithm has to rerank elements (change the list ordering), so that at least one element of $R_t$ is found near the list front. The incurred cost is a sum of the list update costs (the number of swaps of neighboring list elements) and access costs (position of the first element of $R_t$ on the list). This scenario occurs naturally in applications such as ordering items in an online shop using aggregated preferences of shop customers. The theoretical underpinning of this problem is known as Min-Sum Set Cover. Unlike previous work (Fotakis et al., ICALP 2020, NIPS 2020) that mostly studied the performance of an online algorithm ALG against the static optimal solution (a single optimal list ordering), in this paper, we study an arguably harder variant where the benchmark is the provably stronger optimal dynamic solution OPT (that may also modify the list ordering). In terms of an online shop, this means that the aggregated preferences of its user base evolve with time. We construct a computationally efficient randomized algorithm whose competitive ratio (ALG-to-OPT cost ratio) is $O(r^2)$ and prove the existence of a deterministic $O(r^4)$-competitive algorithm. Here, $r$ is the maximum cardinality of sets $R_t$. This is the first algorithm whose ratio does not depend on $n$: the previously best algorithm for this problem was $O(r^{3/2} \cdot \sqrt{n})$-competitive and $\Omega(r)$ is a lower bound on the performance of any deterministic online algorithm.	翻訳日:2023-03-29 02:36:05 公開日:2023-03-24
# $\mathbb{R}^{d}$上のAdaGradの収束について:凸性を超えて、非漸近的レートと加速 On the Convergence of AdaGrad on $\mathbb{R}^{d}$: Beyond Convexity, Non-Asymptotic Rate and Acceleration ( http://arxiv.org/abs/2209.14827v2 ) ライセンス: Link先を確認	Zijian Liu, Ta Duy Nguyen, Alina Ene, Huy L. Nguyen	(参考訳) 滑らかな凸最適化のためのAdaGradや他の適応手法の既存の分析は、典型的には有界領域径を持つ関数に対して行われる。制約のない問題では、以前の研究は関数クラス全体に真となる明示的な定数因子を伴わない漸近収束率を保証する。さらに、確率的設定では、最新の勾配をステップサイズの更新に用いない(実用上一般的なものとは異なる)AdaGradの修正版のみが解析されている。本稿では,これらのギャップを埋め,AdaGradとその変種を滑らかな凸関数の標準設定およびより一般的なクエーサー凸(quasar convex)関数の設定でより深く理解することを目的とする。まず,バニラAdaGradの収束率を決定論的,確率的両面の制約のない問題に明示的に拘束する手法を示す。第二に、平均的な反復ではなく、最後の反復の収束を示すことのできる AdaGrad の変種を提案する。最後に,問題パラメータに明示的に依存した決定論的設定において,新しい高速化適応アルゴリズムと収束保証を与え,先行研究で示された漸近速度を改善した。 Existing analysis of AdaGrad and other adaptive methods for smooth convex optimization is typically for functions with bounded domain diameter. In unconstrained problems, previous works guarantee an asymptotic convergence rate without an explicit constant factor that holds true for the entire function class. Furthermore, in the stochastic setting, only a modified version of AdaGrad, different from the one commonly used in practice, in which the latest gradient is not used to update the stepsize, has been analyzed. Our paper aims at bridging these gaps and developing a deeper understanding of AdaGrad and its variants in the standard setting of smooth convex functions as well as the more general setting of quasar convex functions. First, we demonstrate new techniques to explicitly bound the convergence rate of the vanilla AdaGrad for unconstrained problems in both deterministic and stochastic settings. Second, we propose a variant of AdaGrad for which we can show the convergence of the last iterate, instead of the average iterate. Finally, we give new accelerated adaptive algorithms and their convergence guarantee in the deterministic setting with explicit dependency on the problem parameters, improving upon the asymptotic rate shown in previous works.	翻訳日:2023-03-29 02:25:36 公開日:2023-03-24
# 異種対話生成のための等サイズハードEMアルゴリズム An Equal-Size Hard EM Algorithm for Diverse Dialogue Generation ( http://arxiv.org/abs/2209.14627v2 ) ライセンス: Link先を確認	Yuqiao Wen, Yongchang Hao, Yanshuai Cao, Lili Mou	(参考訳) オープンドメイン対話システムは、自然言語テキストを通じて人間と対話することを目的としている。近年のChatGPTのような超大規模対話システムの成功にもかかわらず、中～小規模の対話システムの方が軽量でアクセスしやすいため、現在でも一般的な方法である。本研究では,多様な対話生成のためのマルチデコーダモデルをトレーニングするためのEqHard-EMアルゴリズムを提案する。このアルゴリズムはサンプルをハードな方法でデコーダに割り当て、さらに全てのデコーダが十分に訓練されていることを保証するために等割り当て制約を課す。我々はアプローチを正当化するために詳細な理論的分析を提供する。さらに,2つの大規模オープンドメイン対話データセットの実験により,我々のEqHard-EMアルゴリズムが高品質な多様な応答を生成することを確認した。 Open-domain dialogue systems aim to interact with humans through natural language texts in an open-ended fashion. Despite the recent success of super large dialogue systems such as ChatGPT, using medium-to-small-sized dialogue systems remains the common practice as they are more lightweight and accessible; however, generating diverse dialogue responses is challenging, especially with smaller models. In this work, we propose an Equal-size Hard Expectation--Maximization (EqHard-EM) algorithm to train a multi-decoder model for diverse dialogue generation. Our algorithm assigns a sample to a decoder in a hard manner and additionally imposes an equal-assignment constraint to ensure that all decoders are well-trained. We provide detailed theoretical analysis to justify our approach. Further, experiments on two large-scale open-domain dialogue datasets verify that our EqHard-EM algorithm generates high-quality diverse responses.	翻訳日:2023-03-29 02:25:18 公開日:2023-03-24
# 密な単眼非剛体3次元再構成の最新技術 State of the Art in Dense Monocular Non-Rigid 3D Reconstruction ( http://arxiv.org/abs/2210.15664v2 ) ライセンス: Link先を確認	Edith Tretschk, Navami Kairanda, Mallikarjun B R, Rishabh Dabral, Adam Kortylewski, Bernhard Egger, Marc Habermann, Pascal Fua, Christian Theobalt, Vladislav Golyanik	(参考訳) モノキュラーな2次元画像からの変形可能な(または非剛性)シーンの3次元再構成は、コンピュータビジョンとグラフィックスの長年にわたる活発な研究領域である。これは不良設定の逆問題である。なぜなら、追加の事前仮定なしでは、入力2D画像へ正確に投影される解が無限に存在するからである。非剛性再構築は、ロボット工学、AR/VR、視覚コンテンツ作成といった下流アプリケーションのための基礎的なビルディングブロックである。単眼カメラを使用する主な利点は、遍在性とエンドユーザへの可用性であり、ステレオやマルチビューシステムのようなより洗練されたカメラセットと比べて使いやすさである。本研究は, モノキュラ映像やモノキュラビューのセットから, 様々な変形可能な物体と複合シーンの密集した非剛性3次元再構成のための最先端手法に焦点をあてたものである。 2次元画像観察から3次元再構成と変形モデリングの基礎を考察する。次に、任意の場面を処理し、いくつかの前提を下す一般的な方法から始め、観察対象や変形の種類(例えば、人間の顔、体、手、動物)についてより強い仮定を行う技術へと進む。このSTARの重要な部分は、手法の分類と高レベルの比較、および、議論された手法のトレーニングと評価のためのデータセットの概要にも費やされている。本稿では,その分野におけるオープンな課題と,レビュー手法の活用に関連する社会的側面について論じる。 3D reconstruction of deformable (or non-rigid) scenes from a set of monocular 2D image observations is a long-standing and actively researched area of computer vision and graphics. It is an ill-posed inverse problem, since -- without additional prior assumptions -- it permits infinitely many solutions leading to accurate projection to the input 2D images. Non-rigid reconstruction is a foundational building block for downstream applications like robotics, AR/VR, or visual content creation. The key advantage of using monocular cameras is their omnipresence and availability to the end users as well as their ease of use compared to more sophisticated camera set-ups such as stereo or multi-view systems. This survey focuses on state-of-the-art methods for dense non-rigid 3D reconstruction of various deformable objects and composite scenes from monocular videos or sets of monocular views. It reviews the fundamentals of 3D reconstruction and deformation modeling from 2D image observations. We then start from general methods -- that handle arbitrary scenes and make only a few prior assumptions -- and proceed towards techniques making stronger assumptions about the observed objects and types of deformations (e.g. human faces, bodies, hands, and animals). A significant part of this STAR is also devoted to classification and a high-level comparison of the methods, as well as an overview of the datasets for training and evaluation of the discussed techniques. We conclude by discussing open challenges in the field and the social aspects associated with the usage of the reviewed methods.	翻訳日:2023-03-29 02:10:00 公開日:2023-03-24
# 緊急避難所アクセスパターンの簡易理解法 A Simpler Method for Understanding Emergency Shelter Access Patterns ( http://arxiv.org/abs/2210.13619v2 ) ライセンス: Link先を確認	Geoffrey G. Messier	(参考訳) Simplified Access Metric (SAM)は、シェルタークライアント脆弱性の尺度として、緊急シェルターアクセスパターンを特徴付ける新しいアプローチである。 SAMの目標は、スプレッドシート操作を使用して非技術スタッフが実装可能なアクセスパターンを直感的に理解するためのシェルターオペレータを提供することである。北米の大きなシェルターからのクライアントデータは、samが従来のトランジショナル、エピソディック、慢性的なクライアントクラスタ分析と同じような結果を生成することを示すために使用される。 SAMはクラスタ分析よりも少ないデータを必要とするため、外部要因によるシェルターアクセスパターンの影響のリアルタイムな画像を生成することもできる。 samを使った9年間のシェルタークライアントデータから生成されたタイムラインは、ハウジングファーストプログラミングとcovid-19ロックダウンがシェルターへのアクセス方法に与える影響を示しています。最後にSAMは、シェルタースタッフが移行、エピソード、慢性的なラベルを割り当てるだけでなく、SAMの"ソフト"出力を直接脆弱性の尺度として使うことができる。 The Simplified Access Metric (SAM) is a new approach for characterizing emergency shelter access patterns as a measure of shelter client vulnerability. The goal of SAM is to provide shelter operators with an intuitive way to understand access patterns that can be implemented by non-technical staff using spreadsheet operations. Client data from a large North American shelter will be used to demonstrate that SAM produces similar results to traditional transitional, episodic and chronic client cluster analysis. Since SAM requires less data than cluster analysis, it is also able to generate a real time picture of how shelter access patterns are affected by external factors. Timelines generated from nine years of shelter client data using SAM demonstrate the impact of Housing First programming and the COVID-19 lockdown on how people access shelter. Finally, SAM allows shelter staff to move beyond assigning transitional, episodic and chronic labels and instead use the "soft" output of SAM directly as a measure of vulnerability.	翻訳日:2023-03-29 02:08:30 公開日:2023-03-24
# 最低ランダウ準位における量子誤差補正 Quantum Error Correction in the Lowest Landau Level ( http://arxiv.org/abs/2210.16957v2 ) ライセンス: Link先を確認	Yale Fan, Willy Fischler, Eric Kubischta	(参考訳) 我々は、Albert, Covey, Preskill (ACP) によって提案された量子誤り訂正符号の有限次元バージョンを開発し、非アーベル対称性群を持つ構成空間上で連続変数量子計算を行う。我々の符号は、ゴッテスマン、キタエフ、プレスキル(GKP)のクディット符号の平面的なランダウ準位による実現とは対照的に、球面幾何学上のランダウ準位の荷電粒子、またはより一般にスピンコヒーレント状態によって実現できる。我々の量子誤り訂正方式は本質的に近似的であり、符号化状態はGKPやACPよりも準備が容易である可能性がある。 We develop finite-dimensional versions of the quantum error-correcting codes proposed by Albert, Covey, and Preskill (ACP) for continuous-variable quantum computation on configuration spaces with nonabelian symmetry groups. Our codes can be realized by a charged particle in a Landau level on a spherical geometry -- in contrast to the planar Landau level realization of the qudit codes of Gottesman, Kitaev, and Preskill (GKP) -- or more generally by spin coherent states. Our quantum error-correction scheme is inherently approximate, and the encoded states may be easier to prepare than those of GKP or ACP.	翻訳日:2023-03-29 01:57:59 公開日:2023-03-24
# パッキングとカバー制約を伴うコンテキストバンディット:回帰によるモジュールラグランジアンアプローチ Contextual Bandits with Packing and Covering Constraints: A Modular Lagrangian Approach via Regression ( http://arxiv.org/abs/2211.07484v3 ) ライセンス: Link先を確認	Aleksandrs Slivkins and Karthik Abinav Sankararaman and Dylan J. Foster	(参考訳) 本稿では,線形制約付き文脈付きバンディット(CBwLC)について考察する。これは,アルゴリズムが全消費の線形制約を受ける複数のリソースを消費する文脈付きバンディットの変種である。この問題はナップサック付き文脈付きバンディット(CBwK)を一般化し、パッキング制約とカバリング制約、および正および負のリソース消費を可能にする。回帰オラクルに基づくCBwLC(またはCBwK)の最初のアルゴリズムを提案する。このアルゴリズムは単純で計算効率が良く、リグレットが消失する(vanishing regret)。ある制約が破られたらアルゴリズムが停止しなければならないCBwKの変種に対しては、統計的に最適である。さらに,確率的環境を超えたCBwLC(またはCBwK)について,初めてリグレット消失保証を与える。私たちは、比較するより弱い(そしておそらく公平な)ベンチマークを特定することで、以前の作業から強い不可能性(impossibility)を回避します。我々のアルゴリズムは、CBwKのためのラグランジアンベースのテクニックであるLagrangeBwK(Immorlica et al., FOCS 2019)と、文脈付きバンディットのための回帰ベースのテクニックであるSquareCB(Foster and Rakhlin, ICML 2020)に基づいて構築されている。我々の分析は、両方の技術の本質的なモジュラリティを活用する。 We consider contextual bandits with linear constraints (CBwLC), a variant of contextual bandits in which the algorithm consumes multiple resources subject to linear constraints on total consumption. This problem generalizes contextual bandits with knapsacks (CBwK), allowing for packing and covering constraints, as well as positive and negative resource consumption. We provide the first algorithm for CBwLC (or CBwK) that is based on regression oracles. The algorithm is simple, computationally efficient, and admits vanishing regret. It is statistically optimal for the variant of CBwK in which the algorithm must stop once some constraint is violated. Further, we provide the first vanishing-regret guarantees for CBwLC (or CBwK) that extend beyond the stochastic environment. We side-step strong impossibility results from prior work by identifying a weaker (and, arguably, fairer) benchmark to compare against. Our algorithm builds on LagrangeBwK (Immorlica et al., FOCS 2019), a Lagrangian-based technique for CBwK, and SquareCB (Foster and Rakhlin, ICML 2020), a regression-based technique for contextual bandits. Our analysis leverages the inherent modularity of both techniques.	翻訳日:2023-03-29 01:48:52 公開日:2023-03-24
# BiasBed - 厳密なテクスチャバイアス評価 BiasBed -- Rigorous Texture Bias Evaluation ( http://arxiv.org/abs/2211.13190v3 ) ライセンス: Link先を確認	Nikolai Kalischek, Rodrigo C. Daudt, Torben Peters, Reinhard Furrer, Jan D. Wegner, Konrad Schindler	(参考訳) 現代の畳み込みニューラルネットワークにおけるテクスチャバイアスの存在は、しばしば新しいドメインへの一般化を支援するために、シェイプキューに重点を置くアルゴリズムの多さにつながっている。しかし、一般的なデータセット、ベンチマーク、一般的なモデル選択戦略は欠落しており、合意された厳密な評価プロトコルは存在しない。本稿では,テクスチャバイアスを低減したトレーニングネットワークの困難さと限界について検討する。特に,手法間の適切な評価と有意義な比較は自明ではないことを示す。複数のデータセットや既存のアルゴリズムを含む、テクスチャとスタイルバイアスのトレーニングのためのテストベッドであるBiasBedを紹介します。スタイルバイアス法のかなりのトレーニング不安定さにもかかわらず、結果の重要度を測定するための厳密な仮説検証を含む広範な評価プロトコルが付属している。私たちの広範な実験は、慎重に統計的に確立されたスタイルバイアスの評価プロトコルの必要性に新たな光を当てました。例えば、文献で提案されているいくつかのアルゴリズムは、スタイルバイアスの影響を全く軽減しない。 BiasBedのリリースにより、一貫した意味のある比較の共通理解が促進され、その結果、テクスチャバイアスのない学習方法へのさらなる進歩が期待できる。コードはhttps://github.com/D1noFuzi/BiasBedで入手できる。 The well-documented presence of texture bias in modern convolutional neural networks has led to a plethora of algorithms that promote an emphasis on shape cues, often to support generalization to new domains. Yet, common datasets, benchmarks and general model selection strategies are missing, and there is no agreed, rigorous evaluation protocol. In this paper, we investigate difficulties and limitations when training networks with reduced texture bias. In particular, we also show that proper evaluation and meaningful comparisons between methods are not trivial. We introduce BiasBed, a testbed for texture- and style-biased training, including multiple datasets and a range of existing algorithms. It comes with an extensive evaluation protocol that includes rigorous hypothesis testing to gauge the significance of the results, despite the considerable training instability of some style bias methods. Our extensive experiments shed new light on the need for careful, statistically founded evaluation protocols for style bias (and beyond). 
E.g., we find that some algorithms proposed in the literature do not significantly mitigate the impact of style bias at all. With the release of BiasBed, we hope to foster a common understanding of consistent and meaningful comparisons, and consequently faster progress towards learning methods free of texture bias. Code is available at https://github.com/D1noFuzi/BiasBed	翻訳日:2023-03-29 01:29:13 公開日:2023-03-24
# ObjectMatch: 標準オブジェクト対応を用いたロバスト登録 ObjectMatch: Robust Registration using Canonical Object Correspondences ( http://arxiv.org/abs/2212.01985v2 ) ライセンス: Link先を確認	Can G\"umeli, Angela Dai, Matthias Nie{\ss}ner	(参考訳) rgb-d slamパイプライン用の意味的かつオブジェクト中心のカメラポーズ推定器objectmatchを提案する。現代のカメラポーズ推定装置は、フレーム間の重なり合う領域の直接対応に依存するが、カメラフレームをほとんどあるいは全く重なり合っていない。本研究では,意味オブジェクト識別によって得られる間接対応の活用を提案する。例えば、あるフレームの前面から、別のフレームの後方からオブジェクトが見える場合、標準オブジェクト対応を通じて追加のポーズ制約を与えることができる。まず,1ピクセルあたりの対応を予測するためのニューラルネットワークを提案し,これをエネルギー定式化と最先端キーポイントマッチングと組み合わせ,共同ガウス・ニュートン最適化で解いた。本手法は,24%から45%のペアと10%以下のフレームオーバーラップを含む,最先端機能マッチングの登録リコールを改善する。 RGB-Dシークエンスを登録する場合,本手法は困難かつ低フレームレートのシナリオにおいて最先端のSLAMベースラインよりも優れ,複数のシーンにおいて軌道誤差が35%以上減少する。 We present ObjectMatch, a semantic and object-centric camera pose estimator for RGB-D SLAM pipelines. Modern camera pose estimators rely on direct correspondences of overlapping regions between frames; however, they cannot align camera frames with little or no overlap. In this work, we propose to leverage indirect correspondences obtained via semantic object identification. For instance, when an object is seen from the front in one frame and from the back in another frame, we can provide additional pose constraints through canonical object correspondences. We first propose a neural network to predict such correspondences on a per-pixel level, which we then combine in our energy formulation with state-of-the-art keypoint matching solved with a joint Gauss-Newton optimization. In a pairwise setting, our method improves registration recall of state-of-the-art feature matching, including from 24% to 45% in pairs with 10% or less inter-frame overlap. In registering RGB-D sequences, our method outperforms cutting-edge SLAM baselines in challenging, low-frame-rate scenarios, achieving more than 35% reduction in trajectory error in multiple scenes.	翻訳日:2023-03-29 01:21:31 公開日:2023-03-24
# DAワンド:ニューラルメッシュパラメータ化を用いた歪み認識の選択 DA Wand: Distortion-Aware Selection using Neural Mesh Parameterization ( http://arxiv.org/abs/2212.06344v3 ) ライセンス: Link先を確認	Richard Liu, Noam Aigerman, Vladimir G. Kim, Rana Hanocka	(参考訳) 本稿では,メッシュパラメータ化に使用できる点周辺の局所部分領域を学習するためのニューラル手法を提案する。私たちのフレームワークの動機は、表面のデカリング、テキスト作成、ペイントに使用されるインタラクティブなワークフローにあります。我々の重要なアイデアは、ニューラルネットワークフレームワーク内で新しい微分可能パラメータ化層として実装された古典的なパラメータ化法の重みとしてセグメンテーション確率を組み込むことである。我々は,2次元にパラメータ化され,歪みによってペナル化される3次元領域を選択するようにセグメンテーションネットワークを訓練する。学習の後、ユーザは我々のシステムを使ってメッシュ上の点を対話的に選択し、低歪みパラメータ化を誘導する選択に関する大きな意味のある領域を得ることができる。私たちのコードとプロジェクトページは現在利用可能です。 We present a neural technique for learning to select a local sub-region around a point which can be used for mesh parameterization. The motivation for our framework is driven by interactive workflows used for decaling, texturing, or painting on surfaces. Our key idea is to incorporate segmentation probabilities as weights of a classical parameterization method, implemented as a novel differentiable parameterization layer within a neural network framework. We train a segmentation network to select 3D regions that are parameterized into 2D and penalized by the resulting distortion, giving rise to segmentations which are distortion-aware. Following training, a user can use our system to interactively select a point on the mesh and obtain a large, meaningful region around the selection which induces a low-distortion parameterization. Our code and project page are currently available.	翻訳日:2023-03-29 01:11:31 公開日:2023-03-24
# 2体系に対するプレボルン-オッペンハイマーディラック-クーロンブレット計算 Pre-Born-Oppenheimer Dirac-Coulomb-Breit computations for two-body systems ( http://arxiv.org/abs/2301.13477v2 ) ライセンス: Link先を確認	D\'avid Ferenc and Edit M\'atyus	(参考訳) bethe-salpeter方程式から導かれる16成分のno-pair dirac--coulomb-breit方程式は、ガウス型基底関数(例えば、ポジトロニウム、ミューオン、水素原子、ミューオン水素)を用いた変分法によって解かれる。変分エネルギーの$\alpha$ 微構造-定数依存は、$\alpha^n$ と $\alpha^n\text{ln}\alpha$ 項の関数を適合させることにより、(摂動的)非相対論的 qed フレームワークの関連するエネルギー表現と優れた一致を示し、したがって、計算相対論的 qed アプローチの開発のための確かな参照を確立する。 The sixteen-component, no-pair Dirac--Coulomb--Breit equation, derived from the Bethe--Salpeter equation, is solved in a variational procedure using Gaussian-type basis functions for the example of positronium, muonium, hydrogen atom, and muonic hydrogen. The $\alpha$ fine-structure-constant dependence of the variational energies, through fitting a function of $\alpha^n$ and $\alpha^n\text{ln}\alpha$ terms, shows excellent agreement with the relevant energy expressions of the (perturbative) non-relativistic QED framework, and thereby, establishes a solid reference for the development of a computational relativistic QED approach.	翻訳日:2023-03-29 00:45:16 公開日:2023-03-24
# K-Planes: 空間、時間、出現における露光場 K-Planes: Explicit Radiance Fields in Space, Time, and Appearance ( http://arxiv.org/abs/2301.10241v2 ) ライセンス: Link先を確認	Sara Fridovich-Keil, Giacomo Meanti, Frederik Warburg, Benjamin Recht, Angjoo Kanazawa	(参考訳) 任意の次元の放射場に対するホワイトボックスモデルであるk平面を導入する。我々のモデルは、D次元のシーンを表現するためにd choose 2平面を使用し、静的(d=3)から動的(d=4)までのシームレスな方法を提供する。この平面分解により、時間的滑らかさや多次元空間構造といった次元固有の先行要素を容易に追加でき、シーンの静的および動的成分の自然な分解を誘導する。学習カラーベースを持つ線形特徴デコーダを用いて,非線形ブラックボックスmlpデコーダと同様の性能を実現する。様々な合成、現実、静的、動的、固定、そして様々な外観シーンにおいて、kプレーンは競争力があり、しばしば最先端の再現フィリティを、メモリ使用量が少なく、完全な4Dグリッド上で1000倍の圧縮を実現し、純粋なPyTorch実装で高速な最適化を実現している。ビデオ結果とコードについては、https://sarafridov.github.io/K-Planesを参照してください。 We introduce k-planes, a white-box model for radiance fields in arbitrary dimensions. Our model uses d choose 2 planes to represent a d-dimensional scene, providing a seamless way to go from static (d=3) to dynamic (d=4) scenes. This planar factorization makes adding dimension-specific priors easy, e.g. temporal smoothness and multi-resolution spatial structure, and induces a natural decomposition of static and dynamic components of a scene. We use a linear feature decoder with a learned color basis that yields similar performance as a nonlinear black-box MLP decoder. Across a range of synthetic and real, static and dynamic, fixed and varying appearance scenes, k-planes yields competitive and often state-of-the-art reconstruction fidelity with low memory usage, achieving 1000x compression over a full 4D grid, and fast optimization with a pure PyTorch implementation. For video results and code, please see https://sarafridov.github.io/K-Planes.	翻訳日:2023-03-29 00:43:26 公開日:2023-03-24
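The "d choose 2 planes" count in the K-Planes abstract can be illustrated with a short sketch. The axis names and the split into space-only and space-time planes are assumptions made for illustration (one plausible reading of the "static and dynamic components" sentence), not code from the authors.

```python
from itertools import combinations
from math import comb


def plane_axes(axis_names):
    """Return every unordered pair of axes; each pair indexes one 2D feature plane."""
    return list(combinations(axis_names, 2))


# Hypothetical axis labels: three spatial axes, plus time for dynamic scenes.
static_planes = plane_axes(["x", "y", "z"])        # d = 3
dynamic_planes = plane_axes(["x", "y", "z", "t"])  # d = 4

# Split the d = 4 planes into those that involve time and those that do not.
space_time = [pair for pair in dynamic_planes if "t" in pair]
space_only = [pair for pair in dynamic_planes if "t" not in pair]

print(len(static_planes), comb(3, 2))
print(len(dynamic_planes), comb(4, 2))
print(space_only, space_time)
```

Under these assumptions, the purely spatial planes of the dynamic scene coincide with the planes of the static scene, so moving from d=3 to d=4 only adds the planes that pair a spatial axis with time.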
# ゴール認識表現学習と適応水平予測によるオープンワールドマルチタスク制御 Open-World Multi-Task Control Through Goal-Aware Representation Learning and Adaptive Horizon Prediction ( http://arxiv.org/abs/2301.10034v2 ) ライセンス: Link先を確認	Shaofei Cai, Zihao Wang, Xiaojian Ma, Anji Liu, Yitao Liang	(参考訳) 我々は、人間レベルのマルチタスクエージェントを開発するために、普及し、広くアクセスしやすく、挑戦的なオープンエンド環境であるMinecraftの目標条件ポリシーを学習する問題について研究する。まず、このような政策を学ぶ上での2つの主な課題を特定します。 1)広い場面の多様性により、国家分布からタスクが区別できないこと、及び 2)部分的可観測性に起因する環境力学の非定常性。最初の課題に取り組むために,目標関連視覚状態表現の出現を促す政策として,目標感性バックボーン(GSB)を提案する。第2の課題に取り組むために、このポリシーは非定常力学による学習の不確実性を緩和する適応的な水平予測モジュールによってさらに加速される。 20のMinecraftタスクの実験では、我々のメソッドが今までで最高のベースラインを大幅に上回っていることが示されています。我々のアブレーションと探索研究は、我々のアプローチがどのように相手を圧倒するかを説明し、新しいシーン(バイオーム)にゼロショットの一般化の驚くべきボーナスを明らかにします。当社のエージェントが,minecraftのようなオープンな環境において,目標条件とマルチタスクエージェントの学習に光を当ててくれることを願っています。 We study the problem of learning goal-conditioned policies in Minecraft, a popular, widely accessible yet challenging open-ended environment for developing human-level multi-task agents. We first identify two main challenges of learning such policies: 1) the indistinguishability of tasks from the state distribution, due to the vast scene diversity, and 2) the non-stationary nature of environment dynamics caused by partial observability. To tackle the first challenge, we propose Goal-Sensitive Backbone (GSB) for the policy to encourage the emergence of goal-relevant visual state representations. To tackle the second challenge, the policy is further fueled by an adaptive horizon prediction module that helps alleviate the learning uncertainty brought by the non-stationary dynamics. Experiments on 20 Minecraft tasks show that our method significantly outperforms the best baseline so far; in many of them, we double the performance. Our ablation and exploratory studies then explain how our approach beat the counterparts and also unveil the surprising bonus of zero-shot generalization to new scenes (biomes). 
We hope our agent could help shed some light on learning goal-conditioned, multi-task agents in challenging, open-ended environments like Minecraft.	翻訳日:2023-03-29 00:43:05 公開日:2023-03-24
# lego-net:部屋の中のオブジェクトの定期的な並べ替えを学ぶ LEGO-Net: Learning Regular Rearrangements of Objects in Rooms ( http://arxiv.org/abs/2301.09629v2 ) ライセンス: Link先を確認	Qiuhong Anna Wei, Sijie Ding, Jeong Joon Park, Rahul Sajnani, Adrien Poulenard, Srinath Sridhar, Leonidas Guibas	(参考訳) 人間は、乱雑な部屋を掃除する仕事を普遍的に嫌います。機械がこの作業を支援するためには、複数の対称性、共線形性、または共円性、線形パターンや円形パターンの均一性、さらにスタイルや機能に関連するオブジェクト間の関係など、通常の配置に対する人間の基準を理解する必要がある。従来のアプローチでは、目標状態を明確に指定したり、スクラッチから合成したシーンを明示的に指定するために、人間の入力に依存していたが、このような方法では、目標状態を提供することなく、既存の乱雑なシーンの再配置に対応していない。本稿では,乱雑な部屋での物体の規則的な並べ替えを学習するためのデータ駆動トランスフォーマーに基づく反復的手法であるlego-netを提案する。 LEGO-Netは、部分的に拡散モデルにインスピレーションを受けています -- 最初の混乱状態から始まり、移動距離を減らしながら、オブジェクトの位置と方向を通常の状態に'de-noises'を繰り返します。プロが配置したシーンの既存のデータセットにおいて、ランダムに乱れた物体の位置と向きが与えられた場合、本手法は定期的な再配置の回復を訓練する。その結果,本手法は部屋のシーンを確実に再構成し,他の手法よりも優れていることがわかった。また,数理論機械を用いて部屋配置の規則性を評価する指標を提案する。 Humans universally dislike the task of cleaning up a messy room. If machines were to help us with this task, they must understand human criteria for regular arrangements, such as several types of symmetry, co-linearity or co-circularity, spacing uniformity in linear or circular patterns, and further inter-object relationships that relate to style and functionality. Previous approaches for this task relied on human input to explicitly specify goal state, or synthesized scenes from scratch -- but such methods do not address the rearrangement of existing messy scenes without providing a goal state. In this paper, we present LEGO-Net, a data-driven transformer-based iterative method for LEarning reGular rearrangement of Objects in messy rooms. LEGO-Net is partly inspired by diffusion models -- it starts with an initial messy state and iteratively ''de-noises'' the position and orientation of objects to a regular state while reducing distance traveled. Given randomly perturbed object positions and orientations in an existing dataset of professionally-arranged scenes, our method is trained to recover a regular re-arrangement. 
Results demonstrate that our method is able to reliably rearrange room scenes and outperform other methods. We additionally propose a metric for evaluating regularity in room arrangements using number-theoretic machinery.	翻訳日:2023-03-29 00:42:27 公開日:2023-03-24
# ソースラベル適応による半教師付き領域適応 Semi-Supervised Domain Adaptation with Source Label Adaptation ( http://arxiv.org/abs/2302.02335v2 ) ライセンス: Link先を確認	Yu-Chu Yu and Hsuan-Tien Lin	(参考訳) Semi-Supervised Domain Adaptation (SSDA)は、いくつかのラベル付きおよび多くのラベル付きターゲットデータと関連するドメインからのラベル付きソースデータで、未表示のターゲットデータを分類する学習を含む。現在のSSDAアプローチは、通常、ターゲットデータとラベル付きソースデータとを特徴空間マッピングと擬似ラベル割り当てで整列することを目的としている。それでも、そのようなソース指向モデルは、時にターゲットデータを間違ったクラスのソースデータに合わせることができ、分類性能を低下させる。本稿では,対象データに適合するソースデータに対応する新しいソース適応パラダイムを提案する。私たちの重要なアイデアは、ソースデータを理想のターゲットデータの能動的にラベル付けされたバージョンとして見ることです。そこで本研究では,ターゲット視点から設計したロバストなクリーナーコンポーネントを用いて,ラベルノイズを動的に除去するSSDAモデルを提案する。このパラダイムは、既存のSSDAアプローチの背景にあるコアアイデアとは大きく異なるため、提案したモデルと簡単に結合して性能を向上させることができる。 2つの最先端ssdaアプローチの実験結果は、提案モデルがソースラベル内のノイズを効果的に除去し、ベンチマークデータセットをまたいだアプローチよりも優れたパフォーマンスを示すことを示している。私たちのコードはhttps://github.com/chu0802/SLA で利用可能です。 Semi-Supervised Domain Adaptation (SSDA) involves learning to classify unseen target data with a few labeled and lots of unlabeled target data, along with many labeled source data from a related domain. Current SSDA approaches usually aim at aligning the target data to the labeled source data with feature space mapping and pseudo-label assignments. Nevertheless, such a source-oriented model can sometimes align the target data to source data of the wrong classes, degrading the classification performance. This paper presents a novel source-adaptive paradigm that adapts the source data to match the target data. Our key idea is to view the source data as a noisily-labeled version of the ideal target data. Then, we propose an SSDA model that cleans up the label noise dynamically with the help of a robust cleaner component designed from the target perspective. Since the paradigm is very different from the core ideas behind existing SSDA approaches, our proposed model can be easily coupled with them to improve their performance. 
Empirical results on two state-of-the-art SSDA approaches demonstrate that the proposed model effectively cleans up the noise within the source labels and exhibits superior performance over those approaches across benchmark datasets. Our code is available at https://github.com/chu0802/SLA .	翻訳日:2023-03-29 00:32:55 公開日:2023-03-24
# 外乱認識対象検出のための正規化フローベース特徴合成 Normalizing Flow based Feature Synthesis for Outlier-Aware Object Detection ( http://arxiv.org/abs/2302.07106v2 ) ライセンス: Link先を確認	Nishant Kumar, Sini\v{s}a \v{S}egvi\'c, Abouzar Eslami, Stefan Gumhold	(参考訳) 自律運転のようなアプリケーションには、信頼性の高いオブジェクト検出器の現実的な展開が不可欠である。しかし、Faster R-CNNのような汎用オブジェクト検出器は、不整形物体の過信予測を提供する傾向にある。最近の異常物体検出手法は, クラス条件ガウシアンによるインスタンスワイド特徴の密度を推定し, 低様領域から合成外乱特徴を訓練する。しかし、この戦略は、合成された外層特徴が他のクラス条件ガウス多様体に従えば低い確率を持つことを保証しない。そこで本研究では,すべてのイリアークラスの合同データ分布を可逆正規化フローで学習することにより,イリアーとイリアーオブジェクトを区別する,新しい外れ値認識型オブジェクト検出フレームワークを提案する。フローモデルの適切なサンプリングは、合成されたアウトリアーが全てのオブジェクトクラスのインリアーよりも低い可能性を持つことを保証するため、インリアーとアウトリアーの間のより良い決定境界をモデル化する。提案手法は,画像データとビデオデータの両方において,外部認識オブジェクト検出の最先端性を大幅に向上させる。 Real-world deployment of reliable object detectors is crucial for applications such as autonomous driving. However, general-purpose object detectors like Faster R-CNN are prone to providing overconfident predictions for outlier objects. Recent outlier-aware object detection approaches estimate the density of instance-wide features with class-conditional Gaussians and train on synthesized outlier features from their low-likelihood regions. However, this strategy does not guarantee that the synthesized outlier features will have a low likelihood according to the other class-conditional Gaussians. We propose a novel outlier-aware object detection framework that distinguishes outliers from inlier objects by learning the joint data distribution of all inlier classes with an invertible normalizing flow. The appropriate sampling of the flow model ensures that the synthesized outliers have a lower likelihood than inliers of all object classes, thereby modeling a better decision boundary between inlier and outlier objects. Our approach significantly outperforms the state-of-the-art for outlier-aware object detection on both image and video datasets.	翻訳日:2023-03-29 00:23:55 公開日:2023-03-24
# MEDBERT.de: 医学領域のための総合的なドイツのBERTモデル MEDBERT.de: A Comprehensive German BERT Model for the Medical Domain ( http://arxiv.org/abs/2303.08179v2 ) ライセンス: Link先を確認	Keno K. Bressem and Jens-Michalis Papaioannou and Paul Grundmann and Florian Borchert and Lisa C. Adams and Leonhard Liu and Felix Busch and Lina Xu and Jan P. Loyen and Stefan M. Niehues and Moritz Augustin and Lennart Grosser and Marcus R. Makowski and Hugo JWL. Aerts and Alexander L\"oser	(参考訳) 本稿では,ドイツ医学領域に特化して設計された,事前訓練型ドイツのBERTモデルであるmedBERTdeについて述べる。このモデルは470万のドイツの医療文書の大規模なコーパスで訓練されており、幅広い規律と医療文書のタイプをカバーする8つの異なる医療ベンチマークにおいて、新しい最先端のパフォーマンスを達成することが示されている。本論文は,モデル全体の性能を評価することに加えて,その機能についてより詳細な分析を行う。本研究では,データ重複がモデルの性能に与える影響と,より効率的なトークン化手法を使用することによる潜在的メリットについて検討する。以上の結果から, medbertde のようなドメイン固有モデルは長文に特に有用であり, トレーニングデータの重複が必ずしも性能向上につながるとは限らない。さらに,効率の良いトークン化はモデルの性能向上に小さな役割しか果たさないことを見出し,改善した性能のほとんどを大量のトレーニングデータに分類した。さらなる研究を促進するために、事前訓練されたモデルウェイトと放射線データに基づく新しいベンチマークが科学コミュニティによって公開されている。 This paper presents medBERTde, a pre-trained German BERT model specifically designed for the German medical domain. The model has been trained on a large corpus of 4.7 Million German medical documents and has been shown to achieve new state-of-the-art performance on eight different medical benchmarks covering a wide range of disciplines and medical document types. In addition to evaluating the overall performance of the model, this paper also conducts a more in-depth analysis of its capabilities. We investigate the impact of data deduplication on the model's performance, as well as the potential benefits of using more efficient tokenization methods. Our results indicate that domain-specific models such as medBERTde are particularly useful for longer texts, and that deduplication of training data does not necessarily lead to improved performance. 
Furthermore, we found that efficient tokenization plays only a minor role in improving model performance, and attribute most of the improved performance to the large amount of training data. To encourage further research, the pre-trained model weights and new benchmarks based on radiological data are made publicly available for use by the scientific community.	翻訳日:2023-03-28 23:57:37 公開日:2023-03-24
# 聴衆は性役割の表現方法にどのように影響しますか? How does the audience affect the way we express our gender roles? ( http://arxiv.org/abs/2303.12759v2 ) ライセンス: Link先を確認	Melody Sepahpour-Fard and Michael Quayle and Maria Schuld and Taha Yasseri	(参考訳) 人間は、対話するときに聴衆に合うように言語を適応します。オーディエンス効果は理論や小規模な研究で研究されているが、自然発生のオーディエンス効果に関する大規模な研究は乏しい。本研究では,reddit上での対話の分析を通じて,異なる社会的アイデンティティ(母、父、親など)を強調する,ジェンダー化されたコンテキストとの相互作用におけるオーディエンス効果について検討する。 r/daddit、r/mommit、r/parentingの3つの人気子育てサブreddit(r/daddit、r/mommit、r/parenting)からの投稿を収集した。シングルジェンダーとミックスジェンダーの両方のサブレディットで公開しているユーザのサンプルを選択することで、オーディエンスとジェンダーの効果の両方を探索することができる。投稿を解析するために,単語埋め込みを用い,コーパスにトークンとしてユーザを付加した。これにより、ユーザトケンとワードトケンを比較し、その類似度を測定しました。以上の結果から,母親や父親も同様に振舞い,多種多様な話題を混成ジェンダーの文脈で議論し,教育や家族の問題に相互に助言することに焦点を当てた。シングルジェンダーのサブレディットでは、母親と父親は特定のトピックに焦点を当てている。 r/Mommitの母親は、医療、睡眠、トイレのトレーニング、食べ物などのトピックを議論することで、他のグループと差別化している。母と父の両方が子育てのイベントを祝い、シングルジェンダーの聴衆の前で子供たちの身体的外観を記述またはコメントします。本研究は,母親と父親が異なる関心事を表現し,その行動が異なるグループベースのオーディエンスに適応することを示す。また、Redditと単語の埋め込みを使って、自然な設定でオーディエンスとジェンダーのダイナミクスをよりよく理解する可能性を強調している。 Human beings adapt their language to suit their audience when interacting. While audience effects have been studied in theory and small-scale research, there is a lack of large-scale studies on naturally occurring audience effects. In this study, we examine audience effects in interactions with gendered contexts that emphasize different social identities (e.g. mother, father, and parent) by analyzing interactions on Reddit. We collected posts from three popular parenting subreddits (r/Daddit, r/Mommit, and r/Parenting), which cater to self-identified fathers and mothers (ostensibly single-gender) and parents (explicitly mixed-gender) respectively. By selecting a sample of users who have published on both single-gender and mixed-gender subreddits, we are able to explore both audience and gender effects. To analyze the posts, we used word embeddings and added the user as a token in the corpus. 
This allowed us to compare user-tokens to word-tokens and measure their similarity. Our results show that mothers and fathers behave similarly and discuss a diverse range of topics in a mixed-gender context, focusing more on advising each other on educational and family matters. In single-gender subreddits, mothers and fathers are more focused on specific topics. Mothers in r/Mommit distinguish themselves from other groups by discussing topics such as medical care, sleep and potty training, and food. Both mothers and fathers celebrate parenting events and describe or comment on the physical appearance of their children in front of a single-gender audience. In conclusion, this study demonstrates how mothers and fathers express different concerns and adapt their behaviour to different group-based audiences. It also highlights the potential of using Reddit and word embeddings to better understand the dynamics of audience and gender in a natural setting.	翻訳日:2023-03-28 21:37:06 公開日:2023-03-24
# DiffuScene: 室内シーン生成のための拡散確率モデルに基づくシーングラフ DiffuScene: Scene Graph Denoising Diffusion Probabilistic Model for Generative Indoor Scene Synthesis ( http://arxiv.org/abs/2303.14207v1 ) ライセンス: Link先を確認	Jiapeng Tang, Yinyu Nie, Lev Markhasin, Angela Dai, Justus Thies, Matthias Nie{\ss}ner	(参考訳) 本研究では,全接続されたシーングラフに格納された3Dインスタンス特性を生成し,各グラフノードに対して最も類似したオブジェクト形状,すなわち,位置,大きさ,向き,意味,幾何学的特徴など,異なる属性の結合として特徴付けられるオブジェクトインスタンスを検索する,拡散確率モデルに基づく屋内3Dシーン合成のためのディフフルSceneを提案する。このシーングラフに基づいて、3Dインスタンスの配置とタイプを決定する拡散モデルを構築した。本手法は,シーン補完,シーン配置,テキストコンディショニングシーン合成など,多くの下流アプリケーションを容易に行うことができる。 3d-frontデータセットを用いた実験では,最先端の手法よりも物理的に妥当で多様な室内シーンを合成できることが示されている。大規模なアブレーション研究は、シーン拡散モデルにおける設計選択の有効性を検証する。 We present DiffuScene for indoor 3D scene synthesis based on a novel scene graph denoising diffusion probabilistic model, which generates 3D instance properties stored in a fully-connected scene graph and then retrieves the most similar object geometry for each graph node i.e. object instance which is characterized as a concatenation of different attributes, including location, size, orientation, semantic, and geometry features. Based on this scene graph, we designed a diffusion model to determine the placements and types of 3D instances. Our method can facilitate many downstream applications, including scene completion, scene arrangement, and text-conditioned scene synthesis. Experiments on the 3D-FRONT dataset show that our method can synthesize more physically plausible and diverse indoor scenes than state-of-the-art methods. Extensive ablation studies verify the effectiveness of our design choice in scene diffusion models.	翻訳日:2023-03-28 21:26:40 公開日:2023-03-24
# 深層学習に基づく交通システムにおけるバックドアニュートラル化のための最適平滑分布探索 Optimal Smoothing Distribution Exploration for Backdoor Neutralization in Deep Learning-based Traffic Systems ( http://arxiv.org/abs/2303.14197v1 ) ライセンス: Link先を確認	Yue Wang, Wending Li, Michail Maniatakos, Saif Eddin Jabari	(参考訳) 深層強化学習(Deep Reinforcement Learning, DRL)は、自律走行車(AV)の効率を高めるだけでなく、交通渋滞や衝突を引き起こすバックドア攻撃の影響を受けやすくする。バックドア機能は典型的には、本物の入力に対して高い精度を維持するために、秘密の悪意のあるデータでトレーニングデータセットを汚染し、敵が選択した特定の入力に対して所望の(悪意のある)出力を誘導する。バックドアに対する現在の防御は、主に画像に基づく特徴を用いた画像分類に重点を置いており、入力は連続センサデータ、すなわちAVとその周辺車両の速度と距離の組み合わせであるため、DRLベースのAVコントローラの回帰タスクに容易に移行できない。提案手法はバックドアを中和するために入力によく設計されたノイズを付加する。このアプローチでは、バックドアを中和しながら、真の入力の正常な機能を保ちながら最適な平滑化(ノイズ)分布を学習する。これにより、実際の入力に対して高い精度を維持しつつ、バックドア攻撃に対してより回復力のあるモデルが期待できる。本手法の有効性を微視的トラヒックシミュレータに基づくシミュレーショントラヒックシステムで検証し,スムース化トラヒックコントローラがすべてのトリガサンプルを中和し,トラヒックを緩和する性能を維持することを実験的に示した。 Deep Reinforcement Learning (DRL) enhances the efficiency of Autonomous Vehicles (AV), but also makes them susceptible to backdoor attacks that can result in traffic congestion or collisions. Backdoor functionality is typically incorporated by contaminating training datasets with covert malicious data to maintain high precision on genuine inputs while inducing the desired (malicious) outputs for specific inputs chosen by adversaries. Current defenses against backdoors mainly focus on image classification using image-based features, which cannot be readily transferred to the regression task of DRL-based AV controllers since the inputs are continuous sensor data, i.e., the combinations of velocity and distance of AV and its surrounding vehicles. Our proposed method adds well-designed noise to the input to neutralize backdoors. The approach involves learning an optimal smoothing (noise) distribution to preserve the normal functionality of genuine inputs while neutralizing backdoors. By doing so, the resulting model is expected to be more resilient against backdoor attacks while maintaining high accuracy on genuine inputs. 
The effectiveness of the proposed method is verified on a simulated traffic system based on a microscopic traffic simulator, where experimental results showcase that the smoothed traffic controller can neutralize all trigger samples and maintain the performance of relieving traffic congestion.	翻訳日:2023-03-28 21:26:26 公開日:2023-03-24
# DeepEpiSolver:コビッド、HIV、エボラ、病原体の逆問題 DeepEpiSolver: Unravelling Inverse problems in Covid, HIV, Ebola and Disease Transmission ( http://arxiv.org/abs/2303.14194v1 ) ライセンス: Link先を確認	Ritam Majumdar, Shirish Karande, Lovekesh Vig	(参考訳) 多くの感染症の拡散は、結合微分方程式であるSIR区画モデルの変種を用いてモデル化されている。 SIRモデルの係数は、疾患の拡散軌跡を決定する。したがって、係数推定は高速かつ正確でなければならない。 Shaierらは論文"Disease Informed Neural Networks"の中で、SIRモデルのパラメータを推定するために物理インフォームドニューラルネットワーク(PINN)を使用した。このアプローチには2つの欠点がある。まず、ピンの訓練時間は高く、訓練に90時間近くかかる病気もある。第二に、PINNは新たなSIDR軌道を一般化せず、対応するSIRパラメータを学習するには、PINNをゼロから再トレーニングする必要がある。この作業では、これらの両方の欠点を取り除こうとしています。 LSODAアルゴリズムを用いて,パラメータの大規模分布に対する前方問題の解法により,ODEのパラメータと拡散軌跡とのデータセットを生成する。次に、ニューラルネットワークを使用して、拡散された軌道とsidrの係数のマッピングをオフラインで学習する。これにより、再トレーニングすることなく新しい拡散軌道のパラメータを学習し、テスト時の一般化を可能にします。 11の高感染症に対するPINNと同等の精度で3～4桁のスピードアップを観察した。 PINNを用いたニューラルネットワーク推定ODE係数のさらなる微調整により、推定係数の2～3次改善がもたらされる。 The spread of many infectious diseases is modeled using variants of the SIR compartmental model, which is a coupled differential equation. The coefficients of the SIR model determine the spread trajectories of disease, on whose basis proactive measures can be taken. Hence, the coefficient estimates must be both fast and accurate. Shaier et al. in the paper "Disease Informed Neural Networks" used Physics Informed Neural Networks (PINNs) to estimate the parameters of the SIR model. There are two drawbacks to this approach. First, the training time for PINNs is high, with certain diseases taking close to 90 hrs to train. Second, PINNs don't generalize for a new SIDR trajectory, and learning its corresponding SIR parameters requires retraining the PINN from scratch. In this work, we aim to eliminate both of these drawbacks. We generate a dataset between the parameters of ODE and the spread trajectories by solving the forward problem for a large distribution of parameters using the LSODA algorithm. We then use a neural network to learn the mapping between spread trajectories and coefficients of SIDR in an offline manner. 
This allows us to learn the parameters of a new spread trajectory without having to retrain, enabling generalization at test time. We observe a speed-up of 3-4 orders of magnitude with accuracy comparable to that of PINNs for 11 highly infectious diseases. Fine-tuning the neural-network-inferred ODE coefficients with a PINN then improves the estimated coefficients by a further 2-3 orders of magnitude.	翻訳日:2023-03-28 21:26:02 公開日:2023-03-24
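The dataset-generation step described in the DeepEpiSolver abstract (solve the forward problem over many parameter settings, then pair each trajectory with its coefficients) might look roughly like the sketch below. It is only an illustration under stated assumptions: it uses the plain three-compartment SIR model and a fixed-step fourth-order Runge-Kutta integrator in place of LSODA so that it runs without external libraries, and the parameter grid, population size and function names are all hypothetical.

```python
def sir_rhs(state, beta, gamma):
    """Right-hand side of the SIR equations for state = (S, I, R)."""
    s, i, r = state
    n = s + i + r
    new_infections = beta * s * i / n
    recoveries = gamma * i
    return (-new_infections, new_infections - recoveries, recoveries)


def rk4_step(state, dt, beta, gamma):
    """Advance the state by one classical Runge-Kutta (RK4) step of size dt."""
    def shifted(base, slope, scale):
        return tuple(b + scale * k for b, k in zip(base, slope))

    k1 = sir_rhs(state, beta, gamma)
    k2 = sir_rhs(shifted(state, k1, dt / 2), beta, gamma)
    k3 = sir_rhs(shifted(state, k2, dt / 2), beta, gamma)
    k4 = sir_rhs(shifted(state, k3, dt), beta, gamma)
    return tuple(
        x + dt / 6 * (a + 2 * b + 2 * c + d)
        for x, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def simulate(beta, gamma, s0=990.0, i0=10.0, r0=0.0, days=100, steps_per_day=10):
    """Solve the forward problem and return one (S, I, R) sample per day."""
    state = (s0, i0, r0)
    dt = 1.0 / steps_per_day
    trajectory = [state]
    for _ in range(days):
        for _ in range(steps_per_day):
            state = rk4_step(state, dt, beta, gamma)
        trajectory.append(state)
    return trajectory


# Hypothetical parameter grid; each entry pairs the coefficients with their trajectory.
dataset = [
    ((beta, gamma), simulate(beta, gamma))
    for beta in (0.2, 0.3, 0.5)
    for gamma in (0.05, 0.1)
]
print(len(dataset), len(dataset[0][1]))
```

A regression model trained on such pairs would take a trajectory as input and predict the coefficients, which is the inverse mapping the abstract refers to; the learning step itself is not shown here.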
# 臨床基礎モデルの揺るぎない基礎:EMRのための大規模言語モデルと基礎モデルに関する調査 The Shaky Foundations of Clinical Foundation Models: A Survey of Large Language Models and Foundation Models for EMRs ( http://arxiv.org/abs/2303.12961v2 ) ライセンス: Link先を確認	Michael Wornow, Yizhe Xu, Rahul Thapa, Birju Patel, Ethan Steinberg, Scott Fleming, Michael A. Pfeffer, Jason Fries, Nigam H. Shah	(参考訳) chatgptやalphafoldのような基礎モデルの成功は、患者ケアや病院の運営を改善するために、電子医療記録(emr)の類似モデルを構築することに大きな関心を寄せている。しかし、最近の誇大広告は、これらのモデルの能力に対する理解において重大なギャップを曖昧にした。我々は,非イメージングEMMデータ(臨床テキストおよび/または構造化データ)に基づいて訓練された80以上の基礎モデルをレビューし,そのアーキテクチャ,トレーニングデータ,潜在的なユースケースを記述した分類学を作成する。殆どのモデルは、小さな、狭くスコープされた臨床データセット(MIMIC-IIIなど)や、広く公共のバイオメディカルコーパス(PubMedなど)で訓練されており、健康システムに対する有用性について有意義な洞察を与えていないタスクで評価されている。これらの知見を踏まえて,医療において重要な指標により深く根ざした臨床基礎モデルの利点を評価するための,改善された評価枠組みを提案する。 The successes of foundation models such as ChatGPT and AlphaFold have spurred significant interest in building similar models for electronic medical records (EMRs) to improve patient care and hospital operations. However, recent hype has obscured critical gaps in our understanding of these models' capabilities. We review over 80 foundation models trained on non-imaging EMR data (i.e. clinical text and/or structured data) and create a taxonomy delineating their architectures, training data, and potential use cases. We find that most models are trained on small, narrowly-scoped clinical datasets (e.g. MIMIC-III) or broad, public biomedical corpora (e.g. PubMed) and are evaluated on tasks that do not provide meaningful insights on their usefulness to health systems. In light of these findings, we propose an improved evaluation framework for measuring the benefits of clinical foundation models that is more closely grounded to metrics that matter in healthcare.	翻訳日:2023-03-28 21:25:19 公開日:2023-03-24
# MaxSATによる量子色符号の復号 Decoding quantum color codes with MaxSAT ( http://arxiv.org/abs/2303.14237v1 ) ライセンス: Link先を確認	Lucas Berent, Lukas Burgholzer, Peter-Jan H.S. Derks, Jens Eisert, Robert Wille	(参考訳) 古典コンピューティングでは、誤り訂正符号は十分に確立されており、理論と実用の両方においてユビキタスである。量子コンピューティングにとって、エラー補正は必須であるが、実現が困難であり、リソースのオーバーヘッドがかなり高く、また、現実の古典的コンピューティングの必要性と相容れない。量子誤り訂正符号は、推定された短期的応用を超えて、フォールトトレラントな量子計算への道に中心的な役割を果たす。その中でも、色コードは特に重要な量子符号のクラスであり、他の符号よりも有利な性質のために近年関心を集めている。古典計算と同様に、復号化は、破損した状態から破損しない状態を復元する操作を推論する問題であり、フォールトトレラント量子デバイスの開発の中心である。本稿では,カラーコードの復号化問題を,よく知られたLightsOutパズルのわずかなバリエーションに還元する方法について述べる。量子カラーコードのための新しいデコーダを提案し,その類似性に基づいて定式化をマックスSAT問題として用いた。さらに、MaxSATの構成を最適化し、提案した復号器の復号性能がカラーコード上で最先端の復号性能を実現することを示す。デコーダの実装と、数値実験を自動的に実行するツールは、GitHubのMunich Quantum Toolkit(MQT)の一部として公開されている。 In classical computing, error-correcting codes are well established and are ubiquitous both in theory and practical applications. For quantum computing, error-correction is essential as well, but harder to realize, coming along with substantial resource overheads and being concomitant with needs for substantial classical computing. Quantum error-correcting codes play a central role on the avenue towards fault-tolerant quantum computation beyond presumed near-term applications. Among those, color codes constitute a particularly important class of quantum codes that have gained interest in recent years due to favourable properties over other codes. As in classical computing, decoding is the problem of inferring an operation to restore an uncorrupted state from a corrupted one and is central in the development of fault-tolerant quantum devices. In this work, we show how the decoding problem for color codes can be reduced to a slight variation of the well-known LightsOut puzzle. We propose a novel decoder for quantum color codes using a formulation as a MaxSAT problem based on this analogy. 
Furthermore, we optimize the MaxSAT construction and show numerically that the decoding performance of the proposed decoder achieves state-of-the-art decoding performance on color codes. The implementation of the decoder as well as tools to automatically conduct numerical experiments are publicly available as part of the Munich Quantum Toolkit (MQT) on GitHub.	翻訳日:2023-03-28 21:17:48 公開日:2023-03-24
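To make the LightsOut analogy in the abstract above concrete, here is a minimal sketch of the classic grid puzzle: find the fewest switch presses that turn off a given set of lit cells. It is not the authors' decoder and uses no MaxSAT solver; the brute-force search, the grid layout, and the names `lights_out_switches` and `min_presses` are assumptions made for illustration. In the decoding picture one would read the lit cells as violated checks and the switches as qubits, but that mapping is only indicated here, not implemented.

```python
from itertools import combinations


def lights_out_switches(rows, cols):
    """Classic grid puzzle: switch (r, c) toggles its own cell and the orthogonal neighbours."""
    switches = []
    for r in range(rows):
        for c in range(cols):
            cells = {(r, c)}
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    cells.add((rr, cc))
            switches.append(frozenset(cells))
    return switches


def min_presses(switches, lit):
    """Return a smallest tuple of switch indices whose combined toggles equal `lit`.

    Brute force over subsets in order of increasing size, so the first hit
    is a minimum-weight solution. Only feasible for tiny instances.
    """
    target = frozenset(lit)
    for k in range(len(switches) + 1):
        for combo in combinations(range(len(switches)), k):
            state = set()
            for i in combo:
                state ^= switches[i]  # symmetric difference = toggling modulo 2
            if state == target:
                return combo
    return None  # no combination of presses produces this pattern
```

The search cost grows exponentially with the number of switches, which is one reason an optimizing solver is a natural fit for realistic code sizes.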
# SIGMORPHON 2023 Shared Task of Interlinear Glossing: Baseline Model ( http://arxiv.org/abs/2303.14234v1 ) License: see linked page	Michael Ginn	Language documentation is a critical aspect of language preservation, often including the creation of Interlinear Glossed Text (IGT). Creating IGT is time-consuming and tedious, and automating the process can save valuable annotator effort. This paper describes the baseline system for the SIGMORPHON 2023 Shared Task of Interlinear Glossing. In our system, we utilize a transformer architecture and treat gloss generation as a sequence labelling task.	Translated: 2023-03-28 21:17:29 Published: 2023-03-24
# Dagger linear logic and categorical quantum mechanics ( http://arxiv.org/abs/2303.14231v1 ) License: see linked page	Priyaa Varshinee Srinivasan	This thesis develops the categorical proof theory for the non-compact multiplicative dagger linear logic, and investigates its applications to Categorical Quantum Mechanics (CQM). The existing frameworks of CQM are categorical proof theories of compact dagger linear logic, and are motivated by the interpretation of quantum systems in the category of finite dimensional Hilbert spaces. This thesis describes a new non-compact framework called Mixed Unitary Categories which can accommodate infinite dimensional systems, and develops models for the framework. To this end, it builds on linearly distributive categories (LDCs) and $*$-autonomous categories, which are categorical proof theories of (non-compact) multiplicative linear logic. The proof theory of non-compact dagger linear logic is obtained from the basic setting of an LDC by adding a dagger functor satisfying appropriate coherences to give a dagger-LDC. From every (isomix) dagger-LDC one can extract a canonical "unitary core" which up to equivalence is the traditional CQM framework of dagger-monoidal categories. This leads to the framework of Mixed Unitary Categories (MUCs): every MUC contains a (compact) unitary core which is extended by a (non-compact) isomix dagger-LDC. Various models of MUCs based on Finiteness Spaces, Chu spaces, Hopf modules, etc., are developed in this thesis. This thesis also generalizes the key algebraic structures of CQM, such as observables, measurement, and complementarity, to the MUC framework. Furthermore, using the MUC framework, this thesis establishes a connection between the complementary observables of quantum mechanics and the exponential modalities of linear logic.	Translated: 2023-03-28 21:17:22 Published: 2023-03-24
# Causality Detection for Efficient Multi-Agent Reinforcement Learning ( http://arxiv.org/abs/2303.14227v1 ) License: see linked page	Rafael Pina, Varuna De Silva and Corentin Artaud	When learning a task as a team, some agents in Multi-Agent Reinforcement Learning (MARL) may fail to understand their true impact on the performance of the team. Such agents end up learning sub-optimal policies, demonstrating undesired lazy behaviours. To investigate this problem, we start by formalising the use of temporal causality applied to MARL problems. We then show how causality can be used to penalise such lazy agents and improve their behaviours. By understanding how their local observations are causally related to the team reward, each agent in the team can adjust their individual credit based on whether they helped to cause the reward or not. We show empirically that using causality estimations in MARL improves not only the holistic performance of the team, but also the individual capabilities of each agent. We observe that the improvements are consistent in a set of different environments.	Translated: 2023-03-28 21:16:52 Published: 2023-03-24
# Synthetic Combinations: A Causal Inference Framework for Combinatorial Interventions ( http://arxiv.org/abs/2303.14226v1 ) License: see linked page	Abhineet Agarwal, Anish Agarwal, Suhas Vijaykumar	We consider a setting with $N$ heterogeneous units and $p$ interventions. Our goal is to learn unit-specific potential outcomes for any combination of these $p$ interventions, i.e., $N \times 2^p$ causal parameters. Choosing combinations of interventions is a problem that naturally arises in many applications such as factorial design experiments, recommendation engines (e.g., showing a set of movies that maximizes engagement for users), combination therapies in medicine, selecting important features for ML models, etc. Running $N \times 2^p$ experiments to estimate the various parameters is infeasible as $N$ and $p$ grow. Further, with observational data there is likely confounding, i.e., whether or not a unit is seen under a combination is correlated with its potential outcome under that combination. To address these challenges, we propose a novel model that imposes latent structure across both units and combinations. We assume latent similarity across units (i.e., the potential outcomes matrix is rank $r$) and regularity in how combinations interact (i.e., the coefficients in the Fourier expansion of the potential outcomes are $s$-sparse). We establish identification for all causal parameters despite unobserved confounding. We propose an estimation procedure, Synthetic Combinations, and establish finite-sample consistency under precise conditions on the observation pattern. Our results imply Synthetic Combinations consistently estimates unit-specific potential outcomes given $\text{poly}(r) \times (N + s^2 p)$ observations. In comparison, previous methods that do not exploit structure across both units and combinations have sample complexity scaling as $\min(N \times s^2 p,\ r \times (N + 2^p))$. We use Synthetic Combinations to propose a data-efficient experimental design mechanism for combinatorial causal inference. We corroborate our theoretical findings with numerical simulations.	Translated: 2023-03-28 21:16:37 Published: 2023-03-24
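The sparsity assumption in the abstract above refers to the Fourier expansion of a function over combinations of $p$ binary interventions. As a rough sketch, assuming the common $\{-1, +1\}$ encoding of each intervention (the paper's exact convention may differ), an outcome function $f$ can be written as a sum over subsets $S$ of coefficients $\alpha_S$ times the parity $\prod_{i \in S} x_i$. The helper name `fourier_coefficients` and the toy outcome function are hypothetical; this is not the Synthetic Combinations estimator.

```python
from itertools import combinations, product
from math import prod


def fourier_coefficients(f, p):
    """Compute alpha_S = 2^-p * sum_x f(x) * prod_{i in S} x_i for every subset S.

    `f` maps a tuple in {-1, +1}^p to a number. The cost is 4^p evaluations
    of the parity term, so this is for illustration on small p only.
    """
    points = list(product((-1, 1), repeat=p))
    coeffs = {}
    for k in range(p + 1):
        for subset in combinations(range(p), k):
            total = sum(f(x) * prod(x[i] for i in subset) for x in points)
            coeffs[subset] = total / len(points)
    return coeffs


def toy_outcome(x):
    """Hypothetical outcome with a baseline, one main effect and one pairwise interaction."""
    return 2 + 3 * x[0] - x[1] * x[2]
```

For `toy_outcome` with `p = 3`, only three of the eight coefficients are non-zero, which is what "$s$-sparse" with $s = 3$ would mean in this encoding.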
# Machine Guided Discovery of Novel Carbon Capture Solvents ( http://arxiv.org/abs/2303.14223v1 ) License: see linked page	James L. McDonagh, Benjamin H. Wunsch, Stamatia Zavitsanou, Alexander Harrison, Bruce Elmegreen, Stacey Gifford, Theodore van Kessel, and Flaviu Cipcigan	The increasing importance of carbon capture technologies for deployment in remediating CO2 emissions, and thus the necessity to improve capture materials to allow scalability and efficiency, faces the challenge of materials development, which can require substantial costs and time. Machine learning offers a promising method for reducing the time and resource burdens of materials development through efficient correlation of structure-property relationships to allow down-selection and focusing on promising candidates. Towards demonstrating this, we have developed an end-to-end "discovery cycle" to select new aqueous amines compatible with the commercially viable acid gas scrubbing carbon capture. We combine a simple, rapid laboratory assay for CO2 absorption with a machine learning based molecular fingerprinting model approach. The prediction process shows 60% accuracy against experiment for both material parameters and 80% for a single parameter on an external test set. The discovery cycle determined several promising amines that were verified experimentally, and which had not been applied to carbon capture previously. In the process we have compiled a large, single-source data set for carbon capture amines and produced an open source machine learning tool for the identification of amine molecule candidates (https://github.com/IBM/Carbon-capture-fingerprint-generation).	Translated: 2023-03-28 21:16:01 Published: 2023-03-24
# Lay Text Summarisation Using Natural Language Processing: A Narrative Literature Review ( http://arxiv.org/abs/2303.14222v1 ) License: see linked page	Oliver Vinzelberg, Mark David Jenkins, Gordon Morison, David McMinn and Zoe Tieges	Summarisation of research results in plain language is crucial for promoting public understanding of research findings. The use of Natural Language Processing to generate lay summaries has the potential to relieve researchers' workload and bridge the gap between science and society. The aim of this narrative literature review is to describe and compare the different text summarisation approaches used to generate lay summaries. We searched the databases Web of Science, Google Scholar, IEEE Xplore, Association for Computing Machinery Digital Library and arXiv for articles published until 6 May 2022. We included original studies on automatic text summarisation methods to generate lay summaries. We screened 82 articles and included eight relevant papers published between 2020 and 2021, all using the same dataset. The results show that transformer-based methods such as Bidirectional Encoder Representations from Transformers (BERT) and Pre-training with Extracted Gap-sentences for Abstractive Summarization (PEGASUS) dominate the landscape of lay text summarisation, with all but one study using these methods. A combination of extractive and abstractive summarisation methods in a hybrid approach was found to be most effective. Furthermore, pre-processing approaches to input text (e.g. applying extractive summarisation) or determining which sections of a text to include, appear critical. Evaluation metrics such as Recall-Oriented Understudy for Gisting Evaluation (ROUGE) were used, which do not consider readability. To conclude, automatic lay text summarisation is under-explored. Future research should consider long document lay text summarisation, including clinical trial reports, and the development of evaluation metrics that consider readability of the lay summary.	Translated: 2023-03-28 21:15:39 Published: 2023-03-24
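The remark above that ROUGE does not consider readability can be illustrated with a simplified ROUGE-1 recall, i.e. the fraction of reference unigrams that also occur in the candidate. This is a sketch under stated assumptions: whitespace tokenisation and lowercasing only, with no stemming or stop-word handling as found in full ROUGE implementations, and the function name `rouge1_recall` is hypothetical.

```python
from collections import Counter


def rouge1_recall(reference, candidate):
    """Clipped unigram overlap divided by the number of reference unigrams."""
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    # Each reference token is matched at most as often as it occurs in the candidate.
    overlap = sum(min(count, cand_counts[token]) for token, count in ref_counts.items())
    total = sum(ref_counts.values())
    return overlap / total if total else 0.0
```

Because only word overlap is counted, a candidate with the reference's words in scrambled order scores the same as the fluent original, which is the gap the review points at.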
# The Battle of Information Representations: Comparing Sentiment and Semantic Features for Forecasting Market Trends ( http://arxiv.org/abs/2303.14221v1 ) License: see linked page	Andrei Zaichenko, Aleksei Kazakov, Elizaveta Kovtun, and Semen Budennyy	Studying the stock market with machine learning approaches is a major direction for revealing hidden market regularities. This knowledge contributes to a profound understanding of financial market dynamics and to behavioural insights that could hardly be discovered with traditional analytical methods. Stock prices are inherently interrelated with world events and social perception. Thus, in constructing a model for stock price prediction, the critical stage is to incorporate such information on the outside world, reflected through news and social media posts. To accommodate this, researchers leverage implicit or explicit knowledge representations: (1) sentiments extracted from the texts or (2) raw text embeddings. However, too little research attention has been paid to the direct comparison of these approaches in terms of their influence on the predictive power of financial models. In this paper, we aim to close this gap and figure out whether the semantic features in the form of contextual embeddings are more valuable than sentiment attributes for forecasting market trends. We consider the corpus of Twitter posts related to the largest companies by capitalization from NASDAQ and their closing prices. To start, we demonstrate the connection of tweet sentiments with the volatility of companies' stock prices. Convinced of the existing relationship, we train Temporal Fusion Transformer models for price prediction supplemented with either tweet sentiments or tweet embeddings. Our results show that in the substantially prevailing number of cases, the use of sentiment features leads to higher metrics. Notably, the conclusions are justifiable within the considered scenario involving Twitter posts and stocks of the biggest tech companies.	Translated: 2023-03-28 21:15:10 Published: 2023-03-24
# Variational Inference for Longitudinal Data Using Normalizing Flows ( http://arxiv.org/abs/2303.14220v1 ) License: see linked page	Clément Chadebec and Stéphanie Allassonnière	This paper introduces a new latent variable generative model able to handle high dimensional longitudinal data and relying on variational inference. The time dependency between the observations of an input sequence is modelled using normalizing flows over the associated latent variables. The proposed method can be used to generate either fully synthetic longitudinal sequences or trajectories that are conditioned on several data in a sequence and demonstrates good robustness properties to missing data. We test the model on 6 datasets of different complexity and show that it can achieve better likelihood estimates than some competitors as well as more reliable missing data imputation. Code is available at https://github.com/clementchadebec/variational_inference_for_longitudinal_data.	Translated: 2023-03-28 21:14:45 Published: 2023-03-24
# Curricular Contrastive Regularization for Physics-aware Single Image Dehazing ( http://arxiv.org/abs/2303.14218v1 ) License: see linked page	Yu Zheng, Jiahui Zhan, Shengfeng He, Junyu Dong, and Yong Du	Considering the ill-posed nature, contrastive regularization has been developed for single image dehazing, introducing the information from negative images as a lower bound. However, the contrastive samples are non-consensual, as the negatives are usually represented distantly from the clear (i.e., positive) image, leaving the solution space still under-constricted. Moreover, the interpretability of deep dehazing models is underexplored towards the physics of the hazing process. In this paper, we propose a novel curricular contrastive regularization targeted at a consensual contrastive space as opposed to a non-consensual one. Our negatives, which provide better lower-bound constraints, can be assembled from 1) the hazy image, and 2) corresponding restorations by other existing methods. Further, due to the different similarities between the embeddings of the clear image and negatives, the learning difficulty of the multiple components is intrinsically imbalanced. To tackle this issue, we customize a curriculum learning strategy to reweight the importance of different negatives. In addition, to improve the interpretability in the feature space, we build a physics-aware dual-branch unit according to the atmospheric scattering model. With the unit, as well as curricular contrastive regularization, we establish our dehazing network, named C2PNet. Extensive experiments demonstrate that our C2PNet significantly outperforms state-of-the-art methods, with extreme PSNR boosts of 3.94dB and 1.50dB, respectively, on SOTS-indoor and SOTS-outdoor datasets.	Translated: 2023-03-28 21:14:33 Published: 2023-03-24
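The atmospheric scattering model named in the abstract above is commonly written as $I(x) = J(x)\,t(x) + A\,(1 - t(x))$ with transmission $t(x) = e^{-\beta d(x)}$, where $J$ is the clear image, $A$ the global airlight, and $d$ the scene depth. The sketch below applies and inverts that formula on a flat list of pixel intensities. It does not reproduce C2PNet or its dual-branch unit; the function names and the assumption that depth and airlight are known are illustrative only.

```python
import math


def add_haze(clear, depth, airlight=1.0, beta=1.0):
    """Synthesize hazy pixels with I = J * t + A * (1 - t), where t = exp(-beta * depth)."""
    hazy = []
    for j, d in zip(clear, depth):
        t = math.exp(-beta * d)
        hazy.append(j * t + airlight * (1.0 - t))
    return hazy


def remove_haze(hazy, depth, airlight=1.0, beta=1.0):
    """Invert the model when transmission and airlight are known: J = (I - A * (1 - t)) / t."""
    clear = []
    for i, d in zip(hazy, depth):
        t = math.exp(-beta * d)
        clear.append((i - airlight * (1.0 - t)) / t)
    return clear
```

In real dehazing neither the transmission nor the airlight is given, which is what makes the problem ill-posed; the inversion above is only well behaved while the transmission stays away from zero.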
# SU(2) Non-Abelian Gauge Theory on a Plaquette Chain Obeys Eigenstate Thermalization Hypothesis ( http://arxiv.org/abs/2303.14264v1 ) License: see linked page	Xiaojun Yao	We test the eigenstate thermalization hypothesis (ETH) for 2+1 dimensional SU(2) lattice gauge theory. By considering the theory on a chain of plaquettes and truncating basis states for link variables at $j=1/2$, we can map it onto an Ising chain and numerically exactly diagonalize the Hamiltonian for reasonably large lattice sizes. We find energy level repulsion in momentum sectors with no remaining discrete symmetry. We study two local observables made up of Wilson loops and calculate their matrix elements in the energy eigenbasis, which are shown to be consistent with the ETH. Our study implies that a subset of states in the physical Hilbert space of Quantum Chromodynamics (QCD) obeys the ETH.	Translated: 2023-03-28 21:07:18 Published: 2023-03-24
# PACE: Data-Driven Virtual Agent Interaction in Dense and Cluttered Environments ( http://arxiv.org/abs/2303.14255v1 ) License: see linked page	James Mullen, Dinesh Manocha	We present PACE, a novel method for modifying motion-captured virtual agents to interact with and move throughout dense, cluttered 3D scenes. Our approach changes a given motion sequence of a virtual agent as needed to adjust to the obstacles and objects in the environment. We first take the individual frames of the motion sequence most important for modeling interactions with the scene and pair them with the relevant scene geometry, obstacles, and semantics such that interactions in the agent's motion match the affordances of the scene (e.g., standing on a floor or sitting in a chair). We then optimize the motion of the human by directly altering the high-DOF pose at each frame in the motion to better account for the unique geometric constraints of the scene. Our formulation uses novel loss functions that maintain a realistic flow and natural-looking motion. We compare our method with prior motion generating techniques and highlight the benefits of our method with a perceptual study and physical plausibility metrics. Human raters preferred our method over the prior approaches. Specifically, they preferred our method 57.1% of the time versus the state-of-the-art method using existing motions, and 81.0% of the time versus a state-of-the-art motion synthesis method. Additionally, our method scores significantly higher on established physical plausibility and interaction metrics. Specifically, we outperform competing methods by over 1.2% in terms of the non-collision metric and by over 18% in terms of the contact metric. We have integrated our interactive system with Microsoft HoloLens and demonstrate its benefits in real-world indoor scenes. Our project website is available at https://gamma.umd.edu/pace/.	Translated: 2023-03-28 21:07:05 Published: 2023-03-24
# Towards Diverse and Coherent Augmentation for Time-Series Forecasting ( http://arxiv.org/abs/2303.14254v1 ) License: see linked page	Xiyuan Zhang, Ranak Roy Chowdhury, Jingbo Shang, Rajesh Gupta, Dezhi Hong	Time-series data augmentation mitigates the issue of insufficient training data for deep learning models. Yet, existing augmentation methods are mainly designed for classification, where class labels can be preserved even if augmentation alters the temporal dynamics. We note that augmentation designed for forecasting requires diversity as well as coherence with the original temporal dynamics. As time-series data generated by real-life physical processes exhibit characteristics in both the time and frequency domains, we propose to combine Spectral and Time Augmentation (STAug) for generating more diverse and coherent samples. Specifically, in the frequency domain, we use the Empirical Mode Decomposition to decompose a time series and reassemble the subcomponents with random weights. This way, we generate diverse samples while being coherent with the original temporal relationships as they contain the same set of base components. In the time domain, we adapt a mix-up strategy that generates diverse as well as linearly in-between coherent samples. Experiments on five real-world time-series datasets demonstrate that STAug outperforms the base models without data augmentation as well as state-of-the-art augmentation methods.	Translated: 2023-03-28 21:06:39 Published: 2023-03-24
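The two augmentation steps described in the abstract above can be sketched in a few lines. This is not the STAug implementation: the Empirical Mode Decomposition itself is not performed here, so `reweight_components` assumes the additive components are already available, and the uniform weight range and Beta-distributed mixing coefficient are assumptions chosen for illustration.

```python
import random


def reweight_components(components, rng, low=0.5, high=1.5):
    """Recombine pre-computed additive components of one series with random weights.

    `components` is a list of equal-length lists whose element-wise sum is the
    original series. The uniform range [low, high] is an assumed choice.
    """
    weights = [rng.uniform(low, high) for _ in components]
    length = len(components[0])
    return [sum(w * comp[t] for w, comp in zip(weights, components)) for t in range(length)]


def mixup(series_a, series_b, lam):
    """Linear interpolation between two equal-length series; lam = 1 returns series_a."""
    return [lam * a + (1.0 - lam) * b for a, b in zip(series_a, series_b)]


def sample_lambda(rng, alpha=0.5):
    """Draw a mixing coefficient in [0, 1] from Beta(alpha, alpha), as is common for mix-up."""
    return rng.betavariate(alpha, alpha)
```

For forecasting, the same weights or the same `lam` would have to be applied to the input window and the target window together, otherwise the augmented pair no longer describes one consistent series.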
# A-MuSIC: An Adaptive Ensemble System For Visual Place Recognition In Changing Environments ( http://arxiv.org/abs/2303.14247v1 ) License: see linked page	Bruno Arcanjo, Bruno Ferrarini, Michael Milford, Klaus D. McDonald-Maier and Shoaib Ehsan	Visual place recognition (VPR) is an essential component of robot navigation and localization systems that allows them to identify a place using only image data. VPR is challenging due to the significant changes in a place's appearance under different illumination throughout the day, with seasonal weather and when observed from different viewpoints. Currently, no single VPR technique excels in every environmental condition, each exhibiting unique benefits and shortcomings. As a result, VPR systems combining multiple techniques achieve more reliable VPR performance in changing environments, at the cost of higher computational loads. Addressing this shortcoming, we propose an adaptive VPR system dubbed Adaptive Multi-Self Identification and Correction (A-MuSIC). We start by developing a method to collect information on the runtime performance of a VPR technique by analysing the frame-to-frame continuity of matched queries. We then demonstrate how to operate the method on a static ensemble of techniques, generating data on which techniques are contributing the most for the current environment. A-MuSIC uses the collected information to both select a minimal subset of techniques and to decide when a re-selection is required during navigation. A-MuSIC matches or beats state-of-the-art VPR performance across all tested benchmark datasets while maintaining its computational load on par with individual techniques.	Translated: 2023-03-28 21:06:22 Published: 2023-03-24
# Photocounting measurements with dead time and afterpulses in the continuous-wave regime ( http://arxiv.org/abs/2303.14246v1 ) License: see linked page	A. A. Semenov, J. Samelin, Ch. Boldt, M. Schünemann, C. Reiher, W. Vogel, and B. Hage	The widely used experimental technique of continuous-wave detection assumes counting pulses of photocurrent from a click-type detector inside a given measurement time window. With such a procedure we miss the photons detected after each photocurrent pulse during the detector dead time. Additionally, each pulse may trigger a so-called afterpulse, which is not associated with real photons. We derive the corresponding quantum photocounting formula and experimentally verify its validity. The statistics of photocurrent pulses appear to be nonlinear with respect to the quantum state, which is explained by the memory effect of the previous measurement time windows. Expressions connecting the statistics of photons and pulses, which are nonlinear in general, are derived for different measurement scenarios. We also consider an application of the obtained results to quantum state reconstruction with unbalanced homodyne detection.	Translated: 2023-03-28 21:05:56 Published: 2023-03-24
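The dead-time effect described in the abstract above can be pictured with a purely classical toy model: after each registered pulse the detector ignores arrivals for a fixed interval. The sketch below assumes a non-paralyzable detector (ignored arrivals do not extend the dead time) and does not model afterpulses; it is not the quantum photocounting formula derived in the paper, and the function name is hypothetical.

```python
def count_with_dead_time(arrival_times, dead_time):
    """Count pulses registered by a non-paralyzable detector with a fixed dead time.

    An arrival is registered only if at least `dead_time` has elapsed since the
    last registered arrival; ignored arrivals do not restart the dead interval.
    """
    registered = 0
    last_registered = None
    for t in sorted(arrival_times):
        if last_registered is None or t - last_registered >= dead_time:
            registered += 1
            last_registered = t
    return registered
```

Whether a given arrival is registered depends on earlier arrivals, including ones from a preceding time window, which gives an informal picture of the memory effect the abstract mentions.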
# Implicit Balancing and Regularization: Generalization and Convergence Guarantees for Overparameterized Asymmetric Matrix Sensing ( http://arxiv.org/abs/2303.14244v1 ) License: see linked page	Mahdi Soltanolkotabi, Dominik Stöger, Changzhi Xie	Recently, there has been significant progress in understanding the convergence and generalization properties of gradient-based methods for training overparameterized learning models. However, many aspects including the role of small random initialization and how the various parameters of the model are coupled during gradient-based updates to facilitate good generalization remain largely mysterious. A series of recent papers have begun to study this role for non-convex formulations of symmetric Positive Semi-Definite (PSD) matrix sensing problems which involve reconstructing a low-rank PSD matrix from a few linear measurements. The underlying symmetry/PSDness is crucial to existing convergence and generalization guarantees for this problem. In this paper, we study a general overparameterized low-rank matrix sensing problem where one wishes to reconstruct an asymmetric rectangular low-rank matrix from a few linear measurements. We prove that an overparameterized model trained via factorized gradient descent converges to the low-rank matrix generating the measurements. We show that in this setting, factorized gradient descent enjoys two implicit properties: (1) coupling of the trajectory of gradient descent where the factors are coupled in various ways throughout the gradient update trajectory and (2) an algorithmic regularization property where the iterates show a propensity towards low-rank models despite the overparameterized nature of the factorized model. These two implicit properties in turn allow us to show that the gradient descent trajectory from small random initialization moves towards solutions that are both globally optimal and generalize well.	Translated: 2023-03-28 21:05:45 Published: 2023-03-24
# DyLiN: Making Light Field Networks Dynamic ( http://arxiv.org/abs/2303.14243v1 ) License: see link	Heng Yu, Joel Julin, Zoltan A. Milacski, Koichiro Niinuma, Laszlo A. Jeni	Light Field Networks, the re-formulations of radiance fields to oriented rays, are orders of magnitude faster than their coordinate network counterparts, and provide higher fidelity with respect to representing 3D structures from 2D observations. They would be well suited for generic scene representation and manipulation, but suffer from one problem: they are limited to holistic and static scenes. In this paper, we propose the Dynamic Light Field Network (DyLiN) method that can handle non-rigid deformations, including topological changes. We learn a deformation field from input rays to canonical rays, and lift them into a higher dimensional space to handle discontinuities. We further introduce CoDyLiN, which augments DyLiN with controllable attribute inputs. We train both models via knowledge distillation from pretrained dynamic radiance fields. We evaluated DyLiN using both synthetic and real world datasets that include various non-rigid deformations. DyLiN qualitatively outperformed and quantitatively matched state-of-the-art methods in terms of visual fidelity, while being 25-71x computationally faster. We also tested CoDyLiN on attribute annotated data and it surpassed its teacher model. Project page: https://dylin2023.github.io .	Translation date: 2023-03-28 21:05:18 Publication date: 2023-03-24
# IDGI: A Framework to Eliminate Explanation Noise from Integrated Gradients ( http://arxiv.org/abs/2303.14242v1 ) License: see link	Ruo Yang, Binghui Wang, Mustafa Bilgic	Integrated Gradients (IG) as well as its variants are well-known techniques for interpreting the decisions of deep neural networks. While IG-based approaches attain state-of-the-art performance, they often integrate noise into their explanation saliency maps, which reduces their interpretability. To minimize the noise, we examine the source of the noise analytically and propose a new approach to reduce the explanation noise based on our analytical findings. We propose the Important Direction Gradient Integration (IDGI) framework, which can be easily incorporated into any IG-based method that uses Riemann integration for integrated gradient computation. Extensive experiments with three IG-based methods show that IDGI improves them drastically on numerous interpretability metrics.	Translation date: 2023-03-28 21:05:00 Publication date: 2023-03-24
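For readers unfamiliar with the Riemann-sum computation this abstract refers to, here is a minimal sketch of plain Integrated Gradients on a toy function. It shows only the baseline IG computation that IDGI is said to plug into, not IDGI itself. The toy model `f`, its analytic gradient `grad_f`, the midpoint rule, and the step count are assumptions chosen for illustration.

```python
import numpy as np

def integrated_gradients(grad_f, x, baseline, steps=50):
    """Approximate IG attributions with a midpoint Riemann sum.

    grad_f(z) must return the gradient of the model output at z.
    """
    x = np.asarray(x, dtype=float)
    baseline = np.asarray(baseline, dtype=float)
    # Midpoints of `steps` equal sub-intervals of [0, 1].
    alphas = (np.arange(steps) + 0.5) / steps
    # Average the gradient along the straight path from baseline to x.
    avg_grad = np.mean(
        [grad_f(baseline + a * (x - baseline)) for a in alphas], axis=0
    )
    return (x - baseline) * avg_grad

# Toy "model" (an assumption for this sketch): f(z) = sum(z**2), gradient 2*z.
def f(z):
    return float(np.sum(z ** 2))

def grad_f(z):
    return 2.0 * z

x = np.array([1.0, -2.0, 0.5])
baseline = np.zeros_like(x)
attributions = integrated_gradients(grad_f, x, baseline)

# Completeness check: attributions should sum to f(x) - f(baseline).
print(attributions, attributions.sum(), f(x) - f(baseline))
```

For a real network the gradient would come from automatic differentiation, and the Riemann sum is only approximate. That approximation along the path is the step the abstract identifies as a place where explanation noise can enter.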
# Core-based Trend Detection in Blockchain Networks ( http://arxiv.org/abs/2303.14241v1 ) License: see link	Jason Zhu, Arijit Khan, Cuneyt Gurcan Akcora	Blockchains are now significantly easing trade finance, with billions of dollars worth of assets being transacted daily. However, analyzing these networks remains challenging due to the large size and complexity of the data. We introduce a scalable approach called "InnerCore" for identifying key actors in blockchain-based networks and providing a sentiment indicator for the networks using data depth-based core decomposition and centered-motif discovery. InnerCore is a computationally efficient, unsupervised approach suitable for analyzing large temporal graphs. We demonstrate its effectiveness through case studies on the recent collapse of LunaTerra and the Proof-of-Stake (PoS) switch of Ethereum, using external ground truth collected by a leading blockchain analysis company. Our experiments show that InnerCore can match the qualified analysis accurately without human involvement, automating blockchain analysis and its trend detection in a scalable manner.	Translation date: 2023-03-28 21:04:49 Publication date: 2023-03-24
# Adaptive Base-class Suppression and Prior Guidance Network for One-Shot Object Detection ( http://arxiv.org/abs/2303.14240v1 ) License: see link	Wenwen Zhang, Xinyu Xiao, Hangguan Shan and Eryun Liu	One-shot object detection (OSOD) aims to detect all object instances of the category specified by a query image. Most existing studies in OSOD endeavor to explore effective cross-image correlation and alleviate semantic feature misalignment, but they ignore the model's bias towards the base classes and the resulting generalization degradation on the novel classes. Observing this, we propose a novel framework, namely the Base-class Suppression and Prior Guidance (BSPG) network, to overcome the problem. Specifically, the objects of base categories can be explicitly detected by a base-class predictor and adaptively eliminated by our base-class suppression module. Moreover, a prior guidance module is designed to calculate the correlation of high-level features in a non-parametric manner, producing a class-agnostic prior map to provide the target features with rich semantic cues and guide the subsequent detection process. Equipped with the proposed two modules, we endow the model with a strong discriminative ability to distinguish the target objects from distractors belonging to the base classes. Extensive experiments show that our method outperforms the previous techniques by a large margin and achieves new state-of-the-art performance under various evaluation settings.	Translation date: 2023-03-28 21:04:33 Publication date: 2023-03-24
# Efficient Lipschitzian Global Optimization of Hölder Continuous Multivariate Functions ( http://arxiv.org/abs/2303.14293v1 ) License: see link	Kaan Gokcesu, Hakan Gokcesu	This study presents an effective global optimization technique designed for multivariate functions that are Hölder continuous. Unlike traditional methods that construct lower bounding proxy functions, this algorithm employs a predetermined query creation rule that makes it computationally superior. The algorithm's performance is assessed using the average or cumulative regret, which also implies a bound for the simple regret and reflects the overall effectiveness of the approach. The results show that with appropriate parameters the algorithm attains an average regret bound of $O(T^{-\frac{\alpha}{n}})$ for optimizing a Hölder continuous target function with Hölder exponent $\alpha$ in an $n$-dimensional space within a given time horizon $T$. We demonstrate that this bound is minimax optimal.	Translation date: 2023-03-28 20:58:18 Publication date: 2023-03-24
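The claim that an average-regret bound also implies a simple-regret bound follows from a one-line inequality. The notation below is assumed for illustration and does not come from the abstract: $x_t$ is the $t$-th query, $f^\ast$ is the global minimum of the target function (minimization is assumed), $r_T$ is the simple regret, and $\bar{R}_T$ is the average regret. The minimum of a set of non-negative numbers is never larger than their mean, so:

```latex
\[
r_T \;=\; \min_{1 \le t \le T} \bigl( f(x_t) - f^\ast \bigr)
\;\le\; \frac{1}{T} \sum_{t=1}^{T} \bigl( f(x_t) - f^\ast \bigr)
\;=\; \bar{R}_T
\;=\; O\!\left( T^{-\alpha/n} \right).
\]
```

As a worked instance under these assumptions, a Lipschitz function ($\alpha = 1$) in $n = 2$ dimensions would have both regrets decaying at least as fast as $T^{-1/2}$.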
# Applications of Gaussian Processes at Extreme Lengthscales: From Molecules to Black Holes ( http://arxiv.org/abs/2303.14291v1 ) License: see link	Ryan-Rhys Griffiths	In many areas of the observational and experimental sciences data is scarce. Data observation in high-energy astrophysics is disrupted by celestial occlusions and limited telescope time, while data derived from laboratory experiments in synthetic chemistry and materials science is time- and cost-intensive to collect. On the other hand, knowledge about the data-generation mechanism is often available in the sciences, such as the measurement error of a piece of laboratory apparatus. Both characteristics, small data and knowledge of the underlying physics, make Gaussian processes (GPs) ideal candidates for fitting such datasets. GPs can make predictions with consideration of uncertainty, for example in the virtual screening of molecules and materials, and can also make inferences about incomplete data such as the latent emission signature from a black hole accretion disc. Furthermore, GPs are currently the workhorse model for Bayesian optimisation, a methodology foreseen to be a guide for laboratory experiments in scientific discovery campaigns. The first contribution of this thesis is to use GP modelling to reason about the latent emission signature from the Seyfert galaxy Markarian 335, and by extension, to reason about the applicability of various theoretical models of black hole accretion discs. The second contribution is to extend the GP framework to molecular and chemical reaction representations and to provide an open-source software library to enable the framework to be used by scientists. The third contribution is to leverage GPs to discover novel and performant photoswitch molecules. The fourth contribution is to introduce a Bayesian optimisation scheme capable of modelling aleatoric uncertainty to facilitate the identification of material compositions that possess intrinsic robustness to large scale fabrication processes.	Translation date: 2023-03-28 20:58:05 Publication date: 2023-03-24
# Voice-Based Conversational Agents and Knowledge Graphs for Improving News Search in Assisted Living ( http://arxiv.org/abs/2303.14286v1 ) License: see link	Phillip Schneider, Nils Rehtanz, Kristiina Jokinen and Florian Matthes	As the healthcare sector is facing major challenges, such as aging populations, staff shortages, and common chronic diseases, delivering high-quality care to individuals has become very difficult. Conversational agents have been shown to be a promising technology to alleviate some of these issues. In the form of digital health assistants, they have the potential to improve the everyday life of the elderly and chronically ill people. This includes, for example, medication reminders, routine checks, or social chit-chat. In addition, conversational agents can satisfy the fundamental need of having access to information about daily news or local events, which enables individuals to stay informed and connected with the world around them. However, finding relevant news sources and navigating the plethora of news articles available online can be overwhelming, particularly for those who may have limited technological literacy or health-related impairments. To address this challenge, we propose an innovative solution that combines knowledge graphs and conversational agents for news search in assisted living. By leveraging graph databases to semantically structure news data and implementing an intuitive voice-based interface, our system can help care-dependent people to easily discover relevant news articles and give personalized recommendations. We explain our design choices, provide a system architecture, share insights of an initial user test, and give an outlook on planned future work.	Translation date: 2023-03-28 20:57:36 Publication date: 2023-03-24
# Feature Space Sketching for Logistic Regression ( http://arxiv.org/abs/2303.14284v1 ) License: see link	Gregory Dexter, Rajiv Khanna, Jawad Raheel, and Petros Drineas	We present novel bounds for coreset construction, feature selection, and dimensionality reduction for logistic regression. All three approaches can be thought of as sketching the logistic regression inputs. On the coreset construction front, we resolve open problems from prior work and present novel bounds for the complexity of coreset construction methods. On the feature selection and dimensionality reduction front, we initiate the study of forward error bounds for logistic regression. Our bounds are tight up to constant factors and our forward error bounds can be extended to Generalized Linear Models.	Translation date: 2023-03-28 20:57:12 Publication date: 2023-03-24
# Sequential Knockoffs for Variable Selection in Reinforcement Learning ( http://arxiv.org/abs/2303.14281v1 ) License: see link	Tao Ma, Hengrui Cai, Zhengling Qi, Chengchun Shi, Eric B. Laber	In real-world applications of reinforcement learning, it is often challenging to obtain a state representation that is parsimonious and satisfies the Markov property without prior knowledge. Consequently, it is common practice to construct a state which is larger than necessary, e.g., by concatenating measurements over contiguous time points. However, needlessly increasing the dimension of the state can slow learning and obfuscate the learned policy. We introduce the notion of a minimal sufficient state in a Markov decision process (MDP) as the smallest subvector of the original state under which the process remains an MDP and shares the same optimal policy as the original process. We propose a novel sequential knockoffs (SEEK) algorithm that estimates the minimal sufficient state in a system with high-dimensional complex nonlinear dynamics. In large samples, the proposed method controls the false discovery rate, and selects all sufficient variables with probability approaching one. As the method is agnostic to the reinforcement learning algorithm being applied, it benefits downstream tasks such as policy optimization. Empirical experiments verify theoretical results and show the proposed approach outperforms several competing methods in terms of variable selection accuracy and regret.	Translation date: 2023-03-28 20:57:04 Publication date: 2023-03-24
# Depression detection in social media posts using affective and social norm features ( http://arxiv.org/abs/2303.14279v1 ) License: see link	Ilias Triantafyllopoulos, Georgios Paraskevopoulos, Alexandros Potamianos	We propose a deep architecture for depression detection from social media posts. The proposed architecture builds upon BERT to extract language representations from social media posts and combines these representations using an attentive bidirectional GRU network. We incorporate affective information by augmenting the text representations with features extracted from a pretrained emotion classifier. Motivated by psychological literature, we propose to incorporate profanity and morality features of posts and words in our architecture using a late fusion scheme. Our analysis indicates that morality and profanity can be important features for depression detection. We apply our model for depression detection on Reddit posts on the Pirina dataset, and further consider the setting of detecting depressed users, given multiple posts per user, proposed in the Reddit RSDD dataset. The inclusion of the proposed features yields state-of-the-art results in both settings, namely 2.65% and 6.73% absolute improvement in F1 score respectively. Index Terms: Depression detection, BERT, Feature fusion, Emotion recognition, profanity, morality	Translation date: 2023-03-28 20:56:42 Publication date: 2023-03-24
# Learning to Operate in Open Worlds by Adapting Planning Models ( http://arxiv.org/abs/2303.14272v1 ) License: see link	Wiktor Piotrowski and Roni Stern and Yoni Sher and Jacob Le and Matthew Klenk and Johan deKleer and Shiwali Mohan	Planning agents are ill-equipped to act in novel situations in which their domain model no longer accurately represents the world. We introduce an approach for such agents operating in open worlds that detects the presence of novelties and effectively adapts their domain models and consequent action selection. It uses observations of action execution and measures their divergence from what is expected, according to the environment model, to infer the existence of a novelty. Then, it revises the model through a heuristics-guided search over model changes. We report empirical evaluations on the CartPole problem, a standard Reinforcement Learning (RL) benchmark. The results show that our approach can deal with a class of novelties very quickly and in an interpretable fashion.	Translation date: 2023-03-28 20:56:23 Publication date: 2023-03-24
# The Exact Sample Complexity Gain from Invariances for Kernel Regression on Manifolds ( http://arxiv.org/abs/2303.14269v1 ) License: see link	Behrooz Tahmasebi, Stefanie Jegelka	In practice, encoding invariances into models helps sample complexity. In this work, we tighten and generalize theoretical results on how invariances improve sample complexity. In particular, we provide minimax optimal rates for kernel ridge regression on any manifold, with a target function that is invariant to an arbitrary group action on the manifold. Our results hold for (almost) any group action, even groups of positive dimension. For a finite group, the gain increases the "effective" number of samples by the group size. For groups of positive dimension, the gain is observed by a reduction in the manifold's dimension, in addition to a factor proportional to the volume of the quotient space. Our proof takes the viewpoint of differential geometry, in contrast to the more common strategy of using invariant polynomials. Hence, this new geometric viewpoint on learning with invariances may be of independent interest.	Translation date: 2023-03-28 20:56:12 Publication date: 2023-03-24
# A Self-supervised Framework for Improved Data-Driven Monitoring of Stress via Multi-modal Passive Sensing ( http://arxiv.org/abs/2303.14267v1 ) License: see link	Shayan Fazeli, Lionel Levine, Mehrab Beikzadeh, Baharan Mirzasoleiman, Bita Zadeh, Tara Peris, Majid Sarrafzadeh	Recent advances in remote health monitoring systems have significantly benefited patients and played a crucial role in improving their quality of life. However, while physiological health-focused solutions have demonstrated increasing success and maturity, mental health-focused applications have seen comparatively limited success, even though stress and anxiety disorders are among the most common issues people deal with in their daily lives. In the hopes of furthering progress in this domain through the development of a more robust analytic framework for the measurement of indicators of mental health, we propose a multi-modal semi-supervised framework for tracking physiological precursors of the stress response. Our methodology enables utilizing multi-modal data of differing domains and resolutions from wearable devices and leveraging them to map short-term episodes to semantically efficient embeddings for a given task. Additionally, we leverage an inter-modality contrastive objective, with the advantages of rendering our framework both modular and scalable. The focus on optimizing both local and global aspects of our embeddings via a hierarchical structure renders transferring knowledge and compatibility with other devices easier to achieve. In our pipeline, a task-specific pooling based on an attention mechanism, which estimates the contribution of each modality on an instance level, computes the final embeddings for observations. This additionally provides a thorough diagnostic insight into the data characteristics and highlights the importance of signals in the broader view of predicting episodes annotated per mental health status. We perform training experiments using a corpus of real-world data on perceived stress, and our results demonstrate the efficacy of the proposed approach in performance improvements.	Translation date: 2023-03-28 20:56:00 Publication date: 2023-03-24
# Safe and Sample-efficient Reinforcement Learning for Clustered Dynamic Environments ( http://arxiv.org/abs/2303.14265v1 ) License: see link	Hongyi Chen and Changliu Liu	This study proposes a safe and sample-efficient reinforcement learning (RL) framework to address two major challenges in developing applicable RL algorithms: satisfying safety constraints and efficiently learning with limited samples. To guarantee safety in real-world complex environments, we use the safe set algorithm (SSA) to monitor and modify the nominal controls, and evaluate SSA+RL in a clustered dynamic environment that is challenging for existing RL algorithms to solve. However, the SSA+RL framework is usually not sample-efficient, especially in reward-sparse environments, which has not been addressed in previous safe RL works. To improve the learning efficiency, we propose three techniques: (1) avoiding overly conservative behavior by adapting the SSA; (2) encouraging safe exploration using random network distillation with safety constraints; (3) improving policy convergence by treating SSA as expert demonstrations and learning directly from them. The experimental results show that our framework can achieve better safety performance compared to other safe RL methods during training and solve the task with substantially fewer episodes. Project website: https://hychen-naza.github.io/projects/Safe_RL/.	Translation date: 2023-03-28 20:55:33 Publication date: 2023-03-24
# VILA: Learning Image Aesthetics from User Comments with Vision-Language Pretraining ( http://arxiv.org/abs/2303.14302v1 ) License: see link	Junjie Ke, Keren Ye, Jiahui Yu, Yonghui Wu, Peyman Milanfar, Feng Yang	Assessing the aesthetics of an image is challenging, as it is influenced by multiple factors including composition, color, style, and high-level semantics. Existing image aesthetic assessment (IAA) methods primarily rely on human-labeled rating scores, which oversimplify the visual aesthetic information that humans perceive. Conversely, user comments offer more comprehensive information and are a more natural way to express human opinions and preferences regarding image aesthetics. In light of this, we propose learning image aesthetics from user comments, and explore vision-language pretraining methods to learn multimodal aesthetic representations. Specifically, we pretrain an image-text encoder-decoder model with image-comment pairs, using contrastive and generative objectives to learn rich and generic aesthetic semantics without human labels. To efficiently adapt the pretrained model for downstream IAA tasks, we further propose a lightweight rank-based adapter that employs text as an anchor to learn the aesthetic ranking concept. Our results show that our pretrained aesthetic vision-language model outperforms prior works on image aesthetic captioning over the AVA-Captions dataset, and it has powerful zero-shot capability for aesthetic tasks such as zero-shot style classification and zero-shot IAA, surpassing many supervised baselines. With only minimal finetuning parameters using the proposed adapter module, our model achieves state-of-the-art IAA performance over the AVA dataset.	Translation date: 2023-03-28 20:46:38 Publication date: 2023-03-24
# repliclust: Synthetic Data for Cluster Analysis ( http://arxiv.org/abs/2303.14301v1 ) License: see link	Michael J. Zellinger and Peter Bühlmann	We present repliclust (from repli-cate and clust-er), a Python package for generating synthetic data sets with clusters. Our approach is based on data set archetypes, high-level geometric descriptions from which the user can create many different data sets, each possessing the desired geometric characteristics. The architecture of our software is modular and object-oriented, decomposing data generation into algorithms for placing cluster centers, sampling cluster shapes, selecting the number of data points for each cluster, and assigning probability distributions to clusters. The project webpage, repliclust.org, provides a concise user guide and thorough documentation.	Translation date: 2023-03-28 20:46:11 Publication date: 2023-03-24
# AgileGAN3D: Few-Shot 3D Portrait Stylization by Augmented Transfer Learning ( http://arxiv.org/abs/2303.14297v1 ) License: see link	Guoxian Song and Hongyi Xu and Jing Liu and Tiancheng Zhi and Yichun Shi and Jianfeng Zhang and Zihang Jiang and Jiashi Feng and Shen Sang and Linjie Luo	While substantial progress has been made in automated 2D portrait stylization, admirable 3D portrait stylization from a single user photo remains an unresolved challenge. One primary obstacle here is the lack of high quality stylized 3D training data. In this paper, we propose a novel framework, AgileGAN3D, that can produce 3D artistically appealing and personalized portraits with detailed geometry. New stylization can be obtained with just a few (around 20) unpaired 2D exemplars. We achieve this by first leveraging existing 2D stylization capabilities, style prior creation, to produce a large amount of augmented 2D style exemplars. These augmented exemplars are generated with accurate camera pose labels, as well as paired real face images, which prove to be critical for the downstream 3D stylization task. Capitalizing on the recent advancement of 3D-aware GAN models, we perform guided transfer learning on a pretrained 3D GAN generator to produce multi-view-consistent stylized renderings. In order to achieve 3D GAN inversion that can preserve the subject's identity well, we incorporate a multi-view consistency loss in the training of our encoder. Our pipeline demonstrates strong capability in turning user photos into a diverse range of 3D artistic portraits. Both qualitative results and quantitative evaluations have been conducted to show the superior performance of our method. Code and pretrained models will be released for reproduction purposes.	Translation date: 2023-03-28 20:46:00 Publication date: 2023-03-24
# グローバル感度解析と機械学習説明可能性のための導出型シェープリー値 Derivative-based Shapley value for global sensitivity analysis and machine learning explainability ( http://arxiv.org/abs/2303.15183v1 ) ライセンス: Link先を確認	Hui Duan and Giray Ökten	(参考訳) 我々は、グローバル感度分析と機械学習説明可能性のための新しいShapley値アプローチを導入する。この方法は基礎関数の1階部分微分に基づいている。この方法の計算複雑性は、文献における他のシェープリー値アプローチの指数複雑性とは対照的に、次元(特徴数)において線型である。グローバルな感度分析や機械学習の例を用いて、この手法をアクティビティスコア、SHAP、KernelSHAPと数値的に比較する。 We introduce a new Shapley value approach for global sensitivity analysis and machine learning explainability. The method is based on the first-order partial derivatives of the underlying function. The computational complexity of the method is linear in dimension (number of features), as opposed to the exponential complexity of other Shapley value approaches in the literature. Examples from global sensitivity analysis and machine learning are used to compare the method numerically with activity scores, SHAP, and KernelSHAP.	翻訳日:2023-03-28 15:24:09 公開日:2023-03-24
# グラフ自動コントラスト学習のハイブリッド化 Hybrid Augmented Automated Graph Contrastive Learning ( http://arxiv.org/abs/2303.15182v1 ) ライセンス: Link先を確認	Yifu Chen and Qianqian Ren and Liu Yong	(参考訳) グラフコントラスト学習にはグラフ拡張が不可欠である。既存の作業の多くは、事前に定義されたランダム拡張を使用しており、通常は異なる入力グラフに適応できず、異なるノードとエッジがグラフセマンティクスに与える影響を考慮できない。この問題に対処するため,Hybrid Augmented Automated Graph Contrastive Learning (HAGCL) というフレームワークを提案する。 HAGCLは機能レベルの学習可能なビュージェネレータとエッジレベルの学習可能なビュージェネレータで構成される。ビュージェネレータは、入力グラフに条件付きビューの確率分布を学習するために、エンドツーエンドで微分可能である。特徴とトポロジーの観点で、最も意味的に意味のある構造を学ぶことを保証します。さらに,下流作業におけるラベル情報の弱さや追加作業の広範な評価を伴わずに,従来の作業よりも優れた結果を得られるような共同学習戦略を提案する。 Graph augmentations are essential for graph contrastive learning. Most existing works use pre-defined random augmentations, which are usually unable to adapt to different input graphs and fail to consider the impact of different nodes and edges on graph semantics. To address this issue, we propose a framework called Hybrid Augmented Automated Graph Contrastive Learning (HAGCL). HAGCL consists of a feature-level learnable view generator and an edge-level learnable view generator. The view generators are end-to-end differentiable to learn the probability distribution of views conditioned on the input graph. This ensures that the most semantically meaningful structure is learned in terms of features and topology, respectively. Furthermore, we propose an improved joint training strategy, which can achieve better results than previous works without resorting to any weak label information in the downstream tasks and extensive evaluation of additional work.	翻訳日:2023-03-28 15:24:01 公開日:2023-03-24
# ISS++:テキストガイドによる3D形状生成のためのステッピングストーンとしてのイメージ ISS++: Image as Stepping Stone for Text-Guided 3D Shape Generation ( http://arxiv.org/abs/2303.15181v1 ) ライセンス: Link先を確認	Zhengzhe Liu, Peng Dai, Ruihui Li, Xiaojuan Qi, Chi-Wing Fu	(参考訳) 本稿では,ペアのテキストと3Dデータを必要とせずに3次元形状を生成するために,画像をステップストーンとして利用する新しい3次元形状生成手法(ISS++)を提案する。提案手法のコアとなるのは,CLIP 画像の特徴を SVR モデルの詳細な3次元形状空間にマッピングし,CLIP のテキスト特徴を描画画像と入力テキスト間のCLIP 一貫性を奨励することで,CLIP のテキスト特徴を3次元形状空間にマッピングする,事前訓練された単一ビュー再構成(SVR)モデルを活用する2段階の機能空間アライメント戦略である。さらに,svrモデルの生成能力を超えて,新たな構造やテクスチャで出力形状を向上できるテキスト誘導型3d形状スタイライゼーションモジュールも設計する。さらに,事前学習したテキストから画像への拡散モデルを用いて,生成的多様性,忠実度,スタイライゼーション能力を高める。我々のアプローチは汎用的で柔軟でスケーラブルであり、様々なSVRモデルと容易に統合して生成空間を拡大し、生成精度を向上させることができる。広範な実験結果から,本手法は,生成的品質と入力テキストとの一貫性の観点から,最先端手法よりも優れていることが示された。コードとモデルはhttps://github.com/liuzhengzhe/ISS-Image-as-Stepping-Stone-for-Text-Guided-3D-Shape-Generationで公開されている。 In this paper, we present a new text-guided 3D shape generation approach (ISS++) that uses images as a stepping stone to bridge the gap between text and shape modalities for generating 3D shapes without requiring paired text and 3D data. The core of our approach is a two-stage feature-space alignment strategy that leverages a pre-trained single-view reconstruction (SVR) model to map CLIP features to shapes: to begin with, map the CLIP image feature to the detail-rich 3D shape space of the SVR model, then map the CLIP text feature to the 3D shape space through encouraging the CLIP-consistency between rendered images and the input text. Besides, to extend beyond the generative capability of the SVR model, we design a text-guided 3D shape stylization module that can enhance the output shapes with novel structures and textures. Further, we exploit pre-trained text-to-image diffusion models to enhance the generative diversity, fidelity, and stylization capability. 
Our approach is generic, flexible, and scalable, and it can be easily integrated with various SVR models to expand the generative space and improve the generative fidelity. Extensive experimental results demonstrate that our approach outperforms the state-of-the-art methods in terms of generative quality and consistency with the input text. Codes and models are released at https://github.com/liuzhengzhe/ISS-Image-as-Stepping-Stone-for-Text-Guided-3D-Shape-Generation.	翻訳日:2023-03-28 15:23:46 公開日:2023-03-24
# ブートストラップ強化学習による河川のロバストパス追従 Robust Path Following on Rivers Using Bootstrapped Reinforcement Learning ( http://arxiv.org/abs/2303.15178v1 ) ライセンス: Link先を確認	Niklas Paulig, Ostap Ohkrin	(参考訳) 本稿では,内陸海域における自律型表面容器(ASV)の航行制御のための深層強化学習(DRL)エージェントを開発した。水路の幾何学による空間的制限と、高流動速度や浅瀬のような結果として生じる課題は、ASVの制御と正確な移動を必要とする。最先端のブートストラップq-learningアルゴリズムと多用途なトレーニング環境ジェネレータを組み合わせることで、堅牢で正確なラダーコントローラが実現される。提案手法の経路追従性能を,下流ライン川と中部ライン川からの実世界の河川データに対して比較したところ,DRLアルゴリズムは航法精度を高く保ちながら,見つからないシナリオでも効果的に一般化可能であることが示唆された。 This paper develops a Deep Reinforcement Learning (DRL) agent for navigation and control of autonomous surface vessels (ASV) on inland waterways. Spatial restrictions due to waterway geometry and the resulting challenges, such as high flow velocities or shallow banks, require controlled and precise movement of the ASV. A state-of-the-art bootstrapped Q-learning algorithm in combination with a versatile training environment generator leads to a robust and accurate rudder controller. To validate our results, we compare the path-following capabilities of the proposed approach to a vessel-specific PID controller on real-world river data from the lower and middle Rhine, indicating that the DRL algorithm could effectively prove generalizability even in never-seen scenarios while simultaneously attaining high navigational accuracy.	翻訳日:2023-03-28 15:23:02 公開日:2023-03-24
# 正面視のためのNeRFおよびニューラルビュー合成法の知覚的品質評価 Perceptual Quality Assessment of NeRF and Neural View Synthesis Methods for Front-Facing Views ( http://arxiv.org/abs/2303.15206v1 ) ライセンス: Link先を確認	Hanxue Liang, Tianhao Wu, Param Hanji, Francesco Banterle, Hongyun Gao, Rafal Mantiuk, Cengiz Oztireli	(参考訳) ニューラルビュー合成(neural view synthesis, nvs)は、自由視点映像を合成する最も成功した手法の1つであり、撮像された画像の集合から高い忠実度を達成することができる。この成功は、PSNR、SSIM、LPIPSといった画像品質の指標を用いて、テストビューのセットで評価される、多くのバリエーションを生み出した。 nvsの手法がビデオ品質に対してどのように機能するかについては、研究が不足している。本研究は,NVSおよびNeRFの知覚的評価に関する最初の研究である。本研究では,制御された実験室環境で撮影されたシーンの2つのデータセットと,室内のシーンを収集した。既存のデータセットとは対照的に、これらのシーンには参照ビデオシーケンスがあり、静的画像のみを見る際に容易に見過ごされる時間的アーティファクトや微妙な歪みをテストできます。我々は,NVS法によって合成された映像の品質をよく制御された知覚品質評価実験で測定した。本稿では,nvs評価のためのデータセットとメトリック選択の結果と推奨結果の詳細な分析を行う。 Neural view synthesis (NVS) is one of the most successful techniques for synthesizing free viewpoint videos, capable of achieving high fidelity from only a sparse set of captured images. This success has led to many variants of the techniques, each evaluated on a set of test views typically using image quality metrics such as PSNR, SSIM, or LPIPS. There has been a lack of research on how NVS methods perform with respect to perceived video quality. We present the first study on perceptual evaluation of NVS and NeRF variants. For this study, we collected two datasets of scenes captured in a controlled lab environment as well as in-the-wild. In contrast to existing datasets, these scenes come with reference video sequences, allowing us to test for temporal artifacts and subtle distortions that are easily overlooked when viewing only static images. We measured the quality of videos synthesized by several NVS methods in a well-controlled perceptual quality assessment experiment as well as with many existing state-of-the-art image/video quality metrics. We present a detailed analysis of the results and recommendations for dataset and metric selection for NVS evaluation.	翻訳日:2023-03-28 15:14:00 公開日:2023-03-24
# アウトカム駆動サブグループに向けて:6つのうつ病治療研究にわたる機械学習分析 Towards Outcome-Driven Patient Subgroups: A Machine Learning Analysis Across Six Depression Treatment Studies ( http://arxiv.org/abs/2303.15202v1 ) ライセンス: Link先を確認	David Benrimoh, Akiva Kleinerman, Toshi A. Furukawa, Charles F. Reynolds III, Eric Lenze, Jordan Karp, Benoit Mulsant, Caitrin Armstrong, Joseph Mehltretter, Robert Fratila, Kelly Perlman, Sonia Israel, Christina Popescu, Grace Golden, Sabrina Qassim, Alexandra Anacleto, Adam Kapelner, Ariel Rosenfeld, Gustavo Turecki	(参考訳) 主要なうつ病性障害(mdd)は不均一な疾患であり、複数の基礎となる神経生物学的基質が治療反応の変動と関連している可能性がある。この可変性と予測結果の源泉を理解することは明白である。機械学習はmddで治療反応を予測することが期待されているが、機械学習モデルの臨床的解釈性の欠如が制限されている。うつ病に対する薬理学的治療(total n = 5438)の6つの臨床試験から,治療関連患者クラスターの導出に使用可能なニューラルネットワークモデルである差分原型ニューラルネットワーク(DPNN)を用いて,差分処理応答の確率を学習しながら分析した。臨床および人口統計データを用いて, 寛解・個別寛解確率を分類し, 5本の単眼および3種類の組み合わせ治療を訓練した。モデルの妥当性と臨床的有用性は,AUC (Area under the curve) とモデル誘導治療による試料送還率の改善に基づいて測定した。ポストホック分析は、トレーニング中に学んだ患者プロトタイプに基づいてクラスター(サブグループ)を得た。特徴分布と治療特異的な結果の違いを評価することにより, 解釈可能性の評価を行った。 3-プロトタイプモデルではAUCは0.66であり、標本再送率に比べて絶対的な人口再送率の向上が期待された。臨床的に解釈可能な3つの治療関連患者クラスターを同定した。機械学習モデルを用いて新しい治療関連患者のプロファイルを作成することが可能であり、うつ病の精密医療を改善することができる。注:このモデルは、現在、アクティブな臨床試験の対象ではなく、臨床用途を意図していない。 Major depressive disorder (MDD) is a heterogeneous condition; multiple underlying neurobiological substrates could be associated with treatment response variability. Understanding the sources of this variability and predicting outcomes has been elusive. Machine learning has shown promise in predicting treatment response in MDD, but one limitation has been the lack of clinical interpretability of machine learning models. We analyzed data from six clinical trials of pharmacological treatment for depression (total n = 5438) using the Differential Prototypes Neural Network (DPNN), a neural network model that derives patient prototypes which can be used to derive treatment-relevant patient clusters while learning to generate probabilities for differential treatment response. 
A model classifying remission and outputting individual remission probabilities for five first-line monotherapies and three combination treatments was trained using clinical and demographic data. Model validity and clinical utility were measured based on area under the curve (AUC) and expected improvement in sample remission rate with model-guided treatment, respectively. Post-hoc analyses yielded clusters (subgroups) based on patient prototypes learned during training. Prototypes were evaluated for interpretability by assessing differences in feature distributions and treatment-specific outcomes. A 3-prototype model achieved an AUC of 0.66 and an expected absolute improvement in population remission rate compared to the sample remission rate. We identified three treatment-relevant patient clusters which were clinically interpretable. It is possible to produce novel treatment-relevant patient profiles using machine learning models; doing so may improve precision medicine for depression. Note: This model is not currently the subject of any active clinical trials and is not intended for clinical use.	翻訳日:2023-03-28 15:13:43 公開日:2023-03-24
# 変圧器の深部特性探索による画像劣化 Image Deblurring by Exploring In-depth Properties of Transformer ( http://arxiv.org/abs/2303.15198v1 ) ライセンス: Link先を確認	Pengwei Liang, Junjun Jiang, Xianming Liu, Jiayi Ma	(参考訳) 画像デブラリングは生成モデルの開発によって印象的な性能を保ち続けている。それでも、回復した画像の知覚的品質と定量的スコアを同時に向上させたい場合、いまだに不快な問題が残っている。本研究では, 変圧器特性の研究から着想を得て, 予め学習した変圧器を導入し, この問題に対処する。特に,事前訓練された視覚トランスフォーマ(vit)から抽出された深部特徴を活用して,定量的測定で測定した性能を犠牲にすることなく,復元画像のシャープ化を奨励する。事前学習した変換器は画像のグローバルなトポロジカルな関係(すなわち自己相似性)を捉えることができ、鮮明な画像に関する捕獲されたトポロジカルな関係は、ぼかしが発生すると変化する。復元画像と目標画像とのトランスフォーマー特性を比較することにより、予め訓練されたトランスフォーマーは高分解能のぼやけ感のある意味情報を提供する。優位性に基づいて、画像の劣化をガイドする2種類の新しい知覚的損失を提示する。特徴をベクトルとみなし、抽出された画像から抽出された表現とユークリッド空間における対象表現との差を計算する。他の型は、画像から抽出した特徴を分布とみなし、回収した画像と対象画像との分布差を比較する。そこで本研究では,uformer,restormer,nafnetなど,最も競争の激しいモデルに対する定量的スコア(psnr)を犠牲にすることなく,知覚品質向上におけるトランスフォーマ特性の有効性を実証する。 Image deblurring continues to achieve impressive performance with the development of generative models. Nonetheless, there still remains a displeasing problem if one wants to improve perceptual quality and quantitative scores of recovered image at the same time. In this study, drawing inspiration from the research of transformer properties, we introduce the pretrained transformers to address this problem. In particular, we leverage deep features extracted from a pretrained vision transformer (ViT) to encourage recovered images to be sharp without sacrificing the performance measured by the quantitative metrics. The pretrained transformer can capture the global topological relations (i.e., self-similarity) of image, and we observe that the captured topological relations about the sharp image will change when blur occurs. By comparing the transformer features between recovered image and target one, the pretrained transformer provides high-resolution blur-sensitive semantic information, which is critical in measuring the sharpness of the deblurred image. On the basis of the advantages, we present two types of novel perceptual losses to guide image deblurring. 
One regards the features as vectors and computes the discrepancy between representations extracted from recovered image and target one in Euclidean space. The other type considers the features extracted from an image as a distribution and compares the distribution discrepancy between recovered image and target one. We demonstrate the effectiveness of transformer properties in improving the perceptual quality while not sacrificing the quantitative scores (PSNR) over the most competitive models, such as Uformer, Restormer, and NAFNet, on defocus deblurring and motion deblurring tasks.	翻訳日:2023-03-28 15:12:54 公開日:2023-03-24
# プロンプトチューニングに基づく視覚言語モデル適応用アダプタ Prompt Tuning based Adapter for Vision-Language Model Adaption ( http://arxiv.org/abs/2303.15234v1 ) ライセンス: Link先を確認	Jingchen Sun, Jiayu Qin, Zihao Lin, Changyou Chen	(参考訳) 大規模な事前学習型視覚言語(VL)モデルは、様々な下流タスクに適応する上で大きな可能性を示している。しかし,モデルパラメータの多さからネットワーク全体の微調整は困難である。この問題に対処するため,プロンプトチューニングなどの効率的な適応手法が提案されている。我々は,マルチタスク事前学習した初期化によるプロンプトチューニングのアイデアを探求し,モデル性能を著しく向上できることを示す。そこで本研究では,事前学習されたプロンプトチューニングと効率的な適応ネットワークを組み合わせた新しいモデルであるprompt-adapterを提案する。特に1ショット、2ショット、4ショット、8ショット画像といった限られたデータインスタンスの設定では、我々のアプローチは、パブリックな11データセットで数ショットのイメージ分類で最先端の手法を破りました。提案手法は,高速な視覚言語モデル適応のために,プロンプトチューニングとパラメータ効率のよいネットワークを組み合わせることを実証する。コードは、https://github.com/Jingchensun/prompt_adapterで公開されている。 Large pre-trained vision-language (VL) models have shown significant promise in adapting to various downstream tasks. However, fine-tuning the entire network is challenging due to the massive number of model parameters. To address this issue, efficient adaptation methods such as prompt tuning have been proposed. We explore the idea of prompt tuning with multi-task pre-trained initialization and find it can significantly improve model performance. Based on our findings, we introduce a new model, termed Prompt-Adapter, that combines pre-trained prompt tuning with an efficient adaptation network. Our approach beats the state-of-the-art methods in few-shot image classification on 11 public datasets, especially in settings with limited data instances such as 1-shot, 2-shot, 4-shot, and 8-shot images. Our proposed method demonstrates the promise of combining prompt tuning and parameter-efficient networks for efficient vision-language model adaptation. The code is publicly available at: https://github.com/Jingchensun/prompt_adapter.	翻訳日:2023-03-28 14:54:15 公開日:2023-03-24
# Pitchclass2vec: コード埋め込みによるシンボリック音楽構造セグメンテーション Pitchclass2vec: Symbolic Music Structure Segmentation with Chord Embeddings ( http://arxiv.org/abs/2303.15306v1 ) ライセンス: Link先を確認	Nicolas Lazzari, Andrea Poltronieri, Valentina Presutti	(参考訳) 構造知覚は人間の音楽認知の基本的な側面である。歴史的に、音楽の階層構造は、意味を伝達し、期待を作り、リスナーの感情を喚起するための物語装置として機能した。これにより、作曲者が自分の考えを整理する音楽的談話を形成するため、音楽構造は作曲において重要な役割を担っている。本稿では,自然言語処理技術とカスタムメイド符号化技術の両方を用いて,連続ベクトル表現に埋め込まれた記号コードアノテーションに基づく新しい楽曲セグメンテーション手法であるpitchclass2vecを提案する。提案アルゴリズムは,Long Short-Term Memory(LSTM)ニューラルネットワークをベースとして,現場における記号コードアノテーションに基づく最先端技術より優れている。 Structure perception is a fundamental aspect of music cognition in humans. Historically, the hierarchical organization of music into structures served as a narrative device for conveying meaning, creating expectancy, and evoking emotions in the listener. Thereby, musical structures play an essential role in music composition, as they shape the musical discourse through which the composer organises his ideas. In this paper, we present a novel music segmentation method, pitchclass2vec, based on symbolic chord annotations, which are embedded into continuous vector representations using both natural language processing techniques and custom-made encodings. Our algorithm is based on a long short-term memory (LSTM) neural network and outperforms the state-of-the-art techniques based on symbolic chord annotations in the field.	翻訳日:2023-03-28 14:35:41 公開日:2023-03-24
# PeakNet: U-Netを用いたX線結晶学実験におけるブラッグピーク発見 PeakNet: Bragg peak finding in X-ray crystallography experiments with U-Net ( http://arxiv.org/abs/2303.15301v1 ) ライセンス: Link先を確認	Cong Wang, Po-Nan Li, Jana Thayer and Chun Hong Yoon	(参考訳) X線自由電子レーザー(XFEL)のシリアル結晶学は、近年、高いデータ速度を達成するために著しく進歩している。この開発は、対数的時間スケールでの分子イベントのイメージングなど、新しい科学的研究を可能にする可能性があるが、ディスク上の科学に関連する特徴や画像だけを保存するために、ある程度のデータ削減を伴うリアルタイムデータ分析に関する課題も生んでいる。データ削減が効果的でない場合、施設の予算要件が大幅に増加するか、あるいはデータ分析を不安定にする超高繰り返しイメージング技術の利用を妨げる可能性がある。さらに、リアルタイムデータ分析からユーザーへリアルタイムフィードバックを提供するという課題もある。連続結晶学の文脈では、リアルタイムデータ解析における初期および臨界ステップは、回折画像からx線ブラッグピークを見つけることである。この課題に対処するために、ニューラルネットワークを活用し、Psocakeのピークファインダの約4倍の速度で実行されるBraggのピークファインダであるPeakNetを紹介します。従来のU-Netアーキテクチャとして実装されたセマンティックセグメンテーション問題にピーク探索のタスクを定式化した。 PeakNetの重要な利点は、データボリュームに関して線形にスケールできることであり、リアルタイムの連続結晶データ解析に高いデータレートで適している。 Serial crystallography at X-ray free electron laser (XFEL) sources has experienced tremendous progress in achieving high data rate in recent times. While this development offers potential to enable novel scientific investigations, such as imaging molecular events at logarithmic timescales, it also poses challenges in regards to real-time data analysis, which involves some degree of data reduction to only save those features or images pertaining to the science on disks. If data reduction is not effective, it could directly result in a substantial increase in facility budgetary requirements, or even hinder the utilization of ultra-high repetition imaging techniques making data analysis unwieldy. Furthermore, an additional challenge involves providing real-time feedback to users derived from real-time data analysis. In the context of serial crystallography, the initial and critical step in real-time data analysis is finding X-ray Bragg peaks from diffraction images. 
To tackle this challenge, we present PeakNet, a Bragg peak finder that utilizes neural networks and runs about four times faster than the Psocake peak finder, while delivering significantly better indexing rates and a comparable number of indexed events. We formulate the task of peak finding as a semantic segmentation problem, which is implemented with a classical U-Net architecture. A key advantage of PeakNet is its ability to scale linearly with respect to data volume, making it well-suited for real-time serial crystallography data analysis at high data rates.	翻訳日:2023-03-28 14:35:27 公開日:2023-03-24
# ビジュアルプロンプティングの理解と改善 - ラベルマッピングの視点から Understanding and Improving Visual Prompting: A Label-Mapping Perspective ( http://arxiv.org/abs/2211.11635v5 ) ライセンス: Link先を確認	Aochuan Chen, Yuguang Yao, Pin-Yu Chen, Yihua Zhang, Sijia Liu	(参考訳) 我々は視覚タスクの入力プロンプト技術である視覚プロンプト(VP)を再検討し前進する。 VPは、(入力摂動パターンの観点で)普遍的なプロンプトを下流のデータポイントに組み込むことで、固定されたトレーニング済みのソースモデルをプログラムして、ターゲットドメインの下流タスクを達成できる。しかし、なぜVPが、ソースクラスとターゲットクラスの間のルールレスラベルマッピング(LM)でさえ有効であるのかは、いまだ解明されていない。 LMはVPとどのように関連していますか? そして、そのような関係を利用してターゲットタスクの精度を向上する方法。我々は、LMがVPに与える影響を考察し、LMのより良い「品質」(マッピング精度と説明による評価)がVPの有効性を一貫して改善できるという肯定的な回答を提供する。これは、LMの要素が欠落していた以前の技術とは対照的である。 LMを最適化するために、新たなVPフレームワークであるILM-VP(iterative label mapping-based visual prompting)を提案し、ソースラベルをターゲットラベルに自動的に再マップし、VPの目標タスク精度を徐々に改善する。さらに,コントラッシブ言語画像事前訓練(CLIP)モデルを用いて,CLIPのテキスト選択を支援するためのLMプロセスの統合と,目標タスクの精度の向上を提案する。広範な実験により,提案手法が最先端vp法を大きく上回ることを示した。以下に示すように、ImageNet-pretrained ResNet-18を13のターゲットタスクに再プログラミングする場合、我々の手法はベースラインをかなり上回り、例えば、ターゲットのFlowers102とCIFAR100データセットへの変換学習の精度が7.9%と6.7%向上している。さらに、CLIPベースのVPに関する提案では、Flowers102とDTDの精度がそれぞれ13.7%と7.1%向上している。私たちのコードはhttps://github.com/OPTML-Group/ILM-VPで利用可能です。 We revisit and advance visual prompting (VP), an input prompting technique for vision tasks. VP can reprogram a fixed, pre-trained source model to accomplish downstream tasks in the target domain by simply incorporating universal prompts (in terms of input perturbation patterns) into downstream data points. Yet, it remains elusive why VP stays effective even given a ruleless label mapping (LM) between the source classes and the target classes. Inspired by the above, we ask: How is LM interrelated with VP? And how to exploit such a relationship to improve its accuracy on target tasks? We peer into the influence of LM on VP and provide an affirmative answer that a better 'quality' of LM (assessed by mapping precision and explanation) can consistently improve the effectiveness of VP. This is in contrast to the prior art where the factor of LM was missing. 
To optimize LM, we propose a new VP framework, termed ILM-VP (iterative label mapping-based visual prompting), which automatically re-maps the source labels to the target labels and progressively improves the target task accuracy of VP. Further, when using a contrastive language-image pretrained (CLIP) model, we propose to integrate an LM process to assist the text prompt selection of CLIP and to improve the target task accuracy. Extensive experiments demonstrate that our proposal significantly outperforms state-of-the-art VP methods. As highlighted below, we show that when reprogramming an ImageNet-pretrained ResNet-18 to 13 target tasks, our method outperforms baselines by a substantial margin, e.g., 7.9% and 6.7% accuracy improvements in transfer learning to the target Flowers102 and CIFAR100 datasets. Besides, our proposal on CLIP-based VP provides 13.7% and 7.1% accuracy improvements on Flowers102 and DTD respectively. Our code is available at https://github.com/OPTML-Group/ILM-VP.	翻訳日:2023-03-28 11:56:08 公開日:2023-03-24
# 破壊的ニューラルスケーリング法則 Broken Neural Scaling Laws ( http://arxiv.org/abs/2210.14891v10 ) ライセンス: Link先を確認	Ethan Caballero, Kshitij Gupta, Irina Rish, David Krueger	(参考訳) We present a smoothly broken power law functional form (referred to by us as a Broken Neural Scaling Law (BNSL)) that accurately models and extrapolates the scaling behaviors of deep neural networks (i.e. how the evaluation metric of interest varies as the amount of compute used for training, number of model parameters, training dataset size, model input size, number of training steps, or upstream performance varies) for various architectures and for each of various tasks within a large and diverse set of upstream and downstream tasks, in zero-shot, prompted, and fine-tuned settings. This set includes large-scale vision, language, audio, video, diffusion, generative modeling, multimodal learning, contrastive learning, AI alignment, robotics, out-of-distribution (OOD) generalization, continual learning, transfer learning, uncertainty estimation / calibration, out-of-distribution detection, adversarial robustness, distillation, sparsity, retrieval, quantization, pruning, molecules, computer programming/coding, math word problems, "emergent" "phase transitions / changes", arithmetic, unsupervised/self-supervised learning, and reinforcement learning (single agent and multi-agent). 神経スケーリング行動の他の機能形式と比較すると、この関数形式は、この集合においてかなり正確なスケーリング行動の外挿をもたらす。さらに、この関数形式は、二重降下のような現象のスケーリング挙動に存在する非単調遷移や、算術のようなタスクのスケーリング挙動に存在する遅延、鋭いインフレクションポイントなど、他の関数形式が表現できないスケーリング挙動を正確にモデル化し、外挿する。最後に、この関数形式を使用して、スケーリング動作の予測可能性の限界に関する洞察を得ます。コードはhttps://github.com/ethancaballero/broken_neural_scaling_lawsで入手できる。 We present a smoothly broken power law functional form (referred to by us as a Broken Neural Scaling Law (BNSL)) that accurately models and extrapolates the scaling behaviors of deep neural networks (i.e. 
how the evaluation metric of interest varies as the amount of compute used for training, number of model parameters, training dataset size, model input size, number of training steps, or upstream performance varies) for various architectures and for each of various tasks within a large and diverse set of upstream and downstream tasks, in zero-shot, prompted, and fine-tuned settings. This set includes large-scale vision, language, audio, video, diffusion, generative modeling, multimodal learning, contrastive learning, AI alignment, robotics, out-of-distribution (OOD) generalization, continual learning, transfer learning, uncertainty estimation / calibration, out-of-distribution detection, adversarial robustness, distillation, sparsity, retrieval, quantization, pruning, molecules, computer programming/coding, math word problems, "emergent" "phase transitions / changes", arithmetic, unsupervised/self-supervised learning, and reinforcement learning (single agent and multi-agent). When compared to other functional forms for neural scaling behavior, this functional form yields extrapolations of scaling behavior that are considerably more accurate on this set. Moreover, this functional form accurately models and extrapolates scaling behavior that other functional forms are incapable of expressing such as the non-monotonic transitions present in the scaling behavior of phenomena such as double descent and the delayed, sharp inflection points present in the scaling behavior of tasks such as arithmetic. Lastly, we use this functional form to glean insights about the limit of the predictability of scaling behavior. Code is available at https://github.com/ethancaballero/broken_neural_scaling_laws	翻訳日:2023-03-28 11:55:38 公開日:2023-03-24
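The abstract above does not spell out the functional form. As a hedged illustration, the sketch below evaluates a smoothly broken power law in the form we recall from the BNSL paper, y = a + b·x^(−c0) · ∏ᵢ (1 + (x/dᵢ)^(1/fᵢ))^(−cᵢ·fᵢ). The function name `bnsl` and the `breaks` argument layout are our own naming and are not taken from the authors' repository; check the parameterization against the linked code before relying on it.

```python
import numpy as np


def bnsl(x, a, b, c0, breaks=()):
    """Evaluate a smoothly broken power law (hypothetical helper).

    breaks is a sequence of (c_i, d_i, f_i) tuples: the change in log-log
    slope, the location of the break, and its sharpness (a smaller f_i
    gives a sharper break).
    """
    x = np.asarray(x, dtype=float)
    y = b * x ** (-c0)  # first power-law segment
    for c_i, d_i, f_i in breaks:
        # Each factor is ~1 for x << d_i and ~(x / d_i) ** (-c_i) for x >> d_i,
        # so the log-log slope changes by -c_i around x = d_i.
        y = y * (1.0 + (x / d_i) ** (1.0 / f_i)) ** (-c_i * f_i)
    return a + y


# Example: one break at x = 1e3 where the slope steepens from -0.2 to -0.7.
xs = np.logspace(0, 6, 7)
ys = bnsl(xs, a=0.1, b=2.0, c0=0.2, breaks=[(0.5, 1e3, 0.3)])
```

With an empty `breaks` sequence the function reduces to an ordinary power law with offset `a`, which is a convenient sanity check when fitting.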
# マルチアーマッド帯域における局所クラスタリング Local Clustering in Contextual Multi-Armed Bandits ( http://arxiv.org/abs/2103.00063v3 ) ライセンス: Link先を確認	Yikun Ban, Jingrui He	(参考訳) 本研究では,コンテキスト型マルチアームバンディット(MAB)におけるユーザクラスタの識別について検討する。コンテキストMABは、コンテンツレコメンデーションやオンライン広告など、多くの実アプリケーションにとって効果的なツールである。実際には、ユーザ依存はユーザのアクション、つまり報酬において重要な役割を果たす。類似したユーザーをクラスタリングすることで報酬の質が向上し、結果としてより効果的なコンテンツレコメンデーションとターゲット広告につながる。従来のクラスタリング設定とは異なり、未知のbanditパラメータに基づいてユーザをクラスタ化します。特に、コンテキストMABにおけるクラスタ検出の問題を定義し、局所クラスタリング手法を組み込んだ帯域幅アルゴリズム、LOCBを提案する。また,クラスタリングの正しさと効率,およびその後悔境界の観点から,LOCBに関する理論的解析を行った。最後に,提案アルゴリズムを,最先端のベースラインを上回る様々な側面から評価する。 We study identifying user clusters in contextual multi-armed bandits (MAB). Contextual MAB is an effective tool for many real applications, such as content recommendation and online advertisement. In practice, user dependency plays an essential role in the user's actions, and thus the rewards. Clustering similar users can improve the quality of reward estimation, which in turn leads to more effective content recommendation and targeted advertising. Different from traditional clustering settings, we cluster users based on the unknown bandit parameters, which will be estimated incrementally. In particular, we define the problem of cluster detection in contextual MAB, and propose a bandit algorithm, LOCB, embedded with local clustering procedure. And, we provide theoretical analysis about LOCB in terms of the correctness and efficiency of clustering and its regret bound. Finally, we evaluate the proposed algorithm from various aspects, which outperforms state-of-the-art baselines.	翻訳日:2023-03-27 19:16:54 公開日:2023-03-24
# バイナリラテントを用いた変分オートエンコーダの直接進化最適化 Direct Evolutionary Optimization of Variational Autoencoders With Binary Latents ( http://arxiv.org/abs/2011.13704v2 ) ライセンス: Link先を確認	Enrico Guiraud, Jakob Drefs, Jörg Lücke	(参考訳) 離散潜在変数は実世界のデータにとって重要であると考えられており、離散潜在変数を持つ変分オートエンコーダ(VAE)の研究の動機となっている。しかし、この場合、標準的なVAEトレーニングは不可能であり、従来のような個別のVAEを訓練するために、個別の分散を操作するための異なる戦略を動機付けている。ここでは、符号化モデルに直接離散最適化を適用することにより、潜伏者の離散性を完全に維持できるかどうかを問う。この手法は, サイドステッピングサンプリング近似, 再パラメータ化トリック, 償却により, 標準的なVAEトレーニングから強く逸脱している。離散最適化は、進化的アルゴリズムと連動して、切断後段を用いた変分設定で実現される。バイナリラテントを持つVAEに対して、(A)ネットワーク重みに対する勾配上昇にそのような離散的変動法がどのように結びついているか、および(B)デコーダがトレーニングのために遅延状態を選択する方法を示す。従来の償却トレーニングはより効率的で、大きなニューラルネットワークに適用できる。しかし、より小さなネットワークを用いることで、数百の潜伏者に対して効率よく分散最適化を行うことができる。さらに重要なのは,直接最適化の有効性が,‘ゼロショット’学習において極めて競争力が高いことだ。大規模な教師付きネットワークとは対照的に、本稿で調査したVAEは、クリーンなデータや大きな画像データセットのトレーニングの事前のトレーニングなしに、1つのイメージをデノーズする。より一般に,vaeの訓練はサンプリングに基づく近似と再パラメータ化を伴わずに可能であり,一般にvae訓練の解析には興味深いものと考えられる。ゼロショット' 設定では、直接最適化され、さらに、VAE は非生成的アプローチによって以前より優れていた。 Discrete latent variables are considered important for real world data, which has motivated research on Variational Autoencoders (VAEs) with discrete latents. However, standard VAE training is not possible in this case, which has motivated different strategies to manipulate discrete distributions in order to train discrete VAEs similarly to conventional ones. Here we ask if it is also possible to keep the discrete nature of the latents fully intact by applying a direct discrete optimization for the encoding model. The approach consequently deviates strongly from standard VAE training by sidestepping sampling approximation, the reparameterization trick, and amortization. Discrete optimization is realized in a variational setting using truncated posteriors in conjunction with evolutionary algorithms. For VAEs with binary latents, we (A) show how such a discrete variational method ties into gradient ascent for network weights, and (B) how the decoder is used to select latent states for training. 
Conventional amortized training is more efficient and applicable to large neural networks. However, using smaller networks, we here find direct discrete optimization to be efficiently scalable to hundreds of latents. More importantly, we find the effectiveness of direct optimization to be highly competitive in 'zero-shot' learning. In contrast to large supervised networks, the VAEs investigated here can, e.g., denoise a single image without previous training on clean data and/or training on large image datasets. More generally, the studied approach shows that training of VAEs is indeed possible without sampling-based approximation and reparameterization, which may be interesting for the analysis of VAE training in general. For 'zero-shot' settings, direct optimization furthermore makes VAEs competitive where they have previously been outperformed by non-generative approaches.	翻訳日:2023-03-27 19:16:40 公開日:2023-03-24
# クロスU統計を用いた次元非依存推論 Dimension-agnostic inference using cross U-statistics ( http://arxiv.org/abs/2011.05068v6 ) ライセンス: Link先を確認	Ilmun Kim, Aaditya Ramdas	(参考訳) 統計的推論に対する古典的な漸近理論は、通常、次元$d$を固定し、サンプルサイズ$n$を無限大に増やすことで統計学を校正する。最近、これらのメソッドが高次元設定でどのように振る舞うかを理解するために多くの努力が払われており、$d$と$n$は共に無限大へと増加する。これはしばしば、次元に関する仮定によって異なる推論手順をもたらし、実践者はバインドに残される: 20次元に100のサンプルを持つデータセットが与えられたら、$n \gg d$、または$d/n \approx 0.2$を仮定してキャリブレーションすべきだろうか? 本論文は次元非依存推論の目的を考察し,$d$ と $n$ の仮定に依存しない手法の開発について述べる。サンプル分割と自己正規化とともに既存のテスト統計の変動表現を用いて、$d$が$n$でスケールするかどうかに関わらず、ガウス極限分布を持つ洗練されたテスト統計値を生成するアプローチを導入する。結果の統計学は、縮退したU統計を慎重に修正し、対角ブロックを落とし、対角ブロックを外したままにすると見なすことができる。我々は,一サンプル平均値と共分散テストを含む古典的な問題に対して,本手法を例示し,本試験が局所的代替品に対して最小速度最適化力を有することを示す。ほとんどの設定では、我々の交差U統計は対応する(退化)U統計の高次元のパワーと$\sqrt{2}$因子と一致する。 Classical asymptotic theory for statistical inference usually involves calibrating a statistic by fixing the dimension $d$ while letting the sample size $n$ increase to infinity. Recently, much effort has been dedicated towards understanding how these methods behave in high-dimensional settings, where $d$ and $n$ both increase to infinity together. This often leads to different inference procedures, depending on the assumptions about the dimensionality, leaving the practitioner in a bind: given a dataset with 100 samples in 20 dimensions, should they calibrate by assuming $n \gg d$, or $d/n \approx 0.2$? This paper considers the goal of dimension-agnostic inference; developing methods whose validity does not depend on any assumption on $d$ versus $n$. We introduce an approach that uses variational representations of existing test statistics along with sample splitting and self-normalization to produce a refined test statistic with a Gaussian limiting distribution, regardless of how $d$ scales with $n$. The resulting statistic can be viewed as a careful modification of degenerate U-statistics, dropping diagonal blocks and retaining off-diagonal blocks. 
We exemplify our technique for some classical problems including one-sample mean and covariance testing, and show that our tests have minimax rate-optimal power against appropriate local alternatives. In most settings, our cross U-statistic matches the high-dimensional power of the corresponding (degenerate) U-statistic up to a $\sqrt{2}$ factor.	翻訳日:2023-03-27 19:16:07 公開日:2023-03-24
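The sample-splitting and self-normalization recipe described in the entry above can be sketched for the one-sample mean test (H0: E[X] = 0). This is a minimal illustration under assumptions, not the authors' reference implementation: the function names `cross_u_statistic` and `gaussian_sample`, the equal split into two halves, and the one-sided rejection rule are choices made here for the example.

```python
import math
import random


def cross_u_statistic(samples):
    """Studentized cross U-statistic for H0: E[X] = 0 (illustrative sketch)."""
    n = len(samples)
    half = n // 2
    first, second = samples[:half], samples[half:]
    d = len(samples[0])
    # Mean of the held-out second half.
    mean_second = [sum(x[j] for x in second) / len(second) for j in range(d)]
    # Pair each first-half point only with second-half points
    # (off-diagonal blocks kept, diagonal blocks dropped).
    f = [sum(xj * mj for xj, mj in zip(x, mean_second)) for x in first]
    f_bar = sum(f) / half
    variance = sum((v - f_bar) ** 2 for v in f) / half
    # Self-normalization: divide by the empirical standard deviation of f.
    return math.sqrt(half) * f_bar / math.sqrt(variance)


def gaussian_sample(n, d, shift, seed):
    """Hypothetical helper: n i.i.d. N(shift, 1) vectors in d dimensions."""
    rng = random.Random(seed)
    return [[rng.gauss(shift, 1.0) for _ in range(d)] for _ in range(n)]


if __name__ == "__main__":
    # 100 samples in 20 dimensions, as in the abstract's motivating example.
    print(cross_u_statistic(gaussian_sample(100, 20, 0.0, seed=1)))
    print(cross_u_statistic(gaussian_sample(100, 20, 1.0, seed=0)))
```

Because the statistic is compared against a standard normal quantile (for instance, rejecting when it exceeds 1.645 at level 0.05), the same calibration is used whether $d$ is small or comparable to $n$.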
# UCB帯域における最適対外攻撃 Near Optimal Adversarial Attack on UCB Bandits ( http://arxiv.org/abs/2008.09312v3 ) ライセンス: Link先を確認	Shiliang Zuo	(参考訳) 我々は,報酬が敵対的腐敗を受ける確率的多腕バンディット問題を考える。本稿では、UCBの原理を巧みに操り、累積コストを$\sqrt{\log T}$とすると、$T$がラウンド数であるような累積コストで、最適でないターゲットアームを$T - o(T)$回引く新たな攻撃戦略を提案する。また、累積攻撃コストに対する最初の下限も証明する。我々の下限は最大$\log \log T$ 要素の上限に一致し、攻撃が最適に近いことを示している。 We consider a stochastic multi-arm bandit problem where rewards are subject to adversarial corruption. We propose a novel attack strategy that manipulates a UCB principle into pulling some non-optimal target arm $T - o(T)$ times with a cumulative cost that scales as $\sqrt{\log T}$, where $T$ is the number of rounds. We also prove the first lower bound on the cumulative attack cost. Our lower bound matches our upper bound up to $\log \log T$ factors, showing our attack to be near optimal.	翻訳日:2023-03-27 19:15:40 公開日:2023-03-24
# フィッシャー情報を用いた量子ステアリングのウイットネス化 Witnessing quantum steering by means of the Fisher information ( http://arxiv.org/abs/2107.14730v2 ) ライセンス: Link先を確認	Ilaria Gianani, Vincenzo Berardi and Marco Barbieri	(参考訳) 量子ネットワークでは、特定の種類の量子相関を捉えることが重要となる。この課題を達成するために異なる経路をとることができ、そのような量子相関の異なる新しい側面を強調している。 Yadin, Fadel, Gessner [Nat. Commun. 12, 2410 (2021)] による最近の理論結果に従い、二成分状態のメトロロジー能力においてステアリングがどのように現れるかを実験的に示す。本研究は,本手法の有効性を確認し,既存の代替案と比較した。 Capturing specific kinds of quantum correlation is of paramount importance for quantum networking. Different routes can be taken to achieve this task, highlighting different novel aspects of such quantum correlations. Following the recent theoretical results by Yadin, Fadel and Gessner [Nat. Commun. 12, 2410 (2021)], we demonstrate experimentally how steering manifests in the metrological abilities of a bipartite state. Our results confirm the relevance of this novel approach, and compare the outcome with already employed alternatives.	翻訳日:2023-03-27 19:10:26 公開日:2023-03-24
# 歩行認識のためのシルエットと骨格データの組み合わせ Combining the Silhouette and Skeleton Data for Gait Recognition ( http://arxiv.org/abs/2202.10645v3 ) ライセンス: Link先を確認	Likai Wang, Ruize Han, Wei Feng	(参考訳) 長距離バイオメトリック技術である歩行認識は近年、強い関心を集めている。現在、主要な2つの歩行認識作業は外観ベースとモデルベースであり、それぞれシルエットと骨格から特徴を抽出する。しかし, 着替えや搬送条件では外観ベースが大きな影響を受け, モデルベースではポーズ推定の精度が制限される。そこで,本研究では,シルエットを入力とするcnn系分枝と,スケルトンを入力とするgcn系分枝を含む,簡便かつ効果的な二分枝ネットワークを提案する。さらに,GCN系分岐における歩行表現の改善のために,マルチスケールグラフ畳み込みを統合する完全連結グラフ畳み込み演算子を提案し,自然関節接続への依存を軽減する。また,stc-attと呼ばれる多次元アテンションモジュールを配置し,空間的,時間的,チャネル的アテンションを同時に学習する。 CASIA-BとOUMVLPの実験結果から, 各種条件下での最先端性能が得られた。 Gait recognition, a long-distance biometric technology, has aroused intense interest recently. Currently, the two dominant gait recognition works are appearance-based and model-based, which extract features from silhouettes and skeletons, respectively. However, appearance-based methods are greatly affected by clothes-changing and carrying conditions, while model-based methods are limited by the accuracy of pose estimation. To tackle this challenge, a simple yet effective two-branch network is proposed in this paper, which contains a CNN-based branch taking silhouettes as input and a GCN-based branch taking skeletons as input. In addition, for better gait representation in the GCN-based branch, we present a fully connected graph convolution operator to integrate multi-scale graph convolutions and alleviate the dependence on natural joint connections. Also, we deploy a multi-dimension attention module named STC-Att to learn spatial, temporal and channel-wise attention simultaneously. The experimental results on CASIA-B and OUMVLP show that our method achieves state-of-the-art performance in various conditions.	翻訳日:2023-03-27 19:02:28 公開日:2023-03-24
# ProxSkip: はい。ローカルなグラディエントステップはおそらく通信加速につながる! ついに! ProxSkip: Yes! Local Gradient Steps Provably Lead to Communication Acceleration! Finally! ( http://arxiv.org/abs/2202.09357v2 ) ライセンス: Link先を確認	Konstantin Mishchenko, Grigory Malinovsky, Sebastian Stich and Peter Richtárik	(参考訳) ProxSkipは、スムーズな($f$)関数と高価な非滑らかな($\psi$)関数の和を最小化する驚くほどシンプルで、証明可能な効率のよい方法です。このような問題を解決するための標準的なアプローチは、各反復において$f$の勾配と$\psi$のプロキシ演算子の評価に基づいて、近勾配降下法(ProxGD)アルゴリズムである。本研究で特に注目しているのは,proxの評価が勾配の評価に比較して費用がかかるようなシステムであり,多くの応用例においてそうである。 ProxSkipは高価なprox演算子をほとんどのイテレーションでスキップできる: イテレーションの複雑さは$\mathcal{O}\left(\kappa \log \frac{1}{\varepsilon}\right)$であり、ここで$\kappa$は$f$の条件番号であるが、proxの評価の数は$\mathcal{O}\left(\sqrt{\kappa} \log \frac{1}{\varepsilon}\right)$のみである。我々の主な動機は、勾配演算子の評価がすべてのデバイスで独立に局所的なGDステップをとることに対応し、プロキシの評価は勾配平均化の形式での(拡張的な)コミュニケーションに対応することにある。この文脈では、ProxSkipは通信複雑性の効果的な加速を提供する。 FedAvg, SCAFFOLD, S-Local-GD, FedLinなどの他の局所勾配型手法とは異なり,不均質なデータレジームにおけるバニラGDのそれよりも理論的な通信複雑性が悪く,あるいは最善の一致があるため,不均質性境界を仮定せずに証明可能で大きな改善が得られる。 We introduce ProxSkip -- a surprisingly simple and provably efficient method for minimizing the sum of a smooth ($f$) and an expensive nonsmooth proximable ($\psi$) function. The canonical approach to solving such problems is via the proximal gradient descent (ProxGD) algorithm, which is based on the evaluation of the gradient of $f$ and the prox operator of $\psi$ in each iteration. In this work we are specifically interested in the regime in which the evaluation of prox is costly relative to the evaluation of the gradient, which is the case in many applications. ProxSkip allows for the expensive prox operator to be skipped in most iterations: while its iteration complexity is $\mathcal{O}\left(\kappa \log \frac{1}{\varepsilon}\right)$, where $\kappa$ is the condition number of $f$, the number of prox evaluations is $\mathcal{O}\left(\sqrt{\kappa} \log \frac{1}{\varepsilon}\right)$ only. 
Our main motivation comes from federated learning, where evaluation of the gradient operator corresponds to taking a local GD step independently on all devices, and evaluation of prox corresponds to (expensive) communication in the form of gradient averaging. In this context, ProxSkip offers an effective acceleration of communication complexity. Unlike other local gradient-type methods, such as FedAvg, SCAFFOLD, S-Local-GD and FedLin, whose theoretical communication complexity is worse than, or at best matching, that of vanilla GD in the heterogeneous data regime, we obtain a provable and large improvement without any heterogeneity-bounding assumptions.	翻訳日:2023-03-27 19:02:08 公開日:2023-03-24
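The skipping idea in the ProxSkip entry above can be sketched in a few lines. The update rule below (a control variate `h`, a prox step taken with probability `p`, and the choices `gamma = 1/L`, `p = 1/sqrt(kappa)`) is written from memory of the paper's algorithm and should be treated as an assumption rather than the authors' code; the test problem, the helper names, and the use of an L1 penalty as the "expensive" term are purely illustrative.

```python
import random


def soft_threshold(v, tau):
    """Prox of tau * ||.||_1, applied coordinate-wise."""
    return [max(abs(x) - tau, 0.0) * (1.0 if x > 0 else -1.0) for x in v]


def proxskip(grad_f, prox_psi, x0, gamma, p, iters, seed=0):
    """Sketch of ProxSkip: gradient steps every iteration, prox only w.p. p."""
    rng = random.Random(seed)
    x = list(x0)
    h = [0.0] * len(x)  # control variate that compensates for skipped prox steps
    prox_calls = 0
    for _ in range(iters):
        g = grad_f(x)
        x_hat = [xi - gamma * (gi - hi) for xi, gi, hi in zip(x, g, h)]
        if rng.random() < p:
            prox_calls += 1
            shifted = [xh - (gamma / p) * hi for xh, hi in zip(x_hat, h)]
            x = prox_psi(shifted, gamma / p)
        else:
            x = x_hat  # prox skipped
        h = [hi + (p / gamma) * (xi - xh) for hi, xi, xh in zip(h, x, x_hat)]
    return x, prox_calls


def demo(iters=3000):
    """Toy problem: f(x) = sum(a_i x_i^2 / 2 - b_i x_i), psi(x) = lam * ||x||_1."""
    a, b, lam = [1.0, 2.0, 4.0], [3.0, -1.0, 0.2], 0.5
    grad_f = lambda x: [ai * xi - bi for ai, xi, bi in zip(a, x, b)]
    prox_psi = lambda v, step: soft_threshold(v, lam * step)
    kappa = max(a) / min(a)
    x, calls = proxskip(grad_f, prox_psi, [0.0] * 3,
                        gamma=1.0 / max(a), p=kappa ** -0.5, iters=iters)
    # Closed-form minimizer for this separable problem.
    target = [t / ai for t, ai in zip(soft_threshold(b, lam), a)]
    return x, target, calls


if __name__ == "__main__":
    x, target, calls = demo()
    print(x, target, calls)
```

In the federated reading given in the abstract, each prox call would stand for a communication round, so `prox_calls` is the quantity that the $\sqrt{\kappa}$ bound refers to.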
# 問合せによるブラックボックス深層学習モデルに対するスパース攻撃 Query Efficient Decision Based Sparse Attacks Against Black-Box Deep Learning Models ( http://arxiv.org/abs/2202.00091v2 ) ライセンス: Link先を確認	Viet Quoc Vo, Ehsan Abbasnejad, Damith C. Ranasinghe	(参考訳) 最善の努力にもかかわらず、ディープラーニングモデルは入力に適用される小さな逆さまの摂動にも非常に弱いままです。機械学習モデルの出力のみから情報を抽出し、ブラックボックスモデルに敵対的な摂動を発生させる能力は、自律車や機械学習モデルがサービスとして公開するMLaaSのような現実のシステムに対する現実的な脅威である。特に興味深いのは、スパース攻撃である。ブラックボックスモデルにおけるスパース攻撃の実現は、機械学習モデルが私たちが信じているよりも脆弱であることを示している。これらの攻撃は、$l_0$ノルムで測定された摂動画素の数を最小限に抑え、決定(予測ラベル)のみをモデルクエリに返却することで、モデルを誤解させる。しかし、このような攻撃はNPハード最適化の問題につながる。本研究では,畳み込み型ディープニューラルネットワークと視覚トランスフォーマの両方に対して,問題に対する進化に基づくアルゴリズムSparseEvoを開発した。特に、視覚変換器は、決定に基づく攻撃条件下ではまだ調査されていない。 SparseEvoは、未ターゲットとターゲットの両方の攻撃に対して、最先端のスパース攻撃よりもはるかに少ないモデルクエリを必要とする。攻撃アルゴリズムは概念的には単純ではあるが、ImageNetのような標準的なコンピュータビジョンタスクにおける最先端の勾配ベースのホワイトボックス攻撃に対して、限られたクエリ予算で競合する。重要なことは、クエリ効率のよいSparseEvoと、一般的には意思決定ベースの攻撃は、デプロイされたシステムの安全性に関する新たな疑問を提起し、機械学習モデルの堅牢性を研究し、理解するための新たな方向性を示す。 Despite our best efforts, deep learning models remain highly vulnerable to even tiny adversarial perturbations applied to the inputs. The ability to extract information from solely the output of a machine learning model to craft adversarial perturbations to black-box models is a practical threat against real-world systems, such as autonomous cars or machine learning models exposed as a service (MLaaS). Of particular interest are sparse attacks. The realization of sparse attacks in black-box models demonstrates that machine learning models are more vulnerable than we believe. This is because these attacks aim to minimize the number of perturbed pixels, measured by the $l_0$ norm, required to mislead a model by solely observing the decision (the predicted label) returned to a model query: the so-called decision-based attack setting. However, such an attack leads to an NP-hard optimization problem. We develop an evolution-based algorithm, SparseEvo, for the problem and evaluate it against both convolutional deep neural networks and vision transformers. 
Notably, vision transformers are yet to be investigated under a decision-based attack setting. SparseEvo requires significantly fewer model queries than the state-of-the-art sparse attack Pointwise for both untargeted and targeted attacks. The attack algorithm, although conceptually simple, is also competitive with only a limited query budget against the state-of-the-art gradient-based whitebox attacks in standard computer vision tasks such as ImageNet. Importantly, the query-efficient SparseEvo, along with decision-based attacks in general, raises new questions regarding the safety of deployed systems and opens new directions for studying and understanding the robustness of machine learning models.	翻訳日:2023-03-27 19:01:30 公開日:2023-03-24
# 分数最小化のための座標降下法 Coordinate Descent Methods for Fractional Minimization ( http://arxiv.org/abs/2201.12691v3 ) ライセンス: Link先を確認	Ganzhao Yuan	(参考訳) 目的の数値部が微分可能凸関数と凸非滑らか関数の和であり、分母部が凸あるいは凹関数であるような構成された分数最小化問題のクラスを考える。非凸であるため、この問題は解決が難しい。問題の構造を利用して,この問題を解決するための2つのコーディネートDescent法を提案する。提案手法は1次元のsubproblem \textit{globally} を反復的に解き、座標の定常点に収束することが保証される。凸分母の場合、弱 \textit{locally bounded non-convexity condition} の下では、座標次定常点の最適性が標準臨界点と方向点の最適点よりも強いことが証明される。追加の適切な条件下では、cd法は座標的に静止点に q-線型収束する。凹分母の場合、任意の臨界点が大域的最小値であり、cd法は大域的最小値に劣線形収束率で収束することを示す。提案手法をいくつかの機械学習および信号処理モデルに適用する可能性を示す。実世界のデータを用いた実験により,提案手法は精度において既存手法よりも著しく優れていた。 We consider a class of structured fractional minimization problems, in which the numerator part of the objective is the sum of a differentiable convex function and a convex non-smooth function, while the denominator part is a convex or concave function. This problem is difficult to solve since it is non-convex. By exploiting the structure of the problem, we propose two Coordinate Descent (CD) methods for solving this problem. The proposed methods iteratively solve a one-dimensional subproblem \textit{globally}, and they are guaranteed to converge to coordinate-wise stationary points. In the case of a convex denominator, under a weak \textit{locally bounded non-convexity condition}, we prove that the optimality of coordinate-wise stationary point is stronger than that of the standard critical point and directional point. Under additional suitable conditions, CD methods converge Q-linearly to coordinate-wise stationary points. In the case of a concave denominator, we show that any critical point is a global minimum, and CD methods converge to the global minimum with a sublinear convergence rate. We demonstrate the applicability of the proposed methods to some machine learning and signal processing models. Our experiments on real-world data have shown that our method significantly and consistently outperforms existing methods in terms of accuracy.	
翻訳日:2023-03-27 19:01:06 公開日:2023-03-24
# RBMLE-UCBによる線形二次系の適応制御 Augmented RBMLE-UCB Approach for Adaptive Control of Linear Quadratic Systems ( http://arxiv.org/abs/2201.10542v2 ) ライセンス: Link先を確認	Akshay Mete, Rahul Singh and P. R. Kumar	(参考訳) 適応LQ制御問題(Adaptive LQ control problem)と呼ばれる2次コストで未知の確率線形系を制御する問題を考える。我々は40年以上前に提案された「RBMLE(Reward Biased Maximum Likelihood Estimate)」という手法を再検討し、盗賊問題に対する「Regret」の定義とともに「Upper Confidence Bound(UCB)」法を先行した。単にパラメータ推定の基準により大きな報酬を持つパラメータを好む項を追加しただけである。本研究では,RBMLE法とUCB法を両立させる方法を示し,RBMLE法のペナルティとUCB法の制約を組み合わせた拡張RBMLE-UCBアルゴリズムを提案する。理論的には、この手法はこれまでの最もよく知られている$\tilde{\mathcal{O}}(\sqrt{T})$ regretを保っている。さらに,提案する拡張型RBMLE-UCBと標準のRBMLEを,UCB,トンプソンサンプリング,入力摂動,ランダム化された確実性等価性,StabLで比較し,ボーイング747の飛行制御や無人航空機などの実世界の実例と比較した。拡張 RBMLE は UCB, Thompson Sampling および StabL を大差で一貫した性能を保ちながら, 入力摂動よりも極端に優れ, ランダム化された不確実性等価性よりも適度に優れていることを示した。 We consider the problem of controlling an unknown stochastic linear system with quadratic costs - called the adaptive LQ control problem. We re-examine an approach called "Reward Biased Maximum Likelihood Estimate" (RBMLE) that was proposed more than forty years ago, and which predates the "Upper Confidence Bound" (UCB) method as well as the definition of "regret" for bandit problems. It simply added a term favoring parameters with larger rewards to the criterion for parameter estimation. We show how the RBMLE and UCB methods can be reconciled, and thereby propose an Augmented RBMLE-UCB algorithm that combines the penalty of the RBMLE method with the constraints of the UCB method, uniting the two approaches to optimism in the face of uncertainty. We establish that theoretically, this method retains $\tilde{\mathcal{O}}(\sqrt{T})$ regret, the best-known so far. 
We further compare the empirical performance of the proposed Augmented RBMLE-UCB and the standard RBMLE (without the augmentation) with UCB, Thompson Sampling, Input Perturbation, Randomized Certainty Equivalence and StabL on many real-world examples including flight control of Boeing 747 and Unmanned Aerial Vehicle. We perform extensive simulation studies showing that the Augmented RBMLE consistently outperforms UCB, Thompson Sampling and StabL by a huge margin, while it is marginally better than Input Perturbation and moderately better than Randomized Certainty Equivalence.	翻訳日:2023-03-27 19:00:44 公開日:2023-03-24
# 分解型量子グラフニューラルネットワーク Decompositional Quantum Graph Neural Network ( http://arxiv.org/abs/2201.05158v2 ) ライセンス: Link先を確認	Xing Ai, Zhihong Zhang, Luzhe Sun, Junchi Yan, Edwin Hancock	(参考訳) 量子機械学習(quantum machine learning)は、量子アルゴリズムと量子コンピューティングを用いた機械学習に取り組むことを目的とした、急速に進化する分野である。物理量子ビットの欠如とユークリッド空間からヒルベルト空間に実世界のデータをマッピングする効果的な手段のため、これらの手法のほとんどは量子類似性やプロセスシミュレーションに焦点をあてる。本稿では,ego-graphベースの量子グラフニューラルネットワーク (egoQGNN) と呼ぶ,グラフ構造データのためのハイブリッド量子古典アルゴリズムを提案する。 egoQGNNはテンソル積とユニティ行列表現を用いてGNN理論フレームワークを実装し、必要なモデルパラメータの数を大幅に削減する。古典的コンピュータによって制御される場合、egoQGNNは、適度な大きさの量子デバイスを用いて入力グラフからエゴグラフを処理することにより、任意の大きさのグラフを調整できる。このアーキテクチャは、現実世界のデータからヒルベルト空間への新しいマッピングに基づいている。このマッピングは、データに存在する距離関係を維持し、情報損失を低減する。実験の結果,提案手法はこれらのモデルと比較して1.68%のパラメータしか持たない競争状態モデルよりも優れていた。 Quantum machine learning is a fast-emerging field that aims to tackle machine learning using quantum algorithms and quantum computing. Due to the lack of physical qubits and an effective means to map real-world data from Euclidean space to Hilbert space, most of these methods focus on quantum analogies or process simulations rather than devising concrete architectures based on qubits. In this paper, we propose a novel hybrid quantum-classical algorithm for graph-structured data, which we refer to as the Ego-graph based Quantum Graph Neural Network (egoQGNN). egoQGNN implements the GNN theoretical framework using the tensor product and unity matrix representation, which greatly reduces the number of model parameters required. When controlled by a classical computer, egoQGNN can accommodate arbitrarily sized graphs by processing ego-graphs from the input graph using a modestly-sized quantum device. The architecture is based on a novel mapping from real-world data to Hilbert space. This mapping maintains the distance relations present in the data and reduces information loss. Experimental results show that the proposed method outperforms competitive state-of-the-art models while using only 1.68% of the parameters of those models.	
翻訳日:2023-03-27 19:00:10 公開日:2023-03-24
# RamBoAttack: 効率的なディープニューラルネットワーク決定エクスプロイトのロバストクエリ RamBoAttack: A Robust Query Efficient Deep Neural Network Decision Exploit ( http://arxiv.org/abs/2112.05282v3 ) ライセンス: Link先を確認	Viet Quoc Vo and Ehsan Abbasnejad and Damith C. Ranasinghe	(参考訳) 機械学習モデルは、敵の例からの回避攻撃に極めて敏感である。一般的に、元の入力と知覚的に類似した修正された入力は、モデルに完全にアクセス可能な敵によってホワイトボックス設定で構築される。しかし、最近の攻撃では、ブラックボックス攻撃を使って敵の例を作るためにクエリ数が著しく減少している。特にアラームは、google、microsoft、ibmを含む多くの機械学習サービスプロバイダによって提供されるトレーニングされたモデルのアクセスインターフェースから、これらのモデルを組み込んだ多数のアプリケーションによって使用される分類決定を活用できる能力である。モデルから予測されたラベルのみを利用して敵の例を作る能力は、決定に基づく攻撃として区別される。本研究では,iclrとspにおける最近の最先端意思決定に基づく攻撃を深く掘り下げ,勾配推定手法を用いた低歪み逆検出の費用対効果を強調する。我々は,局所的な最小値の侵入を回避し,勾配推定法で見られる雑音勾配からの誤方向を回避できる,堅牢なクエリ効率的な攻撃を開発する。提案する攻撃手法であるRamBoAttackは、ランダム化ブロック座標 Descent の概念を利用して隠れた分類器多様体を探索し、局所化入力のみを演算して勾配推定法の問題に対処する摂動を目標とする。重要なことは、RamBoAttackは、敵とターゲットクラスに利用可能な異なるサンプル入力に対してより堅牢である。全体として、特定のターゲットクラスに対して、RamBoAttackは、所定のクエリ予算内で低い歪みを達成するために、より堅牢であることが示されている。大規模な高解像度imagenetデータセットを使用して広範な結果をキュレーションし、攻撃、テストサンプル、アーティファクトをgithubでオープンソースにしました。 Machine learning models are critically susceptible to evasion attacks from adversarial examples. Generally, adversarial examples, modified inputs deceptively similar to the original input, are constructed under whitebox settings by adversaries with full access to the model. However, recent attacks have shown a remarkable reduction in query numbers to craft adversarial examples using blackbox attacks. Particularly, alarming is the ability to exploit the classification decision from the access interface of a trained model provided by a growing number of Machine Learning as a Service providers including Google, Microsoft, IBM and used by a plethora of applications incorporating these models. The ability of an adversary to exploit only the predicted label from a model to craft adversarial examples is distinguished as a decision-based attack. 
In our study, we first take a deep dive into recent state-of-the-art decision-based attacks published at ICLR and SP to highlight the costly nature of discovering low-distortion adversarial examples with gradient estimation methods. We develop a robust, query-efficient attack capable of avoiding entrapment in a local minimum and misdirection from the noisy gradients seen in gradient estimation methods. The attack method we propose, RamBoAttack, exploits the notion of Randomized Block Coordinate Descent to explore the hidden classifier manifold, targeting perturbations to manipulate only localized input features to address the issues of gradient estimation methods. Importantly, RamBoAttack is more robust to the different sample inputs available to an adversary and to the targeted class. Overall, for a given target class, RamBoAttack is demonstrated to be more robust at achieving a lower distortion within a given query budget. We curate our extensive results using the large-scale high-resolution ImageNet dataset and open-source our attack, test samples and artifacts on GitHub.	翻訳日:2023-03-27 18:59:50 公開日:2023-03-24
# DeBERTaV3: ELECTRA-Style Pre-TrainingによるDeBERTaの改善 DeBERTaV3: Improving DeBERTa using ELECTRA-Style Pre-Training with Gradient-Disentangled Embedding Sharing ( http://arxiv.org/abs/2111.09543v4 ) ライセンス: Link先を確認	Pengcheng He, Jianfeng Gao and Weizhu Chen	(参考訳) 本稿では,マスク言語モデリング(MLM)を,よりサンプル効率の高い事前学習タスクであるRTDに置き換えることで,従来のDeBERTaモデルを改善する新しい事前学習言語モデルであるDeBERTaV3を提案する。 ELECTRAにおけるバニラ埋め込み共有は,訓練効率とモデル性能を損なうことが示された。これは、ディスクリミネータとジェネレータのプルトークンのトレーニング損失が異なる方向に埋め込み、"綱引き"のダイナミクスを生成するためである。そこで本研究では,タッグ・オブ・ウォーのダイナミクスを回避し,トレーニング効率と事前学習モデルの質を両立させる,新しい勾配偏角埋め込み共有法を提案する。我々はDeBERTaV3をDeBERTaと同じ設定で事前訓練し、広範囲の下流自然言語理解(NLU)タスクにおいて例外的な性能を示す。 GLUEベンチマークを例に挙げると、DeBERTaV3 Largeモデルは平均スコア91.37%で、DeBERTaは1.37%、ELECTRAは1.91%で、同様の構造を持つモデルに新しい最先端(SOTA)が設定されている。さらに,多言語モデルmdebertaを事前学習し,英語モデルに比べて強いベースラインよりも大きな改善が見られた。例えば、mDeBERTa Baseは、XNLIで79.8%のゼロショットのクロスランガル精度を達成し、XLM-R Baseで3.6%改善した。トレーニング済みのモデルと推論コードをhttps://github.com/microsoft/DeBERTaで公開しました。 This paper presents a new pre-trained language model, DeBERTaV3, which improves the original DeBERTa model by replacing mask language modeling (MLM) with replaced token detection (RTD), a more sample-efficient pre-training task. Our analysis shows that vanilla embedding sharing in ELECTRA hurts training efficiency and model performance. This is because the training losses of the discriminator and the generator pull token embeddings in different directions, creating the "tug-of-war" dynamics. We thus propose a new gradient-disentangled embedding sharing method that avoids the tug-of-war dynamics, improving both training efficiency and the quality of the pre-trained model. We have pre-trained DeBERTaV3 using the same settings as DeBERTa to demonstrate its exceptional performance on a wide range of downstream natural language understanding (NLU) tasks. 
Taking the GLUE benchmark with eight tasks as an example, the DeBERTaV3 Large model achieves a 91.37% average score, which is 1.37% over DeBERTa and 1.91% over ELECTRA, setting a new state-of-the-art (SOTA) among the models with a similar structure. Furthermore, we have pre-trained a multi-lingual model mDeBERTa and observed a larger improvement over strong baselines compared to English models. For example, the mDeBERTa Base achieves a 79.8% zero-shot cross-lingual accuracy on XNLI and a 3.6% improvement over XLM-R Base, creating a new SOTA on this benchmark. We have made our pre-trained models and inference code publicly available at https://github.com/microsoft/DeBERTa.	翻訳日:2023-03-27 18:59:18 公開日:2023-03-24
# 最適フェルミオン-量子マッピング Optimal fermion-qubit mappings ( http://arxiv.org/abs/2110.12792v4 ) ライセンス: Link先を確認	Mitchell Chiew, Sergii Strelchuk	(参考訳) 量子コンピュータ上のフェルミオン系をシミュレーションするには、フェルミオン状態の量子ビットへの高速なマッピングが必要である。効率的なマッピングの重要な特徴は、局所的なフェルミオン相互作用を局所的な量子ビット相互作用に変換する能力である。すべてのフェルミオン・クビット写像は、クビット演算への変換のためにフェルミオンモードの番号スキームを使用する必要がある。フェルミオン系が写像する量子ビットの順序付けされていない記号ラベリングと順序付けされた数値ラベリングとを区別する。この分離は、フェルミオンモードの列挙スキームである追加の自由度を利用してフェルミオン-量子マッピングを設計する新しい方法に光を当てる。これにより、最適なフェルミオン量子ビット写像の概念をよく定義することができる:例えば、本論文の主な焦点は、彼らが生成する量子ハミルトニアンの項において、パウリ行列の平均数を最小化するジョルダン・ウィグナー変換を特定することである。フェルミオン系が与えられたとき、ターゲットの量子ハミルトニアンの実用的コスト関数に対する最適なフェルミオン列挙スキームを選択するためのリソースは一切与えられない。ミッチソンとダービンの列挙パターンが、正方形フェルミオン格子で相互作用するシステムのヨルダン・ウィグナー変換の平均ポーリ重みを最小化するために最適であることを示す。これにより、クビット・ハミルトニアン(qubit hamiltonian)は、パウリの平均重量が13.9%短くなる。さらに、2つのアンシラ量子ビットのみを加えることで、新しいフェルミオン・クビット写像のクラスを導入し、以前の方法と比較してハミルトン項の平均パウリ重量を37.9%削減する。他の自然なn$モードフェルミオン系では、平均ポーリ重みがna\"ive列挙スキームに対して$n^{1/4}$向上する列挙パターンが見つかる。 Simulating fermionic systems on a quantum computer requires a high-performing mapping of fermionic states to qubits. The key characteristic of an efficient mapping is its ability to translate local fermionic interactions into local qubit interactions, leading to easy-to-simulate qubit Hamiltonians. All fermion-qubit mappings must use a numbering scheme for the fermionic modes in order for translation to qubit operations. We make a distinction between the unordered, symbolic labelling of fermions and the ordered, numeric labelling of the qubits to which the fermionic system maps. This separation shines light on a new way to design fermion-qubit mappings by making use of the extra degree of freedom -- the enumeration scheme for the fermionic modes. This allows well-defined concepts of optimal fermion-qubit mappings: for example, the main focus of this paper is in identifying Jordan-Wigner transformations that minimise the average number of Pauli matrices in the terms of the qubit Hamiltonians they produce. 
Given a fermionic system, choosing the optimal fermionic enumeration scheme for a practical cost function of the target qubit Hamiltonian does not expend any additional resources. We demonstrate how Mitchison and Durbin's enumeration pattern is optimal for minimising the average Pauli weight of Jordan-Wigner transformations of systems interacting in square fermionic lattices. This leads to qubit Hamiltonians consisting of terms with average Pauli weights 13.9% lower than previously known. Furthermore, by adding only two ancilla qubits we introduce a new class of fermion-qubit mappings, and reduce the average Pauli weight of Hamiltonian terms by 37.9% compared to previous methods. For other natural $n$-mode fermionic systems we find enumeration patterns which result in an $n^{1/4}$ improvement in average Pauli weight over naïve enumeration schemes.	翻訳日:2023-03-27 18:58:45 公開日:2023-03-24
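The cost function discussed in the entry above, the average Pauli weight of a Jordan-Wigner-transformed Hamiltonian, can be computed directly for a given enumeration. The sketch below assumes a nearest-neighbour hopping model on an $N \times N$ square lattice and uses the standard fact that, under Jordan-Wigner, a hopping term between modes $i < j$ becomes Pauli strings of weight $j - i + 1$. The two enumerations shown (row-major and a boustrophedon "snake") are simple baselines chosen for illustration; the Mitchison-Durbin pattern from the paper is not implemented here.

```python
def row_major(n):
    """Enumerate lattice site (row, col) as row * n + col."""
    return lambda r, c: r * n + c


def snake(n):
    """Enumerate rows alternately left-to-right and right-to-left."""
    return lambda r, c: r * n + (c if r % 2 == 0 else n - 1 - c)


def average_pauli_weight(n, index):
    """Mean Jordan-Wigner weight of nearest-neighbour hopping terms.

    A hopping term between modes i < j acts on qubits i..j inclusive,
    so its Pauli weight is j - i + 1.
    """
    weights = []
    for r in range(n):
        for c in range(n):
            for dr, dc in ((0, 1), (1, 0)):  # right and down neighbours
                r2, c2 = r + dr, c + dc
                if r2 < n and c2 < n:
                    weights.append(abs(index(r, c) - index(r2, c2)) + 1)
    return sum(weights) / len(weights)


if __name__ == "__main__":
    for n in (4, 8, 16):
        print(n, average_pauli_weight(n, row_major(n)),
              average_pauli_weight(n, snake(n)))
```

Both baselines give an average weight of $(N + 3)/2$ on this lattice, which is the kind of reference value that a better enumeration would need to beat; swapping in a different `index` function is enough to evaluate another pattern.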
# 粗いラベルから学習する効率的なアルゴリズム Efficient Algorithms for Learning from Coarse Labels ( http://arxiv.org/abs/2108.09805v2 ) ライセンス: Link先を確認	Dimitris Fotakis, Alkis Kalavasis, Vasilis Kontonis, Christos Tzamos	(参考訳) 多くの学習問題では、細かなラベル情報にアクセスできない場合がある。例えば、画像は注釈の専門知識によっては、ハスキー、犬、さらには動物と分類できる。本研究では,これらの設定を定式化し,粗いデータから学習する問題を考察する。設定された$\mathcal{Z}$から実際のラベルを観察する代わりに、$\mathcal{Z}$(またはパーティションの混合)のパーティションに対応する粗いラベルを観察します。私たちのアルゴリズムの主な結果は、粗いデータが十分に有益であるときに、きめ細かいラベルから学べるどんな問題も効率的に学習できるということです。粗いラベルのみを付与したきめ細かなラベルに対して、統計的クエリ(SQ)に応答する一般的なリダクションにより、この結果を得る。要求される粗いラベルの数は、粗さによる情報歪みと $|\mathcal{Z}|$ の細かいラベルの数に多項式的に依存する。また、検閲された統計学における中心的な問題に焦点をあてた(無限に多くの)実価値ラベルについても検討する: ガウス平均は粗いデータから推定される。分割中の集合が凸であるときに効率的なアルゴリズムを提供し、非常に単純な非凸集合に対してもNPハードであることを示す。 For many learning problems one may not have access to fine grained label information; e.g., an image can be labeled as husky, dog, or even animal depending on the expertise of the annotator. In this work, we formalize these settings and study the problem of learning from such coarse data. Instead of observing the actual labels from a set $\mathcal{Z}$, we observe coarse labels corresponding to a partition of $\mathcal{Z}$ (or a mixture of partitions). Our main algorithmic result is that essentially any problem learnable from fine grained labels can also be learned efficiently when the coarse data are sufficiently informative. We obtain our result through a generic reduction for answering Statistical Queries (SQ) over fine grained labels given only coarse labels. The number of coarse labels required depends polynomially on the information distortion due to coarsening and the number of fine labels $|\mathcal{Z}|$. We also investigate the case of (infinitely many) real valued labels focusing on a central problem in censored and truncated statistics: Gaussian mean estimation from coarse data. We provide an efficient algorithm when the sets in the partition are convex and establish that the problem is NP-hard even for very simple non-convex sets.	
翻訳日:2023-03-27 18:58:11 公開日:2023-03-24
# 呪いを祝福に変える - 安定化モデルインバージョンによる分散データ不要バックドアの除去を可能にする Turning a Curse into a Blessing: Enabling In-Distribution-Data-Free Backdoor Removal via Stabilized Model Inversion ( http://arxiv.org/abs/2206.07018v3 ) ライセンス: Link先を確認	Si Chen, Yi Zeng, Jiachen T.Wang, Won Park, Xun Chen, Lingjuan Lyu, Zhuoqing Mao, Ruoxi Jia	(参考訳) 機械学習モデルにおける多くのバックドア除去技術は、きれいな配布データを必要とするが、プロプライエタリなデータセットのために常に利用できるとは限らない。モデル反転技術は、しばしばプライバシーの脅威と見なされるが、現実的なトレーニングサンプルを再構築し、配布データの必要性をなくす可能性がある。バックドア除去とモデル逆転を組み合わせた以前の試みは、限られた結果をもたらした。本研究は, モデルインバージョンを有効なバックドア除去に活用する手法として, 再構成されたサンプルの特性, 知覚的類似性, バックドアトリガの潜在的な存在に関する重要な疑問に対処する。強固な防御には知覚的類似性のみに依存することは不十分であり、入力とパラメータの摂動に対するモデル予測の安定性も重要である。そこで本研究では,モデルインバージョンと安定性,視覚的品質向上のための2段階最適化フレームワークを提案する。興味深いことに、事前訓練された発電機の潜伏空間からの再構成サンプルは、バックドアモデルからの信号を利用する場合でも、バックドアフリーであることが判明した。この発見を支持する理論的分析を提供する。その結果,本手法は,同一量のクリーンサンプルを用いた性能の一致や超過を伴わずに,最先端のバックドア除去性能を実現した。 Many backdoor removal techniques in machine learning models require clean in-distribution data, which may not always be available due to proprietary datasets. Model inversion techniques, often considered privacy threats, can reconstruct realistic training samples, potentially eliminating the need for in-distribution data. Prior attempts to combine backdoor removal and model inversion yielded limited results. Our work is the first to provide a thorough understanding of leveraging model inversion for effective backdoor removal by addressing key questions about reconstructed samples' properties, perceptual similarity, and the potential presence of backdoor triggers. We establish that relying solely on perceptual similarity is insufficient for robust defenses, and the stability of model predictions in response to input and parameter perturbations is also crucial. To tackle this, we introduce a novel bi-level optimization-based framework for model inversion, promoting stability and visual quality. 
Interestingly, we discover that reconstructed samples from a pre-trained generator's latent space are backdoor-free, even when utilizing signals from a backdoored model. We provide a theoretical analysis to support this finding. Our evaluation demonstrates that our stabilized model inversion technique achieves state-of-the-art backdoor removal performance without clean in-distribution data, matching or surpassing performance using the same amount of clean samples.	翻訳日:2023-03-27 18:51:36 公開日:2023-03-24
# 深層学習モデルの対称性とその内部表現について On the Symmetries of Deep Learning Models and their Internal Representations ( http://arxiv.org/abs/2205.14258v5 ) ライセンス: Link先を確認	Charles Godfrey, Davis Brown, Tegan Emerson, Henry Kvinge	(参考訳) 対称性は、幅広い複雑なシステムの探索における基本的な道具である。機械学習の対称性はモデルとデータの両方で研究されている。本稿では,モデルファミリーのアーキテクチャから生じる対称性と,そのファミリーの内部データ表現の対称性を結びつける。これを基本対称群の集合を計算し、モデルのインターツウィナー群(英語版)と呼ぶ。我々は、同じアーキテクチャを持つモデル間の隠れた状態間の類似性を調べる一連の実験を通して、データの内部表現に相互に結合する。我々の研究は、ネットワークの対称性が、そのネットワークのデータ表現の対称性に伝播されることを示唆し、アーキテクチャが学習と予測プロセスにどのように影響するかをよりよく理解する。最後に、ReLUネットワークでは、任意の線形結合ではなく、隠れ層における活性化に基づくモデル解釈可能性探索を集中させる一般的な手法の正当性を推測する。 Symmetry is a fundamental tool in the exploration of a broad range of complex systems. In machine learning symmetry has been explored in both models and data. In this paper we seek to connect the symmetries arising from the architecture of a family of models with the symmetries of that family's internal representation of data. We do this by calculating a set of fundamental symmetry groups, which we call the intertwiner groups of the model. We connect intertwiner groups to a model's internal representations of data through a range of experiments that probe similarities between hidden states across models with the same architecture. Our work suggests that the symmetries of a network are propagated into the symmetries in that network's representation of data, providing us with a better understanding of how architecture affects the learning and prediction process. Finally, we speculate that for ReLU networks, the intertwiner groups may provide a justification for the common practice of concentrating model interpretability exploration on the activation basis in hidden layers rather than arbitrary linear combinations thereof.	翻訳日:2023-03-27 18:50:53 公開日:2023-03-24
# Penguins Don't Fly: Instantiationsと例外によるジェネリックの推論 Penguins Don't Fly: Reasoning about Generics through Instantiations and Exceptions ( http://arxiv.org/abs/2205.11658v3 ) ライセンス: Link先を確認	Emily Allaway, Jena D. Hwang, Chandra Bhagavatula, Kathleen McKeown, Doug Downey, Yejin Choi	(参考訳) 属は、普遍的に真実ではない世界(例えば、鳥は飛ぶことができる)に関する一般化を表現する(例えば、新生児の鳥やペンギンは飛べない)。共通センス知識ベース(Commonsense knowledge bases)は、NLPで広く使われ、いくつかの一般的な知識を符号化するが、そのような例外を列挙することは滅多にない。我々は、言語理論から情報を得た新しい枠組みを提示する -- ジェネリックが真または偽を持っている特定の場合。我々は、約650のジェネリックに対して19kの例を生成し、我々のフレームワークが強力なGPT-3ベースラインを12.8精度で上回っていることを示す。分析では,例文生成における言語理論に基づく制御可能性の重要性,例文の源としての知識基盤の不足,自然言語推論の課題について考察した。 Generics express generalizations about the world (e.g., birds can fly) that are not universally true (e.g., newborn birds and penguins cannot fly). Commonsense knowledge bases, used extensively in NLP, encode some generic knowledge but rarely enumerate such exceptions and knowing when a generic statement holds or does not hold true is crucial for developing a comprehensive understanding of generics. We present a novel framework informed by linguistic theory to generate exemplars -- specific cases when a generic holds true or false. We generate ~19k exemplars for ~650 generics and show that our framework outperforms a strong GPT-3 baseline by 12.8 precision points. Our analysis highlights the importance of linguistic theory-based controllability for generating exemplars, the insufficiency of knowledge bases as a source of exemplars, and the challenges exemplars pose for the task of natural language inference.	翻訳日:2023-03-27 18:50:37 公開日:2023-03-24
# 量子電池の充電における触媒 Catalysis in Charging Quantum Batteries ( http://arxiv.org/abs/2205.05018v3 ) ライセンス: Link先を確認	Ricard Ravell Rodriguez, Borhan Ahmadi, Pawel Mazurek, Shabir Barzanjeh, Robert Alicki, and Pawel Horodecki	(参考訳) 本稿では,高調波発振器 (量子電池) と高調波発振器 (チャージャー) をレーザー磁場で駆動する方式を提案する。触媒系の存在下では, 充電器と量子電池の間を媒介し, エネルギー伝達限界を著しく緩和できることを示す。これらの触媒系は量子ビットまたは高調波振動子であり、量子電池へのエネルギー移動量を増加させるが、それ自体はほとんどエネルギーを蓄積しない。これにより、電池と充電器の結合強度に依存するベアセッティングにおける最適値である帯電レーザ場の周波数を最適化する必要がなくなる。 We propose a novel approach for optimization of charging of harmonic oscillators (quantum batteries) coupled to a harmonic oscillator (charger), driven by laser field. We demonstrate that energy transfer limitations can be significantly mitigated in the presence of catalyst systems, mediating between the charger and quantum batteries. We show that these catalyst systems, either qubits or harmonic oscillators, enhance the amount of energy transferred to quantum batteries, while they themselves store almost no energy. It eliminates the need for optimizing frequency of the charging laser field, whose optimal value in the bare setting depends on coupling strengths between the charger and the batteries.	翻訳日:2023-03-27 18:49:55 公開日:2023-03-24
# 道路側LiDAR物体検出のための重み付きベイズガウス混合モデル Weighted Bayesian Gaussian Mixture Model for Roadside LiDAR Object Detection ( http://arxiv.org/abs/2204.09804v4 ) ライセンス: Link先を確認	Tianya Zhang, Yi Ge, Peter J. Jin	(参考訳) 背景モデリングは、静的な背景成分を減じることで移動目標を検出するインテリジェントな監視システムに広く利用されている。多くの道端lidarオブジェクト検出手法は、多くのフレーム(ボクセル密度、近傍数、最大距離など)の記述統計に基づいて、新しいデータポイントと事前訓練された背景参照を比較して、前景ポイントをフィルタリングする。しかし、これらのソリューションは高トラフィック下では非効率であり、パラメータ値はあるシナリオから別のシナリオへの転送が困難である。初期の研究では、ビデオベースシステムに広く用いられている確率論的背景モデリング手法は、疎小で非構造化のクラウドデータのため、ロードサイドのLiDAR監視システムには適していないと考えられていた。本稿では,各LiDAR点の標高と方位値に基づいて,生のLiDARデータを構造化表現に変換する。この高次テンソル表現により、道路側LiDAR背景モデリングのための効率的な高次元多変量解析を可能にする障壁を破る。ベイズ非パラメトリック(BNP)アプローチは、強度値と3D計測を統合し、3Dと強度情報を完全に活用する。提案手法は,2つの最先端の道路背景モデル,コンピュータビジョンベンチマーク,深層学習ベースラインを比較し,交通量と難易度で評価された点,対象,経路レベルを比較した。このマルチモーダル重み付きベイズ混合モデル(GMM)は、ノイズ測定により動的バックグラウンドを処理でき、インフラベースのLiDARオブジェクト検出を大幅に強化し、スマートシティアプリケーションのための様々な3Dモデリングを作成することができる。 Background modeling is widely used for intelligent surveillance systems to detect moving targets by subtracting the static background components. Most roadside LiDAR object detection methods filter out foreground points by comparing new data points to pre-trained background references based on descriptive statistics over many frames (e.g., voxel density, number of neighbors, maximum distance). However, these solutions are inefficient under heavy traffic, and parameter values are hard to transfer from one scenario to another. In early studies, the probabilistic background modeling methods widely used for the video-based system were considered unsuitable for roadside LiDAR surveillance systems due to the sparse and unstructured point cloud data. In this paper, the raw LiDAR data were transformed into a structured representation based on the elevation and azimuth value of each LiDAR point. With this high-order tensor representation, we break the barrier to allow efficient high-dimensional multivariate analysis for roadside LiDAR background modeling. 
The Bayesian Nonparametric (BNP) approach integrates the intensity value and the 3D measurements so that both the 3D and the intensity information in the data are fully exploited. The proposed method was compared against two state-of-the-art roadside LiDAR background models, a computer vision benchmark, and deep learning baselines, and was evaluated at the point, object, and path levels under heavy traffic and challenging weather. This multimodal Weighted Bayesian Gaussian Mixture Model (GMM) can handle dynamic backgrounds with noisy measurements and substantially enhances infrastructure-based LiDAR object detection, which in turn supports various 3D modeling tasks for smart city applications.	翻訳日:2023-03-27 18:49:41 公開日:2023-03-24
# 連続時空間グラフ畳み込みネットワーク Continual Spatio-Temporal Graph Convolutional Networks ( http://arxiv.org/abs/2203.11009v2 ) ライセンス: Link先を確認	Lukas Hedegaard and Negar Heidari and Alexandros Iosifidis	(参考訳) スケルトンデータによるグラフに基づく推論は、人間の行動認識に有望なアプローチとして現れてきた。しかし、オンライン推論の設定に主に時間列全体を入力として利用する従来のグラフベースの手法は、かなりの計算冗長性を必要とする。本稿では,時空間グラフ畳み込みニューラルネットワークを連続推論ネットワークとして再構成することで,フレーム処理を繰り返すことなく段階的に予測を行う。提案手法を評価するため,ST-GCN,CoST-GCN,CoAGCN,CoS-TRの2つの自己保持機構を持つ導出法を連続的に生成する。提案手法は, NTU RGB+D 60, NTU RGB+D 120, Kinetics Skeleton 400 データセットを用いて, 重量移動戦略と予測加速度のアーキテクチャ修正について検討した。同様の予測精度を維持しながら、時間複雑性の最大109倍の削減、26倍のハードウェア上のアクセラレーション、オンライン推論中の最大割り当てメモリの最大52%の削減を観察する。 Graph-based reasoning over skeleton data has emerged as a promising approach for human action recognition. However, the application of prior graph-based methods, which predominantly employ whole temporal sequences as their input, to the setting of online inference entails considerable computational redundancy. In this paper, we tackle this issue by reformulating the Spatio-Temporal Graph Convolutional Neural Network as a Continual Inference Network, which can perform step-by-step predictions in time without repeat frame processing. To evaluate our method, we create a continual version of ST-GCN, CoST-GCN, alongside two derived methods with different self-attention mechanisms, CoAGCN and CoS-TR. We investigate weight transfer strategies and architectural modifications for inference acceleration, and perform experiments on the NTU RGB+D 60, NTU RGB+D 120, and Kinetics Skeleton 400 datasets. Retaining similar predictive accuracy, we observe up to 109x reduction in time complexity, on-hardware accelerations of 26x, and reductions in maximum allocated memory of 52% during online inference.	翻訳日:2023-03-27 18:49:10 公開日:2023-03-24
# Euler State Networks - 非散逸型貯留層コンピューティング Euler State Networks: Non-dissipative Reservoir Computing ( http://arxiv.org/abs/2203.09382v3 ) ライセンス: Link先を確認	Claudio Gallicchio	(参考訳) 本稿では, 常微分方程式の数値解に着想を得て, オイラー状態ネットワーク(EuSN)と呼ばれる新しい貯留層計算(RC)モデルを提案する。提案手法は, 構造によって安定かつ非散逸性を有する貯留層ダイナミクスを設計するために, 前方オイラー離散化と反対称再帰行列を用いる。我々の数学的解析は、結果のモデルが一元的有効スペクトル半径とゼロ局所リアプノフ指数に偏り、本質的に安定性の端近くで動作していることを示している。長期記憶課題実験の結果,複数の時間ステップにわたる入力情報の効果的な伝播を必要とする問題において,標準rcモデルよりも提案手法が優れていることが明らかとなった。さらに、時系列分類ベンチマークの結果から、eusnは、rcファミリーのトレーニング効率を保ちながら、トレーニング可能なリカレントニューラルネットワークの精度をマッチ(あるいは超過)することができ、計算時間の最大490倍の節約と、エネルギー消費量の約1750倍の節約が可能であることが示されている。 Inspired by the numerical solution of ordinary differential equations, in this paper we propose a novel Reservoir Computing (RC) model, called the Euler State Network (EuSN). The presented approach makes use of forward Euler discretization and antisymmetric recurrent matrices to design reservoir dynamics that are both stable and non-dissipative by construction. Our mathematical analysis shows that the resulting model is biased towards a unitary effective spectral radius and zero local Lyapunov exponents, intrinsically operating near to the edge of stability. Experiments on long-term memory tasks show the clear superiority of the proposed approach over standard RC models in problems requiring effective propagation of input information over multiple time-steps. Furthermore, results on time-series classification benchmarks indicate that EuSN is able to match (or even exceed) the accuracy of trainable Recurrent Neural Networks, while retaining the training efficiency of the RC family, resulting in up to $\approx$ 490-fold savings in computation time and $\approx$ 1750-fold savings in energy consumption.	翻訳日:2023-03-27 18:48:51 公開日:2023-03-24
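The reservoir update described in the Euler State Networks entry above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the author's implementation: it assumes an update of the form h <- h + eps * tanh((W - W^T - gamma*I) h + V u + b), and the names `eps`, `gamma`, `make_antisymmetric`, `eusn_step`, `run_reservoir` as well as every numeric value are hypothetical choices made for this example.

```python
import math
import random


def make_antisymmetric(n, scale, rng):
    """Return A = W - W^T for a random n x n matrix W, so A[i][j] == -A[j][i]."""
    w = [[rng.uniform(-scale, scale) for _ in range(n)] for _ in range(n)]
    return [[w[i][j] - w[j][i] for j in range(n)] for i in range(n)]


def eusn_step(h, u, a, v, b, eps=0.01, gamma=0.001):
    """One forward-Euler step: h + eps * tanh((A - gamma*I) h + V u + b).

    eps (step size) and gamma (small diffusion term) are assumed values.
    """
    n = len(h)
    new_h = []
    for i in range(n):
        pre = sum(a[i][j] * h[j] for j in range(n)) - gamma * h[i]
        pre += sum(v[i][k] * u[k] for k in range(len(u))) + b[i]
        new_h.append(h[i] + eps * math.tanh(pre))
    return new_h


def run_reservoir(inputs, n=8, seed=0):
    """Drive an untrained reservoir with a sequence of input vectors."""
    rng = random.Random(seed)
    a = make_antisymmetric(n, 1.0, rng)
    v = [[rng.uniform(-1.0, 1.0) for _ in inputs[0]] for _ in range(n)]
    b = [rng.uniform(-0.1, 0.1) for _ in range(n)]
    h = [0.0] * n
    states = []
    for u in inputs:
        h = eusn_step(h, u, a, v, b)
        states.append(h)
    return a, states
```

Only the readout on top of `states` would be trained in a reservoir-computing setup. The antisymmetric recurrent matrix has purely imaginary eigenvalues, which is the property the abstract relies on for dynamics that neither explode nor decay.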
# 量子及び古典的時間結晶における揺らぎの役割 The role of fluctuations in quantum and classical time crystals ( http://arxiv.org/abs/2203.05577v4 ) ライセンス: Link先を確認	Toni L. Heugel, Alexander Eichler, R. Chitra, and Oded Zilberberg	(参考訳) 離散時間結晶(dtc)は、動力学的に作用する力よりも遅い多体状態である。時代が2倍の古典的なシステムにも当てはまる。したがって、この問題は古典と量子 DTC の区別が自然に生じる。ここでは、Bose-Hubbardモデルの変種を分析し、多くの物理現象を記述し、古典的および量子的時間結晶的極限を持つ。システムの安定性におけるゆらぎの役割を考察し、量子と古典的dtcの区別を見いださない。これにより、古典的雑音を受ける2つの強結合パラメトリック共振器を用いて実験におけるゆらぎを調べることができる。 Discrete time crystals (DTCs) are a many-body state of matter whose dynamics are slower than the forces acting on it. The same is true for classical systems with period-doubling bifurcations. Hence, the question naturally arises what differentiates classical from quantum DTCs. Here, we analyze a variant of the Bose-Hubbard model, which describes a plethora of physical phenomena and has both a classical and a quantum time-crystalline limit. We study the role of fluctuations on the stability of the system and find no distinction between quantum and classical DTCs. This allows us to probe the fluctuations in an experiment using two strongly coupled parametric resonators subject to classical noise.	翻訳日:2023-03-27 18:48:32 公開日:2023-03-24
# クロスモーダル・ディエンタングルメントによるフォトリアリスティックな仮想ヒトの合成 Synthesizing Photorealistic Virtual Humans Through Cross-modal Disentanglement ( http://arxiv.org/abs/2209.01320v2 ) ライセンス: Link先を確認	Siddarth Ravichandran, Ondřej Texler, Dimitar Dinev, Hyun Jae Kang	(参考訳) 過去数十年にわたって、AmazonのAlexaやAppleのSiriといったデジタルアシスタントの登場から、Metaブランドの最新のメタバース活動に至るまで、人間の生活の多くの側面が仮想ドメインで強化されてきた。これらの傾向は、人間を写実的に描写することの重要性を強調する。これは近年、いわゆるディープフェイクとトーキーヘッド生成手法の急速な成長につながっている。その印象的な結果と人気にもかかわらず、通常はテクスチャの品質、唇の同期、解像度といった定性的側面や、リアルタイムに走る能力といった実用的側面を欠いている。仮想人間のアバターを実用的なシナリオで使用できるようにするために,高性能な仮想顔合成のためのエンドツーエンドフレームワークを提案する。本稿では,ビセムを中間音声表現として利用する新たなネットワークと,大域的な頭部運動を制御するために使用される異なるモーダルのばらつきを解消する階層的画像合成手法を用いた新しいデータ拡張戦略を提案する。提案手法はリアルタイムに動作し,現在の最先端技術と比較して優れた結果が得られる。 Over the last few decades, many aspects of human life have been enhanced with virtual domains, from the advent of digital assistants such as Amazon's Alexa and Apple's Siri to the latest metaverse efforts of the rebranded Meta. These trends underscore the importance of generating photorealistic visual depictions of humans. This has led to the rapid growth of so-called deepfake and talking-head generation methods in recent years. Despite their impressive results and popularity, they usually lack certain qualitative aspects such as texture quality, lips synchronization, or resolution, and practical aspects such as the ability to run in real-time. To allow for virtual human avatars to be used in practical scenarios, we propose an end-to-end framework for synthesizing high-quality virtual human faces capable of speaking with accurate lip motion with a special emphasis on performance. We introduce a novel network utilizing visemes as an intermediate audio representation and a novel data augmentation strategy employing a hierarchical image synthesis approach that allows disentanglement of the different modalities used to control the global head motion. Our method runs in real-time, and is able to deliver superior results compared to the current state-of-the-art.	翻訳日:2023-03-27 18:42:44 公開日:2023-03-24
# 変圧器を用いた物体検出装置における多機能化に向けて Towards Efficient Use of Multi-Scale Features in Transformer-Based Object Detectors ( http://arxiv.org/abs/2208.11356v2 ) ライセンス: Link先を確認	Gongjie Zhang, Zhipeng Luo, Zichen Tian, Jingyi Zhang, Xiaoqin Zhang, Shijian Lu	(参考訳) マルチスケールの機能はオブジェクト検出に非常に効果的であることが証明されているが、特に最近のTransformerベースの検出器では、大きな計算コストが伴う。本稿では,Transformerベースのオブジェクト検出器において,マルチスケール特徴の効率的な利用を可能にする汎用パラダイムとして,Iterative Multi-scale Feature Aggregation (IMFA)を提案する。中心となるアイデアは、いくつかの重要な場所からスパースなマルチスケール機能を活用し、2つの斬新なデザインで達成することだ。まず、IMFAはTransformerエンコーダ-デコーダパイプラインを再構成し、検出予測に基づいてコード化された特徴を反復的に更新する。第2に、IMFAは事前検出予測のガイダンスに基づき、わずか数箇所のキーポイント位置からの精密検出のためのスケール適応的特徴をわずかにサンプリングした。その結果、サンプルされたマルチスケール機能は少ないが、オブジェクト検出には非常に有益である。広範囲な実験により、提案されたIMFAは、わずかな計算オーバーヘッドだけで、複数のトランスフォーマーベースの物体検出器の性能を大幅に向上させることを示した。 Multi-scale features have been proven highly effective for object detection but often come with huge and even prohibitive extra computation costs, especially for the recent Transformer-based detectors. In this paper, we propose Iterative Multi-scale Feature Aggregation (IMFA) -- a generic paradigm that enables efficient use of multi-scale features in Transformer-based object detectors. The core idea is to exploit sparse multi-scale features from just a few crucial locations, and it is achieved with two novel designs. First, IMFA rearranges the Transformer encoder-decoder pipeline so that the encoded features can be iteratively updated based on the detection predictions. Second, IMFA sparsely samples scale-adaptive features for refined detection from just a few keypoint locations under the guidance of prior detection predictions. As a result, the sampled multi-scale features are sparse yet still highly beneficial for object detection. Extensive experiments show that the proposed IMFA boosts the performance of multiple Transformer-based object detectors significantly yet with only slight computational overhead.	翻訳日:2023-03-27 18:41:50 公開日:2023-03-24
# LayoutFormer++:制約シリアライゼーションとデコード空間制限による条件付きグラフレイアウト生成 LayoutFormer++: Conditional Graphic Layout Generation via Constraint Serialization and Decoding Space Restriction ( http://arxiv.org/abs/2208.08037v2 ) ライセンス: Link先を確認	Zhaoyun Jiang, Jiaqi Guo, Shizhao Sun, Huayu Deng, Zhongkai Wu, Vuksan Mijovic, Zijiang James Yang, Jian-Guang Lou, Dongmei Zhang	(参考訳) ユーザ制約に従ってリアルなレイアウトを生成する条件付きグラフィックレイアウト生成は,まだ十分に研究されていない課題である。まず、多様なユーザー制約を柔軟かつ均一に扱う方法についての議論が限られている。第二に、レイアウトをユーザの制約に適合させるため、既存の作業は生成品質を著しく犠牲にすることが多い。本稿では,上記の問題に対処するためにlayoutformer++を提案する。まず,多様な制約を柔軟に処理するために,ユーザ制約を予め定義されたフォーマットのトークン列として表現する制約シリアライズスキームを提案する。次に,シーケンスからシーケンスへの変換として条件付きレイアウト生成を定式化し,トランスフォーマを基本アーキテクチャとするエンコーダ・デコーダフレームワークを利用する。さらに,品質を損なうことなくレイアウトをユーザの要求に合致させるため,デコード空間制限戦略を提案する。具体的には、ユーザの制約に確実に違反し、低品質なレイアウトをもたらす可能性のあるオプションを無視して、予測された分布を訓練し、制限された分布からモデルサンプルを作成する。実験によると、layoutformer++は、生成品質の向上と制約違反の低減という両面で、すべてのタスクで既存のアプローチを上回っている。 Conditional graphic layout generation, which generates realistic layouts according to user constraints, is a challenging task that has not been well-studied yet. First, there is limited discussion about how to handle diverse user constraints flexibly and uniformly. Second, to make the layouts conform to user constraints, existing work often sacrifices generation quality significantly. In this work, we propose LayoutFormer++ to tackle the above problems. First, to flexibly handle diverse constraints, we propose a constraint serialization scheme, which represents different user constraints as sequences of tokens with a predefined format. Then, we formulate conditional layout generation as a sequence-to-sequence transformation, and leverage encoder-decoder framework with Transformer as the basic architecture. Furthermore, to make the layout better meet user requirements without harming quality, we propose a decoding space restriction strategy. 
Specifically, we prune the predicted distribution by ignoring the options that definitely violate user constraints and likely result in low-quality layouts, and make the model sample from the restricted distribution. Experiments demonstrate that LayoutFormer++ outperforms existing approaches on all the tasks, with both better generation quality and fewer constraint violations.	翻訳日:2023-03-27 18:41:32 公開日:2023-03-24
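The decoding space restriction in the LayoutFormer++ entry above amounts to masking out disallowed tokens before sampling. The sketch below is an illustration only and is not taken from the LayoutFormer++ code: the `allowed` predicate stands in for a hypothetical constraint checker, `restrict_and_sample` is a made-up name, and only the pruning of constraint-violating options is shown (the handling of likely low-quality options is omitted).

```python
import math
import random


def restrict_and_sample(logits, allowed, rng=random):
    """Sample a token index from softmax(logits) restricted to allowed indices.

    `allowed` is a hypothetical predicate: allowed(i) is True when token i
    does not violate the user constraints at the current decoding step.
    """
    kept = [i for i in range(len(logits)) if allowed(i)]
    if not kept:
        raise ValueError("constraints rule out every token")
    # Softmax over the surviving options only (shifted by the max for stability).
    top = max(logits[i] for i in kept)
    weights = [math.exp(logits[i] - top) for i in kept]
    total = sum(weights)
    probs = {i: w / total for i, w in zip(kept, weights)}
    choice = rng.choices(kept, weights=weights, k=1)[0]
    return choice, probs
```

Because the pruned options receive zero probability mass and the rest are renormalized, a sampled token can never violate the checked constraints, while the relative preferences of the model among the remaining tokens are preserved.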
# 言語モデルの「実」予測に対するデータ統計量の因果効果の測定 Measuring Causal Effects of Data Statistics on Language Model's `Factual' Predictions ( http://arxiv.org/abs/2207.14251v2 ) ライセンス: Link先を確認	Yanai Elazar, Nora Kassner, Shauli Ravfogel, Amir Feder, Abhilasha Ravichander, Marius Mosbach, Yonatan Belinkov, Hinrich Schütze, Yoav Goldberg	(参考訳) 大量のトレーニングデータが、最先端のNLPモデルの高性能化の大きな理由の1つである。しかし、トレーニングデータの何がモデルに特定の予測をさせるのか? 私たちは、トレーニングデータが予測にどのように影響するかを、因果フレームワークを通じて記述する言語を提供することで、この質問に答えたいと考えています。重要なのは、フレームワークが高価なモデルを再トレーニングする必要を回避し、観測データのみに基づいて因果効果を推定できることです。事前学習された言語モデル(PLM)から事実知識を抽出する問題に対処し、共起数などの単純なデータ統計に焦点をあて、これらの統計がPLMの予測に影響を及ぼすことを示す。本研究の因果関係は,NLPモデルを理解する上で,データセットの学習の重要性と因果関係の利点を示すものである。 Large amounts of training data are one of the major reasons for the high performance of state-of-the-art NLP models. But what exactly in the training data causes a model to make a certain prediction? We seek to answer this question by providing a language for describing how training data influences predictions, through a causal framework. Importantly, our framework bypasses the need to retrain expensive models and allows us to estimate causal effects based on observational data alone. Addressing the problem of extracting factual knowledge from pretrained language models (PLMs), we focus on simple data statistics such as co-occurrence counts and show that these statistics do influence the predictions of PLMs, suggesting that such models rely on shallow heuristics. Our causal framework and our results demonstrate the importance of studying datasets and the benefits of causality for understanding NLP models.	翻訳日:2023-03-27 18:41:13 公開日:2023-03-24
# omni3d:野生の3dオブジェクト検出のための大規模ベンチマークとモデル Omni3D: A Large Benchmark and Model for 3D Object Detection in the Wild ( http://arxiv.org/abs/2207.10660v2 ) ライセンス: Link先を確認	Garrick Brazil, Abhinav Kumar, Julian Straub, Nikhila Ravi, Justin Johnson, Georgia Gkioxari	(参考訳) 単一の画像から3Dのシーンやオブジェクトを認識することは、ロボット工学やAR/VRにおけるコンピュータビジョンの長年の目標である。 2D認識では、大規模なデータセットとスケーラブルなソリューションが前例のない進歩をもたらした。 3Dでは、既存のベンチマークは小さく、オブジェクトのカテゴリや特定のドメイン(例えば都市運転シーン)に特化している。 2次元認識の成功に動機づけられて,omni3dと呼ばれる大規模ベンチマークを導入することで,3次元物体検出の課題を再検討した。 Omni3Dは既存のデータセットを再利用し、300万以上のインスタンスと98のカテゴリで注釈付けされた234Kイメージを生成する。このようなスケールでの3D検出は、カメラの内在性の変化とシーンやオブジェクトの多様さにより困難である。本稿では,カメラとシーンタイプを統一したアプローチで一般化するCube R-CNNというモデルを提案する。 cube r-cnnは、より大きなomni3dと既存のベンチマークで以前よりも優れています。最後に、Omni3Dは3Dオブジェクト認識のための強力なデータセットであり、シングルデータセットのパフォーマンスを改善し、事前学習によって新しい小さなデータセットでの学習を加速できることを示す。 Recognizing scenes and objects in 3D from a single image is a longstanding goal of computer vision with applications in robotics and AR/VR. For 2D recognition, large datasets and scalable solutions have led to unprecedented advances. In 3D, existing benchmarks are small in size and approaches specialize in few object categories and specific domains, e.g. urban driving scenes. Motivated by the success of 2D recognition, we revisit the task of 3D object detection by introducing a large benchmark, called Omni3D. Omni3D re-purposes and combines existing datasets resulting in 234k images annotated with more than 3 million instances and 98 categories. 3D detection at such scale is challenging due to variations in camera intrinsics and the rich diversity of scene and object types. We propose a model, called Cube R-CNN, designed to generalize across camera and scene types with a unified approach. We show that Cube R-CNN outperforms prior works on the larger Omni3D and existing benchmarks. 
Finally, we prove that Omni3D is a powerful dataset for 3D object recognition and show that it improves single-dataset performance and can accelerate learning on new smaller datasets via pre-training.	翻訳日:2023-03-27 18:40:58 公開日:2023-03-24
# 量子乱流理論に向けて:渦ループの相互作用を伴う単純なモデル Towards quantum turbulence theory: A simple model with interaction of the vortex loops ( http://arxiv.org/abs/2207.05414v2 ) ライセンス: Link先を確認	Sergei V. Talalov	(参考訳) 本稿では内部構造を持つ量子化された薄い渦輪について検討する。この力学系の量子化スキームは、著者が以前に提案したアプローチに基づいている。エネルギースペクトルと循環スペクトルの両方が計算される。例として、許容循環値の集合がフラクタル構造を持つことを示す。提案されたモデルにより、孤立渦環と相互作用を持つ渦環の系を記述することができる。さらに、量子乱流理論への応用についても論じる。乱流の分配関数の一般表現を提案する。 This paper investigates quantized thin vortex rings with an internal structure. The quantization scheme of this dynamical system is based on an approach proposed earlier by the author. Both the energy spectrum and the circulation spectrum are calculated. Examples show that the set of permissible circulation values has a fractal structure. The suggested model allows us to describe the system of isolated vortex rings as well as the vortex rings with interaction. Furthermore, the application to the quantum turbulence theory is discussed. The general expression for the partition function of a turbulent flow is suggested.	翻訳日:2023-03-27 18:40:39 公開日:2023-03-24
# dualafford:デュアルグリッパーオブジェクト操作のための協調視覚支援学習 DualAfford: Learning Collaborative Visual Affordance for Dual-gripper Object Manipulation ( http://arxiv.org/abs/2207.01971v5 ) ライセンス: Link先を確認	Yan Zhao, Ruihai Wu, Zhehuan Chen, Yourong Zhang, Qingnan Fan, Kaichun Mo, Hao Dong	(参考訳) 未来のホームアシストロボットにとって、日々の環境において多様な3Dオブジェクトを理解し、操作することが不可欠である。様々な3D形状で多様な操作タスクを実行できるスケーラブルなシステムの構築に向けて、最近の研究は、入力された3D幾何学上のすべての点を下流のタスク(例えば、プッシュまたはピックアップ)を達成するアクションの可能性でラベル付けする、視覚的な動作可能な可測性を学ぶ有望な結果を提唱し、実証してきた。しかし、これらの研究はシングルグリッパー操作しか研究しなかったが、現実のタスクの多くは協調的に達成するために両手を必要とする。本研究では,デュアルグリッパー操作タスクの協調的余裕を学ぶための新しい学習フレームワークであるdualaffordを提案する。この手法の中核となる設計は、2つのグリップの二次問題を2つの非絡み合った相互接続サブタスクに還元し、効率的な学習を行うことである。大規模なPartNet-MobilityデータセットとShapeNetデータセットを使用して、デュアルグリッパー操作のための4つのベンチマークタスクを設定した。実験により,提案手法の有効性と優越性が3つのベースラインで証明された。 It is essential yet challenging for future home-assistant robots to understand and manipulate diverse 3D objects in daily human environments. Towards building scalable systems that can perform diverse manipulation tasks over various 3D shapes, recent works have advocated and demonstrated promising results learning visual actionable affordance, which labels every point over the input 3D geometry with an action likelihood of accomplishing the downstream task (e.g., pushing or picking-up). However, these works only studied single-gripper manipulation tasks, yet many real-world tasks require two hands to achieve collaboratively. In this work, we propose a novel learning framework, DualAfford, to learn collaborative affordance for dual-gripper manipulation tasks. The core design of the approach is to reduce the quadratic problem for two grippers into two disentangled yet interconnected subtasks for efficient learning. Using the large-scale PartNet-Mobility and ShapeNet datasets, we set up four benchmark tasks for dual-gripper manipulation. Experiments prove the effectiveness and superiority of our method over three baselines.	翻訳日:2023-03-27 18:40:34 公開日:2023-03-24
# EventNeRF: 単一カラーイベントカメラからのニューラル放射場 EventNeRF: Neural Radiance Fields from a Single Colour Event Camera ( http://arxiv.org/abs/2206.11896v3 ) ライセンス: Link先を確認	Viktor Rudnev and Mohamed Elgharib and Christian Theobalt and Vladislav Golyanik	(参考訳) 非同期動作のイベントカメラは、高ダイナミックレンジ、ムラ、低レイテンシ、低データ帯域幅のために多くのアプリケーションを見つける。この分野はここ数年で著しく進歩し、既存のイベントベースの3d再構築アプローチは、シーンの疎点雲を回復した。しかし、このような空間性は、特にコンピュータビジョンやグラフィックスにおいて、これまで十分に対処されていない多くのケースにおいて制限要因である。そこで本研究では,単色イベントストリームのみを入力として3次元一貫性,密度,フォトリアリスティックな新規ビュー合成を提案する。その中核は、カラーイベントチャンネルのオリジナルの解像度を維持しながら、イベントから完全に自己教師された方法で訓練された神経放射場である。次に、レイサンプリング戦略をイベントに合わせて調整し、データ効率の良いトレーニングを可能にします。実験では,RGB空間において前例のない品質で結果が得られた。提案手法は,いくつかの難易度の高い合成シーンと実シーンで定性的かつ数値的に評価し,既存の手法よりもはるかに高密度で視覚的に魅力的であることを示す。また, 高速かつ低照度条件下での挑戦シナリオにおいて, 強靭性を示す。新たに記録されたデータセットとソースコードを公開し、研究フィールドを促進する。 Asynchronously operating event cameras find many applications due to their high dynamic range, vanishingly low motion blur, low latency and low data bandwidth. The field saw remarkable progress during the last few years, and existing event-based 3D reconstruction approaches recover sparse point clouds of the scene. However, such sparsity is a limiting factor in many cases, especially in computer vision and graphics, that has not been addressed satisfactorily so far. Accordingly, this paper proposes the first approach for 3D-consistent, dense and photorealistic novel view synthesis using just a single colour event stream as input. At its core is a neural radiance field trained entirely in a self-supervised manner from events while preserving the original resolution of the colour event channels. Next, our ray sampling strategy is tailored to events and allows for data-efficient training. At test, our method produces results in the RGB space at unprecedented quality. We evaluate our method qualitatively and numerically on several challenging synthetic and real scenes and show that it produces significantly denser and more visually appealing renderings than the existing methods. 
We also demonstrate robustness in challenging scenarios with fast motion and under low lighting conditions. We release the newly recorded dataset and our source code to facilitate the research field, see https://4dqv.mpi-inf.mpg.de/EventNeRF.	翻訳日:2023-03-27 18:40:11 公開日:2023-03-24
# 自動テスト生成への機械学習の統合:システムマッピングによる研究 The Integration of Machine Learning into Automated Test Generation: A Systematic Mapping Study ( http://arxiv.org/abs/2206.10210v4 ) ライセンス: Link先を確認	Afonso Fontes and Gregory Gay	(参考訳) コンテキスト: 機械学習(ML)は効果的な自動テスト生成を可能にする。目的:我々は、新しい研究、テストプラクティス、研究者の目標、適用されたML技術、評価、課題を特徴づけます。方法: 102の出版物のサンプルに対して体系的なマッピングを行う。結果:MLはシステム,GUI,ユニット,パフォーマンス,組合せテストの入力を生成したり,既存の生成メソッドのパフォーマンスを向上する。 MLはまた、テストの検証、プロパティベース、期待される出力オラクルを生成するためにも使用される。監視された学習(ニューラルネットワークと強化学習をベースとすることが多い)は一般的であり、一部の出版物では教師なしあるいは半教師なしの学習も採用されている。 (Semi-/Un-) 従来のテストメトリクスとML関連のメトリクス(例えば精度)の両方を用いて改善されたアプローチを評価する一方、強化学習は報酬関数に関連するテストメトリクスを用いてしばしば評価される。結論: Work-to-dateは素晴らしい将来性を示していますが、トレーニングデータ、リトレーニング、スケーラビリティ、評価の複雑さ、採用するMLアルゴリズム、ベンチマーク、複製性に関するオープンな課題があります。私たちの発見は、この分野の研究者にとってロードマップとインスピレーションとなり得る。 Context: Machine learning (ML) may enable effective automated test generation. Objective: We characterize emerging research, examining testing practices, researcher goals, ML techniques applied, evaluation, and challenges. Methods: We perform a systematic mapping on a sample of 102 publications. Results: ML generates input for system, GUI, unit, performance, and combinatorial testing or improves the performance of existing generation methods. ML is also used to generate test verdicts, property-based, and expected output oracles. Supervised learning - often based on neural networks - and reinforcement learning - often based on Q-learning - are common, and some publications also employ unsupervised or semi-supervised learning. (Semi-/Un-)Supervised approaches are evaluated using both traditional testing metrics and ML-related metrics (e.g., accuracy), while reinforcement learning is often evaluated using testing metrics tied to the reward function. Conclusion: Work-to-date shows great promise, but there are open challenges regarding training data, retraining, scalability, evaluation complexity, ML algorithms employed - and how they are applied - benchmarks, and replicability. 
Our findings can serve as a roadmap and inspiration for researchers in this field.	翻訳日:2023-03-27 18:39:48 公開日:2023-03-24
# 変分オートエンコーダと1級支持ベクトルマシンによる構造損傷の半教師あり検出 Semi-supervised detection of structural damage using Variational Autoencoder and a One-Class Support Vector Machine ( http://arxiv.org/abs/2210.05674v3 ) ライセンス: Link先を確認	Andrea Pollastro, Giusiana Testa, Antonio Bilotta, Roberto Prevete	(参考訳) 近年,構造的ヘルスモニタリング(shm)システムにおいて,ニューラルネットワーク(anns)が導入されている。データ駆動アプローチによる半教師付き手法では、損傷のない構造条件から取得したデータに基づいてANNがトレーニングし、構造的損傷を検出する。標準的なアプローチでは、トレーニング段階の後、決定ルールを手動で定義し、異常なデータを検出する。しかし、このプロセスは、ハイパーパラメータ最適化技術を用いて性能を最大化する機械学習手法を用いて自動で行うことができる。本稿では,構造異常を検出するためのデータ駆動アプローチによる半教師付き手法を提案する。方法論は以下の通りである。 (i)無傷データ分布を近似する変分オートエンコーダ(vae)と (ii)vae信号再構成から抽出した損傷に敏感な特徴を用いて異なる健康状態を判別する一級支援ベクターマシン(oc-svm)。 IASC-ASCE 構造健康モニタリングタスクグループによって9つの損傷シナリオで試験されたスケール鋼構造物に適用した。 In recent years, Artificial Neural Networks (ANNs) have been introduced in Structural Health Monitoring (SHM) systems. A semi-supervised method with a data-driven approach allows the ANN to be trained on data acquired from an undamaged structural condition in order to detect structural damage. In standard approaches, after the training stage, a decision rule is manually defined to detect anomalous data. However, this process could be made automatic using machine learning methods, whose performance is maximised using hyperparameter optimization techniques. The paper proposes a semi-supervised method with a data-driven approach to detect structural anomalies. The methodology consists of: (i) a Variational Autoencoder (VAE) to approximate the undamaged data distribution and (ii) a One-Class Support Vector Machine (OC-SVM) to discriminate different health conditions using damage-sensitive features extracted from the VAE's signal reconstruction. The method is applied to a scale steel structure that was tested in nine damage scenarios by the IASC-ASCE Structural Health Monitoring Task Group.	翻訳日:2023-03-27 18:32:44 公開日:2023-03-24
# 量子6-および19-頂点モデルからのFredkinとMotzkinの結合鎖 Coupled Fredkin and Motzkin chains from quantum six- and nineteen-vertex models ( http://arxiv.org/abs/2210.03038v3 ) ライセンス: Link先を確認	Zhao Zhang, Israel Klich	(参考訳) 我々は、フレドキンとモツキンのスピン鎖の領域法則違反モデルを2次元に一般化し、相関相互作用を持つ量子6頂点モデルと19頂点モデルを構築する。ハミルトニアンはフラストレーションが無く、そのプロジェクタは非負な高さ構成の部分空間内でエルゴード力学を生成する。基底状態は、バルクの非負の高さと境界のゼロの高さを持つ古典的な二色頂点配置の体積および色重み付き重ね合わせである。サブシステム間の絡み合いエントロピーは、$q$-deformationパラメータがチューニングされるにつれて位相遷移を持ち、自由度に作用する外部フィールドの存在下ではロバストであることが示されている。基底状態は、線形系サイズ $l$ の関数 $l\log l$ として絡み合いエントロピーがスケールする臨界点を持つ領域および体積則エンタングルメント位相の間の量子位相遷移を受ける。 L^2$ と $L^2$ の間の中間電力法則スケーリングは、熱力学限界の異なる速度で 1 に近づく不均一な変形パラメータによって達成できる。 q>1$相の場合、スペクトルギャップ上の上限を$q^{-L^3/8}$に設定する変動波動関数を構築する。 We generalize the area-law violating models of Fredkin and Motzkin spin chains into two dimensions by building quantum six- and nineteen-vertex models with correlated interactions. The Hamiltonian is frustration free, and its projectors generate ergodic dynamics within the subspace of height configuration that are non negative. The ground state is a volume- and color-weighted superposition of classical bi-color vertex configurations with non-negative heights in the bulk and zero height on the boundary. The entanglement entropy between subsystems has a phase transition as the $q$-deformation parameter is tuned, which is shown to be robust in the presence of an external field acting on the color degree of freedom. The ground state undergoes a quantum phase transition between area- and volume-law entanglement phases with a critical point where entanglement entropy scales as a function $L\log L$ of the linear system size $L$. Intermediate power law scalings between $L\log L$ and $L^2$ can be achieved with an inhomogeneous deformation parameter that approaches 1 at different rates in the thermodynamic limit. For the $q>1$ phase, we construct a variational wave function that establishes an upper bound on the spectral gap that scales as $q^{-L^3/8}$.	
翻訳日:2023-03-27 18:32:28 公開日:2023-03-24
# Inverse Unscented Kalman Filter による対逆学習 Counter-Adversarial Learning with Inverse Unscented Kalman Filter ( http://arxiv.org/abs/2210.00359v2 ) ライセンス: Link先を確認	Himali Singh, Kumar Vijay Mishra and Arpan Chattopadhyay	(参考訳) 対戦相手システムでは、知的敵エージェントの戦略を推測するために、防御エージェントは、相手が後者について集めた情報を認知的に知覚する必要がある。この問題の先行研究は、線形ガウス状態空間モデルを採用し、逆確率フィルタを設計することで、この逆認知問題を解決する。しかし、実際には対向系は一般的に高度に非線形である。本稿では,逆認識を非線形ガウス状態空間モデルとして定式化することにより,このシナリオに対処する。敵のディフェンダーの推定値を推定するために、逆ukf(iukf)システムを提案し、開発する。次に、平均二乗有界性感覚におけるIUKFの確率安定性に関する理論的保証を導出する。複数の実用的応用に対する数値実験により、iukfの推定誤差が収束し、再帰的 cramér-rao 下界に密接に従うことが示されている。 In counter-adversarial systems, to infer the strategy of an intelligent adversarial agent, the defender agent needs to cognitively sense the information that the adversary has gathered about the latter. Prior works on the problem employ linear Gaussian state-space models and solve this inverse cognition problem by designing inverse stochastic filters. However, in practice, counter-adversarial systems are generally highly nonlinear. In this paper, we address this scenario by formulating inverse cognition as a nonlinear Gaussian state-space model, wherein the adversary employs an unscented Kalman filter (UKF) to estimate the defender's state with reduced linearization errors. To estimate the adversary's estimate of the defender, we propose and develop an inverse UKF (IUKF) system. We then derive theoretical guarantees for the stochastic stability of IUKF in the mean-squared boundedness sense. Numerical experiments for multiple practical applications show that the estimation error of IUKF converges and closely follows the recursive Cramér-Rao lower bound.	翻訳日:2023-03-27 18:31:33 公開日:2023-03-24
# 断熱量子回路におけるノイズ・ディテールトレードオフの探索 Navigating the noise-depth tradeoff in adiabatic quantum circuits ( http://arxiv.org/abs/2209.11245v2 ) ライセンス: Link先を確認	Daniel Azses, Maxime Dupont, Bram Evert, Matthew J. Reagor, Emanuele G. Dalla Torre	(参考訳) 断熱量子アルゴリズムは、所望の解に自明な状態をゆっくりと発展させることで計算問題を解決する。理想的な量子コンピュータでは、解の質は回路深さの増加とともに単調に向上する。対照的に、現在のノイズの多いコンピュータの深さの増加はより多くのノイズをもたらし、最終的には計算上の優位性を損なう。最善のソリューションを提供する最適な回路深度は何か? ここでは、1次元量子イジングモデルの常磁性と強磁性の基底状態の間を補間する断熱回路を調査してこの問題に対処する。我々は、回路深さ$N$と雑音強度$\sigma$の関数として、欠陥密度$d$によって最終的な出力の品質を特徴づける。 d$ は単純形式 $d_\mathrm{ideal}+d_\mathrm{noise}$ でよく記述されており、理想的な場合 $d_\mathrm{ideal}\sim N^{-1/2}$ は Kibble-Zurek 機構によって制御され、ノイズコントリビューションは $d_\mathrm{noise}\sim N\sigma^2$ となる。欠陥の数を最小化する最適なステップ数は$\sim\sigma^{-4/3}$となる。このアルゴリズムを雑音超伝導量子プロセッサに実装し,回路の深さに対する欠陥密度の依存性が予測される非単調な挙動に従い,ノイズシミュレーションとよく一致することを示す。我々の研究により、量子デバイスを効率的にベンチマークし、その効果的なノイズ強度を抽出できる。 Adiabatic quantum algorithms solve computational problems by slowly evolving a trivial state to the desired solution. On an ideal quantum computer, the solution quality improves monotonically with increasing circuit depth. By contrast, increasing the depth in current noisy computers introduces more noise and eventually deteriorates any computational advantage. What is the optimal circuit depth that provides the best solution? Here, we address this question by investigating an adiabatic circuit that interpolates between the paramagnetic and ferromagnetic ground states of the one-dimensional quantum Ising model. We characterize the quality of the final output by the density of defects $d$, as a function of the circuit depth $N$ and noise strength $\sigma$. We find that $d$ is well-described by the simple form $d_\mathrm{ideal}+d_\mathrm{noise}$, where the ideal case $d_\mathrm{ideal}\sim N^{-1/2}$ is controlled by the Kibble-Zurek mechanism, and the noise contribution scales as $d_\mathrm{noise}\sim N\sigma^2$. 
It follows that the optimal number of steps minimizing the number of defects goes as $\sim\sigma^{-4/3}$. We implement this algorithm on a noisy superconducting quantum processor and find that the dependence of the density of defects on the circuit depth follows the predicted non-monotonous behavior and agrees well with noisy simulations. Our work allows one to efficiently benchmark quantum devices and extract their effective noise strength $\sigma$.	翻訳日:2023-03-27 18:30:56 公開日:2023-03-24
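The $\sigma^{-4/3}$ scaling quoted above follows from minimizing the two-term form. Writing $d(N) = a N^{-1/2} + b N \sigma^2$ with unknown prefactors $a$ and $b$ (both are assumptions here; the abstract gives only the scalings), setting $\mathrm{d}d/\mathrm{d}N = 0$ gives $N^* = (a / (2 b \sigma^2))^{2/3} \propto \sigma^{-4/3}$. A small numerical check of that step, with hypothetical function names:

```python
def defect_density(n_steps, sigma, a=1.0, b=1.0):
    """Two-term model: Kibble-Zurek term a*N^(-1/2) plus noise term b*N*sigma^2.

    The prefactors a and b are placeholders, not values from the paper.
    """
    return a * n_steps ** -0.5 + b * n_steps * sigma ** 2

def optimal_depth(sigma, a=1.0, b=1.0):
    """Stationary point of defect_density with respect to n_steps."""
    return (a / (2.0 * b * sigma ** 2)) ** (2.0 / 3.0)

# Brute-force search over integer depths for comparison with the closed form.
best_integer_depth = min(range(1, 2000), key=lambda n: defect_density(n, 0.01))
closed_form_depth = optimal_depth(0.01)
```

Halving the noise strength multiplies the optimal depth by $2^{4/3} \approx 2.52$, independent of the prefactors.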
# トラクタブル確率モデルの連続混合 Continuous Mixtures of Tractable Probabilistic Models ( http://arxiv.org/abs/2209.10584v3 ) ライセンス: Link先を確認	Alvaro H.C. Correia, Gennaro Gala, Erik Quaeghebeur, Cassio de Campos, Robert Peharz	(参考訳) 変分オートエンコーダのような連続的潜在空間に基づく確率的モデルは、コンポーネントが潜在コードに依存する可算混合モデルとして理解することができる。これらは生成的および確率的モデリングの表現的ツールであることが証明されているが、確率分布を表す辺数や条件を計算できる確率的推論とは相反している。一方、確率的回路(pcs)のような扱いやすい確率的モデルは、階層的離散混合モデルとして理解することができ、正確な推論を効率的に行うことができるが、連続的潜在空間モデルと比較してサブパー性能を示すことが多い。本稿では,少ない潜在次元のトラクタブルモデルの連続混合というハイブリッドアプローチについて検討する。これらのモデルは解析的に難解であるが、有限の積分点集合に基づく数値積分スキームによく対応できる。十分な数の統合ポイントがあれば、近似は正確にデファクトになる。さらに、有限の積分点に対して、積分法は、連続混合を標準PCに効率的にコンパイルする。実験では、PCが多くの標準密度推定ベンチマーク上で、トラクタブルモデルのための新しい技術の状態を設定できるので、この単純なスキームは極めて効果的であることを示す。 Probabilistic models based on continuous latent spaces, such as variational autoencoders, can be understood as uncountable mixture models where components depend continuously on the latent code. They have proven to be expressive tools for generative and probabilistic modelling, but are at odds with tractable probabilistic inference, that is, computing marginals and conditionals of the represented probability distribution. Meanwhile, tractable probabilistic models such as probabilistic circuits (PCs) can be understood as hierarchical discrete mixture models, and thus are capable of performing exact inference efficiently but often show subpar performance in comparison to continuous latent-space models. In this paper, we investigate a hybrid approach, namely continuous mixtures of tractable models with a small latent dimension. While these models are analytically intractable, they are well amenable to numerical integration schemes based on a finite set of integration points. With a large enough number of integration points the approximation becomes de-facto exact. Moreover, for a finite set of integration points, the integration method effectively compiles the continuous mixture into a standard PC. 
In experiments, we show that this simple scheme proves remarkably effective, as PCs learnt this way set new state of the art for tractable models on many standard density estimation benchmarks.	翻訳日:2023-03-27 18:30:25 公開日:2023-03-24
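The idea of turning a continuous mixture into a finite one by numerical integration can be illustrated with a toy case. The sketch below assumes a one-dimensional latent $z \sim \mathcal{N}(0, 1)$ and Gaussian components $p(x \mid z) = \mathcal{N}(x; z, 1)$, so the exact marginal is $\mathcal{N}(0, 2)$ and the approximation can be checked. The Gaussian components stand in for the tractable models of the paper, and the midpoint rule, grid limits, and function names are all illustrative assumptions rather than the authors' scheme.

```python
import math

def normal_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def compile_to_finite_mixture(num_points, z_min=-6.0, z_max=6.0):
    """Midpoint-rule integration points and normalized weights for z ~ N(0, 1)."""
    step = (z_max - z_min) / num_points
    zs = [z_min + (k + 0.5) * step for k in range(num_points)]
    raw = [normal_pdf(z, 0.0, 1.0) * step for z in zs]
    total = sum(raw)
    return zs, [w / total for w in raw]

def mixture_density(x, zs, weights):
    """Density of the finite mixture: one component per integration point."""
    return sum(w * normal_pdf(x, z, 1.0) for z, w in zip(zs, weights))

zs, ws = compile_to_finite_mixture(64)
approx = mixture_density(0.7, zs, ws)
exact = normal_pdf(0.7, 0.0, 2.0)
```

Once the integration points are fixed, the result is an ordinary discrete mixture, which is why marginals and conditionals remain exactly computable for the compiled model.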
# スマートCCTVカメラとセマンティックセグメンテーションを用いたCNNによる知的街灯管理 CNN based Intelligent Streetlight Management Using Smart CCTV Camera and Semantic Segmentation ( http://arxiv.org/abs/2209.08633v2 ) ライセンス: Link先を確認	Md Sakib Ullah Sourav, Huidong Wang, Mohammad Raziuddin Chowdhury, Rejwan Bin Sulaiman	(参考訳) 最も無視されたエネルギー損失の源の1つは街灯であり、不要な地域であまりにも多くの光を発生させる。エネルギーの浪費は経済と環境に大きな影響を及ぼす。また、従来の手動運用のため、昼間に街灯が点灯(ON)し、夕方に消灯(OFF)していることがしばしば見られるが、これは21世紀になっても残念なことである。これらの問題を解決するには自動街灯制御が必要である。本研究の目的は,コンピュータビジョン技術を利用したスマートトランスポート監視システムと閉回路テレビ(CCTV)カメラを組み合わせ,CCTVビデオストリームからのセマンティックイメージセグメンテーションを用いて歩行者や車両の存在を検知して発光ダイオード(LED)街灯を適切な明るさで自動点灯させ,不在時には減光する新しい街灯制御手法を開発することにある。その結果、我々のモデルは昼と夜を区別し、街灯のオン/オフを自動化してエネルギー消費コストを削減することが可能となった。前述のアプローチによると、位置情報センサーデータは、よりインフォームドな街灯管理の決定に利用することができる。タスクを完了させるために、ResNet-34をバックボーンとしてU-netモデルをトレーニングすることを検討する。モデルの有効性は評価行列の使用によって保証される。提案された概念は、単純で、経済的で、エネルギー効率が良く、長持ちし、従来の代替案よりも弾力性が高い。 One of the most neglected sources of energy loss is streetlights which generate too much light in areas where it is not required. Energy waste has enormous economic and environmental effects. In addition, due to the conventional manual nature of the operation, streetlights are frequently seen being turned ON during the day and OFF in the evening, which is regrettable even in the twenty-first century. These issues require automated streetlight control in order to be resolved. This study aims to develop a novel streetlight controlling method by combining a smart transport monitoring system powered by computer vision technology with a closed circuit television (CCTV) camera that allows the light-emitting diode (LED) streetlight to automatically light up with the appropriate brightness by detecting the presence of pedestrians or vehicles and dimming the streetlight in their absence using semantic image segmentation from the CCTV video streaming. Consequently, our model distinguishes daylight and nighttime, which made it feasible to automate the process of turning the streetlight 'ON' and 'OFF' to save energy consumption costs. 
According to the aforementioned approach, geolocation sensor data could be utilized to make more informed streetlight management decisions. To complete the tasks, we consider training the U-net model with ResNet-34 as its backbone. The validity of the models is guaranteed with the use of assessment matrices. The suggested concept is straightforward, economical, energy-efficient, long-lasting, and more resilient than conventional alternatives.	翻訳日:2023-03-27 18:30:07 公開日:2023-03-24
# OCTET: オブジェクトを考慮した反事実的説明 OCTET: Object-aware Counterfactual Explanations ( http://arxiv.org/abs/2211.12380v2 ) ライセンス: Link先を確認	Mehdi Zemni, Mickaël Chen, Éloi Zablocki, Hédi Ben-Younes, Patrick Pérez, Matthieu Cord	(参考訳) 近年、ディープビジョンモデルは、例えば自律運転のような安全クリティカルなアプリケーションに広くデプロイされ、そのようなモデルの説明可能性への懸念が高まっている。説明方法のうち、反事実的説明は、説明すべきモデルの出力を変更する入力画像の最小かつ解釈可能な変更を見つけることを目的としている。このような説明は、エンドユーザーに、モデルの決定に影響を及ぼす主要な要因を指し示す。しかし、従来の手法では、例えば都市シーンのような、多くのオブジェクトを含む画像で訓練された決定モデルを説明するのに苦労していた。本稿では,反事実的説明生成のためのオブジェクト中心フレームワークを用いてこの問題に取り組むことを提案する。近年のジェネレーティブ・モデリングに触発された本手法では,オブジェクトレベルの操作を容易にするように構造化された潜在空間にクエリ画像を符号化する。これにより、反事実生成中にどの探索方向(例えば、オブジェクトの空間的変位、スタイル変更など)を探索するかをエンドユーザーが制御できる。運転シーンの反事実的説明ベンチマークに関する一連の実験を行い,提案手法が,セマンティックセグメンテーションモデルの説明など,分類以外にも適用可能であることを示す。分析を完了させるために,意思決定モデル理解における反事実的説明の有用性を計測するユーザスタディを設計・実施する。コードはhttps://github.com/valeoai/OCTETで入手できる。 Nowadays, deep vision models are being widely deployed in safety-critical applications, e.g., autonomous driving, and explainability of such models is becoming a pressing concern. Among explanation methods, counterfactual explanations aim to find minimal and interpretable changes to the input image that would also change the output of the model to be explained. Such explanations point end-users at the main factors that impact the decision of the model. However, previous methods struggle to explain decision models trained on images with many objects, e.g., urban scenes, which are more difficult to work with but also arguably more critical to explain. In this work, we propose to tackle this issue with an object-centric framework for counterfactual explanation generation. Our method, inspired by recent generative modeling works, encodes the query image into a latent space that is structured in a way to ease object-level manipulations. Doing so, it provides the end-user with control over which search directions (e.g., spatial displacement of objects, style modification, etc.) 
are to be explored during the counterfactual generation. We conduct a set of experiments on counterfactual explanation benchmarks for driving scenes, and we show that our method can be adapted beyond classification, e.g., to explain semantic segmentation models. To complete our analysis, we design and run a user study that measures the usefulness of counterfactual explanations in understanding a decision model. Code is available at https://github.com/valeoai/OCTET.	翻訳日:2023-03-27 18:24:25 公開日:2023-03-24
# SPARF:スパースと雑音場からの神経放射場 SPARF: Neural Radiance Fields from Sparse and Noisy Poses ( http://arxiv.org/abs/2211.11738v2 ) ライセンス: Link先を確認	Prune Truong and Marie-Julie Rakotosaona and Fabian Manhardt and Federico Tombari	(参考訳) ニューラル・ラジアンス・フィールド(NeRF)は近年,フォトリアリスティック・ノベルビューを合成するための強力な表現として登場した。印象的なパフォーマンスを示す一方で、高い精度のカメラポーズを備えた高密度のインプットビューの可用性に依存しているため、実際のシナリオでの応用は制限される。本研究ではSPARF(Sparse Pose Adjusting Radiance Field)を導入し,ノイズの多いカメラポーズを付加した広帯域入力画像(以下3以下)の新規ビュー合成の課題に対処する。本手法では,多視点幾何制約を生かしてnerfを学習し,カメラポーズを洗練する。入力ビュー間で抽出された画素マッチングを頼りにすることで、多視点対応の目的は最適化シーンを強制し、カメラのポーズをグローバルかつ幾何学的に正確な解に収束させる。私たちの奥行きの一貫性の喪失は、再構築されたシーンをあらゆる視点から一貫することをさらに促します。われわれのアプローチは、複数の挑戦的なデータセットに基づいてスパースビュー体制における新しい技術状況を設定する。 Neural Radiance Field (NeRF) has recently emerged as a powerful representation to synthesize photorealistic novel views. While showing impressive performance, it relies on the availability of dense input views with highly accurate camera poses, thus limiting its application in real-world scenarios. In this work, we introduce Sparse Pose Adjusting Radiance Field (SPARF), to address the challenge of novel-view synthesis given only few wide-baseline input images (as low as 3) with noisy camera poses. Our approach exploits multi-view geometry constraints in order to jointly learn the NeRF and refine the camera poses. By relying on pixel matches extracted between the input views, our multi-view correspondence objective enforces the optimized scene and camera poses to converge to a global and geometrically accurate solution. Our depth consistency loss further encourages the reconstructed scene to be consistent from any viewpoint. Our approach sets a new state of the art in the sparse-view regime on multiple challenging datasets.	翻訳日:2023-03-27 18:24:00 公開日:2023-03-24
# 視覚言語モデルのチューニングのためのタスク残差 Task Residual for Tuning Vision-Language Models ( http://arxiv.org/abs/2211.10277v2 ) ライセンス: Link先を確認	Tao Yu, Zhihe Lu, Xin Jin, Zhibo Chen, Xinchao Wang	(参考訳) 数十億レベルのデータに事前訓練された大規模視覚言語モデル(VLM)は、一般的な視覚表現と広い視覚概念を学んだ。原則として、VLMの知識構造は、限られたデータで下流タスクに転送される際に適切に継承されるべきである。しかしながら、VLMの既存の効率的な転写学習(ETL)アプローチは、損傷するか、事前知識に過度に偏っている。例えば、即時チューニング(PT)は、事前訓練されたテキストベースの分類器を捨て、新しいものを構築する。そこで本研究では,テキストベース分類器上で直接動作し,事前学習したモデルの事前知識と目標タスクに関する新たな知識を明示的に分離するタスク残差調整(TaskRes)という,VLMの効率的なチューニング手法を提案する。具体的には、TaskResは、元の分類器の重みをVLMから凍結させ、初期独立パラメータのセットを元のパラメータの残余としてチューニングすることで、目標タスクの新しい分類器を取得し、信頼性の高い事前知識保存と柔軟なタスク固有の知識探索を可能にする。提案するtaskresは単純かつ効果的であり、実装に最小限の労力を要しながら、11のベンチマークデータセットで以前のetlメソッド(例えばptとat)を著しく上回っている。私たちのコードはhttps://github.com/geekyutao/taskresで利用可能です。 Large-scale vision-language models (VLMs) pre-trained on billion-level data have learned general visual representations and broad visual concepts. In principle, the well-learned knowledge structure of the VLMs should be inherited appropriately when being transferred to downstream tasks with limited data. However, most existing efficient transfer learning (ETL) approaches for VLMs either damage or are excessively biased towards the prior knowledge, e.g., prompt tuning (PT) discards the pre-trained text-based classifier and builds a new one while adapter-style tuning (AT) fully relies on the pre-trained features. To address this, we propose a new efficient tuning approach for VLMs named Task Residual Tuning (TaskRes), which performs directly on the text-based classifier and explicitly decouples the prior knowledge of the pre-trained models and new knowledge regarding a target task. Specifically, TaskRes keeps the original classifier weights from the VLMs frozen and obtains a new classifier for the target task by tuning a set of prior-independent parameters as a residual to the original one, which enables reliable prior knowledge preservation and flexible task-specific knowledge exploration. 
The proposed TaskRes is simple yet effective, which significantly outperforms previous ETL methods (e.g., PT and AT) on 11 benchmark datasets while requiring minimal effort for the implementation. Our code is available at https://github.com/geekyutao/TaskRes.	翻訳日:2023-03-27 18:23:40 公開日:2023-03-24
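The residual idea described in the abstract above can be sketched in a few lines: the pre-trained text-based classifier weights stay frozen, and only an additive residual of the same shape is tuned. This is a minimal plain-Python illustration, not the authors' implementation; the scaling factor `alpha`, the function names, and the toy numbers are assumptions.

```python
def taskres_classifier(base_weights, residual, alpha=0.5):
    """Return new classifier weights: frozen base plus a scaled, tunable residual.

    base_weights and residual are lists of per-class weight vectors.
    alpha is an illustrative scaling factor, not a value from the paper.
    """
    return [
        [w + alpha * r for w, r in zip(w_row, r_row)]
        for w_row, r_row in zip(base_weights, residual)
    ]

def logits(features, classifier):
    """Dot product of one feature vector with each class weight vector."""
    return [sum(f * w for f, w in zip(features, row)) for row in classifier]

base = [[1.0, 0.0], [0.0, 1.0]]        # stands in for frozen text embeddings
zero_residual = [[0.0, 0.0], [0.0, 0.0]]  # residual at initialization
tuned_residual = [[0.0, 0.4], [0.2, 0.0]]

initial_logits = logits([0.6, 0.8], taskres_classifier(base, zero_residual))
tuned_logits = logits([0.6, 0.8], taskres_classifier(base, tuned_residual))
```

With a zero residual the classifier reproduces the pre-trained predictions exactly, so the prior knowledge is the starting point and the residual carries only what the target task adds.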
# ビデオインスタンスセグメンテーションのための一般化フレームワーク A Generalized Framework for Video Instance Segmentation ( http://arxiv.org/abs/2211.08834v2 ) ライセンス: Link先を確認	Miran Heo, Sukjun Hwang, Jeongseok Hyun, Hanjung Kim, Seoung Wug Oh, Joon-Young Lee, Seon Joo Kim	(参考訳) 近年,ビデオインスタンスセグメンテーション(VIS)コミュニティにおいて,複雑なシーケンスと隠蔽シーケンスによる長いビデオの処理が新たな課題として浮上している。しかし、既存の手法はこの課題に対処するのに限界がある。現在のアプローチの最大のボトルネックは、トレーニングと推論の相違にある、と私たちは主張する。このギャップを効果的に埋めるために、複雑なアーキテクチャを設計したり、余分な後処理を必要とせずに、挑戦的なベンチマークで最先端のパフォーマンスを実現する、VISの汎用フレームワーク、すなわちGenVISを提案する。 GenVISの重要なコントリビューションは、新しいターゲットラベル割り当てによるシーケンシャルラーニングのためのクエリベースのトレーニングパイプラインを含む、学習戦略である。さらに,従来の状態から情報を効果的に取得するメモリを導入する。異なるフレームやクリップ間の関係を構築することに焦点を当てた新しい視点のおかげで、GenVISはオンラインと半オンラインの両方で柔軟に実行できる。提案手法は,YouTube-VIS 2019/2021/2022とOccluded VIS (OVIS) で最先端の結果を得られる。特に、ロングVISベンチマーク(OVIS)の最先端性能を大きく上回り、ResNet-50のバックボーンで5.6 APを改善した。コードはhttps://github.com/miranheo/GenVISで入手できる。 The handling of long videos with complex and occluded sequences has recently emerged as a new challenge in the video instance segmentation (VIS) community. However, existing methods have limitations in addressing this challenge. We argue that the biggest bottleneck in current approaches is the discrepancy between training and inference. To effectively bridge this gap, we propose a Generalized framework for VIS, namely GenVIS, that achieves state-of-the-art performance on challenging benchmarks without designing complicated architectures or requiring extra post-processing. The key contribution of GenVIS is the learning strategy, which includes a query-based training pipeline for sequential learning with a novel target label assignment. Additionally, we introduce a memory that effectively acquires information from previous states. Thanks to the new perspective, which focuses on building relationships between separate frames or clips, GenVIS can be flexibly executed in both online and semi-online manner. 
We evaluate our approach on popular VIS benchmarks, achieving state-of-the-art results on YouTube-VIS 2019/2021/2022 and Occluded VIS (OVIS). Notably, we greatly outperform the state-of-the-art on the long VIS benchmark (OVIS), improving 5.6 AP with ResNet-50 backbone. Code is available at https://github.com/miranheo/GenVIS.	翻訳日:2023-03-27 18:23:15 公開日:2023-03-24
# FAPM: リアルタイム産業異常検出のための高速適応パッチメモリ FAPM: Fast Adaptive Patch Memory for Real-time Industrial Anomaly Detection ( http://arxiv.org/abs/2211.07381v2 ) ライセンス: Link先を確認	Donghyeong Kim, Chaewon Park, Suhwan Cho and Sangyoun Lee	(参考訳) 特徴埋め込みに基づく手法は, 対象画像の特徴と正常画像の特徴とを比較することで, 産業異常の検出において, 例外的な性能を示した。しかし,いくつかの手法は実世界のアプリケーションにとって重要なリアルタイム推論の速度要件を満たしていない。そこで本研究では,リアルタイム産業異常検出のための高速適応パッチメモリ(Fast Adaptive Patch Memory, FAPM)という新しい手法を提案する。 FAPMはパッチワイドとレイヤワイドのメモリバンクを使用して,画像の埋め込み特徴をそれぞれパッチレベルとレイヤレベルで格納し,不要な繰り返し計算を排除する。また,より高速かつ正確な検出のためのパッチワイド適応コアセットサンプリングを提案する。 FAPMは、他の最先端手法と比較して精度と速度の両方で良好に機能する。 Feature embedding-based methods have shown exceptional performance in detecting industrial anomalies by comparing features of target images with normal images. However, some methods do not meet the speed requirements of real-time inference, which is crucial for real-world applications. To address this issue, we propose a new method called Fast Adaptive Patch Memory (FAPM) for real-time industrial anomaly detection. FAPM utilizes patch-wise and layer-wise memory banks that store the embedding features of images at the patch and layer level, respectively, which eliminates unnecessary repetitive computations. We also propose patch-wise adaptive coreset sampling for faster and more accurate detection. FAPM performs well in both accuracy and speed compared to other state-of-the-art methods.	翻訳日:2023-03-27 18:22:53 公開日:2023-03-24
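Memory-bank anomaly detectors of the kind described above usually shrink the stored normal features to a small coreset and score a test feature by its distance to the nearest stored one. The sketch below shows generic greedy farthest-point (k-center) coreset selection as a stand-in; it is not the patch-wise adaptive sampling proposed in the paper, and all names and the toy points are hypothetical.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def greedy_coreset(points, k):
    """Pick k indices by repeatedly taking the point farthest from the selection."""
    selected = [0]  # start from the first point (an arbitrary choice)
    min_dist = [sq_dist(p, points[0]) for p in points]
    while len(selected) < k:
        idx = max(range(len(points)), key=lambda i: min_dist[i])
        selected.append(idx)
        min_dist = [min(d, sq_dist(p, points[idx])) for d, p in zip(min_dist, points)]
    return selected

def anomaly_score(feature, memory):
    """Squared distance from a test feature to its nearest memory entry."""
    return min(sq_dist(feature, m) for m in memory)

features = [(0.0, 0.0), (0.1, 0.0), (10.0, 0.0), (10.1, 0.0), (5.0, 5.0)]
coreset_indices = greedy_coreset(features, 3)
memory = [features[i] for i in coreset_indices]
```

A smaller memory means fewer distance computations per test patch, which is where the speed gain of coreset-based methods comes from; the trade-off is coverage of the normal feature distribution.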
# ParticleNeRF: オンラインNeural Radiance Fieldsのための粒子ベースのエンコーディング ParticleNeRF: A Particle-Based Encoding for Online Neural Radiance Fields ( http://arxiv.org/abs/2211.04041v4 ) ライセンス: Link先を確認	Jad Abou-Chakra, Feras Dayoub, Niko Sünderhauf	(参考訳) 動的シーンに対する既存のNeural Radiance Fields(NeRF)は、視覚的忠実度を重視したオフライン手法であるが、本稿は、リアルタイム適応性を優先するオンラインユースケースに対処する。我々は200ミリ秒毎に最新の表現をオンラインで学習することで、シーン形状の変化に動的に適応する新しいアプローチであるParticleNeRFを提案する。 ParticleNeRFは、新しい粒子ベースのパラメトリック符号化を用いてこれを実現する。我々は,空間内の粒子に特徴を結合し,測光再構成損失を粒子の位置勾配にバックプロパゲートし,それを速度ベクトルとして解釈する。衝突を扱う軽量な物理システムに支配されることで、特徴はシーン形状の変化とともに自由に移動できる。本研究では, 並進, 回転, 関節運動, 変形する物体を含む様々な動的シーンでParticleNeRFを実演する。 ParticleNeRFは初めてのオンライン動的NeRFであり、オンライン制約のある動的シーンにおいて、ブルートフォースなオンラインInstantNGPや他のベースラインアプローチよりも優れた視覚的忠実度で高速な適応性を実現する。私たちのシステムのビデオは、プロジェクトのWebサイト https://sites.google.com/view/particlenerf で見ることができる。 While existing Neural Radiance Fields (NeRFs) for dynamic scenes are offline methods with an emphasis on visual fidelity, our paper addresses the online use case that prioritises real-time adaptability. We present ParticleNeRF, a new approach that dynamically adapts to changes in the scene geometry by learning an up-to-date representation online, every 200ms. ParticleNeRF achieves this using a novel particle-based parametric encoding. We couple features to particles in space and backpropagate the photometric reconstruction loss into the particles' position gradients, which are then interpreted as velocity vectors. Governed by a lightweight physics system to handle collisions, this lets the features move freely with the changing scene geometry. We demonstrate ParticleNeRF on various dynamic scenes containing translating, rotating, articulated, and deformable objects. ParticleNeRF is the first online dynamic NeRF and achieves fast adaptability with better visual fidelity than brute-force online InstantNGP and other baseline approaches on dynamic scenes with online constraints. 
Videos of our system can be found at our project website https://sites.google.com/view/particlenerf.	翻訳日:2023-03-27 18:22:42 公開日:2023-03-24
# 言葉なし学習によるオープン語彙テキスト・トゥ・モーション生成 Being Comes from Not-being: Open-vocabulary Text-to-Motion Generation with Wordless Training ( http://arxiv.org/abs/2210.15929v3 ) ライセンス: Link先を確認	Junfan Lin, Jianlong Chang, Lingbo Liu, Guanbin Li, Liang Lin, Qi Tian, Chang Wen Chen	(参考訳) テキストから動きへの生成は、入力テキストと同じ意味で動きを合成することを目的とした、新しくて困難な問題である。しかしながら、多種多様なラベル付きトレーニングデータがないため、ほとんどのアプローチは特定のタイプのテキストアノテーションに制限するか、効率と安定性の犠牲で推論中のテキストに対応するためにオンライン最適化を必要とする。本稿では,ゼロショット学習方式でオフラインのオープン語彙テキスト・トゥ・モーション生成を検証し,ペアトレーニングデータや,見当たらないテキストに適応するための追加のオンライン最適化を必要としない。 NLPの即時学習にインスパイアされ、マスクされた動きから全動作を再構築する動き生成装置を事前訓練する。推論中,動作生成装置を変更する代わりに,動作生成装置が動作を「再構成」するプロンプトとして入力テキストをマスクされた動作に再構成する。プロンプトを構築する際、プロンプトの未マストポーズをテキスト対ポス発生器で合成する。テキスト対ポーズ生成器の最適化を監督するために,テキストと3dポーズのアライメントを測定するための最初のテキスト対ポーズアライメントモデルを提案する。また、ポーズ生成器が限られたトレーニングテキストに過度に適合することを防止するため、トレーニングテキストを必要とせず、テキスト対ポーズ生成器を最適化する新しいワードレストレーニング機構を提案する。総合実験の結果,本手法はベースライン法に対して有意な改善が得られた。コードはhttps://github.com/junfanlin/oohmgで入手できる。 Text-to-motion generation is an emerging and challenging problem, which aims to synthesize motion with the same semantics as the input text. However, due to the lack of diverse labeled training data, most approaches either limit to specific types of text annotations or require online optimizations to cater to the texts during inference at the cost of efficiency and stability. In this paper, we investigate offline open-vocabulary text-to-motion generation in a zero-shot learning manner that neither requires paired training data nor extra online optimization to adapt for unseen texts. Inspired by the prompt learning in NLP, we pretrain a motion generator that learns to reconstruct the full motion from the masked motion. During inference, instead of changing the motion generator, our method reformulates the input text into a masked motion as the prompt for the motion generator to ``reconstruct'' the motion. In constructing the prompt, the unmasked poses of the prompt are synthesized by a text-to-pose generator. 
To supervise the optimization of the text-to-pose generator, we propose the first text-pose alignment model for measuring the alignment between texts and 3D poses. And to prevent the pose generator from overfitting to limited training texts, we further propose a novel wordless training mechanism that optimizes the text-to-pose generator without any training texts. The comprehensive experimental results show that our method obtains a significant improvement against the baseline methods. The code is available at https://github.com/junfanlin/oohmg.	翻訳日:2023-03-27 18:22:22 公開日:2023-03-24
# 破壊的ニューラルスケーリング法則 Broken Neural Scaling Laws ( http://arxiv.org/abs/2210.14891v9 ) ライセンス: Link先を確認	Ethan Caballero, Kshitij Gupta, Irina Rish, David Krueger	(参考訳) We present a smoothly broken power law functional form (referred to by us as a Broken Neural Scaling Law (BNSL)) that accurately models and extrapolates the scaling behaviors of deep neural networks (i.e. how the evaluation metric of interest varies as the amount of compute used for training, number of model parameters, training dataset size, model input size, number of training steps, or upstream performance varies) for various architectures and for each of various tasks within a large and diverse set of upstream and downstream tasks, in zero-shot, prompted, and fine-tuned settings. This set includes large-scale vision, language, audio, video, diffusion, generative modeling, multimodal learning, contrastive learning, AI alignment, robotics, out-of-distribution (OOD) generalization, continual learning, transfer learning, uncertainty estimation / calibration, out-of-distribution detection, adversarial robustness, distillation, sparsity, retrieval, quantization, pruning, molecules, computer programming/coding, math word problems, arithmetic, unsupervised/self-supervised learning, and reinforcement learning (single agent and multi-agent). 神経スケーリング行動の他の機能形式と比較すると、この関数形式は、この集合においてかなり正確なスケーリング行動の外挿をもたらす。さらに、この関数形式は、二重降下のような現象のスケーリング挙動に存在する非単調遷移や、算術のようなタスクのスケーリング挙動に存在する遅延した鋭いインフレクション点(しばしば「創発的な位相遷移」と呼ばれる)など、他の関数形式が表現できないスケーリング挙動を正確にモデル化し、外挿する。最後に、この関数形式を使用して、スケーリング動作の予測可能性の限界に関する洞察を得ます。コードはhttps://github.com/ethancaballero/broken_neural_scaling_lawsで入手できる。 We present a smoothly broken power law functional form (referred to by us as a Broken Neural Scaling Law (BNSL)) that accurately models and extrapolates the scaling behaviors of deep neural networks (i.e. 
how the evaluation metric of interest varies as the amount of compute used for training, number of model parameters, training dataset size, model input size, number of training steps, or upstream performance varies) for various architectures and for each of various tasks within a large and diverse set of upstream and downstream tasks, in zero-shot, prompted, and fine-tuned settings. This set includes large-scale vision, language, audio, video, diffusion, generative modeling, multimodal learning, contrastive learning, AI alignment, robotics, out-of-distribution (OOD) generalization, continual learning, transfer learning, uncertainty estimation / calibration, out-of-distribution detection, adversarial robustness, distillation, sparsity, retrieval, quantization, pruning, molecules, computer programming/coding, math word problems, arithmetic, unsupervised/self-supervised learning, and reinforcement learning (single agent and multi-agent). When compared to other functional forms for neural scaling behavior, this functional form yields extrapolations of scaling behavior that are considerably more accurate on this set. Moreover, this functional form accurately models and extrapolates scaling behavior that other functional forms are incapable of expressing such as the non-monotonic transitions present in the scaling behavior of phenomena such as double descent and the delayed, sharp inflection points (often called "emergent phase transitions") present in the scaling behavior of tasks such as arithmetic. Lastly, we use this functional form to glean insights about the limit of the predictability of scaling behavior. Code is available at https://github.com/ethancaballero/broken_neural_scaling_laws	翻訳日:2023-03-27 18:21:35 公開日:2023-03-24
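The abstract above does not spell out the functional form. A smoothly broken power law with $n$ breaks is commonly written as $y = a + b\,x^{-c_0} \prod_{i=1}^{n} \left(1 + (x/d_i)^{1/f_i}\right)^{-c_i f_i}$; the sketch below assumes that form and should be checked against the paper before being relied on. The parameter names mirror the formula and the numbers are arbitrary.

```python
import math

def bnsl(x, a, b, c0, breaks):
    """Smoothly broken power law (assumed form, see the note above).

    breaks is a list of (c_i, d_i, f_i): slope change, break location, break sharpness.
    """
    y = b * x ** (-c0)
    for c_i, d_i, f_i in breaks:
        y *= (1.0 + (x / d_i) ** (1.0 / f_i)) ** (-c_i * f_i)
    return a + y

def local_slope(x, **params):
    """Log-log slope between x and 2x, a finite-difference estimate."""
    return math.log(bnsl(2.0 * x, **params) / bnsl(x, **params)) / math.log(2.0)

params = dict(a=0.0, b=1.0, c0=0.2, breaks=[(0.5, 1e3, 0.1)])
slope_before_break = local_slope(1.0, **params)   # close to -c0
slope_after_break = local_slope(1e6, **params)    # close to -(c0 + c1)
```

Far below the break at $d_1$ the curve behaves like a power law with exponent $-c_0$, and far above it like one with exponent $-(c_0 + c_1)$; $f_1$ controls how sharp the transition is. A negative $c_i$ bends the curve the other way, which is how such a form can represent non-monotonic behavior.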
# 困難層領域の少ない強調:特異な対流拡散反応問題に対するカリキュラム学習 Less Emphasis on Difficult Layer Regions: Curriculum Learning for Singularly Perturbed Convection-Diffusion-Reaction Problems ( http://arxiv.org/abs/2210.12685v2 ) ライセンス: Link先を確認	Yufeng Wang, Cong Xu, Min Yang, Jin Zhang	(参考訳) 物理情報ニューラルネットワーク(PINN)は、様々な科学・工学分野に応用されているが、わずかに困難な対流拡散反応問題において、基礎となる解を正確に予測できない。本稿では,この障害の原因をドメイン分布の観点から検討し,マルチスケールフィールドの学習を同時に行うことで,ネットワークがトレーニングを前進させることができず,ローカル・ミニマで簡単に立ち往生することを明らかにする。高損失層領域でより多くのコロケーションポイントをサンプリングした経験が、最適化に役立たず、結果が悪化する可能性も示唆した。これらの知見は、ニューラルネットワークがより容易な非層領域での学習を優先し、より難しい層領域での学習を軽視する新しいカリキュラム学習手法の開発を動機付けている。提案手法は,学習強調を自動的に調整し,最適化作業を容易にする。典型的なベンチマーク式における数値的な結果から,提案したカリキュラム学習手法はPINNの故障モードを緩和し,極めて鋭い境界層と内部層に対して正確な結果が得られることが示された。本研究は,大規模に異なる解を持つ方程式に対して,高損失領域に注意を払わないことが,それらを正確に学習するための効果的な戦略であることを示す。 Although Physics-Informed Neural Networks (PINNs) have been successfully applied in a wide variety of science and engineering fields, they can fail to accurately predict the underlying solution in slightly challenging convection-diffusion-reaction problems. In this paper, we investigate the reason of this failure from a domain distribution perspective, and identify that learning multi-scale fields simultaneously makes the network unable to advance its training and easily get stuck in poor local minima. We show that the widespread experience of sampling more collocation points in high-loss layer regions hardly help optimize and may even worsen the results. These findings motivate the development of a novel curriculum learning method that encourages neural networks to prioritize learning on easier non-layer regions while downplaying learning on harder layer regions. The proposed method helps PINNs automatically adjust the learning emphasis and thereby facilitate the optimization procedure. 
Numerical results on typical benchmark equations show that the proposed curriculum learning approach mitigates the failure modes of PINNs and can produce accurate results for very sharp boundary and interior layers. Our work reveals that for equations whose solutions have large scale differences, paying less attention to high-loss regions can be an effective strategy for learning them accurately.	翻訳日:2023-03-27 18:21:11 公開日:2023-03-24
# 畳み込み・集約・注意に基づく深層ニューラルネットワークによる力学シミュレーションの高速化 Convolution, aggregation and attention based deep neural networks for accelerating simulations in mechanics ( http://arxiv.org/abs/2212.01386v2 ) ライセンス: Link先を確認	Saurabh Deshpande, Raúl I. Sosa, Stéphane P.A. Bordas, Jakub Lengiewicz	(参考訳) ディープラーニングサロゲートモデルは、コストのかかる従来の数値手法の代替として、科学シミュレーションの加速にますます利用されている。しかし、実世界の複雑な例を扱う場合、それらの使用は依然として大きな課題である。本研究では,固体の非線形変形を効率的に学習するための3種類のニューラルネットワークアーキテクチャを示す。最初の2つのアーキテクチャは、最近提案されたCNN U-NETとMAgNET(グラフ U-NET)フレームワークに基づいている。第3のアーキテクチャであるPerceiver IOは、注目に基づくニューラルネットワークのファミリーに属する、非常に最近のアーキテクチャである。 3つのネットワークの性能を2つのベンチマーク例で比較し,ソフトボディの非線形機械的応答を正確に予測する能力を示した。 Deep learning surrogate models are being increasingly used in accelerating scientific simulations as a replacement for costly conventional numerical techniques. However, their use remains a significant challenge when dealing with real-world complex examples. In this work, we demonstrate three types of neural network architectures for efficient learning of highly non-linear deformations of solid bodies. The first two architectures are based on the recently proposed CNN U-NET and MAgNET (graph U-NET) frameworks which have shown promising performance for learning on mesh-based data. The third architecture is Perceiver IO, a very recent architecture that belongs to the family of attention-based neural networks--a class that has revolutionised diverse engineering fields and is still unexplored in computational mechanics. We study and compare the performance of all three networks on two benchmark examples, and show their capabilities to accurately predict the non-linear mechanical responses of soft bodies.	翻訳日:2023-03-27 18:15:05 公開日:2023-03-24
# MIC:コンテキスト拡張ドメイン適応のためのマスク付き画像整合性 MIC: Masked Image Consistency for Context-Enhanced Domain Adaptation ( http://arxiv.org/abs/2212.01322v2 ) ライセンス: Link先を確認	Lukas Hoyer, Dengxin Dai, Haoran Wang, Luc Van Gool	(参考訳) unsupervised domain adaptation(uda)では、ソースデータ(例えばsynthetic)に基づいてトレーニングされたモデルは、ターゲットのアノテーションにアクセスせずにターゲットデータ(例えば実世界)に適応される。従来のUDA手法は、視覚的外観が類似したクラスと競合することが多いが、外観の違いを学習するための基礎的な真実は存在しない。この問題に対処するために、ターゲット領域の空間的コンテキスト関係を頑健な視覚認識のための追加の手がかりとして学習することにより、UDAを強化するMasked Image Consistency (MIC)モジュールを提案する。 MICは、ランダムパッチが保持されないマスクされたターゲット画像の予測と、指数移動平均教師による完全な画像に基づいて生成された擬似ラベルとの一貫性を強制する。一貫性損失を最小限に抑えるために、ネットワークは、そのコンテキストからマスキングされた領域の予測を推測することを学ぶ必要がある。シンプルで普遍的な概念のため、MICは画像分類、セマンティックセグメンテーション、オブジェクト検出など、さまざまな視覚認識タスクにまたがる様々なUDAメソッドに統合することができる。 MICは、合成からリアルタイム、日夜、クリア・ツー・リバース・ウェザーUDAの様々な認識タスクにおいて、最先端の性能を著しく向上させる。例えば、MICは、GTA-to-Cityscapes と VisDA-2017 の75.9 mIoU と92.8%という前例のない UDA のパフォーマンスを達成した。実装はhttps://github.com/lhoyer/micで利用可能である。 In unsupervised domain adaptation (UDA), a model trained on source data (e.g. synthetic) is adapted to target data (e.g. real-world) without access to target annotation. Most previous UDA methods struggle with classes that have a similar visual appearance on the target domain as no ground truth is available to learn the slight appearance differences. To address this problem, we propose a Masked Image Consistency (MIC) module to enhance UDA by learning spatial context relations of the target domain as additional clues for robust visual recognition. MIC enforces the consistency between predictions of masked target images, where random patches are withheld, and pseudo-labels that are generated based on the complete image by an exponential moving average teacher. To minimize the consistency loss, the network has to learn to infer the predictions of the masked regions from their context. 
Due to its simple and universal concept, MIC can be integrated into various UDA methods across different visual recognition tasks such as image classification, semantic segmentation, and object detection. MIC significantly improves the state-of-the-art performance across the different recognition tasks for synthetic-to-real, day-to-nighttime, and clear-to-adverse-weather UDA. For instance, MIC achieves an unprecedented UDA performance of 75.9 mIoU and 92.8% on GTA-to-Cityscapes and VisDA-2017, respectively, which corresponds to an improvement of +2.1 and +3.0 percent points over the previous state of the art. The implementation is available at https://github.com/lhoyer/MIC.	翻訳日:2023-03-27 18:14:50 公開日:2023-03-24
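Two ingredients named in the abstract above are easy to show in isolation: withholding random patches of the target image, and keeping a teacher whose weights are an exponential moving average (EMA) of the student's. The following plain-Python sketch covers only those two pieces under assumed names and default values (`patch`, `mask_ratio`, `decay`); it omits the network, the pseudo-labels, and the consistency loss itself.

```python
import random

def random_patch_mask(height, width, patch, mask_ratio, rng):
    """Return a height x width grid of 1 (keep) / 0 (withheld), masked patch by patch."""
    mask = [[1] * width for _ in range(height)]
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            if rng.random() < mask_ratio:
                for r in range(top, min(top + patch, height)):
                    for c in range(left, min(left + patch, width)):
                        mask[r][c] = 0
    return mask

def ema_update(teacher, student, decay=0.99):
    """Move each teacher weight a small step towards the student weight."""
    return [decay * t + (1.0 - decay) * s for t, s in zip(teacher, student)]

rng = random.Random(0)
mask = random_patch_mask(8, 8, patch=4, mask_ratio=0.5, rng=rng)
teacher = ema_update([1.0, 2.0], [0.0, 4.0], decay=0.9)
```

In a training loop the student would predict on the masked image while the teacher produces pseudo-labels from the full image, so the student can only match them by using the surrounding context.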
# MHCCL:多変量時系列のための階層型クラスタワイズコントラスト学習 MHCCL: Masked Hierarchical Cluster-Wise Contrastive Learning for Multivariate Time Series ( http://arxiv.org/abs/2212.01141v3 ) ライセンス: Link先を確認	Qianwen Meng, Hangwei Qian, Yong Liu, Lizhen Cui, Yonghui Xu, Zhiqi Shen	(参考訳) 未ラベルの時系列データから意味豊かな表現を学習することは、分類や予測といった下流のタスクに不可欠である。対照的な学習は、最近、専門家のアノテーションがない場合に有望な表現学習能力を示している。しかし、既存の対照的なアプローチは一般的に各インスタンスを独立に扱い、同じ意味論を共有する偽の負のペアを生み出す。この問題に対処するために,多変量時系列の複数の潜在パーティションからなる階層構造から得られた意味情報を利用する,マスケッド階層クラスタ単位のコントラスト学習モデルであるMHCCLを提案する。細粒度クラスタリングが高純度を維持しつつ、粗粒度が高レベルのセマンティクスを反映しているという観察に動機づけられ、クラスタリング階層から複数の粒度情報を取り入れることで偽陰性をフィルタリングし、正を補う新しい下方マスキング戦略を提案する。加えて、mhcclで新しい上向きマスキング戦略が設計され、各パーティションのクラスタの異常を取り除き、プロトタイプを洗練し、階層的クラスタリングプロセスを高速化し、クラスタリング品質を向上させる。広帯域多変量時系列データセットの実験的評価を行う。その結果,教師なし時系列表現学習における最先端手法よりもmhcclが優れていることが示された。 Learning semantic-rich representations from raw unlabeled time series data is critical for downstream tasks such as classification and forecasting. Contrastive learning has recently shown its promising representation learning capability in the absence of expert annotations. However, existing contrastive approaches generally treat each instance independently, which leads to false negative pairs that share the same semantics. To tackle this problem, we propose MHCCL, a Masked Hierarchical Cluster-wise Contrastive Learning model, which exploits semantic information obtained from the hierarchical structure consisting of multiple latent partitions for multivariate time series. Motivated by the observation that fine-grained clustering preserves higher purity while coarse-grained one reflects higher-level semantics, we propose a novel downward masking strategy to filter out fake negatives and supplement positives by incorporating the multi-granularity information from the clustering hierarchy. 
In addition, a novel upward masking strategy is designed in MHCCL to remove outliers of clusters at each partition to refine prototypes, which helps speed up the hierarchical clustering process and improves the clustering quality. We conduct experimental evaluations on seven widely-used multivariate time series datasets. The results demonstrate the superiority of MHCCL over the state-of-the-art approaches for unsupervised time series representation learning.	翻訳日:2023-03-27 18:14:12 公開日:2023-03-24
# 希少事象による動的因果発見に向けて:非パラメトリック条件独立試験 Towards Dynamic Causal Discovery with Rare Events: A Nonparametric Conditional Independence Test ( http://arxiv.org/abs/2211.16596v3 ) ライセンス: Link先を確認	Chih-Yuan Chiu, Kshitij Kulkarni, Shankar Sastry	(参考訳) 稀な事象に関連する因果現象は、危険に敏感な安全分析、事故解析と予防、極端な価値理論など、幅広い工学的問題にまたがる。しかし、因果発見の現在の手法は、変数が最初に低確率の実現を経験したときにのみ現れる、動的環境におけるランダム変数間の因果関係を発見できないことが多い。そこで本研究では, 時間不変力学系から収集されたデータに対して, 稀ではあるが連続的な事象が発生する新しい統計独立性テストを提案する。特に,システム状態の重畳されたデータセットを,異なるタイミングで発生する前に構築するために,基礎となるデータの時間的不変性を利用する。次に、再構成データに基づいて条件付き独立試験を設計する。本手法の一貫性のために非漸近的なサンプル複雑性境界を提供し,caltrans performance measurement system (pems) から収集したインシデントデータを含む様々なシミュレーションおよび実世界のデータセットでその性能を検証する。データセットと実験を含むコードは公開されている。 Causal phenomena associated with rare events occur across a wide range of engineering problems, such as risk-sensitive safety analysis, accident analysis and prevention, and extreme value theory. However, current methods for causal discovery are often unable to uncover causal links, between random variables in a dynamic setting, that manifest only when the variables first experience low-probability realizations. To address this issue, we introduce a novel statistical independence test on data collected from time-invariant dynamical systems in which rare but consequential events occur. In particular, we exploit the time-invariance of the underlying data to construct a superimposed dataset of the system state before rare events happen at different timesteps. We then design a conditional independence test on the reorganized data. We provide non-asymptotic sample complexity bounds for the consistency of our method, and validate its performance across various simulated and real-world datasets, including incident data collected from the Caltrans Performance Measurement System (PeMS). Code containing the datasets and experiments is publicly available.	翻訳日:2023-03-27 18:13:48 公開日:2023-03-24
# 能率的単一画像超解像のための特徴領域適応型コントラスト蒸留 Feature-domain Adaptive Contrastive Distillation for Efficient Single Image Super-Resolution ( http://arxiv.org/abs/2211.15951v2 ) ライセンス: Link先を確認	HyeonCheol Moon, JinWoo Jeong, SungJei Kim	(参考訳) 近年,CNN ベースの SISR には多くのパラメータがあり,性能向上のための計算コストが高い。ネットワークを効率的にする方法の1つとして、教師の有用な知識を学生に伝達する知識蒸留(KD)が現在研究されている。近年では,教師と生徒のネットワーク間における特徴マップのユークリッド距離損失を最小限に抑えるために特徴蒸留(fd)が用いられているが,ネットワーク容量制約により教師の知識を効果的かつ有意義に提供し,生徒のパフォーマンスを向上させる方法を十分に検討していない。本稿では,軽量なSISRネットワークを効率的に訓練するための特徴領域適応型コントラスト蒸留(FACD)手法を提案する。本稿では, ユークリッド距離損失を用いた既存のfd手法の限界を示し, 生徒ネットワークが特徴領域における教師の表現からよりリッチな情報を学習させる特徴領域コントラスト損失を提案する。また, トレーニングパッチの条件に応じて選択的に蒸留を施す適応蒸留法を提案する。実験結果から,提案方式による学生EDSRとRCANネットワークは,ベンチマークデータセット全体のPSNR性能だけでなく,従来のFD手法と比較して主観的画質も向上することが示された。 Recently, CNN-based SISR has numerous parameters and high computational cost to achieve better performance, limiting its applicability to resource-constrained devices such as mobile. As one of the methods to make the network efficient, Knowledge Distillation (KD), which transfers teacher's useful knowledge to student, is currently being studied. More recently, KD for SISR utilizes Feature Distillation (FD) to minimize the Euclidean distance loss of feature maps between teacher and student networks, but it does not sufficiently consider how to effectively and meaningfully deliver knowledge from teacher to improve the student performance at given network capacity constraints. In this paper, we propose a feature-domain adaptive contrastive distillation (FACD) method for efficiently training lightweight student SISR networks. We show the limitations of the existing FD methods using Euclidean distance loss, and propose a feature-domain contrastive loss that makes a student network learn richer information from the teacher's representation in the feature domain. In addition, we propose an adaptive distillation that selectively applies distillation depending on the conditions of the training patches. 
The experimental results show that the student EDSR and RCAN networks trained with the proposed FACD scheme improve not only PSNR across all benchmark datasets and scales, but also subjective image quality compared with conventional FD approaches.	翻訳日:2023-03-27 18:13:29 公開日:2023-03-24
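The contrast drawn in the FACD abstract above, between a Euclidean feature-distillation loss and a contrastive loss in the feature domain, can be illustrated with a small sketch. This is a generic InfoNCE-style loss swapped in for illustration, not the loss defined in the paper. The function names, the use of cosine similarity, the choice of teacher features from other patches as negatives, and the `temperature` value are all assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two feature vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)


def euclidean_fd_loss(student, teacher):
    """Plain feature distillation: squared Euclidean distance to the teacher feature."""
    return sum((s - t) ** 2 for s, t in zip(student, teacher))


def contrastive_fd_loss(student, teacher_pos, teacher_negs, temperature=0.1):
    """InfoNCE-style loss (hypothetical stand-in for a feature-domain contrastive loss).

    The student feature is pulled toward the teacher feature of the same patch
    (positive) and pushed away from teacher features of other patches (negatives).
    """
    pos = math.exp(cosine(student, teacher_pos) / temperature)
    neg = sum(math.exp(cosine(student, n) / temperature) for n in teacher_negs)
    return -math.log(pos / (pos + neg))


if __name__ == "__main__":
    teacher_pos = [1.0, 0.0]
    teacher_negs = [[0.0, 1.0], [-1.0, 0.0]]
    aligned_student = [1.0, 0.0]
    misaligned_student = [0.0, 1.0]
    print(contrastive_fd_loss(aligned_student, teacher_pos, teacher_negs))
    print(contrastive_fd_loss(misaligned_student, teacher_pos, teacher_negs))
```

Unlike the Euclidean term, the contrastive term depends on how the student feature relates to the negatives as well as to the positive. This is one plausible reading of "richer information" in the abstract.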
# 創発的言語の語彙エントロピーを数学的にモデル化する Mathematically Modeling the Lexicon Entropy of Emergent Language ( http://arxiv.org/abs/2211.15783v2 ) ライセンス: Link先を確認	Brendon Boldt, David Mortensen	(参考訳) 深層学習に基づく創発言語システムにおける語彙エントロピーの数学的モデルとして確率過程FiLexを定式化する。モデルを数学的に定義することで、直接かつ決定的にテスト可能な明確な予測を生成することができる。本研究は,FiLexがハイパーパラメータ(トレーニングステップ,レキシコンサイズ,学習速度,ロールアウトバッファサイズ,Gumbel-Softmax温度)と,20の環境-ハイパーパラメータの組み合わせのうち20の創発言語エントロピーの正確な相関を予測できる4つの環境を実証的に検証した。さらに, 実験により, 異なる環境が過度パラメータとエントロピーの関係を多様に示し, 精度の高い粒度の予測を行うモデルの必要性が示された。 We formulate a stochastic process, FiLex, as a mathematical model of lexicon entropy in deep learning-based emergent language systems. Defining a model mathematically allows it to generate clear predictions which can be directly and decisively tested. We empirically verify across four different environments that FiLex predicts the correct correlation between hyperparameters (training steps, lexicon size, learning rate, rollout buffer size, and Gumbel-Softmax temperature) and the emergent language's entropy in 20 out of 20 environment-hyperparameter combinations. Furthermore, our experiments reveal that different environments show diverse relationships between their hyperparameters and entropy which demonstrates the need for a model which can make well-defined predictions at a precise level of granularity.	翻訳日:2023-03-27 18:13:04 公開日:2023-03-24
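The quantity modeled in the FiLex abstract above, the entropy of an emergent language's lexicon, can be made concrete with a short sketch. It assumes the entropy meant is the Shannon entropy of the empirical word-usage distribution. The function name and the toy lexicons are illustrative and not taken from the paper.

```python
import math
from collections import Counter


def lexicon_entropy(words):
    """Shannon entropy (in bits) of the empirical distribution over observed words."""
    counts = Counter(words)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


if __name__ == "__main__":
    # A lexicon whose four words are used equally often.
    uniform_usage = ["a", "b", "c", "d"] * 25
    # A lexicon of the same size dominated by a single word.
    skewed_usage = ["a"] * 97 + ["b", "c", "d"]
    print(lexicon_entropy(uniform_usage))
    print(lexicon_entropy(skewed_usage))
```

Evenly used words give higher entropy than a lexicon dominated by one word. That makes the sign of the correlation between a hyperparameter and entropy a directly testable prediction.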
# ディープニューラルネットワークにおけるフォアリング説明 Foiling Explanations in Deep Neural Networks ( http://arxiv.org/abs/2211.14860v2 ) ライセンス: Link先を確認	Snir Vitrack Tamam, Raz Lapid, Moshe Sipper	(参考訳) ディープニューラルネットワーク(DNN)は、過去10年間に多くの分野に大きな影響を与えてきた。しかし、多くの問題に対して優れたパフォーマンスを示すにもかかわらず、ブラックボックスの性質は説明可能性に関して依然として大きな課題となっている。実際、説明可能な人工知能(XAI)はいくつかの分野で重要である。本稿では、画像ベースDNNにおける説明手法の厄介な性質を明らかにする: 入力画像に小さな視覚的変化を加えることで、ネットワークの出力に影響を与えることがほとんどなく、進化戦略を用いて、どのように説明が任意に操作されるかを実証する。我々の新しいアルゴリズムであるAttaXAIは、XAIアルゴリズムに対するモデルに依存しない、敵対的な攻撃であり、分類器の出力ロジットと説明マップへのアクセスしか必要としない。ベンチマークデータセットであるcifar100とimagenetのパフォーマンスを,vgg16-cifar100,vgg16-imagenet,mobilenet-cifar100,inception-v3-imagenetの4つの異なるディープラーニングモデルを用いて比較した。 XAI法は勾配やモデル内部を使わずに操作できることがわかった。我々の新しいアルゴリズムは、XAI法が特定の説明図を出力するように、人間の目では認識できない方法で画像を操作できる。我々の知る限り、これはブラックボックス設定における最初の方法であり、説明責任が望まれ、必要であり、法的に義務付けられている重要な価値があると考えている。 Deep neural networks (DNNs) have greatly impacted numerous fields over the past decade. Yet despite exhibiting superb performance over many problems, their black-box nature still poses a significant challenge with respect to explainability. Indeed, explainable artificial intelligence (XAI) is crucial in several fields, wherein the answer alone -- sans a reasoning of how said answer was derived -- is of little value. This paper uncovers a troubling property of explanation methods for image-based DNNs: by making small visual changes to the input image -- hardly influencing the network's output -- we demonstrate how explanations may be arbitrarily manipulated through the use of evolution strategies. Our novel algorithm, AttaXAI, a model-agnostic, adversarial attack on XAI algorithms, only requires access to the output logits of a classifier and to the explanation map; these weak assumptions render our approach highly useful where real-world models and data are concerned. 
We compare our method's performance on two benchmark datasets -- CIFAR100 and ImageNet -- using four different pretrained deep-learning models: VGG16-CIFAR100, VGG16-ImageNet, MobileNet-CIFAR100, and Inception-v3-ImageNet. We find that the XAI methods can be manipulated without the use of gradients or other model internals. Our novel algorithm is successfully able to manipulate an image in a manner imperceptible to the human eye, such that the XAI method outputs a specific explanation map. To our knowledge, this is the first such method in a black-box setting, and we believe it has significant value where explainability is desired, required, or legally mandatory.	翻訳日:2023-03-27 18:12:48 公開日:2023-03-24
# RUST:未提示画像からの潜在神経シーン表現 RUST: Latent Neural Scene Representations from Unposed Imagery ( http://arxiv.org/abs/2211.14306v2 ) ライセンス: Link先を確認	Mehdi S. M. Sajjadi, Aravindh Mahendran, Thomas Kipf, Etienne Pot, Daniel Duckworth, Mario Lucic, Klaus Greff	(参考訳) 2次元の観察から3dシーンの構造を推測することは、コンピュータビジョンにおける根本的な課題である。近年,ニューラルシーン表現に基づくアプローチが広く普及し,様々なアプリケーションに適用されている。この領域で残っている大きな課題の1つは、1つのシーンを超えて効果的に一般化する潜在表現を提供する単一のモデルを訓練することである。 SRT(Scene Representation Transformer)はこの方向を約束しているが、より広い範囲の多様なシーンにスケールすることは困難であり、正確な地上真実データを必要とする。この問題に対処するために,RGB画像だけで訓練された新規ビュー合成のためのポーズレスアプローチであるRUST(Really Unposed Scene representation Transformer)を提案する。我々の主な洞察は、ターゲット画像を覗き見し、デコーダがビュー合成に使用する潜伏ポーズの埋め込みを学習するPose Encoderを訓練できるということです。我々は,学習された潜在ポーズ構造について経験的調査を行い,有意義なテスト時間カメラ変換と正確なポーズ読み出しを可能にすることを示す。おそらく意外なことに、RUSTは完璧なカメラポーズにアクセスできる方法と同じような品質を実現し、それによって、償却されたニューラルシーン表現の大規模トレーニングの可能性を解き放ちます。 Inferring the structure of 3D scenes from 2D observations is a fundamental challenge in computer vision. Recently popularized approaches based on neural scene representations have achieved tremendous impact and have been applied across a variety of applications. One of the major remaining challenges in this space is training a single model which can provide latent representations which effectively generalize beyond a single scene. Scene Representation Transformer (SRT) has shown promise in this direction, but scaling it to a larger set of diverse scenes is challenging and necessitates accurately posed ground truth data. To address this problem, we propose RUST (Really Unposed Scene representation Transformer), a pose-free approach to novel view synthesis trained on RGB images alone. Our main insight is that one can train a Pose Encoder that peeks at the target image and learns a latent pose embedding which is used by the decoder for view synthesis. 
We perform an empirical investigation into the learned latent pose structure and show that it allows meaningful test-time camera transformations and accurate explicit pose readouts. Perhaps surprisingly, RUST achieves quality similar to that of methods which have access to perfect camera pose, thereby unlocking the potential for large-scale training of amortized neural scene representations.	翻訳日:2023-03-27 18:12:19 公開日:2023-03-24
# ニューラルレンダリングによる教師なし連続意味適応 Unsupervised Continual Semantic Adaptation through Neural Rendering ( http://arxiv.org/abs/2211.13969v2 ) ライセンス: Link先を確認	Zhizheng Liu, Francesco Milano, Jonas Frey, Roland Siegwart, Hermann Blum, Cesar Cadena	(参考訳) アプリケーションの増加は、シーンのシーケンスにわたって知覚タスクにデプロイされるデータ駆動モデルに依存している。トレーニングデータとデプロイメントデータのミスマッチのため、新しいシーンでモデルを適用することは、しばしば優れたパフォーマンスを得るために重要である。本研究では,セマンティクスセグメンテーションのタスクに対して,セマンティクスセグメンテーションを行うための連続的マルチシーン適応について検討する。セグメンテーションモデルの予測を融合させ,ビュー一貫性のあるセマンティックラベルを擬似ラベルとして使用することにより,シーン毎にセマンティック・NeRFネットワークをトレーニングする。セグメンテーションモデルとのジョイントトレーニングにより,セマンティック・ニューラルフモデルにより2次元3次元の知識伝達が可能となる。さらに、サイズが小さく、長期記憶に保存でき、その後、任意の視点からデータをレンダリングして忘れることを減らすことができる。我々は,Voxelベースのベースラインと最先端の教師なしドメイン適応手法の両方より優れているScanNetに対するアプローチを評価する。 An increasing amount of applications rely on data-driven models that are deployed for perception tasks across a sequence of scenes. Due to the mismatch between training and deployment data, adapting the model on the new scenes is often crucial to obtain good performance. In this work, we study continual multi-scene adaptation for the task of semantic segmentation, assuming that no ground-truth labels are available during deployment and that performance on the previous scenes should be maintained. We propose training a Semantic-NeRF network for each scene by fusing the predictions of a segmentation model and then using the view-consistent rendered semantic labels as pseudo-labels to adapt the model. Through joint training with the segmentation model, the Semantic-NeRF model effectively enables 2D-3D knowledge transfer. Furthermore, due to its compact size, it can be stored in a long-term memory and subsequently used to render data from arbitrary viewpoints to reduce forgetting. We evaluate our approach on ScanNet, where we outperform both a voxel-based baseline and a state-of-the-art unsupervised domain adaptation method.	翻訳日:2023-03-27 18:11:57 公開日:2023-03-24
# FFHQ-UV:3次元顔再構成のための正常顔面UVテクスチャデータセット FFHQ-UV: Normalized Facial UV-Texture Dataset for 3D Face Reconstruction ( http://arxiv.org/abs/2211.13874v2 ) ライセンス: Link先を確認	Haoran Bai, Di Kang, Haoxian Zhang, Jinshan Pan, Linchao Bao	(参考訳) 本稿では,5万以上の高品質なテクスチャuvマップと,照度,中性表現,清浄された顔領域を含む大規模顔用uvテクスチャデータセットを提案する。データセットはFFHQという大規模な顔画像データセットから派生したもので、完全に自動で堅牢なUVテクスチャ生産パイプラインの助けを借りています。我々のパイプラインは、最近のStyleGANベースの顔画像編集手法を利用して、画像入力から多視点正規化顔画像を生成する。次に、精巧なUVテクスチャ抽出、補正、完了手順を適用し、正規化顔画像から高品質なUVマップを生成する。既存のuvテキストデータセットと比較して、データセットはより多様で高品質なテクスチャマップを持っています。さらに,パラメトリックフィッティングに基づく3次元顔再構成のための非線形テクスチャベースとしてganベースのテクスチャデコーダを訓練する。実験の結果,本手法は最先端の手法よりも再構成精度が向上し,さらに,現実的なレンダリングが可能な高品質なテクスチャマップが得られた。データセット、コード、トレーニング済みテクスチャデコーダはhttps://github.com/csbhr/FFHQ-UVで公開されている。 We present a large-scale facial UV-texture dataset that contains over 50,000 high-quality texture UV-maps with even illuminations, neutral expressions, and cleaned facial regions, which are desired characteristics for rendering realistic 3D face models under different lighting conditions. The dataset is derived from a large-scale face image dataset namely FFHQ, with the help of our fully automatic and robust UV-texture production pipeline. Our pipeline utilizes the recent advances in StyleGAN-based facial image editing approaches to generate multi-view normalized face images from single-image inputs. An elaborated UV-texture extraction, correction, and completion procedure is then applied to produce high-quality UV-maps from the normalized face images. Compared with existing UV-texture datasets, our dataset has more diverse and higher-quality texture maps. We further train a GAN-based texture decoder as the nonlinear texture basis for parametric fitting based 3D face reconstruction. Experiments show that our method improves the reconstruction accuracy over state-of-the-art approaches, and more importantly, produces high-quality texture maps that are ready for realistic renderings. 
The dataset, code, and pre-trained texture decoder are publicly available at https://github.com/csbhr/FFHQ-UV.	翻訳日:2023-03-27 18:11:25 公開日:2023-03-24
# Flow-Lenia: 質量保存とパラメータ局在による細胞オートマトンのオープンエンド進化に向けて Flow-Lenia: Towards open-ended evolution in cellular automata through mass conservation and parameter localization ( http://arxiv.org/abs/2212.07906v2 ) ライセンス: Link先を確認	Erwan Plantec, Gautier Hamon, Mayalen Etcheverry, Pierre-Yves Oudeyer, Clément Moulin-Frier and Bert Wang-Chak Chan	(参考訳) 仮想生物のオープンエンド進化のような生命に似た現象を生み出す複雑な自己組織化システムの設計は、人工生命の主要な目標の1つである。コンウェイのライフゲームを連続空間、時間、状態に一般化するセル・オートマトン(CA)のファミリーであるレニアは、それが生成できる幅広い自己組織化パターンのために多くの注目を集めている。その中でも、空間的局所化パターン(SLP)は生命のような人工生物に似ており、複雑な行動を示す。しかし、これらの生物は、レニアパラメータ空間の小さな部分空間にのみ存在し、発見は容易でなく、高度な探索アルゴリズムを必要とする。さらに、これらの生物は特定の更新規則によって制御された世界でのみ存在し、したがって同一の世界では相互作用できない。本稿ではこれらの問題を解決するために,フローレニアと呼ばれるレニアの質量保存的拡張を提案する。本稿では,複雑な動作を伴うSLPの生成の有効性を示す実験を行い,関心を示すSLPを生成するために更新ルールパラメータを最適化できることを示す。最後に、フローレニアはCAのダイナミックス内でCAの更新ルールのパラメータの統合を可能にし、動的かつ局所化し、複数種のシミュレーションを可能にし、出現する生物の性質を定義する局所的コヒーレントな更新ルールを持ち、近隣の規則と混合できることを示す。これは連続CAにおける自己組織型人工生命の形態の本質的進化への道を開くと論じている。 The design of complex self-organising systems producing life-like phenomena, such as the open-ended evolution of virtual creatures, is one of the main goals of artificial life. Lenia, a family of cellular automata (CA) generalizing Conway's Game of Life to continuous space, time and states, has attracted a lot of attention because of the wide diversity of self-organizing patterns it can generate. Among those, some spatially localized patterns (SLPs) resemble life-like artificial creatures and display complex behaviors. However, those creatures are found in only a small subspace of the Lenia parameter space and are not trivial to discover, necessitating advanced search algorithms. Furthermore, each of these creatures exists only in worlds governed by specific update rules and thus cannot interact with the others within the same world. This paper proposes a mass-conservative extension of Lenia, called Flow Lenia, that solves both of these issues.
We present experiments demonstrating its effectiveness in generating SLPs with complex behaviors and show that the update rule parameters can be optimized to generate SLPs showing behaviors of interest. Finally, we show that Flow Lenia enables the integration of the parameters of the CA update rules within the CA dynamics, making them dynamic and localized, allowing for multi-species simulations, with locally coherent update rules that define properties of the emerging creatures, and that can be mixed with neighbouring rules. We argue that this paves the way for the intrinsic evolution of self-organized artificial life forms within continuous CAs.	翻訳日:2023-03-27 18:05:15 公開日:2023-03-24
# 制御可能なアバターの再構成のための構造的3次元特徴 Structured 3D Features for Reconstructing Controllable Avatars ( http://arxiv.org/abs/2212.06820v2 ) ライセンス: Link先を確認	Enric Corona, Mihai Zanfir, Thiemo Alldieck, Eduard Gabriel Bazavan, Andrei Zanfir, Cristian Sminchisescu	(参考訳) パラメトリックな統計的メッシュ表面からサンプリングされた高密度な3次元点に画素整列画像特徴をプールする,新しい暗黙の3次元表現に基づくモデルであるStructured 3D Featuresを紹介する。 3Dポイントは関連する意味を持ち、3D空間で自由に移動することができる。これにより、身体の形状だけでなく、興味のある人物の最適なカバーが可能になり、さらにアクセサリー、髪、ゆるい衣服のモデリングにも役立ちます。そこで本研究では,アルベドと照明分解を併用したアニマタブルな3次元再構成を,一方のエンド・ツー・エンドモデル,訓練された半教師付きセミプロセッサ,追加のポストプロセッシングを伴わない,完全な3次元トランスフォーマーベースのアテンション・フレームワークを提案する。本研究では,S3Fモデルがモノクロ3D再構成やアルベド,シェーディング推定など,これまでの課題を超越していることを示す。さらに,提案手法では,新しい視点合成,リライト,再構成が可能であり,複数の入力画像(例えば,人物の異なる視点,あるいは同じ視点を異なるポーズで,映像内で)を自然に処理できるように拡張できることを示す。最後に,3次元仮想トライオンアプリケーションのためのモデルの編集機能を示す。 We introduce Structured 3D Features, a model based on a novel implicit 3D representation that pools pixel-aligned image features onto dense 3D points sampled from a parametric, statistical human mesh surface. The 3D points have associated semantics and can move freely in 3D space. This allows for optimal coverage of the person of interest, beyond just the body shape, which in turn, additionally helps modeling accessories, hair, and loose clothing. Owing to this, we present a complete 3D transformer-based attention framework which, given a single image of a person in an unconstrained pose, generates an animatable 3D reconstruction with albedo and illumination decomposition, as a result of a single end-to-end model, trained semi-supervised, and with no additional postprocessing. We show that our S3F model surpasses the previous state-of-the-art on various tasks, including monocular 3D reconstruction, as well as albedo and shading estimation. Moreover, we show that the proposed methodology allows novel view synthesis, relighting, and re-posing the reconstruction, and can naturally be extended to handle multiple input images (e.g. 
different views of a person, or the same view, in different poses, in video). Finally, we demonstrate the editing capabilities of our model for 3D virtual try-on applications.	翻訳日:2023-03-27 18:04:47 公開日:2023-03-24
# 構造化知識強化によるオープンワールドストーリー生成:包括的調査 Open-world Story Generation with Structured Knowledge Enhancement: A Comprehensive Survey ( http://arxiv.org/abs/2212.04634v2 ) ライセンス: Link先を確認	Yuxin Wang, Jieru Lin, Zhiwei Yu, Wei Hu, B\"orje F. Karlsson	(参考訳) ストーリーテリングと物語は人間体験の基本であり、社会と文化の関わりに絡み合っている。そのため、研究者は長い間、物語を自動生成できるシステムを作ろうとしてきた。近年,ディープラーニングと大量のデータリソースを活用して,自動ストーリ生成が大きな進歩を見せている。しかし、生成したストーリーのグローバルコヒーレンスの必要性など、かなりの課題は、生成モデルが人間のナレーターと同じストーリーテリング能力に達することを妨げている。これらの課題に取り組むために、多くの研究は構造的知識を生成プロセスに注入し、構造的知識強化ストーリー生成(structured knowledge-enhanced story generation)と呼ばれる。外部知識の導入は、ストーリーイベント間の論理的一貫性を高め、より良い知識基盤化を達成し、ストーリーにおける過剰な一般化と反復問題を緩和することができる。この調査は、この研究分野の最新かつ包括的なレビューを提供する。 (i)既存の手法がいかに構造化された知識をストーリー生成に組み込むかに関する体系的分類法を提示する。 (二)ストーリーコーパス、構造化知識データセット、評価指標をまとめる。 (3)知識強化ストーリー生成の課題を多次元的に把握し,将来的な研究の方向性に光を当てる。 Storytelling and narrative are fundamental to human experience, intertwined with our social and cultural engagement. As such, researchers have long attempted to create systems that can generate stories automatically. In recent years, powered by deep learning and massive data resources, automatic story generation has shown significant advances. However, considerable challenges, like the need for global coherence in generated stories, still hamper generative models from reaching the same storytelling ability as human narrators. To tackle these challenges, many studies seek to inject structured knowledge into the generation process, which is referred to as structured knowledge-enhanced story generation. Incorporating external knowledge can enhance the logical coherence among story events, achieve better knowledge grounding, and alleviate over-generalization and repetition problems in stories. 
This survey provides an up-to-date and comprehensive review of this research field: (i) we present a systematic taxonomy of how existing methods integrate structured knowledge into story generation; (ii) we summarize the story corpora, structured knowledge datasets, and evaluation metrics involved; (iii) we give multidimensional insights into the challenges of knowledge-enhanced story generation and cast light on promising directions for future study.	翻訳日:2023-03-27 18:04:07 公開日:2023-03-24
# 静止空間における運動拡散によるコマンドの実行 Executing your Commands via Motion Diffusion in Latent Space ( http://arxiv.org/abs/2212.04048v2 ) ライセンス: Link先を確認	Xin Chen, Biao Jiang, Wen Liu, Zilong Huang, Bin Fu, Tao Chen, Jingyi Yu, Gang Yu	(参考訳) 本稿では,アクションクラスやテキスト記述子など,様々な条件入力に応じて人間の動作シーケンスを生成する課題である条件付きヒューマンモーション生成について検討する。人間の動きは多様であり、自然言語におけるテキスト記述子のような条件付きモダリティとは全く異なる性質を持つため、所望の条件付きモダリティから人間の動き列への確率的マッピングを学ぶことは困難である。さらに、モーションキャプチャシステムからの生のモーションデータはシーケンスが冗長でノイズも含んでいる可能性があり、生のモーションシーケンスと条件付きモダリティのジョイント分布を直接モデル化するには、重い計算オーバーヘッドが必要となり、キャプチャされたノイズによって引き起こされるアーティファクトを発生させる可能性がある。人間の動作シーケンスをよりよく表現するために、我々はまず強力な変分オートエンコーダ(VAE)を設計し、人間の動作シーケンスを代表的で低次元の遅延コードに到達する。次に, 動き列と条件入力との接続を確立するために拡散モデルを用いる代わりに, 動き潜在空間上で拡散過程を行う。提案した動作遅延に基づく拡散モデル(MLD)は、与えられた条件入力に対応する鮮明な動き列を生成し、トレーニングおよび推論段階の計算オーバーヘッドを大幅に低減する。様々な人体運動生成タスクに対する広範囲な実験により、我々のMLDは、広範囲な人体運動生成タスクにおける最先端の手法よりも大幅に改善され、原動列上の従来の拡散モデルよりも2桁高速であることが示された。 We study a challenging task, conditional human motion generation, which produces plausible human motion sequences according to various conditional inputs, such as action classes or textual descriptors. Since human motions are highly diverse and have a property of quite different distribution from conditional modalities, such as textual descriptors in natural languages, it is hard to learn a probabilistic mapping from the desired conditional modality to the human motion sequences. Besides, the raw motion data from the motion capture system might be redundant in sequences and contain noises; directly modeling the joint distribution over the raw motion sequences and conditional modalities would need a heavy computational overhead and might result in artifacts introduced by the captured noises. To learn a better representation of the various human motion sequences, we first design a powerful Variational AutoEncoder (VAE) and arrive at a representative and low-dimensional latent code for a human motion sequence. 
Then, instead of using a diffusion model to establish the connections between the raw motion sequences and the conditional inputs, we perform a diffusion process on the motion latent space. Our proposed Motion Latent-based Diffusion model (MLD) can produce vivid motion sequences that conform to the given conditional inputs while substantially reducing the computational overhead in both the training and inference stages. Extensive experiments on various human motion generation tasks demonstrate that MLD achieves significant improvements over state-of-the-art methods while being two orders of magnitude faster than previous diffusion models that operate on raw motion sequences.	翻訳日:2023-03-27 18:03:47 公開日:2023-03-24
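The MLD abstract above describes running diffusion on a compact latent code rather than on the raw motion sequence. The sketch below shows the standard DDPM-style forward noising step applied to such a latent vector. It is only an illustration: the linear noise schedule, the function names, and the sizes in the usage example are assumptions, and the VAE that would produce the latent is not shown.

```python
import math
import random


def make_alpha_bars(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for a linear beta schedule (assumed)."""
    alpha_bars = []
    prod = 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / max(num_steps - 1, 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars


def noise_latent(z0, t, alpha_bars, rng):
    """Sample z_t from q(z_t | z_0) = N(sqrt(abar_t) * z_0, (1 - abar_t) * I).

    Returns the noised latent and the noise that was added; a denoiser would be
    trained to predict that noise from z_t, t, and the conditioning input.
    """
    abar = alpha_bars[t]
    eps = [rng.gauss(0.0, 1.0) for _ in z0]
    zt = [math.sqrt(abar) * z + math.sqrt(1.0 - abar) * e for z, e in zip(z0, eps)]
    return zt, eps


if __name__ == "__main__":
    rng = random.Random(0)
    alpha_bars = make_alpha_bars(1000)
    # Hypothetical sizes: a raw clip of 200 frames x 60 pose values versus a
    # 256-dimensional latent code from a motion VAE.
    raw_size = 200 * 60
    latent = [rng.gauss(0.0, 1.0) for _ in range(256)]
    zt, eps = noise_latent(latent, 500, alpha_bars, rng)
    print(len(zt), raw_size // len(zt))
```

Every denoising step touches only the latent vector, so the per-step cost scales with the latent size rather than with the length of the motion sequence. This is consistent with the overhead reduction the abstract reports, although the exact factor depends on the model.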
# 逆強化学習における誤特定 Misspecification in Inverse Reinforcement Learning ( http://arxiv.org/abs/2212.03201v2 ) ライセンス: Link先を確認	Joar Skalse, Alessandro Abate	(参考訳) 逆強化学習(IRL)の目的は、ポリシー$\pi$から報酬関数$R$を推論することである。これを行うには、$\pi$と$R$の関係のモデルが必要です。現在の文献では、最も一般的なモデルは最適性、ボルツマン合理性、因果エントロピー最大化である。 IRLの主な動機の1つは、人間の行動から人間の嗜好を推測することである。しかしながら、人間の嗜好と人間の行動の関係は、現在IRLで使われているどのモデルよりもはるかに複雑である。これは、それらが誤って特定され、現実世界のデータに適用された場合、不適切な推測につながる恐れが生じることを意味する。本稿では,異なるirlモデルが不特定化に対していかに頑健であるかを数学的に解析し,そのモデルが報酬関数 $r$ に関する誤った推論につながる前に,各標準モデルとデモストラクタポリシーがどのように異なるかを正確に答える。また、IRLの誤特定を推論するためのフレームワークと、新しいIRLモデルの誤特定堅牢性を容易に導き出すためのフォーマルなツールも導入する。 The aim of Inverse Reinforcement Learning (IRL) is to infer a reward function $R$ from a policy $\pi$. To do this, we need a model of how $\pi$ relates to $R$. In the current literature, the most common models are optimality, Boltzmann rationality, and causal entropy maximisation. One of the primary motivations behind IRL is to infer human preferences from human behaviour. However, the true relationship between human preferences and human behaviour is much more complex than any of the models currently used in IRL. This means that they are misspecified, which raises the worry that they might lead to unsound inferences if applied to real-world data. In this paper, we provide a mathematical analysis of how robust different IRL models are to misspecification, and answer precisely how the demonstrator policy may differ from each of the standard models before that model leads to faulty inferences about the reward function $R$. We also introduce a framework for reasoning about misspecification in IRL, together with formal tools that can be used to easily derive the misspecification robustness of new IRL models.	翻訳日:2023-03-27 18:03:18 公開日:2023-03-24
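For reference, two of the behavioural models named in the abstract above are commonly written as follows. The symbols $Q^\star$ (the optimal action-value function under $R$) and $\beta > 0$ (an inverse-temperature parameter) are standard notation assumed here, not taken from the abstract, and the paper's own definitions may differ in detail.

$$
\text{Optimality:}\quad \pi(a \mid s) > 0 \;\Rightarrow\; a \in \arg\max_{a'} Q^\star(s, a')
$$

$$
\text{Boltzmann rationality:}\quad \pi(a \mid s) = \frac{\exp\big(\beta\, Q^\star(s, a)\big)}{\sum_{a'} \exp\big(\beta\, Q^\star(s, a')\big)}
$$

As $\beta \to \infty$ the Boltzmann-rational policy concentrates on optimal actions, and as $\beta \to 0$ it approaches the uniform policy. Misspecification in the sense of the abstract means that the demonstrator's actual $\pi$ is generated by neither form exactly.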
# State Space Closure: 強化学習による無限のオンラインレベル生成の再考 State Space Closure: Revisiting Endless Online Level Generation via Reinforcement Learning ( http://arxiv.org/abs/2212.02951v2 ) ライセンス: Link先を確認	Ziqi Wang, Tianye Shu, Jialin Liu	(参考訳) 本稿では,最近提案されている強化学習による経験駆動型プロシージャルコンテンツ生成(EDRL)フレームワークを用いて,エンドレスオンラインレベル生成を再考する。 EDRLは繰り返しパターンを生成する傾向にあるという観察から着想を得た状態空間閉包の概念を定式化し、無限水平オンライン生成プロセスにおいて、任意の確率状態が有限水平内で見られるようにした。理論的解析により、状態空間の閉包が多様性に関する懸念を生じても、コンテンツ品質の劣化を伴わずに有限水平で訓練されたEDRLを無限水平シナリオに一般化する。さらに,広範に使用されているSuper Mario Bros.ベンチマークを用いて,EDRLが生成するコンテンツの品質と多様性を実証研究により検証した。実験結果から,EDRLが生成するレベルの多様性は状態空間閉包によって制限されるが,その品質はトレーニングで指定されたものよりも長い水平線では劣化しないことがわかった。結果と分析をまとめると、強化学習による無限のオンラインレベル生成に関する今後の取り組みは、状態空間閉包と品質の発生を保証しながら多様性の問題に対処すべきである。 In this paper, we revisit endless online level generation with the recently proposed experience-driven procedural content generation via reinforcement learning (EDRL) framework. Inspired by the observation that EDRL tends to generate recurrent patterns, we formulate a notion of state space closure, under which any state that can possibly appear in an infinite-horizon online generation process can already be found within a finite horizon. Through theoretical analysis, we find that even though state space closure raises a concern about diversity, it generalises EDRL trained with a finite horizon to the infinite-horizon scenario without deterioration of content quality. Moreover, we verify the quality and diversity of the content generated by EDRL via empirical studies on the widely used Super Mario Bros. benchmark. Experimental results reveal that the diversity of levels generated by EDRL is limited due to state space closure, whereas their quality does not deteriorate over a horizon longer than the one specified in training. Summarising our results and analysis, future work on endless online level generation via reinforcement learning should address the issue of diversity while ensuring that state space closure occurs and that quality is maintained.	
翻訳日:2023-03-27 18:03:00 公開日:2023-03-24
# イメージが画像で話す: 文脈内ビジュアル学習のためのジェネラリスト・ペインティング Images Speak in Images: A Generalist Painter for In-Context Visual Learning ( http://arxiv.org/abs/2212.02499v2 ) ライセンス: Link先を確認	Xinlong Wang, Wen Wang, Yue Cao, Chunhua Shen, Tiejun Huang	(参考訳) インコンテキスト学習は、NLPの新しいパラダイムとして、少数のプロンプトと例だけで、モデルが様々なタスクに迅速に適応できるようにする。しかし、コンピュータビジョンでは、文脈内学習の難しさは、タスクが出力表現で大きく異なるため、ビジョンモデルがドメイン外のタスクを理解し、転送できる汎用的なタスクプロンプトをどのように定義すればよいかは明らかではない。本稿では,コアビジョンタスクの出力をイメージとして再定義する"イメージ"中心のソリューションを用いて,これらの障害に対処するジェネラリストモデルであるpaintを提案し,タスクプロンプトをイメージとして指定する。この考え方では、トレーニングプロセスは非常にシンプルで、入力と出力のイメージペアを縫い合わせることで、標準的なマスク画像モデリングを実行します。これにより、モデルは可視像パッチで条件付きタスクを実行することができる。したがって、推論中に入力条件と同じタスクから一対の入出力画像を適用でき、どのタスクを実行するかを示すことができる。ベルやホイッスルがなければ,高レベルの視覚的理解から低レベルの画像処理に至るまでの7つの視覚的タスクにおいて,精確に確立されたタスク固有モデルと比較して,競争性能が向上する。加えて、paintはいくつかの困難なタスクで最近のジェネラリストモデルを大きく上回っている。 In-context learning, as a new paradigm in NLP, allows the model to rapidly adapt to various tasks with only a handful of prompts and examples. But in computer vision, the difficulties for in-context learning lie in that tasks vary significantly in the output representations, thus it is unclear how to define the general-purpose task prompts that the vision model can understand and transfer to out-of-domain tasks. In this work, we present Painter, a generalist model which addresses these obstacles with an "image"-centric solution, that is, to redefine the output of core vision tasks as images, and specify task prompts as also images. With this idea, our training process is extremely simple, which performs standard masked image modeling on the stitch of input and output image pairs. This makes the model capable of performing tasks conditioned on visible image patches. Thus, during inference, we can adopt a pair of input and output images from the same task as the input condition, to indicate which task to perform. 
Without bells and whistles, our generalist Painter can achieve competitive performance compared to well-established task-specific models, on seven representative vision tasks ranging from high-level visual understanding to low-level image processing. In addition, Painter significantly outperforms recent generalist models on several challenging tasks.	翻訳日:2023-03-27 18:02:38 公開日:2023-03-24
# 単一カメラからのシーン認識型3次元マルチヒューマンモーションキャプチャ Scene-Aware 3D Multi-Human Motion Capture from a Single Camera ( http://arxiv.org/abs/2301.05175v2 ) ライセンス: Link先を確認	Diogo Luvizon, Marc Habermann, Vladislav Golyanik, Adam Kortylewski, Christian Theobalt	(参考訳) 本研究では,静的カメラで記録された1枚のRGBビデオから,シーン内の複数の人間の3次元位置を推定する問題と,その身体形状と調音性について考察する。高価なマーカーベースやマルチビューシステムとは対照的に、当社の軽量なセットアップは、インストールが容易で専門家の知識を必要としない安価な3dモーションキャプチャを可能にするため、プライベートユーザにとって理想的です。この困難な状況に対処するため,我々は,2次元身体関節,関節角度,正規化格差マップ,ヒトセグメンテーションマスクなど,様々な形態の大規模事前学習モデルを用いて,コンピュータビジョンの最近の進歩を活用している。そこで,本稿では,人間の絶対3次元位置,関節的なポーズ,個々の形状,シーンのスケールについて共同で解く,非線形最適化に基づく最初のアプローチを提案する。特に, 2次元身体関節と関節角度を用いた正規化不等式予測から, シーンの奥行きと人別尺度を推定した。フレームあたりのシーン深度を考慮し、3次元空間の静的シーンの点雲を再構成する。最後に、人間のフレーム当たりの3D推定値とシーンポイントクラウドを考慮し、時間的、空間的、物理的妥当性を確保するために、ビデオ上で時空間コヒーレントな最適化を行う。本手法は,従来手法を一貫して上回る多人数3次元ポーズベンチマークを用いて評価し,異なる大きさの人物による挑戦シーンを含む実環境条件にロバストな手法であることを定性的に証明した。 In this work, we consider the problem of estimating the 3D position of multiple humans in a scene as well as their body shape and articulation from a single RGB video recorded with a static camera. In contrast to expensive marker-based or multi-view systems, our lightweight setup is ideal for private users as it enables an affordable 3D motion capture that is easy to install and does not require expert knowledge. To deal with this challenging setting, we leverage recent advances in computer vision using large-scale pre-trained models for a variety of modalities, including 2D body joints, joint angles, normalized disparity maps, and human segmentation masks. Thus, we introduce the first non-linear optimization-based approach that jointly solves for the absolute 3D position of each human, their articulated pose, their individual shapes as well as the scale of the scene. In particular, we estimate the scene depth and person unique scale from normalized disparity predictions using the 2D body joints and joint angles. Given the per-frame scene depth, we reconstruct a point-cloud of the static scene in 3D space. 
Finally, given the per-frame 3D estimates of the humans and scene point-cloud, we perform a space-time coherent optimization over the video to ensure temporal, spatial and physical plausibility. We evaluate our method on established multi-person 3D human pose benchmarks where we consistently outperform previous methods and we qualitatively demonstrate that our method is robust to in-the-wild conditions including challenging scenes with people of different sizes.	翻訳日:2023-03-27 17:56:06 公開日:2023-03-24
# FrustumFormer:マルチビュー3D検出のための適応型インスタンスアウェアリサンプリング FrustumFormer: Adaptive Instance-aware Resampling for Multi-view 3D Detection ( http://arxiv.org/abs/2301.04467v2 ) ライセンス: Link先を確認	Yuqi Wang, Yuntao Chen, and Zhaoxiang Zhang	(参考訳) 2次元視点空間から3次元空間への特徴の変換は、多視点3次元オブジェクト検出に不可欠である。近年のアプローチでは、推定深度を用いて視点ビューの特徴を画素単位で3D空間に引き上げるか、3DプロジェクションによってBEV特徴をグリッド単位で構築し、すべてのピクセルやグリッドを等しく扱うという視点変換の設計に重点を置いている。しかし、何を変換するかの選択も重要だが、これまで議論されることはめったにない。動く車のピクセルは、空のピクセルよりも情報量が多い。画像に含まれる情報を十分に活用するためには、ビュー変換はその内容に応じて異なる画像領域に適応できる必要がある。本稿では,アダプティブ・インスタンス・アウェア・リサンプリング(adaptive instance-aware resampling)によってインスタンス領域の特徴にさらに注目する,FrustumFormerという新しいフレームワークを提案する。具体的には、画像ビューのオブジェクト提案を利用して、鳥瞰図上のインスタンスフラスタムを取得する。インスタンスの場所を洗練するために、インスタンスフラスタム内のアダプティブ占有マスクが学習される。さらに、時間的なフラスタムの交差は、物体の位置推定の不確実性をさらに減少させる可能性がある。 nuScenesデータセットに関する総合的な実験はFrustumFormerの有効性を示し、ベンチマークで新しい最先端性能を実現する。コードとモデルは https://github.com/Robertwyq/Frustum で公開される。 The transformation of features from 2D perspective space to 3D space is essential to multi-view 3D object detection. Recent approaches mainly focus on the design of view transformation, either pixel-wisely lifting perspective view features into 3D space with estimated depth or grid-wisely constructing BEV features via 3D projection, treating all pixels or grids equally. However, choosing what to transform is also important but has rarely been discussed before. The pixels of a moving car are more informative than the pixels of the sky. To fully utilize the information contained in images, the view transformation should be able to adapt to different image regions according to their contents. In this paper, we propose a novel framework named FrustumFormer, which pays more attention to the features in instance regions via adaptive instance-aware resampling. Specifically, the model obtains instance frustums on the bird's eye view by leveraging image view object proposals. An adaptive occupancy mask within the instance frustum is learned to refine the instance location. 
Moreover, the temporal frustum intersection could further reduce the localization uncertainty of objects. Comprehensive experiments on the nuScenes dataset demonstrate the effectiveness of FrustumFormer, and we achieve a new state-of-the-art performance on the benchmark. Codes and models will be made available at https://github.com/Robertwyq/Frustum.	翻訳日:2023-03-27 17:55:41 公開日:2023-03-24
# 二元性ニューロン活性化パターンを用いた分布外サンプルの検出 Detection of out-of-distribution samples using binary neuron activation patterns ( http://arxiv.org/abs/2212.14268v2 ) ライセンス: Link先を確認	Bartlomiej Olber, Krystian Radlak, Adam Popowicz, Michal Szczepankiewicz, Krystian Chachuła	(参考訳) ディープニューラルネットワーク(DNN)は、様々なアプリケーションで優れた性能を発揮する。研究コミュニティの多くの努力にもかかわらず、アウト・オブ・ディストリビューション(OOD)サンプルはDNN分類器の重要な制限として残っている。未発見の入力を新規に識別する能力は、自動運転車、無人航空機、ロボットといった安全上重要な応用において不可欠である。 OODサンプルを検出するための既存のアプローチでは、DNNをブラックボックスとして扱い、出力予測の信頼性スコアを評価する。残念ながら、DNNはOOD入力に対する信頼を減らすために訓練されていないため、この方法は頻繁に失敗する。本研究では,OOD検出のための新しい手法を提案する。この手法は、ReLUアーキテクチャにおけるニューロン活性化パターン(NAP)の理論解析によって動機づけられる。提案手法では,畳み込み層から抽出した活性化パターンのバイナリ表現による計算オーバーヘッドが高まることはない。広範な実証評価により、様々なDNNアーキテクチャと7つの画像データセットの性能が証明された。 Deep neural networks (DNN) have outstanding performance in various applications. Despite numerous efforts of the research community, out-of-distribution (OOD) samples remain a significant limitation of DNN classifiers. The ability to identify previously unseen inputs as novel is crucial in safety-critical applications such as self-driving cars, unmanned aerial vehicles, and robots. Existing approaches to detect OOD samples treat a DNN as a black box and evaluate the confidence score of the output predictions. Unfortunately, this method frequently fails, because DNNs are not trained to reduce their confidence for OOD inputs. In this work, we introduce a novel method for OOD detection. Our method is motivated by theoretical analysis of neuron activation patterns (NAP) in ReLU-based architectures. The proposed method does not introduce a high computational overhead due to the binary representation of the activation patterns extracted from convolutional layers. The extensive empirical evaluation proves its high performance on various DNN architectures and seven image datasets.	翻訳日:2023-03-27 17:55:19 公開日:2023-03-24
# 作業前を対象とする統一オブジェクトカウントネットワーク A Unified Object Counting Network with Object Occupation Prior ( http://arxiv.org/abs/2212.14193v2 ) ライセンス: Link先を確認	Shengqin Jiang, Qing Wang, Fengna Cheng, Yuankai Qi, Qingshan Liu	(参考訳) 多数のアプリケーション(例えば、群衆数、トラフィック統計)で基本的な役割を果たすカウントタスクは、さまざまな密度を持つオブジェクトの数を予測することを目的としている。既存のオブジェクトカウントタスクは単一のオブジェクトクラスのために設計されます。しかし、私たちの現実世界で新しいクラスで新しいデータに遭遇するのは避けられない。このシナリオを \textit{evolving object counting} と命名します。本稿では,最初の進化するオブジェクト計数データセットを構築し,この課題に対する最初の試みとして統一オブジェクト計数ネットワークを提案する。提案モデルは,クラスに依存しないマスクモジュールとクラスインクリメンタルモジュールの2つの重要なコンポーネントから構成される。クラス非依存マスクモジュールは、クラス非依存なバイナリマスクを予測して、汎用オブジェクトの占有を事前に学習する(例えば、1は、画像中の考慮位置にあるオブジェクトが存在し、それ以外は0であることを示す)。 class-incrementalモジュールは新しい来るべきクラスを扱うために使われ、密度マップ予測のための判別クラスガイダンスを提供する。クラス非依存マスクモジュールと画像特徴抽出器の組合せ出力を用いて最終密度マップを予測する。新しいクラスが来たら、まずクラスインクリメンタルモジュールの最後の回帰層と分類層に新しいニューラルネットワークを追加します。そして、モデルをスクラッチから再トレーニングするのではなく、モデルが以前のオブジェクトクラスについて既に学んだことを思い出すのに役立つ知識蒸留を利用する。また、各クラスの典型的なトレーニングサンプルを少数のサポートサンプルバンクに格納することで、モデルが古いデータのキー情報を忘れないようにしています。この設計により,大規模再トレーニングを行わずに,既存のデータのパフォーマンスを維持しつつ,新しいクラスに効率的に適応することができる。収集したデータセットに関する広範な実験は、優れたパフォーマンスを示している。 The counting task, which plays a fundamental role in numerous applications (e.g., crowd counting, traffic statistics), aims to predict the number of objects with various densities. Existing object counting tasks are designed for a single object class. However, it is inevitable to encounter newly coming data with new classes in our real world. We name this scenario as \textit{evolving object counting}. In this paper, we build the first evolving object counting dataset and propose a unified object counting network as the first attempt to address this task. The proposed model consists of two key components: a class-agnostic mask module and a class-incremental module. The class-agnostic mask module learns generic object occupation prior via predicting a class-agnostic binary mask (e.g., 1 denotes there exists an object at the considering position in an image and 0 otherwise). 
The class-incremental module is used to handle new coming classes and provides discriminative class guidance for density map prediction. The combined outputs of the class-agnostic mask module and the image feature extractor are used to predict the final density map. When new classes come, we first add new neural nodes into the last regression and classification layers of the class-incremental module. Then, instead of retraining the model from scratch, we utilize knowledge distillation to help the model remember what it has already learned about previous object classes. We also employ a support sample bank to store a small number of typical training samples of each class, which are used to prevent the model from forgetting key information of old data. With this design, our model can efficiently and effectively adapt to new coming classes while keeping good performance on already seen data without large-scale retraining. Extensive experiments on the collected dataset demonstrate the favorable performance.	翻訳日:2023-03-27 17:55:06 公開日:2023-03-24
# スケーラブルな物理的一貫性のあるニューラルネットワークに向けて:データ駆動型マルチゾーンサーマルビルディングモデルへの応用 Towards Scalable Physically Consistent Neural Networks: an Application to Data-driven Multi-zone Thermal Building Models ( http://arxiv.org/abs/2212.12380v2 ) ライセンス: Link先を確認	Loris Di Natale, Bratislav Svetozarevic, Philipp Heer, and Colin Neil Jones	(参考訳) 収集されるデータが増えるにつれて、データ駆動モデリングの手法が近年人気が高まっている。物理的に健全であるが、古典的なグレーボックスモデルはしばしば識別とスケールが困難であり、その正確さは表現力の制限によって妨げられる可能性がある。一方で、現在ではニューラルネットワーク(nns)に依存する古典的なブラックボックス法は、データから統計的パターンを導出することで、大規模でも印象的なパフォーマンスを達成していることが多い。しかし、それらは基礎となる物理法則に完全に従わないままであり、現実世界の物理システムに対する決定がそれらに基づく場合、破滅的な失敗につながる可能性がある。物理的に一貫性のあるニューラルネットワーク(PCNN)は最近、前述の問題に対処するために開発された。そこで本研究では,PCNNを用いて建築温度動態をモデル化し,従来のグレーボックス法とブラックボックス法とを徹底的に比較する。より正確には、3つの異なるpcnn拡張を設計し、アーキテクチャのモジュラリティと柔軟性を例示し、その物理的一貫性を正式に証明します。実例では,PCNNは最先端の精度を達成でき,制約構造にもかかわらず従来のNNモデルよりも優れていた。さらに、我々の調査は、完全に物理に依存しないまま、NNが優れたパフォーマンスを達成していることを示す明確なイラストを提供している。この性能は計算複雑性のコストがかかるが、pcnnは他の物理的に一貫性のある手法と比較して17-35%の精度向上を示し、最先端の性能を持つスケーラブルな物理的一貫性モデルへの道を開く。 With more and more data being collected, data-driven modeling methods have been gaining in popularity in recent years. While physically sound, classical gray-box models are often cumbersome to identify and scale, and their accuracy might be hindered by their limited expressiveness. On the other hand, classical black-box methods, typically relying on Neural Networks (NNs) nowadays, often achieve impressive performance, even at scale, by deriving statistical patterns from data. However, they remain completely oblivious to the underlying physical laws, which may lead to potentially catastrophic failures if decisions for real-world physical systems are based on them. Physically Consistent Neural Networks (PCNNs) were recently developed to address these aforementioned issues, ensuring physical consistency while still leveraging NNs to attain state-of-the-art accuracy. 
In this work, we scale PCNNs to model building temperature dynamics and propose a thorough comparison with classical gray-box and black-box methods. More precisely, we design three distinct PCNN extensions, thereby exemplifying the modularity and flexibility of the architecture, and formally prove their physical consistency. In the presented case study, PCNNs are shown to achieve state-of-the-art accuracy, even outperforming classical NN-based models despite their constrained structure. Our investigations furthermore provide a clear illustration of NNs achieving seemingly good performance while remaining completely physics-agnostic, which can be misleading in practice. While this performance comes at the cost of computational complexity, PCNNs on the other hand show accuracy improvements of 17-35% compared to all other physically consistent methods, paving the way for scalable physically consistent models with state-of-the-art performance.	翻訳日:2023-03-27 17:54:38 公開日:2023-03-24
# 波動関数密度勾配に伴う粒子運動 Particle motion associated with wave function density gradients ( http://arxiv.org/abs/2212.11575v3 ) ライセンス: Link先を確認	Jan Klaers, Violetta Sharoglazova, Chris Toebes	(参考訳) 2つの結合導波管電位の系における大粒子の量子力学的運動について検討し、導波管間の集団移動が時計として効果的に働き、粒子速度を決定できることを示す。反射ステップポテンシャルにおけるエバネッセント現象へのこのスキームの適用は、古典的に禁止された運動に対するエネルギー-速度関係を明らかにする。獲得と損失の領域は、想像上のポテンシャルによって説明され、粒子の運動を加速させる。量子力学的波動関数の位相および密度勾配は粒子の速度を示すのに相補的な役割を果たす。 We study the quantum mechanical motion of massive particles in a system of two coupled waveguide potentials, where the population transfer between the waveguides effectively acts as a clock and allows particle velocities to be determined. Application of this scheme to evanescent phenomena at a reflective step potential reveals an energy-velocity relationship for classically forbidden motion. Regions of gain and loss, as described by imaginary potentials, are shown to speed up the motion of particles. We argue that phase and density gradients in quantum mechanical wave functions play complementary roles in indicating the speed of particles.	翻訳日:2023-03-27 17:54:06 公開日:2023-03-24
# layoutdetr: detection transformerは優れたマルチモーダルレイアウトデザイナである LayoutDETR: Detection Transformer Is a Good Multimodal Layout Designer ( http://arxiv.org/abs/2212.09877v3 ) ライセンス: Link先を確認	Ning Yu, Chia-Chih Chen, Zeyuan Chen, Rui Meng, Gang Wu, Paul Josel, Juan Carlos Niebles, Caiming Xiong, Ran Xu	(参考訳) グラフィックレイアウト設計は視覚コミュニケーションにおいて重要な役割を果たす。しかし、手作りのレイアウト設計は、スキル要求、時間消費、バッチ生産への非スカラブルである。生成モデルは、デザインの自動化をスケーラブルにするために出現するが、デザイナーのマルチモーダルな願望、すなわち背景画像によって制約され、前景コンテンツによって駆動されるデザインを作成することは、いまだに自明ではない。本研究では,生成モデルから高品質かつ現実性を継承するLayoutDETRを提案するとともに,コンテンツ認識要求を検出問題として再定義し,背景画像から適切な位置,スケール,空間的関係をレイアウトで検出する。当社のソリューションは、パブリックベンチマークと新たに調達したad bannerデータセットで、レイアウト生成のための新たな最先端のパフォーマンスを設定します。ユーザの学習を促進するグラフィカルなシステムにソリューションを統合することで,ユーザがベースラインよりもデザインを好むことを示す。私たちのコード、モデル、データセット、グラフィカルシステム、デモはhttps://github.com/salesforce/LayoutDETRで公開されています。 Graphic layout designs play an essential role in visual communication. Yet handcrafting layout designs is skill-demanding, time-consuming, and non-scalable to batch production. Generative models emerge to make design automation scalable but it remains non-trivial to produce designs that comply with designers' multimodal desires, i.e., constrained by background images and driven by foreground content. We propose LayoutDETR that inherits the high quality and realism from generative modeling, while reformulating content-aware requirements as a detection problem: we learn to detect in a background image the reasonable locations, scales, and spatial relations for multimodal foreground elements in a layout. Our solution sets a new state-of-the-art performance for layout generation on public benchmarks and on our newly-curated ad banner dataset. We integrate our solution into a graphical system that facilitates user studies, and show that users prefer our designs over baselines by significant margins. Our code, models, dataset, graphical system, and demos are available at https://github.com/salesforce/LayoutDETR.	翻訳日:2023-03-27 17:53:32 公開日:2023-03-24
# MM-Diffusion:共同音声・ビデオ生成のための多モード拡散モデル学習 MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation ( http://arxiv.org/abs/2212.09478v2 ) ライセンス: Link先を確認	Ludan Ruan and Yiyang Ma and Huan Yang and Huiguo He and Bei Liu and Jianlong Fu and Nicholas Jing Yuan and Qin Jin and Baining Guo	(参考訳) 本稿では,高品質でリアルなビデオに向けて,魅力的な視聴体験と聴取体験を同時にもたらす,初の共同音声ビデオ生成フレームワークを提案する。音声とビデオの併用ペアを生成するために,2つの結合したデノイジングオートエンコーダを用いたマルチモーダル拡散モデル(MM-Diffusion)を提案する。既存の単一モード拡散モデルとは対照的に、MM-Diffusionは設計上、共同デノイジングプロセスのための逐次多モードU-Netで構成されている。音声とビデオの2つのサブネットは、ガウス雑音から徐々にアライメントされたオーディオビデオペアを生成することを学習する。モダリティ間の意味的一貫性を確保するために,2つのサブネットを橋渡しするランダムシフトに基づくアテンションブロックを提案し,効率的なクロスモーダルアライメントを実現することにより,相互に音声・映像の忠実度を高める。広範な実験は、無条件のオーディオビデオ生成やゼロショット条件タスク(例えば、ビデオからオーディオ)において優れた結果を示す。特にLandscapeとAIST++のダンスデータセットで最高のFVDとFADを実現する。 10k票のチューリングテストは、我々のモデルに対する圧倒的な選好を示す。コードと事前訓練されたモデルは https://github.com/researchmm/MM-Diffusion でダウンロードできる。 We propose the first joint audio-video generation framework that brings engaging watching and listening experiences simultaneously, towards high-quality realistic videos. To generate joint audio-video pairs, we propose a novel Multi-Modal Diffusion model (i.e., MM-Diffusion), with two-coupled denoising autoencoders. In contrast to existing single-modal diffusion models, MM-Diffusion consists of a sequential multi-modal U-Net for a joint denoising process by design. Two subnets for audio and video learn to gradually generate aligned audio-video pairs from Gaussian noises. To ensure semantic consistency across modalities, we propose a novel random-shift based attention block bridging over the two subnets, which enables efficient cross-modal alignment, and thus reinforces the audio-video fidelity for each other. Extensive experiments show superior results in unconditional audio-video generation, and zero-shot conditional tasks (e.g., video-to-audio). In particular, we achieve the best FVD and FAD on Landscape and AIST++ dancing datasets. 
Turing tests of 10k votes further demonstrate dominant preferences for our model. The code and pre-trained models can be downloaded at https://github.com/researchmm/MM-Diffusion.	翻訳日:2023-03-27 17:53:14 公開日:2023-03-24
# 大規模言語モデルにおける創発的類推 Emergent Analogical Reasoning in Large Language Models ( http://arxiv.org/abs/2212.09196v2 ) ライセンス: Link先を確認	Taylor Webb, Keith J. Holyoak, Hongjing Lu	(参考訳) 近年の大規模言語モデルの出現は、十分な訓練データを得た一般的なモデルに人間の認知能力が出現するかどうかという議論を再燃させた。特に興味深いのは、これらのモデルが直接訓練することなく、ゼロショットで新しい問題を推論する能力である。人間の認知では、この能力は類推による推論能力と密接に結びついている。本稿では,Ravenのプログレッシブ・マトリクスをモデルとした新しいテキストベースマトリクス推論タスクを含む,様々な類推的タスクについて,人間推論者と大規模言語モデル(GPT-3のtext-davinci-003変種)の直接比較を行った。その結果、GPT-3は抽象パターン誘導において驚くほど強力な能力を示し、ほとんどの設定で人間の能力に匹敵するか、それを上回った。以上の結果から, GPT-3のような大規模言語モデルは, 幅広い類推問題に対するゼロショット解を見つける創発的な能力を獲得していることが示唆される。 The recent advent of large language models has reinvigorated debate over whether human cognitive capacities might emerge in such generic models given sufficient training data. Of particular interest is the ability of these models to reason about novel problems zero-shot, without any direct training. In human cognition, this capacity is closely tied to an ability to reason by analogy. Here, we performed a direct comparison between human reasoners and a large language model (the text-davinci-003 variant of GPT-3) on a range of analogical tasks, including a novel text-based matrix reasoning task closely modeled on Raven's Progressive Matrices. We found that GPT-3 displayed a surprisingly strong capacity for abstract pattern induction, matching or even surpassing human capabilities in most settings. Our results indicate that large language models such as GPT-3 have acquired an emergent ability to find zero-shot solutions to a broad range of analogy problems.	翻訳日:2023-03-27 17:52:52 公開日:2023-03-24
# User-Centered Design (IX):人工知能時代の"User Experience 3.0"パラダイムフレームワーク User-Centered Design (IX): A "User Experience 3.0" Paradigm Framework in the Intelligence Era ( http://arxiv.org/abs/2302.06681v6 ) ライセンス: Link先を確認	Wei Xu	(参考訳) 「ユーザ中心設計」のデザイン哲学に基づくユーザエクスペリエンス(UX)の分野は、インテリジェンスの時代に向かっている。それでも、既存のUXパラダイムは主にインテリジェントでないシステムを対象としており、インテリジェントなシステムに対するUXに対する体系的なアプローチが欠けている。 UXの開発を通じて、UXパラダイムは技術横断時代の進化特性を示している。現在、インテリジェンス時代はUXパラダイムに対する新たな要求を提起している。そこで本稿では,インテリジェンス時代の"UX 3.0"パラダイムフレームワークと,それに対応するUX方法論システムを提案する。 "UX 3.0"パラダイムフレームワークには、エコロジーエクスペリエンス、イノベーション対応エクスペリエンス、AI対応エクスペリエンス、ヒューマン-AIインタラクションベースエクスペリエンス、ヒューマン-AIコラボレーションベースのエクスペリエンスメソッドの5つのカテゴリが含まれており、それぞれが対応する複数のUXパラダイム指向を提供する。 "UX 3.0"パラダイムの提案は、既存のUXメソッドの改善を支援し、インテリジェントシステム開発におけるUXの研究と応用に対する方法論的なサポートを提供する。最後に、この論文は"UX 3.0"パラダイムの今後の研究と応用を展望する。 The field of user experience (UX) based on the design philosophy of "user-centered design" is moving towards the intelligence era. Still, the existing UX paradigm mainly aims at non-intelligent systems and lacks a systematic approach to UX for intelligent systems. Throughout the development of UX, the UX paradigm shows the evolution characteristics of the cross-technology era. At present, the intelligence era has put forward new demands on the UX paradigm. For this reason, this paper proposes a "UX 3.0" paradigm framework and the corresponding UX methodology system in the intelligence era. The "UX 3.0" paradigm framework includes five categories of UX methods: ecological experience, innovation-enabled experience, AI-enabled experience, human-AI interaction-based experience, and human-AI collaboration-based experience methods, each providing corresponding multiple UX paradigmatic orientations. The proposal of the "UX 3.0" paradigm helps improve the existing UX methods and provides methodological support for the research and applications of UX in developing intelligent systems. Finally, this paper looks forward to future research and applications of the "UX 3.0" paradigm.	翻訳日:2023-03-27 17:46:58 公開日:2023-03-24
# 多様性が必要である:安定拡散によるモデル非依存なゼロショット分類の改善 Diversity is Definitely Needed: Improving Model-Agnostic Zero-shot Classification via Stable Diffusion ( http://arxiv.org/abs/2302.03298v3 ) ライセンス: Link先を確認	Jordan Shipard, Arnold Wiliem, Kien Nguyen Thanh, Wei Xiang, Clinton Fookes	(参考訳) 本研究では,実画像を用いずに実画像の分類を行うための非特異的分類アーキテクチャ(ダウンストリームモデル)を訓練することを目的とした,モデル非依存ゼロショット分類(ma-zsc)の問題を検討する。近年の研究では、拡散モデルを用いた合成訓練画像の生成は、ma-zscに対処する潜在的な解決策となることが示されている。しかし、現在のこのアプローチの性能は、大規模なビジョン言語モデルによって達成されるものには及ばない。考えられる説明の1つは、合成画像と実画像の間の潜在的な領域ギャップである。我々の研究は、生成したデータセット内の画像の多様性を改善することにより、MA-ZSCの性能を改善することができるという最初の洞察を提供することで、この問題に対する新たな視点を提供する。我々は,事前学習した拡散モデルを用いてテキストから画像への生成プロセスを改良し,多様性を高めることを提案する。提案手法は,CLIPなどの最先端モデルに匹敵する,様々な分類アーキテクチャにおける顕著な改善を示す。 CIFAR10, CIFAR100, EuroSATの衛星画像領域によるゼロショット分類は特に困難である。我々はResNetとViTを含む5つの分類アーキテクチャでアプローチを評価した。本研究は拡散モデルを用いたma-zsc問題の初期知見を提供する。すべてのコードはGitHubで入手できる。 In this work, we investigate the problem of Model-Agnostic Zero-Shot Classification (MA-ZSC), which refers to training non-specific classification architectures (downstream models) to classify real images without using any real images during training. Recent research has demonstrated that generating synthetic training images using diffusion models provides a potential solution to address MA-ZSC. However, the performance of this approach currently falls short of that achieved by large-scale vision-language models. One possible explanation is a potential significant domain gap between synthetic and real images. Our work offers a fresh perspective on the problem by providing initial insights that MA-ZSC performance can be improved by improving the diversity of images in the generated dataset. We propose a set of modifications to the text-to-image generation process using a pre-trained diffusion model to enhance diversity, which we refer to as our $\textbf{bag of tricks}$. Our approach shows notable improvements in various classification architectures, with results comparable to state-of-the-art models such as CLIP. 
To validate our approach, we conduct experiments on CIFAR10, CIFAR100, and EuroSAT, which is particularly difficult for zero-shot classification due to its satellite image domain. We evaluate our approach with five classification architectures, including ResNet and ViT. Our findings provide initial insights into the problem of MA-ZSC using diffusion models. All code will be available on GitHub.	翻訳日:2023-03-27 17:46:40 公開日:2023-03-24
# ロバスト多視点三角測量のための半定値緩和 Semidefinite Relaxations for Robust Multiview Triangulation ( http://arxiv.org/abs/2301.11431v2 ) ライセンス: Link先を確認	Linus Härenstam-Nielsen, Niclas Zeller, Daniel Cremers	(参考訳) 本稿では,凸緩和に基づく,最適性を証明可能なロバスト多視点三角測量のアプローチを提案する。この目的のために、最小二乗コスト関数を組み込むことで、非ロバスト多視点三角測量に対する既存の緩和アプローチを拡張する。本稿では2つの定式化を提案する。1つはエピポーラ制約に基づき,もう1つは分数再投影制約に基づく。 1つ目は低次元であり、中程度のノイズと外れ値レベルの下ではタイトである。2つ目は高次元であり、したがって遅いが、極端なノイズと外れ値レベルでもタイトである。提案手法は,大きなノイズと高い割合の外れ値の下でも,証明可能な最適再構成を計算できることを広範な実験により実証する。 We propose an approach based on convex relaxations for certifiably optimal robust multiview triangulation. To this end, we extend existing relaxation approaches to non-robust multiview triangulation by incorporating a least squares cost function. We propose two formulations, one based on epipolar constraints and one based on fractional reprojection constraints. The first is lower dimensional and remains tight under moderate noise and outlier levels, while the second is higher dimensional and therefore slower but remains tight even under extreme noise and outlier levels. We demonstrate through extensive experiments that the proposed approaches allow us to compute provably optimal reconstructions even under significant noise and a large percentage of outliers.	翻訳日:2023-03-27 17:45:25 公開日:2023-03-24
# 音声言語理解におけるフィラー : 計算的・心理言語学的視点 Fillers in Spoken Language Understanding: Computational and Psycholinguistic Perspectives ( http://arxiv.org/abs/2301.10761v4 ) ライセンス: Link先を確認	Tanvi Dinkar, Chloé Clavel, Ioana Vasilescu	(参考訳) 発話の通常の流れにおける中断(disfluencies)は、話し言葉に対してユビキタスである。フィラー("uh", "um")は、他の種類の非流暢性と比較して最も頻繁に発生する非流暢性である。しかし、私たちの知る限りでは、これらのスピーチイベントに関してSpoken Language Understanding(SLU)に影響を与える研究の視点をまとめるリソースは存在しない。本論文の目的は,基礎となる(心理)言語理論の考察から,自動音声認識(ASR)とSLUシステムにおける注釈と考察,そして最後に生成の観点からの研究まで,幅広い視点を総合的に調査することである。この記事では、SLUと会話型AIコミュニティにアプローチ可能な方法で視点を提示し、今後に向けて、各分野のトレンドと課題を議論することを目的としています。 Disfluencies (i.e. interruptions in the regular flow of speech), are ubiquitous to spoken discourse. Fillers ("uh", "um") are disfluencies that occur the most frequently compared to other kinds of disfluencies. Yet, to the best of our knowledge, there isn't a resource that brings together the research perspectives influencing Spoken Language Understanding (SLU) on these speech events. The aim of this article is to survey a breadth of perspectives in a holistic way; i.e. from considering underlying (psycho)linguistic theory, to their annotation and consideration in Automatic Speech Recognition (ASR) and SLU systems, to lastly, their study from a generation standpoint. This article aims to present the perspectives in an approachable way to the SLU and Conversational AI community, and discuss moving forward, what we believe are the trends and challenges in each area.	翻訳日:2023-03-27 17:45:14 公開日:2023-03-24
# すべてのドメインに対する1つのモデル:クロスドメインnerのためのコラボレーティブなドメインプリフィックスチューニング One Model for All Domains: Collaborative Domain-Prefix Tuning for Cross-Domain NER ( http://arxiv.org/abs/2301.10410v3 ) ライセンス: Link先を確認	Xiang Chen, Lei Li, Shuofei Qiao, Ningyu Zhang, Chuanqi Tan, Yong Jiang, Fei Huang, Huajun Chen	(参考訳) クロスドメインNERは、実践シナリオにおける低リソースの問題に対処する上で難しいタスクである。従来の典型的なソリューションは主に、リッチリソースドメインのデータを持つ事前学習言語モデル(PLM)を用いてNERモデルを取得し、ターゲットドメインに適応する。異なるドメインのエンティティタイプ間のミスマッチの問題のため、従来のアプローチは通常、PLMのすべてのパラメータをチューニングし、最終的に各ドメインに対して全く新しいNERモデルになる。さらに、現在のモデルは、複数のソースからターゲットへの知識の転送に失敗しながら、単一のソースドメインにおける知識の活用にのみ焦点を当てている。この問題に対処するために,テキスト対テキスト生成plmに基づくクロスドメインner(cp-ner)のための協調型ドメインプリフィックスチューニングを導入する。具体的には、ドメイン関連インストラクターを対象に、構造変更なしに知識を新しいドメインNERタスクに転送するテキスト・ツー・テキスト生成を提案する。凍結したPLMを利用して協調的なドメイン-プレフィックスチューニングを行い、PLMのポテンシャルを刺激し、NERタスクを様々なドメインで処理する。 Cross-NERベンチマークによる実験結果から,提案手法はフレキシブルトランスファー能力を有し,単一ソースと複数ソースのクロスドメインNERタスクにおいて優れた性能を発揮することが示された。コードはhttps://github.com/zjunlp/DeepKE/tree/main/example/ner/crossで提供される。 Cross-domain NER is a challenging task to address the low-resource problem in practical scenarios. Previous typical solutions mainly obtain a NER model by pre-trained language models (PLMs) with data from a rich-resource domain and adapt it to the target domain. Owing to the mismatch issue among entity types in different domains, previous approaches normally tune all parameters of PLMs, ending up with an entirely new NER model for each domain. Moreover, current models only focus on leveraging knowledge in one general source domain while failing to successfully transfer knowledge from multiple sources to the target. To address these issues, we introduce Collaborative Domain-Prefix Tuning for cross-domain NER (CP-NER) based on text-to-text generative PLMs. Specifically, we present text-to-text generation grounding domain-related instructors to transfer knowledge to new domain NER tasks without structural modifications. 
We utilize frozen PLMs and conduct collaborative domain-prefix tuning to stimulate the potential of PLMs to handle NER tasks across various domains. Experimental results on the Cross-NER benchmark show that the proposed approach has flexible transfer ability and performs better on both one-source and multiple-source cross-domain NER tasks. Codes will be available in https://github.com/zjunlp/DeepKE/tree/main/example/ner/cross.	翻訳日:2023-03-27 17:44:59 公開日:2023-03-24
# 深層条件付き測度量子化 Deep Conditional Measure Quantization ( http://arxiv.org/abs/2301.06907v2 ) ライセンス: Link先を確認	Gabriel Turinici	(参考訳) 確率測度の量子化は、それを有限個のディラック質量で表し、入力分布を十分に近似することを意味する(確率測度の幾らかの計量空間において)。様々な方法が存在するが、条件付き法則の量子化の状況はあまり調査されていない。本稿では,深層ニューラルネットワークアーキテクチャと組み合わされたフーバーエネルギーカーネルベースアプローチを用いたDCMQと呼ばれる手法を提案する。この方法はいくつかの例でテストされ、有望な結果が得られる。 Quantization of a probability measure means representing it with a finite set of Dirac masses that approximates the input distribution well enough (in some metric space of probability measures). Various methods exist to do so, but the situation of quantizing a conditional law has been less explored. We propose a method, called DCMQ, involving a Huber-energy kernel-based approach coupled with a deep neural network architecture. The method is tested on several examples and obtains promising results.	翻訳日:2023-03-27 17:44:18 公開日:2023-03-24
# セマンティック・トレラント・コントラスト・ロスによる自己監督型イメージ・ツー・ポイント蒸留 Self-Supervised Image-to-Point Distillation via Semantically Tolerant Contrastive Loss ( http://arxiv.org/abs/2301.05709v2 ) ライセンス: Link先を確認	Anas Mahmoud, Jordan S. K. Hu, Tianshu Kuai, Ali Harakeh, Liam Paull, and Steven L. Waslander	(参考訳) 知覚タスクの3D表現を学習するための効果的なフレームワークは、コントラスト学習を通じて、リッチな自己教師付き画像特徴を抽出することである。しかし、自律運転データセットのイメージ・ツー・ポイント表現学習は2つの大きな課題に直面している。 1) 自己相似性の豊富さは、意味的に類似した点や画像領域を押し出し、学習した表現の局所的な意味構造を乱す、対照的な損失をもたらす。 2)プリトレーニングとしての厳しいクラス不均衡は,過度に表現されたクラスに支配される。本稿では,画像領域と画像領域の対比を最小化するために,正と負の領域間の意味距離を考慮した,新しい意味論的に寛容な画像対点コントラスト損失法を提案する。さらに,クラス不均衡度を,集合的なサンプルとサンプル間のセマンティック類似度によって近似するクラス非均衡損失を設計することで,クラス不均衡に対処する。クラスバランスによるセマンティック・トレラントなコントラスト損失は,3次元セマンティックセグメンテーションのすべての評価設定において,最先端の2D-to-3D表現学習を改善することを示す。提案手法は,最先端の2D-to-3D表現学習フレームワークを多種多様な自己教師付き事前学習モデルで一貫した性能を発揮する。 An effective framework for learning 3D representations for perception tasks is distilling rich self-supervised image features via contrastive learning. However, image-to point representation learning for autonomous driving datasets faces two main challenges: 1) the abundance of self-similarity, which results in the contrastive losses pushing away semantically similar point and image regions and thus disturbing the local semantic structure of the learned representations, and 2) severe class imbalance as pretraining gets dominated by over-represented classes. We propose to alleviate the self-similarity problem through a novel semantically tolerant image-to-point contrastive loss that takes into consideration the semantic distance between positive and negative image regions to minimize contrasting semantically similar point and image regions. Additionally, we address class imbalance by designing a class-agnostic balanced loss that approximates the degree of class imbalance through an aggregate sample-to-samples semantic similarity measure. 
We demonstrate that our semantically-tolerant contrastive loss with class balancing improves state-of-the-art 2D-to-3D representation learning in all evaluation settings on 3D semantic segmentation. Our method consistently outperforms state-of-the-art 2D-to-3D representation learning frameworks across a wide range of 2D self-supervised pretrained models.	翻訳日:2023-03-27 17:44:10 公開日:2023-03-24
# 音声アシスタントの親制御に向けて Towards Usable Parental Control for Voice Assistants ( http://arxiv.org/abs/2303.04957v2 ) ライセンス: Link先を確認	Peiyi Yang, Jie Fan, Zice Wei, Haoqian Li, Tu Le, and Yuan Tian	(参考訳) ボイスパーソナルアシスタント(VPA)は一般的な家電製品となっている。 VPA技術の主要なプラットフォームのひとつとして、AmazonはAlexaを開発し、子供向けのAmazon Kidsを設計し、VPAの豊富な機能を安全に享受し、親がペアレントダッシュボードを通じて子供の活動を監視するようにした。このエコシステムは存在するが、親ダッシュボードの利用は親にはまだ普及していない。本稿では,親による調査を行い,親のコントロール機能について,親の好みや嫌いについて調査する。親は、子どもの活動、子どものセキュリティ機能へのアクセスの容易化、ユーザーインターフェースの改善など、より視覚的な情報を必要としている。本調査から得られた知見をもとに,親の期待を鑑み,親のダッシュボードに新たなデザインを提案する。 Voice Personal Assistants (VPA) have become a common household appliance. As one of the leading platforms for VPA technology, Amazon created Alexa and designed Amazon Kids for children to safely enjoy the rich functionalities of VPA and for parents to monitor their kids' activities through the Parent Dashboard. Although this ecosystem is in place, the usage of Parent Dashboard is not yet popularized among parents. In this paper, we conduct a parent survey to find out what they like and dislike about the current parental control features. We find that parents need more visuals about their children's activity, easier access to security features for their children, and a better user interface. Based on the insights from our survey, we present a new design for the Parent Dashboard considering the parents' expectations.	翻訳日:2023-03-27 17:37:59 公開日:2023-03-24
# HairStep:シングルビュー3次元ヘアモデリングのためのストランドマップと深さマップを用いた実写合成 HairStep: Transfer Synthetic to Real Using Strand and Depth Maps for Single-View 3D Hair Modeling ( http://arxiv.org/abs/2303.02700v2 ) ライセンス: Link先を確認	Yujian Zheng, Zirong Jin, Moran Li, Haibin Huang, Chongyang Ma, Shuguang Cui, Xiaoguang Han	(参考訳) 本研究では,学習型単一視点3Dヘアモデリングの課題に対処する。実画像と3Dヘアデータを集めることの難しさから, 合成データを用いて, 実領域の事前知識を提供する手法が主流となっている。残念ながら、これはドメインギャップの課題をもたらします。現実的なヘアレンダリングが本質的に困難であるため、既存の手法では、ギャップを埋める入力としてヘアイメージの代わりに方向マップを使用するのが一般的である。中間表現は不可欠であると考えるが、支配的なフィルタリングに基づく手法を用いた方向マップは不確定なノイズに敏感であり、有能な表現とは程遠い。そこで本研究では,まずこの問題を提起し,ストランドマップと深さマップからなるヘアステップと呼ばれる新しい中間表現を提案する。 HairStepは正確な3Dヘアモデリングに十分な情報を提供するだけでなく、実際の画像から推測できる。具体的には、2種類のアノテーションで1,250枚の肖像画画像のデータセットを収集する。さらに学習フレームワークは、実際の画像をストランドマップと深さマップに転送するように設計されている。新たなデータセットの付加的なボーナスが3Dヘアモデリングの最初の定量的指標であることに注意が必要だ。実験の結果, ヘアステップは合成とリアルの領域ギャップを狭くし, 単視点3dヘアリコンストラクションの最先端性能を実現することがわかった。 In this work, we tackle the challenging problem of learning-based single-view 3D hair modeling. Due to the great difficulty of collecting paired real image and 3D hair data, using synthetic data to provide prior knowledge for real domain becomes a leading solution. This unfortunately introduces the challenge of domain gap. Due to the inherent difficulty of realistic hair rendering, existing methods typically use orientation maps instead of hair images as input to bridge the gap. We firmly think an intermediate representation is essential, but we argue that orientation map using the dominant filtering-based methods is sensitive to uncertain noise and far from a competent representation. Thus, we first raise this issue up and propose a novel intermediate representation, termed as HairStep, which consists of a strand map and a depth map. It is found that HairStep not only provides sufficient information for accurate 3D hair modeling, but also is feasible to be inferred from real images. Specifically, we collect a dataset of 1,250 portrait images with two types of annotations. 
A learning framework is further designed to transfer real images to the strand map and depth map. Notably, an extra bonus of our new dataset is the first quantitative metric for 3D hair modeling. Our experiments show that HairStep narrows the domain gap between synthetic and real and achieves state-of-the-art performance on single-view 3D hair reconstruction.	Translated: 2023-03-27 17:37:46 Published: 2023-03-24
# PixMIM: Rethinking Pixel Reconstruction in Masked Image Modeling ( http://arxiv.org/abs/2303.02416v2 ) License: see linked page	Yuan Liu, Songyang Zhang, Jiacheng Chen, Kai Chen, Dahua Lin	Masked Image Modeling (MIM) has achieved promising progress with the advent of Masked Autoencoders (MAE) and BEiT. However, subsequent works have complicated the framework with new auxiliary tasks or extra pre-trained models, inevitably increasing computational overhead. This paper undertakes a fundamental analysis of MIM from the perspective of pixel reconstruction, which examines the input image patches and reconstruction target, and highlights two critical but previously overlooked bottlenecks. Based on this analysis, we propose a remarkably simple and effective method, PixMIM, that entails two strategies: 1) filtering the high-frequency components from the reconstruction target to de-emphasize the network's focus on texture-rich details and 2) adopting a conservative data transform strategy to alleviate the problem of missing foreground in MIM training. PixMIM can be easily integrated into most existing pixel-based MIM approaches (i.e., using raw images as reconstruction target) with negligible additional computation. 
Without bells and whistles, our method consistently improves three MIM approaches, MAE, ConvMAE, and LSMAE, across various downstream tasks. We believe this effective plug-and-play method will serve as a strong baseline for self-supervised learning and provide insights for future improvements of the MIM framework. Code and models are available at https://github.com/open-mmlab/mmselfsup/tree/dev-1.x/configs/selfsup/pixmim.	Translated: 2023-03-27 17:37:21 Published: 2023-03-24
# Collective moderation of hate, toxicity, and extremity in online discussions ( http://arxiv.org/abs/2303.00357v2 ) License: see linked page	Jana Lasser and Alina Herderich and Joshua Garland and Segun Taofeek Aroyehun and David Garcia and Mirta Galesic	How can citizens moderate hate, toxicity, and extremism in online discourse? We analyze a large corpus of more than 130,000 discussions on German Twitter over the turbulent four years marked by the migrant crisis and political upheavals. With the help of human annotators, language models, machine learning classifiers, and longitudinal statistical analyses, we discern the dynamics of different dimensions of discourse. We find that expressing simple opinions, not necessarily supported by facts but also without insults, relates to the least hate, toxicity, and extremity of speech and speakers in subsequent discussions. Sarcasm also helps in achieving those outcomes, in particular in the presence of organized extreme groups. More constructive comments such as providing facts or exposing contradictions can backfire and attract more extremity. Mentioning either outgroups or ingroups is typically related to a deterioration of discourse in the long run. A pronounced emotional tone, either negative such as anger or fear, or positive such as enthusiasm and pride, also leads to worse outcomes. 
Going beyond one-shot analyses on smaller samples of discourse, our findings have implications for the successful management of online commons through collective civic moderation.	Translated: 2023-03-27 17:36:55 Published: 2023-03-24
# Towards Generalisable Video Moment Retrieval: Visual-Dynamic Injection to Image-Text Pre-Training ( http://arxiv.org/abs/2303.00040v2 ) License: see linked page	Dezhao Luo, Jiabo Huang, Shaogang Gong, Hailin Jin, Yang Liu	The correlation between vision and text is essential for video moment retrieval (VMR); however, existing methods rely heavily on separate pre-training feature extractors for visual and textual understanding. Without sufficient temporal boundary annotations, it is non-trivial to learn universal video-text alignments. In this work, we explore multi-modal correlations derived from large-scale image-text data to facilitate generalisable VMR. To address the limitations of image-text pre-training models on capturing the video changes, we propose a generic method, referred to as Visual-Dynamic Injection (VDI), to empower the model's understanding of video moments. Whilst existing VMR methods focus on building temporal-aware video features, being aware of the text descriptions about the temporal changes is also critical but originally overlooked in pre-training by matching static images with sentences. 
Therefore, we extract visual context and spatial dynamic information from video frames and explicitly enforce their alignments with the phrases describing video changes (e.g., verbs). By doing so, the potentially relevant visual and motion patterns in videos are encoded in the corresponding text embeddings (injected) so as to enable more accurate video-text alignments. We conduct extensive experiments on two VMR benchmark datasets (Charades-STA and ActivityNet-Captions) and achieve state-of-the-art performance. In particular, VDI yields notable advantages when tested on the out-of-distribution splits where the testing samples involve novel scenes and vocabulary.	Translated: 2023-03-27 17:36:34 Published: 2023-03-24
# Remote Sensing Scene Classification with Masked Image Modeling (MIM) ( http://arxiv.org/abs/2302.14256v2 ) License: see linked page	Liya Wang, Alex Tien	Remote sensing scene classification has been extensively studied for its critical roles in geological survey, oil exploration, traffic management, earthquake prediction, wildfire monitoring, and intelligence monitoring. In the past, the Machine Learning (ML) methods for performing the task mainly used backbones pretrained in the manner of supervised learning (SL). As Masked Image Modeling (MIM), a self-supervised learning (SSL) technique, has been shown to be a better way of learning visual feature representations, it presents a new opportunity for improving ML performance on the scene classification task. This research aims to explore the potential of MIM-pretrained backbones on four well-known classification datasets: Merced, AID, NWPU-RESISC45, and Optimal-31. Compared to the published benchmarks, we show that the MIM-pretrained Vision Transformer (ViT) backbones outperform other alternatives (up to 18% on top-1 accuracy) and that the MIM technique can learn better feature representations than the supervised learning counterparts (up to 5% on top-1 accuracy). 
Moreover, we show that the general-purpose MIM-pretrained ViTs can achieve performance competitive with the specially designed yet complicated Transformer for Remote Sensing (TRS) framework. Our experimental results also provide a performance baseline for future studies.	Translated: 2023-03-27 17:36:06 Published: 2023-03-24
# CBA: Contextual Background Attack against Optical Aerial Detection in the Physical World ( http://arxiv.org/abs/2302.13519v3 ) License: see linked page	Jiawei Lian, Xiaofei Wang, Yuru Su, Mingyang Ma, Shaohui Mei	Patch-based physical attacks have increasingly aroused concerns. However, most existing methods focus on obscuring targets captured on the ground, and some of these methods are simply extended to deceive aerial detectors. They smear the targeted objects in the physical world with the elaborated adversarial patches, which can only slightly sway the aerial detectors' prediction and have weak attack transferability. To address the above issues, we propose Contextual Background Attack (CBA), a novel physical attack framework against aerial detection, which can achieve strong attack efficacy and transferability in the physical world even without smudging the objects of interest at all. Specifically, the targets of interest, i.e., the aircraft in aerial images, are adopted to mask adversarial patches. The pixels outside the mask area are optimized to make the generated adversarial patches closely cover the critical contextual background area for detection, which contributes to gifting adversarial patches with more robust and transferable attack potency in the real world. 
To further strengthen the attack performance, the adversarial patches are forced to be outside targets during training, by which the detected objects of interest, both on and outside patches, benefit the accumulation of attack efficacy. Consequently, the sophisticatedly designed patches are gifted with solid fooling efficacy against objects both on and outside the adversarial patches simultaneously. Extensive proportionally scaled experiments are performed in physical scenarios, demonstrating the superiority and potential of the proposed framework for physical attacks. We expect that the proposed physical attack method will serve as a benchmark for assessing the adversarial robustness of diverse aerial detectors and defense methods.	Translated: 2023-03-27 17:35:41 Published: 2023-03-24
# The Wigner function of a semiconfined harmonic oscillator model with a position-dependent effective mass ( http://arxiv.org/abs/2302.12673v2 ) License: see linked page	S.M. Nagiyev, A.M. Jafarova and E.I. Jafarov	We develop a phase-space representation concept in terms of the Wigner function for a quantum harmonic oscillator model that exhibits the semiconfinement effect through its mass varying with the position. The new method is applied for the analytical computation of the Wigner distribution function for such a semiconfined quantum system. The method allows for suppression of the divergence of the integrand in the definition of the quantum distribution function and leads to the computation of its analytical expressions for the stationary states of the semiconfined oscillator model. Both cases of the presence and absence of the applied external homogeneous field for this quantum system are studied. The exact expressions obtained for the Wigner distribution function are expressed through the Bessel function of the first kind and Laguerre polynomials. Further, some of the special cases and limits are discussed in detail.	Translated: 2023-03-27 17:35:14 Published: 2023-03-24
# Master's Thesis: Out-of-distribution Detection with Energy-based Models ( http://arxiv.org/abs/2302.12002v2 ) License: see linked page	Sven Elflein	Today, deep learning is increasingly applied in security-critical situations such as autonomous driving and medical diagnosis. Despite its success, the behavior and robustness of deep networks are not fully understood yet, posing a significant risk. In particular, researchers recently found that neural networks are overly confident in their predictions, even on data they have never seen before. To tackle this issue, one can differentiate two approaches in the literature. One accounts for uncertainty in the predictions, while the second estimates the underlying density of the training data to decide whether a given input is close to the training data, and thus the network is able to perform as expected. In this thesis, we investigate the capabilities of energy-based models (EBMs) at the task of fitting the training data distribution to perform detection of out-of-distribution (OOD) inputs. We find that on most datasets, EBMs do not inherently outperform other density estimators at detecting OOD data despite their flexibility. 
Thus, we additionally investigate the effects of supervision, dimensionality reduction, and architectural modifications on the performance of EBMs. Further, we propose the Energy-Prior Network (EPN), which enables estimation of various uncertainties within an EBM for classification, bridging the gap between two approaches for tackling the OOD detection problem. We identify a connection between the concentration parameters of the Dirichlet distribution and the joint energy in an EBM. Additionally, this allows optimization without a held-out OOD dataset, which might not be available or costly to collect in some applications. Finally, we empirically demonstrate that the Energy-Prior Network (EPN) is able to detect OOD inputs, dataset shifts, and adversarial examples. Theoretically, EPN offers favorable properties for the asymptotic case when inputs are far from the training data.	Translated: 2023-03-27 17:35:03 Published: 2023-03-24
# Investigating Catastrophic Overfitting in Fast Adversarial Training: A Self-fitting Perspective ( http://arxiv.org/abs/2302.11963v2 ) License: see linked page	Zhengbao He, Tao Li, Sizhe Chen and Xiaolin Huang	Although fast adversarial training provides an efficient approach for building robust networks, it may suffer from a serious problem known as catastrophic overfitting (CO), where multi-step robust accuracy suddenly collapses to zero. In this paper, we for the first time decouple single-step adversarial examples into data-information and self-information, which reveals an interesting phenomenon called "self-fitting". Self-fitting, i.e., the network learns the self-information embedded in single-step perturbations, naturally leads to the occurrence of CO. When self-fitting occurs, the network experiences an obvious "channel differentiation" phenomenon in which some convolution channels responsible for recognizing self-information become dominant, while others for data-information are suppressed. In this way, the network can only recognize images with sufficient self-information and loses generalization ability to other types of data. Based on self-fitting, we provide new insights into the existing methods to mitigate CO and extend CO to multi-step adversarial training. Our findings reveal a self-learning mechanism in adversarial training and open up new perspectives for suppressing different kinds of information to mitigate CO.	Translated: 2023-03-27 17:34:35 Published: 2023-03-24
# Variational Gibbs State Preparation on NISQ devices ( http://arxiv.org/abs/2303.11276v2 ) License: see linked page	Mirko Consiglio, Jacopo Settino, Andrea Giordano, Carlo Mastroianni, Francesco Plastina, Salvatore Lorenzo, Sabrina Maniscalco, John Goold, Tony J. G. Apollaro	The preparation of an equilibrium thermal state of a quantum many-body system on noisy intermediate-scale quantum (NISQ) devices is an important task in order to extend the range of applications of quantum computation. Faithful Gibbs state preparation would pave the way to investigate protocols such as thermalization and out-of-equilibrium thermodynamics, as well as providing useful resources for quantum algorithms, where sampling from Gibbs states constitutes a key subroutine. We propose a variational quantum algorithm (VQA) to prepare Gibbs states of a quantum many-body system. The novelty of our VQA consists in implementing a parameterized quantum circuit acting on two distinct, yet connected, quantum registers. The VQA evaluates the Helmholtz free energy, where the von Neumann entropy is obtained via post-processing of computational basis measurements on one register, while the Gibbs state is prepared on the other register, via a unitary rotation in the energy basis. Finally, we benchmark our VQA by preparing Gibbs states of the transverse field Ising model and achieve remarkably high fidelities across a broad range of temperatures in statevector simulations. 
We also assess the performance of the VQA on IBM quantum computers, showcasing its feasibility on current NISQ devices.	Translated: 2023-03-27 17:29:10 Published: 2023-03-24
# Coreset Sampling from Open-Set for Fine-Grained Self-Supervised Learning ( http://arxiv.org/abs/2303.11101v2 ) License: see linked page	Sungnyun Kim, Sangmin Bae, Se-Young Yun	Deep learning in general domains has constantly been extended to domain-specific tasks requiring the recognition of fine-grained characteristics. However, real-world applications for fine-grained tasks suffer from two challenges: a high reliance on expert knowledge for annotation and the necessity of a versatile model for various downstream tasks in a specific domain (e.g., prediction of categories, bounding boxes, or pixel-wise annotations). Fortunately, recent self-supervised learning (SSL) is a promising approach to pretrain a model without annotations, serving as an effective initialization for any downstream tasks. Since SSL does not rely on the presence of annotation, in general, it utilizes a large-scale unlabeled dataset, referred to as an open-set. In this sense, we introduce a novel Open-Set Self-Supervised Learning problem under the assumption that a large-scale unlabeled open-set is available, as well as the fine-grained target dataset, during a pretraining phase. In our problem setup, it is crucial to consider the distribution mismatch between the open-set and target dataset. 
Hence, we propose the SimCore algorithm to sample a coreset, the subset of an open-set that has a minimum distance to the target dataset in the latent space. We demonstrate that SimCore significantly improves representation learning performance through extensive experimental settings, including eleven fine-grained datasets and seven open-sets in various downstream tasks.	Translated: 2023-03-27 17:28:47 Published: 2023-03-24
# ERSAM: Neural Architecture Search For Energy-Efficient and Real-Time Social Ambiance Measurement ( http://arxiv.org/abs/2303.10727v2 ) License: see linked page	Chaojian Li, Wenwan Chen, Jiayi Yuan, Yingyan Lin, Ashutosh Sabharwal	Social ambiance describes the context in which social interactions happen, and can be measured using speech audio by counting the number of concurrent speakers. This measurement has enabled various mental health tracking and human-centric IoT applications. While on-device Social Ambiance Measure (SAM) is highly desirable to ensure user privacy and thus facilitate wide adoption of the aforementioned applications, the required computational complexity of state-of-the-art deep neural network (DNN) powered SAM solutions stands at odds with the often constrained resources on mobile devices. Furthermore, only limited labeled data is available or practical when it comes to SAM under clinical settings due to various privacy constraints and the required human effort, further challenging the achievable accuracy of on-device SAM solutions. To this end, we propose a dedicated neural architecture search framework for Energy-efficient and Real-time SAM (ERSAM). 
Specifically, our ERSAM framework can automatically search for DNNs that push forward the achievable accuracy vs. hardware efficiency frontier of mobile SAM solutions. For example, ERSAM-delivered DNNs consume only 40 mW x 12 h energy and 0.05 seconds processing latency for a 5-second audio segment on a Pixel 3 phone, while achieving an error rate of only 14.3% on a social ambiance dataset generated by LibriSpeech. We expect that our ERSAM framework can pave the way for ubiquitous on-device SAM solutions, which are in growing demand.	Translated: 2023-03-27 17:28:22 Published: 2023-03-24
# A Dynamic Multi-Scale Voxel Flow Network for Video Prediction ( http://arxiv.org/abs/2303.09875v2 ) License: see linked page	Xiaotao Hu, Zhewei Huang, Ailin Huang, Jun Xu, Shuchang Zhou	The performance of video prediction has been greatly boosted by advanced deep neural networks. However, most of the current methods suffer from large model sizes and require extra inputs, e.g., semantic/depth maps, for promising performance. For efficiency, in this paper, we propose a Dynamic Multi-scale Voxel Flow Network (DMVFN) to achieve better video prediction performance at lower computational costs than previous methods, using only RGB images. The core of our DMVFN is a differentiable routing module that can effectively perceive the motion scales of video frames. Once trained, our DMVFN selects adaptive sub-networks for different inputs at the inference stage. Experiments on several benchmarks demonstrate that our DMVFN is an order of magnitude faster than Deep Voxel Flow and surpasses the state-of-the-art iterative-based OPT on generated image quality. Our code and demo are available at https://huxiaotaostasy.github.io/DMVFN/.	Translated: 2023-03-27 17:27:33 Published: 2023-03-24
# The Cascaded Forward Algorithm for Neural Network Training ( http://arxiv.org/abs/2303.09728v2 ) License: see linked page	Gongpei Zhao, Tao Wang, Yidong Li, Yi Jin, Congyan Lang, Haibin Ling	The backpropagation algorithm has been widely used as a mainstream learning procedure for neural networks in the past decade, and has played a significant role in the development of deep learning. However, there exist some limitations associated with this algorithm, such as getting stuck in local minima and experiencing vanishing/exploding gradients, which have led to questions about its biological plausibility. To address these limitations, alternative algorithms to backpropagation have been preliminarily explored, with the Forward-Forward (FF) algorithm being one of the most well-known. In this paper we propose a new learning framework for neural networks, namely the Cascaded Forward (CaFo) algorithm, which, like FF, does not rely on backpropagation (BP) optimization. Unlike FF, our framework directly outputs label distributions at each cascaded block, which does not require generation of additional negative samples and thus leads to a more efficient process at both training and testing. Moreover, in our framework each block can be trained independently, so it can be easily deployed into parallel acceleration systems. 
The proposed method is evaluated on four public image classification benchmarks, and the experimental results illustrate significant improvement in prediction accuracy in comparison with the baseline.	Translated: 2023-03-27 17:27:16 Published: 2023-03-24
# DBSCAN of Multi-Slice Clustering for Third-Order Tensors ( http://arxiv.org/abs/2303.07768v3 ) License: see linked page	Dina Faneva Andriantsiory, Joseph Ben Geloun, Mustapha Lebbah	Several methods for triclustering three-dimensional data require the cluster size or the number of clusters in each dimension to be specified. To address this issue, Multi-Slice Clustering (MSC) for third-order tensors finds signal slices that lie in a low-dimensional subspace for a rank-one tensor dataset in order to find a cluster based on a similarity threshold. We propose an extension algorithm called MSC-DBSCAN to extract the different clusters of slices that lie in different subspaces when the dataset is a sum of r rank-one tensors (r > 1). Our algorithm uses the same input as the MSC algorithm and can find the same solution as MSC for rank-one tensor data.	Translated: 2023-03-27 17:26:31 Published: 2023-03-24
# VMCDL: Vulnerability Mining Based on Cascaded Deep Learning Under Source Control Flow ( http://arxiv.org/abs/2303.07128v2 ) License: see linked page	Wen Zhou	With the rapid development of the computer industry and computer software, the risk of software vulnerabilities being exploited has greatly increased. However, there are still many shortcomings in the existing mining techniques for leakage source research, such as high false alarm rate, coarse-grained detection, and dependence on expert experience. In this paper, we mainly use the C/C++ source code data of the SARD dataset, process the source code of CWE476, CWE469, CWE516 and CWE570 vulnerability types, test the vulnerability scanning function of the cutting-edge tool Joern, and propose a new cascading deep learning model VMCDL based on source code control flow to effectively detect vulnerabilities. First, this paper uses Joern to locate and extract sensitive functions and statements to form a sensitive statement library of vulnerable code. Then, the CFG flow vulnerability code snippets are generated by bidirectional breadth-first traversal, and then vectorized by Doc2vec. Finally, the cascade deep learning model based on source code control flow is used for classification to obtain the classification results. 
In the experimental evaluation, we give the test results of Joern on specific vulnerabilities, and give the confusion matrix and label data of the binary classification results of the model algorithm on single vulnerability type source code, and compare and verify the five indicators of FPR, FNR, ACC, P and F1, respectively reaching 10.30%, 5.20%, 92.50%, 85.10% and 85.40%, which shows that it can effectively reduce the false alarm rate of static analysis.	翻訳日:2023-03-27 17:26:17 公開日:2023-03-24
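The five indicators named in the abstract above (FPR, FNR, ACC, P, F1) all follow from the four cells of a binary confusion matrix. The sketch below uses the standard textbook definitions; the function name `binary_metrics` and the counts are hypothetical and chosen only for illustration, they are not taken from the paper.

```python
def binary_metrics(tp, fp, tn, fn):
    """Compute FPR, FNR, ACC, P (precision) and F1 from confusion-matrix counts."""
    fpr = fp / (fp + tn)                    # false alarms among true negatives
    fnr = fn / (fn + tp)                    # misses among true positives
    acc = (tp + tn) / (tp + fp + tn + fn)   # overall accuracy
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"FPR": fpr, "FNR": fnr, "ACC": acc, "P": precision, "F1": f1}


# Hypothetical counts, for illustration only (not the paper's data).
metrics = binary_metrics(tp=80, fp=20, tn=180, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

A lower FPR corresponds to the "false alarm rate" that the abstract says the model reduces.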
# 合成結晶を用いたニューラルネットワークによるICSD粉末X線回折法による構造情報の抽出 Neural networks trained on synthetically generated crystals can extract structural information from ICSD powder X-ray diffractograms ( http://arxiv.org/abs/2303.11699v2 ) ライセンス: Link先を確認	Henrik Schopmans, Patrick Reiser, Pascal Friederich	(参考訳) 機械学習技術は粉末x線回折から結晶空間群などの構造情報を抽出するのに成功している。しかし、ICSDのようなデータベースからシミュレーションされたディフラクトグラムを直接トレーニングすることは、そのサイズ、クラス不均一性、特定の構造タイプに対するバイアスのために困難である。本稿では,各空間群の対称性演算を用いてランダム座標を持つ合成結晶を生成する方法を提案する。このアプローチに基づいて,1時間に数百万のオンザフライ生成された合成ディフラクトグラムに対して,Deep ResNetライクなモデルのオンライントレーニングを実演する。選択した空間群分類のタスクに対して、ほとんどの空間群からの未確認ICSD構造タイプに対して、79.9%の精度を達成した。これはICSD結晶のトレーニングにおける現在の最先端のアプローチの56.1%を超える。その結果, 合成した結晶は, icd粉体回折から構造情報を抽出でき, 粉体x線回折の領域において, 最先端の機械学習モデルを適用することが可能となった。また、特に高スループット環境では、自動XRDデータ分析が不可欠である実験データに適用するための第一歩を示す。宇宙群の予測に焦点をあてる一方で、我々のアプローチは将来、関連するタスクにまで拡張される可能性がある。 Machine learning techniques have successfully been used to extract structural information such as the crystal space group from powder X-ray diffractograms. However, training directly on simulated diffractograms from databases such as the ICSD is challenging due to its limited size, class-inhomogeneity, and bias toward certain structure types. We propose an alternative approach of generating synthetic crystals with random coordinates by using the symmetry operations of each space group. Based on this approach, we demonstrate online training of deep ResNet-like models on up to a few million unique on-the-fly generated synthetic diffractograms per hour. For our chosen task of space group classification, we achieved a test accuracy of 79.9% on unseen ICSD structure types from most space groups. This surpasses the 56.1% accuracy of the current state-of-the-art approach of training on ICSD crystals directly. 
Our results demonstrate that synthetically generated crystals can be used to extract structural information from ICSD powder diffractograms, which makes it possible to apply very large state-of-the-art machine learning models in the area of powder X-ray diffraction. We further show first steps toward applying our methodology to experimental data, where automated XRD data analysis is crucial, especially in high-throughput settings. While we focused on the prediction of the space group, our approach has the potential to be extended to related tasks in the future.	翻訳日:2023-03-27 17:16:49 公開日:2023-03-24
# bopr:人体形状とポーズ推定のための身体認識部レグレッサ BoPR: Body-aware Part Regressor for Human Shape and Pose Estimation ( http://arxiv.org/abs/2303.11675v2 ) ライセンス: Link先を確認	Yongkang Cheng, Shaoli Huang, Jifeng Ning, Ying Shan	(参考訳) 本稿では,オクルージョンと深度あいまいさの課題に効果的に対処しつつ,単眼画像から人体の形状とポーズを推定する新しいアプローチを提案する。提案手法であるBoPR(Body-Aware Part Regressor)は,まず注意誘導機構を用いて身体と部分の両方の特徴を抽出する。次に,クエリとして部分的特徴,参照として身体的特徴を含む部分的レグレッションに対する余分な部分的依存をエンコードするために,これらの機能を利用する。これにより,目に見える部分や身体参照情報を利用することで,身体とオクルードされた部分の空間的関係を推定できる。提案手法は2つのベンチマークデータセット上で既存の最先端手法よりも優れており,提案手法は深度あいまいさや閉塞処理の点で既存手法をはるかに上回っていることを示す。コードとデータは、https://github.com/cyk990422/BoPR で研究目的で公開されている。 This paper presents a novel approach for estimating human body shape and pose from monocular images that effectively addresses the challenges of occlusions and depth ambiguity. Our proposed method BoPR, the Body-aware Part Regressor, first extracts features of both the body and part regions using an attention-guided mechanism. We then utilize these features to encode extra part-body dependency for per-part regression, with part features as queries and body feature as a reference. This allows our network to infer the spatial relationship of occluded parts with the body by leveraging visible parts and body reference information. Our method outperforms existing state-of-the-art methods on two benchmark datasets, and our experiments show that it significantly surpasses existing methods in terms of depth ambiguity and occlusion handling. These results provide strong evidence of the effectiveness of our approach. The code and data are available for research purposes at https://github.com/cyk990422/BoPR.	翻訳日:2023-03-27 17:16:28 公開日:2023-03-24
# パラメータ化球面上の確率勾配勾配の収束と変分モンテカルロシミュレーションへの応用 Convergence of stochastic gradient descent on parameterized sphere with applications to variational Monte Carlo simulation ( http://arxiv.org/abs/2303.11602v2 ) ライセンス: Link先を確認	Nilin Abrahamsen and Zhiyan Ding and Gil Goldshlager and Lin Lin	(参考訳) ニューラルネットワークによってパラメータ化される高次元球面上の確率勾配勾配(SGD)型アルゴリズムを正規化定数まで解析する。教師付き学習の設定のための新しいアルゴリズムを提供し,その収束を理論的および数値的に示す。また、量子物理学において広く用いられている変分モンテカルロ法(VMC)に対応する教師なし設定に対する収束の最初の証明も提供する。 We analyze stochastic gradient descent (SGD) type algorithms on a high-dimensional sphere which is parameterized by a neural network up to a normalization constant. We provide a new algorithm for the setting of supervised learning and show its convergence both theoretically and numerically. We also provide the first proof of convergence for the unsupervised setting, which corresponds to the widely used variational Monte Carlo (VMC) method in quantum physics.	翻訳日:2023-03-27 17:16:09 公開日:2023-03-24
# ラベルノイズ学習のためのダイナミクス・アウェアロス Dynamics-Aware Loss for Learning with Label Noise ( http://arxiv.org/abs/2303.11562v2 ) ライセンス: Link先を確認	Xiu-Chuan Li, Xiaobo Xia, Fei Zhu, Tongliang Liu, Xu-Yao Zhang, Cheng-Lin Liu	(参考訳) ラベルノイズはディープニューラルネットワーク(DNN)に深刻な脅威をもたらす。堅牢性で適合性を調整できるロバスト損失関数を採用することは、この問題に対処するための単純だが効果的な戦略である。しかし、これらの2つの要因間の広く使われている静的トレードオフは、ラベルノイズによって学習されるDNNの動的性質と矛盾し、性能が低下する。そこで本稿では,この問題を解決するためにDAL(Dynamics-Aware Los)を提案する。 DNNはまず一般化されたパターンを学習し、ラベルノイズを徐々に過度にオーバーフィットする傾向があるので、DALは最初は適合性を強化し、その後徐々に頑丈さの重みを増す。さらに、後段では、DNNは硬いものよりも正確にラベル付けされる可能性が高い簡単な例に重点を置いて、ラベルノイズの負の影響をさらに低減するためにブートストラップ項を導入する。詳細な理論解析と広範な実験結果の両方が本手法の優越性を示している。 Label noise poses a serious threat to deep neural networks (DNNs). Employing robust loss function which reconciles fitting ability with robustness is a simple but effective strategy to handle this problem. However, the widely-used static trade-off between these two factors contradicts the dynamic nature of DNNs learning with label noise, leading to inferior performance. Therefore, we propose a dynamics-aware loss (DAL) to solve this problem. Considering that DNNs tend to first learn generalized patterns, then gradually overfit label noise, DAL strengthens the fitting ability initially, then gradually increases the weight of robustness. Moreover, at the later stage, we let DNNs put more emphasis on easy examples which are more likely to be correctly labeled than hard ones and introduce a bootstrapping term to further reduce the negative impact of label noise. Both the detailed theoretical analyses and extensive experimental results demonstrate the superiority of our method.	翻訳日:2023-03-27 17:16:01 公開日:2023-03-24
# 調和ベースと新しいクラス:一般化Few-Shotセグメンテーションのためのクラスコントラストアプローチ Harmonizing Base and Novel Classes: A Class-Contrastive Approach for Generalized Few-Shot Segmentation ( http://arxiv.org/abs/2303.13724v1 ) ライセンス: Link先を確認	Weide Liu, Zhonghua Wu, Yang Zhao, Yuming Fang, Chuan-Sheng Foo, Jun Cheng and Guosheng Lin	(参考訳) 少ショットセグメンテーション(FSSeg)の現在の手法は,基本クラスの性能を無視しながら,新しいクラスの性能向上に重点を置いている。この制限を克服するために、ベースクラスと新規クラスのセグメンテーションマスクの予測を目的とした、一般化された小ショットセグメンテーション(GFSSeg)のタスクが導入された。しかし、現在のプロトタイプベースの手法では、プロトタイプを更新する際にベースクラスと新規クラスの関係を明示的に考慮していないため、真のカテゴリを識別する性能は限られている。この課題に対処するために,プロトタイプ更新を規制し,異なるクラスからのプロトタイプ間の距離を広く促進するため,ベースクラスの性能を維持しながらクラスを区別するクラスコントラスト損失とクラス関係損失を提案する。提案手法は,PASCAL VOC および MS COCO データセット上での汎用小ショットセグメンテーションタスクに対して,新しい最先端性能を実現する。 Current methods for few-shot segmentation (FSSeg) have mainly focused on improving the performance of novel classes while neglecting the performance of base classes. To overcome this limitation, the task of generalized few-shot semantic segmentation (GFSSeg) has been introduced, aiming to predict segmentation masks for both base and novel classes. However, the current prototype-based methods do not explicitly consider the relationship between base and novel classes when updating prototypes, leading to a limited performance in identifying true categories. To address this challenge, we propose a class contrastive loss and a class relationship loss to regulate prototype updates and encourage a large distance between prototypes from different classes, thus distinguishing the classes from each other while maintaining the performance of the base classes. Our proposed approach achieves new state-of-the-art performance for the generalized few-shot segmentation task on PASCAL VOC and MS COCO datasets.	翻訳日:2023-03-27 16:23:38 公開日:2023-03-24
# ホモロジー量子ローター符号:トーションからの論理量子ビット Homological Quantum Rotor Codes: Logical Qubits from Torsion ( http://arxiv.org/abs/2303.13723v1 ) ライセンス: Link先を確認	Christophe Vuillot and Alessandro Ciani and Barbara M. Terhal	(参考訳) 複数の量子ローターを用いて論理情報を符号化するホモロジー量子ローター符号を正式に定義する。これらの符号は、論理振動子を符号化する線形振動子符号と同様に、量子ビットや量子ビットのホモロジーまたはCSS量子符号を一般化する。量子ビットや振動子とは異なり、ホモロジー量子ローター符号は、下層の鎖複体のホモロジーによって、論理ローターと論理キューディットの両方を符号化することができる。特に、実射影平面またはM\ "{o}bius strip" が量子ビットを符号化することによって得られる鎖複体に基づくコードである。本稿では, 連続安定器位相シフトによって拡散する論理演算子の概念により, 量子ビットの場合よりも微妙な符号間の距離スケーリングについて考察する。 2次元および3次元多様体に基づくホモロジー量子ロータ符号の構成と連鎖錯体の積を与える。我々は、キータエフの現在のミラー量子ビット(m\"{o}bius strip qubit)と同様に$0$-$\pi$-qubitが、そのようなコードの小さな例であり、拡張の可能性について議論している。 We formally define homological quantum rotor codes which use multiple quantum rotors to encode logical information. These codes generalize homological or CSS quantum codes for qubits or qudits, as well as linear oscillator codes which encode logical oscillators. Unlike for qubits or oscillators, homological quantum rotor codes allow one to encode both logical rotors and logical qudits, depending on the homology of the underlying chain complex. In particular, such a code based on the chain complex obtained from tessellating the real projective plane or a M\"{o}bius strip encodes a qubit. We discuss the distance scaling for such codes which can be more subtle than in the qubit case due to the concept of logical operator spreading by continuous stabilizer phase-shifts. We give constructions of homological quantum rotor codes based on 2D and 3D manifolds as well as products of chain complexes. Superconducting devices being composed of islands with integer Cooper pair charges could form a natural hardware platform for realizing these codes: we show that the $0$-$\pi$-qubit as well as Kitaev's current-mirror qubit -- also known as the M\"{o}bius strip qubit -- are indeed small examples of such codes and discuss possible extensions.	翻訳日:2023-03-27 16:23:22 公開日:2023-03-24
# 放射線治療中の患者の食道炎の存在と重症度を自動的に抽出する自然言語処理 Natural language processing to automatically extract the presence and severity of esophagitis in notes of patients undergoing radiotherapy ( http://arxiv.org/abs/2303.13722v1 ) ライセンス: Link先を確認	Shan Chen, Marco Guevara, Nicolas Ramirez, Arpi Murray, Jeremy L. Warner, Hugo JWL Aerts, Timothy A. Miller, Guergana K. Savova, Raymond H. Mak, Danielle S. Bitterman	(参考訳) 放射線治療(RT)毒性は生存と生活の質を損なうことがあるが、未研究のままである。実世界の証拠は毒性の理解を改善する可能性を秘めているが、毒性情報はしばしば臨床記録にしか記載されていない。胸部RT治療患者の記録から食道炎の存在と重症度を判定するための自然言語処理(NLP)モデルを開発した。統計モデルと事前学習済みBERTベースのモデルを、3つの食道炎分類タスク、すなわちタスク1) 食道炎の有無、タスク2) 重症食道炎か否か、タスク3) 食道炎なし対グレード1対グレード2-3、に対して微調整した。RTを受けた食道癌患者の記録345件を用いて転移可能性を検証した。微調整のPubmedBERTは最高のパフォーマンスを得た。最も優れたマクロF1は、タスク1、2、3それぞれ0.92、0.82、0.74であった。微調整中に最も情報性の高いノートセクションを選択すると、すべてのタスクでマクロF1が2%以上向上した。シルバーラベルデータのマクロF1は全タスクで3%以上改善された。食道癌の記録では,追加の微調整なしで,タスク1,2,3の最良のマクロF1はそれぞれ0.73,0.74,0.65であった。我々の知る限り,これは臨床記録からCTCAEガイドラインに従って食道炎毒性の重症度を自動抽出する最初の試みである。有望なパフォーマンスは、拡張されたドメインにおけるNLPベースの自動詳細な毒性監視のための概念実証を提供する。 Radiotherapy (RT) toxicities can impair survival and quality-of-life, yet remain under-studied. Real-world evidence holds potential to improve our understanding of toxicities, but toxicity information is often only in clinical notes. We developed natural language processing (NLP) models to identify the presence and severity of esophagitis from notes of patients treated with thoracic RT. We fine-tuned statistical and pre-trained BERT-based models for three esophagitis classification tasks: Task 1) presence of esophagitis, Task 2) severe esophagitis or not, and Task 3) no esophagitis vs. grade 1 vs. grade 2-3. Transferability was tested on 345 notes from patients with esophageal cancer undergoing RT. Fine-tuning PubmedBERT yielded the best performance. The best macro-F1 was 0.92, 0.82, and 0.74 for Task 1, 2, and 3, respectively. Selecting the most informative note sections during fine-tuning improved macro-F1 by over 2% for all tasks. Silver-labeled data improved the macro-F1 by over 3% across all tasks. 
For the esophageal cancer notes, the best macro-F1 was 0.73, 0.74, and 0.65 for Task 1, 2, and 3, respectively, without additional fine-tuning. To our knowledge, this is the first effort to automatically extract esophagitis toxicity severity according to CTCAE guidelines from clinic notes. The promising performance provides proof-of-concept for NLP-based automated detailed toxicity monitoring in expanded domains.	翻訳日:2023-03-27 16:22:58 公開日:2023-03-24
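The abstract above reports macro-F1, which is the unweighted mean of the per-class F1 scores, so rare classes such as severe esophagitis count as much as common ones. A minimal sketch of that calculation follows; the function name `macro_f1` and the toy labels are assumptions made for illustration and are not the paper's data or code.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy three-class example shaped like Task 3 (labels are hypothetical).
y_true = ["none", "none", "grade1", "grade2-3"]
y_pred = ["none", "grade1", "grade1", "grade2-3"]
print(round(macro_f1(y_true, y_pred), 4))
```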
# ReCOGS:セマンティック解釈の評価における論理形式の詳細について ReCOGS: How Incidental Details of a Logical Form Overshadow an Evaluation of Semantic Interpretation ( http://arxiv.org/abs/2303.13716v1 ) ライセンス: Link先を確認	Zhengxuan Wu, Christopher D. Manning, Christopher Potts	(参考訳) 構成一般化ベンチマークは、モデルが新しい文の意味を正確に計算できるかどうかを評価するが、論理形式(LF)予測の観点からこれを運用する。これにより、選択されたLFの意味的に無関係な詳細がモデルのパフォーマンスを形作るという懸念が持ち上がる。この懸念はCOGSベンチマーク(Kim and Linzen, 2020)で実現されていると論じる。 COGSは、現在のモデルでは不可能と思われる一般化分割を呈し、これらのモデルの起訴と見なすことができる。しかし, COGS LFs の偶発的特徴に負の相関がみられた。これらのLFを意味論的に等価なものに変換し、意味論的解釈とは無関係な能力を分解すると、ベースラインモデルでさえ牽引される。近年の COGS LF の変数自由翻訳では同様の結論が示唆されているが,この形式は意味論的に等価ではなく,COGS の意味を正確に表現することはできない。これらの結果から,COGSの改良版であるReCOGSの提案が示唆された。全体として,構成一般化と注意深いベンチマークタスク設計の重要性を再確認した。 Compositional generalization benchmarks seek to assess whether models can accurately compute meanings for novel sentences, but operationalize this in terms of logical form (LF) prediction. This raises the concern that semantically irrelevant details of the chosen LFs could shape model performance. We argue that this concern is realized for the COGS benchmark (Kim and Linzen, 2020). COGS poses generalization splits that appear impossible for present-day models, which could be taken as an indictment of those models. However, we show that the negative results trace to incidental features of COGS LFs. Converting these LFs to semantically equivalent ones and factoring out capabilities unrelated to semantic interpretation, we find that even baseline models get traction. A recent variable-free translation of COGS LFs suggests similar conclusions, but we observe this format is not semantically equivalent; it is incapable of accurately representing some COGS meanings. These findings inform our proposal for ReCOGS, a modified version of COGS that comes closer to assessing the target semantic capabilities while remaining very challenging. Overall, our results reaffirm the importance of compositional generalization and careful benchmark task design.	
翻訳日:2023-03-27 16:22:37 公開日:2023-03-24
# 逆量子アニールとh-ゲインによる初期状態符号化 Initial state encoding via reverse quantum annealing and h-gain features ( http://arxiv.org/abs/2303.13748v1 ) ライセンス: Link先を確認	Elijah Pelofske, Georg Hahn, Hristo Djidjev	(参考訳) 量子アニーリング(quantum annealing)は、組合せ最適化問題の大域的最小解を得るために量子揺らぎを利用する特殊な量子計算手法である。 d-wave systems, inc.はクラウドコンピューティングリソースとして利用可能なquantum annealersを製造し、アニーリング計算で使用されるアニールスケジュールをプログラムできる。本稿では,量子アニーラタによって返される解の質を初期状態の符号化により改善することに関心を寄せる。このような初期状態を符号化できる2つのD-Wave機能、リバースアニーリングとh-ゲイン機能について検討する。 Reverse annealing (RA) は、良い解を表す古典的な状態から始まり、逆の場が存在する点へ後退し、前方のアニールでアニール処理を終了する既知の解を洗練することを目的としている。 h-ゲイン(HG)機能により、ハミルトニアンの線形(h$)バイアスに時間依存重み付けスキームを配置することができる。また,RAに類似した後方位相とHG初期状態符号化を用いた前方位相のハイブリッド手法も検討した。問題に対してRAとHGを反復的に適用するという考え方を,最適でない初期状態を単調に改善することを目的として検討する。 HGエンコーディング技術は、重み付き最大カット問題や重み付き最大斜め問題など、様々な入力問題に対して評価され、いくつかの問題に対してHG手法がRAの代替となることを示す。また, D-Wave Chimera チップと Pegasus チップをネイティブ接続したランダムスピングラス上で, RA および HG 初期状態での繰り返し処理の動作について検討した。 Quantum annealing is a specialized type of quantum computation that aims to use quantum fluctuations in order to obtain global minimum solutions of combinatorial optimization problems. D-Wave Systems, Inc., manufactures quantum annealers, which are available as cloud computing resources, and allow users to program the anneal schedules used in the annealing computation. In this paper, we are interested in improving the quality of the solutions returned by a quantum annealer by encoding an initial state. We explore two D-Wave features allowing one to encode such an initial state: the reverse annealing and the h-gain features. Reverse annealing (RA) aims to refine a known solution following an anneal path starting with a classical state representing a good solution, going backwards to a point where a transverse field is present, and then finishing the annealing process with a forward anneal. 
The h-gain (HG) feature allows one to put a time-dependent weighting scheme on linear ($h$) biases of the Hamiltonian, and we demonstrate that this feature likewise can be used to bias the annealing to start from an initial state. We also consider a hybrid method consisting of a backward phase resembling RA, and a forward phase using the HG initial state encoding. Importantly, we investigate the idea of iteratively applying RA and HG to a problem, with the goal of monotonically improving on an initial state that is not optimal. The HG encoding technique is evaluated on a variety of input problems including the weighted Maximum Cut problem and the weighted Maximum Clique problem, demonstrating that the HG technique is a viable alternative to RA for some problems. We also investigate how the iterative procedures perform for both RA and HG initial state encoding on random spin glasses with the native connectivity of the D-Wave Chimera and Pegasus chips.	翻訳日:2023-03-27 16:15:23 公開日:2023-03-24
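The reverse annealing path described in the abstract above (start from a classical state, go backwards to a point with a transverse field, then anneal forward) can be written down as a piecewise-linear schedule of (time, s) points, where s is the normalized anneal fraction and s = 1 is the classical end. The sketch below only builds such a list and calls no D-Wave API. The function name, the parameter names and the optional pause are assumptions made for illustration; the exact format and limits a given solver accepts should be checked against D-Wave's documentation.

```python
def reverse_anneal_schedule(s_target, ramp_us, pause_us=0.0):
    """Build a [time_us, s] schedule: s goes 1 -> s_target, optionally pauses, then returns to 1."""
    if not 0.0 <= s_target <= 1.0:
        raise ValueError("s_target must lie in [0, 1]")
    if ramp_us <= 0 or pause_us < 0:
        raise ValueError("ramp_us must be positive and pause_us non-negative")
    t_back = ramp_us                  # end of the backward ramp
    t_hold = t_back + pause_us        # end of the optional pause
    t_end = t_hold + ramp_us          # end of the forward ramp
    points = [[0.0, 1.0], [t_back, s_target]]
    if pause_us > 0:
        points.append([t_hold, s_target])
    points.append([t_end, 1.0])
    return points


# Hypothetical values: back off to s = 0.5 over 5 us, hold 10 us, anneal forward over 5 us.
print(reverse_anneal_schedule(0.5, 5.0, 10.0))
```

A smaller `s_target` lets the search move further from the initial state, while a value close to 1 keeps the refinement local.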
# FixFit:過剰決定モデルにおける逆問題の解法にパラメータ圧縮を用いる FixFit: using parameter-compression to solve the inverse problem in overdetermined models ( http://arxiv.org/abs/2303.13746v1 ) ライセンス: Link先を確認	Botond B Antal, Anthony G Chesebro, Helmut H Strey, Lilianne R Mujica-Parodi, Corey Weistuch	(参考訳) 科学のあらゆる分野は数学的モデルに依存する。複雑な非線形モデルを使用する際の根本的な問題の一つは、モデルパラメータ間の相互作用がデータに等しく適合する複数のパラメータセットにつながるため、データ駆動パラメータ推定が失敗することが多いことである。そこで本研究では、与えられた数学的モデルのパラメータをモデル出力に固有の潜在表現に圧縮する、この問題に対処する新しい手法であるFixFitを開発する。この表現は、モデルパラメータとモデル出力のデータ対上にボトルネック層を持つニューラルネットワークをトレーニングすることで得られる。ボトルネック層ノードはユニークな潜在パラメータに対応し、その次元はモデルの情報内容を示す。トレーニングされたニューラルネットワークは、ボトルネック層をエンコーダに分割して冗長性とデコーダを特徴付けて、測定値から潜在パラメータを一意に推定することができる。古典物理学と神経科学の2つのユースケースでFixFitを実証する。 All fields of science depend on mathematical models. One of the fundamental problems with using complex nonlinear models is that data-driven parameter estimation often fails because interactions between model parameters lead to multiple parameter sets fitting the data equally well. Here, we develop a new method to address this problem, FixFit, which compresses a given mathematical model's parameters into a latent representation unique to model outputs. We acquire this representation by training a neural network with a bottleneck layer on data pairs of model parameters and model outputs. The bottleneck layer nodes correspond to the unique latent parameters, and their dimensionality indicates the information content of the model. The trained neural network can be split at the bottleneck layer into an encoder to characterize the redundancies and a decoder to uniquely infer latent parameters from measurements. We demonstrate FixFit in two use cases drawn from classical physics and neuroscience.	翻訳日:2023-03-27 16:14:47 公開日:2023-03-24
# EdgeTran: モバイルエッジプラットフォーム上での効率的な推論のための共設計トランスフォーマー EdgeTran: Co-designing Transformers for Efficient Inference on Mobile Edge Platforms ( http://arxiv.org/abs/2303.13745v1 ) ライセンス: Link先を確認	Shikhar Tuli and Niraj K. Jha	(参考訳) 効率的なトランスモデルの自動設計は、最近、産業や学術から大きな注目を集めている。しかしながら、ほとんどの研究は、最高のパフォーマンスのトランスフォーマーアーキテクチャを探しながら、特定のメトリクスのみに焦点を当てている。さらに、従来の複雑で大規模なトランスフォーマーモデルを低スループットのエッジプラットフォーム上で実行することは難しい問題である。本研究では,トランスアーキテクチャとエッジデバイスの多種多様なセットの設計空間におけるハードウェア性能測定をプロファイリングするProTranというフレームワークを提案する。このプロファイラを,提案する共同設計手法と組み合わせて,与えられたタスクの精度が高く,レイテンシ,エネルギー消費,ピーク電力ドローを最小化し,エッジ展開を可能にする最善のモデルを得る。精度とハードウェア性能を協調最適化するためのフレームワークをEdgeTranと呼ぶ。最高のトランスフォーマーモデルとエッジデバイスペアを検索します。最後にgptranを提案する。gptranは、ハードウェアを意識した方法で精度をさらに向上させる、マルチステージのブロックレベルの成長後処理ステップである。得られたトランスモデルは2.8$\times$小さく、ベースライン(BERT-Base)よりも0.8%高いGLUEスコアを持つ。選択されたエッジデバイス上での推論により、15.0%のレイテンシ、10.0$\times$低エネルギー、および10.8$\times$低ピークパワードローが可能となる。 Automated design of efficient transformer models has recently attracted significant attention from industry and academia. However, most works only focus on certain metrics while searching for the best-performing transformer architecture. Furthermore, running traditional, complex, and large transformer models on low-compute edge platforms is a challenging problem. In this work, we propose a framework, called ProTran, to profile the hardware performance measures for a design space of transformer architectures and a diverse set of edge devices. We use this profiler in conjunction with the proposed co-design technique to obtain the best-performing models that have high accuracy on the given task and minimize latency, energy consumption, and peak power draw to enable edge deployment. We refer to our framework for co-optimizing accuracy and hardware performance measures as EdgeTran. It searches for the best transformer model and edge device pair. Finally, we propose GPTran, a multi-stage block-level grow-and-prune post-processing step that further improves accuracy in a hardware-aware manner. 
The obtained transformer model is 2.8$\times$ smaller and has a 0.8% higher GLUE score than the baseline (BERT-Base). Inference with it on the selected edge device enables 15.0% lower latency, 10.0$\times$ lower energy, and 10.8$\times$ lower peak power draw compared to an off-the-shelf GPU.	翻訳日:2023-03-27 16:14:31 公開日:2023-03-24
# 潜流拡散モデルを用いた条件付き画像・映像生成 Conditional Image-to-Video Generation with Latent Flow Diffusion Models ( http://arxiv.org/abs/2303.13744v1 ) ライセンス: Link先を確認	Haomiao Ni, Changhao Shi, Kai Li, Sharon X. Huang, Martin Renqiang Min	(参考訳) 条件付き画像合成(cI2V)は、画像(例えば、人の顔)と条件(例えば、笑顔のようなアクションクラスラベル)から始まる新しい可視ビデオの合成を目的としている。 cI2Vタスクの鍵となる課題は、与えられた画像と条件に対応する現実的な空間的外観と時間的ダイナミクスの同時生成である。本稿では,所定の条件に基づいて潜時空間内の光流列を合成し,所定の画像をワープする新しい潜時流拡散モデル(LFDM)を用いたcI2Vのアプローチを提案する。従来の直接合成法と比較して,提案するLFDMは,与えられた画像の空間的内容を完全に活用し,生成した時間的コヒーレントな流れに応じて潜時空間でワープすることで,空間的詳細と時間的動きをよりよく合成することができる。 LFDMの訓練は,(1)映像フレーム間の潜時流を推定するフロー予測器を含む空間コンテンツ生成のための潜時流自動エンコーダを訓練する教師なし学習段階と,(2)時間潜時流生成のための3D-UNetベースの拡散モデル(DM)を訓練する条件付き学習段階とからなる。従来の画素空間や時間的情報を扱う潜在特徴空間で動作するDMとは異なり、われわれのLFDMのDMは動作生成のための低次元の潜在フロー空間を学習するだけで、より計算効率がよい。複数のデータセットに対して総合的な実験を行い、LFDMは先行技術より一貫して優れています。さらに,LFDMは画像デコーダを微調整することで,新しい領域に容易に適応できることを示す。私たちのコードはhttps://github.com/nihaomiao/CVPR23_LFDMで利用可能です。 Conditional image-to-video (cI2V) generation aims to synthesize a new plausible video starting from an image (e.g., a person's face) and a condition (e.g., an action class label like smile). The key challenge of the cI2V task lies in the simultaneous generation of realistic spatial appearance and temporal dynamics corresponding to the given image and condition. In this paper, we propose an approach for cI2V using novel latent flow diffusion models (LFDM) that synthesize an optical flow sequence in the latent space based on the given condition to warp the given image. Compared to previous direct-synthesis-based works, our proposed LFDM can better synthesize spatial details and temporal motion by fully utilizing the spatial content of the given image and warping it in the latent space according to the generated temporally-coherent flow. 
The training of LFDM consists of two separate stages: (1) an unsupervised learning stage to train a latent flow auto-encoder for spatial content generation, including a flow predictor to estimate latent flow between pairs of video frames, and (2) a conditional learning stage to train a 3D-UNet-based diffusion model (DM) for temporal latent flow generation. Unlike previous DMs operating in pixel space or latent feature space that couples spatial and temporal information, the DM in our LFDM only needs to learn a low-dimensional latent flow space for motion generation, thus being more computationally efficient. We conduct comprehensive experiments on multiple datasets, where LFDM consistently outperforms prior arts. Furthermore, we show that LFDM can be easily adapted to new domains by simply finetuning the image decoder. Our code is available at https://github.com/nihaomiao/CVPR23_LFDM.	翻訳日:2023-03-27 16:14:06 公開日:2023-03-24
# teglo: シングルビュー画像からの高忠実度標準テクスチャマッピング TEGLO: High Fidelity Canonical Texture Mapping from Single-View Images ( http://arxiv.org/abs/2303.13743v1 ) ライセンス: Link先を確認	Vishal Vinod, Tanmay Shah, Dmitry Lagun	(参考訳) 最近のneural fields(nfs)の研究は、クラス固有のシングルビューイメージコレクションから3d表現を学習している。しかし、高周波情報を保存した入力データを再構築することはできない。さらに, これらの手法は外観を幾何学から切り離さないため, テクスチャ転送や編集といった作業には適さない。本研究では,オブジェクトのクラスに対して,単一のビューから3次元表現を学習するためのEGLO(Textured EG3D-GLO)を提案する。我々は, 条件付きニューラルレージアンスフィールド(NeRF)を, 明示的な3次元監視なしに訓練することで実現した。 2次元正準空間への密接な対応写像を作成することにより,この手法を編集機能と対応づける。このようなマッピングは,共有トポロジを持つメッシュを必要とせずに,テクスチャ転送とテクスチャ編集を可能にする。我々の重要な洞察は、入力画像ピクセルをテクスチャ空間にマッピングすることで、ほぼ完璧に再現できる(>=74dB PSNR at 1024^2 resolution)。提案方式により,メガピクセル画像解像度で高周波数詳細を持つ高品質な3次元一貫性を有する新規ビュー合成が可能となる。 Recent work in Neural Fields (NFs) learn 3D representations from class-specific single view image collections. However, they are unable to reconstruct the input data preserving high-frequency details. Further, these methods do not disentangle appearance from geometry and hence are not suitable for tasks such as texture transfer and editing. In this work, we propose TEGLO (Textured EG3D-GLO) for learning 3D representations from single view in-the-wild image collections for a given class of objects. We accomplish this by training a conditional Neural Radiance Field (NeRF) without any explicit 3D supervision. We equip our method with editing capabilities by creating a dense correspondence mapping to a 2D canonical space. We demonstrate that such mapping enables texture transfer and texture editing without requiring meshes with shared topology. Our key insight is that by mapping the input image pixels onto the texture space we can achieve near perfect reconstruction (>= 74 dB PSNR at 1024^2 resolution). Our formulation allows for high quality 3D consistent novel view synthesis with high-frequency details at megapixel image resolution.	翻訳日:2023-03-27 16:13:31 公開日:2023-03-24
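The reconstruction figure quoted in the abstract above is a peak signal-to-noise ratio (PSNR), which is 10·log10(MAX² / MSE) for pixel values with maximum MAX. The sketch below uses that standard definition on flat lists of pixel values; the 8-bit range (`max_value=255`) is an assumption, since the abstract does not state the value range used.

```python
import math


def psnr(reference, reconstruction, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences."""
    if not reference or len(reference) != len(reconstruction):
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((a - b) ** 2 for a, b in zip(reference, reconstruction)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


# Toy example: every pixel is off by one grey level, so MSE = 1.
print(round(psnr([0, 255, 128, 64], [1, 254, 129, 63]), 2))
```

Under the 8-bit assumption, 74 dB corresponds to a mean squared error of roughly 0.0026, well below one grey level of error per pixel.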
# 量子鍵分布プロトコルの準備と測定における安全な距離範囲向上のためのデッドタイム最適化 Dead-time optimization to increase secure distance range in prepare and measure quantum key distribution protocols ( http://arxiv.org/abs/2303.13742v1 ) ライセンス: Link先を確認	Carlos Wiechers, J.L. Lucio, X\'ochitl S\'anchez-Lozano, Rafael G\'omez-Medina, Mariana Salado-Mej\'ia	(参考訳) アフターパルスは、離散可変量子鍵分布系がセキュアである距離を制限する要因であり、単光子検出器で共通の特徴である。この現象の関連性は、確率的かつ自己相互作用的な性質と、その速度が雪崩の回数に比例して上昇し、量子ビット誤差率が増加するという事実に起因している。ここでは、残差補正がデッドタイム値に依存するデッドタイムおよびアフターパルス補正を含む効果的な解析モデルを提案する。このモデルは、ゲート単一光子検出器を用いた量子鍵分布プロトコル(標準およびデコイ版)の準備と測定の性能を評価するのに有用である。このモデルは、安全な通信のために全距離範囲にわたって秘密鍵レートを数値的に最適化する表現を提供し、量子ビット誤り率とセキュア鍵レートの計算を可能にする。従来の方法では、デッドタイム値は距離に関係なく固定され、高い動作周波数でより関係のある残脈効果によりチャネルの距離範囲が制限される。ここでは、デッドタイム値の最適化により、秘密鍵を共有するためのチャネル距離が増加することを示す。 Afterpulsing is a factor limiting the distance over which discrete-variable quantum key distribution systems are secure, and a common feature in single-photon detectors. The relevance of this phenomenon stems from its stochastic, self-interacting nature and the fact that its rate rises with the number of avalanche events, which increases the quantum bit error rate. Here we introduce an effective analytic model, including dead-time and afterpulsing corrections, where afterpulsing correction depends on dead-time value. This model is useful to evaluate the performance of prepare and measure quantum key distribution protocols (standard and decoy versions) that use gated single photon detectors. The model provides an expression to numerically optimize the secret key rate over the full distance range for secure communication, enabling in this way the calculation of quantum bit error rate and secure key rate. In the conventional procedure, the dead-time value is fixed regardless of distance, limiting the distance range of the channel due to remaining afterpulsing effects, which are more relevant at higher operating frequencies. 
Here we demonstrate that optimizing the dead-time values increases the distance range of the channel to share secret keys.	翻訳日:2023-03-27 16:13:13 公開日:2023-03-24
# mowe:複数の悪天候除去のための気象専門家の混合 MoWE: Mixture of Weather Experts for Multiple Adverse Weather Removal ( http://arxiv.org/abs/2303.13739v1 ) ライセンス: Link先を確認	Yulin Luo, Rui Zhao, Xiaobao Wei, Jinwei Chen, Yijie Lu, Shenghao Xie, Tianyu Wang, Ruiqin Xiong, Ming Lu, Shanghang Zhang	(参考訳) 現在、ほとんどの悪天候除去タスクは、デライニング、デスノーイング、デヘイジングなど、独立して処理されている。しかし、自律運転の場合、天候の種類、強度、混合度は不明であり、分離されたタスク設定はこれらの複雑な条件をうまく扱えない。さらに、自動運転におけるビジョンアプリケーションは、しばしば高レベルなタスクを目標としているが、既存の気象除去手法では、知覚的タスクのパフォーマンスと信号の忠実度の関係を無視している。この目的のために,上流タスクにおいて,複雑な気象除去を知覚を考慮した形で扱うための新しいトランスフォーマフレームワークであるMixture of Weather Experts (MoWE)を提案する。我々は,推論時に天気ラベルを必要とせずに,専門家を天候の種類により関連づけるWeather-aware Routerを設計した。多様な気象条件に対処するため,我々は隣接するトークン間で情報を融合するMulti-scale Expertsを提案する。下流タスクでは, セマンティックラベルを必要とせずに, 画像処理モデルの出力が高レベルの認識タスクに適しているかを測定するために, Label-free Perception-aware Metricを提案する。我々は、既存の手法の複数の天候除去性能をベンチマークするために、自律運転シナリオに対して構文データセットMAW-Simを収集する。私たちのMoWEは,提案するデータセットと2つのパブリックデータセット,すなわちAll-WeatherとRain/Fog-Cityscapesにおける上流タスクにおいてSOTA性能を実現し,他の手法と比較して下流セグメンテーションタスクにおける知覚結果も向上する。私たちのコードとデータセットは受け入れてからリリースされます。 Currently, most adverse weather removal tasks are handled independently, such as deraining, desnowing, and dehazing. However, in autonomous driving scenarios, the type, intensity, and mixing degree of the weather are unknown, so the separated task setting cannot deal with these complex conditions well. Besides, the vision applications in autonomous driving often aim at high-level tasks, but existing weather removal methods neglect the connection between performance on perceptual tasks and signal fidelity. To this end, in upstream task, we propose a novel Mixture of Weather Experts (MoWE) Transformer framework to handle complex weather removal in a perception-aware fashion. We design a Weather-aware Router to make the experts targeted more relevant to weather types while without the need for weather type labels during inference. 
To handle diverse weather conditions, we propose Multi-scale Experts to fuse information among neighbor tokens. In downstream task, we propose a Label-free Perception-aware Metric to measure whether the outputs of image processing models are suitable for high level perception tasks without the demand for semantic labels. We collect a syntactic dataset MAW-Sim towards autonomous driving scenarios to benchmark the multiple weather removal performance of existing methods. Our MoWE achieves SOTA performance in upstream task on the proposed dataset and two public datasets, i.e., All-Weather and Rain/Fog-Cityscapes, and also has better perceptual results in downstream segmentation task compared to other methods. Our codes and datasets will be released after acceptance.	翻訳日:2023-03-27 16:12:52 公開日:2023-03-24
# GQMモデルに基づく機械学習のためのデータセットのライセンスに関する研究 An investigation of licensing of datasets for machine learning based on the GQM model ( http://arxiv.org/abs/2303.13735v1 ) ライセンス: Link先を確認	Junyu Chen, Norihiro Yoshida, Hiroaki Takada	(参考訳) データセットのライセンスは現在、機械学習システムの開発において問題となっている。そして、機械学習システムの開発において、最も広く使われているのは、利用可能なデータセットである。しかし、公開されているデータセット内の画像は主にインターネットから取得されているため、いくつかの画像は商業的に利用できない。さらに、機械学習システムの開発者は、機械学習モデルをトレーニングする際にデータセットのライセンスを気にしないことが多い。要約すると、機械学習システムのためのデータセットのライセンスは、この段階であらゆる面で不完全である。 2つのコレクションデータセットを調査した結果、現在のデータセットのほとんどはライセンスが欠如しており、ライセンスが欠如しているため、データセットの商用可用性が決定できないことが分かった。そこで、より科学的かつ体系的なアプローチで、データセットのライセンスと、データセットを用いた機械学習システムのライセンスについて調査し、機械学習システムの将来の開発者にとって、より簡単かつコンプライアンスの高いものにすることを決定した。 Dataset licensing is currently an issue in the development of machine learning systems. And in the development of machine learning systems, the most widely used are publicly available datasets. However, since the images in the publicly available dataset are mainly obtained from the Internet, some images are not commercially available. Furthermore, developers of machine learning systems do not often care about the license of the dataset when training machine learning models with it. In summary, the licensing of datasets for machine learning systems is in a state of incompleteness in all aspects at this stage. Our investigation of two collection datasets revealed that most of the current datasets lacked licenses, and the lack of licenses made it impossible to determine the commercial availability of the datasets. Therefore, we decided to take a more scientific and systematic approach to investigate the licensing of datasets and the licensing of machine learning systems that use the dataset to make it easier and more compliant for future developers of machine learning systems.	翻訳日:2023-03-27 16:12:19 公開日:2023-03-24
# 視覚トランスフォーマーの注意はどのように働くのか? Visual Analyticsの試み How Does Attention Work in Vision Transformers? A Visual Analytics Attempt ( http://arxiv.org/abs/2303.13731v1 ) ライセンス: Link先を確認	Yiran Li, Junpeng Wang, Xin Dai, Liang Wang, Chin-Chia Michael Yeh, Yan Zheng, Wei Zhang, Kwan-Liu Ma	(参考訳) vision transformer (vit) は、逐次データから画像へトランスフォーマーモデルの成功を広げる。モデルは画像を多数の小さなパッチに分解し、それらをシーケンスに配置する。マルチヘッドの自己注意をシーケンスに適用し、パッチ間の注意を学習する。シーケンシャルデータに対するトランスフォーマーの解釈は成功したが、ViTの解釈にはほとんど取り組みがなく、多くの疑問は未解決のままである。例えば、多くの注目層の中で、どちらが重要なのか? 個々のパッチは、異なる頭の空間的隣人にどれだけ強いか? 個々の頭がどのような注意パターンを学んだか? 本研究では、視覚分析手法を用いてこれらの質問に答える。具体的には、まず、複数のプルーニングベースのメトリクスを導入することで、ViTにおいてどのヘッドがより重要かを特定する。次に,各頭部のパッチ間における注目強度の空間分布と,注目層間における注目強度の傾向を考察した。第3に、オートエンコーダに基づく学習ソリューションを用いて、個々の頭が学習できるすべての注意パターンを要約する。重要な頭部の注意力とパターンを調べることで、なぜ重要なのかを答える。複数のViTについて経験豊富な深層学習の専門家との具体的なケーススタディを通じて、頭の重要性、注意力、注意パターンからViTの理解を深めるソリューションの有効性を検証する。 Vision transformer (ViT) expands the success of transformer models from sequential data to images. The model decomposes an image into many smaller patches and arranges them into a sequence. Multi-head self-attentions are then applied to the sequence to learn the attention between patches. Despite many successful interpretations of transformers on sequential data, little effort has been devoted to the interpretation of ViTs, and many questions remain unanswered. For example, among the numerous attention heads, which one is more important? How strong are individual patches attending to their spatial neighbors in different heads? What attention patterns have individual heads learned? In this work, we answer these questions through a visual analytics approach. Specifically, we first identify what heads are more important in ViTs by introducing multiple pruning-based metrics. Then, we profile the spatial distribution of attention strengths between patches inside individual heads, as well as the trend of attention strengths across attention layers. 
Third, using an autoencoder-based learning solution, we summarize all possible attention patterns that individual heads could learn. Examining the attention strengths and patterns of the important heads, we answer why they are important. Through concrete case studies with experienced deep learning experts on multiple ViTs, we validate the effectiveness of our solution that deepens the understanding of ViTs from head importance, head attention strength, and head attention pattern.	翻訳日:2023-03-27 16:12:05 公開日:2023-03-24
# ブロックチェーンを用いたセキュア・プライベートフェデレーション学習に関する調査--資源制約型コンピューティングの理論と応用 A Survey on Secure and Private Federated Learning Using Blockchain: Theory and Application in Resource-constrained Computing ( http://arxiv.org/abs/2303.13727v1 ) ライセンス: Link先を確認	Ervin Moore, Ahmed Imteaj, Shabnam Rezapour, M. Hadi Amini	(参考訳) 近年、高度な機械学習と人工知能の急速なブームと、新たなセキュリティとプライバシーの脅威によって、連合学習(federated learning:fl)が広く普及している。 FLは、機密データをエンティティに公開することなく、エッジデバイスのローカルデータストレージから効率的なモデル生成を可能にする。このパラダイムは、ユーザの機密データのプライバシー問題を部分的に緩和するが、FLプロセスのパフォーマンスは脅威となり、サイバー脅威やプライバシー侵害技術の増加によりボトルネックに達する。 FLプロセスの普及を早めるために、FL環境のためのブロックチェーンの統合は、アカデミックや業界の人々から多くの注目を集めている。ブロックチェーンは、分散化、不変性、コンセンサス、透明性特性によって、セキュリティとプライバシの脅威を防止する可能性がある。しかし、ブロックチェーンメカニズムが高価な計算リソースを必要とする場合、リソースに制約のあるFLクライアントはトレーニングに関わらない。これを踏まえて、この調査は、リソース制約付きfl環境におけるブロックチェーンの展開成功の課題、ソリューション、今後の方向性のレビューに焦点を当てている。 FLプロセスに適したさまざまなブロックチェーンメカニズムを包括的にレビューし、限られたリソース予算に対するトレードオフについて議論する。さらに、リソース制限されたFL環境で観測できるサイバー脅威を広範囲に分析し、ブロックチェーンがサイバー攻撃を阻止する重要な役割を担っているかを分析します。この目的のために、高レベルの信頼性、データプライバシ、分散コンピューティングパフォーマンスを提供するブロックチェーンとフェデレーション付き学習の結合に対する潜在的なソリューションを強調します。 Federated Learning (FL) has gained widespread popularity in recent years due to the fast booming of advanced machine learning and artificial intelligence along with emerging security and privacy threats. FL enables efficient model generation from local data storage of the edge devices without revealing the sensitive data to any entities. While this paradigm partly mitigates the privacy issues of users' sensitive data, the performance of the FL process can be threatened and reached a bottleneck due to the growing cyber threats and privacy violation techniques. To expedite the proliferation of FL process, the integration of blockchain for FL environments has drawn prolific attention from the people of academia and industry. Blockchain has the potential to prevent security and privacy threats with its decentralization, immutability, consensus, and transparency characteristic. 
However, if the blockchain mechanism requires costly computational resources, then the resource-constrained FL clients cannot be involved in the training. Considering that, this survey focuses on reviewing the challenges, solutions, and future directions for the successful deployment of blockchain in resource-constrained FL environments. We comprehensively review variant blockchain mechanisms that are suitable for FL process and discuss their trade-offs for a limited resource budget. Further, we extensively analyze the cyber threats that could be observed in a resource-constrained FL environment, and how blockchain can play a key role to block those cyber attacks. To this end, we highlight some potential solutions towards the coupling of blockchain and federated learning that can offer high levels of reliability, data privacy, and distributed computing performance.	翻訳日:2023-03-27 16:11:45 公開日:2023-03-24
# イベント誘導ビデオスーパーリゾリューションのための空間的暗黙的ニューラル表現の学習 Learning Spatial-Temporal Implicit Neural Representations for Event-Guided Video Super-Resolution ( http://arxiv.org/abs/2303.13767v1 ) ライセンス: Link先を確認	Yunfan Lu, Zipeng Wang, Minjie Liu, Hongjian Wang, Lin Wang	(参考訳) イベントカメラは、強度変化を非同期に検知し、高いダイナミックレンジと低レイテンシでイベントストリームを生成する。これは、挑戦的なビデオ超解像(VSR)タスクを導くためにイベントを利用する研究にインスピレーションを与えている。本稿では,イベントの高時間分解能の利点を生かして,ランダムスケールでのVSRの実現という新たな課題に対処する試みを行う。これは、VSRを導く際の事象の時空間的情報を表現することが困難である。そこで本稿では,イベントの時空間補間を統合されたフレームワークでVSRに組み込む新しいフレームワークを提案する。我々のキーとなる考え方は、探索された時空間座標とRGBフレームとイベントの両方の特徴から暗黙の神経表現を学ぶことである。本手法は3つの部分を含む。具体的には、Spatial-Temporal Fusion (STF)モジュールは、まずイベントとRGBフレームから3D特徴を学習する。そして、時間フィルタ(TF)モジュールは、クエリされたタイムスタンプ近くのイベントからより明示的な動作情報をアンロックし、2D特徴を生成する。最後に、Spatial Temporal Implicit Representation (STIR)モジュールは、これらの2つのモジュールの出力から任意の解像度でSRフレームを復元する。さらに、空間的に整列したイベントとRGBフレームを持つ実世界のデータセットを収集する。大規模な実験により,本手法は先行技術を大きく上回り,ランダムスケールのVSR(例えば6.5。コードとデータセットはhttps: //vlis2022.github.io/cvpr23/egvsrで入手できる。 Event cameras sense the intensity changes asynchronously and produce event streams with high dynamic range and low latency. This has inspired research endeavors utilizing events to guide the challenging video superresolution (VSR) task. In this paper, we make the first attempt to address a novel problem of achieving VSR at random scales by taking advantages of the high temporal resolution property of events. This is hampered by the difficulties of representing the spatial-temporal information of events when guiding VSR. To this end, we propose a novel framework that incorporates the spatial-temporal interpolation of events to VSR in a unified framework. Our key idea is to learn implicit neural representations from queried spatial-temporal coordinates and features from both RGB frames and events. Our method contains three parts. Specifically, the Spatial-Temporal Fusion (STF) module first learns the 3D features from events and RGB frames. 
Then, the Temporal Filter (TF) module unlocks more explicit motion information from the events near the queried timestamp and generates the 2D features. Lastly, the Spatial-Temporal Implicit Representation (STIR) module recovers the SR frame at arbitrary resolutions from the outputs of these two modules. In addition, we collect a real-world dataset with spatially aligned events and RGB frames. Extensive experiments show that our method significantly surpasses the prior art and achieves VSR with random scales, e.g., 6.5. Code and dataset are available at https://vlis2022.github.io/cvpr23/egvsr.	翻訳日:2023-03-27 16:05:08 公開日:2023-03-24
# GQE-Net:ポイントクラウドカラー属性のためのグラフベースの品質向上ネットワーク GQE-Net: A Graph-based Quality Enhancement Network for Point Cloud Color Attribute ( http://arxiv.org/abs/2303.13764v1 ) ライセンス: Link先を確認	Jinrui Xing, Hui Yuan, Raouf Hamzaoui, Hao Liu, and Junhui Hou	(参考訳) 近年、点雲は3次元(3次元)の視覚オブジェクトやシーンを表現するために人気が高まっている。点雲を効率的に保存・送信するために圧縮法が開発されているが、品質が劣化することが多い。点雲の色歪みを低減するため,幾何学情報を補助入力とし,グラフ畳み込みブロックを用いて局所特徴を効率的に抽出するグラフベース品質向上ネットワーク(GQE-Net)を提案する。具体的には,マルチヘッドグラフアテンション機構を備えた並列シリアルグラフアテンションモジュールを用いて重要な点や特徴に着目し,それらを融合させる。さらに,点間の正規性と幾何学的距離を考慮に入れた特徴改善モジュールを設計する。 GPUメモリ容量の制限の中で機能するために、歪んだポイントクラウドはオーバーラップ可能な3Dパッチに分割され、品質向上のためにGQE-Netに送られる。異なる色成分間のデータ分布の違いを考慮し、3つの色成分について3つのモデルを訓練する。実験結果から,本手法は最先端性能を実現することが示された。例えば、G-PCCのコーディング標準テストモデルにGQE-Netを実装する際には、Y、Cb、Crの高密度点雲にそれぞれ14.0%、9.3%、14.5%のBD-rateの保存値に対応する0.43dB、0.25dB、0.36dBのBjontegaard delta (BD)-peak-signal-to-noise ratio (PSNR) が達成される。 In recent years, point clouds have become increasingly popular for representing three-dimensional (3D) visual objects and scenes. To efficiently store and transmit point clouds, compression methods have been developed, but they often result in a degradation of quality. To reduce color distortion in point clouds, we propose a graph-based quality enhancement network (GQE-Net) that uses geometry information as an auxiliary input and graph convolution blocks to extract local features efficiently. Specifically, we use a parallel-serial graph attention module with a multi-head graph attention mechanism to focus on important points or features and help them fuse together. Additionally, we design a feature refinement module that takes into account the normals and geometry distance between points. To work within the limitations of GPU memory capacity, the distorted point cloud is divided into overlap-allowed 3D patches, which are sent to GQE-Net for quality enhancement. To account for differences in data distribution among different color components, three models are trained for the three color components. 
Experimental results show that our method achieves state-of-the-art performance. For example, when implementing GQE-Net on the recent G-PCC coding standard test model, 0.43 dB, 0.25 dB, and 0.36 dB Bjontegaard delta (BD)-peak-signal-to-noise ratio (PSNR), corresponding to 14.0%, 9.3%, and 14.5% BD-rate savings can be achieved on dense point clouds for the Y, Cb, and Cr components, respectively.	翻訳日:2023-03-27 16:04:43 公開日:2023-03-24
# エッジフリーだが構造対応:GNNからMLPへのプロトタイプ誘導知識蒸留 Edge-free but Structure-aware: Prototype-Guided Knowledge Distillation from GNNs to MLPs ( http://arxiv.org/abs/2303.13763v1 ) ライセンス: Link先を確認	Taiqiang Wu, Zhe Zhao, Jiahao Wang, Xingyu Bai, Lei Wang, Ngai Wong, Yujiu Yang	(参考訳) グラフタスクにおける低遅延多層パーセプトロン(MLP)への高精度グラフニューラルネットワーク(GNN)の蒸留はホットな研究トピックとなっている。しかし、MLPはノード機能にのみ依存しており、グラフ構造情報の取得に失敗する。従来の手法では、グラフエッジをMLPの余分な入力に処理することでこの問題に対処するが、このようなグラフ構造は様々なシナリオでは利用できない。そこで我々は,グラフエッジ(エッジフリー)を必要とせず,構造を意識したMLPを学習するプロトタイプガイド型知識蒸留(PGKD)法を提案する。具体的には, GNN教師のグラフ構造情報を解析し, エッジフリー環境でプロトタイプを用いて, GNNからMLPに抽出する。一般的なグラフベンチマーク実験の結果,提案したPGKDの有効性とロバスト性を示した。 Distilling high-accuracy Graph Neural Networks (GNNs) to low-latency multilayer perceptrons (MLPs) on graph tasks has become a hot research topic. However, MLPs rely exclusively on the node features and fail to capture the graph structural information. Previous methods address this issue by processing graph edges into extra inputs for MLPs, but such graph structures may be unavailable in various scenarios. To this end, we propose a Prototype-Guided Knowledge Distillation (PGKD) method, which does not require graph edges (edge-free) yet learns structure-aware MLPs. Specifically, we analyze the graph structural information in GNN teachers, and distill such information from GNNs to MLPs via prototypes in an edge-free setting. Experimental results on popular graph benchmarks demonstrate the effectiveness and robustness of the proposed PGKD.	翻訳日:2023-03-27 16:04:08 公開日:2023-03-24
# 逆気象光流に対する教師なし階層型ドメイン適応 Unsupervised Hierarchical Domain Adaptation for Adverse Weather Optical Flow ( http://arxiv.org/abs/2303.13761v1 ) ライセンス: Link先を確認	Hanyu Zhou, Yi Chang, Gang Chen, Luxin Yan	(参考訳) 光流量推定は大きな進歩を遂げたが、通常は悪天候下で劣化する。 semi/full-supervisedメソッドは良い試みをしてきたが、合成画像と実際の悪天候画像のドメインシフトはパフォーマンスを低下させるだろう。この問題を軽減するため、私たちの出発点は、ソースクリーンドメインからの知識を無監督で、劣化したドメインをターゲットに移すことです。我々の重要な洞察は、悪天候はシーンの内在的な光学的流れを変えるのではなく、クリーン画像と劣化画像のワープ誤差に大きな違いをもたらすことである。本研究では,階層的運動境界適応による悪天候光流に対する初の教師なしフレームワークを提案する。具体的には、まず画像翻訳を用いて、クリーンドメインと劣化ドメイン間の変換関係を構築する。動き適応では, フロー一貫性の知識を用いて, クロスドメイン光流を運動非分散共通空間に整列し, 清涼な天候からの光流を案内知識として利用し, 悪天候のための予備光流を得る。さらに, クリーン領域と劣化領域の境界の運動不整合を計測するワープ誤差の不整合を利用して, 動作境界を洗練させるために, コントラスト適応を共同で提案する。階層運動と境界適応は、統一された枠組みにおける光の流れを共同で促進する。提案手法の優位性を検証するため, 大規模定量および定性的実験を行った。 Optical flow estimation has made great progress, but usually suffers from degradation under adverse weather. Although semi/full-supervised methods have made good attempts, the domain shift between the synthetic and real adverse weather images would deteriorate their performance. To alleviate this issue, our start point is to unsupervisedly transfer the knowledge from source clean domain to target degraded domain. Our key insight is that adverse weather does not change the intrinsic optical flow of the scene, but causes a significant difference for the warp error between clean and degraded images. In this work, we propose the first unsupervised framework for adverse weather optical flow via hierarchical motion-boundary adaptation. Specifically, we first employ image translation to construct the transformation relationship between clean and degraded domains. In motion adaptation, we utilize the flow consistency knowledge to align the cross-domain optical flows into a motion-invariance common space, where the optical flow from clean weather is used as the guidance-knowledge to obtain a preliminary optical flow for adverse weather. 
Furthermore, we leverage the warp error inconsistency which measures the motion misalignment of the boundary between the clean and degraded domains, and propose a joint intra- and inter-scene boundary contrastive adaptation to refine the motion boundary. The hierarchical motion and boundary adaptation jointly promotes optical flow in a unified framework. Extensive quantitative and qualitative experiments have been performed to verify the superiority of the proposed method.	翻訳日:2023-03-27 16:03:53 公開日:2023-03-24
# 構造的不均衡を考慮したグラフ強化学習 Structural Imbalance Aware Graph Augmentation Learning ( http://arxiv.org/abs/2303.13757v1 ) ライセンス: Link先を確認	Zulong Liu, Kejia-Chen, Zheng Liu	(参考訳) グラフ機械学習(GML)は,ノード分類やリンク予測,グラフ分類などにおいて大きな進歩を遂げている。しかし、現実のグラフはしばしば構造的に不均衡であり、わずかなハブノードだけがより密度の高い局所構造を持ち、より影響が大きい。不均衡は既存のGMLモデルの堅牢性を損なう可能性がある。本稿では,この問題を解決するために,選択的グラフ拡張法(SAug)を提案する。まず、pagerankベースのサンプリング戦略は、グラフのハブノードとテールノードを識別するために設計されている。次に,一方のハブノードのノイズの多い隣接ノードを除去し,潜在隣接ノードを検出し,他方のテールノードに対して擬似隣接を生成する選択的拡張戦略を提案する。 2つのタイプのノード間の構造的不均衡を軽減することもできる。最後に、GNNモデルが拡張グラフ上で再トレーニングされる。大規模な実験により、SAugはバックボーンのGNNを大幅に改善し、グラフ拡張法やハブ/テール認識法との競合よりも優れた性能を達成できることが示された。 Graph machine learning (GML) has made great progress in node classification, link prediction, graph classification and so on. However, graphs in reality are often structurally imbalanced, that is, only a few hub nodes have a denser local structure and higher influence. The imbalance may compromise the robustness of existing GML models, especially in learning tail nodes. This paper proposes a selective graph augmentation method (SAug) to solve this problem. Firstly, a Pagerank-based sampling strategy is designed to identify hub nodes and tail nodes in the graph. Secondly, a selective augmentation strategy is proposed, which drops the noisy neighbors of hub nodes on one side, and discovers the latent neighbors and generates pseudo neighbors for tail nodes on the other side. It can also alleviate the structural imbalance between two types of nodes. Finally, a GNN model will be retrained on the augmented graph. Extensive experiments demonstrate that SAug can significantly improve the backbone GNNs and achieve superior performance to its competitors of graph augmentation methods and hub/tail aware methods.	翻訳日:2023-03-27 16:03:28 公開日:2023-03-24
# gp-vton:コラボレーティブなローカルフローグローバルパーシング学習による汎用仮想トライオンに向けて GP-VTON: Towards General Purpose Virtual Try-on via Collaborative Local-Flow Global-Parsing Learning ( http://arxiv.org/abs/2303.13756v1 ) ライセンス: Link先を確認	Zhenyu Xie and Zaiyu Huang and Xin Dong and Fuwei Zhao and Haoye Dong and Xijin Zhang and Feida Zhu and Xiaodan Liang	(参考訳) イメージベースのVirtual Try-ONは、ショップ内服を特定の人に転送する。既存の手法では、異なる衣服部品の異方性変形をモデル化するためにグローバルワーピングモジュールを使用しており、困難な入力を受ける際に異なる部品の意味情報を保存できない(例えば、複雑な人間のポーズ、難しい衣服)。さらに、それらのほとんどは、通常境界形状の制約を満たすためにテクスチャスクイージングを必要とする保存領域の境界に合わせるために、入力された衣料を直接反動させ、テクスチャ歪みを生じさせる。上記の劣った性能は、実世界のアプリケーションから既存の方法を妨げる。これらの問題を解決するために,GP-VTONと呼ばれる汎用仮想トライオンフレームワークを提案し,革新的なローカルフロー・グローバル・パーシング(LFGP)ワーピングモジュールと動的グラディエント・トランニケーション(DGT)トレーニング戦略を開発した。 Specifically, compared with the previous global warping mechanism, LFGP employs local flows to warp garments parts individually, and assembles the local warped results via the global garment parsing, resulting in reasonable warped parts and a semantic-correct intact garment even with challenging inputs.On the other hand, our DGT training strategy dynamically truncates the gradient in the overlap area and the warped garment is no more required to meet the boundary constraint, which effectively avoids the texture squeezing problem. さらに,GP-VTONは多カテゴリーのシナリオに容易に拡張でき,異なる衣服カテゴリーのデータを用いて共同で訓練することができる。 2つの高分解能ベンチマークに関する広範囲な実験は、既存の最先端手法よりも優れていることを示している。 Image-based Virtual Try-ON aims to transfer an in-shop garment onto a specific person. Existing methods employ a global warping module to model the anisotropic deformation for different garment parts, which fails to preserve the semantic information of different parts when receiving challenging inputs (e.g, intricate human poses, difficult garments). Moreover, most of them directly warp the input garment to align with the boundary of the preserved region, which usually requires texture squeezing to meet the boundary shape constraint and thus leads to texture distortion. 
This inferior performance keeps existing methods from real-world applications. To address these problems and take a step towards real-world virtual try-on, we propose a General-Purpose Virtual Try-ON framework, named GP-VTON, by developing an innovative Local-Flow Global-Parsing (LFGP) warping module and a Dynamic Gradient Truncation (DGT) training strategy. Specifically, compared with the previous global warping mechanism, LFGP employs local flows to warp garment parts individually, and assembles the local warped results via the global garment parsing, resulting in reasonable warped parts and a semantically correct intact garment even with challenging inputs. On the other hand, our DGT training strategy dynamically truncates the gradient in the overlap area, and the warped garment is no longer required to meet the boundary constraint, which effectively avoids the texture squeezing problem. Furthermore, our GP-VTON can be easily extended to multi-category scenarios and jointly trained using data from different garment categories. Extensive experiments on two high-resolution benchmarks demonstrate our superiority over the existing state-of-the-art methods.	翻訳日:2023-03-27 16:03:10 公開日:2023-03-24
# Sparsifiner: 効率的な視覚変換器のためのスパースインスタンス依存注意学習 Sparsifiner: Learning Sparse Instance-Dependent Attention for Efficient Vision Transformers ( http://arxiv.org/abs/2303.13755v1 ) ライセンス: Link先を確認	Cong Wei and Brendan Duke and Ruowei Jiang and Parham Aarabi and Graham W. Taylor and Florian Shkurti	(参考訳) ビジョントランスフォーマー(ViT)は、畳み込みニューラルネットワーク(CNN)と比較してパフォーマンス面での競争上の優位性を示しているが、高い計算コストが伴うことが多い。この目的のために,vitのマルチヘッドセルフアテンション(multi-head self-attention, mhsa)操作を高速化するために,空間的に近接するトークンの数を制限し,様々な注意パターンを探索した。しかし、そのような構造化された注意パターンは、その空間的関連性に対するトークン対token接続を制限し、完全な注意マスクから意味的関係を無視する。本研究では,軽量な接続予測モジュールを考案し,各トークンの接続スコアを推定することで,インスタンス依存の注意パターンを学習する新しい手法を提案する。直感的には、2つのトークンは、その特徴が空間的または意味的に関連があると考えられる場合、高い接続性スコアを持つ。各トークンは他の少数のトークンにしか対応しないため、双有線接続マスクは本質的に非常に疎いため、スパース計算によってネットワークを加速する機会を与える。学習された未構造化の注意パターンと合わせて、スパークアテンションViT(Sparsifiner)は、トークンの間隔と比較して、ImageNet上のFLOPとトップ1の精度との間に優れたパレート最適トレードオフを生成する。 MHSAの48%から69%のFLOPを削減し, 精度は0.4%以内である。また、注意とトークンの間隔を組み合わせることで、ViT FLOPsが60%以上減少することを示す。 Vision Transformers (ViT) have shown their competitive advantages performance-wise compared to convolutional neural networks (CNNs) though they often come with high computational costs. To this end, previous methods explore different attention patterns by limiting a fixed number of spatially nearby tokens to accelerate the ViT's multi-head self-attention (MHSA) operations. However, such structured attention patterns limit the token-to-token connections to their spatial relevance, which disregards learned semantic connections from a full attention mask. In this work, we propose a novel approach to learn instance-dependent attention patterns, by devising a lightweight connectivity predictor module to estimate the connectivity score of each pair of tokens. Intuitively, two tokens have high connectivity scores if the features are considered relevant either spatially or semantically. 
As each token only attends to a small number of other tokens, the binarized connectivity masks are often very sparse by nature and therefore provide the opportunity to accelerate the network via sparse computations. Equipped with the learned unstructured attention pattern, sparse attention ViT (Sparsifiner) produces a superior Pareto-optimal trade-off between FLOPs and top-1 accuracy on ImageNet compared to token sparsity. Our method reduces 48% to 69% FLOPs of MHSA while the accuracy drop is within 0.4%. We also show that combining attention and token sparsity reduces ViT FLOPs by over 60%.	翻訳日:2023-03-27 16:02:43 公開日:2023-03-24
# EMS-Net:ハイパースペクトル変化検出のための効率的なマルチテンポラル自己注意 EMS-Net: Efficient Multi-Temporal Self-Attention For Hyperspectral Change Detection ( http://arxiv.org/abs/2303.13753v1 ) ライセンス: Link先を確認	Meiqi Hu, Chen Wu, Bo Du	(参考訳) ハイパースペクトル変化検出は、動的都市開発を監視し、精密な物体の進化と変化を検出する上で重要な役割を担っている。本稿では,高スペクトル変化検出のための高効率多時間自己アテンションネットワーク(EMS-Net)を提案する。設計されたEMSモジュールは、類似した非変更機能マップの冗長性を削減し、正確なバイナリ変更マップのための効率的なマルチ時間変更情報を計算する。また、変更検出のクラスタリング特性を探索するために、変更のコンパクト性を高めるために、教師付きコントラスト損失が新たに提供される。 2つのハイパースペクトル変化検出データセットに実装された実験は、提案手法の性能と妥当性を示す。 Hyperspectral change detection plays an essential role in monitoring dynamic urban development and precisely detecting fine object evolution and alteration. In this paper, we propose an original Efficient Multi-temporal Self-attention Network (EMS-Net) for hyperspectral change detection. The designed EMS module cuts the redundancy of similar feature maps that contain no changes, computing efficient multi-temporal change information for a precise binary change map. Besides, to explore the clustering characteristics of change detection, a novel supervised contrastive loss is provided to enhance the compactness of the unchanged. Experiments on two hyperspectral change detection datasets demonstrate the outstanding performance and validity of the proposed method.	翻訳日:2023-03-27 16:02:14 公開日:2023-03-24
# 医用画像の新しいクラスを継続的に学習する古い知識の活用 Leveraging Old Knowledge to Continually Learn New Classes in Medical Images ( http://arxiv.org/abs/2303.13752v1 ) ライセンス: Link先を確認	Evelyn Chee, Mong Li Lee, Wynne Hsu	(参考訳) クラス増分連続学習は、以前学んだことを忘れずに新しい概念を学習することで、環境の変化に継続的に適応できる人工知能システムを開発するための中核的なステップである。これは、拡大した疾患群を分類するために、新しい入力データから継続的に学習する必要がある医療領域において特に必要である。本研究は,古い知識をいかに活用し,破滅的な忘れを伴わずに新しいクラスを学習できるかに焦点をあてる。本研究では,(1)事前学習した特徴を保存し,新たな特徴に対応するために表現を拡張した動的アーキテクチャ,(2)古いクラスにおけるモデルの性能を維持しながら,新たな特徴の学習のバランスをとるための2つの目的の訓練手順を提案する。複数の医学データセットに対する実験結果から,我々のソリューションは,クラス精度や忘れやすさの観点から,最先端のベースラインよりも優れた性能が得られることが示された。 Class-incremental continual learning is a core step towards developing artificial intelligence systems that can continuously adapt to changes in the environment by learning new concepts without forgetting those previously learned. This is especially needed in the medical domain, where continually learning from new incoming data is required to classify an expanded set of diseases. In this work, we focus on how old knowledge can be leveraged to learn new classes without catastrophic forgetting. We propose a framework that comprises two main components: (1) a dynamic architecture with expanding representations to preserve previously learned features and accommodate new features; and (2) a training procedure alternating between two objectives to balance the learning of new features while maintaining the model's performance on old classes. Experimental results on multiple medical datasets show that our solution achieves superior performance over state-of-the-art baselines in terms of class accuracy and forgetting.	翻訳日:2023-03-27 16:02:02 公開日:2023-03-24
# LONGNN:学習可能な直交基底を持つスペクトルGNN LONGNN: Spectral GNNs with Learnable Orthonormal Basis ( http://arxiv.org/abs/2303.13750v1 ) ライセンス: Link先を確認	Qian Tao, Zhen Wang, Wenyuan Yu, Yaliang Li, Zhewei Wei	(参考訳) 近年,スペクトルグラフニューラルネットワーク(GNN)手法は,多くのノードレベルタスクにおいて最上位性能を達成するために,学習可能な係数を多項式ベースとして活用している。様々な多項式基底が研究されているが、与えられたグラフの最適選択ではない固定多項式基底を採用する。また,これらの手法のいわゆる過渡問題を特定し,その非正規化戦略と非正規化基底にいくらか根ざしていることを示す。本稿では,この2つの課題に対する最初の試みについて述べる。ヤコビ多項式を用いて,学習可能な正規直交基底を持つ新しいスペクトルgnn,lon-gnnを設計し,正規化係数が現在学習フィルタ関数のノルムを正規化することと同値になることを示す。様々なグラフデータセットについて広範な実験を行い,lon-gnnの適合性と一般化能力を評価した。 In recent years, a plethora of spectral graph neural networks (GNN) methods have utilized polynomial basis with learnable coefficients to achieve top-tier performances on many node-level tasks. Although various kinds of polynomial bases have been explored, each such method adopts a fixed polynomial basis which might not be the optimal choice for the given graph. Besides, we identify the so-called over-passing issue of these methods and show that it is somewhat rooted in their less-principled regularization strategy and unnormalized basis. In this paper, we make the first attempts to address these two issues. Leveraging Jacobi polynomials, we design a novel spectral GNN, LON-GNN, with Learnable OrthoNormal bases and prove that regularizing coefficients becomes equivalent to regularizing the norm of learned filter function now. We conduct extensive experiments on diverse graph datasets to evaluate the fitting and generalization capability of LON-GNN, where the results imply its superiority.	翻訳日:2023-03-27 16:01:45 公開日:2023-03-24
# ロバストビュー合成のためのプログレッシブ最適化局所放射場 Progressively Optimized Local Radiance Fields for Robust View Synthesis ( http://arxiv.org/abs/2303.13791v1 ) ライセンス: Link先を確認	Andreas Meuleman and Yu-Lun Liu and Chen Gao and Jia-Bin Huang and Changil Kim and Min H. Kim and Johannes Kopf	(参考訳) 本稿では,1つのカジュアルな映像から大規模シーンの放射界を再構成するアルゴリズムを提案する。課題は2つある。まず、既存のラディアンスフィールド再構成手法はStructure-from-Motionアルゴリズムから推定された正確なカメラのポーズに頼っている。第二に、有限表現容量を持つ単一の大域的放射場を使うことは、無界シーンの長い軌道にスケールしない。未知のポーズを扱うために,カメラのポーズをプログレッシブな方法でラミアンスフィールドと共同で推定する。プログレッシブ最適化は再建の堅牢性を大幅に向上させることを示す。大きな境界のないシーンを扱うために、テンポラルウィンドウ内でフレームで訓練された新しい局所放射フィールドを動的に割り当てる。これにより、さらに堅牢性が向上し(例えば、適度なポーズドリフトでもうまく機能する)、大きなシーンにスケールできます。 Tanks and Templesデータセットと、収集した屋外データセットであるStatic Hikesに対する広範な評価は、我々のアプローチが最先端技術と比較できることを示している。 We present an algorithm for reconstructing the radiance field of a large-scale scene from a single casually captured video. The task poses two core challenges. First, most existing radiance field reconstruction approaches rely on accurate pre-estimated camera poses from Structure-from-Motion algorithms, which frequently fail on in-the-wild videos. Second, using a single, global radiance field with finite representational capacity does not scale to longer trajectories in an unbounded scene. For handling unknown poses, we jointly estimate the camera poses with radiance field in a progressive manner. We show that progressive optimization significantly improves the robustness of the reconstruction. For handling large unbounded scenes, we dynamically allocate new local radiance fields trained with frames within a temporal window. This further improves robustness (e.g., performs well even under moderate pose drifts) and allows us to scale to large scenes. Our extensive evaluation on the Tanks and Temples dataset and our collected outdoor dataset, Static Hikes, show that our approach compares favorably with the state-of-the-art.	翻訳日:2023-03-27 15:56:48 公開日:2023-03-24
# 患者レベルの公正度制約による公正な患者・Trial Matchingに向けて Towards Fair Patient-Trial Matching via Patient-Criterion Level Fairness Constraint ( http://arxiv.org/abs/2303.13790v1 ) ライセンス: Link先を確認	Chia-Yuan Chang, Jiayi Yuan, Sirui Ding, Qiaoyu Tan, Kai Zhang, Xiaoqian Jiang, Xia Hu, Na Zou	(参考訳) 臨床試験は新しい治療法の開発には欠かせないが、患者の採用と維持の障害に直面し、必要な参加者の受け入れを妨げる。これらの課題に対処するために、患者と試行にマッチするディープラーニングフレームワークが開発された。これらの枠組みは, 受け入れ基準と除外基準の相違を考慮し, 臨床治験適格基準と患者の類似性を計算する。最近の研究では、これらのフレームワークは以前のアプローチよりも優れていることが示されている。しかし、深層学習モデルは、臨床試験において特定の敏感な個人の集団が不足している場合に、患者と臨床のマッチングにおいて公平性の問題を引き起こす可能性がある。本研究は,公平性の問題に対処するために,患者基準レベルの公正性制約を発生させることにより,公正な患者と法廷のマッチングフレームワークを提案する。本研究の枠組みは,異なる敏感群群における包摂の埋め込みと排除基準の矛盾を考察したものである。実世界の患者-心房および患者-基準のマッチングタスクにおける実験結果から,提案手法が偏りやすい予測を効果的に緩和できることが示されている。 Clinical trials are indispensable in developing new treatments, but they face obstacles in patient recruitment and retention, hindering the enrollment of necessary participants. To tackle these challenges, deep learning frameworks have been created to match patients to trials. These frameworks calculate the similarity between patients and clinical trial eligibility criteria, considering the discrepancy between inclusion and exclusion criteria. Recent studies have shown that these frameworks outperform earlier approaches. However, deep learning models may raise fairness issues in patient-trial matching when certain sensitive groups of individuals are underrepresented in clinical trials, leading to incomplete or inaccurate data and potential harm. To tackle the issue of fairness, this work proposes a fair patient-trial matching framework by generating a patient-criterion level fairness constraint. The proposed framework considers the inconsistency between the embedding of inclusion and exclusion criteria among patients of different sensitive groups. 
The experimental results on real-world patient-trial and patient-criterion matching tasks demonstrate that the proposed framework can successfully mitigate predictions that tend to be biased.	翻訳日:2023-03-27 15:56:28 公開日:2023-03-24
# さまざまなシナリオにおける人計数のためのアプリケーション駆動aiパラダイム Application-Driven AI Paradigm for Person Counting in Various Scenarios ( http://arxiv.org/abs/2303.13788v1 ) ライセンス: Link先を確認	Minjie Hua, Yibing Nan, Shiguo Lian	(参考訳) 計数はビデオ監視の基本的な課題と考えられている。しかし、実用的な応用におけるシナリオの多様性は、1人の人計数モデルを一般に利用するのを困難にしている。その結果、エンジニアはビデオストリームをプレビューし、特に大規模デプロイメントにおいて、時間を要するカメラショットのシナリオに基づいて、適切な人物カウントモデルを手動で指定する必要がある。本稿では,シナリオ分類器を用いて,キャプチャされたフレーム毎に適切な人計数モデルを自動的に選択する人計数パラダイムを提案する。まず、入力画像がシナリオ分類器に渡されてシナリオラベルを取得し、そのフレームを5つの微調整されたモデルのうちの1つに割り当てて人物を数える。さらに,さまざまなシナリオから収集した拡張データセットとして,サイドビュー,ロングショット,トップビュー,カスタマイズ,クラウドの5つを紹介し,26323サンプルを含むシナリオ分類データセットも統合する。比較実験において,提案手法は統合データセット上のどのモデルよりもバランスが良く,様々なシナリオでの一般化が証明されている。 Person counting is considered as a fundamental task in video surveillance. However, the scenario diversity in practical applications makes it difficult to exploit a single person counting model for general use. Consequently, engineers must preview the video stream and manually specify an appropriate person counting model based on the scenario of camera shot, which is time-consuming, especially for large-scale deployments. In this paper, we propose a person counting paradigm that utilizes a scenario classifier to automatically select a suitable person counting model for each captured frame. First, the input image is passed through the scenario classifier to obtain a scenario label, which is then used to allocate the frame to one of five fine-tuned models for person counting. Additionally, we present five augmentation datasets collected from different scenarios, including side-view, long-shot, top-view, customized and crowd, which are also integrated to form a scenario classification dataset containing 26323 samples. In our comparative experiments, the proposed paradigm achieves better balance than any single model on the integrated dataset, thus its generalization in various scenarios has been proved.	翻訳日:2023-03-27 15:56:08 公開日:2023-03-24
# 経路コヒーレンスと位相差に基づく指向性単一光子ルーティングのための量子ルータモデル A quantum router model for directional single-photon routing based on pathway coherence and phase difference ( http://arxiv.org/abs/2303.13784v1 ) ライセンス: Link先を確認	Xu Yang, Lei Tan	(参考訳) 4つの空洞を持つ多チャネル量子ルータは、2つの結合共振器導波路と4つの単一空洞によって構成される。特定のポートから出る光子の確率を100$\%$に近いように調整することで、方向方向のルーティングを実現することができる。このハイブリッドシステムでは、入射ポートから出射ポートまでの光子の間には複数の経路がある。 2つの古典的光場間の位相差の影響下では、異なる経路間の相互干渉を破壊的干渉や建設的干渉に調整することができ、ルーティング確率の増大と減少の基礎となる。単一光子ルーティング確率に対するパラメータ値の影響についても検討した。確率振幅の解析式を研究することにより、あるパラメータ条件下での出口ポートの閉口の物理機構を得る。 A multi-channel quantum router with four nodal cavities is constructed by two coupled-resonator waveguides and four single cavities. We can achieve directional routing by adjusting the probability of photon exiting from the specified port to close to 100$\%$. There are multiple pathways between the photon from the incident port to the outgoing port in this hybrid system. Under the effect of phase difference between two classical light fields, the mutual interference between different pathways can be adjusted to destructive interference or constructive interference, which lays the foundation for the increase and decrease of the routing probability. The influence of different parameter values on single photon routing probability is also studied. By studying the analytic formula of probability amplitude, we get the physical mechanism of exiting ports being closed under certain parameter conditions.	翻訳日:2023-03-27 15:55:50 公開日:2023-03-24
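The role of the phase difference in the entry above can be illustrated with the textbook two-path interference formula, P = |a + b·e^{iφ}|². This is a generic sketch, not the router model of the paper; the amplitudes 0.5 and the function name `exit_probability` are assumptions chosen for illustration.

```python
import cmath
import math


def exit_probability(amp_a: complex, amp_b: complex, phase: float) -> float:
    """Probability of leaving through a port fed by two interfering pathways."""
    total_amplitude = amp_a + amp_b * cmath.exp(1j * phase)
    return abs(total_amplitude) ** 2


# Equal amplitudes: phase 0 gives constructive, phase pi destructive interference.
p_constructive = exit_probability(0.5, 0.5, 0.0)
p_destructive = exit_probability(0.5, 0.5, math.pi)
```

Tuning the phase between these two extremes moves the exit probability continuously between the closed and the fully open port.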
# 機械翻訳におけるChatGPTの活用に向けて Towards Making the Most of ChatGPT for Machine Translation ( http://arxiv.org/abs/2303.13780v1 ) ライセンス: Link先を確認	Keqin Peng, Liang Ding, Qihuang Zhong, Li Shen, Xuebo Liu, Min Zhang, Yuanxin Ouyang, Dacheng Tao	(参考訳) ChatGPTは機械翻訳(MT)の優れた機能を示す。以前のいくつかの研究では、高リソース言語の商用システムと同等の結果が得られるが、低リソース翻訳や遠距離言語-ペア変換のような複雑なタスクでは遅れている。しかし、彼らは通常、ChatGPTの能力を十分に引き出すことができない単純なプロンプトを採用する。本稿では,ChatGPTの翻訳能力について,温度,タスク情報,ドメイン情報といったいくつかの側面を再考し,それに対応する2つのプロンプト,タスク特化プロンプト(TSP)とドメイン特化プロンプト(DSP)を提案する。我々は以下を示す。 1)ChatGPTの性能は温度に大きく依存し,低い温度では高い性能が得られる。 2)タスク情報の強調は,特に複雑なMTタスクにおいて,ChatGPTの性能をさらに向上させる。 3) ドメイン情報の導入により,ChatGPTの一般化能力が向上し,そのドメインにおける性能が向上する。 4)ChatGPTは非英語中心のMTタスクに対して幻覚を引き起こす傾向があり,これは提案したプロンプトによって部分的に対処できるが,MT/NLPコミュニティでは強調する必要がある。また、高度な文脈内学習戦略の効果を探究し、(否定的だが興味深い)観察を見出す: 強力な連鎖的プロンプトは、単語毎の翻訳行動につながり、翻訳の大幅な低下をもたらす。 ChatGPT shows remarkable capabilities for machine translation (MT). Several prior studies have shown that it achieves comparable results to commercial systems for high-resource languages, but lags behind in complex tasks, e.g., low-resource and distant-language-pair translation. However, they usually adopt simple prompts which cannot fully elicit the capability of ChatGPT. In this report, we aim to further mine ChatGPT's translation ability by revisiting several aspects: temperature, task information, and domain information, and correspondingly propose two (simple but effective) prompts: Task-Specific Prompts (TSP) and Domain-Specific Prompts (DSP). 
We show that: 1) The performance of ChatGPT depends largely on temperature, and a lower temperature usually achieves better performance; 2) Emphasizing the task information further improves ChatGPT's performance, particularly in complex MT tasks; 3) Introducing domain information can elicit ChatGPT's generalization ability and improve its performance in the specific domain; 4) ChatGPT tends to generate hallucinations for non-English-centric MT tasks, which can be partially addressed by our proposed prompts but still needs to be highlighted for the MT/NLP community. We also explore the effects of advanced in-context learning strategies and make a (negative but interesting) observation: the powerful chain-of-thought prompt leads to word-by-word translation behavior, thus bringing significant translation degradation.	翻訳日:2023-03-27 15:55:39 公開日:2023-03-24
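The three knobs named in the entry above (temperature, task information, domain information) could be combined roughly as in the sketch below. The prompt wording, the helper `build_prompt`, and the request dictionary are all hypothetical; the abstract does not give the actual TSP and DSP templates, and no real API is called here.

```python
from typing import Optional


def build_prompt(
    text: str,
    src_lang: str,
    tgt_lang: str,
    task_info: Optional[str] = None,
    domain: Optional[str] = None,
) -> str:
    """Assemble a translation prompt with optional task and domain hints."""
    parts = []
    if task_info:  # task-specific hint (illustrative wording only)
        parts.append(f"You are a machine translation system that {task_info}.")
    if domain:  # domain-specific hint (illustrative wording only)
        parts.append(f"The text belongs to the {domain} domain.")
    parts.append(f"Translate the following {src_lang} text into {tgt_lang}:")
    parts.append(text)
    return "\n".join(parts)


prompt = build_prompt(
    "Guten Morgen",
    "German",
    "English",
    task_info="translates German to English",
    domain="biomedical",
)

# A low temperature, following finding 1) in the abstract; 0.0 is an assumed value.
request = {"prompt": prompt, "temperature": 0.0}
```

Keeping the hints optional makes it easy to compare a plain prompt against task-only and domain-only variants.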
# ラベルなし写真を活用したより強力なFine-Grained SBIR Exploiting Unlabelled Photos for Stronger Fine-Grained SBIR ( http://arxiv.org/abs/2303.13779v1 ) ライセンス: Link先を確認	Aneeshan Sain, Ayan Kumar Bhunia, Subhadeep Koley, Pinaki Nath Chowdhury, Soumitri Chattopadhyay, Tao Xiang, Yi-Zhe Song	(参考訳) 本稿では, 先行技術を11%オーバーシュートする強力なベースラインを提示することで, きめ細かなスケッチベース画像検索(FG-SBIR)を推し進める。これは複雑な設計ではなく、コミュニティが直面している2つの重要な問題に対処することで (i)金標準三重項損失は、全体論的潜在空間幾何学を強制せず、 (ii)精度の高いモデルを訓練するだけのスケッチは決して存在しない。前者に対しては、写真/スケッチインスタンス間の分離を明示的に強制する標準三重項損失の簡単な修正を提案する。後者については,モデル学習に写真データを活用する新たな知識蒸留モジュールを提案する。どちらのモジュールもプラグイン可能な新しいトレーニングパラダイムにプラグインされ、より安定したトレーニングが可能になる。具体的には (i)スケッチ間でのモダル内トリプルトロスを利用して、同一のインスタンスのスケッチを他と近づき、さらに1枚写真間で異なる写真インスタンスをプッシュし、同じ写真の構造的に拡張されたバージョン(約4～6%)を近付けます。取り組み方 (ii) 前述したモーダル写真三重項損失に対して,教師がラベルなしの写真の大規模なセットを事前学習した。次に,両組込み空間における各サンプルの特徴間距離の分布を一致させることで,教師の組込み空間のインスタンス間の文脈的類似性を生徒の組込み空間のそれと比較する(さらに4～5%の利得を得る)。先行技術の成績を著しく上回るだけでなく,新しいクラスへの一般化にも満足のいく結果をもたらしている。プロジェクトページ: https://aneeshan95.github.io/Sketch_PVT/ This paper advances the fine-grained sketch-based image retrieval (FG-SBIR) literature by putting forward a strong baseline that overshoots prior state-of-the-art methods by ~11%. This is not achieved via complicated design, but by addressing two critical issues facing the community: (i) the gold standard triplet loss does not enforce holistic latent space geometry, and (ii) there are never enough sketches to train a high accuracy model. For the former, we propose a simple modification to the standard triplet loss that explicitly enforces separation amongst photo/sketch instances. For the latter, we put forward a novel knowledge distillation module that can leverage photo data for model training. Both modules are then plugged into a novel plug-n-playable training paradigm that allows for more stable training. 
More specifically, for (i) we employ an intra-modal triplet loss amongst sketches to bring sketches of the same instance closer together and away from others, and one more amongst photos to push away different photo instances while bringing closer a structurally augmented version of the same photo (offering a gain of ~4-6%). To tackle (ii), we first pre-train a teacher on the large set of unlabelled photos over the aforementioned intra-modal photo triplet loss. Then we distill the contextual similarity present amongst the instances in the teacher's embedding space to that in the student's embedding space, by matching the distribution over inter-feature distances of respective samples in both embedding spaces (delivering a further gain of ~4-5%). Apart from outperforming prior art significantly, our model also yields satisfactory results on generalising to new classes. Project page: https://aneeshan95.github.io/Sketch_PVT/	翻訳日:2023-03-27 15:55:12 公開日:2023-03-24
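The intra-modal triplet loss mentioned in the entry above builds on the standard margin-based triplet loss, max(0, d(a, p) - d(a, n) + m). A minimal sketch of that standard loss follows; the Euclidean distance, the margin of 0.2, and the toy 2-D embeddings are assumptions, and the authors' modified loss may differ in its details.

```python
import math
from typing import Sequence


def euclidean(u: Sequence[float], v: Sequence[float]) -> float:
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(
    anchor: Sequence[float],
    positive: Sequence[float],
    negative: Sequence[float],
    margin: float = 0.2,
) -> float:
    """Standard triplet loss: pull the positive closer than the negative by a margin."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


# Intra-modal use: anchor and positive are two sketches of the same instance,
# the negative is a sketch of a different instance.
easy = triplet_loss((0.0, 0.0), (0.0, 1.0), (3.0, 4.0))  # negative already far away
hard = triplet_loss((0.0, 0.0), (0.0, 1.0), (0.0, 0.5))  # negative closer than positive
```

The loss is zero once the negative is farther from the anchor than the positive by at least the margin, so only violating triplets contribute gradient.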
# gm-nerf: 多視点画像から一般化したモデルベースニューラルラミアンスフィールドの学習 GM-NeRF: Learning Generalizable Model-based Neural Radiance Fields from Multi-view Images ( http://arxiv.org/abs/2303.13777v1 ) ライセンス: Link先を確認	Jianchuan Chen, Wentao Yi, Liqian Ma, Xu Jia, Huchuan Lu	(参考訳) 本研究では,スパースな多視点画像の集合を考えると,任意の人間パフォーマーのための高忠実なノベルビュー画像の合成に焦点をあてる。明瞭な身体ポーズと重度の自己閉塞の多様さから、これは困難な課題である。これを緩和するために,モデルベースニューラル・ラジアンス・フィールド(gm-nerf)を用いて,自由視点画像を合成する効果的な一般化フレームワークを提案する。具体的には,多視点2次元画像からの出現コードを,不正確な図形先行と画素空間のずれを緩和する幾何プロキシに登録する幾何ガイド付注意機構を提案する。さらに,より効率的な知覚の監督と合成の質の向上のために,神経レンダリングと部分勾配のバックプロパゲーションを実施している。提案手法を評価するため, 合成データセット THuman2.0 と Multi-garment と実世界のデータセット Genebody と ZJUMocap について実験を行った。提案手法は,新しいビュー合成と幾何再構成の観点から,最先端の手法よりも優れていることを示す。 In this work, we focus on synthesizing high-fidelity novel view images for arbitrary human performers, given a set of sparse multi-view images. It is a challenging task due to the large variation among articulated body poses and heavy self-occlusions. To alleviate this, we introduce an effective generalizable framework Generalizable Model-based Neural Radiance Fields (GM-NeRF) to synthesize free-viewpoint images. Specifically, we propose a geometry-guided attention mechanism to register the appearance code from multi-view 2D images to a geometry proxy which can alleviate the misalignment between inaccurate geometry prior and pixel space. On top of that, we further conduct neural rendering and partial gradient backpropagation for efficient perceptual supervision and improvement of the perceptual quality of synthesis. To evaluate our method, we conduct experiments on synthesized datasets THuman2.0 and Multi-garment, and real-world datasets Genebody and ZJUMocap. The results demonstrate that our approach outperforms state-of-the-art methods in terms of novel view synthesis and geometric reconstruction.	翻訳日:2023-03-27 15:54:21 公開日:2023-03-24
# gsplit: スプリットパラレル主義による大規模グラフ上のグラフニューラルネットワークトレーニング GSplit: Scaling Graph Neural Network Training on Large Graphs via Split-Parallelism ( http://arxiv.org/abs/2303.13775v1 ) ライセンス: Link先を確認	Sandeep Polisetty, Juelin Liu, Kobi Falus, Yi Ren Fung, Seung-Hwan Lim, Hui Guan, Marco Serafini	(参考訳) 数十億のエッジを持つ大規模グラフは、レコメンデーションシステム、社会グラフ分析、知識ベース、物質科学、生物学など、多くの産業、科学、工学の分野で広く使われている。機械学習モデルの新たなクラスであるgraph neural networks(gnn)は、さまざまなグラフ分析タスクのパフォーマンスが優れているため、これらのグラフを学習するためにますます採用されている。ミニバッチトレーニングは大規模グラフのトレーニングに一般的に採用されており、データ並列処理は複数のGPUにミニバッチトレーニングをスケールするための標準的なアプローチである。本稿では、GNNトレーニングシステムの基本的な性能ボトルネックは、データ並列アプローチの固有の制限と関係している、と論じる。次に,新しい並列ミニバッチトレーニングパラダイムであるsplit parallelismを提案する。我々は、gsplitと呼ばれる新しいシステムで分割並列性を実装し、DGL、Quiver、PaGraphといった最先端システムより優れていることを示す。 Large-scale graphs with billions of edges are ubiquitous in many industries, science, and engineering fields such as recommendation systems, social graph analysis, knowledge base, material science, and biology. Graph neural networks (GNN), an emerging class of machine learning models, are increasingly adopted to learn on these graphs due to their superior performance in various graph analytics tasks. Mini-batch training is commonly adopted to train on large graphs, and data parallelism is the standard approach to scale mini-batch training to multiple GPUs. In this paper, we argue that several fundamental performance bottlenecks of GNN training systems have to do with inherent limitations of the data parallel approach. We then propose split parallelism, a novel parallel mini-batch training paradigm. We implement split parallelism in a novel system called gsplit and show that it outperforms state-of-the-art systems such as DGL, Quiver, and PaGraph.	翻訳日:2023-03-27 15:53:50 公開日:2023-03-24
# ナノサテライトタスクスケジューリングへのグラフニューラルネットワークアプローチ:混合整数モデル学習への洞察 A Graph Neural Network Approach to Nanosatellite Task Scheduling: Insights into Learning Mixed-Integer Models ( http://arxiv.org/abs/2303.13773v1 ) ライセンス: Link先を確認	Bruno Machado Pacheco, Laio Oriel Seman, Cezar Ant\^onio Rigo, Eduardo Camponogara, Eduardo Augusto Bezerra, Leandro dos Santos Coelho	(参考訳) 本研究では,グラフニューラルネットワーク(GNN)を用いて,ナノサテライトタスクをより効率的にスケジュールする方法を検討する。オフライン・ナノサテライト・タスク・スケジューリング(onts)問題では、優先度、最小および最大アクティベーションイベント、実行時間枠、期間、実行ウィンドウといったqos(quality-of-service)の考慮事項や、衛星の電力資源の制約、エネルギーの収穫および管理の複雑さを考慮して、軌道上で実行するタスクの最適なスケジュールを見出すことが目的である。 ONTS問題は、従来の数学的定式化や正確な方法を用いてアプローチされてきたが、問題の挑戦事例への適用性は限られている。本研究は,旅行セールスマン問題,スケジューリング問題,施設配置問題など,多くの最適化問題に効果的に適用されたgnnの利用について検討する。ここでは、二部グラフにおけるONTS問題のMILPインスタンスを完全に表現する。 reluアクティベーション機能と連携した特徴集約とメッセージパッシング手法を適用し、古典的なディープラーニングモデルを用いて学習し、最適なパラメータセットを得る。さらに、新たな研究分野である Explainable AI (XAI) を適用して、どの機能 -- ノード、制約 -- が学習のパフォーマンスに最も大きな影響を与え、それらのモデルの内部動作と決定プロセスに光を当てています。また, 最適解における解の実現可能性と決定変数値の確率の予測において, 80\%以上の精度を得ることにより, 初期固定手法を検討した。以上の結果から,gnnはナノサテライトタスクのスケジューリングに有効な手法であり,組合せ最適化問題に対する説明可能な機械学習モデルの利点を浮き彫りにした。 This study investigates how to schedule nanosatellite tasks more efficiently using Graph Neural Networks (GNN). In the Offline Nanosatellite Task Scheduling (ONTS) problem, the goal is to find the optimal schedule for tasks to be carried out in orbit while taking into account Quality-of-Service (QoS) considerations such as priority, minimum and maximum activation events, execution time-frames, periods, and execution windows, as well as constraints on the satellite's power resources and the complexity of energy harvesting and management. The ONTS problem has been approached using conventional mathematical formulations and precise methods, but their applicability to challenging cases of the problem is limited. 
This study examines the use of GNNs in this context, which have been effectively applied to many optimization problems, including traveling salesman problems, scheduling problems, and facility placement problems. Here, we fully represent MILP instances of the ONTS problem in bipartite graphs. We apply a feature aggregation and message-passing methodology allied to a ReLU activation function to learn using a classic deep learning model, obtaining an optimal set of parameters. Furthermore, we apply Explainable AI (XAI), another emerging field of research, to determine which features -- nodes, constraints -- had the most significant impact on learning performance, shedding light on the inner workings and decision process of such models. We also explored an early fixing approach, obtaining an accuracy above 80\% both in predicting the feasibility of a solution and the probability of a decision variable value being in the optimal solution. Our results point to GNNs as a potentially effective method for scheduling nanosatellite tasks and shed light on the advantages of explainable machine learning models for challenging combinatorial optimization problems.	翻訳日:2023-03-27 15:53:33 公開日:2023-03-24
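The bipartite representation of a MILP mentioned in the entry above is commonly built with one node per variable, one node per constraint, and an edge wherever a constraint coefficient is non-zero. The sketch below assumes constraints of the form A x <= b with objective coefficients c, and adds one unweighted aggregation step with ReLU purely for illustration; the feature choices and function names are assumptions, not the authors' encoding.

```python
from typing import Dict, List, Tuple

Edge = Tuple[int, int, float]  # (constraint index, variable index, coefficient)


def milp_to_bipartite(
    A: List[List[float]], b: List[float], c: List[float]
) -> Tuple[List[Dict[str, float]], List[Dict[str, float]], List[Edge]]:
    """Build variable nodes, constraint nodes, and coefficient-weighted edges."""
    if len(A) != len(b) or any(len(row) != len(c) for row in A):
        raise ValueError("A must have len(b) rows and len(c) columns")
    variable_nodes = [{"objective_coeff": cj} for cj in c]
    constraint_nodes = [{"rhs": bi} for bi in b]
    edges = [
        (i, j, a) for i, row in enumerate(A) for j, a in enumerate(row) if a != 0
    ]
    return variable_nodes, constraint_nodes, edges


def relu(x: float) -> float:
    return max(0.0, x)


def aggregate_to_constraints(
    variable_nodes: List[Dict[str, float]],
    constraint_nodes: List[Dict[str, float]],
    edges: List[Edge],
) -> List[float]:
    """One message-passing step without learned weights: sum, then ReLU."""
    totals = [0.0] * len(constraint_nodes)
    for i, j, coeff in edges:
        totals[i] += coeff * variable_nodes[j]["objective_coeff"]
    return [relu(t) for t in totals]


A = [[1, 0, 2], [0, 3, 0]]
b = [4, 5]
c = [1, -1, 1]
var_nodes, con_nodes, edges = milp_to_bipartite(A, b, c)
messages = aggregate_to_constraints(var_nodes, con_nodes, edges)
```

A trained GNN would replace the plain sum with learned linear maps and alternate messages in both directions between the two node sets.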
# 物体検出のための未知のスニッファー: 未知の物体に盲目を向けるな Unknown Sniffer for Object Detection: Don't Turn a Blind Eye to Unknown Objects ( http://arxiv.org/abs/2303.13769v1 ) ライセンス: Link先を確認	Wenteng Liang, Feng Xue, Yihao Liu, Guofeng Zhong, Anlong Ming	(参考訳) 最近提案されたopen-world objectとopen-set detectionは、never-seen-beforeオブジェクトを発見し、それらをクラス既知のオブジェクトと区別するブレークスルーを達成している。しかし、既知のクラスから未知クラスへの知識伝達の研究はより深く、背景に隠された未知のクラスを検出するための精細な能力に繋がる必要がある。本稿では,未知のオブジェクトと未知のオブジェクトの両方を見つけるための未知のスニファー(UnSniffer)を提案する。まず、一般化オブジェクト信頼度(GOC)スコアを導入し、クラス既知のサンプルのみを監督し、背景にある未知の不正な抑制を回避する。特に、クラスに知られたオブジェクトから学んだ信頼度スコアは、未知のものまで一般化することができる。さらに,背景の非対象サンプルを更に制限するために,負のエネルギー抑制損失を提案する。次に、各未知の最良のボックスは、トレーニング中に意味情報を欠いているため、推論中に取得することが難しい。この問題を解決するために,手動設計による非最大抑圧(NMS)後処理を置き換えるグラフベースの決定手法を提案する。最後に、未知のオブジェクト検出の精度評価を知識に含む最初の公開ベンチマークである、未知のオブジェクト検出ベンチマークを示す。実験の結果,本手法は既存の最先端手法よりもはるかに優れていることがわかった。コードは、https://github.com/Went-Liang/UnSniffer.comで入手できる。 The recently proposed open-world object and open-set detection achieve a breakthrough in finding never-seen-before objects and distinguishing them from class-known ones. However, their studies on knowledge transfer from known classes to unknown ones need to be deeper, leading to the scanty capability for detecting unknowns hidden in the background. In this paper, we propose the unknown sniffer (UnSniffer) to find both unknown and known objects. Firstly, the generalized object confidence (GOC) score is introduced, which only uses class-known samples for supervision and avoids improper suppression of unknowns in the background. Significantly, such confidence score learned from class-known objects can be generalized to unknown ones. Additionally, we propose a negative energy suppression loss to further limit the non-object samples in the background. Next, the best box of each unknown is hard to obtain during inference due to lacking their semantic information in training. 
To solve this issue, we introduce a graph-based determination scheme to replace hand-designed non-maximum suppression (NMS) post-processing. Finally, we present the Unknown Object Detection Benchmark, which is, to our knowledge, the first publicly available benchmark that encompasses precision evaluation for unknown object detection. Experiments show that our method is far better than the existing state-of-the-art methods. Code is available at: https://github.com/Went-Liang/UnSniffer.	翻訳日:2023-03-27 15:52:58 公開日:2023-03-24
# UniTS: 自己教師型表現学習を備えたユニバーサル時系列分析フレームワーク UniTS: A Universal Time Series Analysis Framework with Self-supervised Representation Learning ( http://arxiv.org/abs/2303.13804v1 ) ライセンス: Link先を確認	Zhiyu Liang, Chen Liang, Zheng Liang, Hongzhi Wang	(参考訳) 機械学習は時系列分析の強力なツールとして登場した。既存のメソッドは通常、異なる分析タスク用にカスタマイズされ、部分的なラベリングやドメインシフトといった実用的な問題に取り組む際の課題に直面します。上記の問題を解決するために,自己指導型表現学習(あるいは事前学習)を取り入れた新しいフレームワークであるUniTSを開発した。 UniTSのコンポーネントは、柔軟な拡張を可能にするためにsklearnのようなAPIを使って設計されている。ユーザがユーザフレンドリなGUIを使って分析タスクを簡単に実行できることを示し、従来のタスク固有の手法よりもUniTSの方が5つのメインストリームタスクと2つの実践的な設定で自己教師付き事前学習なしで優れた性能を示す。 Machine learning has emerged as a powerful tool for time series analysis. Existing methods are usually customized for different analysis tasks and face challenges in tackling practical problems such as partial labeling and domain shift. To achieve universal analysis and address the aforementioned problems, we develop UniTS, a novel framework that incorporates self-supervised representation learning (or pre-training). The components of UniTS are designed using sklearn-like APIs to allow flexible extensions. We demonstrate how users can easily perform an analysis task using the user-friendly GUIs, and show the superior performance of UniTS over the traditional task-specific methods without self-supervised pre-training on five mainstream tasks and two practical settings.	翻訳日:2023-03-27 15:46:22 公開日:2023-03-24
# 感情認識のための分離マルチモーダル蒸留 Decoupled Multimodal Distilling for Emotion Recognition ( http://arxiv.org/abs/2303.13802v1 ) ライセンス: Link先を確認	Yong Li, Yuanzhi Wang, Zhen Cui	(参考訳) ヒトのマルチモーダル感情認識(mer)は、言語、視覚、音響的モダリティを通じて人間の感情を知覚することを目的としている。以前のMERアプローチの印象的な性能にもかかわらず、固有の多モード不均一性はまだ残っており、異なるモダリティの寄与は著しく異なる。本研究では,自由で適応的なクロスモーダル知識蒸留を容易にする脱共役マルチモーダル蒸留(dmd)アプローチを提案し,各モーダルの識別的特徴を高めることを目的とした。特に、各モダリティの表現は、自己回帰的な方法で、2つの部分、すなわちモダリティ-非関係/排他的な空間に分解される。 DMDはグラフ蒸留ユニット(GD-Unit)を各分離部に使用し、より専門的で効果的な方法で各GDを実行できる。 GD-Unitは動的グラフで構成され、各頂点はモダリティを表し、各エッジは動的知識蒸留を示す。このようなgdパラダイムは、蒸留重みを自動的に学習できる柔軟な知識伝達方法を提供し、多様なクロスモーダル知識伝達パターンを可能にする。実験結果からDMDは最先端のMER法よりも優れた性能を示した。 DMDのグラフエッジは、モダリティ非関連かつ排他的な特徴空間に意味のある分布パターンを示す。コードは https://github.com/mdswyz/DMD でリリースされる。 Human multimodal emotion recognition (MER) aims to perceive human emotions via language, visual and acoustic modalities. Despite the impressive performance of previous MER approaches, the inherent multimodal heterogeneities still persist and the contribution of different modalities varies significantly. In this work, we mitigate this issue by proposing a decoupled multimodal distillation (DMD) approach that facilitates flexible and adaptive crossmodal knowledge distillation, aiming to enhance the discriminative features of each modality. Specifically, the representation of each modality is decoupled into two parts, i.e., modality-irrelevant/-exclusive spaces, in a self-regression manner. DMD utilizes a graph distillation unit (GD-Unit) for each decoupled part so that each GD can be performed in a more specialized and effective manner. A GD-Unit consists of a dynamic graph where each vertex represents a modality and each edge indicates a dynamic knowledge distillation. Such a GD paradigm provides a flexible way of transferring knowledge in which the distillation weights can be automatically learned, thus enabling diverse crossmodal knowledge transfer patterns. 
Experimental results show that DMD consistently achieves better performance than state-of-the-art MER methods. Visualization results show that the graph edges in DMD exhibit meaningful distributional patterns w.r.t. the modality-irrelevant/-exclusive feature spaces. Code is released at https://github.com/mdswyz/DMD.	翻訳日:2023-03-27 15:46:09 公開日:2023-03-24
# 自己教師付き共訓練によるオープンドメインスロット充填に向けて Toward Open-domain Slot Filling via Self-supervised Co-training ( http://arxiv.org/abs/2303.13801v1 ) ライセンス: Link先を確認	Adib Mosharrof, Moghis Fereidouni, A.B. Siddique	(参考訳) スロットフィリングは現代の会話システムにおいて重要なタスクの1つである。既存の文献の大部分は、新しいドメインごとにラベル付きトレーニングデータを必要とする教師付き学習手法を採用している。ゼロショット学習や弱い監督アプローチなどは、手動ラベリングの代替としてpromiseが示されている。それでも、これらの学習パラダイムは、パフォーマンスの観点から教師あり学習アプローチよりもかなり劣っている。この性能ギャップを最小化し、オープンドメインスロットフィリングの可能性を示すために、SCotと呼ばれる自己教師付き協調学習フレームワークを提案する。フェーズ1は2つの補完的な擬似ラベルを自動取得する。フェーズ2は、これらの擬似ラベルセットを使用してスロットフィリングタスクに適応することにより、事前訓練された言語モデルBERTのパワーを活用する。フェーズ3では,両モデルが高信頼度ソフトラベルを自動選択し,他のモデルの性能を反復的に向上する自己教師付き協調機構を導入する。 SCotは,SGDデータセットとMultiWoZデータセットでそれぞれ45.57%,37.56%,最先端モデルよりも優れていた。さらに,提案するフレームワークであるSCotは,最先端の完全教師付きモデルと比較して,同等のパフォーマンスを実現する。 Slot filling is one of the critical tasks in modern conversational systems. The majority of existing literature employs supervised learning methods, which require labeled training data for each new domain. Zero-shot learning and weak supervision approaches, among others, have shown promise as alternatives to manual labeling. Nonetheless, these learning paradigms are significantly inferior to supervised learning approaches in terms of performance. To minimize this performance gap and demonstrate the possibility of open-domain slot filling, we propose a Self-supervised Co-training framework, called SCot, that requires zero in-domain manually labeled training examples and works in three phases. Phase one acquires two sets of complementary pseudo labels automatically. Phase two leverages the power of the pre-trained language model BERT, by adapting it for the slot filling task using these sets of pseudo labels. In phase three, we introduce a self-supervised co-training mechanism, where both models automatically select high-confidence soft labels to further improve the performance of the other in an iterative fashion. 
Our thorough evaluations show that SCot outperforms state-of-the-art models by 45.57% and 37.56% on SGD and MultiWoZ datasets, respectively. Moreover, our proposed framework SCot achieves performance comparable to that of state-of-the-art fully supervised models.	翻訳日:2023-03-27 15:45:45 公開日:2023-03-24
# ビデオデモへのステップバイステップインストラクショナルダイアグラムの適応 Aligning Step-by-Step Instructional Diagrams to Video Demonstrations ( http://arxiv.org/abs/2303.13800v1 ) ライセンス: Link先を確認	Jiahao Zhang, Anoop Cherian, Yanbin Liu, Yizhak Ben-Shabat, Cristian Rodriguez, Stephen Gould	(参考訳) マルチモーダルアライメントは、あるモダリティから別のモダリティを使ってクエリする際のインスタンスの検索を容易にする。本稿では,このようなアライメントが中間にある新しい設定を考える。 (i)組み立て図(イケアの組立マニュアルによく見られる)として表される指示ステップ、及び (ii)内装ビデオの映像セグメント(実世界の組立動作の制定を含む。) このアライメントを学習するために,新しい教師付きコントラスト学習手法を導入する。そこで本研究では,本手法の有効性を実証するために,多様な家具組立コレクションからの183時間のビデオと,関連する指導マニュアルからの8,300点近いイラストと,それらの真実のアライメントに注釈を付したイケア組立用IAWを提案する。第1に,ビデオセグメントとイラストレーション間の最寄りの隣接検索,第2に,各ビデオの指示ステップとセグメントのアラインメント,という2つのタスクを定義した。 iawに関する広範な実験は、代替案に対する我々のアプローチの優れた性能を示している。 Multimodal alignment facilitates the retrieval of instances from one modality when queried using another. In this paper, we consider a novel setting where such an alignment is between (i) instruction steps that are depicted as assembly diagrams (commonly seen in Ikea assembly manuals) and (ii) video segments from in-the-wild videos; these videos comprising an enactment of the assembly actions in the real world. To learn this alignment, we introduce a novel supervised contrastive learning method that learns to align videos with the subtle details in the assembly diagrams, guided by a set of novel losses. To study this problem and demonstrate the effectiveness of our method, we introduce a novel dataset: IAW for Ikea assembly in the wild consisting of 183 hours of videos from diverse furniture assembly collections and nearly 8,300 illustrations from their associated instruction manuals and annotated for their ground truth alignments. We define two tasks on this dataset: First, nearest neighbor retrieval between video segments and illustrations, and, second, alignment of instruction steps and the segments for each video. Extensive experiments on IAW demonstrate superior performances of our approach against alternatives.	翻訳日:2023-03-27 15:45:23 公開日:2023-03-24
# 収束対策と創発モデル:人間自動信頼アンケートのメタ分析 Converging Measures and an Emergent Model: A Meta-Analysis of Human-Automation Trust Questionnaires ( http://arxiv.org/abs/2303.13799v1 ) ライセンス: Link先を確認	Yosef S. Razin and Karen M. Feigh	(参考訳) 人自動信頼度を測定する上で重要な課題は、高度に可変性のある構成拡散、モデル、アンケートの量である。しかし、全員が信頼が技術受容、継続的な使用、流動性、チームワークの重要な要素であることに同意する。そこで本研究では,信頼度・信頼度調査機器のメタ分析を行い,信頼度評価のためのコンセンサスモデルを合成する。この目的を達成するために、この研究は、最も頻繁に引用され、最も有望な人間自動化と人間ロボット信頼のアンケートと、そのような信頼の次元と先行を形成する最も確立された要因を識別する。混乱と人口増加を両立させるため,質問紙間の用語の詳細なマッピングを行う。さらに,多因子サーベイ機器を用いた実験から得られた回帰モデルのメタ分析を行った。このメタアナリシスに基づいて,人間-自律信頼の実験的検証モデルを示す。この収束モデルは、将来の研究のための統合フレームワークを確立する。信頼度測定の現在の境界と、さらなる調査が必要かを特定する。我々は、適切な信頼調査機器の選択と設計を議論することで締めくくります。信頼度調査機器の比較,マッピング,分析により,人間-自律相互作用における信頼のコンセンサス構造を同定する。そうすることで、信頼を測定するためのより完全な基盤が、広く適用できるようになります。信頼という学問的考えと、口語的で常識的な概念を統合する。信頼の重要性がますます認識され、特に人間と自律的な相互作用において、この研究は、それを理解し、測定するためのより良い位置を与えてくれます。 A significant challenge to measuring human-automation trust is the amount of construct proliferation, models, and questionnaires with highly variable validation. However, all agree that trust is a crucial element of technological acceptance, continued usage, fluency, and teamwork. Herein, we synthesize a consensus model for trust in human-automation interaction by performing a meta-analysis of validated and reliable trust survey instruments. To accomplish this objective, this work identifies the most frequently cited and best-validated human-automation and human-robot trust questionnaires, as well as the most well-established factors, which form the dimensions and antecedents of such trust. To reduce both confusion and construct proliferation, we provide a detailed mapping of terminology between questionnaires. Furthermore, we perform a meta-analysis of the regression models that emerged from those experiments which used multi-factorial survey instruments. Based on this meta-analysis, we demonstrate a convergent experimentally validated model of human-automation trust. 
This convergent model establishes an integrated framework for future research. It identifies the current boundaries of trust measurement and where further investigation is necessary. We close by discussing how to choose and design an appropriate trust survey instrument. By comparing, mapping, and analyzing well-constructed trust survey instruments, a consensus structure of trust in human-automation interaction is identified. In doing so, a more complete and widely applicable basis for measuring trust emerges. It integrates the academic idea of trust with the colloquial, common-sense one. Given the increasingly recognized importance of trust, especially in human-automation interaction, this work leaves us better positioned to understand and measure it.	翻訳日:2023-03-27 15:45:02 公開日:2023-03-24
# ダウンサンプリングに基づく2次元フロアプランセグメンテーション 2D Floor Plan Segmentation Based on Down-sampling ( http://arxiv.org/abs/2303.13798v1 ) ライセンス: Link先を確認	Mohammadreza Sharif, Kiran Mohan, Sarath Suvarna	(参考訳) 近年、フロアプランのセグメンテーションは、フロアプランの再構築やロボット工学における幅広い応用により、注目されている。本稿では,ダウンサンプリング方式に基づく新しい2次元フロアプランセグメンテーション手法を提案する。本手法では,フロアプラン上で連続的なダウンサンプリングを行い,その複雑度を低減しつつ構造情報を維持する。掃除ロボットが未知の環境下で生成した散在するフロアプランとフロアプランのベンチマークから得られた結果を提示することにより,提案手法の有効性を実証する。本手法はフロアプランセグメンテーションの計算と実装の複雑さを大幅に減らし,現実のアプリケーションに適している。さらに,セグメンテーション結果を評価するための適切な指標について検討する。提案手法は, 乱雑な環境下での2次元フロアプランセグメンテーションに有望な結果をもたらす。 In recent years, floor plan segmentation has gained significant attention due to its wide range of applications in floor plan reconstruction and robotics. In this paper, we propose a novel 2D floor plan segmentation technique based on a down-sampling approach. Our method employs continuous down-sampling on a floor plan to maintain its structural information while reducing its complexity. We demonstrate the effectiveness of our approach by presenting results obtained from both cluttered floor plans generated by a vacuum cleaning robot in unknown environments and a benchmark of floor plans. Our technique considerably reduces the computational and implementation complexity of floor plan segmentation, making it more suitable for real-world applications. Additionally, we discuss the appropriate metric for evaluating segmentation results. Overall, our approach yields promising results for 2D floor plan segmentation in cluttered environments.	翻訳日:2023-03-27 15:44:36 公開日:2023-03-24
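The abstract above does not spell out how the down-sampling is performed, so the following is only one plausible reading: an occupancy grid is shrunk block by block, and a coarse cell is marked occupied if any fine cell in its block is occupied, which keeps thin walls from vanishing. The function name, the factor of 2, and the any-occupied rule are assumptions.

```python
from typing import List

Grid = List[List[int]]  # 1 = occupied (wall or clutter), 0 = free


def downsample(grid: Grid, factor: int = 2) -> Grid:
    """Shrink an occupancy grid; a block is occupied if any of its cells is."""
    if factor < 1:
        raise ValueError("factor must be at least 1")
    rows, cols = len(grid), len(grid[0])
    coarse: Grid = []
    for r in range(0, rows, factor):
        coarse_row = []
        for c in range(0, cols, factor):
            block = [
                grid[i][j]
                for i in range(r, min(r + factor, rows))
                for j in range(c, min(c + factor, cols))
            ]
            coarse_row.append(1 if any(block) else 0)
        coarse.append(coarse_row)
    return coarse


floor_plan = [
    [0, 0, 1, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
coarse_plan = downsample(floor_plan)
```

Applying `downsample` repeatedly yields a pyramid of ever simpler maps on which segmentation becomes cheaper.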
# ゼロショット一般化リワード関数によるタスク指向対話システムのパーソナライズ Personalizing Task-oriented Dialog Systems via Zero-shot Generalizable Reward Function ( http://arxiv.org/abs/2303.13797v1 ) ライセンス: Link先を確認	A.B. Siddique, M.H. Maqbool, Kshitija Taywade, Hassan Foroosh	(参考訳) タスク指向対話システムは、自然言語を使ってタスクを達成できる。最新システムは、個性に関係なくユーザーに対して同じように反応するが、対話のパーソナライズは、より高いレベルの採用とより良いユーザーエクスペリエンスをもたらす可能性がある。パーソナライズされたダイアログシステムの構築は重要だが、挑戦的な取り組みであり、その課題にはほんの一握りの作業しかなかった。既存の作業の多くは教師付き学習アプローチに依存しており、各ユーザプロファイルに対して、厳格で高価なラベル付きトレーニングデータを必要とする。さらに、各ユーザプロファイルのデータ収集とラベル付けは事実上不可能である。本研究では、ゼロショット一般化報酬関数を用いて、広範囲のユーザプロファイルに適応可能なタスク指向対話システムを、教師なしでパーソナライズする新しいフレームワークP-ToDを提案する。 P-ToDは、トレーニング済みのGPT-2をバックボーンモデルとして使用し、3つのフェーズで動作する。第1段階はタスク固有の訓練を行う。フェーズ2は、ゼロショット一般化報酬関数で導かれるポリシー勾配を実行する近似ポリシー最適化アルゴリズムを活用することにより、教師なしのパーソナライゼーションを開始する。新たな報酬機能は,未発見のプロファイルにおいても生成した応答の品質を定量化することができる。オプションの最終フェーズは、いくつかのラベル付きトレーニング例を使用してパーソナライズされたモデルを微調整する。パーソナライズされたbAbIダイアログベンチマークを用いて,5つのタスクと最大180種類のユーザプロファイルに対して,広範な実験分析を行う。実験結果から,P-ToDはラベル付きサンプルがゼロであっても,最先端の教師付きパーソナライゼーションモデルより優れ,強力な完全教師付きGPT-2ベースラインと比較してBLEUおよびROUGEメトリクス上での競争性能が向上することが示された。 Task-oriented dialog systems enable users to accomplish tasks using natural language. State-of-the-art systems respond to users in the same way regardless of their personalities, although personalizing dialogues can lead to higher levels of adoption and better user experiences. Building personalized dialog systems is an important, yet challenging endeavor and only a handful of works took on the challenge. Most existing works rely on supervised learning approaches and require laborious and expensive labeled training data for each user profile. Additionally, collecting and labeling data for each user profile is virtually impossible. In this work, we propose a novel framework, P-ToD, to personalize task-oriented dialog systems capable of adapting to a wide range of user profiles in an unsupervised fashion using a zero-shot generalizable reward function. 
P-ToD uses a pre-trained GPT-2 as a backbone model and works in three phases. Phase one performs task-specific training. Phase two kicks off unsupervised personalization by leveraging the proximal policy optimization algorithm that performs policy gradients guided by the zero-shot generalizable reward function. Our novel reward function can quantify the quality of the generated responses even for unseen profiles. The optional final phase fine-tunes the personalized model using a few labeled training examples. We conduct extensive experimental analysis using the personalized bAbI dialogue benchmark for five tasks and up to 180 diverse user profiles. The experimental results demonstrate that P-ToD, even when it had access to zero labeled examples, outperforms state-of-the-art supervised personalization models and achieves competitive performance on BLEU and ROUGE metrics when compared to a strong fully-supervised GPT-2 baseline.	翻訳日:2023-03-27 15:44:23 公開日:2023-03-24
# Zolly: Zoom Focal Length Correctly for Perspective-Distorted Human Mesh Reconstruction ( http://arxiv.org/abs/2303.13796v1 ) License: see linked page	Wenjia Wang, Yongtao Ge, Haiyi Mei, Zhongang Cai, Qingping Sun, Yanjun Wang, Chunhua Shen, Lei Yang, Taku Komura	Because single-view RGB images captured in the wild are hard to calibrate, existing 3D human mesh reconstruction (3DHMR) methods either use a constant large focal length or estimate one from the background environment context. Neither approach can handle the distortion of the torso, limbs, hands, or face caused by perspective camera projection when the camera is close to the human body. Such naive focal length assumptions harm the task by producing incorrectly formulated projection matrices. To solve this, we propose Zolly, the first 3DHMR method focusing on perspective-distorted images. Our approach begins by analysing the cause of perspective distortion, which we find is mainly the relative location of the human body to the camera center. We propose a new camera model and a novel 2D representation, termed distortion image, which describes the 2D dense distortion scale of the human body. We then estimate the distance from distortion scale features rather than environment context features. Afterwards, we integrate the distortion feature with image features to reconstruct the body mesh. To formulate the correct projection matrix and locate the human body position, we simultaneously use perspective and weak-perspective projection losses. Since existing datasets could not handle this task, we propose the first synthetic dataset PDHuman and extend two real-world datasets tailored for this task, all containing perspective-distorted human images. Extensive experiments show that Zolly outperforms existing state-of-the-art methods on both perspective-distorted datasets and the standard benchmark (3DPW).	Translated: 2023-03-27 15:43:54 Published: 2023-03-24
# Efficient and Accurate Co-Visible Region Localization with Matching Key-Points Crop (MKPC): A Two-Stage Pipeline for Enhancing Image Matching Performance ( http://arxiv.org/abs/2303.13794v1 ) License: see linked page	Hongjian Song, Yuki Kashiwaba, Shuai Wu, Canming Wang	Image matching is a classic and fundamental task in computer vision. In this paper, under the hypothesis that the areas outside the co-visible regions carry little information, we propose a matching key-points crop (MKPC) algorithm. MKPC locates, proposes, and crops the critical regions, namely the co-visible areas, with great efficiency and accuracy. Furthermore, building upon MKPC, we propose a general two-stage pipeline for image matching that is compatible with any image matching model or combination of models. We experimented with plugging SuperPoint + SuperGlue into the two-stage pipeline, and the results show that our method enhances the performance of outdoor pose estimation. Moreover, under fair comparative conditions, our method outperforms the SOTA on the Image Matching Challenge 2022 Benchmark, which is currently the hardest outdoor benchmark for image matching.	Translated: 2023-03-27 15:43:26 Published: 2023-03-24
# Forecasting Competitions with Correlated Events ( http://arxiv.org/abs/2303.13793v1 ) License: see linked page	Rafael Frongillo, Manuel Lladser, Anish Thilagar, Bo Waggoner	Beginning with Witkowski et al. [2022], recent work on forecasting competitions has addressed incentive problems with the common winner-take-all mechanism. Frongillo et al. [2021] propose a competition mechanism based on follow-the-regularized-leader (FTRL), an online learning framework. They show that their mechanism selects an $\epsilon$-optimal forecaster with high probability using only $O(\log(n)/\epsilon^2)$ events. These works, together with all prior work on this problem thus far, assume that events are independent. We initiate the study of forecasting competitions for correlated events. To quantify correlation, we introduce a notion of block correlation, which allows each event to be strongly correlated with up to $b$ others. We show that under distributions with this correlation, the FTRL mechanism retains its $\epsilon$-optimal guarantee using $O(b^2 \log(n)/\epsilon^2)$ events. Our proof involves a novel concentration bound for correlated random variables, which may be of broader interest.	Translated: 2023-03-27 15:43:10 Published: 2023-03-24
# Factor Decomposed Generative Adversarial Networks for Text-to-Image Synthesis ( http://arxiv.org/abs/2303.13821v1 ) License: see linked page	Jiguo Li, Xiaobin Liu, Lirong Zheng	Prior works on text-to-image synthesis typically concatenate the sentence embedding with the noise vector, even though the sentence embedding and the noise vector are two different factors that control different aspects of the generation. Simply concatenating them entangles the latent factors and encumbers the generative model. In this paper, we attempt to decompose these two factors and propose Factor Decomposed Generative Adversarial Networks (FDGAN). To achieve this, we first generate images from the noise vector and then apply the sentence embedding in the normalization layers of both the generator and the discriminators. We also design an additive norm layer to align and fuse the text-image features. The experimental results show that decomposing the noise and the sentence embedding can disentangle latent factors in text-to-image synthesis and make the generative model more efficient. Compared with the baseline, FDGAN achieves better performance while using fewer parameters.	Translated: 2023-03-27 15:37:19 Published: 2023-03-24
# Prior-RadGraphFormer: A Prior-Knowledge-Enhanced Transformer for Generating Radiology Graphs from X-Rays ( http://arxiv.org/abs/2303.13818v1 ) License: see linked page	Yiheng Xiong, Jingsong Liu, Kamilia Zaripova, Sahand Sharifzadeh, Matthias Keicher, Nassir Navab	The extraction of structured clinical information from free-text radiology reports in the form of radiology graphs has been demonstrated to be a valuable approach for evaluating the clinical correctness of report-generation methods. However, the direct generation of radiology graphs from chest X-ray (CXR) images has not been attempted. To address this gap, we propose a novel approach called Prior-RadGraphFormer that utilizes a transformer model with prior knowledge in the form of a probabilistic knowledge graph (PKG) to generate radiology graphs directly from CXR images. The PKG models the statistical relationship between radiology entities, including anatomical structures and medical observations. This additional contextual information enhances the accuracy of entity and relation extraction. The generated radiology graphs can be applied to various downstream tasks, such as free-text or structured report generation and multi-label classification of pathologies. Our approach represents a promising method for generating radiology graphs directly from CXR images and has significant potential for improving medical image analysis and clinical decision-making.	Translated: 2023-03-27 15:37:03 Published: 2023-03-24
# ABLE-NeRF: Attention-Based Rendering with Learnable Embeddings for Neural Radiance Field ( http://arxiv.org/abs/2303.13817v1 ) License: see linked page	Zhe Jun Tang, Tat-Jen Cham, Haiyu Zhao	Neural Radiance Field (NeRF) is a popular method for representing 3D scenes by optimising a continuous volumetric scene function. Its success, which lies in applying volumetric rendering (VR), is also its Achilles' heel in producing view-dependent effects. As a consequence, glossy and transparent surfaces often appear murky. One remedy for these artefacts is to constrain the VR equation by excluding volumes with back-facing normals. While this approach has some success in rendering glossy surfaces, translucent objects are still poorly represented. In this paper, we present an alternative to the physics-based VR approach by introducing a self-attention-based framework on volumes along a ray. In addition, inspired by modern game engines which utilise Light Probes to store local lighting passing through the scene, we incorporate Learnable Embeddings to capture view-dependent effects within the scene. Our method, which we call ABLE-NeRF, significantly reduces 'blurry' glossy surfaces in rendering and produces realistic translucent surfaces that are lacking in prior art. On the Blender dataset, ABLE-NeRF achieves SOTA results and surpasses Ref-NeRF in all three image quality metrics: PSNR, SSIM, and LPIPS.	Translated: 2023-03-27 15:36:44 Published: 2023-03-24
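For readers unfamiliar with the "VR equation" that the ABLE-NeRF abstract refers to, the following is a minimal sketch of the standard NeRF volume-rendering quadrature along a single ray. It is not ABLE-NeRF's attention-based renderer; the function name `render_ray` and its argument layout are illustrative assumptions.

```python
import math


def render_ray(densities, colors, deltas):
    """Composite per-sample colours along one ray (standard NeRF quadrature).

    densities: non-negative volume density sigma_i at each sample
    colors:    (r, g, b) tuple at each sample
    deltas:    length of the ray segment represented by each sample
    Returns the composited pixel colour and the per-sample weights.
    """
    transmittance = 1.0  # T_i: fraction of light that reaches sample i
    pixel = [0.0, 0.0, 0.0]
    weights = []
    for sigma, rgb, delta in zip(densities, colors, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this segment
        weight = transmittance * alpha          # contribution of sample i
        weights.append(weight)
        for channel in range(3):
            pixel[channel] += weight * rgb[channel]
        transmittance *= 1.0 - alpha            # light left for later samples
    return pixel, weights
```

Because every weight depends only on densities along the ray, view-dependent appearance has to be carried entirely by the per-sample colours, which is one way to read the limitation the abstract describes.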
# Multimodal Adaptive Fusion of Face and Gait Features using Keyless attention based Deep Neural Networks for Human Identification ( http://arxiv.org/abs/2303.13814v1 ) License: see linked page	Ashwin Prakash, Thejaswin S, Athira Nambiar and Alexandre Bernardino	Biometrics plays a significant role in vision-based surveillance applications. Soft biometrics such as gait are widely used together with face in surveillance tasks like person recognition and re-identification. Nevertheless, in practical scenarios, classical fusion techniques respond poorly to changes in individual users and in the external environment. To this end, we propose a novel adaptive multi-biometric fusion strategy for the dynamic incorporation of gait and face biometric cues by leveraging keyless attention deep neural networks. Various external factors, such as viewpoint and distance to the camera, are investigated in this study. Extensive experiments show superior performance of the proposed model compared with the state-of-the-art model.	Translated: 2023-03-27 15:36:22 Published: 2023-03-24
# Generalist: Decoupling Natural and Robust Generalization ( http://arxiv.org/abs/2303.13813v1 ) License: see linked page	Hongjun Wang, Yisen Wang	Deep neural networks obtained by standard training have been constantly plagued by adversarial examples. Although adversarial training demonstrates its capability to defend against adversarial examples, it unfortunately leads to an inevitable drop in natural generalization. To address this issue, we decouple natural generalization and robust generalization from joint training and formulate a different training strategy for each. Specifically, instead of minimizing a global loss on the expectation over these two generalization errors, we propose a bi-expert framework called Generalist, in which we simultaneously train base learners with task-aware strategies so that they can specialize in their own fields. The parameters of the base learners are collected and combined to form a global learner at intervals during the training process. The global learner is then distributed to the base learners as initialized parameters for continued training. Theoretically, we prove that the risks of Generalist will get lower once the base learners are well trained. Extensive experiments verify the applicability of Generalist to achieve high accuracy on natural examples while maintaining considerable robustness to adversarial ones. Code is available at https://github.com/PKU-ML/Generalist.	Translated: 2023-03-27 15:36:13 Published: 2023-03-24
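The collect, combine, and redistribute cycle described in the Generalist abstract can be sketched as follows. The abstract does not state how the parameters are combined, so the convex combination with weight `gamma` is an assumption made purely for illustration, and `mix_parameters` and `redistribute` are hypothetical helper names, not functions from the authors' repository.

```python
def mix_parameters(natural_params, robust_params, gamma=0.5):
    """Form a global learner from two base learners.

    Assumption: the combination is a convex mix, gamma * natural +
    (1 - gamma) * robust, applied to each named parameter.
    """
    if natural_params.keys() != robust_params.keys():
        raise ValueError("base learners must share the same parameter names")
    return {
        name: gamma * natural_params[name] + (1.0 - gamma) * robust_params[name]
        for name in natural_params
    }


def redistribute(global_params, n_learners=2):
    """Give each base learner its own copy of the global parameters."""
    return [dict(global_params) for _ in range(n_learners)]
```

In a training loop, `mix_parameters` would be called every few epochs and the copies returned by `redistribute` would serve as the starting point for the next round of specialised training.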
# Evidence-aware multi-modal data fusion and its application to total knee replacement prediction ( http://arxiv.org/abs/2303.13810v1 ) License: see linked page	Xinwen Liu, Jing Wang, S. Kevin Zhou, Craig Engstrom, Shekhar S. Chandra	Deep neural networks have been widely studied for predicting medical conditions, such as the need for total knee replacement (TKR). It has been shown that data of different modalities, such as imaging data, clinical variables, and demographic information, provide complementary information and can therefore improve prediction accuracy when used together. However, the data sources of the various modalities may not always be of high quality, and each modality may carry only partial information about the medical condition. Thus, predictions from different modalities can contradict each other, and the final prediction may fail in the presence of such a conflict. It is therefore important to consider the reliability of each data source and of its prediction output when making a final decision. In this paper, we propose an evidence-aware multi-modal data fusion framework based on the Dempster-Shafer theory (DST). The backbone models contain an image branch, a non-image branch, and a fusion branch. For each branch, there is an evidence network that takes the extracted features as input and outputs an evidence score, which is designed to represent the reliability of the output from the current branch. The output probabilities, along with the evidence scores from multiple branches, are combined with Dempster's combination rule to make a final prediction. Experimental results on the public Osteoarthritis Initiative (OAI) dataset for the TKR prediction task show the superiority of the proposed fusion strategy on various backbone models.	Translated: 2023-03-27 15:35:52 Published: 2023-03-24
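As a concrete illustration of Dempster's combination rule mentioned in the abstract above, here is a minimal sketch for two sources over a two-outcome frame. The mass values and the branch names are made-up inputs for illustration only, and the sketch does not reproduce how the paper turns network outputs and evidence scores into mass functions.

```python
from itertools import product


def dempster_combine(m1, m2):
    """Combine two mass functions with Dempster's rule.

    Each mass function maps a frozenset of hypotheses to its mass.
    Mass assigned to disjoint pairs is treated as conflict K and the
    remaining mass is renormalised by 1 - K.
    """
    combined = {}
    conflict = 0.0
    for (set_a, mass_a), (set_b, mass_b) in product(m1.items(), m2.items()):
        intersection = set_a & set_b
        if intersection:
            combined[intersection] = combined.get(intersection, 0.0) + mass_a * mass_b
        else:
            conflict += mass_a * mass_b
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict; the rule is undefined")
    return {hyp: mass / (1.0 - conflict) for hyp, mass in combined.items()}


# Hypothetical example: two branches vote on whether TKR will be needed.
TKR = frozenset({"tkr"})
NO_TKR = frozenset({"no_tkr"})
EITHER = TKR | NO_TKR  # mass left on the whole frame expresses uncertainty

image_branch = {TKR: 0.6, NO_TKR: 0.1, EITHER: 0.3}
clinical_branch = {TKR: 0.5, NO_TKR: 0.2, EITHER: 0.3}
fused = dempster_combine(image_branch, clinical_branch)
```

A branch that is unsure can place more mass on `EITHER`, which reduces its influence on the fused result; this is the intuition behind weighting branches by reliability.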
# Error Analysis Prompting Enables Human-Like Translation Evaluation in Large Language Models: A Case Study on ChatGPT ( http://arxiv.org/abs/2303.13809v1 ) License: see linked page	Qingyu Lu, Baopu Qiu, Liang Ding, Liping Xie, Dacheng Tao	Generative large language models (LLMs), e.g., ChatGPT, have demonstrated remarkable proficiency across several NLP tasks such as machine translation, question answering, text summarization, and natural language understanding. Recent research has shown that utilizing ChatGPT for assessing the quality of machine translation (MT) achieves state-of-the-art performance at the system level but performs poorly at the segment level. To further improve the performance of LLMs on MT quality assessment, we conducted an investigation into several prompting methods. Our results indicate that by combining Chain-of-Thoughts and Error Analysis into a new prompting method called Error Analysis Prompting, LLMs like ChatGPT can generate human-like MT evaluations at both the system and segment level. Additionally, we discovered some limitations of ChatGPT as an MT evaluator, such as unstable scoring and biases when provided with multiple translations in a single query. Our findings aim to provide preliminary experience for appropriately evaluating translation quality with ChatGPT while offering a variety of tricks for designing prompts for in-context learning. We anticipate that this report will shed new light on advancing the field of translation evaluation with LLMs by enhancing both the accuracy and reliability of metrics. The project can be found at https://github.com/Coldmist-Lu/ErrorAnalysis_Prompt.	Translated: 2023-03-27 15:35:29 Published: 2023-03-24
# marl-jax: Multi-agent Reinforcement Learning framework for Social Generalization ( http://arxiv.org/abs/2303.13808v1 ) License: see linked page	Kinal Mehta, Anuj Mahajan, Pawan Kumar	Recent advances in Reinforcement Learning (RL) have led to many exciting applications. These advancements have been driven by improvements in both algorithms and engineering, which have resulted in faster training of RL agents. We present marl-jax, a multi-agent reinforcement learning software package for training and evaluating social generalization of the agents. The package is designed for training a population of agents in multi-agent environments and evaluating their ability to generalize to diverse background agents. It is built on top of DeepMind's JAX ecosystem \cite{deepmind2020jax} and leverages the RL ecosystem developed by DeepMind. Our framework marl-jax is capable of working in cooperative and competitive, simultaneous-acting environments with multiple agents. The package offers an intuitive and user-friendly command-line interface for training a population and evaluating its generalization capabilities. In conclusion, marl-jax provides a valuable resource for researchers interested in exploring social generalization in the context of MARL. The open-source code for marl-jax is available at: https://github.com/kinalmehta/marl-jax	Translated: 2023-03-27 15:34:58 Published: 2023-03-24
# PFT-SSR: Parallax Fusion Transformer for Stereo Image Super-Resolution ( http://arxiv.org/abs/2303.13807v1 ) License: see linked page	Hansheng Guo, Juncheng Li, Guangwei Gao, Zhi Li, Tieyong Zeng	Stereo image super-resolution aims to boost the performance of image super-resolution by exploiting the supplementary information provided by binocular systems. Although previous methods have achieved promising results, they did not fully utilize cross-view and intra-view information. To further unleash the potential of binocular images, in this letter, we propose a novel Transformer-based parallax fusion module called Parallax Fusion Transformer (PFT). PFT employs a Cross-view Fusion Transformer (CVFT) to utilize cross-view information and an Intra-view Refinement Transformer (IVRT) for intra-view feature refinement. Meanwhile, we adopt the Swin Transformer as the backbone for feature extraction and SR reconstruction to form a pure Transformer architecture called PFT-SSR. Extensive experiments and ablation studies show that PFT-SSR achieves competitive results and outperforms most SOTA methods. Source code is available at https://github.com/MIVRC/PFT-PyTorch.	Translated: 2023-03-27 15:34:42 Published: 2023-03-24
# Seeing Through the Glass: Neural 3D Reconstruction of Object Inside a Transparent Container ( http://arxiv.org/abs/2303.13805v1 ) License: see linked page	Jinguang Tong, Sundaram Muthu, Fahira Afzal Maken, Chuong Nguyen, Hongdong Li	In this paper, we define a new problem of recovering the 3D geometry of an object confined in a transparent enclosure. We also propose a novel method for solving this challenging problem. Transparent enclosures pose challenges of multiple light reflections and refractions at the interface between different propagation media, e.g., air and glass. These multiple reflections and refractions cause serious image distortions which invalidate the single viewpoint assumption. Hence the 3D geometry of such objects cannot be reliably reconstructed using existing methods, such as traditional structure from motion or modern neural reconstruction methods. We solve this problem by explicitly modeling the scene as two distinct sub-spaces, inside and outside the transparent enclosure. We use an existing neural reconstruction method (NeuS) that implicitly represents the geometry and appearance of the inner subspace. To account for complex light interactions, we develop a hybrid rendering strategy that combines volume rendering with ray tracing. We then recover the underlying geometry and appearance of the model by minimizing the difference between the real and hybrid-rendered images. We evaluate our method on both synthetic and real data. Experimental results show that our method outperforms the state-of-the-art (SOTA) methods. Code and data will be available at https://github.com/hirotong/ReNeuS	Translated: 2023-03-27 15:34:23 Published: 2023-03-24
# Anomaly Detection under Distribution Shift ( http://arxiv.org/abs/2303.13845v1 ) License: see linked page	Tri Cao, Jiawen Zhu, and Guansong Pang	Anomaly detection (AD) is a crucial machine learning task that aims to learn patterns from a set of normal training samples in order to identify abnormal samples in test data. Most existing AD studies assume that the training and test data are drawn from the same data distribution. In many real-world applications, however, the test data can exhibit large distribution shifts due to natural variations such as new lighting conditions, object poses, or background appearances, rendering existing AD methods ineffective in such cases. In this paper, we consider the problem of anomaly detection under distribution shift and establish performance benchmarks on three widely-used AD and out-of-distribution (OOD) generalization datasets. We demonstrate that simple adaptation of state-of-the-art OOD generalization methods to AD settings fails to work effectively due to the lack of labeled anomaly data. We further introduce a novel AD approach that is robust to diverse distribution shifts; it minimizes the distribution gap between in-distribution and OOD normal samples in both the training and inference stages in an unsupervised way. Our extensive empirical results on the three datasets show that our approach substantially outperforms state-of-the-art AD methods and OOD generalization methods on data with various distribution shifts, while maintaining the detection accuracy on in-distribution data.	Translated: 2023-03-27 15:28:20 Published: 2023-03-24
# CompoNeRF: Text-guided Multi-object Compositional NeRF with Editable 3D Scene Layout ( http://arxiv.org/abs/2303.13843v1 ) License: see linked page	Yiqi Lin, Haotian Bai, Sijia Li, Haonan Lu, Xiaodong Lin, Hui Xiong, Lin Wang	Recent research endeavors have shown that combining neural radiance fields (NeRFs) with pre-trained diffusion models holds great potential for text-to-3D generation. However, a hurdle is that they often encounter guidance collapse when rendering complex scenes from multi-object texts. This is because text-to-image diffusion models are inherently unconstrained, which makes them less able to accurately associate object semantics with specific 3D structures. To address this issue, we propose a novel framework, dubbed CompoNeRF, that explicitly incorporates an editable 3D scene layout to provide effective guidance at the single-object (i.e., local) and whole-scene (i.e., global) levels. First, we interpret the multi-object text as an editable 3D scene layout containing multiple local NeRFs, each associated with object-specific 3D box coordinates and a text prompt, which can be easily collected from users. Then, we introduce a global MLP to calibrate the compositional latent features from the local NeRFs, which surprisingly improves the view consistency across different local NeRFs. Lastly, we apply the text guidance at the global and local levels through their corresponding views to avoid guidance ambiguity. In this way, CompoNeRF allows for flexible scene editing and re-composition of trained local NeRFs into a new scene by manipulating the 3D layout or text prompt. Leveraging the open-source Stable Diffusion model, CompoNeRF can generate faithful and editable text-to-3D results while opening a potential direction for text-guided multi-object composition via the editable 3D scene layout.	Translated: 2023-03-27 15:27:55 Published: 2023-03-24
# FishDreamer: Towards Fisheye Semantic Completion via Unified Image Outpainting and Segmentation ( http://arxiv.org/abs/2303.13842v1 ) License: see linked page	Hao Shi, Yu Li, Kailun Yang, Jiaming Zhang, Kunyu Peng, Alina Roitberg, Yaozu Ye, Huajian Ni, Kaiwei Wang, Rainer Stiefelhagen	This paper introduces the new task of Fisheye Semantic Completion (FSC), where the dense texture, structure, and semantics of a fisheye image are inferred even beyond the sensor field-of-view (FoV). Fisheye cameras have a larger FoV than ordinary pinhole cameras, yet their unique imaging model naturally leads to a blind area at the edge of the image plane. This is suboptimal for safety-critical applications, since important perception tasks such as semantic segmentation become very challenging within the blind zone. Previous works considered out-FoV outpainting and in-FoV segmentation separately. However, we observe that these two tasks are actually closely coupled. To jointly estimate the tightly intertwined complete fisheye image and scene semantics, we introduce the new FishDreamer, which relies on successful ViTs enhanced with a novel Polar-aware Cross Attention module (PCA) to leverage dense context and guide semantically-consistent content generation while considering different polar distributions. In addition to the contribution of the novel task and architecture, we also derive the Cityscapes-BF and KITTI360-BF datasets to facilitate training and evaluation of this new track. Our experiments demonstrate that the proposed FishDreamer outperforms methods solving each task in isolation and surpasses alternative approaches on Fisheye Semantic Completion. Code and datasets will be available at https://github.com/MasterHow/FishDreamer.	Translated: 2023-03-27 15:27:25 Published: 2023-03-24
# HRDoc: Dataset and Baseline Method Toward Hierarchical Reconstruction of Document Structures ( http://arxiv.org/abs/2303.13839v1 ) License: see linked page	Jiefeng Ma, Jun Du, Pengfei Hu, Zhenrong Zhang, Jianshu Zhang, Huihui Zhu, Cong Liu	The problem of document structure reconstruction refers to converting digital or scanned documents into corresponding semantic structures. Most existing works mainly focus on splitting the boundary of each element in a single document page, neglecting the reconstruction of semantic structure in multi-page documents. This paper introduces hierarchical reconstruction of document structures as a novel task suitable for the NLP and CV fields. To better evaluate system performance on the new task, we built a large-scale dataset named HRDoc, which consists of 2,500 multi-page documents with nearly 2 million semantic units. Every document in HRDoc has line-level annotations, including categories and relations, obtained from rule-based extractors and human annotators. Moreover, we propose an encoder-decoder-based hierarchical document structure parsing system (DSPS) to tackle this problem. By adopting a multi-modal bidirectional encoder and a structure-aware GRU decoder with soft-mask operation, the DSPS model surpasses the baseline method by a large margin. All scripts and datasets will be made publicly available at https://github.com/jfma-USTC/HRDoc.	Translated: 2023-03-27 15:26:48 Published: 2023-03-24
# ドライバキャラクタの編集:対話型交通シミュレーションのための社会的制御可能な行動生成 Editing Driver Character: Socially-Controllable Behavior Generation for Interactive Traffic Simulation ( http://arxiv.org/abs/2303.13830v1 ) ライセンス: Link先を確認	Wei-Jer Chang, Chen Tang, Chenran Li, Yeping Hu, Masayoshi Tomizuka, and Wei Zhan	(参考訳) 交通シミュレーションは、自動運転計画システムの評価と改善において重要な役割を果たす。公道に配備された後、自動運転車は異なる社会的好み(利己的あるいは礼儀正しい人間ドライバーなど)を持つ人間の道路参加者と対話する必要がある。自動運転車が、異なる対話的な交通シナリオにおいて安全かつ効率的な操作を行うことを保証するため、シミュレーション環境で異なる社会的特性を持つ反応性エージェントに対して、自動運転車を評価することができる必要がある。本研究では,実世界の運転データから学習することで,現実的かつ人間ライクな軌道生成を保証しつつ,生成した軌道の便宜レベルを指定できる社会的制御可能な行動生成(SCBG)モデルを提案する。具体的には,実世界の運転データから学習した限界的および条件的行動予測モデルを活用して,運転行動の資質を定量化するための新規かつ微分可能な尺度を定式化する。提案手法により,実世界の運転データからトラジェクトリのオペレーショナルレベルを自動ラベルし,入力されたオペレーショナル値に基づいてSCBGモデル生成トラジェクトリを簡便に訓練することができる。 Waymo Open Motion Dataset (WOMD) 上でSCBGモデルを検討した結果,SCBGモデルを制御して,所望の礼儀正しい運転行動を生成することができた。興味深いことに、SCBGモデルでは、シナリオによって異なる動きパターンを識別できることが判明した。 Traffic simulation plays a crucial role in evaluating and improving autonomous driving planning systems. After being deployed on public roads, autonomous vehicles need to interact with human road participants with different social preferences (e.g., selfish or courteous human drivers). To ensure that autonomous vehicles take safe and efficient maneuvers in different interactive traffic scenarios, we should be able to evaluate autonomous vehicles against reactive agents with different social characteristics in the simulation environment. We propose a socially-controllable behavior generation (SCBG) model for this purpose, which allows the users to specify the level of courtesy of the generated trajectory while ensuring realistic and human-like trajectory generation through learning from real-world driving data. Specifically, we define a novel and differentiable measure to quantify the level of courtesy of driving behavior, leveraging marginal and conditional behavior prediction models trained from real-world driving data. 
The proposed courtesy measure allows us to auto-label the courtesy levels of trajectories from real-world driving data and conveniently train an SCBG model generating trajectories based on the input courtesy values. We examined the SCBG model on the Waymo Open Motion Dataset (WOMD) and showed that we were able to control the SCBG model to generate realistic driving behaviors with desired courtesy levels. Interestingly, we found that the SCBG model was able to identify different motion patterns of courteous behaviors according to the scenarios.	翻訳日:2023-03-27 15:26:31 公開日:2023-03-24
# 小型変圧器を用いた複合型ウェーハ欠陥パターン認識 Efficient Mixed-Type Wafer Defect Pattern Recognition Using Compact Deformable Convolutional Transformers ( http://arxiv.org/abs/2303.13827v1 ) ライセンス: Link先を確認	Nitish Shukla	(参考訳) ウェハーの製造は何千ものステップを伴う複雑な作業です。ウェハマップの欠陥パターン認識(DPR)は,問題の根本原因を見つけ,ウェハファウントリーの収量を改善するために重要である。混合型DPRは, 空間的特徴の変化, 欠陥の不確かさ, 存在する欠陥の数により, 単型DPRよりも複雑である。欠陥数と欠陥の種類を正確に予測するために, コンパクトな変形可能な畳み込み変圧器 (DC Transformer) を提案する。特に、DC Transformerは、学習可能な変形可能なカーネルとグローバル機能へのマルチヘッドアテンションによる、ウェハマップに存在するグローバル機能に焦点を当てている。提案手法は,ウェハマップと欠陥の関係を簡潔にモデル化する。 DC Transformerは38の欠陥パターンを含む実際のデータセットで評価される。実験結果から,DCトランスフォーマーは単型と混合型の両方の欠陥を認識するのに極めて優れた性能を示した。提案手法は現在の最先端モデルをかなりのマージンで上回る。 Manufacturing wafers is an intricate task involving thousands of steps. Defect Pattern Recognition (DPR) of wafer maps is crucial for finding the root cause of an issue and further improving the yield in the wafer foundry. Mixed-type DPR is much more complicated than single-type DPR due to varied spatial features, the uncertainty of defects, and the number of defects present. To accurately predict the number of defects as well as the types of defects, we propose a novel compact deformable convolutional transformer (DC Transformer). Specifically, DC Transformer focuses on the global features present in the wafer map by virtue of learnable deformable kernels and multi-head attention to the global features. The proposed method succinctly models the internal relationship between the wafer maps and the defects. DC Transformer is evaluated on a real dataset containing 38 defect patterns. Experimental results show that DC Transformer performs exceptionally well in recognizing both single and mixed-type defects. The proposed method outperforms current state-of-the-art models by a considerable margin.	翻訳日:2023-03-27 15:26:05 公開日:2023-03-24
# ゼロショット量子化におけるハードサンプルの重要事項 Hard Sample Matters a Lot in Zero-Shot Quantization ( http://arxiv.org/abs/2303.13826v1 ) ライセンス: Link先を確認	Huantong Li, Xiangmiao Wu, Fanbing Lv, Daihai Liao, Thomas H. Li, Yonggang Zhang, Bo Han, Mingkui Tan	(参考訳) ゼロショット量子化(ZSQ)は、完全精度モデルのトレーニング用データがアクセスできない場合に、ディープニューラルネットワークの圧縮と加速を約束する。 ZSQでは、合成サンプルを用いてネットワーク量子化を行うため、量子化モデルの性能は合成サンプルの品質に大きく依存する。しかし, 既存のZSQ法で構築した合成試料は, モデルにより容易に装着できることが判明した。したがって, これらの手法により得られた定量化モデルは, 硬質試料の顕著な性能劣化に悩まされる。この問題に対処するため,我々はHArdサンプル合成訓練(HAST)を提案する。具体的には、HASTはサンプルを合成する際に硬いサンプルに注意を払い、量子化されたモデルを訓練する際に合成サンプルが適合しにくくする。 HASTは、これらの2つのモデルによって抽出された特徴の類似性を保証するために、完全精度と量子化モデルによって抽出された特徴を整列する。大規模な実験により、HASTは既存のZSQ手法を著しく上回り、実データで定量化されるモデルに匹敵する性能を達成することが示された。 Zero-shot quantization (ZSQ) is promising for compressing and accelerating deep neural networks when the data for training full-precision models are inaccessible. In ZSQ, network quantization is performed using synthetic samples, thus, the performance of quantized models depends heavily on the quality of synthetic samples. Nonetheless, we find that the synthetic samples constructed in existing ZSQ methods can be easily fitted by models. Accordingly, quantized models obtained by these methods suffer from significant performance degradation on hard samples. To address this issue, we propose HArd sample Synthesizing and Training (HAST). Specifically, HAST pays more attention to hard samples when synthesizing samples and makes synthetic samples hard to fit when training quantized models. HAST aligns features extracted by full-precision and quantized models to ensure the similarity between features extracted by these two models. Extensive experiments show that HAST significantly outperforms existing ZSQ methods, achieving performance comparable to models that are quantized with real data.	翻訳日:2023-03-27 15:25:47 公開日:2023-03-24
# HandNeRF: Animatable Interacting Handsのための神経放射場 HandNeRF: Neural Radiance Fields for Animatable Interacting Hands ( http://arxiv.org/abs/2303.13825v1 ) ライセンス: Link先を確認	Zhiyang Guo, Wengang Zhou, Min Wang, Li Li, Houqiang Li	(参考訳) そこで本研究では,ニューラル・ラジアンス・フィールド(neural radiance fields, nerf)を用いて手の動きを再現し,任意の視点からのジェスチャ・アニメーションのためのフォトリアリスティックな画像や映像のレンダリングを可能にする新しい枠組みを提案する。シングルハンドのマルチビュー画像やインタラクションハンドが与えられた場合、オフザシェルフスケルトン推定器がまず手ポーズのパラメータ化に使用される。そこで我々は,ポーズ駆動変形場を設計し,それらの異なるポーズから共通な標準空間への対応性を確立する。このような統一されたモデリングは、両手でほとんど観測されない領域における幾何学とテクスチャの手がかりを効率的に補完する。また, 咬合認識密度学習のためのガイダンスとして, ポーズプリエントを活用し, 擬似深度マップを生成する。さらに,色最適化のためのクロスドメインアライメントを実現するために,神経特徴蒸留法を提案する。我々は,提案したHandNeRFのメリットを検証するための広範な実験を行い,大規模なInterHand2.6Mデータセット上で定性的かつ定量的に,一連の最先端の結果を報告する。 We propose a novel framework to reconstruct accurate appearance and geometry with neural radiance fields (NeRF) for interacting hands, enabling the rendering of photo-realistic images and videos for gesture animation from arbitrary views. Given multi-view images of a single hand or interacting hands, an off-the-shelf skeleton estimator is first employed to parameterize the hand poses. Then we design a pose-driven deformation field to establish correspondence from those different poses to a shared canonical space, where a pose-disentangled NeRF for one hand is optimized. Such unified modeling efficiently complements the geometry and texture cues in rarely-observed areas for both hands. Meanwhile, we further leverage the pose priors to generate pseudo depth maps as guidance for occlusion-aware density learning. Moreover, a neural feature distillation method is proposed to achieve cross-domain alignment for color optimization. We conduct extensive experiments to verify the merits of our proposed HandNeRF and report a series of state-of-the-art results both qualitatively and quantitatively on the large-scale InterHand2.6M dataset.	翻訳日:2023-03-27 15:25:29 公開日:2023-03-24
# $k$NN Prompting: キャリブレーションのない近接的推論によるコンテキスト学習 $k$NN Prompting: Beyond-Context Learning with Calibration-Free Nearest Neighbor Inference ( http://arxiv.org/abs/2303.13824v1 ) ライセンス: Link先を確認	Benfeng Xu, Quan Wang, Zhendong Mao, Yajuan Lyu, Qiaoqiao She, Yongdong Zhang	(参考訳) インコンテキスト・ラーニング (ICL) は、インコンテキスト・デモの即時完了条件として目標タスクを定式化し、LLMの利用が主流となっている。本稿では,コンテキスト長制限のためトレーニングデータではスケールアップできないという,この典型的な使用方法の前提を最初に明らかにする。また、既存の研究によれば、iclも様々なバイアスを負い、微妙な校正処理を必要とすることが示されている。両課題に対処するために,まず LLM を分散表現のトレーニングデータでクエリし,次に近くの隣人を参照してテストインスタンスを予測する,シンプルで効果的なソリューションである $k$NN Prompting を提唱する。我々は、その2倍の優位性を示す包括的な実験を行う。 1) Calibration-Free: $k$NN Promptingは、LSM出力分布とタスク固有のラベル空間を直接整列するのではなく、テストとトレーニングインスタンスを整列するためにそのような分布を利用する。数ショットのシナリオでは、最先端のキャリブレーションベースの手法よりも大幅に優れています。 2)Beyond-Context:$k$NN Promptingは、可能な限り多くのトレーニングデータを使って、より効果的にスケールアップでき、継続的な改善をもたらす。スケーリングの傾向は、2ショットから1024ショットまでの10桁、および0.8Bから30Bまでの様々なLLMスケールにまたがる。データスケーリングをモデルスケーリングにブリッジし、LLMデプロイメントの勾配のないパラダイムに新たな可能性をもたらす。コードは公開されている。 In-Context Learning (ICL), which formulates target tasks as prompt completion conditioned on in-context demonstrations, has become the prevailing utilization of LLMs. In this paper, we first disclose an actual predicament for this typical usage that it can not scale up with training data due to context length restriction. Besides, existing works have shown that ICL also suffers from various biases and requires delicate calibration treatment. To address both challenges, we advocate a simple and effective solution, $k$NN Prompting, which first queries LLM with training data for distributed representations, then predicts test instances by simply referring to nearest neighbors. We conduct comprehensive experiments to demonstrate its two-fold superiority: 1) Calibration-Free: $k$NN Prompting does not directly align LLM output distribution with task-specific label space, instead leverages such distribution to align test and training instances. 
It significantly outperforms state-of-the-art calibration-based methods under comparable few-shot scenarios. 2) Beyond-Context: $k$NN Prompting can further scale up effectively with as much training data as is available, continually bringing substantial improvements. The scaling trend holds across 10 orders of magnitude ranging from 2 shots to 1024 shots as well as different LLM scales ranging from 0.8B to 30B. It successfully bridges data scaling into model scaling, and brings new potential for the gradient-free paradigm of LLM deployment. Code is publicly available.	翻訳日:2023-03-27 15:25:09 公開日:2023-03-24
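The nearest-neighbor step described in the $k$NN Prompting abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the probability vectors are made-up stand-ins for LLM output distributions, the names `kl_divergence` and `knn_predict` are hypothetical, and KL divergence is used as one plausible distance; the abstract itself does not specify these details.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def knn_predict(test_dist, anchors, k=3):
    """Majority vote over the k anchors closest to test_dist.

    anchors is a list of (distribution, label) pairs; in the real method the
    distributions would come from querying an LLM with training examples.
    """
    ranked = sorted(anchors, key=lambda a: kl_divergence(test_dist, a[0]))
    votes = {}
    for _, label in ranked[:k]:
        votes[label] = votes.get(label, 0) + 1
    # Ties fall back to the label seen first, i.e. the nearest neighbor's.
    return max(votes, key=lambda label: votes[label])


# Hypothetical anchor set: toy distributions over a 3-token vocabulary.
anchors = [
    ([0.70, 0.20, 0.10], "pos"),
    ([0.60, 0.30, 0.10], "pos"),
    ([0.10, 0.20, 0.70], "neg"),
    ([0.20, 0.20, 0.60], "neg"),
]

prediction = knn_predict([0.65, 0.25, 0.10], anchors, k=3)
```

Because prediction only compares test and training distributions with each other, no mapping from the LLM vocabulary to task labels is needed, which is the sense in which the approach is described as calibration-free.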
# マグノン量子系におけるマグノン遮断 Magnon blockade in magnon-qubit systems ( http://arxiv.org/abs/2303.13823v1 ) ライセンス: Link先を確認	Zhu-yao Jin and Jun Jing	(参考訳) マグノンと超伝導トランスモン量子ビットの直接相互作用によって構築されるハイブリッドシステムを考える。マグノンを弱く駆動し、クォービットを探索することで、単一のマグノンのレベルでの量子操作への道を開く高次マグノン遮断を実現することができる。 magnon-blockadeの提案は、magnon-qubit のトランスバーサル結合強度が、qubit と probing field のデチューニング、あるいは magnon と driving field のデチューニングと等価である場合に最適化される。この条件下では、同時間二階相関関数 $g^{(2)}(0)$ の解析式は、検出強度が駆動強度の約3倍であるときに最小化することができる。さらに、適切な駆動強度とシステム崩壊速度を選択することにより、マグノン遮断の度合いをさらに高めることができる。実験と関係のあるパラメータでは、相関関数は、キャビティ光力学系における光子遮断に対して約2桁低い$g^{(2)}(0)\sim10^{-7}$となる。また、マグノンとクビットの熱雑音による$g^{(2)}(0)$の効果と、2つの成分間の超長手相互作用についても論じる。封鎖に最適化された条件は、いまだに両方の非理想的な状況で保たれている。 We consider a hybrid system that is established by the direct interaction between a magnon and a superconducting transmon qubit. Through weakly driving the magnon and probing the qubit, a high-degree magnon blockade can be realized, which paves an avenue toward quantum manipulation at the level of a single magnon. Our magnon-blockade proposal is optimized when the magnon-qubit transversal coupling strength is equivalent to the detuning of the qubit and the probing field or that of the magnon and the driving field. Under this condition, the analytical expression of the equal-time second-order correlation function $g^{(2)}(0)$ can be minimized when the probing intensity is about three times the driving intensity. Moreover, the degree of the magnon blockade could be further enhanced by choosing proper driving intensity and system decay rate. With experimentally relevant parameters, the correlation function attains $g^{(2)}(0)\sim10^{-7}$, about two orders lower than that for the photon blockade in cavity optomechanical systems. We also discuss the effects on $g^{(2)}(0)$ from the thermal noise on magnon and qubit and the extra longitudinal interaction between the two components. Our optimized conditions for blockade still hold in both nonideal situations.	翻訳日:2023-03-27 15:24:43 公開日:2023-03-24
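For readers unfamiliar with the quantity $g^{(2)}(0)$ quoted above, the following sketch evaluates the standard single-mode definition $g^{(2)}(0)=\langle n(n-1)\rangle/\langle n\rangle^{2}$ from a number-state distribution. The function name `g2_zero` and both example distributions are assumptions chosen for illustration; they are not taken from the paper.

```python
import math


def g2_zero(populations):
    """Equal-time second-order correlation from number-state populations.

    populations[n] is the probability of finding n excitations.
    """
    mean_n = sum(n * p for n, p in enumerate(populations))
    pairs = sum(n * (n - 1) * p for n, p in enumerate(populations))
    return pairs / mean_n ** 2


# Poissonian (coherent-state) statistics give g2(0) = 1.
mean = 0.1
coherent = [math.exp(-mean) * mean ** n / math.factorial(n) for n in range(30)]

# Hypothetical strongly blockaded state: two-excitation population is tiny.
p1, p2 = 0.01, 1e-11
blockaded = [1.0 - p1 - p2, p1, p2]

g2_coherent = g2_zero(coherent)
g2_blockaded = g2_zero(blockaded)
```

A value far below 1, as in the second example, means that finding two excitations at once is strongly suppressed relative to uncorrelated statistics, which is what "blockade" refers to.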
# クロスアテンショントランスを用いた医用画像セグメンテーション Few Shot Medical Image Segmentation with Cross Attention Transformer ( http://arxiv.org/abs/2303.13867v1 ) ライセンス: Link先を確認	Yi Lin, Yufan Chen, Kwang-Ting Cheng, Hao Chen	(参考訳) 近年,医用画像分割が大きな進歩を遂げている。ディープラーニングベースのメソッドは、手動アノテーションで大量のデータを必要とするデータ格納技術として認識される。しかし、手動アノテーションは、ドメイン固有の専門知識を必要とする医療画像解析の分野では高価である。この課題に対処するために、少数のショットラーニングでは、少数の例から新しいクラスを学ぶことができる。本研究では,クロスマスク型アテンショントランスフォーマーをベースとした,数発の医用画像セグメンテーションのための新しいフレームワークCAT-Netを提案する。提案するネットワークは,支援画像と問合せ画像との相関関係をマイニングし,有用なフォアグラウンド情報のみに限定し,サポートプロトタイプと問合せ機能の両方の表現能力を高める。さらに,クエリイメージのセグメンテーションを反復的に洗練する反復的精錬フレームワークを設計し,サポート機能を促進する。提案手法を,Abd-CT,Abd-MRI,Card-MRIの3つの公開データセットで検証した。実験の結果,最先端手法と比較して優れた性能を示し,各成分の有効性を示した。受け入れ次第、私たちのメソッドのソースコードをリリースします。 Medical image segmentation has made significant progress in recent years. Deep learning-based methods are recognized as data-hungry techniques, requiring large amounts of data with manual annotations. However, manual annotation is expensive in the field of medical image analysis, which requires domain-specific expertise. To address this challenge, few-shot learning has the potential to learn new classes from only a few examples. In this work, we propose a novel framework for few-shot medical image segmentation, termed CAT-Net, based on cross masked attention Transformer. Our proposed network mines the correlations between the support image and query image, limiting them to focus only on useful foreground information and boosting the representation capacity of both the support prototype and query features. We further design an iterative refinement framework that refines the query image segmentation iteratively and promotes the support feature in turn. We validated the proposed method on three public datasets: Abd-CT, Abd-MRI, and Card-MRI. Experimental results demonstrate the superior performance of our method compared to state-of-the-art methods and the effectiveness of each component. We will release the source code of our method upon acceptance.	翻訳日:2023-03-27 15:18:45 公開日:2023-03-24
# ヘルツレート大都市圏量子テレポーテーション Hertz-rate metropolitan quantum teleportation ( http://arxiv.org/abs/2303.13866v1 ) ライセンス: Link先を確認	Si Shen, Chenzhi Yuan, Zichang Zhang, Hao Yu, Ruiming Zhang, Chuanrong Yang, Hao Li, Zhen Wang, You Wang, Guangwei Deng, Haizhi Song, Lixing You, Yunru Fan, Guangcan Guo, Qiang Zhou	(参考訳) 量子テレポーテーションは、未知の量子状態を遠くの量子ノード間で転送することができる。量子テレポーテーションの完全なポテンシャルを推し進めるためには、量子状態は長距離の高速で忠実に転送されなければならない。最近の目覚ましい進歩にもかかわらず、大都市圏のファイバネットワークにまたがる高速量子テレポーテーションシステムは極めて望まれている。ここでは、64kmのファイバーチャネル上で、7.1$\pm$0.4Hzの速度で独立光子によって運ばれる量子状態を転送する量子テレポーテーションシステムを示す。平均的な単光子忠実度は 90.6$\pm$2.6% であり、古典体制では最大忠実度 2/3 を超える。我々の結果は量子ネットワークにとって重要なマイルストーンであり、将来の量子インターネットへの量子絡み合いベースの情報応用への扉を開く。 Quantum teleportation can transfer an unknown quantum state between distant quantum nodes, which holds great promise in enabling large-scale quantum networks. To advance the full potential of quantum teleportation, quantum states must be faithfully transferred at a high rate over long distance. Despite recent impressive advances, a high-rate quantum teleportation system across metropolitan fiber networks is extremely desired. Here, we demonstrate a quantum teleportation system which transfers quantum states carried by independent photons at a rate of 7.1$\pm$0.4 Hz over 64-km-long fiber channel. An average single-photon fidelity of $\geqslant$ 90.6$\pm$2.6% is achieved, which exceeds the maximum fidelity of 2/3 in classical regime. Our result marks an important milestone towards quantum networks and opens the door to exploring quantum entanglement based informatic applications for the future quantum internet.	翻訳日:2023-03-27 15:18:25 公開日:2023-03-24
# MagicEye:視覚障害者の自立生活を目指す知的ウェアラブル MagicEye: An Intelligent Wearable Towards Independent Living of Visually Impaired ( http://arxiv.org/abs/2303.13863v1 ) ライセンス: Link先を確認	Sibi C. Sethuraman, Gaurav R. Tadkapally, Saraju P. Mohanty, Gautam Galada and Anitha Subramanian	(参考訳) 視覚障害を持つ個人は、日常生活で多くの困難に直面している。視覚障害は、人の働き、ナビゲート、独立性を維持する能力を著しく損なうことがある。これは教育の限界、事故のリスクの増大、その他多くの問題を引き起こす可能性がある。そこで我々は,視覚障害者を支援する最先端のウェアラブルデバイスであるmagiceyeを提案する。 MagicEyeはカスタムトレーニングされたCNNベースのオブジェクト検出モデルを採用しており、日常生活で頻繁に遭遇する広範囲の屋内および屋外オブジェクトを認識することができる。合計35のクラスで、magiceyeが使用するニューラルネットワークは、オブジェクト検出において高いレベルの効率と精度を達成するために特別に設計されている。また、顔認識と通貨識別モジュールを備えており、視覚障害者に貴重な支援を提供する。さらにmagiceyeはナビゲーション用のgpsセンサーも備えており、ユーザーは簡単に動き回れるほか、物理的に接触せずに近くの物体を検知できる近接センサーも備えている。 MagicEyeは革新的で高度なウェアラブルデバイスで、視覚障害者が直面する多くの課題に対処するために設計されている。最先端の物体検出とナビゲーション機能を備えており、視覚障害者のニーズに合わせて調整されており、視覚障害者を支援する最も有望なソリューションの1つである。 Individuals with visual impairments often face a multitude of challenging obstacles in their daily lives. Vision impairment can severely impair a person's ability to work, navigate, and retain independence. This can result in educational limits, a higher risk of accidents, and a plethora of other issues. To address these challenges, we present MagicEye, a state-of-the-art intelligent wearable device designed to assist visually impaired individuals. MagicEye employs a custom-trained CNN-based object detection model, capable of recognizing a wide range of indoor and outdoor objects frequently encountered in daily life. With a total of 35 classes, the neural network employed by MagicEye has been specifically designed to achieve high levels of efficiency and precision in object detection. The device is also equipped with facial recognition and currency identification modules, providing invaluable assistance to the visually impaired. In addition, MagicEye features a GPS sensor for navigation, allowing users to move about with ease, as well as a proximity sensor for detecting nearby objects without physical contact. 
In summary, MagicEye is an innovative and highly advanced wearable device that has been designed to address the many challenges faced by individuals with visual impairments. It is equipped with state-of-the-art object detection and navigation capabilities that are tailored to the needs of the visually impaired, making it one of the most promising solutions to assist those who are struggling with visual impairments.	翻訳日:2023-03-27 15:18:08 公開日:2023-03-24
# クラスインクリメンタル学習のための2段階グラフネットワーク Two-level Graph Network for Few-Shot Class-Incremental Learning ( http://arxiv.org/abs/2303.13862v1 ) ライセンス: Link先を確認	Hao Chen, Linyan Li, Fan Lyu, Fuyuan Hu, Zhenping Xia and Fenglei Xu	(参考訳) FSCIL(Few-shot class-incremental Learning)は、古いクラスの知識を忘れずに、いくつかのデータポイントから新しい概念を継続的に学習できる機械学習アルゴリズムを設計することを目的としている。難点は、新しいクラスからの限られたデータが、重大な過度な問題を引き起こすだけでなく、破滅的な忘れの問題も悪化させることにある。しかし、既存のFSCILメソッドはサンプルレベルとクラスレベルの意味関係を無視している。この論文では,サンプルレベルとクラスレベルのグラフニューラルネットワーク(SCGN, Sample-level and Class-level Graph Neural Network)という,FSCIL用の2レベルグラフネットワークを設計した。具体的には、SCGNモデルパラメータを事前に最適化するための新しいタスクとして、仮想小ショットタスクを合成する擬似漸進学習パラダイムをSCGNで設計する。サンプルレベルのグラフネットワークは、いくつかのサンプルの関係を利用して類似のサンプルを集約し、洗練されたクラスレベルの特徴を得る。クラスレベルのグラフネットワークは、新しいクラスのプロトタイプ機能と古いクラスのセマンティックコンフリクトを軽減することを目的としている。 SCGNは2レベルグラフネットワークを構築し、各数ショットクラスの潜在意味をFSCILで効果的に表現できるようにする。 3つの人気のあるベンチマークデータセットの実験により、我々の手法はベースラインを著しく上回り、新しい最先端の成果を顕著な優位性で設定することを示した。 Few-shot class-incremental learning (FSCIL) aims to design machine learning algorithms that can continually learn new concepts from a few data points, without forgetting knowledge of old classes. The difficulty lies in that limited data from new classes not only leads to significant overfitting issues but also exacerbates the notorious catastrophic forgetting problem. However, existing FSCIL methods ignore the semantic relationships at the sample level and the class level. In this paper, we design a two-level graph network for FSCIL named Sample-level and Class-level Graph Neural Network (SCGN). Specifically, a pseudo incremental learning paradigm is designed in SCGN, which synthesizes virtual few-shot tasks as new tasks to optimize SCGN model parameters in advance. The sample-level graph network uses the relationship of a few samples to aggregate similar samples and obtains refined class-level features. 
The class-level graph network aims to mitigate the semantic conflict between prototype features of new classes and old classes. SCGN builds two-level graph networks to guarantee that the latent semantics of each few-shot class can be effectively represented in FSCIL. Experiments on three popular benchmark datasets show that our method significantly outperforms the baselines and sets new state-of-the-art results with remarkable advantages.	翻訳日:2023-03-27 15:17:48 公開日:2023-03-24
# 救世主か破壊者か? Unleasing ChatGPT on the Metaverse: Savior or Destroyer? ( http://arxiv.org/abs/2303.13856v1 ) ライセンス: Link先を確認	Pengyuan Zhou	(参考訳) 人工知能(AI)技術の組み込み、特に自然言語処理(NLP)は、没入的で対話的なメタバース体験の開発にますます不可欠になりつつある。メタバースで注目を集めている人工知能ツールのひとつに、OpenAIがトレーニングした大規模な言語モデルであるChatGPTがある。この記事は、メタバースベースの教育、エンターテイメント、パーソナライゼーション、サポートにChatGPTを活用することの長所と短所を掘り下げている。この技術では動的でパーソナライズされた体験が可能だが、正当なプライバシー、バイアス、倫理的な問題もある。本稿は,ChatGPTがメタバースに与える影響と,これらの機会と障害を評価することで,より没入的で魅力的な仮想環境を効果的に構築する方法について,読者の理解を支援することを目的とする。 The incorporation of artificial intelligence (AI) technology, and in particular natural language processing (NLP), is becoming increasingly vital for the development of immersive and interactive metaverse experiences. One such artificial intelligence tool that is gaining traction in the metaverse is ChatGPT, a large language model trained by OpenAI. The article delves into the pros and cons of utilizing ChatGPT for metaverse-based education, entertainment, personalization, and support. Dynamic and personalized experiences are possible with this technology, but there are also legitimate privacy, bias, and ethical issues to consider. This article aims to help readers understand the possible influence of ChatGPT on the metaverse and how it may be used to effectively create a more immersive and engaging virtual environment by evaluating these opportunities and obstacles.	翻訳日:2023-03-27 15:17:23 公開日:2023-03-24
# 変形性モデル駆動型ニューラルレンダリングによる低視野環境下での頭部の高忠実度3次元再構成 Deformable Model Driven Neural Rendering for High-fidelity 3D Reconstruction of Human Heads Under Low-View Settings ( http://arxiv.org/abs/2303.13855v1 ) ライセンス: Link先を確認	Baixin Xu, Jiarui Zhang, Kwan-Yee Lin, Chen Qian and Ying He	(参考訳) 低視点入力から高忠実度幾何で3次元頭部を再構成する神経暗黙関数の頑健な学習法を提案する。我々は3次元人間の頭部を、スムーズなテンプレート、非剛性変形、高周波変位場からなる符号付き距離場のゼロレベルセットとして表現する。テンプレートは、変形ネットワークとともに複数の個人でトレーニングされるアイデンティティ非依存および表現ニュートラルの特徴を表す。変位場は、個人ごとに訓練されたアイデンティティ依存の幾何学的詳細を符号化する。我々は3Dの監督なしに粗大な戦略を用いてネットワークを2段階に訓練する。実験により, 幾何分解と2段階の訓練により, 提案手法は頑健であり, 低視点環境下での再現精度と新規ビュー合成の点で, 既存手法よりも優れることが示された。さらに、事前学習されたテンプレートは、私たちのモデルが目に見えない個人に適応するための適切な初期化に役立ちます。 We propose a robust method for learning neural implicit functions that can reconstruct 3D human heads with high-fidelity geometry from low-view inputs. We represent 3D human heads as the zero level-set of a composed signed distance field that consists of a smooth template, a non-rigid deformation, and a high-frequency displacement field. The template represents identity-independent and expression-neutral features, which is trained on multiple individuals, along with the deformation network. The displacement field encodes identity-dependent geometric details, trained for each specific individual. We train our network in two stages using a coarse-to-fine strategy without 3D supervision. Our experiments demonstrate that the geometry decomposition and two-stage training make our method robust and our model outperforms existing methods in terms of reconstruction accuracy and novel view synthesis under low-view settings. Additionally, the pre-trained template serves a good initialization for our model to adapt to unseen individuals.	翻訳日:2023-03-27 15:17:08 公開日:2023-03-24
# 2pcnet:昼夜無教師ドメイン適応オブジェクト検出のための2相一貫性トレーニング 2PCNet: Two-Phase Consistency Training for Day-to-Night Unsupervised Domain Adaptive Object Detection ( http://arxiv.org/abs/2303.13853v1 ) ライセンス: Link先を確認	Mikhail Kennerley, Jian-Gang Wang, Bharadwaj Veeravalli, Robby T. Tan	(参考訳) 夜のオブジェクト検出は、夜の画像アノテーションがないため、難しい問題である。いくつかのドメイン適応手法にもかかわらず、高精度な結果を達成することは依然として問題である。偽陽性の誤り伝播は、確立された学生-教師フレームワーク、特に小規模で低照度なオブジェクトを用いた方法でもまだ観察されている。本稿では,これらの問題に対処するため,二相無教師付きドメイン適応ネットワークである2PCNetを提案する。このネットワークは、教師から第1フェーズにおける高信頼境界予測を採用し、第2フェーズにおける教師の再評価を学生の地域提案に付加することで、高信頼と低信頼の擬似ラベルの組み合わせをもたらす。夜間画像と擬似ラベルは、学生への入力として使用される前にスケールダウンされ、より強力な小型の擬似ラベルを提供する。画像中の低照度領域や他の夜間関連属性から発生するエラーに対処するため,NightAugと呼ばれる夜間特化パイプラインを提案する。このパイプラインは、日中の画像にグラア、ぼかし、ノイズなどのランダムな拡張を適用します。公開データセットを用いた実験により,本手法は20\%の精度で最先端の手法に優れた結果が得られること,および対象データに基づいて直接トレーニングされたモデルを監督できることが証明された。 Object detection at night is a challenging problem due to the absence of night image annotations. Despite several domain adaptation methods, achieving high-precision results remains an issue. False-positive error propagation is still observed in methods using the well-established student-teacher framework, particularly for small-scale and low-light objects. This paper proposes a two-phase consistency unsupervised domain adaptation network, 2PCNet, to address these issues. The network employs high-confidence bounding-box predictions from the teacher in the first phase and appends them to the student's region proposals for the teacher to re-evaluate in the second phase, resulting in a combination of high and low confidence pseudo-labels. The night images and pseudo-labels are scaled-down before being used as input to the student, providing stronger small-scale pseudo-labels. To address errors that arise from low-light regions and other night-related attributes in images, we propose a night-specific augmentation pipeline called NightAug. This pipeline involves applying random augmentations, such as glare, blur, and noise, to daytime images. 
Experiments on publicly available datasets demonstrate that our method achieves superior results to state-of-the-art methods by 20%, and to supervised models trained directly on the target data.	翻訳日:2023-03-27 15:16:52 公開日:2023-03-24
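The 2PCNet abstract describes NightAug as applying random glare, blur, and noise to daytime images. The sketch below shows what such a pipeline could look like on a small grayscale image stored as nested lists with values in [0, 1]. It is only a sketch under stated assumptions: the function names, parameter ranges, and application probabilities are hypothetical and are not taken from the paper or its code.

```python
import random


def add_glare(img, cx, cy, radius, strength):
    """Brighten pixels near (cx, cy), fading linearly to zero at radius."""
    out = []
    for y, row in enumerate(img):
        new_row = []
        for x, px in enumerate(row):
            dist = ((x - cx) ** 2 + (y - cy) ** 2) ** 0.5
            boost = strength * max(0.0, 1.0 - dist / radius)
            new_row.append(min(1.0, px + boost))
        out.append(new_row)
    return out


def box_blur(img):
    """3x3 mean filter; border pixels average over the available neighbors."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [img[j][i]
                      for j in range(max(0, y - 1), min(h, y + 2))
                      for i in range(max(0, x - 1), min(w, x + 2))]
            row.append(sum(window) / len(window))
        out.append(row)
    return out


def add_noise(img, sigma, rng):
    """Add Gaussian noise and clip the result back into [0, 1]."""
    return [[min(1.0, max(0.0, px + rng.gauss(0.0, sigma))) for px in row]
            for row in img]


def night_style_augment(img, rng, p=0.5):
    """Apply each augmentation independently with probability p (assumed)."""
    h, w = len(img), len(img[0])
    out = img
    if rng.random() < p:
        out = add_glare(out, rng.randrange(w), rng.randrange(h),
                        rng.uniform(1.0, max(h, w)), rng.uniform(0.2, 0.6))
    if rng.random() < p:
        out = box_blur(out)
    if rng.random() < p:
        out = add_noise(out, 0.05, rng)
    return out


day_image = [[0.5] * 8 for _ in range(8)]
augmented = night_style_augment(day_image, random.Random(0))
```

Passing an explicit `random.Random` instance keeps the augmentation reproducible, which is convenient when comparing training runs.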
# 弱教師付きシングルビュー画像のリライト Weakly-supervised Single-view Image Relighting ( http://arxiv.org/abs/2303.13852v1 ) ライセンス: Link先を確認	Renjiao Yi, Chenyang Zhu, Kai Xu	(参考訳) 本稿では,ランベルトおよび低周波スペクトルの単一像をリライトする学習に基づくアプローチを提案する。本手法では,写真からオブジェクトを新しいシーンに挿入し,ar応用に不可欠な新しい環境照明下でリライトすることができる。オブジェクトをリライトするため、逆レンダリングと再レンダリングの両方を解決します。この逆レンダリングを解消するために,低ランク制約による弱教師付き手法を提案する。弱教師付きトレーニングを容易にするために,照度の変化を伴ってビデオの大規模(750Kイメージ)データセットであるRelitをコントリビュートする。再レンダリングのために、球面調和の様々な照明下で低周波非ランベルト材料をレンダリングする微分可能な特異なレンダリング層を提案する。パイプライン全体がエンドツーエンドで効率的で、ARオブジェクト挿入のモバイルアプリ実装を可能にする。大規模評価は,本手法が最先端性能を実現することを示す。プロジェクトページ: https://renjiaoyi.github.io/relighting/ We present a learning-based approach to relight a single image of Lambertian and low-frequency specular objects. Our method enables inserting objects from photographs into new scenes and relighting them under the new environment lighting, which is essential for AR applications. To relight the object, we solve both inverse rendering and re-rendering. To resolve the ill-posed inverse rendering, we propose a weakly-supervised method by a low-rank constraint. To facilitate the weakly-supervised training, we contribute Relit, a large-scale (750K images) dataset of videos with aligned objects under changing illuminations. For re-rendering, we propose a differentiable specular rendering layer to render low-frequency non-Lambertian materials under various illuminations of spherical harmonics. The whole pipeline is end-to-end and efficient, allowing for a mobile app implementation of AR object insertion. Extensive evaluations demonstrate that our method achieves state-of-the-art performance. Project page: https://renjiaoyi.github.io/relighting/.	翻訳日:2023-03-27 15:16:29 公開日:2023-03-24
# ニューラルネットワークにおける因果関係の学習--直接的な効果を超えて Learning Causal Attributions in Neural Networks: Beyond Direct Effects ( http://arxiv.org/abs/2303.13850v1 ) ライセンス: Link先を確認	Abbaavaram Gowtham Reddy, Saketh Bachu, Harsharaj Pathak, Benin L Godfrey, Vineeth N. Balasubramanian, Varshaneya V, Satya Narayanan Kar	(参考訳) 近年,ニューラルネットワーク(NN)モデルにおける因果関係の捕捉と維持に対する関心が高まっている。本研究では,NNモデルの入出力属性を推定・維持するための因果的アプローチについて検討する。特に、この方向の既存の取り組みは(NNアーキテクチャにより)入力変数間の独立性を前提としており、直接的な因果効果のみを研究する。 NNを構造因果モデル(Structuor causal model, SCM)と見なして、直接効果を超えて入力特徴間のエッジを導入し、NNモデルをトレーニングしながら直接的かつ間接的因果効果を捕捉し、維持するためのシンプルで効果的な方法論を提供する。また,高次元データにおける因果帰属を定量化する効果的な近似戦略を提案する。合成および実世界のデータセットに関する幅広い実験により、提案手法は、基底真理効果に近い直接的および間接的因果効果の因果属性を学習することを示す。 There has been a growing interest in capturing and maintaining causal relationships in Neural Network (NN) models in recent years. We study causal approaches to estimate and maintain input-output attributions in NN models in this work. In particular, existing efforts in this direction assume independence among input variables (by virtue of the NN architecture), and hence study only direct causal effects. Viewing an NN as a structural causal model (SCM), we instead focus on going beyond direct effects, introduce edges among input features, and provide a simple yet effective methodology to capture and maintain direct and indirect causal effects while training an NN model. We also propose effective approximation strategies to quantify causal attributions in high dimensional data. Our wide range of experiments on synthetic and real-world datasets show that the proposed ante-hoc method learns causal attributions for both direct and indirect causal effects close to the ground truth effects.	翻訳日:2023-03-27 15:16:11 公開日:2023-03-24
# 対人ロバスト性の特徴分離と再検討 Feature Separation and Recalibration for Adversarial Robustness ( http://arxiv.org/abs/2303.13846v1 ) ライセンス: Link先を確認	Woo Jae Kim, Yoonki Cho, Junsik Jung, Sung-Eui Yoon	(参考訳) 深いニューラルネットワークは、特徴レベルの摂動の蓄積による敵対的攻撃の影響を受けやすく、多くの研究がモデル誤予測を引き起こす非破壊的特徴アクティベーションを非活性化することによってモデルの堅牢性を高めている。しかし、これらの悪意あるアクティベーションは依然として識別的手がかりを含んでおり、再校正によってモデルの正しい予測のために追加の有用な情報を捉えることができると主張している。そこで本研究では,より堅牢な特徴マップに対して,悪意のある非ロバストアクティベーションを分離と再調整によって再結合する機能分離再調整(fsr)という新しい手法を提案する。分離部は、入力特徴マップを、モデルが正しい予測を行うのに役立つアクティベーション付きロバスト特徴と、敵の攻撃時にモデル予測の誤りの原因となるアクティベーションとで区別する。 Recalibration部は、モデル予測のための潜在的に有用なキューを復元するために、非ロバストなアクティベーションを調整する。大規模な実験は、従来の非活性化技術と比較してFSRの優位性を検証し、計算オーバーヘッドを小さくして8.57%まで向上することを示した。コードはhttps://github.com/wkim97/fsrで入手できる。 Deep neural networks are susceptible to adversarial attacks due to the accumulation of perturbations in the feature level, and numerous works have boosted model robustness by deactivating the non-robust feature activations that cause model mispredictions. However, we claim that these malicious activations still contain discriminative cues and that with recalibration, they can capture additional useful information for correct model predictions. To this end, we propose a novel, easy-to-plugin approach named Feature Separation and Recalibration (FSR) that recalibrates the malicious, non-robust activations for more robust feature maps through Separation and Recalibration. The Separation part disentangles the input feature map into the robust feature with activations that help the model make correct predictions and the non-robust feature with activations that are responsible for model mispredictions upon adversarial attack. The Recalibration part then adjusts the non-robust activations to restore the potentially useful cues for model predictions. 
Extensive experiments verify the superiority of FSR compared to traditional deactivation techniques and demonstrate that it improves the robustness of existing adversarial training methods by up to 8.57% with small computational overhead. Codes are available at https://github.com/wkim97/FSR.	翻訳日:2023-03-27 15:15:53 公開日:2023-03-24
# 過去を思い出す:アナログプロンプトによるインクリメンタル学習 Remind of the Past: Incremental Learning with Analogical Prompts ( http://arxiv.org/abs/2303.13898v1 ) ライセンス: Link先を確認	Zhiheng Ma, Xiaopeng Hong, Beinan Liu, Yabin Wang, Pinyue Guo, Huiyun Li	(参考訳) データフリーの漸進的学習法はメモリフレンドリだが、過去のデータがない場合には、正確に推定と対応が難しい。本稿では,人間のアナロジー能力に触発された新しいインクリメンタル学習手法を提案する。具体的には、即時チューニングにより新しいデータを古いクラスに再マップするアナロジー作成機構を設計する。これは、新しいクラスのサンプルのみを使用して、古いモデルのターゲットの古いクラスのフィーチャ分布を模倣する。学習プロンプトは、歴史的プロトタイプの微調整による表現シフトを推定し、対処するためにさらに使用される。提案手法は,4つのインクリメンタルラーニングベンチマークに対して,クラスとドメインのインクリメンタルラーニング設定の下で,新しい最先端性能を設定する。クラスごとに機能プロトタイプを保存するだけで、データ再生メソッドを一貫して上回る。 Core50ベンチマークのジョイントトレーニングにより、実証上界にほぼ到達した。コードは https://github.com/ZhihengCV/A-Prompts でリリースされる。 Although data-free incremental learning methods are memory-friendly, accurately estimating and counteracting representation shifts is challenging in the absence of historical data. This paper addresses this thorny problem by proposing a novel incremental learning method inspired by human analogy capabilities. Specifically, we design an analogy-making mechanism to remap the new data into the old class by prompt tuning. It mimics the feature distribution of the target old class on the old model using only samples of new classes. The learnt prompts are further used to estimate and counteract the representation shift caused by fine-tuning for the historical prototypes. The proposed method sets up new state-of-the-art performance on four incremental learning benchmarks under both the class and domain incremental learning settings. It consistently outperforms data-replay methods by only saving feature prototypes for each class. It has almost hit the empirical upper bound by joint training on the Core50 benchmark. The code will be released at https://github.com/ZhihengCV/A-Prompts.	翻訳日:2023-03-27 15:09:01 公開日:2023-03-24
# 画像認識のための多項式ネットワークの規則化 Regularization of polynomial networks for image recognition ( http://arxiv.org/abs/2303.13896v1 ) ライセンス: Link先を確認	Grigorios G Chrysos, Bohan Wang, Jiankang Deng, Volkan Cevher	(参考訳) ディープニューラルネットワーク(Deep Neural Networks, DNN)は、タスク全体にわたって優れたパフォーマンスを得ているが、ブラックボックスとして残っている。同時に、PN(Polynomial Networks)は、有望な性能と解釈性を改善した代替手法として登場したが、強力なDNNベースラインのパフォーマンスには達していない。この作業では、パフォーマンスギャップを埋めることを目指しています。 6つのベンチマークでResNetのパフォーマンスに到達できるPNのクラスを紹介します。強正則化が重要であることを実証し,性能に適合するために必要な完全正則化スキームを広範囲に検討した。正規化スキームをさらに進めるために,従来提案されていた多項式ネットワークよりも高次展開を実現するD-PolyNetを導入する。 D-PolyNetはパラメータ効率が良く、他の多項式ネットワークと同じような性能を実現する。我々の新しいモデルは、要素的活性化関数(PNの訓練にはもはや必要ない)の役割の理解につながると期待している。ソースコードはhttps://github.com/grigorisg9gr/regularized_polynomialsで入手できる。 Deep Neural Networks (DNNs) have obtained impressive performance across tasks, however they still remain as black boxes, e.g., hard to theoretically analyze. At the same time, Polynomial Networks (PNs) have emerged as an alternative method with a promising performance and improved interpretability but have yet to reach the performance of the powerful DNN baselines. In this work, we aim to close this performance gap. We introduce a class of PNs, which are able to reach the performance of ResNet across a range of six benchmarks. We demonstrate that strong regularization is critical and conduct an extensive study of the exact regularization schemes required to match performance. To further motivate the regularization schemes, we introduce D-PolyNets that achieve a higher-degree of expansion than previously proposed polynomial networks. D-PolyNets are more parameter-efficient while achieving a similar performance as other polynomial networks. We expect that our new models can lead to an understanding of the role of elementwise activation functions (which are no longer required for training PNs). The source code is available at https://github.com/grigorisg9gr/regularized_polynomials.	翻訳日:2023-03-27 15:08:46 公開日:2023-03-24
# 結合スピン系におけるハイゼンベルク制限スピンスクイージング Heisenberg-limited spin squeezing in coupled spin systems ( http://arxiv.org/abs/2303.13889v1 ) ライセンス: Link先を確認	Long-Gang Huang, Xuanchen Zhang, Yanzhen Wang, Zhenxing Hua, Yuanjiang Tang and Yong-Chun Liu	(参考訳) スピンスクイージングは量子計測と量子情報科学において重要な役割を果たす。その生成は、さらなる応用の前提条件であるが、既存の物理系が要求されるスクイーズ相互作用をほとんど含まないため、依然として大きな課題に直面している。本稿では, スピンスピン相互作用を持つ結合スピンモデルにおいてスピンスクイージングを生成するための普遍的なスキームを提案する。我々の手法は、結合したスピン相互作用をスクイーズ相互作用に変換し、$N$個の粒子に対して測定精度が$1/N$でスケールするハイゼンベルク限界の極限的なスクイージングに到達する。一定かつ連続的な駆動場のみが必要であり、これが現在の現実的な実験の連続に利用できる。この研究は、ハイゼンベルク制限されたスピンスクイージングを生成できる様々なシステムを強化し、量子精密測定に広く応用されている。 Spin squeezing plays a crucial role in quantum metrology and quantum information science. Its generation is the prerequisite for further applications but still faces an enormous challenge since the existing physical systems rarely contain the required squeezing interactions. Here we propose a universal scheme to generate spin squeezing in coupled spin models with collective spin-spin interactions, which commonly exist in various systems. Our scheme can transform the coupled spin interactions into squeezing interactions, and reach the extreme squeezing with Heisenberg-limited measurement precision scaling as $1/N$ for $N$ particles. Only constant and continuous driving fields are required, which is accessible to a series of current realistic experiments. This work greatly enriches the variety of systems that can generate the Heisenberg-limited spin squeezing, with broad applications in quantum precision measurement.	翻訳日:2023-03-27 15:08:29 公開日:2023-03-24
# 手作りカーネルによる効果的なブラックボックス対向攻撃 Effective black box adversarial attack with handcrafted kernels ( http://arxiv.org/abs/2303.13887v1 ) ライセンス: Link先を確認	Petr Dvořáček, Petr Hurtik, Petra Števuliáková	(参考訳) ブラックボックス攻撃の敵例を作成するための,新しいシンプルなフレームワークを提案する。そのアイデアは、手作りの畳み込みカーネルのただ1つの層からなる訓練不能なモデルで置換モデルをシミュレートし、生成ニューラルネットワークを訓練して、原画像と生成した逆画像の出力距離を最大化する。本研究では,第1層の予測を騙すことで,ネットワーク全体が騙され,逆入力の精度が低下することを示す。さらに、第1の畳み込み層カーネルを得るためのニューラルネットワークのトレーニングは行わないが、f変換技術を用いてニューラルネットワークを作成する。したがって,本手法は非常に時間と資源効率が高い。 We propose a new, simple framework for crafting adversarial examples for black box attacks. The idea is to simulate the substitution model with a non-trainable model compounded of just one layer of handcrafted convolutional kernels and then train the generator neural network to maximize the distance of the outputs for the original and generated adversarial image. We show that fooling the prediction of the first layer causes the whole network to be fooled and decreases its accuracy on adversarial inputs. Moreover, we do not train the neural network to obtain the first convolutional layer kernels, but we create them using the technique of F-transform. Therefore, our method is very time and resource effective.	翻訳日:2023-03-27 15:08:15 公開日:2023-03-24
# ARKitTrack: モバイルRGB-Dデータによるトラッキングのための新しい横データセット ARKitTrack: A New Diverse Dataset for Tracking Using Mobile RGB-D Data ( http://arxiv.org/abs/2303.13885v1 ) ライセンス: Link先を確認	Haojie Zhao and Junsong Chen and Lijun Wang and Huchuan Lu	(参考訳) 従来のRGBのみのビジュアルトラッキングと比較して、RGB-Dトラッキング用に構築されたデータセットはほとんどない。本稿では、appleのiphoneおよびipadに搭載された消費者級lidarスキャナーを用いて、静的および動的シーンをキャプチャする新しいrgb-dトラッキングデータセットであるarkittrackを提案する。 ARKitTrackには300のRGB-Dシーケンス、455のターゲット、229.7Kのビデオフレームが含まれている。境界ボックスアノテーションとフレームレベルの属性とともに、このデータセットに123.9Kピクセルレベルのターゲットマスクをアノテートする。また、将来の展開のために、各フレームのカメラ固有およびカメラポーズが設けられる。このデータセットの有用性を示すため,RGB機能と鳥眼視表示を統合したボックスレベルとピクセルレベルのトラッキングの統一ベースラインを新たに提案し,モジュラリティ3次元形状の探索を行う。詳細な実験分析により,ARKitTrackデータセットがRGB-D追跡を著しく促進し,提案手法が最先端手法と比べて良好な性能を示すことが確認された。コードとデータセットはhttps://arkittrack.github.ioで入手できる。 Compared with traditional RGB-only visual tracking, few datasets have been constructed for RGB-D tracking. In this paper, we propose ARKitTrack, a new RGB-D tracking dataset for both static and dynamic scenes captured by consumer-grade LiDAR scanners equipped on Apple's iPhone and iPad. ARKitTrack contains 300 RGB-D sequences, 455 targets, and 229.7K video frames in total. Along with the bounding box annotations and frame-level attributes, we also annotate this dataset with 123.9K pixel-level target masks. Besides, the camera intrinsic and camera pose of each frame are provided for future developments. To demonstrate the potential usefulness of this dataset, we further present a unified baseline for both box-level and pixel-level tracking, which integrates RGB features with bird's-eye-view representations to better explore cross-modality 3D geometry. In-depth empirical analysis has verified that the ARKitTrack dataset can significantly facilitate RGB-D tracking and that the proposed baseline method compares favorably against the state of the art. The code and dataset are available at https://arkittrack.github.io.	翻訳日:2023-03-27 15:08:04 公開日:2023-03-24
# グラフ表現と変化点検出を用いたシンボリック音楽構造解析 Symbolic Music Structure Analysis with Graph Representations and Changepoint Detection Methods ( http://arxiv.org/abs/2303.13881v1 ) ライセンス: Link先を確認	Carlos Hernandez-Olivan, Sonia Rubio Llamas, Jose R. Beltran	(参考訳) 音楽構造分析は音楽情報検索(MIR)におけるオープンな研究課題である。過去には、楽曲を音響領域と記号領域に分割しようとする作品がいくつかあるが、音楽構造の異なるレベルでの識別と分節化は、まだこの分野では未解決の課題である。本研究は,3つの手法を提案する。そのうちの2つは,その形式や構造によって記号的音楽を分割することを目的とした,新しいグラフベースのアルゴリズムである。本研究では,異なる形態や構造を持つ2つの公開データセットを用いたアブレーション実験を行い,それらのパラメータ値の異なる手法を比較し,異なる音楽スタイルと比較した。グラフ表現によるシンボリック音楽を符号化し,グラフから得られる隣接行列の新規性を計算することで,特徴を抽出せずにシンボリック楽曲の構造を表現できることがわかった。オンラインの教師なしのchangepoint検出メソッドで境界を検出でき、この方法のテストに使用した公開データセットの1つで、1バーの許容性に対して、f_1が 0.5640 である。また,提案手法のパラメータが,そのレベルに応じてどのように調整されるかを示すため,構造,高,中,低の異なるレベルでのアルゴリズムの性能評価結果も提供する。本研究の再現性とユーザビリティを高めるため,各構造レベルのパラメータを,オープンソースのpythonパッケージである musicaiz に追加した。この手法が、構造、音楽分類、キーチェンジ検出などの他のMIRタスクを改善するために使用できることを願っている。 Music Structure Analysis is an open research task in Music Information Retrieval (MIR). In the past, there have been several works that attempt to segment music into the audio and symbolic domains, however, the identification and segmentation of the music structure at different levels is still an open research problem in this area. In this work we propose three methods, two of which are novel graph-based algorithms that aim to segment symbolic music by its form or structure: Norm, G-PELT and G-Window. We performed an ablation study with two public datasets that have different forms or structures in order to compare such methods varying their parameter values and comparing the performance against different music styles. We have found that encoding symbolic music with graph representations and computing the novelty of Adjacency Matrices obtained from graphs represent the structure of symbolic music pieces well without the need to extract features from it. 
We are able to detect the boundaries with an online unsupervised changepoint detection method with an $F_1$ score of 0.5640 at a 1-bar tolerance in one of the public datasets that we used for testing our methods. We also provide the performance results of the algorithms at different levels of structure, high, medium and low, to show how the parameters of the proposed methods have to be adjusted depending on the level. We added the best performing method with its parameters for each structure level to musicaiz, an open source python package, to facilitate the reproducibility and usability of this work. We hope that these methods could be used to improve other MIR tasks such as music generation with structure, music classification or key change detection.	翻訳日:2023-03-27 15:07:43 公開日:2023-03-24
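As a rough illustration of the metric quoted in the entry above (an F1 score for boundary detection within a 1-bar tolerance), the following minimal sketch scores predicted boundaries against reference boundaries. The function name `boundary_f1`, the greedy one-to-one matching, and the sample bar positions are assumptions made for illustration; the abstract does not describe the authors' evaluation code, and this is not taken from musicaiz.

```python
def boundary_f1(reference, predicted, tolerance=1.0):
    """F1 score for boundary detection with a symmetric tolerance window.

    Positions are given in bars (hypothetical unit choice). Each reference
    boundary may be matched by at most one predicted boundary.
    """
    ref = sorted(reference)
    pred = sorted(predicted)
    i = j = hits = 0
    # Two-pointer greedy matching over the sorted boundary lists.
    while i < len(ref) and j < len(pred):
        if abs(ref[i] - pred[j]) <= tolerance:
            hits += 1
            i += 1
            j += 1
        elif pred[j] < ref[i]:
            j += 1  # prediction is too early to match any remaining reference
        else:
            i += 1  # reference has no prediction close enough
    if hits == 0:
        return 0.0
    precision = hits / len(pred)
    recall = hits / len(ref)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy example: 3 of 5 predictions fall within 1 bar of a reference boundary.
    print(boundary_f1([0, 8, 16, 24], [0, 9, 20, 24, 30], tolerance=1.0))
```

A wider tolerance makes the score more forgiving, so scores are only comparable when they are reported at the same tolerance.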
# モーメント検索とハイライト検出のためのクエリ依存ビデオ表現 Query-Dependent Video Representation for Moment Retrieval and Highlight Detection ( http://arxiv.org/abs/2303.13874v1 ) ライセンス: Link先を確認	WonJun Moon, Sangeek Hyun, SangUk Park, Dongchan Park, Jae-Pil Heo	(参考訳) 近年,映像理解の需要が大幅に増加し,映像モーメント検索とハイライト検出(MR/HD)が注目されている。 MR/HDの主な目的は、与えられたテキストクエリに対して、モーメントをローカライズし、クリップワイドのレベルを推定することである。最近のトランスフォーマーベースモデルにはいくつかの進歩があったが、これらの手法が与えられたクエリの情報を完全に活用していないことがわかった。例えば、テキストクエリとビデオコンテンツの関連性は、モーメントとそのサルジェンシーを予測する際に無視されることがある。本稿では,MR/HDに適した検出変換器であるQuery-Dependent DETR(QD-DETR)を紹介する。トランスフォーマーアーキテクチャにおいて、与えられたクエリの重要でない役割を観察するため、エンコーディングモジュールは、テキストクエリのコンテキストをビデオ表現に明示的に注入するために、クロスアテンション層から始まります。そして,クエリ情報を活用するモデルの性能を高めるために,ビデオクエリペアを操作して無関係なペアを生成する。このような負の(無関係な)ビデオクエリペアは、低いサリエンシースコアを得るために訓練され、その結果、クエリとビデオのペア間の正確な一致をモデルが推定することを奨励する。最後に,与えられたビデオクエリ対に対するサリエンシースコアの基準を適応的に定義する入力適応サリエンシー予測器を提案する。本研究は,mr/hdにおけるクエリ依存表現の構築の重要性を検証する。具体的には、QD-DETRはQVHighlights、TVSum、Charades-STAデータセットで最先端の手法より優れている。コードはgithub.com/wjun0830/QD-DETRで入手できる。 Recently, video moment retrieval and highlight detection (MR/HD) are being spotlighted as the demand for video understanding is drastically increased. The key objective of MR/HD is to localize the moment and estimate clip-wise accordance level, i.e., saliency score, to the given text query. Although the recent transformer-based models brought some advances, we found that these methods do not fully exploit the information of a given query. For example, the relevance between text query and video contents is sometimes neglected when predicting the moment and its saliency. To tackle this issue, we introduce Query-Dependent DETR (QD-DETR), a detection transformer tailored for MR/HD. As we observe the insignificant role of a given query in transformer architectures, our encoding module starts with cross-attention layers to explicitly inject the context of text query into video representation. 
Then, to enhance the model's capability of exploiting the query information, we manipulate the video-query pairs to produce irrelevant pairs. Such negative (irrelevant) video-query pairs are trained to yield low saliency scores, which in turn, encourages the model to estimate precise accordance between query-video pairs. Lastly, we present an input-adaptive saliency predictor which adaptively defines the criterion of saliency scores for the given video-query pairs. Our extensive studies verify the importance of building the query-dependent representation for MR/HD. Specifically, QD-DETR outperforms state-of-the-art methods on QVHighlights, TVSum, and Charades-STA datasets. Codes are available at github.com/wjun0830/QD-DETR.	翻訳日:2023-03-27 15:07:18 公開日:2023-03-24
# Fantasia3D:高品質なテキストから3Dコンテンツ作成のための幾何学と外観 Fantasia3D: Disentangling Geometry and Appearance for High-quality Text-to-3D Content Creation ( http://arxiv.org/abs/2303.13873v1 ) ライセンス: Link先を確認	Rui Chen, Yongwei Chen, Ningxin Jiao, Kui Jia	(参考訳) 3Dコンテンツの自動作成は、事前訓練された大規模言語モデルと画像拡散モデルが利用可能であることから、近年急速に進歩している。既存のtext-to-3dメソッドでは、ボリュームレンダリングによる幾何学と外観を結合した暗黙的なシーン表現が一般的であり、より細かいジオメトリの復元とフォトリアリスティックなレンダリングの面では最適ではない。本稿では,高品質テキストから3dコンテンツ作成のためのfantasia3dの新しい手法を提案する。 fantasia3dの鍵は、幾何学と外観の疎結合なモデリングと学習である。幾何学学習では,ハイブリッドなシーン表現に依拠し,画像拡散モデルの入力として表現から抽出した面正規化を符号化する。本研究では,空間的に変化する双方向反射率分布関数 (brdf) をtext-to-3dタスクに導入し, 生成面の光リアリスティックレンダリングのための表面材料を学習する。当社のdisentangledフレームワークは、一般的なグラフィックエンジンと互換性があり、生成された3dアセットのリライト、編集、物理シミュレーションをサポートしています。異なるテキストから3dのタスク設定下で既存の方法よりも優れた方法を示す徹底的な実験を行う。プロジェクトページとソースコード: https://fantasia3d.github.io/ Automatic 3D content creation has achieved rapid progress recently due to the availability of pre-trained, large language models and image diffusion models, forming the emerging topic of text-to-3D content creation. Existing text-to-3D methods commonly use implicit scene representations, which couple the geometry and appearance via volume rendering and are suboptimal in terms of recovering finer geometries and achieving photorealistic rendering; consequently, they are less effective for generating high-quality 3D assets. In this work, we propose a new method of Fantasia3D for high-quality text-to-3D content creation. Key to Fantasia3D is the disentangled modeling and learning of geometry and appearance. For geometry learning, we rely on a hybrid scene representation, and propose to encode surface normal extracted from the representation as the input of the image diffusion model. For appearance modeling, we introduce the spatially varying bidirectional reflectance distribution function (BRDF) into the text-to-3D task, and learn the surface material for photorealistic rendering of the generated surface. 
Our disentangled framework is more compatible with popular graphics engines, supporting relighting, editing, and physical simulation of the generated 3D assets. We conduct thorough experiments that show the advantages of our method over existing ones under different text-to-3D task settings. Project page and source codes: https://fantasia3d.github.io/.	翻訳日:2023-03-27 15:06:51 公開日:2023-03-24
# キャビティ設計を用いた通信周波数におけるオンデマンド非識別・絡み合い光子 On-demand indistinguishable and entangled photons at telecom frequencies using tailored cavity designs ( http://arxiv.org/abs/2303.13871v1 ) ライセンス: Link先を確認	David Bauch, Dustin Siebert, Klaus D. Jöns, Jens Förstner and Stefan Schumacher	(参考訳) 量子ドット系でよく用いられるバイエキシトン・エキシトン放出カスケードは、偏光エンタングルメントを生成するために、本質的に区別不可能な光子を生成する。本研究は, 偏光絡み合いの度合いが高く, 同時に不明瞭度が高い光子の対を生成することに焦点を当てる。光共振器を用いたバイエクシトン寿命を選択的に低減することで、この目標を達成する。広帯域光子抽出と2重縮退光モードを併用したバイエクシトンエミッションの十分なパーセル向上の要求を満たすように調整した円形ブラッグ反射器を試作した。我々の詳細な理論研究が組み合わさる (i)モデルパラメータを入力として抽出したマクスウェル方程式を解いた現実的なフォトニック構造の最適化 (ii)光子特性に完全にアクセスできる量子ドットキャビティ励起ダイナミクスの微視的シミュレーション我々は,システムパラメータに対する非自明な依存性を報告し,$1550\,\mathrm{nm}$ における通信用cバンドの非識別性と近接ユニティ値への絡み合いを最大化するパーセル強化の最適範囲を決定するために,複合理論手法の予測力を利用する。 The biexciton-exciton emission cascade commonly used in quantum-dot systems to generate polarization entanglement yields photons with intrinsically limited indistinguishability. In the present work we focus on the generation of pairs of photons with high degrees of polarization entanglement and simultaneously high indistinguishability. We achieve this goal by selectively reducing the biexciton lifetime with an optical resonator. We demonstrate that a suitably tailored circular Bragg reflector fulfills the requirements of sufficient selective Purcell enhancement of biexciton emission paired with spectrally broad photon extraction and two-fold degenerate optical modes. Our in-depth theoretical study combines (i) the optimization of realistic photonic structures solving Maxwell's equations from which model parameters are extracted as input for (ii) microscopic simulations of quantum-dot cavity excitation dynamics with full access to photon properties. 
We report non-trivial dependencies on system parameters and use the predictive power of our combined theoretical approach to determine the optimal range of Purcell enhancement that maximizes indistinguishability and entanglement to near unity values in the telecom C-band at $1550\,\mathrm{nm}$.	翻訳日:2023-03-27 15:06:25 公開日:2023-03-24
# 学習可能な形状と位置を持つ物理的に対立する赤外線パッチ Physically Adversarial Infrared Patches with Learnable Shapes and Locations ( http://arxiv.org/abs/2303.13868v1 ) ライセンス: Link先を確認	Wei Xingxing and Yu Jie and Huang Yao	(参考訳) 安全クリティカルなタスクにおける赤外線物体検出器の広範な応用のために,実世界の敵対的事例に対するロバスト性を評価する必要がある。しかし、デジタル世界から物理世界への複雑な変換のため、現実的な応用では、現在の数少ない物理的赤外線攻撃は複雑である。この問題に対処するため,本論文では"adversarial infrared patch"と呼ばれる物理的に実現可能な赤外線攻撃手法を提案する。対象物の熱放射を捉えた赤外線カメラの撮像機構を考慮すると、対向的赤外線パッチはその熱分布を操作するために対象物に熱絶縁材料のパッチを取り付けて攻撃を行う。敵の攻撃を強化するため,対象物体上のパッチの形状と位置の同時学習を誘導する新たなアグリゲーション正規化を提案する。したがって、簡単な勾配に基づく最適化を適用することができる。様々な物体検出装置を用いて、異なる物体検出タスクにおける逆赤外線パッチを検証する。実験の結果, 物体を異なる角度, 距離, 姿勢, シーンで捉えた物理環境において, 歩行者検知器と車両検知器に対して, 攻撃成功率 (asr) が90 %以上に達することがわかった。より重要なことに、敵対的な赤外線パッチは実装が容易であり、物理的な世界で構築するのに0.5時間しかかからず、その効果と効率を検証する。 Owing to the extensive application of infrared object detectors in safety-critical tasks, it is necessary to evaluate their robustness against adversarial examples in the real world. However, the few existing physical infrared attacks are complicated to implement in practical application because of their complex transformation from digital world to physical world. To address this issue, in this paper, we propose a physically feasible infrared attack method called "adversarial infrared patches". Considering the imaging mechanism of infrared cameras by capturing objects' thermal radiation, adversarial infrared patches conduct attacks by attaching a patch of thermal insulation materials on the target object to manipulate its thermal distribution. To enhance adversarial attacks, we present a novel aggregation regularization to guide the simultaneous learning for the patch's shape and location on the target object. Thus, a simple gradient-based optimization can be adapted to solve for them. We verify adversarial infrared patches in different object detection tasks with various object detectors. 
Experimental results show that our method achieves more than 90% Attack Success Rate (ASR) versus the pedestrian detector and vehicle detector in the physical environment, where the objects are captured in different angles, distances, postures, and scenes. More importantly, adversarial infrared patch is easy to implement, and it only needs 0.5 hours to be constructed in the physical world, which verifies its effectiveness and efficiency.	翻訳日:2023-03-27 15:06:06 公開日:2023-03-24
# C2Cサービスにおける信頼管理アルゴリズムの適用性 Applicability of Trust Management Algorithm in C2C services ( http://arxiv.org/abs/2303.13919v1 ) ライセンス: Link先を確認	Ryohei Suzuki, Iifan Tyou, Shigenori Ohashi, Kazutoshi Sasahara	(参考訳) C2C(Consumer-to-Consumer)プラットフォームの出現により、消費者は商品を直接購入および販売できるようになったが、商品詐欺や偽レビューといった問題も生じている。信頼管理アルゴリズム(TMA)は不正ユーザを検出する対策として期待されている。しかし、ネットワーク上のデバイス間のピアツーピア(p2p)通信用に設計されたtmasが報告されるほど効果的かどうかは不明である。本稿では,エージェントベースモデルを用いたC2Cサービスにおける代表的TMAであるEigenTrustの適用性を検討する。まず、C2Cサービスでトランザクションプロセスを定義し、6種類の不正取引を仮定し、シミュレーションによりC2CシステムにおけるEigenTrustのダイナミクスを分析した。 EigenTrustは2種類の単純な詐欺の信頼度を正確に推定できることがわかった。さらに,2種類の先進的詐欺に対する信頼スコアの揺らぎが,これまでの研究では問題にならなかった。これは、そのような振動を検出することで、EigenTrustがいくつかの(すべてではないが)高度な詐欺を検出できることを示している。本研究は,C2Cサービスにおける取引の信頼性向上に寄与し,消費者サービスのさらなる技術開発に関する洞察を提供する。 The emergence of Consumer-to-Consumer (C2C) platforms has allowed consumers to buy and sell goods directly, but it has also created problems, such as commodity fraud and fake reviews. Trust Management Algorithms (TMAs) are expected to be a countermeasure to detect fraudulent users. However, it is unknown whether TMAs are as effective as reported as they are designed for Peer-to-Peer (P2P) communications between devices on a network. Here we examine the applicability of `EigenTrust', a representative TMA, for the use case of C2C services using an agent-based model. First, we defined the transaction process in C2C services, assumed six types of fraudulent transactions, and then analysed the dynamics of EigenTrust in C2C systems through simulations. We found that EigenTrust could correctly estimate low trust scores for two types of simple frauds. Furthermore, we found the oscillation of trust scores for two types of advanced frauds, which previous research did not address. This suggests that by detecting such oscillations, EigenTrust may be able to detect some (but not all) advanced frauds. 
Our study helps increase the trustworthiness of transactions in C2C services and provides insights into further technological development for consumer services.	翻訳日:2023-03-27 15:00:19 公開日:2023-03-24
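As a rough illustration of the kind of trust computation examined in the entry above, the following is a minimal sketch of an EigenTrust-style power iteration over a matrix of local ratings. The function name `eigentrust`, the damping weight `alpha`, the choice of pre-trusted peer, and the toy rating matrix are all assumptions made for illustration; this is not the agent-based C2C model or the fraud scenarios studied in the paper.

```python
def eigentrust(local, pretrusted, alpha=0.15, iters=100, tol=1e-10):
    """Global trust scores from local ratings via damped power iteration.

    local[i][j] is peer i's (hypothetical) net satisfaction with peer j.
    pretrusted is a set of peer indices that anchor the computation.
    """
    n = len(local)
    anchors = pretrusted if pretrusted else set(range(n))
    # p: uniform distribution over the pre-trusted peers.
    p = [1.0 / len(anchors) if i in anchors else 0.0 for i in range(n)]

    # Normalize each peer's ratings: clip negatives to zero, then scale so
    # every row sums to 1. A peer with no positive ratings defers to p.
    C = []
    for i in range(n):
        row = [max(v, 0.0) for v in local[i]]
        total = sum(row)
        C.append([v / total for v in row] if total > 0 else p[:])

    # Iterate t <- (1 - alpha) * C^T t + alpha * p until the change is small.
    t = p[:]
    for _ in range(iters):
        new = [
            (1 - alpha) * sum(C[i][j] * t[i] for i in range(n)) + alpha * p[j]
            for j in range(n)
        ]
        delta = sum(abs(a - b) for a, b in zip(new, t))
        t = new
        if delta < tol:
            break
    return t


if __name__ == "__main__":
    # Toy example: peers 0 and 1 rate each other well; nobody rates peer 2 positively.
    ratings = [[0, 4, 0], [3, 0, 0], [1, 1, 0]]
    print(eigentrust(ratings, pretrusted={0}))
```

In this toy setup the unrated peer ends up with the lowest score. Tracking how such scores evolve over repeated transactions, rather than reading a single snapshot, is the kind of analysis the abstract refers to when it mentions oscillating trust scores.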
# 胎児超音波画像からの複合情報除去 Removing confounding information from fetal ultrasound images ( http://arxiv.org/abs/2303.13918v1 ) ライセンス: Link先を確認	Kamil Mikolaj, Manxi Lin, Zahra Bashir, Morten Bo Søndergaard Svendsen, Martin Tolsgaard, Anders Nymark and Aasa Feragen	(参考訳) 医療画像に埋め込まれたテキストやマーキングの形で情報を結合することは、診断深層学習アルゴリズムのトレーニングに深刻な影響を与える可能性がある。しかし、臨床目的で収集されたデータは、しばしばそのようなマークが埋め込まれている。皮膚科では、悪性病変の画像で過剰に表現される図面や定規が知られている。本稿では,胎児検診用超音波スキャンを含む国立データベースに掲載されている画像にテキストと校正器を配置し,標準平面と相関して予測を行う。これらのデータベースで利用可能な膨大なデータを活用するために,標準平面分類をテストケースとして,超音波による深層学習アルゴリズムにおける埋め込みテキストと校正アルゴリズムの結合効果を最小化する手法を開発・検証した。 Confounding information in the form of text or markings embedded in medical images can severely affect the training of diagnostic deep learning algorithms. However, data collected for clinical purposes often have such markings embedded in them. In dermatology, known examples include drawings or rulers that are overrepresented in images of malignant lesions. In this paper, we encounter text and calipers placed on the images found in national databases containing fetal screening ultrasound scans, which correlate with standard planes to be predicted. In order to utilize the vast amounts of data available in these databases, we develop and validate a series of methods for minimizing the confounding effects of embedded text and calipers on deep learning algorithms designed for ultrasound, using standard plane classification as a test case.	翻訳日:2023-03-27 14:59:59 公開日:2023-03-24
# 重力波データストリームにおけるグリッチの分類のための畳み込みニューラルネットワーク Convolutional Neural Networks for the classification of glitches in gravitational-wave data streams ( http://arxiv.org/abs/2303.13917v1 ) ライセンス: Link先を確認	Tiago S. Fernandes and Samuel J. Vieira and Antonio Onofre and Juan Calderón Bustillo and Alejandro Torres-Forné and José A. Font	(参考訳) 本稿では,最新のConvNeXtネットワークファミリを含む畳み込みニューラルネットワークを用いて,高度LIGO検出器のデータ中の過渡的雑音信号(グリッチ)と重力波を分類する。まず、Gravity Spyデータセットを使用してゼロからトレーニングされたモデルと、このデータセットでトレーニング済みモデルを微調整して移行学習するモデルを使用する。第2に、自動生成擬似ラベルを用いた事前学習モデルの自己教師型アプローチについても検討する。我々の結果は、同じデータセットの既存の結果に非常に近いものであり、最高の教師付き(自己監督)モデルに対して、F1スコアの97.18%(94.15%)に達した。さらに、LIGO-VirgoのO3ランの実際の重力波信号を用いてモデルをテストする。以前の実行(o1とo2)のデータを使用してトレーニングされるが、特に転送学習を使用する場合、モデルのパフォーマンスは良好である。 Gravity Spyデータセットにあるハードウェアインジェクションの50チャープ未満の例とは別に、実際の信号のトレーニングを必要とせずに、転送学習がスコアを改善することがわかった。これにより、エラー分類だけでなく、信号分類にもトランスファー学習が用いられるようになった。 We investigate the use of Convolutional Neural Networks (including the modern ConvNeXt network family) to classify transient noise signals (i.e., glitches) and gravitational waves in data from the Advanced LIGO detectors. First, we use models with a supervised learning approach, both trained from scratch using the Gravity Spy dataset and employing transfer learning by fine-tuning pre-trained models in this dataset. Second, we also explore a self-supervised approach, pre-training models with automatically generated pseudo-labels. Our findings are very close to existing results for the same dataset, reaching values for the F1 score of 97.18% (94.15%) for the best supervised (self-supervised) model. We further test the models using actual gravitational-wave signals from LIGO-Virgo's O3 run. Although trained using data from previous runs (O1 and O2), the models show good performance, in particular when using transfer learning. 
We find that transfer learning improves the scores without the need for any training on real signals apart from the less than 50 chirp examples from hardware injections present in the Gravity Spy dataset. This motivates the use of transfer learning not only for glitch classification but also for signal classification.	翻訳日:2023-03-27 14:59:47 公開日:2023-03-24
# 参照誘導動的パラメータ選択による自己監督逆画像信号処理 Self-Supervised Reversed Image Signal Processing via Reference-Guided Dynamic Parameter Selection ( http://arxiv.org/abs/2303.13916v1 ) ライセンス: Link先を確認	Junji Otsuka, Masakazu Yoshimura, Takeshi Ohashi	(参考訳) 非処理センサ出力(RAW画像)は、低レベルと高レベルの両方のコンピュータビジョンアルゴリズムを改善する可能性があるが、大規模RAW画像データセットの欠如は研究の障壁である。そこで,既存のRGB画像をRAWに変換する逆画像信号処理(ISP)について検討した。しかし、既存のほとんどの方法は変換をモデル化するためにカメラ固有のメタデータやRGBとRAWのペア画像を必要とする。さらに、多様なISPの扱いや、世界的な照明の回復に問題がある。これらの制約に対処するために,メタデータとペア画像を必要としない自己教師付き逆ISP方式を提案する。提案手法は,RGB画像を参照RAW画像に基づいて逆ISPパイプラインのパラメータを動的に選択することにより,参照RAW画像と同じ環境下で撮像されたRAWライクな画像に変換する。パラメータ選択は、未ペアRGBおよびRAW画像から生成された擬似ペアデータを介して訓練される。提案手法は,他の最先端教師付き手法と同等の精度で様々な逆ISPを学習し,未知のRGB画像をCOCOやFlickr1Mから変換し,RAWライクな画像を画素分布でより正確にターゲットできることを示す。また、生成したRAW画像が実際のRAW画像オブジェクト検出タスクの性能を向上させることを示す。 Unprocessed sensor outputs (RAW images) potentially improve both low-level and high-level computer vision algorithms, but the lack of large-scale RAW image datasets is a barrier to research. Thus, reversed Image Signal Processing (ISP) which converts existing RGB images into RAW images has been studied. However, most existing methods require camera-specific metadata or paired RGB and RAW images to model the conversion, and they are not always available. In addition, there are issues in handling diverse ISPs and recovering global illumination. To tackle these limitations, we propose a self-supervised reversed ISP method that does not require metadata and paired images. The proposed method converts a RGB image into a RAW-like image taken in the same environment with the same sensor as a reference RAW image by dynamically selecting parameters of the reversed ISP pipeline based on the reference RAW image. The parameter selection is trained via pseudo paired data created from unpaired RGB and RAW images. 
We show that the proposed method is able to learn various reversed ISPs with comparable accuracy to other state-of-the-art supervised methods and convert unknown RGB images from COCO and Flickr1M to target RAW-like images more accurately in terms of pixel distribution. We also demonstrate that our generated RAW images improve performance on real RAW image object detection task.	翻訳日:2023-03-27 14:59:26 公開日:2023-03-24
# 12レベル心電図の深層学習に基づく心房細動分類におけるノイズの影響のベンチマーク Benchmarking the Impact of Noise on Deep Learning-based Classification of Atrial Fibrillation in 12-Lead ECG ( http://arxiv.org/abs/2303.13915v1 ) ライセンス: Link先を確認	Theresa Bender, Philip Gemke, Ennio Idrobo-Avila, Henning Dathe, Dagmar Krefting, Nicolai Spicher	(参考訳) 心電図解析は様々な臨床応用で広く使われており、分類タスクのディープラーニングモデルが現在研究の焦点となっている。データ駆動特性のため、信号ノイズを効率的に処理する可能性を秘めているが、これらの手法の精度への影響はいまだ不明である。そこで本研究では,12誘導心電図における心房細動検出のためのDeep Learning-based methodの精度に対する4種類のノイズの影響をベンチマークした。我々は、公開データセット(PTBXL)のサブセットを使用し、ノイズに関する人間の専門家が提供するメタデータを使用して、各心電図に信号品質を割り当てる。さらに,心電図毎に定量的信号対雑音比を算出する。両指標について深層学習モデルの精度を解析し,ヒトの専門家が複数の手がかりにうるさい信号とラベル付けした場合でも,心房細動を確実に識別できることを観察する。偽陽性率と偽陰性率は、データにノイズとラベル付けされる場合、やや悪化する。興味深いことに、ベースラインドリフトノイズを示すようにアノテートされたデータは、不要なデータと非常によく似た精度をもたらす。ノイズの多い心電図データの処理は,従来の方法のように事前処理を必要としない深層学習法で実現可能であると結論付けた。 Electrocardiography analysis is widely used in various clinical applications and Deep Learning models for classification tasks are currently in the focus of research. Due to their data-driven character, they bear the potential to handle signal noise efficiently, but its influence on the accuracy of these methods is still unclear. Therefore, we benchmark the influence of four types of noise on the accuracy of a Deep Learning-based method for atrial fibrillation detection in 12-lead electrocardiograms. We use a subset of a publicly available dataset (PTBXL) and use the metadata provided by human experts regarding noise for assigning a signal quality to each electrocardiogram. Furthermore, we compute a quantitative signal-to-noise ratio for each electrocardiogram. We analyze the accuracy of the Deep Learning model with respect to both metrics and observe that the method can robustly identify atrial fibrillation, even in cases signals are labelled by human experts as being noisy on multiple leads. False positive and false negative rates are slightly worse for data being labelled as noisy. 
Interestingly, data annotated as showing baseline drift noise results in an accuracy very similar to data without. We conclude that the issue of processing noisy electrocardiography data can be addressed successfully by Deep Learning methods that might not need preprocessing as many conventional methods do.	翻訳日:2023-03-27 14:59:02 公開日:2023-03-24
# GarmentTracking:カテゴリーレベルのガーメントポッドトラッキング GarmentTracking: Category-Level Garment Pose Tracking ( http://arxiv.org/abs/2303.13913v1 ) ライセンス: Link先を確認	Han Xue, Wenqiang Xu, Jieyi Zhang, Tutian Tang, Yutong Li, Wenxin Du, Ruolin Ye, Cewu Lu	(参考訳) 衣服は人間にとって重要である。完全な衣服のポーズを推定し追跡できる視覚システムは、多くの下流タスクや現実世界のアプリケーションに有用である。本研究は,(1)VRインタフェースを通じて仮想衣料品モデルを操作することができるVR-Garmentを収録した,カテゴリーレベルの衣料品ポーズ追跡タスクに対処するための完全なパッケージを提案する。 2) フラット化や折りたたみなどの操作において, 複雑な衣料を施した大規模データセットVRフォールディング。 (3) エンド・ツー・エンドのオンライントラッキングフレームワークであるGarmentTrackingは、ポイントクラウドシーケンスを与えられた標準空間とタスク空間の両方で、完全な衣服のポーズを予測する。広汎な実験により, 衣服の非剛性変形が大きい場合でも, 提案したGarmentTrackingは優れた性能を発揮することが示された。速度と精度の両方でベースラインアプローチより優れている。提案されたソリューションが将来の研究のプラットフォームになることを期待しています。コードとデータセットはhttps://garment-tracking.robotflow.aiで利用可能である。 Garments are important to humans. A visual system that can estimate and track the complete garment pose can be useful for many downstream tasks and real-world applications. In this work, we present a complete package to address the category-level garment pose tracking task: (1) A recording system VR-Garment, with which users can manipulate virtual garment models in simulation through a VR interface. (2) A large-scale dataset VR-Folding, with complex garment pose configurations in manipulation like flattening and folding. (3) An end-to-end online tracking framework GarmentTracking, which predicts complete garment pose both in canonical space and task space given a point cloud sequence. Extensive experiments demonstrate that the proposed GarmentTracking achieves great performance even when the garment has large non-rigid deformation. It outperforms the baseline approach on both speed and accuracy. We hope our proposed solution can serve as a platform for future research. Codes and datasets are available in https://garment-tracking.robotflow.ai.	翻訳日:2023-03-27 14:58:43 公開日:2023-03-24
# 任意量子プロセスのクロスプラットフォーム比較 Cross-Platform Comparison of Arbitrary Quantum Processes ( http://arxiv.org/abs/2303.13911v1 ) ライセンス: Link先を確認	Congcong Zheng, Xutao Yu, Kun Wang	(参考訳) 本研究では,局所演算と古典通信(LOCC)を用いて,空間的にあるいは時間的に異なる量子プラットフォーム上で実行される任意の量子プロセスの性能を比較するプロトコルを提案する。このプロトコルは局所ユニタリ演算子をサンプリングし、古典的通信を介して各プラットフォームと通信し、量子状態の準備と測定回路を構築する。その後、各プラットフォームに局所ユニタリ演算子を実装し、測定結果の確率分布を生成する。最大過程の忠実度は確率分布から推定され、量子プロセスの相対的性能を最終的に定量化する。さらに,このプロトコルが量子プロセストモグラフィーに適用可能であることを示す。我々は,IBMの5つの量子デバイスとBaiduのQianshi量子コンピュータの性能をクラウド経由で比較するためにプロトコルを適用した。驚くべきことに、このプロトコルは異なる量子コンピュータに実装された量子プロセスの性能を正確に比較することができ、完全な量子プロセストモグラフィに必要な測定値よりもはるかに少ない測定量を必要とする。我々の研究は、量子コンピュータのクロスプラットフォーム比較における協力的取り組みの触媒であると考えています。 In this work, we present a protocol for comparing the performance of arbitrary quantum processes executed on spatially or temporally disparate quantum platforms using Local Operations and Classical Communication (LOCC). The protocol involves sampling local unitary operators, which are then communicated to each platform via classical communication to construct quantum state preparation and measurement circuits. Subsequently, the local unitary operators are implemented on each platform, resulting in the generation of probability distributions of measurement outcomes. The max process fidelity is estimated from the probability distributions, which ultimately quantifies the relative performance of the quantum processes. Furthermore, we demonstrate that this protocol can be adapted for quantum process tomography. We apply the protocol to compare the performance of five quantum devices from IBM and the "Qianshi" quantum computer from Baidu via the cloud. Remarkably, the experimental results reveal that the protocol can accurately compare the performance of the quantum processes implemented on different quantum computers, requiring significantly fewer measurements than those needed for full quantum process tomography. 
We view our work as a catalyst for collaborative efforts in cross-platform comparison of quantum computers.	翻訳日:2023-03-27 14:58:26 公開日:2023-03-24
# Wave-U-Net Discriminator: 音声合成のための高速かつ軽量な識別器 Wave-U-Net Discriminator: Fast and Lightweight Discriminator for Generative Adversarial Network-Based Speech Synthesis ( http://arxiv.org/abs/2303.13909v1 ) ライセンス: Link先を確認	Takuhiro Kaneko, Hirokazu Kameoka, Kou Tanaka, Shogo Seki	(参考訳) 音声合成では、ジェネレータ(音声合成器)と識別器をmin-maxゲームで訓練するGAN(Generative Adversarial Network)が音声品質向上に広く利用されている。識別器のアンサンブルは、近年のニューラルボコーダ(HiFi-GANなど)や、複数の視点から波形を精査するためにTTSシステム(VITSなど)で一般的に用いられている。このような判別器は、合成された音声が実際の音声に適切に近づくことができるが、識別器の数の増加に応じて、モデルサイズと計算時間を増加させる必要がある。あるいは、Wave-U-Netアーキテクチャを持つ単一だが表現力のある識別器であるWave-U-Net判別器を提案する。この判別器は一意で、入力信号と同じ解像度でサンプル的に波形を評価でき、同時にスキップ接続のあるエンコーダとデコーダを介して多レベル特徴を抽出することができる。このアーキテクチャは、合成された音声が実際の音声と密にマッチするのに十分な情報を持つジェネレータを提供する。実験中,提案したアイデアを代表型ニューラルボコーダ (HiFi-GAN) とエンドツーエンドTSシステム (VITS) に適用した。その結果,提案手法は,hifi-ganでは2.31倍高速で14.5倍,vitsでは1.90倍高速で9.62倍軽量な判別器を用いて,同等の音声品質を達成できることがわかった。オーディオサンプルはhttps://www.kecl.ntt.co.jp/people/kaneko.takuhiro/projects/waveunetd/で入手できる。 In speech synthesis, a generative adversarial network (GAN), training a generator (speech synthesizer) and a discriminator in a min-max game, is widely used to improve speech quality. An ensemble of discriminators is commonly used in recent neural vocoders (e.g., HiFi-GAN) and end-to-end text-to-speech (TTS) systems (e.g., VITS) to scrutinize waveforms from multiple perspectives. Such discriminators allow synthesized speech to adequately approach real speech; however, they require an increase in the model size and computation time according to the increase in the number of discriminators. Alternatively, this study proposes a Wave-U-Net discriminator, which is a single but expressive discriminator with Wave-U-Net architecture. This discriminator is unique; it can assess a waveform in a sample-wise manner with the same resolution as the input signal, while extracting multilevel features via an encoder and decoder with skip connections. 
This architecture provides a generator with sufficiently rich information for the synthesized speech to be closely matched to the real speech. During the experiments, the proposed ideas were applied to a representative neural vocoder (HiFi-GAN) and an end-to-end TTS system (VITS). The results demonstrate that the proposed models can achieve comparable speech quality with a 2.31 times faster and 14.5 times more lightweight discriminator when used in HiFi-GAN and a 1.90 times faster and 9.62 times more lightweight discriminator when used in VITS. Audio samples are available at https://www.kecl.ntt.co.jp/people/kaneko.takuhiro/projects/waveunetd/.	翻訳日:2023-03-27 14:58:08 公開日:2023-03-24
# 磁気共鳴画像の高分解能化のための3プレーヤGAN A Three-Player GAN for Super-Resolution in Magnetic Resonance Imaging ( http://arxiv.org/abs/2303.13900v1 ) ライセンス: Link先を確認	Qi Wang, Lucas Mahler, Julius Steiglechner, Florian Birk, Klaus Scheffler, Gabriele Lohmann	(参考訳) 学習ベース単一画像スーパーレゾリューション(SISR)タスクは2次元画像でよく研究されている。しかし、3D磁気共鳴画像(MRI)のSISRは、主にニューラルネットワークパラメータの増大、メモリ要求の増大、利用可能なトレーニングデータの制限により、2Dと比較して困難である。現在の3次元ボリューム画像のSISR法は、GAN(Generative Adversarial Networks)、特にWasserstein GANsのトレーニング安定性に基づくものである。 2Dドメインの他の一般的なアーキテクチャ、例えばトランスフォーマーモデルでは、大量のトレーニングデータを必要とするため、限られた3Dデータには適さない。しかしながら、wasserstein gansは、グローバル最適に収束せず、ぼやけた結果をもたらす可能性があるため、問題となることがある。本稿では,GANフレームワークに基づく3次元SRの新しい手法を提案する。具体的には、GANトレーニングのバランスをとるためにインスタンスノイズを使用します。さらに,学習過程において,相対論的GAN損失関数と更新特徴抽出器を用いる。本手法は高精度な結果が得られることを示す。また、トレーニングサンプルが極めて少ないことも示しています。特に、以前の研究で通常必要とされる何千ものトレーニングサンプルではなく、30未満のサンプルが必要です。最後に,本モデルによるサンプル外結果の改善を示す。 Learning based single image super resolution (SISR) task is well investigated in 2D images. However, SISR for 3D Magnetics Resonance Images (MRI) is more challenging compared to 2D, mainly due to the increased number of neural network parameters, the larger memory requirement and the limited amount of available training data. Current SISR methods for 3D volumetric images are based on Generative Adversarial Networks (GANs), especially Wasserstein GANs due to their training stability. Other common architectures in the 2D domain, e.g. transformer models, require large amounts of training data and are therefore not suitable for the limited 3D data. However, Wasserstein GANs can be problematic because they may not converge to a global optimum and thus produce blurry results. Here, we propose a new method for 3D SR based on the GAN framework. Specifically, we use instance noise to balance the GAN training. Furthermore, we use a relativistic GAN loss function and an updating feature extractor during the training process. We show that our method produces highly accurate results. 
We also show that the method needs very few training samples: fewer than 30, rather than the thousands typically required in previous studies. Finally, we show improved out-of-sample results produced by our model.	翻訳日:2023-03-27 14:57:35 公開日:2023-03-24
# 動的シナリオにおけるロバストテスト時間適応 Robust Test-Time Adaptation in Dynamic Scenarios ( http://arxiv.org/abs/2303.13899v1 ) ライセンス: Link先を確認	Longhui Yuan, Binhui Xie, Shuang Li	(参考訳) テスト時適応(TTA)は、未ラベルのテストデータストリームのみを用いて、事前訓練されたモデルを分散をテストすることを目的としている。従来のTTA手法のほとんどは、単一あるいは複数のディストリビューションから独立したサンプルデータなど、単純なテストデータストリームで大きな成功を収めている。しかしながら、これらの試みは、環境が徐々に変化し、テストデータが時間とともに相関してサンプリングされるような、自律運転のような現実のアプリケーションの動的シナリオでは失敗する可能性がある。そこで本研究では,実運用テスト時適応 (PTTA) と呼ばれる,実運用テストデータストリームをオンザフライで展開する手法について検討する。そこで我々は,PTTAの複雑なデータストリームに対してロバストテスト時間適応法(RoTTA)を詳述する。より具体的には、正規化統計を推定する頑健なバッチ正規化スキームを提案する。一方、メモリバンクは、時系列や不確実性を考慮したカテゴリバランスデータのサンプリングに利用される。さらに,学習手順を安定させるために,教師・生徒モデルを用いた時間対応型重み付け戦略を考案する。大規模な実験により、RoTTAは相関サンプルデータストリーム上で連続的なテストタイム適応を可能にすることが証明された。私たちのメソッドの実装は簡単で、迅速なデプロイメントに適しています。コードはhttps://github.com/BIT-DA/RoTTAで公開されている。 Test-time adaptation (TTA) intends to adapt the pretrained model to test distributions with only unlabeled test data streams. Most of the previous TTA methods have achieved great success on simple test data streams such as independently sampled data from single or multiple distributions. However, these attempts may fail in dynamic scenarios of real-world applications like autonomous driving, where the environments gradually change and the test data is sampled correlatively over time. In this work, we explore such practical test data streams to deploy the model on the fly, namely practical test-time adaptation (PTTA). To do so, we elaborate a Robust Test-Time Adaptation (RoTTA) method against the complex data stream in PTTA. More specifically, we present a robust batch normalization scheme to estimate the normalization statistics. Meanwhile, a memory bank is utilized to sample category-balanced data with consideration of timeliness and uncertainty. Further, to stabilize the training procedure, we develop a time-aware reweighting strategy with a teacher-student model. 
Extensive experiments show that RoTTA enables continual test-time adaptation on correlatively sampled data streams. Our method is easy to implement, making it a good choice for rapid deployment. The code is publicly available at https://github.com/BIT-DA/RoTTA	翻訳日:2023-03-27 14:57:17 公開日:2023-03-24
# 英国における政府サイバーセキュリティイニシアチブの効果評価 Evaluating the impact of government Cyber Security initiatives in the UK ( http://arxiv.org/abs/2303.13943v1 ) ライセンス: Link先を確認	Adejoke T. Odebade, Elhadj Benkhelifa	(参考訳) サイバーセキュリティイニシアチブは、政府にとって、企業や一般大衆のサイバー衛生を教育し、訓練し、啓発し、促進する大きな機会を提供する。これらのイニシアチブの作成と推進は、政府が国家のサイバー健康を確保するための必要な手段である。ユーザが安全で自信のあるオンラインであることを保証するために、英国政府は、慈善団体のための小さなチャリティーガイド、小規模ビジネスのための小さなビジネスガイド、一般向けに安全なオンライン化、組織のためのサイバーイニシアチブなど、さまざまなユーザのニーズを満たすように設計されたイニシアチブを作成した。しかし、これらのイニシアチブが目的を達成することを保証することは、特に人口に手を差し伸べる場合には、厄介なことだ。したがって、政府は、サイバーセキュリティに対する義務を認識していることを確実にするために、ユーザに連絡する実践的な方法を強化することが不可欠である。この研究は、英国政府のサイバーセキュリティイニシアティブのうち16つを評価し、これらのイニシアチブが失敗した4つの顕著な理由を発見した。これらの理由は、意識と訓練の不足、影響を測定するためのイニシアチブの非評価、行動の変化の不十分、意図された目標に到達するための限られた範囲である。これらの知見に基づく勧告は、これらのイニシアチブを全国および地域レベルで推進することである。 Cyber security initiatives provide immense opportunities for governments to educate, train, create awareness, and promote cyber hygiene among businesses and the general public. Creating and promoting these initiatives are necessary steps governments take to ensure the cyber health of a nation. To ensure users are safe and confident, especially online, the UK government has created initiatives designed to meet the needs of various users such as small charity guide for charity organisations, small business guide for small businesses, get safe online for the general public, and cyber essentials for organisations, among many others. However, ensuring that these initiatives deliver on their objectives can be daunting, especially when reaching out to the whole population. It is, therefore, vital for the government to intensify practical ways of reaching out to users to make sure that they are aware of their obligation to cyber security. This study evaluates sixteen of the UK government's cyber security initiatives and discovers four notable reasons why these initiatives are failing. 
These reasons are insufficient awareness and training, non-evaluation of initiatives to measure impact, insufficient behavioural change, and limited coverage to reach intended targets. The recommendation based on these findings is to promote these initiatives both nationally and at community levels.	翻訳日:2023-03-27 14:50:09 公開日:2023-03-24
# 機械学習による光電子運動量からのフェムト秒パルスパラメータ推定 Femtosecond pulse parameter estimation from photoelectron momenta using machine learning ( http://arxiv.org/abs/2303.13940v1 ) ライセンス: Link先を確認	Tomasz Szołdra, Marcelo F. Ciappina, Nicholas Werby, Philip H. Bucksbaum, Maciej Lewenstein, Jakub Zakrzewski, and Andrew S. Maxwell	(参考訳) ディープラーニングモデルは、画像のようなデータに膨大な解釈能力を提供している。特に、畳み込みニューラルネットワーク(CNN)は、特徴抽出やパラメータ推定といったタスクに対して驚くほど高い能力を示した。ここでは強電界イオン化の光電子スペクトルに対してCNNをテストし、理論データセットで訓練して実験データを「逆変換」する。パルスキャラクタリゼーションは「テストグラウンド」として使われ、具体的には「伝統的な」測定では通常20%の不確かさが生じるレーザー強度を取得する。本稿では,理論データのトレーニングに成功し,実験から一貫した結果を返すために必要な重要なデータ拡張手法(検出器飽和の考慮を含む)について報告する。同じ手順を繰り返すことで、強電界イオン化の様々なシナリオにCNNを適用することができる。予測の不確実性推定を用いて、信頼性のある数パーセントのレーザー強度の不確実性を抽出することができ、これは従来の手法による不確実性よりも一貫して小さい。解釈可能性法を用いることで、ホログラフィック干渉に直接関連しうるレーザー強度に最も敏感な分布の一部を明らかにすることができる。 CNNは、パラメータを抽出する正確で便利な方法を提供し、強電界イオン化スペクトルの新しい解釈ツールを表現している。 Deep learning models have provided huge interpretation power for image-like data. Specifically, convolutional neural networks (CNNs) have demonstrated incredible acuity for tasks such as feature extraction or parameter estimation. Here we test CNNs on strong-field ionization photoelectron spectra, training on theoretical data sets to 'invert' experimental data. Pulse characterization is used as a 'testing ground'; specifically, we retrieve the laser intensity, where 'traditional' measurements typically lead to 20% uncertainty. We report on crucial data augmentation techniques required to successfully train on theoretical data and return consistent results from experiments, including accounting for detector saturation. The same procedure can be repeated to apply CNNs in a range of scenarios for strong-field ionization. Using a predictive uncertainty estimation, reliable laser intensity uncertainties of a few percent can be extracted, which are consistently lower than those given by traditional techniques.
Interpretability methods can reveal the parts of the distribution that are most sensitive to laser intensity, which can be directly associated with holographic interferences. The CNNs employed provide an accurate and convenient way to extract parameters, and represent a novel interpretational tool for strong-field ionization spectra.	翻訳日:2023-03-27 14:49:43 公開日:2023-03-24
# MUG: 理解と生成のベンチマーク MUG: A General Meeting Understanding and Generation Benchmark ( http://arxiv.org/abs/2303.13939v1 ) ライセンス: Link先を確認	Qinglin Zhang, Chong Deng, Jiaqing Liu, Hai Yu, Qian Chen, Wen Wang, Zhijie Yan, Jinglin Liu, Yi Ren, Zhou Zhao	(参考訳) ビデオ会議やオンラインコースから長いビデオ/オーディオ録音を聴くことは極めて非効率である。 ASRシステムは、記録を長文の音声文書に書き起こした後でも、ASRの書き起こしを読むことは、情報の検索を高速化するだけである。キーフレーズ抽出やトピックセグメンテーション,要約など,さまざまなNLPアプリケーションが重要情報の収集において,ユーザの効率を著しく向上させることがわかった。ミーティングシナリオは,これらの言語処理(SLP)機能をデプロイする上で,最も価値のあるシナリオのひとつだ。しかし、これらのSLPタスクに注釈を付けた大規模な公開ミーティングデータセットの欠如は、彼らの進歩を著しく妨げている。 slpの進歩を促進するために,トピックセグメンテーション,トピックレベルおよびセッションレベルの抽出要約,トピックタイトル生成,キーフレーズ抽出,アクションアイテム検出など,幅広いslpタスクのパフォーマンスをベンチマークするために,mug(general meeting understanding and generation benchmark)を確立した。 mugベンチマークを容易にするために,大規模会議データセットであるalimeeting4mugコーパスを構築して公開する。このコーパスは654回録音されたマンダリン会議セッションで,トピックカバレッジが多様であり,会議記録のマニュアル書き起こしにslpタスクのマニュアルアノテーションが組み込まれている。私たちの知る限りでは、AliMeeting4MUG Corpusは規模で最大のミーティングコーパスであり、ほとんどのSLPタスクを促進する。本稿では,本コーパスの詳細な紹介,slpタスクと評価方法,ベースラインシステムとその性能について述べる。 Listening to long video/audio recordings from video conferencing and online courses for acquiring information is extremely inefficient. Even after ASR systems transcribe recordings into long-form spoken language documents, reading ASR transcripts only partly speeds up seeking information. It has been observed that a range of NLP applications, such as keyphrase extraction, topic segmentation, and summarization, significantly improve users' efficiency in grasping important information. The meeting scenario is among the most valuable scenarios for deploying these spoken language processing (SLP) capabilities. However, the lack of large-scale public meeting datasets annotated for these SLP tasks severely hinders their advancement. 
To prompt SLP advancement, we establish a large-scale general Meeting Understanding and Generation Benchmark (MUG) to benchmark the performance of a wide range of SLP tasks, including topic segmentation, topic-level and session-level extractive summarization and topic title generation, keyphrase extraction, and action item detection. To facilitate the MUG benchmark, we construct and release a large-scale meeting dataset for comprehensive long-form SLP development, the AliMeeting4MUG Corpus, which consists of 654 recorded Mandarin meeting sessions with diverse topic coverage, with manual annotations for SLP tasks on manual transcripts of meeting recordings. To the best of our knowledge, the AliMeeting4MUG Corpus is so far the largest meeting corpus in scale and facilitates most SLP tasks. In this paper, we provide a detailed introduction of this corpus, SLP tasks and evaluation methods, baseline systems and their performance.	翻訳日:2023-03-27 14:49:24 公開日:2023-03-24
# 10カ国における国家サイバーセキュリティ戦略の比較研究 A Comparative Study of National Cyber Security Strategies of ten nations ( http://arxiv.org/abs/2303.13938v1 ) ライセンス: Link先を確認	Adejoke T. Odebade, Elhadj Benkhelifa	(参考訳) この研究は、ヨーロッパ(イギリス、フランス、リトアニア、エストニア、スペイン、ノルウェー)、アジア太平洋(シンガポールとオーストラリア)、アメリカ地域(米国とカナダ)の10カ国で公開されている文書のNCSS(National Cybersecurity Strategies)を比較した。この研究は「サイバーセキュリティ」という用語の統一的な理解は存在しないが、NCSSの共通の軌道は、サイバー犯罪との戦いは様々な利害関係者の協力によるものであり、国際協力の必要性が強いことを示している。比較構造とnscフレームワークを用いて、重要な資産の保護、研究開発へのコミットメント、国内外のコラボレーションの改善に類似性を見出した。この研究は、基盤となるサイバーセキュリティフレームワークが統一されていないことが、戦略の構造と内容に相違をもたらすことを示唆している。この研究のNCSSの強みと弱点は、サイバーセキュリティ戦略の開発や更新を計画している国に恩恵をもたらす。この研究は、NCSSを開発する際に戦略開発者が考慮できる推奨事項を提供する。 This study compares the National Cybersecurity Strategies (NCSSs) of publicly available documents of ten nations across Europe (United Kingdom, France, Lithuania, Estonia, Spain, and Norway), Asia-Pacific (Singapore and Australia), and the American region (the United States of America and Canada). The study observed that there is not a unified understanding of the term "Cybersecurity"; however, a common trajectory of the NCSSs shows that the fight against cybercrime is a joint effort among various stakeholders, hence the need for strong international cooperation. Using a comparative structure and an NCSS framework, the research finds similarities in protecting critical assets, commitment to research and development, and improved national and international collaboration. The study finds that the lack of a unified underlying cybersecurity framework leads to a disparity in the structure and contents of the strategies. The strengths and weaknesses of the NCSSs from the research can benefit countries planning to develop or update their cybersecurity strategies. The study gives recommendations that strategy developers can consider when developing an NCSS.	翻訳日:2023-03-27 14:48:55 公開日:2023-03-24
# グラフニューラルネットワークによる粒子物理過程の位相再構成 Topological Reconstruction of Particle Physics Processes using Graph Neural Networks ( http://arxiv.org/abs/2303.13937v1 ) ライセンス: Link先を確認	Lukas Ehrke, John Andrew Raine, Knut Zoch, Manuel Guth, Tobias Golling	(参考訳) 本稿では,粒子の減衰とメッセージパッシンググラフニューラルネットワークの柔軟性を基礎として,中間粒子を含む基礎となる物理過程を再構築する新しい手法であるtopographを提案する。トポグラフは観測された最終状態天体の組合せ的な割り当てを解き、元の母粒子と関連付けるだけでなく、ハード散乱過程における中間粒子の性質とそれに続く崩壊を直接予測する。グラフニューラルネットワークを用いた標準的なコンビネータアプローチや現代的なアプローチと比較すると、グラフの複雑さは再構成されたオブジェクトの数と線形にスケールする。我々は、全ハドロン減衰チャネルにおけるトップクォーク対生成にトポグラフを適用し、標準手法より優れ、最先端の機械学習技術の性能に適合する。 We present a new approach, the Topograph, which reconstructs underlying physics processes, including the intermediary particles, by leveraging underlying priors from the nature of particle physics decays and the flexibility of message passing graph neural networks. The Topograph not only solves the combinatoric assignment of observed final state objects, associating them to their original mother particles, but directly predicts the properties of intermediate particles in hard scatter processes and their subsequent decays. In comparison to standard combinatoric approaches or modern approaches using graph neural networks, which scale exponentially or quadratically, the complexity of Topographs scales linearly with the number of reconstructed objects. We apply Topographs to top quark pair production in the all hadronic decay channel, where we outperform the standard approach and match the performance of the state-of-the-art machine learning technique.	翻訳日:2023-03-27 14:48:39 公開日:2023-03-24
# ソフトウェア開発教育におけるジェネレーティブAIアシスタント Generative AI Assistants in Software Development Education ( http://arxiv.org/abs/2303.13936v1 ) ライセンス: Link先を確認	Christopher Bull, Ahmed Kharrufa	(参考訳) ソフトウェア開発業界は、ソフトウェア開発にジェネレーティブAI(GAI)アシスタントを使用するという、潜在的に破壊的なパラダイムの変化に直面している。 AIはすでにソフトウェアエンジニアリングのさまざまな分野で使用されているが、GitHub CopilotやChatGPTといったGAIテクノロジは、多くの人々の想像力(と恐怖)に火をつけている。業界がこれらのテクノロジをどのように採用し、適応するのかは不明だが、microsoft(github、bing)やgoogle(bard)といった大企業による、より広範な業界への統合の動きは、意図と方向性を明確に示している。私たちは、現在の実践と課題を理解するために、業界専門家と探索的なインタビューを行い、ソフトウェア開発教育の将来というビジョンに組み込んで、教育的なレコメンデーションを実施しました。 The software development industry is amid another potentially disruptive paradigm change--adopting the use of generative AI (GAI) assistants for software development. Whilst AI is already used in various areas of software engineering, GAI technologies, such as GitHub Copilot and ChatGPT, have ignited the imaginations (and fears) of many people. Whilst it is unclear how the industry will adopt and adapt to these technologies, the move to integrate these technologies into the wider industry by large software companies, such as Microsoft (GitHub, Bing) and Google (Bard), is a clear indication of intent and direction. We performed exploratory interviews with industry professionals to understand current practices and challenges, which we incorporate into our vision of a future of software development education and make some pedagogical recommendations.	翻訳日:2023-03-27 14:48:23 公開日:2023-03-24
# DisC-Diff:マルチコントラストMRI超解像のための遠方拡散モデル DisC-Diff: Disentangled Conditional Diffusion Model for Multi-Contrast MRI Super-Resolution ( http://arxiv.org/abs/2303.13933v1 ) ライセンス: Link先を確認	Ye Mao, Lan Jiang, Xi Chen, and Chao Li	(参考訳) マルチコントラストMRI(Multi-Contrast MRI)は、脳組織のコントラストに基づいて神経疾患を特徴づける最も一般的な管理ツールである。しかし、高分解能MRIスキャンの取得には時間がかかり、特定の条件下では不可能である。そこで, マルチコントラスト超解像法は, マルチコントラストMRIの相補的情報を活用することで, 低コントラストの品質を向上させるために開発された。現在のディープラーニングに基づく超解法は、復元の不確実性の推定とモード崩壊の回避に限界がある。拡散モデルは画像強調のための有望なアプローチとして現れてきたが、マルチコントラストMRIによる複数の条件間の複雑な相互作用を捉えることは、臨床応用の課題である。本稿では,マルチコントラスト脳MRI超解像のための不整合拡散モデルDisC-Diffを提案する。拡散モデルのサンプリングベース生成と単純な目的関数を利用して、修復における不確実性を効果的に推定し、安定した最適化プロセスを保証する。さらに,DEC-Diffは,マルチコントラストMRIからの補完的情報をフル活用し,マルチコントラスト入力の複数の条件下でのモデル解釈を改善する。 578個の正常脳を含むIXIデータセットと316個の病理脳を含む臨床データセットの2つのデータセットに対するDisC-Diffの有効性を検証した。実験の結果,DisC-Diffは,他の最先端手法よりも定量的にも視覚的にも優れていた。 Multi-contrast magnetic resonance imaging (MRI) is the most common management tool used to characterize neurological disorders based on brain tissue contrasts. However, acquiring high-resolution MRI scans is time-consuming and infeasible under specific conditions. Hence, multi-contrast super-resolution methods have been developed to improve the quality of low-resolution contrasts by leveraging complementary information from multi-contrast MRI. Current deep learning-based super-resolution methods have limitations in estimating restoration uncertainty and avoiding mode collapse. Although the diffusion model has emerged as a promising approach for image enhancement, capturing complex interactions between multiple conditions introduced by multi-contrast MRI super-resolution remains a challenge for clinical applications. In this paper, we propose a disentangled conditional diffusion model, DisC-Diff, for multi-contrast brain MRI super-resolution. It utilizes the sampling-based generation and simple objective function of diffusion models to estimate uncertainty in restorations effectively and ensure a stable optimization process. 
Moreover, DisC-Diff leverages a disentangled multi-stream network to fully exploit complementary information from multi-contrast MRI, improving model interpretation under multiple conditions of multi-contrast inputs. We validated the effectiveness of DisC-Diff on two datasets: the IXI dataset, which contains 578 normal brains, and a clinical dataset with 316 pathological brains. Our experimental results demonstrate that DisC-Diff outperforms other state-of-the-art methods both quantitatively and visually.	翻訳日:2023-03-27 14:48:09 公開日:2023-03-24
# ICASSP 2023総合会議理解・生成チャレンジ(MUG)の概要 Overview of the ICASSP 2023 General Meeting Understanding and Generation Challenge (MUG) ( http://arxiv.org/abs/2303.13932v1 ) ライセンス: Link先を確認	Qinglin Zhang, Chong Deng, Jiaqing Liu, Hai Yu, Qian Chen, Wen Wang, Zhijie Yan, Jinglin Liu, Yi Ren, Zhou Zhao	(参考訳) ICASSP 2023 General Meeting Understanding and Generation Challenge (MUG) は、会議において重要な情報を把握する上で、SLPアプリケーションがユーザの効率を向上させるために重要であるため、会議記録の書き起こしに対する幅広い音声言語処理(SLP)研究を促進することに焦点を当てている。 MUGはトピックセグメンテーション、トピックレベルおよびセッションレベルの抽出要約、トピックタイトル生成、キーフレーズ抽出、アクションアイテム検出の5つのトラックで構成されている。 MUGを容易にするために,大規模なミーティングデータセットであるAliMeeting4MUG Corpusを構築し,リリースする。 The ICASSP 2023 General Meeting Understanding and Generation Challenge (MUG) focuses on prompting a wide range of spoken language processing (SLP) research on meeting transcripts, as SLP applications are critical to improving users' efficiency in grasping important information in meetings. MUG comprises five tracks: topic segmentation, topic-level and session-level extractive summarization, topic title generation, keyphrase extraction, and action item detection. To facilitate MUG, we construct and release a large-scale meeting dataset, the AliMeeting4MUG Corpus.	翻訳日:2023-03-27 14:47:42 公開日:2023-03-24
# MSdocTr-Lite: フルページマルチスクリプト手書き文字認識のためのリテラル変換器 MSdocTr-Lite: A Lite Transformer for Full Page Multi-script Handwriting Recognition ( http://arxiv.org/abs/2303.13931v1 ) ライセンス: Link先を確認	Marwa Dhiaf, Ahmed Cheikh Rouhou, Yousri Kessentini, Sinda Ben Salem	(参考訳) トランスフォーマーは、長距離表現能力のため、様々なパターン認識タスクにおいて急速に支配的なアーキテクチャとなっている。しかし、トランスフォーマーはデータハングリーモデルであり、トレーニングには大きなデータセットが必要です。手書き文字認識(HTR)では、大量のラベル付きデータを収集することは複雑で高価な作業である。本稿では,フルページマルチスクリプト手書き文字認識のためのライトトランスアーキテクチャを提案する。提案モデルには3つの利点がある: まず、データ不足の一般的な問題を解決するために、ほとんどのHTRパブリックデータセットにおいて、外部データを必要とせずに、適切な量のデータに基づいてトレーニングできるライトトランスフォーマーモデルを提案する。第二に、カリキュラムの学習戦略のおかげでページレベルでの読み込み順序を学習でき、行分割エラーを避け、より大きなコンテキストを活用し、コストのかかるセグメンテーションアノテーションの必要性を減らすことができる。第3に、ページレベルのラベル付き画像のみを使用して、簡単なトランスファー学習プロセスを適用することで、他のスクリプトに容易に適応できる。異なるスクリプト(フランス語、英語、スペイン語、アラビア語)の異なるデータセットに関する広範な実験は、提案モデルの有効性を示している。 The Transformer has quickly become the dominant architecture for various pattern recognition tasks due to its capacity for long-range representation. However, transformers are data-hungry models and need large datasets for training. In Handwritten Text Recognition (HTR), collecting a massive amount of labeled data is a complicated and expensive task. In this paper, we propose a lite transformer architecture for full-page multi-script handwriting recognition. The proposed model comes with three advantages: First, to solve the common problem of data scarcity, we propose a lite transformer model that can be trained on a reasonable amount of data, which is the case of most HTR public datasets, without the need for external data. Second, it can learn the reading order at page-level thanks to a curriculum learning strategy, allowing it to avoid line segmentation errors, exploit a larger context and reduce the need for costly segmentation annotations. Third, it can be easily adapted to other scripts by applying a simple transfer-learning process using only page-level labeled images. 
Extensive experiments on different datasets with different scripts (French, English, Spanish, and Arabic) show the effectiveness of the proposed model.	翻訳日:2023-03-27 14:47:31 公開日:2023-03-24
# 粒子平均場変動ベイズ Particle Mean Field Variational Bayes ( http://arxiv.org/abs/2303.13930v1 ) ライセンス: Link先を確認	Minh-Ngoc Tran, Paco Tseng, Robert Kohn	(参考訳) 平均場変分ベイズ法 (MFVB) はベイズ推論において最も計算効率のよい手法の1つである。しかし、その用途は共役前のモデルや解析計算を必要とするモデルに限られている。本稿では,MFVB法の適用性を大幅に拡大する粒子ベースMFVB法を提案する。本研究では,wasserstein勾配流とlangevin拡散ダイナミクスの結合を利用して,新しい手法の理論的基礎を確立し,ベイズロジスティック回帰,確率的ボラティリティ,ディープニューラルネットワークを用いた手法の有効性を示す。 The Mean Field Variational Bayes (MFVB) method is one of the most computationally efficient techniques for Bayesian inference. However, its use has been restricted to models with conjugate priors or those that require analytical calculations. This paper proposes a novel particle-based MFVB approach that greatly expands the applicability of the MFVB method. We establish the theoretical basis of the new method by leveraging the connection between Wasserstein gradient flows and Langevin diffusion dynamics, and demonstrate the effectiveness of this approach using Bayesian logistic regression, stochastic volatility, and deep neural networks.	翻訳日:2023-03-27 14:47:14 公開日:2023-03-24
# オフライン模倣学習のための最適輸送 Optimal Transport for Offline Imitation Learning ( http://arxiv.org/abs/2303.13971v1 ) ライセンス: Link先を確認	Yicheng Luo, Zhengyao Jiang, Samuel Cohen, Edward Grefenstette, Marc Peter Deisenroth	(参考訳) 大規模データセットの出現に伴い、オフライン強化学習(rl)は、実環境と対話することなく、優れた意思決定ポリシーを学ぶための有望なフレームワークである。しかし、オフラインのRLでは、報酬アノテートが必要なため、報酬エンジニアリングが難しい場合や、報酬アノテートを取得する場合など、現実的な課題が生じる。本稿では,オフライン軌道に報酬を割り当てるアルゴリズムであるOptimal Transport Reward labeling (OTR)を紹介する。 OTRの鍵となる考え方は、データセット内のラベルなし軌跡と専門家のデモンストレーションとの間の最適なアライメントを計算するために最適なトランスポートを使用することで、報酬として解釈可能な類似度測定値を取得し、オフラインのRLアルゴリズムでポリシーを学ぶことができることである。 OTRは実装が簡単で、計算効率が良い。 D4RL ベンチマークでは,単一実演を用いた OTR がオフライン RL の性能に一定の精度で一致することを示す。 With the advent of large datasets, offline reinforcement learning (RL) is a promising framework for learning good decision-making policies without the need to interact with the real environment. However, offline RL requires the dataset to be reward-annotated, which presents practical challenges when reward engineering is difficult or when obtaining reward annotations is labor-intensive. In this paper, we introduce Optimal Transport Reward labeling (OTR), an algorithm that assigns rewards to offline trajectories, with a few high-quality demonstrations. OTR's key idea is to use optimal transport to compute an optimal alignment between an unlabeled trajectory in the dataset and an expert demonstration to obtain a similarity measure that can be interpreted as a reward, which can then be used by an offline RL algorithm to learn the policy. OTR is easy to implement and computationally efficient. On D4RL benchmarks, we show that OTR with a single demonstration can consistently match the performance of offline RL with ground-truth rewards.	翻訳日:2023-03-27 14:41:50 公開日:2023-03-24
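The reward-labelling idea in the OTR abstract above can be sketched roughly as follows. This is a minimal illustration only, not the authors' implementation: the squared Euclidean cost, the entropic Sinkhorn solver, the uniform marginals, and the names `sinkhorn_plan` and `ot_rewards` are all assumptions made here, since the abstract does not specify them.

```python
import math


def sinkhorn_plan(cost, epsilon=0.1, n_iters=200):
    """Entropy-regularised OT plan between two uniform distributions (assumed solver)."""
    n, m = len(cost), len(cost[0])
    kernel = [[math.exp(-c / epsilon) for c in row] for row in cost]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the uniform marginals.
        u = [(1.0 / n) / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [(1.0 / m) / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]


def ot_rewards(trajectory, expert, epsilon=0.1):
    """Per-step reward: negative transport cost assigned to each trajectory state."""
    # Hypothetical cost choice: squared Euclidean distance between states.
    cost = [[sum((x - y) ** 2 for x, y in zip(s, e)) for e in expert] for s in trajectory]
    plan = sinkhorn_plan(cost, epsilon)
    return [-sum(p * c for p, c in zip(plan[i], cost[i])) for i in range(len(trajectory))]


# Toy 2-D states: one trajectory stays close to the expert, the other does not.
expert = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
near = [(0.0, 0.1), (1.0, 0.1), (2.0, 0.1)]
far = [(0.0, 1.5), (1.0, 1.5), (2.0, 1.5)]
print(sum(ot_rewards(near, expert)), sum(ot_rewards(far, expert)))
```

In this toy setup the trajectory that stays close to the demonstration receives the higher total reward; the labelled rewards could then be passed to any offline RL algorithm, as the abstract describes.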
# Si/SiGe構造における量子ドットに影響を及ぼす1/f電荷雑音のシミュレーション Simulation of 1/f charge noise affecting a quantum dot in a Si/SiGe structure ( http://arxiv.org/abs/2303.13968v1 ) ライセンス: Link先を確認	Marcin Kępa, Niels Focke, Łukasz Cywiński, Jan A. Krzywda	(参考訳) コヒーレントスピン制御に必要な磁場勾配が存在するため、シリコン量子ドット内の単一電子スピン量子ビットの位相緩和はしばしば $1/f$ 電荷ノイズによって支配される。現実的なSi/SiGe構造におけるゲート量子ドット中の電子の基底状態エネルギーのゆらぎを理論的に検討する。電荷ノイズは半導体-酸化物界面で捕捉された電荷の運動に起因すると仮定する。捕捉された電荷の密度の現実的な範囲 $\rho \sim 10^{10}$ cm$^{-2}$ と、それらの電荷の等方的に分布する変位の典型的な長さスケール $\delta r \leq 1$ nm を考え、ノイズスペクトルの振幅と形状が、類似した構造に関する最近の実験で再構成されたスペクトルとよく一致するペア $(\rho,\delta r)$ を特定する。 Due to the presence of the magnetic field gradient needed for coherent spin control, dephasing of single-electron spin qubits in silicon quantum dots is often dominated by $1/f$ charge noise. We theoretically investigate fluctuations of the ground-state energy of an electron in a gated quantum dot in a realistic Si/SiGe structure. We assume that the charge noise is caused by the motion of charges trapped at the semiconductor-oxide interface. We consider a realistic range of trapped charge densities, $\rho \sim 10^{10}$ cm$^{-2}$, and typical length scales of isotropically distributed displacements of these charges, $\delta r \leq 1$ nm, and identify pairs $(\rho,\delta r)$ for which the amplitude and shape of the noise spectrum are in good agreement with spectra reconstructed in recent experiments on similar structures.	翻訳日:2023-03-27 14:41:31 公開日:2023-03-24
# グラフ学習のための2レベル最適化による勾配不足 Gradient scarcity with Bilevel Optimization for Graph Learning ( http://arxiv.org/abs/2303.13964v1 ) ライセンス: Link先を確認	Hashem Ghanem (IMB), Samuel Vaiter (CNRS, JAD), Nicolas Keriven (CNRS, IRISA)	(参考訳) 半教師付き環境下でのグラフ学習の一般的な問題は勾配不足と呼ばれる。すなわち、ノードのサブセットの損失を最小限にすることでグラフを学習すると、ラベル付けされていないノード間のエッジがゼロ勾配を受ける。この現象は、グラフとグラフニューラルネットワーク(GCN)の重みを共同最適化アルゴリズムで最適化する際に初めて説明された。本研究では,この現象を正確に数学的に解析し,二段階最適化においても問題パラメータ間の追加依存性が存在することを証明した。 GCNの勾配の不足は、その有限受容場によって生じるが、ラプラシア正規化モデルにおいても、勾配の振幅がラベル付きノードとの距離によって指数関数的に減少することを示す。この問題を緩和するために,グラフ・ツー・グラフモデル(G2G)を用いた潜時グラフ学習,グラフに先行構造を課すグラフ正規化,あるいは直径を縮小した元のグラフよりも大きなグラフを最適化することを提案する。合成および実データを用いた実験により,提案手法の有効性が検証された。 A common issue in graph learning under the semi-supervised setting is referred to as gradient scarcity. That is, learning graphs by minimizing a loss on a subset of nodes causes edges between unlabelled nodes that are far from labelled ones to receive zero gradients. The phenomenon was first described when optimizing the graph and the weights of a Graph Neural Network (GCN) with a joint optimization algorithm. In this work, we give a precise mathematical characterization of this phenomenon, and prove that it also emerges in bilevel optimization, where additional dependency exists between the parameters of the problem. While for GCNs gradient scarcity occurs due to their finite receptive field, we show that it also occurs with the Laplacian regularization model, in the sense that gradients amplitude decreases exponentially with distance to labelled nodes. To alleviate this issue, we study several solutions: we propose to resort to latent graph learning using a Graph-to-Graph model (G2G), graph regularization to impose a prior structure on the graph, or optimizing on a larger graph than the original one with a reduced diameter. Our experiments on synthetic and real datasets validate our analysis and prove the efficiency of the proposed solutions.	翻訳日:2023-03-27 14:41:17 公開日:2023-03-24
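The finite-receptive-field explanation of gradient scarcity in the abstract above can be illustrated with a small numerical check. This is a simplified sketch under stated assumptions, not the paper's analysis: it uses a path graph, linear propagation with self-loops and no learned weights or normalisation, a squared loss on a single labelled node, and central finite differences; the helper names are hypothetical.

```python
def propagate(adj, h):
    """One linear message-passing step: each node sums its weighted neighbours."""
    n = len(adj)
    return [sum(adj[i][k] * h[k] for k in range(n)) for i in range(n)]


def path_adjacency(weights):
    """Weighted path graph 0-1-...-n-1 with unit self-loops; weights[i] is edge (i, i+1)."""
    n = len(weights) + 1
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        adj[i][i] = 1.0
    for i, w in enumerate(weights):
        adj[i][i + 1] = adj[i + 1][i] = w
    return adj


def labelled_loss(weights, features, labelled, targets, n_layers=2):
    """Squared loss evaluated only on the labelled nodes (semi-supervised setting)."""
    adj = path_adjacency(weights)
    h = list(features)
    for _ in range(n_layers):
        h = propagate(adj, h)
    return sum((h[i] - t) ** 2 for i, t in zip(labelled, targets))


def edge_gradients(weights, features, labelled, targets, n_layers=2, step=1e-6):
    """Central finite-difference gradient of the loss with respect to each edge weight."""
    grads = []
    for e in range(len(weights)):
        up, down = list(weights), list(weights)
        up[e] += step
        down[e] -= step
        diff = (labelled_loss(up, features, labelled, targets, n_layers)
                - labelled_loss(down, features, labelled, targets, n_layers))
        grads.append(diff / (2 * step))
    return grads


# Eight nodes on a path; only node 0 is labelled and the model has two layers.
grads = edge_gradients([0.5] * 7, [float(i + 1) for i in range(8)], [0], [0.0])
print(grads)
```

With two propagation steps, only the two edges within two hops of the labelled node receive a non-zero gradient; every edge further along the path gets exactly zero, which is the scarcity effect the abstract attributes to the finite receptive field.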
# ステレオシーン:BEV支援のステレオマッチングパワーで3Dセマンティックシーンが完成 StereoScene: BEV-Assisted Stereo Matching Empowers 3D Semantic Scene Completion ( http://arxiv.org/abs/2303.13959v1 ) ライセンス: Link先を確認	Bohan Li, Yasheng Sun, Xin Jin, Wenjun Zeng, Zheng Zhu, Xiaoefeng Wang, Yunpeng Zhang, James Okae, Hang Xiao, Dalong Du	(参考訳) 3Dセマンティックシーン補完(SSC)は、不完全な観察から密集した3Dシーンを推測する必要がある不適切な課題である。従来の手法では、3Dの幾何学的入力を明示的に取り入れるか、単眼のRGB画像の後方で学習した3Dに頼っていた。しかし、LiDARのような3Dセンサーは高価で侵入性があり、モノクラーカメラは固有の曖昧さのために正確な幾何学をモデル化する上で困難に直面している。本研究では,外部の3dセンサを使わずに,軽量カメラ入力を最大限に活用する3dセマンティックシーン補完(ssc)のためのステレオセンシングを提案する。私たちの重要な洞察は、ステレオマッチングを利用して幾何学的曖昧さを解決することです。未マッチング領域におけるロバスト性を改善するため,リッチな文脈情報による幻覚能力を高めるために,鳥眼ビュー(BEV)表現を導入する。ステレオおよびBEV表現の上に、相互インタラクティブアグリゲーション(MIA)モジュールを慎重に設計し、そのパワーを完全に解放する。具体的には、信頼度再重み付けを付加した双方向相互作用変換器(BIT)を用いて相互誘導による信頼性予測を行い、二重体積集約(DVA)モジュールは相補的な集約を容易にするように設計されている。 semantickittiの実験結果は、提案されたステレオシーンが最先端のカメラベース手法を上回り、相対的に26.9%、セマンティクスが38.6%改善していることを示している。 3D semantic scene completion (SSC) is an ill-posed task that requires inferring a dense 3D scene from incomplete observations. Previous methods either explicitly incorporate 3D geometric input or rely on learnt 3D prior behind monocular RGB images. However, 3D sensors such as LiDAR are expensive and intrusive while monocular cameras face challenges in modeling precise geometry due to the inherent ambiguity. In this work, we propose StereoScene for 3D Semantic Scene Completion (SSC), which explores taking full advantage of light-weight camera inputs without resorting to any external 3D sensors. Our key insight is to leverage stereo matching to resolve geometric ambiguity. To improve its robustness in unmatched areas, we introduce bird's-eye-view (BEV) representation to inspire hallucination ability with rich context information. On top of the stereo and BEV representations, a mutual interactive aggregation (MIA) module is carefully devised to fully unleash their power. 
Specifically, a Bi-directional Interaction Transformer (BIT) augmented with confidence re-weighting is used to encourage reliable prediction through mutual guidance while a Dual Volume Aggregation (DVA) module is designed to facilitate complementary aggregation. Experimental results on SemanticKITTI demonstrate that the proposed StereoScene outperforms the state-of-the-art camera-based methods by a large margin with a relative improvement of 26.9% in geometry and 38.6% in semantic.	翻訳日:2023-03-27 14:40:56 公開日:2023-03-24
# 量子および半量子通信プロトコルの強化 Boosted quantum and semi-quantum communication protocols ( http://arxiv.org/abs/2303.13958v1 ) ライセンス: Link先を確認	Rajni Bala, Sooryansh Asthana, V. Ravishankar	(参考訳) 準備・測定方式に基づくセキュアな量子通信プロトコルは、相互に偏りのないベースを用いる。これらのプロトコルでは、さまざまな参加者が異なるベースで測定する多数の実行が、単に無駄になってしまう。本稿では,鍵生成規則の適切な設計により,そのような実行回数を減らすことができることを示す。これにより、キー生成速度(KGR)が大幅に増加する。本稿では,高次元量子システムで符号化された効果的な量子ビットを用いて,量子鍵分散プロトコルと半量子鍵分散プロトコルを提案する。いずれも資源として絡み合った状態の準備を要求せず、比較的大量の情報を転送できるため、我々の提案は実験的に追求する価値があると信じている。 Secure quantum communication protocols based on prepare-and-measure scheme employ mutually unbiased bases. In these protocols, a large number of runs, in which different participants measure in different bases, simply go wasted. In this paper, we show that it is possible to reduce the number of such runs by a suitable design of the key generation rule. This results in a significant increase in the key generation rate (KGR). We illustrate this advantage by proposing quantum key distribution protocols and semi-quantum key distribution protocols by employing effective qubits encoded in higher dimensional quantum systems. Since none of them demands the preparation of entangled states as resources and a relatively large amount of information can be transferred, we believe that our proposals are worth pursuing experimentally.	翻訳日:2023-03-27 14:40:13 公開日:2023-03-24
# 分散スパースブロック符号のための因子 Factorizers for Distributed Sparse Block Codes ( http://arxiv.org/abs/2303.13957v1 ) ライセンス: Link先を確認	Michael Hersche, Aleksandar Terzic, Geethan Karunaratne, Jovin Langenegger, Angéline Pouget, Giovanni Cherubini, Luca Benini, Abu Sebastian, Abbas Rahimi	(参考訳) 分散スパースブロック符号(SBC)は固定幅ベクトルを用いてシンボルデータ構造を符号化し操作するためのコンパクトな表現を示す。しかし、大きな課題の1つは、可能なすべての組み合わせを探索することなく、そのようなデータ構造を構成要素に切り離し、あるいは分解することである。この因子化は、現代のニューラルネットワークを用いてクエリベクトルを生成するときの知覚的不確実性や近似によってシンボル表現が緩和されるノイズの多いSBCによってクエリされるとより困難になる。これらの課題に対処するために,我々はまず,GSBCと呼ばれるより柔軟で一般化されたSBCを分解する高速かつ高精度な手法を提案する。反復分解器はしきい値に基づく非線形活性化,条件付きランダムサンプリング,$\ell_\infty$に基づく類似度メトリックを導入する。そのランダムサンプリング機構と重ね合わせの探索の組み合わせは、GSBCのバンドル能力まで経験的な観察と一致するデコードイテレーションの期待数を解析的に決定することができる。第二に,深層畳み込みニューラルネットワーク (CNN) を用いて生成したノイズの多い積ベクトルを問合せした場合,提案手法は高い精度を維持する。これは、CNNの大規模完全連結層(FCL)を置き換えることで、C個の訓練可能なクラスベクターまたは属性の組み合わせは、それぞれ$\sqrt[\leftroot{-2}\uproot{2}F]{C}$個の固定コードベクタを持つF個の因子コードブックを持つファクタライザによって暗黙的に表現できる。我々は,新しい損失関数を持つCNNの分類層に,因子化器を柔軟に統合する手法を提案する。 CIFAR-100, ImageNet-1K, RAVENデータセット上での4つの深層CNNアーキテクチャの実現可能性を示す。すべてのユースケースにおいて、パラメータと操作の数はFCLに比べて大幅に削減される。 Distributed sparse block codes (SBCs) exhibit compact representations for encoding and manipulating symbolic data structures using fixed-width vectors. One major challenge however is to disentangle, or factorize, such data structures into their constituent elements without having to search through all possible combinations. This factorization becomes more challenging when queried by noisy SBCs wherein symbol representations are relaxed due to perceptual uncertainty and approximations made when modern neural networks are used to generate the query vectors. To address these challenges, we first propose a fast and highly accurate method for factorizing a more flexible and hence generalized form of SBCs, dubbed GSBCs. Our iterative factorizer introduces a threshold-based nonlinear activation, a conditional random sampling, and an $\ell_\infty$-based similarity metric. 
Its random sampling mechanism in combination with the search in superposition allows to analytically determine the expected number of decoding iterations, which matches the empirical observations up to the GSBC's bundling capacity. Secondly, the proposed factorizer maintains its high accuracy when queried by noisy product vectors generated using deep convolutional neural networks (CNNs). This facilitates its application in replacing the large fully connected layer (FCL) in CNNs, whereby C trainable class vectors, or attribute combinations, can be implicitly represented by our factorizer having F-factor codebooks, each with $\sqrt[\leftroot{-2}\uproot{2}F]{C}$ fixed codevectors. We provide a methodology to flexibly integrate our factorizer in the classification layer of CNNs with a novel loss function. We demonstrate the feasibility of our method on four deep CNN architectures over CIFAR-100, ImageNet-1K, and RAVEN datasets. In all use cases, the number of parameters and operations are significantly reduced compared to the FCL.	翻訳日:2023-03-27 14:39:54 公開日:2023-03-24
# PIAT:パラメータ補間に基づく画像分類のための逆学習 PIAT: Parameter Interpolation based Adversarial Training for Image Classification ( http://arxiv.org/abs/2303.13955v1 ) ライセンス: Link先を確認	Kun He, Xin Liu, Yichen Yang, Zhou Qin, Weigao Wen, Hui Xue, John E. Hopcroft	(参考訳) 敵の攻撃に対して最も効果的なアプローチは、敵の訓練である。しかし, 既存の対人訓練法は, 防御効果を低下させ, トレーニング過程において明らかに振動や過度に適合する問題を示す。本研究では,パラメータ補間に基づく適応学習(PIAT)と呼ばれる新しいフレームワークを提案する。具体的には、各エポックの終わりにpiatはモデルパラメータを前と現在のエポックのパラメータの補間としてチューニングする。さらに、正規化平均正方形誤差(NMSE)を用いて、クリーンかつ対角的な例を整列することにより、ロバスト性をさらに向上することを提案する。他の正規化法と比較して、NMSEは絶対等級よりもロジットの相対等級に重点を置いている。いくつかのベンチマークデータセットと各種ネットワークに対する大規模な実験により,本手法はモデルの堅牢性を顕著に改善し,一般化誤差を低減できることが示された。さらに,我々のフレームワークは汎用的で,他の対向訓練手法と組み合わせることで,ロバストな精度をさらに高めることができる。 Adversarial training has been demonstrated to be the most effective approach to defend against adversarial attacks. However, existing adversarial training methods show apparent oscillations and overfitting issue in the training process, degrading the defense efficacy. In this work, we propose a novel framework, termed Parameter Interpolation based Adversarial Training (PIAT), that makes full use of the historical information during training. Specifically, at the end of each epoch, PIAT tunes the model parameters as the interpolation of the parameters of the previous and current epochs. Besides, we suggest to use the Normalized Mean Square Error (NMSE) to further improve the robustness by aligning the clean and adversarial examples. Compared with other regularization methods, NMSE focuses more on the relative magnitude of the logits rather than the absolute magnitude. Extensive experiments on several benchmark datasets and various networks show that our method could prominently improve the model robustness and reduce the generalization error. Moreover, our framework is general and could further boost the robust accuracy when combined with other adversarial training methods.	翻訳日:2023-03-27 14:39:20 公開日:2023-03-24
# AssetField: 地平面表現におけるアセットマイニングと再構成 AssetField: Assets Mining and Reconfiguration in Ground Feature Plane Representation ( http://arxiv.org/abs/2303.13953v1 ) ライセンス: Link先を確認	Yuanbo Xiangli, Linning Xu, Xingang Pan, Nanxuan Zhao, Bo Dai, Dahua Lin	(参考訳) 屋内環境も屋外環境も本質的に構造的で反復的である。従来のモデリングパイプラインでは、ユニークなオブジェクトテンプレートを格納するアセットライブラリが維持されている。そこで本研究では,テンプレート特徴パッチを格納したアセットライブラリを教師なしで構築できる,シーンを表現するオブジェクト認識基底特徴平面のセットを学習するニューラルシーン表現であるアセットフィールドを提案する。オブジェクトの編集に空間点を問うためにオブジェクトマスクを必要とする既存の方法とは異なり、地上特徴平面表現は鳥眼ビューのシーンを自然に視覚化し、オブジェクト上の様々な操作(例えば、翻訳、複製、変形)で新しいシーンを構成することができる。テンプレート機能パッチにより、多数の繰り返しアイテムを持つシーンでグループ編集が有効になり、オブジェクト個人に対する反復的な作業が回避される。 AssetFieldは新規ビュー合成のための競争性能を達成するだけでなく、新しいシーン構成のためのリアルレンダリングを生成する。 Both indoor and outdoor environments are inherently structured and repetitive. Traditional modeling pipelines keep an asset library storing unique object templates, which is both versatile and memory efficient in practice. Inspired by this observation, we propose AssetField, a novel neural scene representation that learns a set of object-aware ground feature planes to represent the scene, where an asset library storing template feature patches can be constructed in an unsupervised manner. Unlike existing methods which require object masks to query spatial points for object editing, our ground feature plane representation offers a natural visualization of the scene in the bird-eye view, allowing a variety of operations (e.g. translation, duplication, deformation) on objects to configure a new scene. With the template feature patches, group editing is enabled for scenes with many recurring items to avoid repetitive work on object individuals. We show that AssetField not only achieves competitive performance for novel-view synthesis but also generates realistic renderings for new scene configurations.	翻訳日:2023-03-27 14:39:03 公開日:2023-03-24
# CCL:LiDAR位置認識のための連続的コントラスト学習 CCL: Continual Contrastive Learning for LiDAR Place Recognition ( http://arxiv.org/abs/2303.13952v1 ) ライセンス: Link先を確認	Jiafeng Cui, Xieyuanli Chen	(参考訳) 位置認識は、ロボットや自動運転アプリケーションのためのループクローズとグローバルローカライズにおいて、必須かつ困難なタスクである。近年のディープラーニング技術の発展により,LiDAR位置認識(LPR)の性能は大幅に向上した。しかし、現在のディープラーニングベースの手法は、一般化能力の低さと破滅的な忘れることの2つの大きな問題に悩まされている。本稿では,大惨な忘れの問題に対処し,LPRアプローチの堅牢性を改善するために,CCLという連続的なコントラスト学習手法を提案する。我々のCCLは、コントラスト的特徴プールを構築し、コントラスト的損失を利用して、より移動可能な場所表現を訓練する。新たな環境に移行すると、CCLはコントラストメモリバンクを継続的にレビューし、新しいデータから新しい場所を認識することを継続的に学習しながら、過去のデータの検索能力を維持するために分布ベースの知識蒸留を適用します。我々は3つの異なるLPR手法を用いてオックスフォード、MulRan、PNVデータセットに対するアプローチを徹底的に評価した。実験の結果,我々のCCLは,異なる環境における異なる手法の性能を常に改善し,最先端の継続的学習法よりも優れていた。このメソッドの実装はhttps://github.com/cloudcjf/cclでリリースされた。 Place recognition is an essential and challenging task in loop closing and global localization for robotics and autonomous driving applications. Benefiting from the recent advances in deep learning techniques, the performance of LiDAR place recognition (LPR) has been greatly improved. However, current deep learning-based methods suffer from two major problems: poor generalization ability and catastrophic forgetting. In this paper, we propose a continual contrastive learning method, named CCL, to tackle the catastrophic forgetting problem and generally improve the robustness of LPR approaches. Our CCL constructs a contrastive feature pool and utilizes contrastive loss to train more transferable representations of places. When transferred into new environments, our CCL continuously reviews the contrastive memory bank and applies a distribution-based knowledge distillation to maintain the retrieval ability of the past data while continually learning to recognize new places from the new data. We thoroughly evaluate our approach on Oxford, MulRan, and PNV datasets using three different LPR methods. 
The experimental results show that our CCL consistently improves the performance of different methods in different environments outperforming the state-of-the-art continual learning method. The implementation of our method has been released at https://github.com/cloudcjf/CCL.	翻訳日:2023-03-27 14:38:43 公開日:2023-03-24
# ナレッジグラフ: 機会と課題 Knowledge Graphs: Opportunities and Challenges ( http://arxiv.org/abs/2303.13948v1 ) ライセンス: Link先を確認	Ciyuan Peng, Feng Xia, Mehdi Naseriparsa, Francesco Osborne	(参考訳) 人工知能(AI)とビッグデータの爆発的成長により、膨大な量の知識を適切に整理し、表現することが極めて重要である。グラフデータとして、知識グラフは現実世界の知識を蓄積し伝達する。知識グラフは複雑な情報を効果的に表現していることがよく認識されており、近年は学術や産業の注目を集めている。そこで本稿では,知識グラフの理解を深めるために,この分野を体系的に概観する。具体的には、知識グラフの機会と課題に焦点を当てる。まず,(1)知識グラフに基づくaiシステム,(2)知識グラフの応用分野の可能性,という2つの側面から知識グラフの機会を考察する。そこで我々は,知識グラフの埋め込み,知識獲得,知識グラフの完成,知識融合,知識推論など,この分野の深刻な技術的課題を徹底的に議論する。この調査は、今後の研究と知識グラフの開発に新たな光を当てることを期待している。 With the explosive growth of artificial intelligence (AI) and big data, it has become vitally important to organize and represent the enormous volume of knowledge appropriately. As graph data, knowledge graphs accumulate and convey knowledge of the real world. It has been well-recognized that knowledge graphs effectively represent complex information; hence, they rapidly gain the attention of academia and industry in recent years. Thus to develop a deeper understanding of knowledge graphs, this paper presents a systematic overview of this field. Specifically, we focus on the opportunities and challenges of knowledge graphs. We first review the opportunities of knowledge graphs in terms of two aspects: (1) AI systems built upon knowledge graphs; (2) potential application fields of knowledge graphs. Then, we thoroughly discuss severe technical challenges in this field, such as knowledge graph embeddings, knowledge acquisition, knowledge graph completion, knowledge fusion, and knowledge reasoning. We expect that this survey will shed new light on future research and the development of knowledge graphs.	翻訳日:2023-03-27 14:38:24 公開日:2023-03-24
# 運用中の量子参照フレーム変換 Operational Quantum Reference Frame Transformations ( http://arxiv.org/abs/2303.14002v1 ) ライセンス: Link先を確認	Titouan Carette, Jan Głowacki and Leon Loveridge	(参考訳) 量子参照フレームとその変換の汎用的、操作的、厳密な基礎を提供し、共変正演算子値測度を用いてフレームの可観測性を表現する。このフレームワークは局所コンパクトなグループを対象とし、フレーム変更に関する以前の提案と異なり、物理的に区別できない状態が識別される操作同値の概念を中心に構築されている。これにより、(不変)相対可観測体の空間と相対状態の凸集合を双対対象として構成することができる。フレームの性質を考慮に入れた相対状態について、より等価な関係を求めることにより、量子参照フレーム変更マップを提供する。この写像は、初期フレームと最終フレームが任意に局所化された状態を認めるとき、正確には可逆であることを示す。提案するフレーム変更を文献で利用可能な他の構成と比較し,共通適用領域における運用上の合意を見いだした。 We provide a general, operational, and rigorous basis for quantum reference frames and their transformations using covariant positive operator valued measures to represent frame observables. The framework holds for locally compact groups and differs from all prior proposals for frame changes, being built around the notion of operational equivalence, in which states that cannot be distinguished physically are identified. This allows for the construction of the space of (invariant) relative observables and the convex set of relative states as dual objects. By demanding a further equivalence relation on the relative states which takes into account the nature of the frames, we provide a quantum reference frame change map. We show that this map is invertible exactly when the initial and final frames admit states which are arbitrarily well localized. We compare the presented frame change with other constructions available in the literature, finding operational agreement on the domain of common applicability.	翻訳日:2023-03-27 14:31:20 公開日:2023-03-24
# 大規模都市景観のためのグリッド誘導型ニューラルラジアンスフィールド Grid-guided Neural Radiance Fields for Large Urban Scenes ( http://arxiv.org/abs/2303.14001v1 ) ライセンス: Link先を確認	Linning Xu, Yuanbo Xiangli, Sida Peng, Xingang Pan, Nanxuan Zhao, Christian Theobalt, Bo Dai, Dahua Lin	(参考訳) 純粋なMLPベースのニューラルラジアンスフィールド(NeRF法)は、モデル容量の制限により、大規模なシーンでぼやけたレンダリングで不適合に陥ることが多い。近年のアプローチでは、シーンを地理的に分割し、各領域を個別にモデル化するために複数のサブNeRFを採用することが提案されている。別の解決策は、計算効率が高く、グリッド解像度が向上した大きなシーンに自然にスケールできる機能グリッド表現を使用することである。しかし、機能グリッドは制約が少なく、しばしば最適以下のソリューションに到達し、特に複雑な幾何学とテクスチャの領域において、レンダリングにおいてノイズの多いアーティファクトを生成する。本研究では,大規模都市における高忠実度レンダリングを実現するための新しいフレームワークを提案する。我々は,コンパクトな多分解能地上特徴面表現を用いてシーンを粗くキャプチャし,別のNeRFブランチを介して位置符号化入力を補完し,共同学習方式でレンダリングすることを提案する。このような統合は、2つの代替ソリューションの利点を生かしうることを示す: 軽量のNeRFは、特徴格子表現の指導の下で、細部でフォトリアリスティックなノベルビューをレンダリングするのに十分であり、また、共同最適化された地上特徴面は、さらに洗練され、より正確でコンパクトな特徴空間を形成し、より自然なレンダリング結果を生成することができる。 Purely MLP-based neural radiance fields (NeRF-based methods) often suffer from underfitting with blurred renderings on large-scale scenes due to limited model capacity. Recent approaches propose to geographically divide the scene and adopt multiple sub-NeRFs to model each region individually, leading to linear scale-up in training costs and the number of sub-NeRFs as the scene expands. An alternative solution is to use a feature grid representation, which is computationally efficient and can naturally scale to a large scene with increased grid resolutions. However, the feature grid tends to be less constrained and often reaches suboptimal solutions, producing noisy artifacts in renderings, especially in regions with complex geometry and texture. In this work, we present a new framework that realizes high-fidelity rendering on large urban scenes while being computationally efficient. We propose to use a compact multiresolution ground feature plane representation to coarsely capture the scene, and complement it with positional encoding inputs through another NeRF branch for rendering in a joint learning fashion. 
We show that such an integration can utilize the advantages of two alternative solutions: a light-weighted NeRF is sufficient, under the guidance of the feature grid representation, to render photorealistic novel views with fine details; and the jointly optimized ground feature planes, can meanwhile gain further refinements, forming a more accurate and compact feature space and output much more natural rendering results.	翻訳日:2023-03-27 14:31:06 公開日:2023-03-24
# powerpruning: ニューラルネットワーク高速化のための重みとアクティベーションの選択 PowerPruning: Selecting Weights and Activations for Power-Efficient Neural Network Acceleration ( http://arxiv.org/abs/2303.13997v1 ) ライセンス: Link先を確認	Richard Petri, Grace Li Zhang, Yiran Chen, Ulf Schlichtmann, Bing Li	(参考訳) ディープニューラルネットワーク(DNN)は様々な分野に適用されている。 DNNを特にエッジデバイスにデプロイする際の大きな課題は、多数の乗算および累積(MAC)操作のために消費電力である。この課題に対処するため,我々は,mac 操作の消費電力を減少させる重みを選択することで,デジタルニューラルネットワーク加速器の消費電力を削減する新しい手法であるpowerpruningを提案する。また、選択された重みと全ての活性化遷移のタイミング特性を評価する。より小さな遅延につながる重みと活性化がさらに選択される。これにより、MACユニットを変更することなくMACユニットの感度回路パスの最大遅延を低減し、サプライ電圧の柔軟なスケーリングを可能にし、電力消費をさらに削減できる。リトレーニングとともに、提案手法はハードウェア上でのdnnの消費電力を最大78.3%削減できるが、精度の低下は少ない。 Deep neural networks (DNNs) have been successfully applied in various fields. A major challenge of deploying DNNs, especially on edge devices, is power consumption, due to the large number of multiply-and-accumulate (MAC) operations. To address this challenge, we propose PowerPruning, a novel method to reduce power consumption in digital neural network accelerators by selecting weights that lead to less power consumption in MAC operations. In addition, the timing characteristics of the selected weights together with all activation transitions are evaluated. The weights and activations that lead to small delays are further selected. Consequently, the maximum delay of the sensitized circuit paths in the MAC units is reduced even without modifying MAC units, which thus allows a flexible scaling of supply voltage to reduce power consumption further. Together with retraining, the proposed method can reduce power consumption of DNNs on hardware by up to 78.3% with only a slight accuracy loss.	翻訳日:2023-03-27 14:30:38 公開日:2023-03-24
# line: 重要なニューロンを利用した分布外検出 LINe: Out-of-Distribution Detection by Leveraging Important Neurons ( http://arxiv.org/abs/2303.13995v1 ) ライセンス: Link先を確認	Yong Hyun Ahn, Gyeong-Moon Park, Seong Tae Kim	(参考訳) 特に自律運転や医療といったミッションクリティカルな分野において、アウト・オブ・ディストリビューション(OOD)データの障害予測が大きな問題を引き起こす可能性がある場合において、入力サンプルの不確実性を定量化することが重要である。 OOD検出問題は、モデルが認識していないことを表現できないという点で、基本的に始まります。ポストホックなood検出アプローチは、モデルのパフォーマンスを低下させ、トレーニングコストを増加させる追加の再トレーニングプロセスを必要としないため、広く検討されている。本研究では,高次特徴を表すモデルの深層におけるニューロンの観点から,分布内データとOODデータ間のモデル出力の差を解析するための新しい側面を紹介する。本稿では,分布検出のポストホックアウトのための新しい手法であるLINe( Leveraging Important Neurons)を提案する。 shapley値に基づくプルーニングは、入力データの特定のクラスを予測し、残りをマスキングするために高分配ニューロンのみを選択することで、ノイズ出力の効果を減少させる。アクティベーション・クリッピングは、あるしきい値以上のすべての値を同じ値に固定し、lineがクラス固有のすべての特徴を等しく扱うことを可能にし、インディストリビューションとoodデータの間の活性化された特徴の差の数を単に考慮するだけでよい。 CIFAR-10, CIFAR-100, ImageNetデータセット上で, 最先端のOOD検出手法よりも高い性能で提案手法の有効性を検証する。 It is important to quantify the uncertainty of input samples, especially in mission-critical domains such as autonomous driving and healthcare, where failure predictions on out-of-distribution (OOD) data are likely to cause big problems. OOD detection problem fundamentally begins in that the model cannot express what it is not aware of. Post-hoc OOD detection approaches are widely explored because they do not require an additional re-training process which might degrade the model's performance and increase the training cost. In this study, from the perspective of neurons in the deep layer of the model representing high-level features, we introduce a new aspect for analyzing the difference in model outputs between in-distribution data and OOD data. We propose a novel method, Leveraging Important Neurons (LINe), for post-hoc Out of distribution detection. Shapley value-based pruning reduces the effects of noisy outputs by selecting only high-contribution neurons for predicting specific classes of input data and masking the rest. 
Activation clipping fixes all values above a certain threshold into the same value, allowing LINe to treat all the class-specific features equally and just consider the difference between the number of activated feature differences between in-distribution and OOD data. Comprehensive experiments verify the effectiveness of the proposed method by outperforming state-of-the-art post-hoc OOD detection methods on CIFAR-10, CIFAR-100, and ImageNet datasets.	翻訳日:2023-03-27 14:30:23 公開日:2023-03-24
# 到達性解析を用いた自律走行車の物理的バックドアトリガー起動 Physical Backdoor Trigger Activation of Autonomous Vehicle using Reachability Analysis ( http://arxiv.org/abs/2303.13992v1 ) ライセンス: Link先を確認	Wending Li, Yum Wang, Muhammad Shafique, Saif Eddin Jabari	(参考訳) 近年の研究では、自律走行車(AV)は隠れたバックドアで操作でき、物理的トリガーによって起動されると有害な行動を起こすことが示されている。しかし、これらのトリガーが交通原則に固執しながらどのように活性化されるのかはまだ不明である。動的なトラフィック環境でこの脆弱性を理解することは重要です。この研究は、制御された動的システムの到達可能性問題として物理的トリガの活性化を提示することで、このギャップに対処する。本手法は,事故の引き金条件に到達可能な交通システムにおけるセキュリティクリティカル領域を特定し,その状況に到達するための軌道を提供する。典型的なトラフィックシナリオをテストすると、システムは100%に近いアクティベーション率の条件をトリガーすることに成功した。本手法は,av脆弱性を識別し,効果的な安全性戦略を実現することに有用である。 Recent studies reveal that Autonomous Vehicles (AVs) can be manipulated by hidden backdoors, causing them to perform harmful actions when activated by physical triggers. However, it is still unclear how these triggers can be activated while adhering to traffic principles. Understanding this vulnerability in a dynamic traffic environment is crucial. This work addresses this gap by presenting physical trigger activation as a reachability problem of controlled dynamic system. Our technique identifies security-critical areas in traffic systems where trigger conditions for accidents can be reached, and provides intended trajectories for how those conditions can be reached. Testing on typical traffic scenarios showed the system can be successfully driven to trigger conditions with near 100% activation rate. Our method benefits from identifying AV vulnerability and enabling effective safety strategies.	翻訳日:2023-03-27 14:29:58 公開日:2023-03-24
# パラフレーズ検出:人間と機械のコンテンツ Paraphrase Detection: Human vs. Machine Content ( http://arxiv.org/abs/2303.13989v1 ) ライセンス: Link先を確認	Jonas Becker and Jan Philip Wahle and Terry Ruas and Bela Gipp	(参考訳) GPT-4やChatGPTといった大規模言語モデルの普及は、機械生成コンテンツやパラフレーズ化の可能性により、学術的整合性に対する懸念が高まっている。人間と機械によるパラフロードコンテンツの検出についての研究は行われてきたが、これらのタイプのコンテンツの比較は未調査のままである。本稿では,パラファーゼ検出タスクに一般的に使用される各種データセットの包括的解析を行い,検出手法の配列を評価する。この結果から,個々のデータセットのパフォーマンスの観点から異なる検出手法の長所と短所を強調し,人間の期待に合致する適切なマシン生成データセットの欠如を明らかにした。我々の主な発見は、人間が書いたパラフレーズが機械で作成したものを超え、難易度、多様性、類似性は、自動生成されたテキストが人間レベルのパフォーマンスとまだ一致していないことを示唆している。トランスフォーマーは、意味的に多様なコーパスに優れたTF-IDFを持つデータセット間で最も効果的な方法として登場した。さらに、4つのデータセットをパラフレーズ検出の最も多様で困難なものとして特定した。 The growing prominence of large language models, such as GPT-4 and ChatGPT, has led to increased concerns over academic integrity due to the potential for machine-generated content and paraphrasing. Although studies have explored the detection of human- and machine-paraphrased content, the comparison between these types of content remains underexplored. In this paper, we conduct a comprehensive analysis of various datasets commonly employed for paraphrase detection tasks and evaluate an array of detection methods. Our findings highlight the strengths and limitations of different detection methods in terms of performance on individual datasets, revealing a lack of suitable machine-generated datasets that can be aligned with human expectations. Our main finding is that human-authored paraphrases exceed machine-generated ones in terms of difficulty, diversity, and similarity implying that automatically generated texts are not yet on par with human-level performance. Transformers emerged as the most effective method across datasets with TF-IDF excelling on semantically diverse corpora. Additionally, we identify four datasets as the most diverse and challenging for paraphrase detection.	翻訳日:2023-03-27 14:29:43 公開日:2023-03-24
# 機械心理学:心理学的手法を用いた大規模言語モデルにおける創発的能力と行動の調査 Machine Psychology: Investigating Emergent Capabilities and Behavior in Large Language Models Using Psychological Methods ( http://arxiv.org/abs/2303.13988v1 ) ライセンス: Link先を確認	Thilo Hagendorff	(参考訳) 大規模言語モデル(LLM)は、現在、人間のコミュニケーションと日常の生活を結び付けるAIシステムの最前線にある。急速な技術進歩と極端な汎用性により、LLMは今や数百万人のユーザを抱えており、情報検索、コンテンツ生成、問題解決などの主要なゴート技術になりつつある。そのため、その能力を徹底的に評価し、精査することが重要である。現在のllmでは、ますます複雑で新しい行動パターンがみられるため、もともと人間をテストするために設計された心理学実験の参加者として扱うことができる。そこで本研究では,「機械心理学」と呼ばれる新しい研究分野を紹介する。この論文は、心理学の異なるサブフィールドがLLMの行動テストにどのように影響するかを概説する。機械心理学研究の方法論的基準を定義しており、特にプロンプトデザインのポリシーに焦点を当てている。さらに、LLMで発見された行動パターンがどのように解釈されるかを記述する。要約すると、機械心理学は従来の自然言語処理ベンチマークでは検出できないLLMの創発的能力を発見することを目的としている。 Large language models (LLMs) are currently at the forefront of intertwining AI systems with human communication and everyday life. Due to rapid technological advances and their extreme versatility, LLMs nowadays have millions of users and are at the cusp of being the main go-to technology for information retrieval, content generation, problem-solving, etc. Therefore, it is of great importance to thoroughly assess and scrutinize their capabilities. Due to increasingly complex and novel behavioral patterns in current LLMs, this can be done by treating them as participants in psychology experiments that were originally designed to test humans. For this purpose, the paper introduces a new field of research called "machine psychology". The paper outlines how different subfields of psychology can inform behavioral tests for LLMs. It defines methodological standards for machine psychology research, especially by focusing on policies for prompt designs. Additionally, it describes how behavioral patterns discovered in LLMs are to be interpreted. In sum, machine psychology aims to discover emergent abilities in LLMs that cannot be detected by most traditional natural language processing benchmarks.	翻訳日:2023-03-27 14:29:24 公開日:2023-03-24
# 階層的模倣学習による都市走行の解釈可能な運動プランナ Interpretable Motion Planner for Urban Driving via Hierarchical Imitation Learning ( http://arxiv.org/abs/2303.13986v1 ) ライセンス: Link先を確認	Bikun Wang, Zhipeng Wang, Chenhao Zhu, Zhiqiang Zhang, Zhichen Wang, Penghong Lin, Jingchu Liu and Qian Zhang	(参考訳) 学習ベースのアプローチは、自律運転において目覚ましいパフォーマンスを達成し、意思決定と計画モジュールでデータ駆動の作業が研究されている。しかし、ニューラルネットワークの信頼性と安定性は依然として課題に満ちている。本稿では,個々のデータ駆動型運転方針だけでなく,ルールベースのアーキテクチャにも容易に組み込むことが可能な,高レベルグリッドベースの行動プランナーと低レベル軌道プランナーを含む階層的模倣手法を提案する。本手法をクローズドループシミュレーションと実世界走行の両方で評価し,複雑な都市自律運転シナリオにおいて,ニューラルネットワークプランナが優れた性能を示した。 Learning-based approaches have achieved impressive performance for autonomous driving and an increasing number of data-driven works are being studied in the decision-making and planning module. However, the reliability and stability of neural networks remain challenging. In this paper, we introduce a hierarchical imitation method including a high-level grid-based behavior planner and a low-level trajectory planner, which not only serves as a standalone data-driven driving policy but can also be easily embedded into a rule-based architecture. We evaluate our method both in closed-loop simulation and real world driving, and demonstrate that the neural network planner has outstanding performance in complex urban autonomous driving scenarios.	翻訳日:2023-03-27 14:29:08 公開日:2023-03-24
# 知識蒸留を用いた低メモリデバイス用混合型ウェハ分類 Mixed-Type Wafer Classification For Low Memory Devices Using Knowledge Distillation ( http://arxiv.org/abs/2303.13974v1 ) ライセンス: Link先を確認	Nitish Shukla, Anurima Dey, Srivatsan K	(参考訳) ウェハーの製造は何千ものステップを伴う複雑な作業です。ウェハマップの欠陥パターン認識(DPR)は生産欠陥の根本原因決定に不可欠であり、ウェハファウントリーの収量改善の洞察を与える可能性がある。製造中、様々な欠陥がウエハに単独で現れるか、異なる組み合わせとして現れる。ウエハ内の複数の欠陥を特定することは、単一の欠陥を特定するよりも一般的に難しい。近年,混合型DPRの深層学習手法が注目されている。しかし、欠陥の複雑さは複雑で大きなモデルを必要とするため、製造ラボで一般的に使用される低メモリの組み込みデバイスで運用するのが非常に困難である。もうひとつの一般的な問題は、複雑なネットワークをトレーニングするためのラベル付きデータの可用性の欠如である。本研究では,複雑な事前学習モデルの知識を軽量なデプロイメント対応モデルに割くための教師なしトレーニングルーチンを提案する。教師モデルよりも最大10倍小さくても, 精度を犠牲にすることなく, モデルを圧縮できることを実証的に示す。圧縮されたモデルは、現代の最先端モデルよりも優れている。 Manufacturing wafers is an intricate task involving thousands of steps. Defect Pattern Recognition (DPR) of wafer maps is crucial for determining the root cause of production defects, which may further provide insight for yield improvement in wafer foundry. During manufacturing, various defects may appear standalone in the wafer or may appear as different combinations. Identifying multiple defects in a wafer is generally harder compared to identifying a single defect. Recently, deep learning methods have gained significant traction in mixed-type DPR. However, the complexity of defects requires complex and large models making them very difficult to operate on low-memory embedded devices typically used in fabrication labs. Another common issue is the unavailability of labeled data to train complex networks. In this work, we propose an unsupervised training routine to distill the knowledge of complex pre-trained models to lightweight deployment-ready models. We empirically show that this type of training compresses the model without sacrificing accuracy despite being up to 10 times smaller than the teacher model. The compressed model also manages to outperform contemporary state-of-the-art models.	翻訳日:2023-03-27 14:28:57 公開日:2023-03-24
# 深層学習におけるエネルギー効率の高い実践の解明--グリーンAIに向けた予備的ステップ Uncovering Energy-Efficient Practices in Deep Learning Training: Preliminary Steps Towards Green AI ( http://arxiv.org/abs/2303.13972v1 ) ライセンス: Link先を確認	Tim Yarally, Luís Cruz, Daniel Feitosa, June Sallou, Arie van Deursen	(参考訳) 現代のAIプラクティスは、すべて同じ目標(より良い結果)に向かっています。ディープラーニングの文脈では、"results"という用語は、しばしば競合問題集合における達成された正確さを指す。本稿では,グリーンAIの新興分野からのアイデアを採用し,エネルギー消費を精度と同等に重要な指標として捉え,無関係なタスクやエネルギー使用量を減らす。本研究では,パイプライン全体のエネルギー消費に大きな影響を与える2つの要因であるハイパーパラメータチューニング戦略とモデル複雑性の研究を通じて,持続可能性の観点からディープラーニングパイプラインのトレーニング段階について検討する。まず,ハイパーパラメータチューニングにおけるグリッド探索,ランダム探索,ベイズ最適化の有効性について検討し,ベイズ最適化が他の戦略を大きく支配することを示す。さらに,畳み込みニューラルネットワークのアーキテクチャを,畳み込み層,線形層,ReLU層という3つの著名な層のエネルギー消費の観点から分析した。その結果,畳み込み層は高いマージンで計算コストが最も高いことがわかった。さらに,エネルギー消費の大きいモデルほど精度の向上が逓減することを観察する。トレーニングの全体的なエネルギー消費量は、ネットワークの複雑さを減らすことで半減できる。結論として,ディープラーニングモデルのトレーニングにおける革新的かつ有望なエネルギ効率のプラクティスを強調する。グリーンAIの適用を拡大するために,エネルギー効率と精度のトレードオフを考慮し,ディープラーニングモデルの設計の転換を提唱する。 Modern AI practices all strive towards the same goal: better results. In the context of deep learning, the term "results" often refers to the achieved accuracy on a competitive problem set. In this paper, we adopt an idea from the emerging field of Green AI to consider energy consumption as a metric of equal importance to accuracy and to reduce any irrelevant tasks or energy usage. We examine the training stage of the deep learning pipeline from a sustainability perspective, through the study of hyperparameter tuning strategies and the model complexity, two factors vastly impacting the overall pipeline's energy consumption. First, we investigate the effectiveness of grid search, random search and Bayesian optimisation during hyperparameter tuning, and we find that Bayesian optimisation significantly dominates the other strategies. Furthermore, we analyse the architecture of convolutional neural networks with the energy consumption of three prominent layer types: convolutional, linear and ReLU layers. 
The results show that convolutional layers are the most computationally expensive by a strong margin. Additionally, we observe diminishing returns in accuracy for more energy-hungry models. The overall energy consumption of training can be halved by reducing the network complexity. In conclusion, we highlight innovative and promising energy-efficient practices for training deep learning models. To expand the application of Green AI, we advocate for a shift in the design of deep learning models, by considering the trade-off between energy efficiency and accuracy.	翻訳日:2023-03-27 14:28:39 公開日:2023-03-24
# スターク効果下における量子相関の保存と強化 Preservation and enhancement of quantum correlations under Stark effect ( http://arxiv.org/abs/2303.14030v1 ) ライセンス: Link先を確認	Nitish Kumar Chandra, Rajiuddin Sk and Prasanta K. Panigrahi	(参考訳) 本研究では、2つの2準位原子のBures距離エンタングルメント、トレース距離ディスコード、および局所量子不確かさの厳密な表式を求めることにより、量子相関のダイナミクスを解析する。ここでは、原子は中間の仮想状態を介した2光子遷移を起こし、各原子はスタークシフト効果の存在下で絶対零度の散逸的リザーバに個別に結合している。この原子系のダイナミクスを、環境の2つの異なる初期条件について調べた。第1の場合は環境が基底状態にあると仮定し、第2の場合は第1励起状態にあると仮定した。第2の初期条件は、第1の初期条件ではスタークシフトパラメータの一方のみが関与するのに対し、両方のスタークシフトパラメータが果たす役割を示すため重要である。その結果、マルコフ的リザーバと非マルコフ的リザーバのいずれにおいても、スタークシフト効果の存在下で量子相関を長時間維持できることがわかった。この効果は、スタークシフトパラメータの値が非常に小さい場合でも、マルコフ的リザーバより非マルコフ的リザーバにおいて顕著である。検討した相関測度のうち、局所量子不確かさのみが突然変化現象、すなわち相関測度の減衰率の急激な変化を伴うことが観察された。量子相関の保存は量子情報タスクで最適な性能を達成するうえで不可欠な要素の一つであるため、本研究の結果は重要である。 We analyze the dynamics of quantum correlations by obtaining the exact expression of Bures distance entanglement, trace distance discord, and local quantum uncertainty of two two-level atoms. Here, the atoms undergo two-photon transitions mediated through an intermediate virtual state where each atom is separately coupled to a dissipative reservoir at zero temperature in the presence of the Stark shift effect. We have investigated the dynamics of this atomic system for two different initial conditions of the environment. In the first case, we have assumed the environment's state to be in ground state and in the other case, we have assumed the state to be in first excited state. The second initial condition is significant as it shows the role played by both the Stark shift parameters in contrast to only one of the Stark shift parameters for the first initial condition. Our results demonstrate that quantum correlations can be sustained for an extended period in the presence of Stark shift effect in the case of both Markovian and non-Markovian reservoirs. The effect in the non-Markovian reservoir is more prominent than the Markovian reservoir, even for a very small value of the Stark shift parameter. 
We observe that among the correlation measures considered, only local quantum uncertainty is accompanied by a sudden change phenomenon, i.e., an abrupt change in the decay rate of a correlation measure. Our findings are significant as preserving quantum correlations is one of the essential aspects in attaining optimum performance in quantum information tasks.	翻訳日:2023-03-27 14:22:22 公開日:2023-03-24
# PENTACET データ -- 2300万のコンテキストコードコメントと50万のSATDコメント PENTACET data -- 23 Million Contextual Code Comments and 500,000 SATD comments ( http://arxiv.org/abs/2303.14029v1 ) ライセンス: Link先を確認	Murali Sridharan, Leevi Rantala, Mika Mäntylä	(参考訳) 自己申告型技術的負債(SATD)研究の多くは、SATD検出に「TODO」や「FIXME」のような明示的なSATD特徴を利用している。より詳しく見ると、いくつかのSATD研究は、文脈データ(前後のソースコードコンテキスト)なしで、単純な(「見つけやすい」)SATDコードコメントを使用している。本研究はPENTACET(または5Cデータセット)データを通じてこのギャップに対処する。PENTACETは、コントリビュータごとにキュレーションされた大規模な文脈付きコードコメントのデータセットであり、最も広範なSATDデータである。9,096のオープンソースJavaプロジェクト、合計4億3500万LOCをマイニングした。その結果、2300万件のコードコメント、各コメントの前後のソースコードコンテキスト、および「見つけやすい」SATDと「見つけにくい」SATDの両方を含む、SATDとラベル付けされた50万件以上のコメントからなるデータセットが得られた。我々は、PENTACETデータが人工知能技術を用いたSATD研究をさらに進めると考えている。 Most Self-Admitted Technical Debt (SATD) research utilizes explicit SATD features such as 'TODO' and 'FIXME' for SATD detection. A closer look reveals several SATD research uses simple SATD ('Easy to Find') code comments without the contextual data (preceding and succeeding source code context). This work addresses this gap through PENTACET (or 5C dataset) data. PENTACET is a large Curated Contextual Code Comments per Contributor and the most extensive SATD data. We mine 9,096 Open Source Software Java projects with a total of 435 million LOC. The outcome is a dataset with 23 million code comments, preceding and succeeding source code context for each comment, and more than 500,000 comments labeled as SATD, including both 'Easy to Find' and 'Hard to Find' SATD. We believe PENTACET data will further SATD research using Artificial Intelligence techniques.	翻訳日:2023-03-27 14:22:00 公開日:2023-03-24
# Poincaré ResNet Poincaré ResNet ( http://arxiv.org/abs/2303.14027v1 ) ライセンス: Link先を確認	Max van Spengler, Erwin Berkhout, Pascal Mettes	(参考訳) 本稿では、双曲空間のPoincaré球モデル上で完全に動作するエンドツーエンドの残差ネットワークを提案する。双曲学習は近年、視覚的理解に大きな可能性を示しているが、現在は深層ネットワークの最後から2番目の層(群)でのみ行われている。すべての視覚表現は、依然として標準的なユークリッドネットワークを通じて学習される。本稿では、視覚データの双曲表現をピクセルレベルから直接学習する方法を検討する。Poincaré 2D畳み込みからPoincaré残差接続まで、著名な残差ネットワークの双曲版であるPoincaré ResNetを提案する。畳み込みネットワークを完全に双曲空間で訓練するうえでの3つの障害を特定し、それぞれに解決策を提案する。 (i) 現在の双曲ネットワークの初期化は原点に崩壊し、より深いネットワークでの適用性が制限される。多くの層にわたってノルムを保存する恒等写像ベースの初期化を提供する。 (ii) 残差ネットワークはバッチ正規化に大きく依存するが、双曲空間ではこれに高コストなFréchet平均の計算が伴う。Poincaré中点バッチ正規化を、高速かつ同等に有効な代替として導入する。 (iii) Poincaré層における多くの中間演算により、深層学習ライブラリの計算グラフが肥大化し、深い双曲ネットワークの訓練が制限されることがわかった。計算グラフを管理可能な規模に保つため、中核となる双曲演算の逆伝播を手動で導出する。 This paper introduces an end-to-end residual network that operates entirely on the Poincaré ball model of hyperbolic space. Hyperbolic learning has recently shown great potential for visual understanding, but is currently only performed in the penultimate layer(s) of deep networks. All visual representations are still learned through standard Euclidean networks. In this paper we investigate how to learn hyperbolic representations of visual data directly from the pixel-level. We propose Poincaré ResNet, a hyperbolic counterpart of the celebrated residual network, starting from Poincaré 2D convolutions up to Poincaré residual connections. We identify three roadblocks for training convolutional networks entirely in hyperbolic space and propose a solution for each: (i) Current hyperbolic network initializations collapse to the origin, limiting their applicability in deeper networks. We provide an identity-based initialization that preserves norms over many layers. (ii) Residual networks rely heavily on batch normalization, which comes with expensive Fréchet mean calculations in hyperbolic space. 
We introduce Poincaré midpoint batch normalization as a faster and equally effective alternative. (iii) Due to the many intermediate operations in Poincaré layers, we lastly find that the computation graphs of deep learning libraries blow up, limiting our ability to train on deep hyperbolic networks. We provide manual backward derivations of core hyperbolic operations to maintain manageable computation graphs.	翻訳日:2023-03-27 14:21:44 公開日:2023-03-24
# CF-Font:Few-shot Font生成のためのコンテンツ融合 CF-Font: Content Fusion for Few-shot Font Generation ( http://arxiv.org/abs/2303.14017v1 ) ライセンス: Link先を確認	Chi Wang, Min Zhou, Tiezheng Ge, Yuning Jiang, Hujun Bao, Weiwei Xu	(参考訳) コンテンツとスタイルの分離は、少数ショットフォント生成を実現する効果的な方法である。これにより、ソースドメインのフォント画像のスタイルを、ターゲットドメインの少数の参照画像で定義されたスタイルに変換できる。しかし、代表フォントを用いて抽出したコンテンツ特徴は最適ではない可能性がある。そこで本研究では、基底フォントのコンテンツ特徴によって定義される線形空間にコンテンツ特徴を射影するコンテンツ融合モジュール(CFM)を提案する。これにより、フォントの違いによるコンテンツ特徴の変動を考慮できる。また、軽量な反復的スタイルベクトル改良(ISR)戦略により、参照画像のスタイル表現ベクトルを最適化できる。さらに、文字画像の1次元射影を確率分布として扱い、2つの分布間の距離を再構成損失(投影文字損失、PCL)として利用する。L2またはL1の再構成損失と比較して、分布距離は文字の大域的な形状により注意を払う。各6.5k文字を含む300フォントのデータセットで評価を行った。実験結果から、本手法が既存の最先端の少数ショットフォント生成手法を大差で上回ることを確認した。ソースコードは https://github.com/wangchi95/CF-Font にある。 Content and style disentanglement is an effective way to achieve few-shot font generation. It allows to transfer the style of the font image in a source domain to the style defined with a few reference images in a target domain. However, the content feature extracted using a representative font might not be optimal. In light of this, we propose a content fusion module (CFM) to project the content feature into a linear space defined by the content features of basis fonts, which can take the variation of content features caused by different fonts into consideration. Our method also allows to optimize the style representation vector of reference images through a lightweight iterative style-vector refinement (ISR) strategy. Moreover, we treat the 1D projection of a character image as a probability distribution and leverage the distance between two distributions as the reconstruction loss (namely projected character loss, PCL). Compared to L2 or L1 reconstruction loss, the distribution distance pays more attention to the global shape of characters. We have evaluated our method on a dataset of 300 fonts with 6.5k characters each. 
Experimental results verify that our method outperforms existing state-of-the-art few-shot font generation methods by a large margin. The source code can be found at https://github.com/wangchi95/CF-Font.	翻訳日:2023-03-27 14:21:19 公開日:2023-03-24
# SPEC:低リソース抽象要約のための要約選好分解 SPEC: Summary Preference Decomposition for Low-Resource Abstractive Summarization ( http://arxiv.org/abs/2303.14011v1 ) ライセンス: Link先を確認	Yi-Syuan Chen, Yun-Zhu Song, Hong-Han Shuai	(参考訳) ニューラル抽象的要約は広く研究され、大規模コーパスで大きな成功を収めてきた。しかし、データアノテーションのコストが大きいことから、低リソース環境下での学習戦略が必要とされている。本稿では、ごく少数の例のみで要約器を学習する際の問題点を調査し、それに対応する改善手法を提案する。第一に、典型的な転移学習手法は、プリテキストタスクにおけるデータ特性や学習目的の影響を受けやすい。そこで、事前学習済み言語モデルに基づき、少数ショット学習のプロセスをソースコーパスからターゲットコーパスへ転移するメタ学習フレームワークを提示する。第二に、従来手法は内容と選好を分解せずに訓練例から学習する。そのため、生成される要約は、特に低リソース設定において、訓練セットの選好バイアスに制約される可能性がある。そこで、パラメータ変調を通じて学習中に内容と選好を分解することを提案し、これにより推論時に選好を制御できるようにする。第三に、対象アプリケーションが与えられても、観察から選好を導くのは難しい場合があるため、必要な選好を指定することは自明ではない。そこで、少数の訓練例から適切な選好を自動的に推定し、対応する要約候補を生成する新しい復号法を提案する。広範な実験により、提案手法は6種類の多様なコーパスで最先端の性能を達成し、10例および100例の設定においてROUGE-1/2/Lでそれぞれ平均30.11%/33.95%/27.51%、26.74%/31.14%/24.48%の改善が得られることを示した。 Neural abstractive summarization has been widely studied and achieved great success with large-scale corpora. However, the considerable cost of annotating data motivates the need for learning strategies under low-resource settings. In this paper, we investigate the problems of learning summarizers with only few examples and propose corresponding methods for improvements. First, typical transfer learning methods are prone to be affected by data properties and learning objectives in the pretext tasks. Therefore, based on pretrained language models, we further present a meta learning framework to transfer few-shot learning processes from source corpora to the target corpus. Second, previous methods learn from training examples without decomposing the content and preference. The generated summaries could therefore be constrained by the preference bias in the training set, especially under low-resource settings. As such, we propose decomposing the contents and preferences during learning through the parameter modulation, which enables control over preferences during inference. 
Third, given a target application, specifying required preferences could be non-trivial because the preferences may be difficult to derive through observations. Therefore, we propose a novel decoding method to automatically estimate suitable preferences and generate corresponding summary candidates from the few training examples. Extensive experiments demonstrate that our methods achieve state-of-the-art performance on six diverse corpora with 30.11%/33.95%/27.51% and 26.74%/31.14%/24.48% average improvements on ROUGE-1/2/L under 10- and 100-example settings.	翻訳日:2023-03-27 14:21:04 公開日:2023-03-24
# 低温ハイブリッド無線/量子コヒーレントネットワーク・イン・パッケージによるスケーラブルマルチチップ量子アーキテクチャ Scalable multi-chip quantum architectures enabled by cryogenic hybrid wireless/quantum-coherent network-in-package ( http://arxiv.org/abs/2303.14008v1 ) ライセンス: Link先を確認	Eduard Alarcón, Sergi Abadal, Fabio Sebastiano, Massoud Babaie, Eduardo Charbon, Peter Haring Bolívar, Maurizio Palesi, Elena Blokhina, Dirk Leipold, Bogdan Staszewski, Artur Garcia-Sáez, Carmen G. Almudever	(参考訳) 量子コンピュータのスケールアップという大きな課題は、フルスタックアーキテクチャの観点を必要とする。本稿では,分散量子コア(Qcore)を量子コヒーレントな量子ビット状態伝達リンクで相互接続し,統合された無線接続でオーケストレーションする,次世代のスケーラブル量子コンピューティングアーキテクチャの展望を示す。 The grand challenge of scaling up quantum computers requires a full-stack architectural standpoint. In this position paper, we will present the vision of a new generation of scalable quantum computing architectures featuring distributed quantum cores (Qcores) interconnected via quantum-coherent qubit state transfer links and orchestrated via an integrated wireless interconnect.	翻訳日:2023-03-27 14:20:42 公開日:2023-03-24
# 「チーム・イン・ザ・ループ」によるハイリスクAIの組織的監視 'Team-in-the-loop' organisational oversight of high-stakes AI ( http://arxiv.org/abs/2303.14007v1 ) ライセンス: Link先を確認	Deborah Morgan, Youmna Hashem, Vincent J. Straub, Jonathan Bright	(参考訳) 監視は、意思決定が個人および集団に重大な影響を及ぼしうる高リスクの公共部門AIアプリケーションにおいて不可欠であると、正しく認識されている。公共部門におけるAIの監視メカニズムの形態に関する現在の議論の多くは、人間の意思決定者が「ループ内」にいて、エラーや潜在的な危害を防ぐために介入できるという考えを中心としている。しかし、多くの高リスクな公共部門の文脈では、意思決定の運用上の監視は個人ではなく専門家チームによって行われる。デプロイされたAIシステムを、こうした既存の運用チームの監視プロセスにどのように統合できるかは、まだあまり注目されていない。我々は、制度分析を通じて、臨床意思決定の既存の監視に対するAIの影響を探ることで、このギャップに対処する。既存の監視は専門職の訓練要件の中に組み込まれており、重要な情報を引き出すための説明と質問に大きく依存していることがわかった。職能団体や責任追及のメカニズムも、監視のさらなる手段として機能する。これらの監視の次元は、AIシステムによって影響を受け、再構成される可能性がある。そこで我々は、高リスクの公共部門での展開においてAIを導入する際に必要なシステムレベルの分析を概念化するために、より広い「チーム・イン・ザ・ループ」という視点を提案する。 Oversight is rightly recognised as vital within high-stakes public sector AI applications, where decisions can have profound individual and collective impacts. Much current thinking regarding forms of oversight mechanisms for AI within the public sector revolves around the idea of human decision makers being 'in-the-loop' and thus being able to intervene to prevent errors and potential harm. However, in a number of high-stakes public sector contexts, operational oversight of decisions is made by expert teams rather than individuals. The ways in which deployed AI systems can be integrated into these existing operational team oversight processes has yet to attract much attention. We address this gap by exploring the impacts of AI upon pre-existing oversight of clinical decision-making through institutional analysis. We find that existing oversight is nested within professional training requirements and relies heavily upon explanation and questioning to elicit vital information. Professional bodies and liability mechanisms also act as additional levers of oversight. These dimensions of oversight are impacted, and potentially reconfigured, by AI systems. 
We therefore suggest a broader lens of 'team-in-the-loop' to conceptualise the system-level analysis required for adoption of AI within high-stakes public sector deployment.	翻訳日:2023-03-27 14:20:34 公開日:2023-03-24
# ASTRA-sim2.0:大規模モデル学習のための階層型ネットワークと分散システムの構築 ASTRA-sim2.0: Modeling Hierarchical Networks and Disaggregated Systems for Large-model Training at Scale ( http://arxiv.org/abs/2303.14006v1 ) ライセンス: Link先を確認	William Won, Taekyung Heo, Saeed Rashidi, Srinivas Sridharan, Sudarshan Srinivasan, Tushar Krishna	(参考訳) ディープラーニングモデルと入力データが前例のない速度でスケールしているため、モデルを適合させ、トレーニングスループットを向上させるために、分散トレーニングプラットフォームに移行することは避けられない。ウエハスケールノード、多次元ネットワークトポロジ、分散メモリシステム、並列化戦略といった最先端のアプローチと技術は、新興の分散トレーニングシステムに積極的に採用されている。これにより、分散トレーニングの複雑なSW/HW共同設計スタックが実現され、設計空間探索のためのモデリング/シミュレーションインフラストラクチャが必要とされる。本稿では,オープンソースのASTRA-simインフラストラクチャを拡張し,最先端の分散トレーニングモデルとプラットフォームをモデル化する機能を備える。具体的には i)ASTRA-simがグラフベースのトレーニングループ実装を通じて任意のモデル並列化戦略をサポートできるようにする。 (ii)対象システムを大規模にシミュレーション可能な解析性能推定を伴うパラメータ可能な多次元不均質トポロジー生成基盤を実装した。 (iii)ネットワーク内集団通信と分散メモリシステムの正確なモデリングを支援するために,メモリシステムのモデリングを強化する。このような機能により、新興の分散モデルとプラットフォームをターゲットにした包括的なケーススタディを実行します。このインフラストラクチャは、システム設計者が複雑な共同設計スタックを素早く横断し、分散トレーニングプラットフォームを大規模に設計およびデプロイする際に意味のある洞察を与える。 As deep learning models and input data are scaling at an unprecedented rate, it is inevitable to move towards distributed training platforms to fit the model and increase training throughput. State-of-the-art approaches and techniques, such as wafer-scale nodes, multi-dimensional network topologies, disaggregated memory systems, and parallelization strategies, have been actively adopted by emerging distributed training systems. This results in a complex SW/HW co-design stack of distributed training, necessitating a modeling/simulation infrastructure for design-space exploration. In this paper, we extend the open-source ASTRA-sim infrastructure and endow it with the capabilities to model state-of-the-art and emerging distributed training models and platforms. 
More specifically, (i) we enable ASTRA-sim to support arbitrary model parallelization strategies via a graph-based training-loop implementation, (ii) we implement a parameterizable multi-dimensional heterogeneous topology generation infrastructure with analytical performance estimates enabling simulating target systems at scale, and (iii) we enhance the memory system modeling to support accurate modeling of in-network collective communication and disaggregated memory systems. With such capabilities, we run comprehensive case studies targeting emerging distributed models and platforms. This infrastructure lets system designers swiftly traverse the complex co-design stack and give meaningful insights when designing and deploying distributed training platforms at scale.	翻訳日:2023-03-27 14:20:13 公開日:2023-03-24
# 人間とオブジェクトのインタラクション分類のためのカテゴリクエリ学習 Category Query Learning for Human-Object Interaction Classification ( http://arxiv.org/abs/2303.14005v1 ) ライセンス: Link先を確認	Chi Xie, Fangao Zeng, Yue Hu, Shuang Liang and Yichen Wei	(参考訳) より良い人間-物体特徴の学習に焦点を当てる従来のHOI手法の多くとは異なり、カテゴリクエリ学習と呼ばれる新しい補完的アプローチを提案する。このクエリはインタラクションカテゴリに明示的に関連付けられ、トランスフォーマーデコーダを介して画像固有のカテゴリ表現に変換され、補助的な画像レベルの分類タスクを通じて学習される。このアイデアは以前のマルチラベル画像分類手法に着想を得たものだが、困難な人間と物体のインタラクション分類タスクに適用されるのは初めてである。本手法は単純で汎用的かつ効果的である。3つの代表的なHOIベースラインで検証され、2つのベンチマークで新たな最先端の結果を達成する。 Unlike most previous HOI methods that focus on learning better human-object features, we propose a novel and complementary approach called category query learning. Such queries are explicitly associated to interaction categories, converted to image specific category representation via a transformer decoder, and learnt via an auxiliary image-level classification task. This idea is motivated by an earlier multi-label image classification method, but is for the first time applied for the challenging human-object interaction classification task. Our method is simple, general and effective. It is validated on three representative HOI baselines and achieves new state-of-the-art results on two benchmarks.	翻訳日:2023-03-27 14:19:49 公開日:2023-03-24
# 顔モーフィング攻撃に対する脆弱性: そっくりな他人と一卵性双生児を事例として Vulnerability of Face Morphing Attacks: A Case Study on Lookalike and Identical Twins ( http://arxiv.org/abs/2303.14004v1 ) ライセンス: Link先を確認	Raghavendra Ramachandra, Sushma Venkatesh, Gaurav Jaswal, Guoqiang Li	(参考訳) 顔モーフィング攻撃は、特に自動国境管理のシナリオにおいて潜在的な脅威として浮上している。モーフィング攻撃により、自動国境管理ゲートで国境を越えるために使用できる渡航文書を、複数の個人が使用できるようになる。モーフィング攻撃の可能性は、データ主体(共犯者と悪意ある行為者)の選択に依存する。本研究では、顔モーフィング生成の元として、そっくりな他人と一卵性双生児を調査する。顔認識システム(FRS)が、そっくりな他人および一卵性双生児のモーフィング画像に対してどの程度脆弱かをベンチマークする体系的な研究を行う。そのために、16組の一卵性双生児およびそっくりな他人のデータ主体を用いて新たな顔モーフィングデータセットを構築した。そっくりな他人と一卵性双生児のモーフィング画像は、ランドマークベースの手法で生成される。そっくりな他人と一卵性双生児の攻撃ポテンシャルをベンチマークするために、広範な実験を行った。さらに、通常の顔モーフィングと、そっくりな他人および一卵性双生児の顔モーフィングとを比較し、脆弱性への影響についての洞察を得るための実験も設計した。 Face morphing attacks have emerged as a potential threat, particularly in automatic border control scenarios. Morphing attacks permit more than one individual to use travel documents that can be used to cross borders using automatic border control gates. The potential for morphing attacks depends on the selection of data subjects (accomplice and malicious actors). This work investigates lookalike and identical twins as the source of face morphing generation. We present a systematic study on benchmarking the vulnerability of Face Recognition Systems (FRS) to lookalike and identical twin morphing images. Therefore, we constructed new face morphing datasets using 16 pairs of identical twin and lookalike data subjects. Morphing images from lookalike and identical twins are generated using a landmark-based method. Extensive experiments are carried out to benchmark the attack potential of lookalike and identical twins. Furthermore, experiments are designed to provide insights into the impact of vulnerability with normal face morphing compared with lookalike and identical twin face morphing.	翻訳日:2023-03-27 14:19:39 公開日:2023-03-24
# 量子ワルツを踊る: 3量子ビットゲートを4準位アーキテクチャにコンパイルする Dancing the Quantum Waltz: Compiling Three-Qubit Gates on Four Level Architectures ( http://arxiv.org/abs/2303.14069v1 ) ライセンス: Link先を確認	Andrew Litteken (1), Lennart Maximilian Siefert (1), Jason D. Chadwick (1), Natalia Nottingham (1), Tanay Roy (1 and 2), Ziqian Li (1 and 3), David Schuster (1 and 3), Frederic T. Chong (1), Jonathan M. Baker (4) ((1) University of Chicago, (2) Fermilab, (3) Stanford University, (4) Duke University)	(参考訳) 超伝導量子デバイスは量子計算の主要な技術であるが、いくつかの課題を抱えている。ゲートエラー、コヒーレンスエラー、接続性の欠如はいずれも、結果の忠実度の低さにつながる。特に接続性の制約により、3量子ビットゲートを1量子ビットまたは2量子ビットのゲートに分解する必要があるゲートセットが強いられる。これにより、実行すべき2量子ビットゲートの数が大幅に増加する。しかし、多くの量子デバイスはより高いエネルギー準位にアクセスできる。量子ビットの$|0\rangle$と$|1\rangle$という抽象化を、コヒーレンス時間は短くなるものの$|2\rangle$と$|3\rangle$の状態にもアクセスできるququartに拡張できる。これにより、2つの量子ビットを1つのququartにエンコードでき、物理ユニット間の仮想的な接続性が、隣接する2つの量子ビットから完全に接続された4つの量子ビットへと増加する。この接続方式により、2つの物理デバイス間で3量子ビットゲートをネイティブに、より効率的に実行できる。最適制御により合成した複数の3量子ビットゲートのパルス直接実装を提示し、4準位デバイスにアクセス可能な超伝導ベースのアーキテクチャ上に3量子ビットゲートをコンパイルする。あわせて、最適制御により設計された4準位ququartゲートの最初の実験的実証を示す。Toffoliゲートの実行に一時的に高準位状態を使用する戦略と、常に高準位状態を使用する戦略を示し、量子回路の忠実度を改善する。これらの手法により、期待される忠実度は、中間エンコーディングを用いた場合に回路サイズ全体で2倍、完全にエンコードしたququartコンパイルでは3倍に向上することがわかった。 Superconducting quantum devices are a leading technology for quantum computation, but they suffer from several challenges. Gate errors, coherence errors and a lack of connectivity all contribute to low fidelity results. In particular, connectivity restrictions enforce a gate set that requires three-qubit gates to be decomposed into one- or two-qubit gates. This substantially increases the number of two-qubit gates that need to be executed. However, many quantum devices have access to higher energy levels. We can expand the qubit abstraction of $|0\rangle$ and $|1\rangle$ to a ququart which has access to the $|2\rangle$ and $|3\rangle$ state, but with shorter coherence times. 
This allows for two qubits to be encoded in one ququart, enabling increased virtual connectivity between physical units from two adjacent qubits to four fully connected qubits. This connectivity scheme allows us to more efficiently execute three-qubit gates natively between two physical devices. We present direct-to-pulse implementations of several three-qubit gates, synthesized via optimal control, for compilation of three-qubit gates onto a superconducting-based architecture with access to four-level devices with the first experimental demonstration of four-level ququart gates designed through optimal control. We demonstrate strategies that temporarily use higher level states to perform Toffoli gates and always use higher level states to improve fidelities for quantum circuits. We find that these methods improve expected fidelities with increases of 2x across circuit sizes using intermediate encoding, and increases of 3x for fully-encoded ququart compilation.	翻訳日:2023-03-27 14:13:10 公開日:2023-03-24
# 自動識別システム(AIS)データを用いた船舶の航跡アソシエーションのためのCNN-LSTMアーキテクチャ A CNN-LSTM Architecture for Marine Vessel Track Association Using Automatic Identification System (AIS) Data ( http://arxiv.org/abs/2303.14068v1 ) ライセンス: Link先を確認	Md Asif Bin Syed and Imtiaz Ahmed	(参考訳) 海上監視では、船舶の通常の動きと異常な動きのパターンを区別することが、潜在的脅威を適時に特定するうえで重要である。異常が検出された後は、必要な介入が行われるまでその船舶を監視し追跡することが重要である。このために航跡アソシエーションアルゴリズムが用いられる。これは、船舶の地質パラメータと運動パラメータからなる逐次観測を受け取り、それらを各船舶に関連付けるものである。これらの逐次観測に内在する空間的・時間的変動のため、従来のマルチオブジェクト追跡アルゴリズムにとってアソシエーションタスクは困難である。さらに、重複する航跡や欠損データの存在が、軌跡追跡プロセスをさらに複雑にする可能性がある。これらの課題に対処するため、本研究では、この追跡タスクを多変量時系列問題として扱い、航跡アソシエーションのための1D CNN-LSTMアーキテクチャに基づくフレームワークを導入する。このニューラルネットワークアーキテクチャは、逐次観測の間に存在する空間パターンと長期的な時間的関係を捉えることができる。訓練の過程で、対象となる各船舶の軌跡を学習し構築する。訓練後、提案フレームワークは、自動識別システム(AIS)によって収集された船舶の位置と運動のデータを入力として受け取り、最も可能性の高い船舶の航跡をリアルタイムで出力する。提案手法の性能を評価するため、特定の地理的領域を航行する327隻の船舶の観測を含むAISデータセットを用いた。提案フレームワークの性能は、正解率(accuracy)、適合率(precision)、再現率(recall)、F1スコアなどの標準的な性能指標を用いて測定する。他の競合するニューラルネットワークアーキテクチャと比較して、本手法は優れた追跡性能を示す。 In marine surveillance, distinguishing between normal and anomalous vessel movement patterns is critical for identifying potential threats in a timely manner. Once detected, it is important to monitor and track these vessels until a necessary intervention occurs. To achieve this, track association algorithms are used, which take sequential observations comprising geological and motion parameters of the vessels and associate them with respective vessels. The spatial and temporal variations inherent in these sequential observations make the association task challenging for traditional multi-object tracking algorithms. Additionally, the presence of overlapping tracks and missing data can further complicate the trajectory tracking process. To address these challenges, in this study, we approach this tracking task as a multivariate time series problem and introduce a 1D CNN-LSTM architecture-based framework for track association. 
This special neural network architecture can capture the spatial patterns as well as the long-term temporal relations that exist among the sequential observations. During the training process, it learns and builds the trajectory for each of these underlying vessels. Once trained, the proposed framework takes the marine vessel's location and motion data collected through the Automatic Identification System (AIS) as input and returns the most likely vessel track as output in real-time. To evaluate the performance of our approach, we utilize an AIS dataset containing observations from 327 vessels traveling in a specific geographic region. We measure the performance of our proposed framework using standard performance metrics such as accuracy, precision, recall, and F1 score. When compared with other competitive neural network architectures our approach demonstrates a superior tracking performance.	翻訳日:2023-03-27 14:12:38 公開日:2023-03-24
# SEAL: ロボット行動認識のための意味的フレーム実行と局所化 SEAL: Semantic Frame Execution And Localization for Perceiving Afforded Robot Actions ( http://arxiv.org/abs/2303.14067v1 ) ライセンス: Link先を確認	Cameron Kisailus, Daksh Narang, Matthew Shannon, Odest Chadwicke Jenkins	(参考訳) ロボット移動操作の最近の進歩は、制約された作業空間から大規模な人的環境へのロボットの動作環境の拡大を刺激している。これらの空間でのタスクを効果的に完了させるためには、ロボットは単純なピック・アンド・プレースを超えて、様々な手当を認識、推論し、実行しなくてはならない。セマンティックフレームの概念は、アクション中心の認識、タスクレベル推論、アクションレベル実行、言語との統合に適するロボットアクションの説得力のある表現を提供する。セマンティックフレーム(Semantic frame)は、言語学コミュニティの産物であり、必要な要素、事前および後条件、そして動詞句によって誘発される行動を実行するのに必要な一連のロボットアクションを定義する。本研究では,ロボット操作動作における意味フレーム表現を拡張し,図形モデルとしてロボット動作知覚のための意味フレーム実行と局所化の問題を導入する。 SEAL問題に対して、ロボットに与えられた行動の場所として、有限のセマンティックフレームに対する信念を維持するための非パラメトリックセマンティックフレームマッピング(SeFM)アルゴリズムについて述べる。 GPT-3のような言語モデルはSEALの定式化でカバーされる汎用タスク実行に対応できないことを示し、SEFMはロボットにビルスケール環境での動作に必要な効率的な探索戦略と長期記憶を提供する。 Recent advances in robotic mobile manipulation have spurred the expansion of the operating environment for robots from constrained workspaces to large-scale, human environments. In order to effectively complete tasks in these spaces, robots must be able to perceive, reason, and execute over a diversity of affordances, well beyond simple pick-and-place. We posit the notion of semantic frames provides a compelling representation for robot actions that is amenable to action-focused perception, task-level reasoning, action-level execution, and integration with language. Semantic frames, a product of the linguistics community, define the necessary elements, pre- and post- conditions, and a set of sequential robot actions necessary to successfully execute an action evoked by a verb phrase. In this work, we extend the semantic frame representation for robot manipulation actions and introduce the problem of Semantic Frame Execution And Localization for Perceiving Afforded Robot Actions (SEAL) as a graphical model. 
For the SEAL problem, we describe our nonparametric Semantic Frame Mapping (SeFM) algorithm for maintaining belief over a finite set of semantic frames as the locations of actions afforded to the robot. We show that language models such as GPT-3 are insufficient to address generalized task execution covered by the SEAL formulation and SeFM provides robots with efficient search strategies and long term memory needed when operating in building-scale environments.	翻訳日:2023-03-27 14:12:11 公開日:2023-03-24
# 協調型マルチエージェントタスクにおける学習報酬マシン Learning Reward Machines in Cooperative Multi-Agent Tasks ( http://arxiv.org/abs/2303.14061v1 ) ライセンス: Link先を確認	Leo Ardon, Daniel Furelos-Blanco, Alessandra Russo	(参考訳) 本稿では,協調的なタスク分解と,サブタスクの構造を符号化した報酬機械(rms)の学習を組み合わせたマルチエージェント強化学習(marl)への新しいアプローチを提案する。提案手法は, 部分的に観察可能な環境における報酬の非マルコフ的性質に対処し, 協調作業の完了に必要な学習方針の解釈性を向上させる。各サブタスクに関連付けられたrmは分散的に学習され、各エージェントの振る舞いを導くのに使用される。これにより、協調的マルチエージェント問題の複雑さが減少し、より効果的な学習が可能となる。以上の結果から,本手法はMARL,特に大規模状態空間と複数エージェントを持つ複雑な環境での今後の研究の方向性として期待できると考えられる。 This paper presents a novel approach to Multi-Agent Reinforcement Learning (MARL) that combines cooperative task decomposition with the learning of reward machines (RMs) encoding the structure of the sub-tasks. The proposed method helps deal with the non-Markovian nature of the rewards in partially observable environments and improves the interpretability of the learnt policies required to complete the cooperative task. The RMs associated with each sub-task are learnt in a decentralised manner and then used to guide the behaviour of each agent. By doing so, the complexity of a cooperative multi-agent problem is reduced, allowing for more effective learning. The results suggest that our approach is a promising direction for future research in MARL, especially in complex environments with large state spaces and multiple agents.	翻訳日:2023-03-27 14:11:46 公開日:2023-03-24
# 量子ビットのキラル基底と可積分スピン鎖への応用 Chiral bases for qubits and their applications to integrable spin chains ( http://arxiv.org/abs/2303.14056v1 ) ライセンス: Link先を確認	Vladislav Popkov, Xin Zhang and Andreas Klümper	(参考訳) 我々は、キンクを伴う横スピンヘリックスからなる新しい量子ビット基底を提案する。このキラル基底は、通常の計算基底とは対照的に、異なる位相的性質を持ち、非自明なトポロジーを持つ量子状態を記述するのに便利である。適切なパラメータを選択することで、$\sigma_n^x$や$\sigma_n^y$のような横スピン成分を含む演算子はキラル基底で対角になり、横スピン成分に着目した問題の研究が容易になる。応用として、近年の冷却原子実験で研究されている$XX$モデルにおけるスピンヘリックスの横スピンダイナミクスを記述する。横磁化の時間依存性に関する行列式公式を導出する。 We propose a novel qubit basis composed of transverse spin helices with kinks. This chiral basis, in contrast to the usual computational basis, possesses distinct topological properties and is convenient for describing quantum states with nontrivial topology. By choosing appropriate parameters, operators containing transverse spin components, such as $\sigma_n^x$ or $\sigma_n^y$, become diagonal in the chiral basis, facilitating the study of problems focused on transverse spin components. As an application, we describe the transverse spin dynamics of a spin helix in the $XX$ model, which has been studied in recent cold atom experiments. We derive a determinantal formula for the temporal dependence of the transverse magnetization.	翻訳日:2023-03-27 14:11:35 公開日:2023-03-24
# ロボット支援療法における複雑な意思決定の伝達 Communicating Complex Decisions in Robot-Assisted Therapy ( http://arxiv.org/abs/2303.14054v1 ) ライセンス: Link先を確認	Carl Bettosi, Kefan Chen, Ryan Shah, Lynne Baillie	(参考訳) 社会支援ロボット(SAR)は、意思決定を行うインストラクターや動機づけのコンパニオンとして、治療シナリオにおいて有望な可能性を示してきた。人間同士のセラピーでは、専門家は透明性を促進し信頼を構築するために、自らの意思決定の背後にある思考過程をしばしば伝える。より良いインタラクションを促すために、より複雑な意思決定モデルをこれらのロボットに組み込む研究が進むにつれ、SARが自らの決定を説明することはますます難しい課題となっている。複雑なSAR意思決定システムの最新の例を示す。人間同士のセラピーにおける透明なコミュニケーションの重要性を踏まえ、SARはそのような要素を設計に組み込むべきであると論じる。この話題に関する議論を促すために、研究者向けに一連の設計上の考慮事項を提示する。 Socially Assistive Robots (SARs) have shown promising potential in therapeutic scenarios as decision-making instructors or motivational companions. In human-human therapy, experts often communicate the thought process behind the decisions they make to promote transparency and build trust. As research aims to incorporate more complex decision-making models into these robots to drive better interaction, the ability for the SAR to explain its decisions becomes an increasing challenge. We present the latest examples of complex SAR decision-makers. We argue that, based on the importance of transparent communication in human-human therapy, SARs should incorporate such components into their design. To stimulate discussion around this topic, we present a set of design considerations for researchers.	翻訳日:2023-03-27 14:11:20 公開日:2023-03-24
# musicface:音楽駆動型表現型歌唱顔合成 MusicFace: Music-driven Expressive Singing Face Synthesis ( http://arxiv.org/abs/2303.14044v1 ) ライセンス: Link先を確認	Pengfei Liu, Wenjin Deng, Hengda Li, Jintai Wang, Yinglin Zheng, Yiwei Ding, Xiaohu Guo, and Ming Zeng	(参考訳) 音楽信号による鮮明でリアルな歌声の表情を合成することは、いまだに興味深く難しい問題である。本稿では,唇の自然な動き,表情,頭部のポーズ,眼の状態といった課題について述べる。人間の声と背景音楽の混合情報を音楽音声の共通信号に結合させることにより,課題に取り組むための分離・融合戦略を考案する。まず入力された音楽音声を人間の音声ストリームとバックグラウンド音楽ストリームに分解する。 2つのストリームの入力信号と表情のダイナミクス、頭部の動き、眼の状態との暗黙的かつ複雑な相関関係から、それらの関係を注意スキームでモデル化し、2つのストリームの効果をシームレスに融合させる。さらに、生成した結果の表現性を向上するために、頭部運動生成を速度生成と方向生成に分解し、眼状態生成を短時間点眼生成と長時間点眼生成に分解してモデル化することを提案する。また,この課題の訓練と評価を支援する新たな歌唱表情データセットを構築し,今後の課題への取り組みを促進する。広範囲にわたる実験とユーザ研究により,提案手法は定性的,定量的に実写的な歌唱表情を合成できることがわかった。 It is still an interesting and challenging problem to synthesize a vivid and realistic singing face driven by music signal. In this paper, we present a method for this task with natural motions of the lip, facial expression, head pose, and eye states. Due to the coupling of the mixed information of human voice and background music in common signals of music audio, we design a decouple-and-fuse strategy to tackle the challenge. We first decompose the input music audio into human voice stream and background music stream. Due to the implicit and complicated correlation between the two-stream input signals and the dynamics of the facial expressions, head motions and eye states, we model their relationship with an attention scheme, where the effects of the two streams are fused seamlessly. Furthermore, to improve the expressiveness of the generated results, we propose to decompose head movements generation into speed generation and direction generation, and decompose eye states generation into the short-time eye blinking generation and the long-time eye closing generation to model them separately. We also build a novel SingingFace Dataset to support the training and evaluation of this task, and to facilitate future works on this topic. 
Extensive experiments and a user study show that our proposed method can synthesize vivid singing faces and outperforms state-of-the-art methods both qualitatively and quantitatively.	翻訳日:2023-03-27 14:11:09 公開日:2023-03-24
# クラスインクリメンタル学習のためのクラスインクリメンタルエクエンプティブ圧縮 Class-Incremental Exemplar Compression for Class-Incremental Learning ( http://arxiv.org/abs/2303.14042v1 ) ライセンス: Link先を確認	Zilin Luo, Yaoyao Liu, Bernt Schiele, Qianru Sun	(参考訳) exemplar-based class-incremental learning (cil) では、新しいクラスのすべてのサンプルでモデルを微調整するが、インクリメンタルなフェーズ毎に古いクラスの少数のexemplarを微調整する。本稿では、この「ファウショット」制限を、非識別画素をダウンサンプリングし、メモリ内の「多くの」圧縮例を節約することで、単純な、驚くほど効果的なアイデアに基づいて破る。手動アノテーションを必要とせず,クラスアクティベーションマップ (cam) から識別画素に0-1マスクを生成することで,この圧縮を実現する。 CAMの2つの難しさを明確に解消するために,CIMと呼ばれる適応マスク生成モデルを提案する。 1)CAMのヒートマップを任意の閾値で0-1マスクに変換すると、全メモリが固定されるにつれて、識別画素のカバレッジと指数の量とのトレードオフにつながる。 2) CILの動的環境において特に明らかな,異なるオブジェクトクラスに対して最適なしきい値が変化する。 CIMモデルを従来のCILモデルに代えてバイレベル最適化問題により最適化する。我々は、Food-101, ImageNet-100, ImageNet-1000などの高分解能CILベンチマークの広範な実験を行い、CIMによる圧縮された例を用いて、10相 ImageNet-1000のFOSTERよりも4.8ポイント高い新しい最先端CIL精度を実現できることを示す。私たちのコードはhttps://github.com/xfflzl/CIM-CILで利用可能です。 Exemplar-based class-incremental learning (CIL) finetunes the model with all samples of new classes but few-shot exemplars of old classes in each incremental phase, where the "few-shot" abides by the limited memory budget. In this paper, we break this "few-shot" limit based on a simple yet surprisingly effective idea: compressing exemplars by downsampling non-discriminative pixels and saving "many-shot" compressed exemplars in the memory. Without needing any manual annotation, we achieve this compression by generating 0-1 masks on discriminative pixels from class activation maps (CAM). We propose an adaptive mask generation model called class-incremental masking (CIM) to explicitly resolve two difficulties of using CAM: 1) transforming the heatmaps of CAM to 0-1 masks with an arbitrary threshold leads to a trade-off between the coverage on discriminative pixels and the quantity of exemplars, as the total memory is fixed; and 2) optimal thresholds vary for different object classes, which is particularly obvious in the dynamic environment of CIL. 
We optimize the CIM model alternately with the conventional CIL model through a bilevel optimization problem. We conduct extensive experiments on high-resolution CIL benchmarks including Food-101, ImageNet-100, and ImageNet-1000, and show that using the exemplars compressed by CIM achieves a new state-of-the-art CIL accuracy, e.g., 4.8 percentage points higher than FOSTER on 10-Phase ImageNet-1000. Our code is available at https://github.com/xfflzl/CIM-CIL.	翻訳日:2023-03-27 14:10:46 公開日:2023-03-24
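The fixed-threshold baseline that the abstract above contrasts with CIM can be sketched as follows. This is only an illustration under stated assumptions, not the authors' code: the function names `binary_mask` and `compress_exemplar`, the block size, and the use of block averaging to stand in for downsampling are all hypothetical choices. CIM itself learns the mask adaptively rather than using one fixed threshold.

```python
def binary_mask(cam, threshold):
    """Turn a class activation map (values in [0, 1]) into a 0-1 mask.
    A single fixed threshold is an assumption made for illustration."""
    return [[1 if value >= threshold else 0 for value in row] for row in cam]


def compress_exemplar(image, mask, block=2):
    """Keep pixels where mask == 1; replace the rest by their block average,
    which mimics downsampling followed by nearest-neighbour upsampling."""
    height, width = len(image), len(image[0])
    out = [row[:] for row in image]
    for top in range(0, height, block):
        for left in range(0, width, block):
            cells = [(r, c)
                     for r in range(top, min(top + block, height))
                     for c in range(left, min(left + block, width))]
            mean = sum(image[r][c] for r, c in cells) / len(cells)
            for r, c in cells:
                if mask[r][c] == 0:
                    out[r][c] = mean
    return out


# Toy 2x2 example: only the top-left pixel is treated as discriminative.
cam = [[0.9, 0.2], [0.1, 0.3]]
image = [[8.0, 2.0], [4.0, 6.0]]
mask = binary_mask(cam, threshold=0.5)
compressed = compress_exemplar(image, mask)
```

Raising the threshold shrinks the full-resolution region, so each exemplar costs less memory and more exemplars fit in a fixed budget; lowering it does the opposite. That is the coverage-versus-quantity trade-off the abstract describes.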
# トポロジカルデータ解析のためのオイラー特性ツール Euler Characteristic Tools For Topological Data Analysis ( http://arxiv.org/abs/2303.14040v1 ) ライセンス: Link先を確認	Olympio Hacquard, Vadim Lebovici	(参考訳) 本稿では,トポロジカルデータ解析におけるオイラー特性技術について述べる。データから構築された単体複体族のオイラー特性を点ごとに計算すると、いわゆるオイラー特性プロファイルが得られる。この単純なディスクリプタは、非常に低い計算コストで教師付きタスクの最先端のパフォーマンスを実現することを示す。信号解析に着想を得て,オイラー特性プロファイルのハイブリッド変換を計算する。これらの積分変換はオイラー特性とルベーグ積分を混合し、トポロジカル信号の高効率な圧縮機を提供する。その結果、教師なしの設定で顕著なパフォーマンスを示した。定性面では、オイラープロファイルとそれらのハイブリッド変換によって捉えられた位相的および幾何学的情報に関する多くのヒューリスティックスを提供する。最後に,これらの記述子に対する安定性とランダム設定における漸近的保証を証明した。 In this article, we study Euler characteristic techniques in topological data analysis. Pointwise computing the Euler characteristic of a family of simplicial complexes built from data gives rise to the so-called Euler characteristic profile. We show that this simple descriptor achieves state-of-the-art performance in supervised tasks at a very low computational cost. Inspired by signal analysis, we compute hybrid transforms of Euler characteristic profiles. These integral transforms mix Euler characteristic techniques with Lebesgue integration to provide highly efficient compressors of topological signals. As a consequence, they show remarkable performance in unsupervised settings. On the qualitative side, we provide numerous heuristics on the topological and geometric information captured by Euler profiles and their hybrid transforms. Finally, we prove stability results for these descriptors as well as asymptotic guarantees in random settings.	翻訳日:2023-03-27 14:10:15 公開日:2023-03-24
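As a rough illustration of an Euler characteristic profile, the sketch below evaluates the Euler characteristic of a family of complexes at several scales. It assumes a Vietoris-Rips complex on a small point cloud, truncated at a maximum simplex dimension; the abstract does not say which complexes the authors use, and the names `euler_characteristic` and `euler_profile` are hypothetical. The brute-force enumeration is only practical for tiny inputs.

```python
from itertools import combinations
from math import dist


def euler_characteristic(points, scale, max_dim=3):
    """Alternating sum of simplex counts of the Vietoris-Rips complex at
    `scale`, using simplices up to dimension `max_dim` (brute force)."""
    n = len(points)
    close = [[dist(points[i], points[j]) <= scale for j in range(n)]
             for i in range(n)]
    chi = 0
    for k in range(max_dim + 1):
        # A (k+1)-subset spans a k-simplex when all its pairs are within scale.
        count = sum(
            1 for simplex in combinations(range(n), k + 1)
            if all(close[i][j] for i, j in combinations(simplex, 2))
        )
        chi += (-1) ** k * count
    return chi


def euler_profile(points, scales):
    """Euler characteristic evaluated pointwise over a list of scales."""
    return [euler_characteristic(points, s) for s in scales]


# Corners of the unit square: four isolated points, then a cycle, then a
# filled (contractible) complex as the scale grows.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
profile = euler_profile(square, [0.5, 1.0, 1.5])  # [4, 0, 1]
```

The profile drops from 4 (four components) to 0 (one loop) and ends at 1 once the diagonals join and the complex fills in, which shows how a single integer per scale can track changes in both connectivity and holes.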
# 自由言語モデルによる視覚言語事前学習の高速化 Accelerating Vision-Language Pretraining with Free Language Modeling ( http://arxiv.org/abs/2303.14038v1 ) ライセンス: Link先を確認	Teng Wang, Yixiao Ge, Feng Zheng, Ran Cheng, Ying Shan, Xiaohu Qie, Ping Luo	(参考訳) 視覚言語事前学習(VLP)の最先端手法は模範的なパフォーマンスを達成しているが、特に大規模Webデータセットでは、収束が遅く、トレーニング時間が長いことによる高いトレーニングコストに苦しむ。トレーニング効率にとって重要な障害は、マスク言語モデリング(MLM)において予測率(復元対象トークンの割合)と腐敗率(劣化トークンの割合)が絡み合っていることであり、適切な腐敗率は出力トークンの大部分を予測損失から除外するという代償を払って達成される。本稿では,VLPの収束を早めるために,任意の腐敗率で100%の予測率を可能にする自由言語モデリング(FLM)という新たな事前学習タスクを提案する。 FLMは、腐敗率との結び付きから予測率を解放し、予測対象のトークンごとに腐敗スパンをカスタマイズできるようにする。 FLMでトレーニングされたモデルは、双方向のコンテキストをより柔軟に活用することで、同じGPU時間でより良く、より速く学習することができる。広範な実験により、FLMはMLMベースの手法と比較して2.5倍の事前学習時間短縮を実現し、視覚言語理解と生成の両タスクにおいて競争力のある性能を維持することが示された。コードは https://github.com/TencentARC/FLM で公開される予定である。 State-of-the-art methods in vision-language pretraining (VLP) achieve exemplary performance but suffer from high training costs resulting from slow convergence and long training time, especially on large-scale web datasets. An essential obstacle to training efficiency lies in the entangled prediction rate (percentage of tokens for reconstruction) and corruption rate (percentage of corrupted tokens) in masked language modeling (MLM); that is, a proper corruption rate is achieved at the cost of a large portion of output tokens being excluded from the prediction loss. To accelerate the convergence of VLP, we propose a new pretraining task, namely, free language modeling (FLM), that enables a 100% prediction rate with arbitrary corruption rates. FLM successfully frees the prediction rate from the tie-up with the corruption rate while allowing the corruption spans to be customized for each token to be predicted. FLM-trained models are encouraged to learn better and faster given the same GPU time by exploiting bidirectional contexts more flexibly. 
Extensive experiments show FLM could achieve an impressive 2.5x pretraining time reduction in comparison to the MLM-based methods, while keeping competitive performance on both vision-language understanding and generation tasks. Code will be public at https://github.com/TencentARC/FLM.	翻訳日:2023-03-27 14:10:02 公開日:2023-03-24
# CTセグメンテーションラベリングの最適化 Optimizing the Procedure of CT Segmentation Labeling ( http://arxiv.org/abs/2303.14089v1 ) ライセンス: Link先を確認	Yaroslav Zharov, Tilo Baumbach, Vincent Heuveline	(参考訳) Computed Tomographyでは、機械学習は自動データ処理によく使用される。しかし、モデル複雑性の増大には、巨大なボリュームデータセットが伴うため、モデルトレーニングのコストが増大する。モデルアーキテクチャとトレーニングアルゴリズムの進歩によってこれを緩和するほとんどの研究とは異なり、アノテーションの手順とそのモデル性能への影響について検討する。モデルトレーニングのために収集された優れたデータセットの主な利点は、ラベルの品質、多様性、完全性であると仮定する。オープンな医療用CTデータセットを用いてこれらの利点がモデル性能に与える影響を比較し,ラベリングの初期段階では多様性よりも品質が重要であり,多様性は完全性よりも重要である,と結論づけた。この結論と追加実験に基づき, モデル性能を最大化しながらラベリングに費やす労力を最小限に抑えるために, 断層画像のセグメンテーションのためのラベリング手順を提案する。 In Computed Tomography, machine learning is often used for automated data processing. However, increasing model complexity is accompanied by increasingly large volume datasets, which in turn increases the cost of model training. Unlike most work that mitigates this by advancing model architectures and training algorithms, we consider the annotation procedure and its effect on the model performance. We assume the three main virtues of a good dataset collected for model training to be label quality, diversity, and completeness. We compare the effects of those virtues on the model performance using open medical CT datasets and conclude that quality is more important than diversity early during labeling; diversity, in turn, is more important than completeness. Based on this conclusion and additional experiments, we propose a labeling procedure for the segmentation of tomographic images to minimize efforts spent on labeling while maximizing the model performance.	翻訳日:2023-03-27 14:04:03 公開日:2023-03-24
# OPDMulti: 複数のオブジェクトに対するオープンな部分検出 OPDMulti: Openable Part Detection for Multiple Objects ( http://arxiv.org/abs/2303.14087v1 ) ライセンス: Link先を確認	Xiaohao Sun, Hanxiao Jiang, Manolis Savva, Angel Xuan Chang	(参考訳) 開部検出は、単視点画像中の物体の開部を検出し、対応する運動パラメータを予測するタスクである。以前の研究は、全ての入力画像が単一のオープンなオブジェクトのみを含む非現実的な設定を調査した。我々は,このタスクを複数のオブジェクトを持つシーンに一般化し,実世界のシーンに基づいて対応するデータセットを作成する。次に、このより困難なシナリオに、OPDFormer:part-aware transformerアーキテクチャを使って対処します。私たちの実験では、opdformerアーキテクチャが以前の作業を大幅に上回っています。私たちが調査したより現実的なマルチオブジェクトシナリオは、将来的な仕事の機会を示しながら、すべてのメソッドで難しいままです。 Openable part detection is the task of detecting the openable parts of an object in a single-view image, and predicting corresponding motion parameters. Prior work investigated the unrealistic setting where all input images only contain a single openable object. We generalize this task to scenes with multiple objects each potentially possessing openable parts, and create a corresponding dataset based on real-world scenes. We then address this more challenging scenario with OPDFormer: a part-aware transformer architecture. Our experiments show that the OPDFormer architecture significantly outperforms prior work. The more realistic multiple-object scenarios we investigated remain challenging for all methods, indicating opportunities for future work.	翻訳日:2023-03-27 14:03:50 公開日:2023-03-24
# 微分プライベート合成制御 Differentially Private Synthetic Control ( http://arxiv.org/abs/2303.14084v1 ) ライセンス: Link先を確認	Saeyoung Rho, Rachel Cummings, Vishal Misra	(参考訳) 合成制御(synthetic control)は、合成反事実データを作成することにより介入の治療効果を推定するために用いられる因果推論ツールである。このアプローチは、他の類似した観測(ドナープール)からの測定を組み合わせて、介入前のターゲットとドナープールの関係を分析することによって、対実的時系列(ターゲットユニット)を予測する。機密データやプロプライエタリデータに合成制御ツールがますます適用されるにつれて、正式なプライバシ保護が求められることが多い。本研究では,明示的な誤差境界を持つ微分プライベート合成制御のための最初のアルゴリズムを提案する。我々のアプローチは、非私的合成制御と微分プライベートな経験的リスク最小化のツールに基づいている。我々は、合成制御クエリの感度に関する上限と下限を提供し、プライベート合成制御アルゴリズムの精度に関する明示的な誤差境界を提供する。我々は,アルゴリズムがターゲットユニットの正確な予測を行い,プライバシのコストが小さいことを示す。最後に,提案アルゴリズムの性能を実証的に評価し,パラメータの多様さに好適な性能を示すとともに,ハイパーパラメータチューニングの実践者へのガイダンスを提供する。 Synthetic control is a causal inference tool used to estimate the treatment effects of an intervention by creating synthetic counterfactual data. This approach combines measurements from other similar observations (i.e., donor pool ) to predict a counterfactual time series of interest (i.e., target unit) by analyzing the relationship between the target and the donor pool before the intervention. As synthetic control tools are increasingly applied to sensitive or proprietary data, formal privacy protections are often required. In this work, we provide the first algorithms for differentially private synthetic control with explicit error bounds. Our approach builds upon tools from non-private synthetic control and differentially private empirical risk minimization. We provide upper and lower bounds on the sensitivity of the synthetic control query and provide explicit error bounds on the accuracy of our private synthetic control algorithms. We show that our algorithms produce accurate predictions for the target unit, and that the cost of privacy is small. Finally, we empirically evaluate the performance of our algorithm, and show favorable performance in a variety of parameter regimes, as well as providing guidance to practitioners for hyperparameter tuning.	翻訳日:2023-03-27 14:03:39 公開日:2023-03-24
# 学生教師フレームワークにおけるランダム特徴モデルのオンライン学習 Online Learning for the Random Feature Model in the Student-Teacher Framework ( http://arxiv.org/abs/2303.14083v1 ) ライセンス: Link先を確認	Roman Worschech and Bernd Rosenow	(参考訳) ディープニューラルネットワークは、重みが増加するにつれて性能が向上し、過度にパラメータ化されるような予測アルゴリズムとして広く使われている。我々は,第1層が凍結され,第2層がトレーニング可能である2層ニューラルネットワークをランダム特徴モデルと呼ぶ。学習力学のための微分方程式の集合を導出することにより、学生-教師フレームワークの文脈における過度なパラメトリゼーションを考察する。隠れた層の大きさと入力次元の任意の有限比について、学生は完全一般化できず、非零漸近一般化誤差を計算する。学生の隠れた層の大きさが入力次元よりも指数関数的に大きいときのみ、完全一般化へのアプローチが可能となる。 Deep neural networks are widely used prediction algorithms whose performance often improves as the number of weights increases, leading to over-parametrization. We consider a two-layered neural network whose first layer is frozen while the last layer is trainable, known as the random feature model. We study over-parametrization in the context of a student-teacher framework by deriving a set of differential equations for the learning dynamics. For any finite ratio of hidden layer size and input dimension, the student cannot generalize perfectly, and we compute the non-zero asymptotic generalization error. Only when the student's hidden layer size is exponentially larger than the input dimension, an approach to perfect generalization is possible.	翻訳日:2023-03-27 14:03:19 公開日:2023-03-24
# CoLa-Diff:マルチモードMRI合成のための条件付き潜時拡散モデル CoLa-Diff: Conditional Latent Diffusion Model for Multi-Modal MRI Synthesis ( http://arxiv.org/abs/2303.14081v1 ) ライセンス: Link先を確認	Lan Jiang, Ye Mao, Xi Chen, Xiangfeng Wang, Chao Li	(参考訳) MRI合成は、臨床実践におけるMRIモダリティの欠如の課題を軽減することを約束する。拡散モデルは複雑なデータ分布と可変データ分布をモデル化して画像合成に有効な手法として登場した。しかし、ほとんどの拡散ベースのMRI合成モデルは単一のモードを使用している。元の画像領域で動作するため、メモリ集約性が高く、マルチモーダル合成では実現不可能である。さらに、MRIでは解剖学的構造を保たないことが多い。さらに、マルチモーダルMRI入力からの複数の条件のバランスは、マルチモーダル合成に不可欠である。本稿では,最初の拡散に基づく多モードMRI合成モデル,すなわち条件付き潜在拡散モデル(CoLa-Diff)を提案する。メモリ消費を低減するため,我々はCoLa-Diffを潜在空間で動作させるために設計する。本稿では,遅延空間における圧縮とノイズを解決するために,協調フィルタリングなどの新しいネットワークアーキテクチャを提案する。解剖学的構造をより良く維持するために、拡散過程を導くために密度分布の優先として脳領域マスクが導入された。さらに、マルチモーダル情報を有効に活用するためのオートウェイト適応を提案する。実験の結果、CoLa-Diffは他の最先端MRI合成法よりも優れており、マルチモーダルMRI合成の有効なツールとして機能することを約束している。 MRI synthesis promises to mitigate the challenge of missing MRI modality in clinical practice. Diffusion model has emerged as an effective technique for image synthesis by modelling complex and variable data distributions. However, most diffusion-based MRI synthesis models are using a single modality. As they operate in the original image domain, they are memory-intensive and less feasible for multi-modal synthesis. Moreover, they often fail to preserve the anatomical structure in MRI. Further, balancing the multiple conditions from multi-modal MRI inputs is crucial for multi-modal synthesis. Here, we propose the first diffusion-based multi-modality MRI synthesis model, namely Conditioned Latent Diffusion Model (CoLa-Diff). To reduce memory consumption, we design CoLa-Diff to operate in the latent space. We propose a novel network architecture, e.g., similar cooperative filtering, to solve the possible compression and noise in latent space. To better maintain the anatomical structure, brain region masks are introduced as the priors of density distributions to guide diffusion process. We further present auto-weight adaptation to employ multi-modal information effectively. 
Our experiments demonstrate that CoLa-Diff outperforms other state-of-the-art MRI synthesis methods, promising to serve as an effective tool for multi-modal MRI synthesis.	翻訳日:2023-03-27 14:03:07 公開日:2023-03-24
# 両世界のベスト:表データと画像データを用いたマルチモーダルコントラスト学習 Best of Both Worlds: Multimodal Contrastive Learning with Tabular and Imaging Data ( http://arxiv.org/abs/2303.14080v1 ) ライセンス: Link先を確認	Paul Hager, Martin J. Menten, Daniel Rueckert	(参考訳) 医用データセット、特にバイオバンクは、画像に加えて豊富な臨床情報を含む広範な表型データを含むことが多い。実際には、臨床医は多様性とスケールの両面でデータが少ないが、いまだにディープラーニングソリューションの展開を望んでいる。医療データセットのサイズの増加と高価なアノテーションコストに加えて、マルチモーダルで事前訓練し、一様予測できる教師なしの方法の必要性が高まっている。これらのニーズに対処するために,画像と表データを利用して非モーダルエンコーダを訓練する,自己指導型コントラスト学習フレームワークを提案する。我々のソリューションはSimCLRとSCARFという2つの主要なコントラスト学習戦略を組み合わせており、シンプルで効果的です。実験では,心mri画像と4万人の英国バイオバンク患者から120の臨床的特徴を用いて,心筋梗塞および冠動脈疾患(cad)のリスクを予測することにより,枠組みの強度を実証する。さらに,DVMカー広告データセットを用いて,自然画像へのアプローチの一般化可能性を示す。表データの高い解釈可能性を利用し,帰属実験およびアブレーション実験により,形態計測表の特徴は,大きさと形状を記述し,比較学習過程において重要度を大きくし,学習埋め込みの質を向上させることを見出した。最後に,教師付きコントラスト学習の新たな形式であるlaaf( label as a feature)を導入し,マルチモーダル事前学習中に基底真理ラベルを表型特徴として付加し,教師付きコントラストベースラインを上回った。 Medical datasets and especially biobanks, often contain extensive tabular data with rich clinical information in addition to images. In practice, clinicians typically have less data, both in terms of diversity and scale, but still wish to deploy deep learning solutions. Combined with increasing medical dataset sizes and expensive annotation costs, the necessity for unsupervised methods that can pretrain multimodally and predict unimodally has risen. To address these needs, we propose the first self-supervised contrastive learning framework that takes advantage of images and tabular data to train unimodal encoders. Our solution combines SimCLR and SCARF, two leading contrastive learning strategies, and is simple and effective. In our experiments, we demonstrate the strength of our framework by predicting risks of myocardial infarction and coronary artery disease (CAD) using cardiac MR images and 120 clinical features from 40,000 UK Biobank subjects. Furthermore, we show the generalizability of our approach to natural images using the DVM car advertisement dataset. 
We take advantage of the high interpretability of tabular data and through attribution and ablation experiments find that morphometric tabular features, describing size and shape, have outsized importance during the contrastive learning process and improve the quality of the learned embeddings. Finally, we introduce a novel form of supervised contrastive learning, label as a feature (LaaF), by appending the ground truth label as a tabular feature during multimodal pretraining, outperforming all supervised contrastive baselines.	翻訳日:2023-03-27 14:02:50 公開日:2023-03-24
# DistractFlow: リアルディトラクションと擬似ラベルによる光学的フロー推定の改善 DistractFlow: Improving Optical Flow Estimation via Realistic Distractions and Pseudo-Labeling ( http://arxiv.org/abs/2303.14078v1 ) ライセンス: Link先を確認	Jisoo Jeong, Hong Cai, Risheek Garrepalli, Fatih Porikli	(参考訳) 入力フレームに現実的な注意をそらすことにより,光フロー推定モデルのトレーニングを行うための新しいデータ拡張手法である distractflow を提案する。混合比に基づいて, 対のフレームの1つを類似領域を描写した気晴らし画像と組み合わせることにより, 自然物やシーンと相反する視覚摂動を誘発する。このようなペアを気晴らしペアと呼ぶ。直感的には、意味的に意味のある注意をそらすことによって、モデルが関連するバリエーションを学習し、挑戦的な偏差に対して堅牢性を達成することができるということです。具体的には, 初期対流と接地対流との間に計算された教師付き損失に加えて, 気をそらした対流と原対の接地対流との間に定義された第2の教師付き損失を同じ混合比で重み付けした。さらに、ラベルなしデータが利用可能であれば、擬似ラベルと相互一貫性の正規化により、自己管理設定への拡張アプローチを拡張します。元のペアとその気晴らしバージョンが与えられた場合、気晴らしペア上の推定フローを、元のペアの流れと一致させるために強制します。当社のアプローチでは,追加のアノテーションを必要とせずに,利用可能なトレーニングペア数を大幅に増やすことが可能です。モデルアーキテクチャに非依存であり、任意の光フロー推定モデルのトレーニングに適用することができる。 Sintel、KITTI、SlowFlowなど、複数のベンチマークに対する広範な評価は、DistractFlowが既存のモデルを一貫して改善し、最新技術よりも優れていることを示している。 We propose a novel data augmentation approach, DistractFlow, for training optical flow estimation models by introducing realistic distractions to the input frames. Based on a mixing ratio, we combine one of the frames in the pair with a distractor image depicting a similar domain, which allows for inducing visual perturbations congruent with natural objects and scenes. We refer to such pairs as distracted pairs. Our intuition is that using semantically meaningful distractors enables the model to learn related variations and attain robustness against challenging deviations, compared to conventional augmentation schemes focusing only on low-level aspects and modifications. More specifically, in addition to the supervised loss computed between the estimated flow for the original pair and its ground-truth flow, we include a second supervised loss defined between the distracted pair's flow and the original pair's ground-truth flow, weighted with the same mixing ratio. 
Furthermore, when unlabeled data is available, we extend our augmentation approach to self-supervised settings through pseudo-labeling and cross-consistency regularization. Given an original pair and its distracted version, we enforce the estimated flow on the distracted pair to agree with the flow of the original pair. Our approach allows increasing the number of available training pairs significantly without requiring additional annotations. It is agnostic to the model architecture and can be applied to training any optical flow estimation models. Our extensive evaluations on multiple benchmarks, including Sintel, KITTI, and SlowFlow, show that DistractFlow improves existing models consistently, outperforming the latest state of the art.	翻訳日:2023-03-27 14:02:22 公開日:2023-03-24
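The supervised part of the DistractFlow objective described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: frames and flows are flattened to plain lists of numbers, an L1 loss stands in for whatever flow loss the paper uses, the flow predictions are supplied directly rather than produced by a model, and the names `mix_frames`, `l1_loss`, and `distracted_supervised_loss` are hypothetical. The convention that `ratio` is the weight on the original frame is also an assumption.

```python
def mix_frames(frame, distractor, ratio):
    """Blend a frame with a distractor image pixel by pixel.
    ratio = 1.0 keeps the original frame unchanged (assumed convention)."""
    return [ratio * f + (1.0 - ratio) * d for f, d in zip(frame, distractor)]


def l1_loss(pred, target):
    """Mean absolute error, standing in for the flow loss."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(target)


def distracted_supervised_loss(flow_orig, flow_distracted, flow_gt, ratio):
    """Loss on the original pair plus the loss on the distracted pair,
    the latter weighted by the same mixing ratio used to build the pair."""
    return l1_loss(flow_orig, flow_gt) + ratio * l1_loss(flow_distracted, flow_gt)


# Toy values: a two-pixel frame and two-component flows.
mixed = mix_frames([1.0, 0.0], [0.0, 1.0], ratio=0.75)
loss = distracted_supervised_loss(
    flow_orig=[1.0, 2.0],        # prediction on the original pair
    flow_distracted=[2.0, 3.0],  # prediction on the distracted pair
    flow_gt=[1.0, 1.0],          # ground truth of the original pair
    ratio=0.75,
)
```

Both terms are compared against the ground truth of the original pair, so the distractor never needs its own annotation. In this sketch, a heavily distracted pair (small `ratio`) contributes less to the total.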
# 適応型インスタンスワイズ・スムースティングによる対人訓練の改善 Improved Adversarial Training Through Adaptive Instance-wise Loss Smoothing ( http://arxiv.org/abs/2303.14077v1 ) ライセンス: Link先を確認	Lin Li, Michael Spratling	(参考訳) 深いニューラルネットワークは、逆の摂動によって入力が破壊され、人間の知覚できない人工ノイズによって誤った予測をすることができる。これまでのところ、敵の訓練はこのような敵の攻撃に対する最も成功した防御であった。この研究は、敵の堅牢性を高めるために敵の訓練を改善することに焦点を当てている。まず、インスタンスの観点から、敵のトレーニング中に敵の脆弱性がどのように進化するかを分析します。学習中,攻撃に対して脆弱なトレーニングサンプルのかなりの割合を犠牲にすることで,攻撃的損失の全体的な低減が達成され,その結果,データ間の攻撃的脆弱性が均一に分布することを見出した。このような「不均一な脆弱性」は、いくつかの一般的なロバストなトレーニング方法に広まり、さらに重要なことは、敵のトレーニングにおける過剰フィッティングに関連している。本研究の目的は,新たな対人訓練手法であるインスタンス適応型平滑化強化対人訓練(ISEAT)を提案することである。入力と減量の両方のランドスケープを、適応的でインスタンス固有の方法で円滑にし、高い逆の脆弱性を持つサンプルに対してより堅牢性を高める。本手法が既存の防御法よりも優れていることを示す。特に,最新のデータ拡張と半教師付き学習技術を組み合わせることで,Wide ResNet34-10では59.32%,Wide ResNet28-10では61.55%,CIFAR10では$\ell_{\infty}$-normによる攻撃に対して,最先端の堅牢性を達成している。コードはhttps://github.com/TreeLLi/Instance-adaptive-Smoothness-Enhanced-ATで公開されている。 Deep neural networks can be easily fooled into making incorrect predictions through corruption of the input by adversarial perturbations: human-imperceptible artificial noise. So far adversarial training has been the most successful defense against such adversarial attacks. This work focuses on improving adversarial training to boost adversarial robustness. We first analyze, from an instance-wise perspective, how adversarial vulnerability evolves during adversarial training. We find that during training an overall reduction of adversarial loss is achieved by sacrificing a considerable proportion of training samples to be more vulnerable to adversarial attack, which results in an uneven distribution of adversarial vulnerability among data. Such "uneven vulnerability", is prevalent across several popular robust training methods and, more importantly, relates to overfitting in adversarial training. Motivated by this observation, we propose a new adversarial training method: Instance-adaptive Smoothness Enhanced Adversarial Training (ISEAT). 
It jointly smooths both the input and weight loss landscapes in an adaptive, instance-specific way, enhancing robustness more for samples with higher adversarial vulnerability. Extensive experiments demonstrate the superiority of our method over existing defense methods. Notably, our method, when combined with the latest data augmentation and semi-supervised learning techniques, achieves state-of-the-art robustness against $\ell_{\infty}$-norm constrained attacks on CIFAR10 of 59.32% for Wide ResNet34-10 without extra data, and 61.55% for Wide ResNet28-10 with extra data. Code is available at https://github.com/TreeLLi/Instance-adaptive-Smoothness-Enhanced-AT.	翻訳日:2023-03-27 14:01:55 公開日:2023-03-24
# 画像による検索:美容品検索に便利な機能を探る Search By Image: Deeply Exploring Beneficial Features for Beauty Product Retrieval ( http://arxiv.org/abs/2303.14075v1 ) ライセンス: Link先を確認	Mingqiang Wei, Qian Sun, Haoran Xie, Dong Liang, Fu Lee Wang	(参考訳) 画像による検索は人気があるが、広範な干渉のため依然として難しい。干渉の原因は、i)実世界で撮影された画像のデータ変動(背景、ポーズ、視角、明るさなど)と、ii)クエリデータセット内の類似画像である。本稿では,ニューラルネットワークによる美容製品検索(BPR)という実用的に意義のある問題を研究する。異なるタイプの画像特徴を幅広く抽出し、これらの特徴が、i)実世界で撮影された画像のデータ変動を抑制し、ii)見た目は非常によく似ているが本質的に異なる美容製品である他の画像と区別するのに有用かどうか、ひいてはBPRの能力向上につながるかどうかという興味深い疑問を提起する。そこで本研究では,美容製品画像の複数の特徴の組み合わせを理解するために,新しい可変アテンションニューラルネットワーク(VM-Net)を提案する。 BPRの公開トレーニングデータセットが少ないことを考慮し、100万以上の画像を20K以上のカテゴリに分類した新しいデータセットを構築し、VM-Netや他の手法の一般化能力と耐干渉能力の両方を改善する。我々はVM-Netとその競合手法の性能をベンチマークデータセットPerfect-500Kで検証し、VM-NetはMAP@7において競合手法を明確に上回る。ソースコードとデータセットは公開時にリリースされる。 Searching by image is popular yet still challenging due to the extensive interference arising from i) data variations (e.g., background, pose, visual angle, brightness) of real-world captured images and ii) similar images in the query dataset. This paper studies a practically meaningful problem of beauty product retrieval (BPR) by neural networks. We broadly extract different types of image features, and raise the intriguing question of whether these features are beneficial to i) suppress data variations of real-world captured images, and ii) distinguish one image from others which look very similar but are intrinsically different beauty products in the dataset, therefore leading to an enhanced capability of BPR. To answer it, we present a novel variable-attention neural network to understand the combination of multiple features (termed VM-Net) of beauty product images. Considering that there are few publicly released training datasets for BPR, we establish a new dataset with more than one million images classified into more than 20K categories to improve both the generalization and anti-interference abilities of VM-Net and other methods. 
We verify the performance of VM-Net and its competitors on the benchmark dataset Perfect-500K, where VM-Net shows clear improvements over the competitors in terms of MAP@7. The source code and dataset will be released upon publication.	翻訳日:2023-03-27 14:01:23 公開日:2023-03-24
# ChatDoctor:医学領域知識を用いたLLaMAモデルに基づく医用チャットモデル ChatDoctor: A Medical Chat Model Fine-tuned on LLaMA Model using Medical Domain Knowledge ( http://arxiv.org/abs/2303.14070v1 ) ライセンス: Link先を確認	Li Yunxiang, Li Zihan, Zhang Kai, Dan Ruilong, Zhang You	(参考訳) ChatGPTのような一般領域における最近の大規模言語モデル(LLM)は、指示に従うことや、人間のような反応を生み出すことに顕著な成功を収めている。しかし、これらの言語モデルは医療領域に特化して注意深く学習されておらず、診断の正確性が低く、医療診断や医薬品などについて適切な推奨ができない。この問題に対処するために,700以上の疾患とその症状,推奨薬,必要な医療検査を収集し,5Kの医師と患者の会話を生成した。医師と患者の会話でモデルを微調整することにより、これらのモデルは患者のニーズを理解し、アドバイスを提供し、様々な医療関連分野に有用な支援を提供する大きな可能性を持つ。これらの先進的な言語モデルのヘルスケアへの統合は、医療専門家と患者がコミュニケーションする方法に革命をもたらし、最終的にケアの全体的な品質と患者の結果を改善する。さらに、医療分野における対話モデルのさらなる発展を進めるために、すべてのソースコード、データセット、モデルの重みを公開する。このプロジェクトのトレーニングデータ、コード、重みは https://github.com/Kent0n-Li/ChatDoctor で入手できる。 Recent large language models (LLMs) in the general domain, such as ChatGPT, have shown remarkable success in following instructions and producing human-like responses. However, such language models have not been trained specifically and carefully for the medical domain, resulting in poor diagnostic accuracy and an inability to give correct recommendations for medical diagnosis, medications, etc. To address this issue, we collected more than 700 diseases and their corresponding symptoms, recommended medications, and required medical tests, and then generated 5K doctor-patient conversations. By fine-tuning models on doctor-patient conversations, these models show great potential to understand patients' needs, provide informed advice, and offer valuable assistance in a variety of medical-related fields. The integration of these advanced language models into healthcare can revolutionize the way healthcare professionals and patients communicate, ultimately improving the overall quality of care and patient outcomes. In addition, we will release all source code, datasets and model weights to advance the further development of dialogue models in the medical field. 
The training data, code, and weights of this project are available at: https://github.com/Kent0n-Li/ChatDoctor.	翻訳日:2023-03-27 14:01:04 公開日:2023-03-24
# 悪天候条件下でのドメイン・インクリメンタルなセマンティックセグメンテーションにおける忘却の原理 Principles of Forgetting in Domain-Incremental Semantic Segmentation in Adverse Weather Conditions ( http://arxiv.org/abs/2303.14115v1 ) ライセンス: Link先を確認	Tobias Kalb, Jürgen Beyerer	(参考訳) 自動運転車のシーン認識のためのディープニューラルネットワークは、訓練されたドメインに対して優れた結果をもたらす。しかし,実世界の状況では,動作領域とその基礎となるデータ分布は変化する。特に悪天候条件は、トレーニング中にそのようなデータが得られない場合、モデル性能を著しく低下させる。さらに、モデルが新しいドメインに段階的に適応すると、破滅的忘却が生じ、以前観測されたドメインでの性能が大幅に低下する。破滅的忘却を減らそうとする最近の進歩にもかかわらず、その原因と効果はいまだに不明である。そこで本研究では, 悪天候条件下でのドメインインクリメンタル学習において, セマンティックセグメンテーションモデルの表現がどう影響を受けるかについて検討する。実験と表現分析の結果,破滅的忘却はドメインインクリメンタル学習における低レベルな特徴の変化によって主に引き起こされ,事前学習と画像拡張によってソースドメイン上でより一般的な特徴を学習することが,その後のタスクにおける効率的な特徴の再利用につながり,破滅的忘却を大幅に軽減することが示唆された。これらの知見は,効果的な継続学習アルゴリズムのために汎用的な特徴を促進する手法の重要性を強調している。 Deep neural networks for scene perception in automated vehicles achieve excellent results for the domains they were trained on. However, in real-world conditions, the domain of operation and its underlying data distribution are subject to change. Adverse weather conditions, in particular, can significantly decrease model performance when such data are not available during training. Additionally, when a model is incrementally adapted to a new domain, it suffers from catastrophic forgetting, causing a significant drop in performance on previously observed domains. Despite recent progress in reducing catastrophic forgetting, its causes and effects remain obscure. Therefore, we study how the representations of semantic segmentation models are affected during domain-incremental learning in adverse weather conditions. Our experiments and representational analyses indicate that catastrophic forgetting is primarily caused by changes to low-level features in domain-incremental learning and that learning more general features on the source domain using pre-training and image augmentations leads to efficient feature reuse in subsequent tasks, which drastically reduces catastrophic forgetting. 
These findings highlight the importance of methods that facilitate generalized features for effective continual learning algorithms.	翻訳日:2023-03-27 13:54:38 公開日:2023-03-24
# 物体の動き感度:イベントベースカメラのエゴモーション問題に対するバイオインスパイアソリューション Object Motion Sensitivity: A Bio-inspired Solution to the Ego-motion Problem for Event-based Cameras ( http://arxiv.org/abs/2303.14114v1 ) ライセンス: Link先を確認	Shay Snyder (1), Hunter Thompson (2), Md Abdullah-Al Kaiser (3), Gregory Schwartz (4), Akhilesh Jaiswal (3), and Maryam Parsa (4) ((1) George Mason University, (2) Georgia Institute of Technology, (3) University of Southern California, (4) Northwestern University)	(参考訳) ニューロモルフィック(イベントベースの)イメージセンサーは、人間の網膜からインスピレーションを得て、生体によく似た方法で視覚刺激を処理できる電子機器を作る。これらのセンサーは従来のRGBセンサーとは大きく異なる情報を処理する。具体的には、イベントベースイメージセンサが生成する知覚情報は、RGBセンサと比べて桁違いのスペーサーである。第1世代のニューロモルフィック画像センサであるDynamic Vision Sensor (DVS)は、光受容体と最初の網膜シナプスに制限された計算にインスパイアされている。本研究は,ニューロモルフィック画像センサの第2世代,CMOSイメージセンサ(IRIS)における統合網膜機能(Integrated Retinal Functionality in CMOS Image Sensors)の能力を強調するものである。この研究で選択される特徴は、IRISセンサーで局所的に処理されるオブジェクト運動感度(OMS)である。イベントベースカメラのエゴモーション問題を解決するためのOMSの能力について検討する。 OMS は従来の RGB や DVS と同様の効率で標準的なコンピュータビジョンタスクを実現できるが,帯域幅の大幅な削減が可能である。これにより、ワイヤレスおよびコンピューティングの電力予算が削減され、高速、堅牢、エネルギー効率、低帯域幅のリアルタイム意思決定において大きな機会が開ける。 Neuromorphic (event-based) image sensors draw inspiration from the human-retina to create an electronic device that can process visual stimuli in a way that closely resembles its biological counterpart. These sensors process information significantly different than the traditional RGB sensors. Specifically, the sensory information generated by event-based image sensors are orders of magnitude sparser compared to that of RGB sensors. The first generation of neuromorphic image sensors, Dynamic Vision Sensor (DVS), are inspired by the computations confined to the photoreceptors and the first retinal synapse. 
In this work, we highlight the capability of the second generation of neuromorphic image sensors, Integrated Retinal Functionality in CMOS Image Sensors (IRIS), which aims to mimic full retinal computations from photoreceptors to the output of the retina (retinal ganglion cells) for targeted feature extraction. The feature of choice in this work is Object Motion Sensitivity (OMS), which is processed locally in the IRIS sensor. We study the capability of OMS in solving the ego-motion problem of event-based cameras. Our results show that OMS can accomplish standard computer vision tasks with similar efficiency to conventional RGB and DVS solutions but offers drastic bandwidth reduction. This cuts the wireless and computing power budgets and opens up vast opportunities in high-speed, robust, energy-efficient, and low-bandwidth real-time decision making.	翻訳日:2023-03-27 13:54:16 公開日:2023-03-24
# 離散最適化による解釈可能な異常検出 Interpretable Anomaly Detection via Discrete Optimization ( http://arxiv.org/abs/2303.14111v1 ) ライセンス: Link先を確認	Simon Lutz, Florian Wittbold, Simon Dierl, Benedikt Böing, Falk Howar, Barbara König, Emmanuel Müller, Daniel Neider	(参考訳) 異常検出は、サイバーセキュリティ、法執行、医療、詐欺保護など、多くのアプリケーションドメインにおいて不可欠である。しかし、現在のディープラーニングアプローチの意思決定は理解が難しいことで知られており、多くの場合、実践的な適用性を制限している。この制限を克服するために、シーケンシャルデータから本質的に解釈可能な異常検出器を学習するためのフレームワークを提案する。具体的には、与えられたラベルなしシーケンスの多重集合から決定論的有限オートマトン(DFA)を学ぶことを考える。この問題は計算量的に難しいことを示し,制約最適化に基づく2つの学習アルゴリズムを開発した。さらに, DFAの全体的な解釈性を改善するために, 最適化問題に対する新たな正規化手法を導入する。プロトタイプ実装を用いて,提案手法は精度とF1スコアの点で有望な結果を示す。 Anomaly detection is essential in many application domains, such as cyber security, law enforcement, medicine, and fraud protection. However, the decision-making of current deep learning approaches is notoriously hard to understand, which often limits their practical applicability. To overcome this limitation, we propose a framework for learning inherently interpretable anomaly detectors from sequential data. More specifically, we consider the task of learning a deterministic finite automaton (DFA) from a given multi-set of unlabeled sequences. We show that this problem is computationally hard and develop two learning algorithms based on constraint optimization. Moreover, we introduce novel regularization schemes for our optimization problems that improve the overall interpretability of our DFAs. Using a prototype implementation, we demonstrate that our approach shows promising results in terms of accuracy and F1 score.	翻訳日:2023-03-27 13:53:49 公開日:2023-03-24
# エンコーダ・デコーダを用いた散水滴の形態変化の予測 Prediction of the morphological evolution of a splashing drop using an encoder-decoder ( http://arxiv.org/abs/2303.14109v1 ) ライセンス: Link先を確認	Jingzu Yee, Daichi Igarashi, Shun Miyatake, Yoshiyuki Tagawa	(参考訳) 固体表面への落下の影響は、様々な影響と応用を持つ重要な現象である。しかし、この現象の多相性は、特に落下が跳ね上がると、その形態的進化の予測に複雑を引き起こす。多くの機械学習に基づくドロップインパクト研究は物理パラメータを中心にしているが、この研究ではエンコーダデコーダを訓練し、画像データを用いてドロップ形態を予測するコンピュータビジョン戦略を用いた。ここでは、この訓練されたエンコーダデコーダが、スプラッシュや非スラッシュドロップの形態を示すビデオを生成することができることを示す。興味深いことに、これらの生成されたビデオのフレームごとに、落下の直径が実際のビデオとよく一致していることが判明した。また,スプラッシュ/ノンスプラッシュ予測の精度も高かった。これらの結果は、トレーニングされたエンコーダデコーダが、ドロップ形態を正確に表現できるビデオを生成する能力を示している。このアプローチは、実験および数値研究の高速で安価な代替手段を提供する。 The impact of a drop on a solid surface is an important phenomenon that has various implications and applications. However, the multiphase nature of this phenomenon causes complications in the prediction of its morphological evolution, especially when the drop splashes. While most machine-learning-based drop-impact studies have centred around physical parameters, this study used a computer-vision strategy by training an encoder-decoder to predict the drop morphologies using image data. Herein, we show that this trained encoder-decoder is able to successfully generate videos that show the morphologies of splashing and non-splashing drops. Remarkably, in each frame of these generated videos, the spreading diameter of the drop was found to be in good agreement with that of the actual videos. Moreover, there was also a high accuracy in splashing/non-splashing prediction. These findings demonstrate the ability of the trained encoder-decoder to generate videos that can accurately represent the drop morphologies. This approach provides a faster and cheaper alternative to experimental and numerical studies.	翻訳日:2023-03-27 13:53:36 公開日:2023-03-24
# 超伝導トランスモンプロセッサのクロストーク特性 Characterizing crosstalk of superconducting transmon processors ( http://arxiv.org/abs/2303.14103v1 ) ライセンス: Link先を確認	Andreas Ketterer, Thomas Wellens	(参考訳) 現在利用可能な量子コンピューティングハードウェアは、超伝導トランスモンアーキテクチャに基づくもので、数百キュービットのネットワークを実現する。しかし、そのような量子チップの固有のノイズとデコヒーレンス効果は、基本的なゲート演算をかなり変化させ、ターゲットの量子計算の不完全な出力をもたらす。本研究では,隣接量子ビット上で同時に実行される量子ゲート間の相関関係に現れるクロストーク効果の特性について考察する。このような相関関係の物理的起源を簡潔に説明した後、ランダム化ベンチマークプロトコルを用いて量子チップ全体のクロストーク効果の大きさを効率よく体系的に特徴付ける方法を示す。我々は,IBMが提供する実際の量子ハードウェア上で,クロストークによるゲート忠実度の変化を観測することで,導入プロトコルを実証する。最後に、得られた情報を用いて、適切なクロストーク対応ノイズモデルを考案し、ノイズ量子ハードウェアをシミュレートするより正確な手法を提案する。 Currently available quantum computing hardware based on superconducting transmon architectures realizes networks of hundreds of qubits with the possibility of controlled nearest-neighbor interactions. However, the inherent noise and decoherence effects of such quantum chips considerably alter basic gate operations and lead to imperfect outputs of the targeted quantum computations. In this work, we focus on the characterization of crosstalk effects which manifest themselves in correlations between simultaneously executed quantum gates on neighboring qubits. After a short explanation of the physical origin of such correlations, we show how to efficiently and systematically characterize the magnitude of such crosstalk effects on an entire quantum chip using the randomized benchmarking protocol. We demonstrate the introduced protocol by running it on real quantum hardware provided by IBM observing significant alterations in gate fidelities due to crosstalk. Lastly, we use the gained information in order to propose more accurate means to simulate noisy quantum hardware by devising an appropriate crosstalk-aware noise model.	翻訳日:2023-03-27 13:53:20 公開日:2023-03-24
# 分散シルエットアルゴリズム:ビッグデータによるクラスタリングの評価 Distributed Silhouette Algorithm: Evaluating Clustering on Big Data ( http://arxiv.org/abs/2303.14102v1 ) ライセンス: Link先を確認	Marco Gaido	(参考訳) ビッグデータの時代において、各アルゴリズムが持つ必要のある重要な特徴は、分散環境で効率的に並列に実行する可能性である。クラスタリングの品質を評価するための一般的なシルエット計量は、残念ながらこの性質を持たず、入力データセットのサイズに関して二次計算の複雑さを持っている。このため、クラスタリングを別途評価する必要のあるビッグデータシナリオでは、その実行が妨げられている。本稿では,このギャップを埋めるため,線形複雑性を持つシルエット計量を計算し,分散環境で並列に実行可能な最初のアルゴリズムを提案する。その実装はApache Spark MLライブラリで無料で利用できる。 In the big data era, the key feature that each algorithm needs to have is the possibility of efficiently running in parallel in a distributed environment. The popular Silhouette metric to evaluate the quality of a clustering, unfortunately, does not have this property and has a quadratic computational complexity with respect to the size of the input dataset. For this reason, its execution has been hindered in big data scenarios, where clustering had to be evaluated otherwise. To fill this gap, in this paper we introduce the first algorithm that computes the Silhouette metric with linear complexity and can easily execute in parallel in a distributed environment. Its implementation is freely available in the Apache Spark ML library.	翻訳日:2023-03-27 13:53:04 公開日:2023-03-24
# Nuisance-extended Information Bottleneckによる複数信頼性対策の強化 Enhancing Multiple Reliability Measures via Nuisance-extended Information Bottleneck ( http://arxiv.org/abs/2303.14096v1 ) ライセンス: Link先を確認	Jongheon Jeong, Sihyun Yu, Hankook Lee, Jinwoo Shin	(参考訳) トレーニングデータが制限される現実のシナリオでは、データ内の多くの予測信号がデータ取得のバイアスに由来する(つまり一般化しにくい)ことがあり、モデルがそのような(いわゆる)「ショートカット」信号に共適応することを防ぐことができない。このような障害モードを回避すべく、トレーニングにおけるより広い種類の摂動をカバーするために、相互情報制約の下で敵の脅威モデルを考える。これにより、標準情報ボトルネックを拡張して、ニュアサンス情報をモデル化するモチベーションが生まれます。提案する畳み込み型とトランスフォーマー型の両方のアーキテクチャに関するハイブリッド識別生成型トレーニングを容易にするために,目標を実現するためのオートエンコーダベースのトレーニングと,実用的なエンコーダ設計を提案する。実験結果から,提案手法は学習した表現の堅牢性(ドメイン固有の知識を使わずに顕著な)を向上させることが示唆された。例えば、我々のモデルは、AUROCで$78.4\% \rightarrow 87.2\%$の新規性検出において、最近の挑戦的オブジェクトベンチマークの最先端を前進させ、腐敗、背景、(証明された)敵対的ロバスト性の向上を同時に享受できる。コードはhttps://github.com/jh-jeong/nuisance_ibで入手できる。 In practical scenarios where training data is limited, many predictive signals in the data may instead stem from biases in data acquisition (i.e., be less generalizable), so that one cannot prevent a model from co-adapting to such (so-called) "shortcut" signals: this makes the model fragile under various distribution shifts. To bypass such failure modes, we consider an adversarial threat model under a mutual information constraint to cover a wider class of perturbations in training. This motivates us to extend the standard information bottleneck to additionally model the nuisance information. We propose an autoencoder-based training to implement the objective, as well as practical encoder designs to facilitate the proposed hybrid discriminative-generative training concerning both convolutional- and Transformer-based architectures. Our experimental results show that the proposed scheme improves robustness of learned representations (remarkably without using any domain-specific knowledge), with respect to multiple challenging reliability measures. 
For example, our model could advance the state-of-the-art on a recent challenging OBJECTS benchmark in novelty detection by $78.4\% \rightarrow 87.2\%$ in AUROC, while simultaneously enjoying improved corruption, background and (certified) adversarial robustness. Code is available at https://github.com/jh-jeong/nuisance_ib.	翻訳日:2023-03-27 13:52:53 公開日:2023-03-24
# パノVPR:パノラマを横切るスライディングウインドウによる一様視界から等角視界認識を目指して PanoVPR: Towards Unified Perspective-to-Equirectangular Visual Place Recognition via Sliding Windows across the Panoramic View ( http://arxiv.org/abs/2303.14095v1 ) ライセンス: Link先を確認	Ze Shi, Hao Shi, Kailun Yang, Zhe Yin, Yining Lin, Kaiwei Wang	(参考訳) 近年、視覚位置認識は自動運転とロボット工学の重要な技術として注目を集めている。現在の主流のアプローチは、視点ビュー検索視点ビュー(P2P)パラダイムまたは等方形画像検索等方形画像(E2E)パラダイムを使用する。しかし、自然で実践的なアイデアは、ユーザーはクエリパースペクティブの画像を取得し、地図プロバイダからパノラマデータベースイメージで取得するために、消費者級のピンホールカメラしか持っていないということである。そこで我々はPanoVPR (P2E) を提案する。PanoVPRは、平板上をスライドするウィンドウと、ウィンドウ間の特徴記述子を比較することで、ハードクロップによる特徴トランケーションを解消する、スライドウインドウに基づく視界-等角形(P2E)視覚位置認識フレームワークである。さらに、この統一フレームワークは、p2p(perspective-to-perspective)メソッドで使用されるネットワーク構造を変更せずに直接転送することができる。トレーニングと評価を容易にするため,pitts250kからpitts250k-P2Eデータセットを抽出し,有望な結果を得るとともに,モバイルロボットプラットフォームによる現実シナリオにおけるP2Eデータセットも構築する。コードとデータセットはhttps://github.com/zafirshi/PanoVPRで公開される。 Visual place recognition has received increasing attention in recent years as a key technology in autonomous driving and robotics. The current mainstream approaches use either the perspective view retrieval perspective view (P2P) paradigm or the equirectangular image retrieval equirectangular image (E2E) paradigm. However, a natural and practical idea is that users only have consumer-grade pinhole cameras to obtain query perspective images and retrieve them in panoramic database images from map providers. To this end, we propose PanoVPR, a sliding-window-based perspective-to-equirectangular (P2E) visual place recognition framework, which eliminates feature truncation caused by hard cropping by sliding windows over the whole equirectangular image and computing and comparing feature descriptors between windows. In addition, this unified framework allows for directly transferring the network structure used in perspective-to-perspective (P2P) methods without modification. 
To facilitate training and evaluation, we derive the pitts250k-P2E dataset from pitts250k and achieve promising results, and we also establish a P2E dataset in a real-world scenario using a mobile robot platform, which we refer to as YQ360. Code and datasets will be made available at https://github.com/zafirshi/PanoVPR.	翻訳日:2023-03-27 13:52:25 公開日:2023-03-24
# NeuFace:マルチビュー画像からのリアルな3Dニューラルフェイスレンダリング NeuFace: Realistic 3D Neural Face Rendering from Multi-view Images ( http://arxiv.org/abs/2303.14092v1 ) ライセンス: Link先を確認	Mingwu Zheng, Haiyu Zhang, Hongyu Yang, Di Huang	(参考訳) マルチビュー画像からのリアルな顔レンダリングは、様々なコンピュータビジョンやグラフィックアプリケーションに有用である。しかし, 顔の複雑な空間的な反射特性と幾何学的特徴から, 顔の3次元表現を忠実かつ効率的に復元することは依然として困難である。本稿では,ニューラルレンダリング技術を用いて,正確で物理的に意味のある3次元表現を学習する,新しい3次元顔レンダリングモデルneufaceを提案する。自然に神経BRDFを物理的にベースとしたレンダリングに組み込んで、高度な顔形状と外観の手がかりを協調的に捉える。具体的には、近距離BRDF統合と、簡単な新しい低ランク前処理を導入し、曖昧さを効果的に低減し、顔面BRDFの性能を高める。大規模な実験は、人間の顔レンダリングにおけるNeuFaceの優位性を実証し、共通オブジェクトへの適切な一般化能力を示した。 Realistic face rendering from multi-view images is beneficial to various computer vision and graphics applications. Due to the complex spatially-varying reflectance properties and geometry characteristics of faces, however, it remains challenging to recover 3D facial representations both faithfully and efficiently in the current studies. This paper presents a novel 3D face rendering model, namely NeuFace, to learn accurate and physically-meaningful underlying 3D representations by neural rendering techniques. It naturally incorporates the neural BRDFs into physically based rendering, capturing sophisticated facial geometry and appearance clues in a collaborative manner. Specifically, we introduce an approximated BRDF integration and a simple yet new low-rank prior, which effectively lower the ambiguities and boost the performance of the facial BRDFs. Extensive experiments demonstrate the superiority of NeuFace in human face rendering, along with a decent generalization ability to common objects.	翻訳日:2023-03-27 13:51:55 公開日:2023-03-24
# 暗黒物質からの流体力学シミュレーションのレクリエーションにおける物理インフォームニューラルネットワーク Physics-informed neural networks in the recreation of hydrodynamic simulations from dark matter ( http://arxiv.org/abs/2303.14090v1 ) ライセンス: Link先を確認	Zhenyu Dai, Ben Moews, Ricardo Vilalta, Romeel Dave	(参考訳) 物理インフォームドニューラルネットワークは、統計的パターンとドメイン知識を組み合わせた予測モデルを構築するためのコヒーレントなフレームワークとして登場した。基本的な考え方は、可能な解の空間を制約するために既知の関係を持つ最適化損失関数を強化することである。流体力学シミュレーションは現代の宇宙論の中核であり、必要な計算は費用も時間もかかる。同時に、ダークマターの比較的高速なシミュレーションには少ないリソースを必要とするため、バリオンを研究の活発な領域として扱うための機械学習アルゴリズムが出現し、水力学シミュレーションで見られる散乱を再現することは、現在進行中の課題である。本稿では,バリオン変換効率に関する理論をモデル損失関数に注入し,ニューラルネットワークアーキテクチャの進歩と物理的制約を組み合わせたバリオン塗装への物理インフォームニューラルネットワークの最初の応用について述べる。また,散乱再生を強制するKulback-Leibler分散に基づく時間的予測比較も導入する。宇宙シミュレーションのシムバ集合に対するバリオニクス特性の完全な集合を同時に抽出することにより, ダークマターハロ特性に基づくバリオニクス予測の精度の向上, 基本的金属性関係の回復, ターゲットシミュレーションの分布を辿る散乱体の回収を実証した。 Physics-informed neural networks have emerged as a coherent framework for building predictive models that combine statistical patterns with domain knowledge. The underlying notion is to enrich the optimization loss function with known relationships to constrain the space of possible solutions. Hydrodynamic simulations are a core constituent of modern cosmology, while the required computations are both expensive and time-consuming. At the same time, the comparatively fast simulation of dark matter requires fewer resources, which has led to the emergence of machine learning algorithms for baryon inpainting as an active area of research; here, recreating the scatter found in hydrodynamic simulations is an ongoing challenge. This paper presents the first application of physics-informed neural networks to baryon inpainting by combining advances in neural network architectures with physical constraints, injecting theory on baryon conversion efficiency into the model loss function. We also introduce a punitive prediction comparison based on the Kullback-Leibler divergence, which enforces scatter reproduction. 
By simultaneously extracting the complete set of baryonic properties for the Simba suite of cosmological simulations, our results demonstrate improved accuracy of baryonic predictions based on dark matter halo properties, successful recovery of the fundamental metallicity relation, and retrieval of scatter that traces the target simulation's distribution.	翻訳日:2023-03-27 13:51:28 公開日:2023-03-24
# minddiffuser: 意味的および構造的拡散を伴うヒト脳活動からの画像再構成制御 MindDiffuser: Controlled Image Reconstruction from Human Brain Activity with Semantic and Structural Diffusion ( http://arxiv.org/abs/2303.14139v1 ) ライセンス: Link先を確認	Yizhuo Lu, Changde Du, Dianpeng Wang and Huiguang He	(参考訳) 機能的磁気共鳴イメージング(fmri)による視覚刺激の再構成は有意義かつ困難な課題である。従来の研究は、いくつかの自然画像の輪郭や大きさなど、原像に似た構造で復元に成功した。しかし、これらの再構成には明確な意味情報がなく、識別が難しい。近年、多くの研究は、より強力な生成能力を持つマルチモーダル事前学習モデルを用いて、本来のものと意味的に類似した画像を再構成している。しかし、これらの画像は位置や方向などの制御不能な構造情報を持っている。両課題を同時に解決するために,安定拡散を利用した2段階画像再構成モデルMindDiffuserを提案する。ステージ1では、VQ-VAE潜在表現とfMRIからデコードされたCLIPテキスト埋め込みを安定拡散のイメージ・ツー・イメージプロセスに配置し、セマンティックおよび構造情報を含む予備画像を生成する。ステージ2では、fMRIからデコードされた低レベルCLIP視覚特徴を監視情報として利用し、バックプロパゲーションによりステージ1の2つの特徴を継続的に調整し、構造情報を整列させる。定性的および定量的解析の結果から,提案モデルが自然景観データセット(NSD)の再構成結果において,現在の最先端モデルを上回っていることが示唆された。さらに, アブレーション実験の結果から, モデルの各成分が画像再構成に有効であることが示唆された。 Reconstructing visual stimuli from measured functional magnetic resonance imaging (fMRI) has been a meaningful and challenging task. Previous studies have successfully achieved reconstructions with structures similar to the original images, such as the outlines and size of some natural images. However, these reconstructions lack explicit semantic information and are difficult to discern. In recent years, many studies have utilized multi-modal pre-trained models with stronger generative capabilities to reconstruct images that are semantically similar to the original ones. However, these images have uncontrollable structural information such as position and orientation. To address both of the aforementioned issues simultaneously, we propose a two-stage image reconstruction model called MindDiffuser, utilizing Stable Diffusion. In Stage 1, the VQ-VAE latent representations and the CLIP text embeddings decoded from fMRI are put into the image-to-image process of Stable Diffusion, which yields a preliminary image that contains semantic and structural information. 
In Stage 2, we utilize the low-level CLIP visual features decoded from fMRI as supervisory information, and continually adjust the two features in Stage 1 through backpropagation to align the structural information. The results of both qualitative and quantitative analyses demonstrate that our proposed model has surpassed the current state-of-the-art models in terms of reconstruction results on Natural Scenes Dataset (NSD). Furthermore, the results of ablation experiments indicate that each component of our model is effective for image reconstruction.	翻訳日:2023-03-27 13:45:31 公開日:2023-03-24
# 医用画像解析における敵攻撃と防御:方法と応用 Adversarial Attack and Defense for Medical Image Analysis: Methods and Applications ( http://arxiv.org/abs/2303.14133v1 ) ライセンス: Link先を確認	Junhao Dong, Junxi Chen, Xiaohua Xie, Jianhuang Lai, and Hao Chen	(参考訳) 深層学習技術は, コンピュータ支援画像解析において優れた性能を示しているが, 相変わらず脆弱であり, 臨床における誤診の可能性を秘めている。近年では, 深部医療診断システムにおいて, これらの逆境に対する防御が顕著に進歩している。本論では, 新たな分類法を応用シナリオとして, 対人攻撃の進展と医療画像解析の防御に関する総合的な調査を行う。また,医療画像解析のための異なる種類の敵攻撃と防御方法のための統一的理論的枠組みも提供する。公平な比較のために,様々なシナリオ下での対人訓練により得られた対人的堅牢な医療診断モデルのための新しいベンチマークを構築した。我々の知る限りでは、逆向きに堅牢な医療診断モデルの徹底的な評価を提供する最初の調査論文である。質的,定量的な結果を分析することで,医用画像解析システムにおける敵攻撃と防御の課題を解明し,今後の研究の方向性を明らかにした。 Deep learning techniques have achieved superior performance in computer-aided medical image analysis, yet they are still vulnerable to imperceptible adversarial attacks, resulting in potential misdiagnosis in clinical practice. Oppositely, recent years have also witnessed remarkable progress in defense against these tailored adversarial examples in deep medical diagnosis systems. In this exposition, we present a comprehensive survey on recent advances in adversarial attack and defense for medical image analysis with a novel taxonomy in terms of the application scenario. We also provide a unified theoretical framework for different types of adversarial attack and defense methods for medical image analysis. For a fair comparison, we establish a new benchmark for adversarially robust medical diagnosis models obtained by adversarial training under various scenarios. To the best of our knowledge, this is the first survey paper that provides a thorough evaluation of adversarially robust medical diagnosis models. By analyzing qualitative and quantitative results, we conclude this survey with a detailed discussion of current challenges for adversarial attack and defense in medical image analysis systems to shed light on future research directions.	翻訳日:2023-03-27 13:45:07 公開日:2023-03-24
# 量子鎖の準粒子状態におけるシャノンエントロピー Shannon entropy in quasiparticle states of quantum chains ( http://arxiv.org/abs/2303.14132v1 ) ライセンス: Link先を確認	Wentao Ye and Jiaju Zhang	(参考訳) 本稿では,自由ボソニック鎖とフェルミイオン鎖の準粒子励起状態とスピン1/2xxx鎖の強磁性相において,全系とそのサブシステムのシャノンエントロピーとサブシステムシャノン相互情報について検討する。我々は, 単粒子および二重粒子状態に着目し, スケーリング限界における自由ボゾン鎖とフェルミオン鎖の様々な解析式を導出する。これらの公式は、ある条件下でのxxx鎖のマグノン励起状態にも適用できる。絡み合うエントロピーとは異なり、シャノンエントロピーは2つの準粒子が運動量差が大きい場合に分離しない。さらに、大きな運動量差極限では、準粒子の半古典的図では説明できない量子スピン鎖の普遍的な結果が得られる。 In this paper, we investigate the Shannon entropy of the total system and its subsystems, as well as the subsystem Shannon mutual information, in quasiparticle excited states of free bosonic and fermionic chains and the ferromagnetic phase of the spin-1/2 XXX chain. Our focus is on single-particle and double-particle states, and we derive various analytical formulas for free bosonic and fermionic chains in the scaling limit. These formulas are also applicable to magnon excited states in the XXX chain under certain conditions. We discover that, unlike entanglement entropy, Shannon entropy does not separate when two quasiparticles have a large momentum difference. Moreover, in the large momentum difference limit, we obtain universal results for quantum spin chains that cannot be explained by a semiclassical picture of quasiparticles.	翻訳日:2023-03-27 13:44:51 公開日:2023-03-24
# 貧乏の罪 The crime of being poor ( http://arxiv.org/abs/2303.14128v1 ) ライセンス: Link先を確認	Georgina Curto, Svetlana Kiritchenko, Isar Nejadgholi and Kathleen C. Fraser	(参考訳) 貧困の犯罪化は、最も脆弱な人々に対する集団的偏見として広く非難されている。 NGOや国際機関は、貧困者が自らの状況で非難され、社会の富裕層よりも犯罪に関連し、貧乏であるために単に犯罪を犯すことが多いと主張している。貧困と全体犯罪率に相関する証拠は文献に見出されていないが、本稿は両概念を関連づけた集団的信念の証拠を提供する。この報告は、Twitterの自然言語処理(NLP)技術を用いて、富裕層と比較して犯罪と貧困層を関連付ける社会的バイアスを測定する。この論文は、8つの異なる英語圏のパネルで犯罪-貧困バイアスのレベルを定量化している。犯罪と貧困の関連性における地域差は、文学が財産犯罪と相関する不平等や失業のレベルによって正当化できない。地理的に異なる地域における犯罪・貧困バイアスの観測率の変動は、文化的要因や、特定の国における機会と社会的移動の平等を過大評価する傾向に影響される可能性がある。これらの結果は政策形成に影響を及ぼし、貧困軽減のための新たな研究の道を開き、貧困だけでなく社会全体にも焦点をあてる。貧困者に対する集団的偏見に基づいて行動することで、貧困削減政策の承認や、影響を受けた人々の尊厳の回復が促進される。 The criminalization of poverty has been widely denounced as a collective bias against the most vulnerable. NGOs and international organizations claim that the poor are blamed for their situation, are more often associated with criminal offenses than the wealthy strata of society and even incur criminal offenses simply as a result of being poor. While no evidence has been found in the literature that correlates poverty and overall criminality rates, this paper offers evidence of a collective belief that associates both concepts. This brief report measures the societal bias that correlates criminality with the poor, as compared to the rich, by using Natural Language Processing (NLP) techniques in Twitter. The paper quantifies the level of crime-poverty bias in a panel of eight different English-speaking countries. The regional differences in the association between crime and poverty cannot be justified based on different levels of inequality or unemployment, which the literature correlates to property crimes. The variation in the observed rates of crime-poverty bias for different geographic locations could be influenced by cultural factors and the tendency to overestimate the equality of opportunities and social mobility in specific countries. 
These results have consequences for policy-making and open a new path of research for poverty mitigation with the focus not only on the poor but on society as a whole. Acting on the collective bias against the poor would facilitate the approval of poverty reduction policies, as well as the restoration of the dignity of the persons affected.	翻訳日:2023-03-27 13:44:36 公開日:2023-03-24
# CIFAKE:AI生成合成画像の分類と説明可能な識別 CIFAKE: Image Classification and Explainable Identification of AI-Generated Synthetic Images ( http://arxiv.org/abs/2303.14126v1 ) ライセンス: Link先を確認	Jordan J. Bird, Ahmad Lotfi	(参考訳) 近年の合成データの技術進歩により、人間が実際の写真とai(artificial intelligence)生成画像の違いを区別できないほど高品質な画像が生成されるようになった。本論文は,データの信頼性と認証の必要性を考慮し,コンピュータビジョンによるai画像認識能力を向上させることを目的とする。最初は、既に利用可能なcifar-10データセットの10のクラスと、実際の写真と比較してコントラストのあるイメージセットを提供する潜在拡散を反映する合成データセットが生成される。このモデルは、水中のフォトリアリスティック反射のような複雑な視覚特性を生成することができる。写真が本物かAIによって生成されるかに関して、バイナリ分類問題として存在する2つのデータセット。そこで本研究では,畳み込みニューラルネットワーク(CNN)を用いて,画像をリアルとフェイクの2つのカテゴリに分類する。ハイパーパラメータチューニングと36個のネットワークトポロジのトレーニングの後、最適なアプローチは92.98%の精度で画像を正しく分類することができた。最後に,グラデーションクラスアクティベーションマッピングによる説明可能なaiを実装し,画像内のどの特徴が分類に有用かを検討する。解釈は、特に、実際の実体自体が分類に有用な情報を持っていないことに注目し、画像の背景にある小さな視覚的欠陥に焦点を当てている。 CIFAKEデータセットと呼ばれるこの研究のために設計された完全なデータセットは、将来の研究のために研究コミュニティに公開されている。 Recent technological advances in synthetic data have enabled the generation of images with such high quality that human beings cannot tell the difference between real-life photographs and Artificial Intelligence (AI) generated images. Given the critical necessity of data reliability and authentication, this article proposes to enhance our ability to recognise AI-generated images through computer vision. Initially, a synthetic dataset is generated that mirrors the ten classes of the already available CIFAR-10 dataset with latent diffusion which provides a contrasting set of images for comparison to real photographs. The model is capable of generating complex visual attributes, such as photorealistic reflections in water. The two sets of data present as a binary classification problem with regard to whether the photograph is real or generated by AI. This study then proposes the use of a Convolutional Neural Network (CNN) to classify the images into two categories; Real or Fake. 
Following hyperparameter tuning and the training of 36 individual network topologies, the optimal approach could correctly classify the images with 92.98% accuracy. Finally, this study implements explainable AI via Gradient Class Activation Mapping to explore which features within the images are useful for classification. Interpretation reveals interesting concepts within the image, in particular, noting that the actual entity itself does not hold useful information for classification; instead, the model focuses on small visual imperfections in the background of the images. The complete dataset engineered for this study, referred to as the CIFAKE dataset, is made publicly available to the research community for future work.	翻訳日:2023-03-27 13:44:12 公開日:2023-03-24
# 多様なビデオのためのスケーラブルなニューラル表現に向けて Towards Scalable Neural Representation for Diverse Videos ( http://arxiv.org/abs/2303.14124v1 ) ライセンス: Link先を確認	Bo He, Xitong Yang, Hanyu Wang, Zuxuan Wu, Hao Chen, Shuaiyi Huang, Yixuan Ren, Ser-Nam Lim, Abhinav Shrivastava	(参考訳) Implicit Neural representations (INR)は、3Dシーンや画像の表現に注目が集まり、最近ビデオのエンコード(例えば、NeRV、E-NeRV)に応用されている。有望な結果を達成する一方で、既存のINRベースの手法は、少数のショートビデオ(UVGデータセットの7つの5秒ビデオなど)を冗長なビジュアルコンテンツで符号化することに限定され、個々のビデオフレームを独立して適合させ、多数の多様なビデオに対して効率よく拡張できないモデル設計につながる。本稿では,多彩な視覚コンテンツを含む長大な映像を符号化する,より実用的なセットアップのためのニューラル表現の開発に着目する。まず、動画を小さなサブセットに分割し、別々のモデルでエンコードする代わりに、長く多様なビデオを統一されたモデルでエンコードすることで、より良い圧縮結果が得られることを示す。そこで本研究では,多様な映像をエンコードするニューラル表現フレームワークD-NeRVを提案する。 (i)動き情報からクリップ特有の視覚コンテンツを分離すること。 (ii)暗黙のニューラルネットワークに時間的推論を導入すること、 (iii)中間出力としてタスク指向の流れを用い、空間的冗長性を低減すること。我々の新しいモデルは、ビデオ圧縮タスクにおけるUCF101およびUVGデータセット上のNeRVおよび従来のビデオ圧縮技術を大きく上回っている。さらに、効率的なデータローダとして使用する場合、同じ圧縮比でUCF101データセット上のアクション認識タスクにおいて、D-NeRVはNeRVよりも3%-10%高い精度を達成する。 Implicit neural representations (INR) have gained increasing attention in representing 3D scenes and images, and have been recently applied to encode videos (e.g., NeRV, E-NeRV). While achieving promising results, existing INR-based methods are limited to encoding a handful of short videos (e.g., seven 5-second videos in the UVG dataset) with redundant visual content, leading to a model design that fits individual video frames independently and is not efficiently scalable to a large number of diverse videos. This paper focuses on developing neural representations for a more practical setup -- encoding long and/or a large number of videos with diverse visual content. We first show that instead of dividing videos into small subsets and encoding them with separate models, encoding long and diverse videos jointly with a unified model achieves better compression results. 
Based on this observation, we propose D-NeRV, a novel neural representation framework designed to encode diverse videos by (i) decoupling clip-specific visual content from motion information, (ii) introducing temporal reasoning into the implicit neural network, and (iii) employing the task-oriented flow as intermediate output to reduce spatial redundancies. Our new model largely surpasses NeRV and traditional video compression techniques on UCF101 and UVG datasets on the video compression task. Moreover, when used as an efficient data-loader, D-NeRV achieves 3%-10% higher accuracy than NeRV on action recognition tasks on the UCF101 dataset under the same compression ratios.	翻訳日:2023-03-27 13:43:48 公開日:2023-03-24
# Few-Shot画像認識のための意味プロンプト Semantic Prompt for Few-Shot Image Recognition ( http://arxiv.org/abs/2303.14123v1 ) ライセンス: Link先を確認	Wentao Chen, Chenyang Si, Zhang Zhang, Liang Wang, Zilei Wang, Tieniu Tan	(参考訳) 新しいクラスを認識するためにいくつかの例が提供されているだけで、ほとんどショット学習は難しい問題である。いくつかの最近の研究は、セマンティックプロトタイプとビジュアルプロトタイプを組み合わせることで、稀なサンプルの問題に対処するために、クラス名のテキスト埋め込みのような追加のセマンティック情報を利用する。しかし、これらの手法は、稀なサポートサンプルから得られた視覚的特徴に悩まされ、限られた利益をもたらす。本稿では,単発学習のための新しい意味的プロンプト(sp)手法を提案する。セマンティクス情報を利用した分類器の修正に代えて,視覚特徴抽出ネットワークを適応的にチューニングするための提案としてセマンティクス情報を活用することを検討する。具体的には,特徴抽出器に意味的プロンプトを挿入する2つの補完機構を設計する。一つは意味的プロンプトと,自己アテンションによる空間的次元に沿ったパッチ埋め込みの相互作用を可能にすること,もうひとつはチャネル次元に沿って変換された意味的プロンプトで視覚的特徴を補うことである。これらの2つのメカニズムを組み合わせることで、特徴抽出器はクラス固有の特徴によりよい対応能力を示し、少数のサポートサンプルでより一般的なイメージ表現を得ることができる。 4つのデータセットに関する広範な実験を通じて、提案手法は有望な結果を達成し、1ショットの学習精度を平均3.67%向上させる。 Few-shot learning is a challenging problem since only a few examples are provided to recognize a new class. Several recent studies exploit additional semantic information, e.g. text embeddings of class names, to address the issue of rare samples through combining semantic prototypes with visual prototypes. However, these methods still suffer from the spurious visual features learned from the rare support samples, resulting in limited benefits. In this paper, we propose a novel Semantic Prompt (SP) approach for few-shot learning. Instead of the naive exploitation of semantic information for remedying classifiers, we explore leveraging semantic information as prompts to tune the visual feature extraction network adaptively. Specifically, we design two complementary mechanisms to insert semantic prompts into the feature extractor: one is to enable the interaction between semantic prompts and patch embeddings along the spatial dimension via self-attention, another is to supplement visual features with the transformed semantic prompts along the channel dimension. 
By combining these two mechanisms, the feature extractor presents a better ability to attend to the class-specific features and obtains more generalized image representations with merely a few support samples. Through extensive experiments on four datasets, the proposed approach achieves promising results, improving the 1-shot learning accuracy by 3.67% on average.	翻訳日:2023-03-27 13:43:24 公開日:2023-03-24
# 分子オブザーバのフォールトトレラント量子計算 Fault-tolerant quantum computation of molecular observables ( http://arxiv.org/abs/2303.14118v1 ) ライセンス: Link先を確認	Mark Steudtner, Sam Morley-Short, William Pol, Sukin Sim, Cristian L. Cortes, Matthias Loipersberger, Robert M. Parrish, Matthias Degroote, Nikolaj Moll, Raffaele Santagati, Michael Streif	(参考訳) 過去30年間で、量子コンピュータを用いて分子ハミルトニアンの基底状態エネルギーを推定するコストが大幅に削減された。しかし,多くの産業用途において重要な,他の観測対象の観測対象の期待値の推定には,比較的注意が払われていない。本研究では,システムの任意の固有状態に対する任意の可観測値の期待値を推定するために適用可能な,新しい期待値推定(eve)量子アルゴリズムを提案する。特に、標準量子位相推定に基づく std-EVE と量子信号処理(QSP)技術を用いた QSP-EVE の2つの変種を考える。両変種について厳密な誤差解析を行い、QSP-EVEの個別位相因子数を最小化する。これらの誤差分析により、様々な分子系と観測可能な領域にわたって、std-EVEとQSP-EVEの双方に対して、定数要素の量子リソース推定を作成できる。検討したシステムでは,QSP-EVEは最大3桁のゲート数を減少させ,std-EVEに比べて最大25%の量子ビット幅を減少させる。第1世代のフォールトトレラント量子コンピュータでは、推定資源数はまだ高すぎるが、予測値推定と最新のQSPベースの技術の両方の適用において、我々の推定値が最初のものである。 Over the past three decades significant reductions have been made to the cost of estimating ground-state energies of molecular Hamiltonians with quantum computers. However, comparatively little attention has been paid to estimating the expectation values of other observables with respect to said ground states, which is important for many industrial applications. In this work we present a novel expectation value estimation (EVE) quantum algorithm which can be applied to estimate the expectation values of arbitrary observables with respect to any of the system's eigenstates. In particular, we consider two variants of EVE: std-EVE, based on standard quantum phase estimation, and QSP-EVE, which utilizes quantum signal processing (QSP) techniques. We provide rigorous error analysis for both variants and minimize the number of individual phase factors for QSP-EVE. These error analyses enable us to produce constant-factor quantum resource estimates for both std-EVE and QSP-EVE across a variety of molecular systems and observables. 
For the systems considered, we show that QSP-EVE reduces (Toffoli) gate counts by up to three orders of magnitude and reduces qubit width by up to 25% compared to std-EVE. While estimated resource counts remain far too high for the first generations of fault-tolerant quantum computers, our estimates mark a first of their kind for both the application of expectation value estimation and modern QSP-based techniques.	翻訳日:2023-03-27 13:42:46 公開日:2023-03-24
# 基礎・応用研究から見た注意機構による予測性能とモデル解釈可能性の向上 Improving Prediction Performance and Model Interpretability through Attention Mechanisms from Basic and Applied Research Perspectives ( http://arxiv.org/abs/2303.14116v1 ) ライセンス: Link先を確認	Shunsuke Kitada	(参考訳) ディープラーニング技術の劇的な進歩により、機械学習の研究は、モデル予測の解釈可能性の向上と、基礎研究と応用研究の両方における予測性能の向上に注力している。ディープラーニングモデルは従来の機械学習モデルよりもはるかに高い予測性能を持つが、特定の予測プロセスは解釈や説明が難しい。これは機械学習モデルのブラックボックス化として知られており、製造業、商業、ロボット工学などの幅広い研究分野において、そのような技術の使用が一般的になっている産業や、ミスを許容しない医療分野などにおいて、特に重要な問題として認識されている。この論文は著者の論文の要約に基づいている。論文の中で要約された研究は、近年注目されている注意機構に焦点をあて、予測性能と解釈可能性の向上の観点から基礎研究の可能性について論じ、実験室環境を超えて大規模なデータセットを用いて実世界の応用に応用した研究を行った。この論文はまた、これらの発見がその後の研究や今後の分野の展望にもたらす意味をまとめて締めくくっている。 With the dramatic advances in deep learning technology, machine learning research is focusing on improving the interpretability of model predictions as well as prediction performance in both basic and applied research. While deep learning models have much higher prediction performance than traditional machine learning models, the specific prediction process is still difficult to interpret and/or explain. This is known as the black-boxing of machine learning models and is recognized as a particularly important problem in a wide range of research fields, including manufacturing, commerce, robotics, and other industries where the use of such technology has become commonplace, as well as the medical field, where mistakes are not tolerated. This bulletin is based on the summary of the author's dissertation. The research summarized in the dissertation focuses on the attention mechanism, which has been the focus of much attention in recent years, and discusses its potential for both basic research in terms of improving prediction performance and interpretability, and applied research in terms of evaluating it for real-world applications using large data sets beyond the laboratory environment. 
The dissertation also concludes with a summary of the implications of these findings for subsequent research and future prospects in the field.	翻訳日:2023-03-27 13:42:23 公開日:2023-03-24
# 逆の例を見つけるのに何次元が必要か? How many dimensions are required to find an adversarial example? ( http://arxiv.org/abs/2303.14173v1 ) ライセンス: Link先を確認	Charles Godfrey, Henry Kvinge, Elise Bishoff, Myles Mckay, Davis Brown, Tim Doster, and Eleanor Byler	(参考訳) 敵の脆弱性を探究する過去の研究は、敵がモデル入力のすべての次元を摂動できる状況に焦点を当ててきた。一方、近年の研究ではどちらの場合も考慮している。 (i)敵は、限られた数の入力パラメータを乱すことができる。 (ii)マルチモーダル問題におけるモダリティの部分集合。どちらの場合も、逆例は、周囲の入力空間$\mathcal{X}$内の部分空間$V$に効果的に制約される。これに動機づけられたこの研究では、敵の脆弱性がどのように$\dim(V)$に依存するかを調べる。特に、$\ell^p$ノルム制約を持つ標準的なPGD攻撃の敵対的成功は、$\epsilon (\frac{\dim(V)}{\dim \mathcal{X}})^{\frac{1}{q}}$の単調に増加する関数のように振る舞う。この関数形式は単純な玩具線形モデルから容易に導出することができ、その結果は高次元空間上の局所線型モデルに対して逆例が固有であるという議論にさらなる信頼を与える。 Past work exploring adversarial vulnerability has focused on situations where an adversary can perturb all dimensions of model input. On the other hand, a range of recent works consider the case where either (i) an adversary can perturb a limited number of input parameters or (ii) a subset of modalities in a multimodal problem. In both of these cases, adversarial examples are effectively constrained to a subspace $V$ in the ambient input space $\mathcal{X}$. Motivated by this, in this work we investigate how adversarial vulnerability depends on $\dim(V)$. In particular, we show that the adversarial success of standard PGD attacks with $\ell^p$ norm constraints behaves like a monotonically increasing function of $\epsilon (\frac{\dim(V)}{\dim \mathcal{X}})^{\frac{1}{q}}$ where $\epsilon$ is the perturbation budget and $\frac{1}{p} + \frac{1}{q} =1$, provided $p > 1$ (the case $p=1$ presents additional subtleties which we analyze in some detail). This functional form can be easily derived from a simple toy linear model, and as such our results lend further credence to arguments that adversarial examples are endemic to locally linear models on high dimensional spaces.	翻訳日:2023-03-27 13:36:28 公開日:2023-03-24
# 局在軌道間の物理的絡み合い Physical Entanglement Between Localized Orbitals ( http://arxiv.org/abs/2303.14170v1 ) ライセンス: Link先を確認	Lexin Ding, Gesa D\"unnweber, Christian Schilling	(参考訳) [arXiv:2207.03377]では、現実的な電子系に適用可能な忠実絡み合い尺度の最初の閉じた公式が導出された。本研究は,量子技術開発を導くという究極の目標をもって,この重要な成果を生かしたものである。そのため、まず原子、分子、固体などの電子系における絡み合い交換の過程を明らかにする。このことは、局所化された小軌道サブシステムへの参照と、数パリティ超選択則の実装の両方の必要性を明確に示している。したがって、ウィックの定理により、自由電子鎖の部位間の真の物理的絡み合いの完全な解析的研究を行う。その意味では、そのような分析分析を単位不変な設定、すなわち鎖をより非現実的でマクロ的に大きなサブシステムに分割することを制限する共通のパラダイムを破る。次に、このモデルを相互作用する電子の水素環にアップグレードし、探索された局在軌道を構築する。両システムとも,充填率が十分に低い/高い場合,長距離絡み合いの存在が確認される。 In [arXiv:2207.03377] the first closed formula of a faithful entanglement measure applicable to realistic electron systems has been derived. In the present work, we build on this key achievement with the ultimate goal of guiding the development of quantum technologies. For this, we first elucidate the process of entanglement swapping in electron systems such as atoms, molecules or solid bodies. This clearly demonstrates the necessity of both the reference to localized few-orbital subsystems and the implementation of the number-parity superselection rule. Accordingly, in virtue of Wick's theorem, we then provide a fully analytical study of the true physical entanglement between sites in free electron chains. In that sense, we break the common paradigm of restricting such analytical analyses to unitarily invariant settings, i.e. bipartitions of the chain into rather impractical, macroscopically large subsystems. We then upgrade this model to a hydrogen ring of interacting electrons and construct the sought-after localized orbitals. For both systems, we confirm the presence of long-distance entanglement, provided the filling fractions are sufficiently low/high.	翻訳日:2023-03-27 13:35:56 公開日:2023-03-24
# 都市GIRAFFE:構成生成型ニューラル特徴場としての都市景観の表現 UrbanGIRAFFE: Representing Urban Scenes as Compositional Generative Neural Feature Fields ( http://arxiv.org/abs/2303.14167v1 ) ライセンス: Link先を確認	Yuanbo Yang, Yifei Yang, Hanlei Guo, Rong Xiong, Yue Wang, Yiyi Liao	(参考訳) AR/VRやシミュレーションを含む多くのアプリケーションにおいて、カメラポーズやシーン内容の制御が可能なフォトリアリスティック画像の生成が不可欠である。 3D認識生成モデルで急速に進歩しているにもかかわらず、既存の手法のほとんどはオブジェクト中心の画像に焦点を当てており、自由カメラ視点制御やシーン編集のための都市シーンの生成には適用できない。そこで本稿では,難易度の高い3dパンオプティクスを用いた3d認識生成モデルを導出するために,可算物と可算物体のレイアウト分布を含む粗い3dパンオプティクスを用いた都市giraffeを提案する。私たちのモデルは、シーンを物、物、空に分解するので、構成と制御が可能です。セマンティクスボクセルグリッド(semantic voxel grids)の形式に先立って、粗いセマンティクスと幾何情報を効果的に組み込んだ条件付き生成器を構築します。事前のオブジェクトレイアウトにより、散らかったシーンからオブジェクトジェネレータを学ぶことができます。適切な損失関数により,大規模なカメラの動き,物体の編集,物体の操作など,様々な制御性を持つ光リアルな3D認識画像合成が容易となる。 kitti-360データセットを含む合成データと実世界のデータセットの両方において,モデルの有効性を検証する。 Generating photorealistic images with controllable camera pose and scene contents is essential for many applications including AR/VR and simulation. Despite the fact that rapid progress has been made in 3D-aware generative models, most existing methods focus on object-centric images and are not applicable to generating urban scenes for free camera viewpoint control and scene editing. To address this challenging task, we propose UrbanGIRAFFE, which uses a coarse 3D panoptic prior, including the layout distribution of uncountable stuff and countable objects, to guide a 3D-aware generative model. Our model is compositional and controllable as it breaks down the scene into stuff, objects, and sky. Using stuff prior in the form of semantic voxel grids, we build a conditioned stuff generator that effectively incorporates the coarse semantic and geometry information. The object layout prior further allows us to learn an object generator from cluttered scenes. With proper loss functions, our approach facilitates photorealistic 3D-aware image synthesis with diverse controllability, including large camera movement, stuff editing, and object manipulation. 
We validate the effectiveness of our model on both synthetic and real-world datasets, including the challenging KITTI-360 dataset.	翻訳日:2023-03-27 13:35:40 公開日:2023-03-24
# BundleSDF:ニューラル6-DoF追跡と未知物体の3次元再構成 BundleSDF: Neural 6-DoF Tracking and 3D Reconstruction of Unknown Objects ( http://arxiv.org/abs/2303.14158v1 ) ライセンス: Link先を確認	Bowen Wen, Jonathan Tremblay, Valts Blukis, Stephen Tyree, Thomas Muller, Alex Evans, Dieter Fox, Jan Kautz, Stan Birchfield	(参考訳) 本稿では,単眼RGBDビデオシーケンスから未知物体の6-DoF追跡をほぼリアルタイムに行うと同時に,物体のニューラル3D再構成を行う手法を提案する。視覚的テクスチャがほとんど欠如している場合でも,任意の剛体オブジェクトに対して有効である。オブジェクトは第1フレームのみにセグメント化されていると仮定される。追加情報は不要で、相互作用エージェントに関する仮定は不要である。提案手法の鍵となるのは,形状と外観の両方を捉える一貫した3次元表現にロバストに情報を蓄積するために,ポーズグラフ最適化プロセスと並行して学習するニューラルオブジェクトフィールドである。これらのスレッド間の通信を容易にするために、ポーズ付きメモリフレームの動的プールが自動的に維持される。提案手法では,大きなポーズ変化,部分的および完全閉塞,無テクスチャ面,鏡面ハイライトなどの課題に対処する。 HO3D、YCBInEOAT、BEHAVEデータセットの結果を示し、この手法が既存のアプローチを大きく上回ることを示した。プロジェクトページ: https://bundlesdf.github.io We present a near real-time method for 6-DoF tracking of an unknown object from a monocular RGBD video sequence, while simultaneously performing neural 3D reconstruction of the object. Our method works for arbitrary rigid objects, even when visual texture is largely absent. The object is assumed to be segmented in the first frame only. No additional information is required, and no assumption is made about the interaction agent. Key to our method is a Neural Object Field that is learned concurrently with a pose graph optimization process in order to robustly accumulate information into a consistent 3D representation capturing both geometry and appearance. A dynamic pool of posed memory frames is automatically maintained to facilitate communication between these threads. Our approach handles challenging sequences with large pose changes, partial and full occlusion, untextured surfaces, and specular highlights. We show results on HO3D, YCBInEOAT, and BEHAVE datasets, demonstrating that our method significantly outperforms existing approaches. Project page: https://bundlesdf.github.io	翻訳日:2023-03-27 13:35:16 公開日:2023-03-24
# カラムローアンタングル型画素合成による高効率スケール不変発電機 Efficient Scale-Invariant Generator with Column-Row Entangled Pixel Synthesis ( http://arxiv.org/abs/2303.14157v1 ) ライセンス: Link先を確認	Thuan Hoang Nguyen, Thanh Van Le, Anh Tran	(参考訳) 任意のスケールの画像合成は、任意のスケールで写真リアルな画像を合成する、効率的でスケーラブルなソリューションを提供する。しかし、既存のGANベースのソリューションは畳み込みと階層アーキテクチャに過度に依存するため、出力解像度をスケールする際、不整合と「texture sticking」問題が発生する。別の観点では、INRベースのジェネレータは設計によってスケール等価であるが、その巨大なメモリフットプリントと遅い推論は、大規模またはリアルタイムシステムでこれらのネットワークを採用することを妨げている。本研究では,空間的畳み込みや粗雑な設計を使わずに,効率的かつスケール等価な新しい生成モデルである$\textbf{C}$olumn-$\textbf{R}$ow $\textbf{E}$ntangled $\textbf{P}$ixel $\textbf{S}$ynthesis ($\textbf{CREPS}$)を提案する。メモリフットプリントを節約し、システムをスケーラブルにするために、レイヤ毎の機能マップを「thick」カラムと行エンコーディングに分割する、新しい双方向表現を採用しました。 FFHQ、LSUN-Church、MetFaces、Flickr-Sceneryといったさまざまなデータセットの実験では、CREPSが適切なトレーニングと推論速度で任意の解像度でスケール一貫性とエイリアスのない画像を合成する能力を確認している。コードは https://github.com/VinAIResearch/CREPS から入手できる。 Any-scale image synthesis offers an efficient and scalable solution to synthesize photo-realistic images at any scale, even going beyond 2K resolution. However, existing GAN-based solutions depend excessively on convolutions and a hierarchical architecture, which introduce inconsistency and the "texture sticking" issue when scaling the output resolution. From another perspective, INR-based generators are scale-equivariant by design, but their huge memory footprint and slow inference hinder these networks from being adopted in large-scale or real-time systems. In this work, we propose $\textbf{C}$olumn-$\textbf{R}$ow $\textbf{E}$ntangled $\textbf{P}$ixel $\textbf{S}$ynthesis ($\textbf{CREPS}$), a new generative model that is both efficient and scale-equivariant without using any spatial convolutions or coarse-to-fine design. To save memory footprint and make the system scalable, we employ a novel bi-line representation that decomposes layer-wise feature maps into separate "thick" column and row encodings. 
Experiments on various datasets, including FFHQ, LSUN-Church, MetFaces, and Flickr-Scenery, confirm CREPS' ability to synthesize scale-consistent and alias-free images at any arbitrary resolution with proper training and inference speed. Code is available at https://github.com/VinAIResearch/CREPS.	翻訳日:2023-03-27 13:34:57 公開日:2023-03-24
# 医用画像認識のための局所コントラスト学習 Local Contrastive Learning for Medical Image Recognition ( http://arxiv.org/abs/2303.14153v1 ) ライセンス: Link先を確認	S. A. Rizvi, R. Tang, X. Jiang, X. Ma, X. Hu	(参考訳) 放射線画像解析におけるDeep Learning (DL) を用いた手法の普及は,専門家による放射線学データに対する大きな需要を生み出している。最近の自己監督型フレームワークは、関連する放射線学レポートから専門家のラベル付けの必要性を軽減している。しかし、これらのフレームワークは、医学画像の異なる病理の微妙な違いを区別するのに苦労している。さらに、それらの多くは画像領域とテキストの解釈を提供しておらず、放射線科医がモデル予測を評価するのが困難である。本研究では,画像領域選択のためのレイヤの追加と相互モダリティの相互作用を目的とした,フレキシブルな微調整フレームワークであるLRCLRを提案する。胸部x線検査の結果から,lrclrは重要な局所画像領域を同定し,胸部x線医学的所見のゼロショット性能を改善しつつ,放射線学的テキストに対して有意義な解釈を行っていることが示唆された。 The proliferation of Deep Learning (DL)-based methods for radiographic image analysis has created a great demand for expert-labeled radiology data. Recent self-supervised frameworks have alleviated the need for expert labeling by obtaining supervision from associated radiology reports. These frameworks, however, struggle to distinguish the subtle differences between different pathologies in medical images. Additionally, many of them do not provide interpretation between image regions and text, making it difficult for radiologists to assess model predictions. In this work, we propose Local Region Contrastive Learning (LRCLR), a flexible fine-tuning framework that adds layers for significant image region selection as well as cross-modality interaction. Our results on an external validation set of chest x-rays suggest that LRCLR identifies significant local image regions and provides meaningful interpretation against radiology text while improving zero-shot performance on several chest x-ray medical findings.	翻訳日:2023-03-27 13:34:20 公開日:2023-03-24
# 幻想的な破片:現実世界の壊れた物体とその完全なカウンターの3Dスキャンデータ Fantastic Breaks: A Dataset of Paired 3D Scans of Real-World Broken Objects and Their Complete Counterparts ( http://arxiv.org/abs/2303.14152v1 ) ライセンス: Link先を確認	Nikolas Lamb, Cameron Palmer, Benjamin Molloy, Sean Banerjee, Natasha Kholgade Banerjee	(参考訳) 自動形状修正アプローチは現在、現実世界の損傷形状を記述するデータセットへのアクセスを欠いている。 https://terascale-all-sensing-research-studio.github.io/fantasticbreaks)は、78個の壊れたオブジェクトのスキャン、防水、クリーンな3dメッシュを含むデータセット。 Fantastic Breaksには、クラスとマテリアルラベル、壊れたメッシュに結合して完全なメッシュを生成する修復部品の合成プロキシ、手動で注釈付き骨折境界が含まれている。フラクチャー幾何学の詳細な解析を通して, ファンタスティック・ブレークと幾何学的および物理的手法を用いて生成した合成破砕物のデータセットの違いを明らかにする。本稿では,Fantastic Breaks を用いた形状修復実験の結果を,合成データセットを用いて事前学習し,Fantastic Breaks のサブセットを用いて再訓練した。 Automated shape repair approaches currently lack access to datasets that describe real-world damage geometry. We present Fantastic Breaks (and Where to Find Them: https://terascale-all-sensing-research-studio.github.io/FantasticBreaks), a dataset containing scanned, waterproofed, and cleaned 3D meshes for 78 broken objects, paired and geometrically aligned with complete counterparts. Fantastic Breaks contains class and material labels, synthetic proxies of repair parts that join to broken meshes to generate complete meshes, and manually annotated fracture boundaries. Through a detailed analysis of fracture geometry, we reveal differences between Fantastic Breaks and datasets of synthetically fractured objects generated using geometric and physics-based methods. We show experimental results of shape repair with Fantastic Breaks using multiple learning-based approaches pre-trained using a synthetic dataset and re-trained using a subset of Fantastic Breaks.	翻訳日:2023-03-27 13:34:04 公開日:2023-03-24
# double descent demystified: 深層学習パズルの源を同定、解釈、補間する Double Descent Demystified: Identifying, Interpreting & Ablating the Sources of a Deep Learning Puzzle ( http://arxiv.org/abs/2303.14151v1 ) ライセンス: Link先を確認	Rylan Schaeffer, Mikail Khona, Zachary Robertson, Akhilan Boopathy, Kateryna Pistunova, Jason W. Rocks, Ila Rani Fiete, Oluwasanmi Koyejo	(参考訳) ダブル降下は機械学習において驚くべき現象であり、モデルパラメータ数がデータ数に対して増加するにつれて、モデルが大きくなり、テストエラーが減少し、高度に過大にパラメータ化(データサンプル化)される。このテストエラーの減少は、オーバーフィッティングに関する古典的な学習理論に反し、機械学習における大きなモデルの成功を暗示している。このテスト損失の非単調な振る舞いは、データの数、データの次元性、モデルパラメータの数に依存する。ここでは、二重降下を簡潔に記述し、なぜ二重降下が非公式で接近可能な方法で起こるのかを説明し、線型代数と導入確率にのみ親しむ必要がある。多項式回帰を用いた視覚的直観を提供し、次に通常の線形回帰を用いて2重降下を数学的に解析し、同時に3つの解釈可能な因子を同定する。通常の線形回帰を用いた場合, 2重降下は実データ上で起こることを実証し, いずれかの因子が崩壊しても2重降下は起こらないことを示した。重ね合わせと二重降下に関する非線形モデルにおける最近の観測に光を当てるために、この理解を用いる。コードは公開されている。 Double descent is a surprising phenomenon in machine learning, in which as the number of model parameters grows relative to the number of data, test error drops as models grow ever larger into the highly overparameterized (data undersampled) regime. This drop in test error flies against classical learning theory on overfitting and has arguably underpinned the success of large models in machine learning. This non-monotonic behavior of test loss depends on the number of data, the dimensionality of the data and the number of model parameters. Here, we briefly describe double descent, then provide an explanation of why double descent occurs in an informal and approachable manner, requiring only familiarity with linear algebra and introductory probability. We provide visual intuition using polynomial regression, then mathematically analyze double descent with ordinary linear regression and identify three interpretable factors that, when simultaneously all present, together create double descent. 
We demonstrate that double descent occurs on real data when using ordinary linear regression, then demonstrate that double descent does not occur when any of the three factors are ablated. We use this understanding to shed light on recent observations in nonlinear models concerning superposition and double descent. Code is publicly available.	翻訳日:2023-03-27 13:33:42 公開日:2023-03-24
# 離散対称性の進化 Evolution of Discrete Symmetries ( http://arxiv.org/abs/2303.14150v1 ) ライセンス: Link先を確認	P. Schmelcher	(参考訳) 対称性は重要な物理特性を規定することが知られており、特に波動構造や結果として生じる伝播力学を含む波動物理学における設計原理として用いられる。局所対称性は、空間の有限領域にしか持たない対称性という意味では、自己組織化過程の結果か、合成合成された物理系への構造的成分の結果である。与えられた有限鎖を拡張するために局所対称性演算を適用することで、結果の1次元格子は過渡的およびその後の周期的挙動からなることを示す。建設により、埋め込みされた局所対称性が強に重なり、結果として生じる格子はそのような対称性の密度の高い骨格を持つ。この挙動を局所対称性演算のクラスに基づいて証明し、最終周期や単位セルの分解、過渡的な長さと分解といった「漸近的」な性質を結論付けることができる。例として、対応するタイト結合ハミルトニアンを考察する。それらのエネルギー固有値スペクトルと固有状態は、いくつかの詳細で分析され、特に局所対称性の多元性の存在による固有状態の局在特性の強い変動を示す。 Symmetries are known to dictate important physical properties and can be used as a design principle in particular in wave physics, including wave structures and the resulting propagation dynamics. Local symmetries, in the sense of a symmetry that holds only in a finite domain of space, can be either the result of a self-organization process or a structural ingredient into a synthetically prepared physical system. Applying local symmetry operations to extend a given finite chain we show that the resulting one-dimensional lattice consists of a transient followed by a subsequent periodic behaviour. Due to the fact that, by construction, the implanted local symmetries strongly overlap, the resulting lattice possesses a dense skeleton of such symmetries. We prove this behaviour on the basis of a class of local symmetry operations allowing us to conclude upon the 'asymptotic' properties such as the final period, decomposition of the unit-cell and the length and decomposition of the transient. As an example case, we explore the corresponding tight-binding Hamiltonians. Their energy eigenvalue spectra and eigenstates are analyzed in some detail, showing in particular the strong variability of the localization properties of the eigenstates due to the presence of a plethora of local symmetries.	翻訳日:2023-03-27 13:33:17 公開日:2023-03-24
# 「パーティーの準備」:大規模言語モデルの助けを借りてスマートなスマートスペースを探る "Get ready for a party": Exploring smarter smart spaces with help from large language models ( http://arxiv.org/abs/2303.14143v1 ) ライセンス: Link先を確認	Evan King, Haoxiang Yu, Sangsu Lee, and Christine Julien	(参考訳) 「パーティーの準備ができている」と言う人に対する正しい反応は、意味と文脈に深く影響されている。スマートホームアシスタント(例えばGoogle Home)にとって、理想的な反応は、家庭で利用可能なデバイスを調査し、その状態を変えてお祝いの雰囲気を作り出すことだ。現在の実用的なシステムでは,(1)抽象文の背後にある意味を推測する機能,(2)その推論をコンテキスト(例えば,特定のデバイスの設定を変更する)に適した具体的な行動コースにマップする機能が必要となるため,そのような要求を処理できない。本稿では、GPT-3のような最近のタスク非依存の大規模言語モデル(LLM)が、既存のルールベースのホームアシスタントシステムに欠けている、膨大な量のクロスドメイン、時には予測不可能な文脈的知識を具現化しているという観察を活用する。まず、LLMをコマンド推論とアクション計画の中心に配置するシステムの実現可能性について検討し、LLMが「パーティーの準備が整う」といったあいまいでコンテキスト依存的なコマンドの背後にある意図を推論し、スマートデバイスを制御するために使用できる具体的な機械パース可能な命令に応答する能力を示す。さらに、LLMが実際のデバイスを制御するための概念実証を行い、微調整やタスク固有の訓練を伴わずに、意図を推論し、デバイス状態を適切に変更する能力を示す。我々の研究は、スマート環境における文脈認識のためのLLM駆動システムの実現を示唆し、この分野における今後の研究を動機付けている。 The right response to someone who says "get ready for a party" is deeply influenced by meaning and context. For a smart home assistant (e.g., Google Home), the ideal response might be to survey the available devices in the home and change their state to create a festive atmosphere. Current practical systems cannot service such requests since they require the ability to (1) infer meaning behind an abstract statement and (2) map that inference to a concrete course of action appropriate for the context (e.g., changing the settings of specific devices). In this paper, we leverage the observation that recent task-agnostic large language models (LLMs) like GPT-3 embody a vast amount of cross-domain, sometimes unpredictable contextual knowledge that existing rule-based home assistant systems lack, which can make them powerful tools for inferring user intent and generating appropriate context-dependent responses during smart home interactions. 
We first explore the feasibility of a system that places an LLM at the center of command inference and action planning, showing that LLMs have the capacity to infer intent behind vague, context-dependent commands like "get ready for a party" and respond with concrete, machine-parseable instructions that can be used to control smart devices. We furthermore demonstrate a proof-of-concept implementation that puts an LLM in control of real devices, showing its ability to infer intent and change device state appropriately with no fine-tuning or task-specific training. Our work hints at the promise of LLM-driven systems for context-awareness in smart environments, motivating future research in this area.	翻訳日:2023-03-27 13:32:58 公開日:2023-03-24
# Masked Scene Contrast: 教師なし3D表現学習のためのスケーラブルなフレームワーク Masked Scene Contrast: A Scalable Framework for Unsupervised 3D Representation Learning ( http://arxiv.org/abs/2303.14191v1 ) ライセンス: Link先を確認	Xiaoyang Wu, Xin Wen, Xihui Liu, Hengshuang Zhao	(参考訳) 先駆的な研究として、PointContrastは生のRGB-Dフレーム上のコントラスト学習を活用して教師なしの3D表現学習を行い、様々な下流タスクにおいてその効果を証明する。しかし、rgb-dフレームをコントラストビューとしてマッチングする非効率性と、前述したような煩わしいモード崩壊現象という2つの障害により、3dでの大規模非教師なし学習の傾向はまだ現れていない。筆者らはまず,2つのスタブルブロックを経験的ステップストーンに変換し,よく計算されたデータ拡張パイプラインと実用的なビューミキシング戦略により,シーンレベルの点雲に直接コントラストビューを生成する,効率的かつ効果的なコントラスト学習フレームワークを提案する。次に,ポイントカラーとサーフェルノーマルの再構築を目標としたコントラストクロスマスクをデザインしたコントラスト学習フレームワークの再構築学習について紹介する。マスキングシーンコントラスト(msc)フレームワークは,包括的3次元表現をより効率的かつ効果的に抽出することができる。トレーニング前の手順を少なくとも3倍に加速し、以前の作業と比べて未妥協のパフォーマンスを実現している。さらに、MSCは複数のデータセットにわたる大規模な3D事前トレーニングを可能にし、パフォーマンスをさらに向上し、ScanNetセマンティックセグメンテーション検証セットの75.5% mIoUなど、いくつかの下流タスクで最先端の微調整結果を達成する。 As a pioneering work, PointContrast conducts unsupervised 3D representation learning via leveraging contrastive learning over raw RGB-D frames and proves its effectiveness on various downstream tasks. However, the trend of large-scale unsupervised learning in 3D has yet to emerge due to two stumbling blocks: the inefficiency of matching RGB-D frames as contrastive views and the annoying mode collapse phenomenon mentioned in previous works. Turning the two stumbling blocks into empirical stepping stones, we first propose an efficient and effective contrastive learning framework, which generates contrastive views directly on scene-level point clouds by a well-curated data augmentation pipeline and a practical view mixing strategy. Second, we introduce reconstructive learning on the contrastive learning framework with an exquisite design of contrastive cross masks, which targets the reconstruction of point color and surfel normal. Our Masked Scene Contrast (MSC) framework is capable of extracting comprehensive 3D representations more efficiently and effectively. 
It accelerates the pre-training procedure by at least 3x and still achieves an uncompromised performance compared with previous work. Besides, MSC also enables large-scale 3D pre-training across multiple datasets, which further boosts the performance and achieves state-of-the-art fine-tuning results on several downstream tasks, e.g., 75.5% mIoU on ScanNet semantic segmentation validation set.	翻訳日:2023-03-27 13:27:26 公開日:2023-03-24
# WildLight:フラッシュライトを使った逆レンダリング WildLight: In-the-wild Inverse Rendering with a Flashlight ( http://arxiv.org/abs/2303.14190v1 ) ライセンス: Link先を確認	Ziang Cheng, Junxuan Li, Hongdong Li	(参考訳) 本稿では,未知の環境光の下での逆レンダリングの課題に対する実用的な測光手法を提案する。本システムは,スマートフォンで撮影した多視点画像のみを用いて,シーン形状と反射率を復元する。重要なアイデアは、スマートフォンの内蔵フラッシュライトを最小制御光源として活用し、画像強度を2つのフォトメトリックコンポーネントに分解することだ。我々の方法では、フラッシュ/非フラッシュ画像はペアでキャプチャする必要がない。ニューラル・ライト・フィールドの成功に基づき、オフ・ザ・シェルフ法を用いて周囲の反射を捉え、フラッシュライト・コンポーネントは物理的に正確な光度制約により反射率と照明を分離する。既存の逆レンダリング手法と比較して,非暗室環境に適用できるが,環境反射を明示的に解くことの難しさは回避できる。提案手法は実装が容易で,セットアップも容易で,既存の逆レンダリング技術よりも一貫して優れていることを示す。最後に,産業用レンダラ用に用意されたpbrテクスチャトライアングルメッシュに,神経再構成を容易にエクスポートできる。 This paper proposes a practical photometric solution for the challenging problem of in-the-wild inverse rendering under unknown ambient lighting. Our system recovers scene geometry and reflectance using only multi-view images captured by a smartphone. The key idea is to exploit smartphone's built-in flashlight as a minimally controlled light source, and decompose image intensities into two photometric components -- a static appearance corresponds to ambient flux, plus a dynamic reflection induced by the moving flashlight. Our method does not require flash/non-flash images to be captured in pairs. Building on the success of neural light fields, we use an off-the-shelf method to capture the ambient reflections, while the flashlight component enables physically accurate photometric constraints to decouple reflectance and illumination. Compared to existing inverse rendering methods, our setup is applicable to non-darkroom environments yet sidesteps the inherent difficulties of explicit solving ambient reflections. We demonstrate by extensive experiments that our method is easy to implement, casual to set up, and consistently outperforms existing in-the-wild inverse rendering techniques. Finally, our neural reconstruction can be easily exported to PBR textured triangle mesh ready for industrial renderers.	
翻訳日:2023-03-27 13:26:53 公開日:2023-03-24
# FastViT:構造リパラメータを用いた高速ハイブリッドビジョントランス FastViT: A Fast Hybrid Vision Transformer using Structural Reparameterization ( http://arxiv.org/abs/2303.14189v1 ) ライセンス: Link先を確認	Pavan Kumar Anasosalu Vasu, James Gabriel, Jeff Zhu, Oncel Tuzel, Anurag Ranjan	(参考訳) 近年の変圧器と畳み込み設計の融合により、モデルの精度と効率が着実に改善されている。本稿では,最先端のレイテンシ-精度トレードオフを得るハイブリッドビジョントランスフォーマーアーキテクチャであるFastViTを紹介する。この目的のために,FastViTのビルディングブロックである新しいトークンミキシング演算子RepMixerを導入する。さらに、列車時間オーバーパラメータ化と大きなカーネル畳み込みを適用して精度を高め、これらの選択が遅延に与える影響を実証的に示します。我々のモデルは、最近の最先端ハイブリッドトランスフォーマーアーキテクチャであるCMTよりも3.5倍速く、EfficientNetより4.9倍速く、ImageNetデータセットと同じ精度でモバイルデバイス上のConvNeXtより1.9倍速い。同様のレイテンシでは、MobileOneよりもImageNetのTop-1精度が4.2%向上しています。私たちのモデルは、画像分類、検出、セグメンテーション、および3Dメッシュレグレッションといった、いくつかのタスクで競合するアーキテクチャを一貫して上回ります。さらに,本モデルは分布外サンプルや腐敗に対して非常に堅牢であり,競合するロバストモデルよりも優れている。 The recent amalgamation of transformer and convolutional designs has led to steady improvements in accuracy and efficiency of the models. In this work, we introduce FastViT, a hybrid vision transformer architecture that obtains the state-of-the-art latency-accuracy trade-off. To this end, we introduce a novel token mixing operator, RepMixer, a building block of FastViT, that uses structural reparameterization to lower the memory access cost by removing skip-connections in the network. We further apply train-time overparametrization and large kernel convolutions to boost accuracy and empirically show that these choices have minimal effect on latency. We show that - our model is 3.5x faster than CMT, a recent state-of-the-art hybrid transformer architecture, 4.9x faster than EfficientNet, and 1.9x faster than ConvNeXt on a mobile device for the same accuracy on the ImageNet dataset. At similar latency, our model obtains 4.2% better Top-1 accuracy on ImageNet than MobileOne. 
Our model consistently outperforms competing architectures across several tasks -- image classification, detection, segmentation and 3D mesh regression with significant improvement in latency on both a mobile device and a desktop GPU. Furthermore, our model is highly robust to out-of-distribution samples and corruptions, improving over competing robust models.	翻訳日:2023-03-27 13:26:35 公開日:2023-03-24
# TRAK: スケールでのモデル行動への貢献 TRAK: Attributing Model Behavior at Scale ( http://arxiv.org/abs/2303.14186v1 ) ライセンス: Link先を確認	Sung Min Park, Kristian Georgiev, Andrew Ilyas, Guillaume Leclerc, Aleksander Madry	(参考訳) データ帰属の目的は、モデルの予測をトレーニングデータに遡ることである。この目標への長い努力にもかかわらず、データ帰属に対する既存のアプローチは、ユーザに計算の扱いやすさと有効性を選択させる傾向がある。すなわち、計算可能な手法は、非凸設定(ディープニューラルネットワークの文脈など)におけるモデル予測の正確な帰属に苦労するが、そのような手法では、数千のモデルを訓練する必要があるため、大規模モデルやデータセットでは実用的でない。本稿では,大規模で微分可能なモデルに対して,有効かつ計算的に抽出可能なデータ帰属法であるTRAK(Tracing with the Randomly-projected After Kernel)を紹介する。特に、わずかに訓練されたモデルを活用することで、TRAKは何千ものモデルのトレーニングを必要とする属性メソッドのパフォーマンスにマッチすることができる。我々は、イメージネットで訓練された画像分類器、視覚言語モデル(CLIP)、言語モデル(BERT、mT5)のTRAKの有用性を実証する。私たちは https://github.com/MadryLab/trak で TRAK を使用するためのコードを提供しています。 The goal of data attribution is to trace model predictions back to training data. Despite a long line of work towards this goal, existing approaches to data attribution tend to force users to choose between computational tractability and efficacy. That is, computationally tractable methods can struggle with accurately attributing model predictions in non-convex settings (e.g., in the context of deep neural networks), while methods that are effective in such regimes require training thousands of models, which makes them impractical for large models or datasets. In this work, we introduce TRAK (Tracing with the Randomly-projected After Kernel), a data attribution method that is both effective and computationally tractable for large-scale, differentiable models. In particular, by leveraging only a handful of trained models, TRAK can match the performance of attribution methods that require training thousands of models. We demonstrate the utility of TRAK across various modalities and scales: image classifiers trained on ImageNet, vision-language models (CLIP), and language models (BERT and mT5). We provide code for using TRAK (and reproducing our work) at https://github.com/MadryLab/trak .	翻訳日:2023-03-27 13:26:11 公開日:2023-03-24
# Make-It-3D:拡散前の単一画像からの高忠実度3D創出 Make-It-3D: High-Fidelity 3D Creation from A Single Image with Diffusion Prior ( http://arxiv.org/abs/2303.14184v1 ) ライセンス: Link先を確認	Junshu Tang, Tengfei Wang, Bo Zhang, Ting Zhang, Ran Yi, Lizhuang Ma, Dong Chen	(参考訳) 本研究では,1枚の画像のみから高忠実度3Dコンテンツを作成する問題について検討する。基本的には下層の3d幾何学を推定し、目に見えないテクスチャを同時に幻覚させる。この課題に対処するために,訓練された2次元拡散モデルからの事前知識を活用し,3次元生成のための3次元認識監督を行う。提案手法であるMake-It-3Dは,2段階の最適化パイプラインを用いており,第1段階は前部からの基準画像からの制約を取り入れ,第2段階は粗いモデルをテクスチャ化された点雲に変換し,第2段階は参照画像から高品質なテクスチャを活用しながら,拡散により現実性を高める。広汎な実験により,本手法は先行研究よりも大きなマージンを達成し,忠実な再建と印象的な視覚的品質を実現した。本手法は,汎用オブジェクトの単一画像から高品質な3D作成を実現するための最初の試みであり,テキスト・ツー・3D作成やテクスチャ編集などの様々な応用を可能にする。 In this work, we investigate the problem of creating high-fidelity 3D content from only a single image. This is inherently challenging: it essentially involves estimating the underlying 3D geometry while simultaneously hallucinating unseen textures. To address this challenge, we leverage prior knowledge from a well-trained 2D diffusion model to act as 3D-aware supervision for 3D creation. Our approach, Make-It-3D, employs a two-stage optimization pipeline: the first stage optimizes a neural radiance field by incorporating constraints from the reference image at the frontal view and diffusion prior at novel views; the second stage transforms the coarse model into textured point clouds and further elevates the realism with diffusion prior while leveraging the high-quality textures from the reference image. Extensive experiments demonstrate that our method outperforms prior works by a large margin, resulting in faithful reconstructions and impressive visual quality. Our method presents the first attempt to achieve high-quality 3D creation from a single image for general objects and enables various applications such as text-to-3D creation and texture editing.	翻訳日:2023-03-27 13:25:48 公開日:2023-03-24
# すべてを支配する1つのプロトコル? 相互運用可能なメッセージングのセキュリティについて One Protocol to Rule Them All? On Securing Interoperable Messaging ( http://arxiv.org/abs/2303.14178v1 ) ライセンス: Link先を確認	Jenny Blessing and Ross Anderson	(参考訳) 欧州の議員は、異なるプラットフォーム上のユーザーが互いにメッセージを交換できるべきだと裁定した。しかし、メッセージングの相互運用性は、Pandoraのセキュリティとプライバシの課題の箱を開く。反トラスト対策としてだけでなく、エンドユーザにより良いエクスペリエンスを提供する手段としても支持されているが、相互運用性は、貧弱な実行時にユーザエクスペリエンスを悪化させるリスクを負う。実際のメッセージ交換を有効にする方法と、あるサービスプロバイダから別のサービスプロバイダに渡される暗号化メッセージから生じる多数の残余の課題にどのように対処するか – コンテンツモデレーション、ユーザ認証、キー管理、プロバイダ間のメタデータ共有など – という2つの基本的な疑問がある。本研究では、エンドツーエンドの暗号化メッセージにおける相互運用可能な通信に関する特定のオープンな質問と課題を特定し、これらの課題に取り組むためのハイレベルな提案を示す。 European lawmakers have ruled that users on different platforms should be able to exchange messages with each other. Yet messaging interoperability opens up a Pandora's box of security and privacy challenges. While championed not just as an anti-trust measure but as a means of providing a better experience for the end user, interoperability runs the risk of making the user experience worse if poorly executed. There are two fundamental questions: how to enable the actual message exchange, and how to handle the numerous residual challenges arising from encrypted messages passing from one service provider to another -- including but certainly not limited to content moderation, user authentication, key management, and metadata sharing between providers. In this work, we identify specific open questions and challenges around interoperable communication in end-to-end encrypted messaging, and present high-level suggestions for tackling these challenges.	翻訳日:2023-03-27 13:25:04 公開日:2023-03-24
# 教師なしドメインディスカバリによるエキスパート言語モデルのスケーリング Scaling Expert Language Models with Unsupervised Domain Discovery ( http://arxiv.org/abs/2303.14177v1 ) ライセンス: Link先を確認	Suchin Gururangan, Margaret Li, Mike Lewis, Weijia Shi, Tim Althoff, Noah A. Smith, Luke Zettlemoyer	(参考訳) 大規模言語モデルは一般的に密に訓練され、全てのパラメータは全ての入力に対して更新される。これは数千のGPU間で数十億のパラメータの同期を必要とする。任意のテキストコーパス上で,大小の言語モデルを非同期に訓練する,シンプルだが効果的な手法を提案する。提案手法では,コーパスを関連文書の集合にクラスタリングし,各クラスタ上で個別の専門家言語モデルを学習し,それらを疎結合に組み合わせて推論を行う。このアプローチは、各エキスパートのドメインを自動的に発見することで、恥ずかしい並列トレーニングを一般化し、既存のスパース言語モデルのほとんどすべての通信オーバーヘッドを取り除く。分析の結果,有意義なクラスタに専門家を特化することが,これらの向上の鍵であることがわかった。また、専門家の数やトレーニングデータのサイズによってパフォーマンスが向上し、これは大規模な言語モデルをトレーニングするための非常に効率的でアクセスしやすいアプローチであることを示唆している。 Large language models are typically trained densely: all parameters are updated with respect to all inputs. This requires synchronization of billions of parameters across thousands of GPUs. We introduce a simple but effective method to asynchronously train large, sparse language models on arbitrary text corpora. Our method clusters a corpus into sets of related documents, trains a separate expert language model on each cluster, and combines them in a sparse ensemble for inference. This approach generalizes embarrassingly parallel training by automatically discovering the domains for each expert, and eliminates nearly all the communication overhead of existing sparse language models. Our technique outperforms dense baselines on multiple corpora and few-shot tasks, and our analysis shows that specializing experts to meaningful clusters is key to these gains. Performance also improves with the number of experts and size of training data, suggesting this is a highly efficient and accessible approach to training large language models.	翻訳日:2023-03-27 13:24:47 公開日:2023-03-24
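The cluster-then-ensemble idea in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say how experts are weighted at inference, so the distance-based softmax routing, the `top_k` and `temperature` parameters, and all function names here are assumptions made for the example.

```python
import math


def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def expert_weights(context_embedding, centers, top_k=2, temperature=1.0):
    """Return sparse weights over experts (hypothetical routing rule).

    Only the top_k experts whose cluster centers are closest to the
    context embedding receive non-zero weight; the weights are a softmax
    over negative squared distances.
    """
    dists = [squared_distance(context_embedding, c) for c in centers]
    nearest = sorted(range(len(centers)), key=lambda i: dists[i])[:top_k]
    logits = {i: -dists[i] / temperature for i in nearest}
    shift = max(logits.values())  # subtract the max for numerical stability
    exps = {i: math.exp(v - shift) for i, v in logits.items()}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}


def ensemble_next_token(context_embedding, centers, expert_probs, top_k=2):
    """Mix the experts' next-token distributions with the sparse weights."""
    weights = expert_weights(context_embedding, centers, top_k=top_k)
    vocab_size = len(expert_probs[0])
    mixed = [0.0] * vocab_size
    for i, w in weights.items():
        for t in range(vocab_size):
            mixed[t] += w * expert_probs[i][t]
    return mixed
```

Because only `top_k` experts are evaluated per context, inference cost stays close to that of a few small models even when many experts exist, which is the property the abstract refers to as a sparse ensemble.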
# 低消費電力・低レイテンシ視覚知覚のためのハイブリッドANN-SNNアーキテクチャ A Hybrid ANN-SNN Architecture for Low-Power and Low-Latency Visual Perception ( http://arxiv.org/abs/2303.14176v1 ) ライセンス: Link先を確認	Asude Aydin, Mathias Gehrig, Daniel Gehrig, and Davide Scaramuzza	(参考訳) Spiking Neural Networks(SNN)は、バイオインスパイアされたニューラルネットワークのクラスで、非同期およびスパース処理を通じてエッジデバイスに低電力および低レイテンシ推論をもたらすことを約束する。しかしながら、時相モデルであるSNNは、古典的人工ニューラルネットワーク(ANN)と同等の予測を生成するために、表現的状態に大きく依存している。これらの状態は、長い過渡期の後だけ収束し、入力データなしで急速に崩壊し、より高いレイテンシ、消費電力、精度が低下する。この作業は、補助的なANNが低速で動作する状態の初期化によってこの問題に対処する。その後、SNNは状態を使用して、次の初期化フェーズまで高時間分解能の予測を生成する。我々のハイブリッドANN-SNNモデルは、両者の長所を結合する: ANNのおかげで長い状態の過渡性と状態崩壊に悩まされず、SNNのおかげで高時間分解能、低レイテンシ、低電力で予測を生成することができる。イベントベース2Dおよび3Dヒューマンポーズ推定の課題について,提案手法は,同じ推論速度で実行した場合のANNと比べ,性能を4%低下させることなく88%の消費電力を消費することを示した。さらに,snsと比較した場合,誤差は74%低減した。この研究は、それぞれの利益を最大化するために、ANNとSNNをどのように使用できるか、新たな理解を提供する。 Spiking Neural Networks (SNN) are a class of bio-inspired neural networks that promise to bring low-power and low-latency inference to edge devices through asynchronous and sparse processing. However, being temporal models, SNNs depend heavily on expressive states to generate predictions on par with classical artificial neural networks (ANNs). These states converge only after long transient periods, and quickly decay without input data, leading to higher latency, power consumption, and lower accuracy. This work addresses this issue by initializing the state with an auxiliary ANN running at a low rate. The SNN then uses the state to generate predictions with high temporal resolution until the next initialization phase. Our hybrid ANN-SNN model thus combines the best of both worlds: It does not suffer from long state transients and state decay thanks to the ANN, and can generate predictions with high temporal resolution, low latency, and low power thanks to the SNN. 
We show for the task of event-based 2D and 3D human pose estimation that our method consumes 88% less power with only a 4% decrease in performance compared to its fully ANN counterparts when run at the same inference rate. Moreover, when compared to SNNs, our method achieves a 74% lower error. This research thus provides a new understanding of how ANNs and SNNs can be used to maximize their respective benefits.	翻訳日:2023-03-27 13:24:29 公開日:2023-03-24
# 半教師付き医用画像セグメンテーションにおける固有一貫性学習 Inherent Consistent Learning for Accurate Semi-supervised Medical Image Segmentation ( http://arxiv.org/abs/2303.14175v1 ) ライセンス: Link先を確認	Ye Zhu, Jie Yang, Si-Qi Liu and Ruimao Zhang	(参考訳) 近年,医用画像アノテーションのコストが高いことから,半監督的医用画像分割が注目されている。本稿では,ラベル付きおよびラベル付きデータの意味的一貫性ガイダンスを通じて,ロバストな意味カテゴリー表現を学習し,セグメンテーションを支援する新しい本質的一貫性学習(icl)手法を提案する。実際には、ラベル付きおよびラベルなしデータのセマンティックなカテゴリ表現を整列するアテンション機構に基づいて、トレーニングセット全体にわたってグローバルなセマンティックなセマンティックな表現を更新する2つの外部モジュール、SSPA(Supervised Semantic Proxy Adaptor)とUnsupervised Semantic Consistent Learner(USCL)を導入する。 iclは様々なネットワークアーキテクチャのためのプラグイン・アンド・プレイ方式であり、2つのモジュールはテスト段階には関与していない。 3つの公開ベンチマークによる実験結果から,提案手法は特に注釈付きデータの数が極めて限られている場合に,最先端の手法よりも優れていることが示された。コードはhttps://github.com/zhuye98/icl.git。 Semi-supervised medical image segmentation has attracted much attention in recent years because of the high cost of medical image annotations. In this paper, we propose a novel Inherent Consistent Learning (ICL) method, which aims to learn robust semantic category representations through the semantic consistency guidance of labeled and unlabeled data to help segmentation. In practice, we introduce two external modules namely Supervised Semantic Proxy Adaptor (SSPA) and Unsupervised Semantic Consistent Learner (USCL) that based on the attention mechanism to align the semantic category representations of labeled and unlabeled data, as well as update the global semantic representations over the entire training set. The proposed ICL is a plug-and-play scheme for various network architectures and the two modules are not involved in the testing stage. Experimental results on three public benchmarks show that the proposed method can outperform the state-of-the-art especially when the number of annotated data is extremely limited. Code is available at: https://github.com/zhuye98/ICL.git.	翻訳日:2023-03-27 13:24:04 公開日:2023-03-24
# オンライン連続学習におけるリアルタイム評価:新しい希望 Real-Time Evaluation in Online Continual Learning: A New Hope ( http://arxiv.org/abs/2302.01047v3 ) ライセンス: Link先を確認	Yasir Ghunaim, Adel Bibi, Kumail Alhamoud, Motasem Alfarra, Hasan Abed Al Kader Hammoud, Ameya Prabhu, Philip H. S. Torr, Bernard Ghanem	(参考訳) 現在のCL(Continuous Learning)手法の評価では、トレーニング時間や計算に制約がないと仮定することが多い。ストリームはモデルが予測のために次のデータを明らかにする前にトレーニングを完了するのを待たない、連続学習の実用的なリアルタイム評価です。そこで本研究では,現在のCL手法を計算コストに対して評価する。位置ラベル付き3900万のタイムスタンプ画像を含む大規模データセットであるCLOCについて広範な実験を行った。本評価では, 現状のCL手法よりも単純なベースラインが優れており, 現実的な設定における既存手法の適用性に疑問を呈する。さらに,メモリサンプリング戦略や正規化アプローチなど,文献で一般的に使用される様々なclコンポーネントについて検討する。考慮されたすべてのメソッドが、私たちの単純なベースラインと競合しないことがわかった。これは、既存のCL文献の大部分は、実用的でない特定の種類のストリームに適合していることを驚くほど示唆している。我々は,オンライン連続学習手法の開発において,計算コストを考慮するためのパラダイムシフトに向けた第一歩となることを期待する。 Current evaluations of Continual Learning (CL) methods typically assume that there is no constraint on training time and computation. This is an unrealistic assumption for any real-world setting, which motivates us to propose: a practical real-time evaluation of continual learning, in which the stream does not wait for the model to complete training before revealing the next data for predictions. To do this, we evaluate current CL methods with respect to their computational costs. We conduct extensive experiments on CLOC, a large-scale dataset containing 39 million time-stamped images with geolocation labels. We show that a simple baseline outperforms state-of-the-art CL methods under this evaluation, questioning the applicability of existing methods in realistic settings. In addition, we explore various CL components commonly used in the literature, including memory sampling strategies and regularization approaches. We find that all considered methods fail to be competitive against our simple baseline. This surprisingly suggests that the majority of existing CL literature is tailored to a specific class of streams that is not practical. 
We hope that the evaluation we provide will be the first step towards a paradigm shift to consider the computational cost in the development of online continual learning methods.	翻訳日:2023-03-27 11:32:16 公開日:2023-03-24
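To make the "stream does not wait" setting above concrete, here is a toy model of it. It is a sketch under a stated assumption, namely that a method whose training step costs C times the baseline stays busy for C stream steps per update and therefore skips the batches that arrive in the meantime. The function names and the integer-cost simplification are illustrative and do not come from the paper's code.

```python
def trainable_steps(num_steps, relative_cost):
    """Stream steps at which a method can start a training update.

    Assumption: relative_cost is a positive integer C, and an update that
    starts at step t finishes just in time for step t + C.
    """
    if relative_cost < 1:
        raise ValueError("relative_cost must be a positive integer")
    return list(range(0, num_steps, relative_cost))


def fraction_trained(num_steps, relative_cost):
    """Fraction of the stream's batches the method actually trains on."""
    return len(trainable_steps(num_steps, relative_cost)) / num_steps
```

For example, `fraction_trained(10, 2)` is `0.5`: under this toy model a method twice as expensive as the baseline sees only half of the stream, which is one way a cheap baseline can come out ahead in a real-time evaluation.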
# Kupczynski の文脈局所因果確率モデルはベルの定理によって制約される Kupczynski's Contextual Locally Causal Probabilistic Models are constrained by Bell's theorem ( http://arxiv.org/abs/2208.09930v5 ) ライセンス: Link先を確認	Richard D. Gill and Justo Pastor Lambare	(参考訳) マリアン・クプシンスキーは一連の論文で、ベルの定理は測定器を記述する文脈的設定依存パラメータを正しく考慮すれば回避できると主張した。これは事実ではないことを示す。一見そうは見えないが、クプシンスキーの文脈的局所因果確率モデルの概念は数学的にはベル局所隠れ変数モデルの特別な場合である。したがって、たとえ彼が提案した方法で文脈性を考慮するとしても、ベル-CHSHの不等式は導出可能である。量子力学によるこの不等式の破れを簡単に説明し去ることはできない。すなわち、量子力学と局所実在論(クプシンスキーの主張による概念の拡大を含む)は互いに相容れない。さらなる検査の結果、クプシンスキーは実際には検出の抜け穴に頼っていることがわかった。 2015年以降、ベル-CHSHの不等式が破れる抜け穴のない実験が数多く行われており、そのような実験の他の不完全さにもかかわらず、クプシンスキーの局所実在論への脱出ルートは利用できない。 In a sequence of papers, Marian Kupczynski has argued that Bell's theorem can be circumvented if one takes correct account of contextual setting-dependent parameters describing measuring instruments. We show that this is not true. Despite first appearances, Kupczynski's concept of a contextual locally causal probabilistic model is mathematically a special case of a Bell local hidden variables model. Thus, even if one takes account of contextuality in the way he suggests, the Bell-CHSH inequality can still be derived. Violation thereof by quantum mechanics cannot be easily explained away: quantum mechanics and local realism (including Kupczynski's claimed enlargement of the concept) are not compatible with one another. Further inspection shows that Kupczynski is actually falling back on the detection loophole. Since 2015, numerous loophole-free experiments have been performed, in which the Bell-CHSH inequality is violated, so despite any other possible imperfections of such experiments, Kupczynski's escape route for local realism is not available.	翻訳日:2023-03-27 11:31:59 公開日:2023-03-24
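As background for the Bell-CHSH inequality mentioned above, the following sketch checks the two standard numbers involved: the bound of 2 for local deterministic assignments and the value 2√2 reached by the singlet-state correlations. It illustrates the textbook CHSH setup only, not the paper's argument about contextual models. The sign convention for the CHSH combination and the chosen measurement angles are one common choice among several equivalent ones.

```python
import itertools
import math


def chsh(e_ab, e_ab2, e_a2b, e_a2b2):
    """CHSH combination S = E(a,b) - E(a,b') + E(a',b) + E(a',b')."""
    return e_ab - e_ab2 + e_a2b + e_a2b2


def max_local_deterministic():
    """Largest |S| over all +/-1 outcome assignments for settings a, a', b, b'."""
    best = 0
    for A, A2, B, B2 in itertools.product([-1, 1], repeat=4):
        best = max(best, abs(chsh(A * B, A * B2, A2 * B, A2 * B2)))
    return best


def singlet_correlation(a, b):
    """Quantum prediction for the spin singlet at analyzer angles a and b."""
    return -math.cos(a - b)


def singlet_chsh():
    """S for the singlet at angles a=0, a'=pi/2, b=pi/4, b'=3*pi/4."""
    a, a2, b, b2 = 0.0, math.pi / 2, math.pi / 4, 3 * math.pi / 4
    return chsh(
        singlet_correlation(a, b),
        singlet_correlation(a, b2),
        singlet_correlation(a2, b),
        singlet_correlation(a2, b2),
    )
```

Any local hidden variables model is a probabilistic mixture of the sixteen deterministic assignments enumerated in `max_local_deterministic`, so its |S| cannot exceed 2, whereas `abs(singlet_chsh())` equals 2√2 ≈ 2.83.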
# 消散状態準備のための初期状態依存量子速度制限:枠組みと最適化 Initial-state-dependent quantum speed limit for dissipative state preparation: Framework and optimization ( http://arxiv.org/abs/2303.12967v2 ) ライセンス: Link先を確認	Junjie Liu and Hanlin Nie	(参考訳) 散逸は伝統的に量子情報処理の障害と考えられてきたが、近年の研究により、所望の量子状態を生成するために利用できることが示されている。実用的な用途に有用であるためには、散逸的な進化をスピードアップする能力が不可欠である。本研究では, 生成状態がエネルギー固有状態の1つであるマルコフ散逸状態生成スキームに着目した。我々は、一般的に用いられる初期状態非依存緩和時間と比較して、実際の進化時間のより洗練された測定値を提供する初期状態依存量子速度制限(QSL)を導出する。これにより、異なる初期状態にわたる散逸的進化のパッシブ最適化が可能になる。 qslを用いた進化時間の最小化を条件とした調製過程における散逸熱の最小化により、望ましい初期状態は固有値の増加の順序エネルギー固有値に対して対角要素の特定の置換を持つことがわかった。この構成では、準備された状態の個体数は最大であり、残りの対角要素は、同じ順序のエネルギー固有基底における受動的状態の順にソートされる。ベル状態を作成するための散逸ライドバーグ原子系における戦略の有効性を実証する。我々の研究は、散逸状態準備プロセスの最適化に関する新たな洞察を提供し、実用的な量子技術に重大な影響を与える可能性がある。 Dissipation has traditionally been considered a hindrance to quantum information processing, but recent studies have shown that it can be harnessed to generate desired quantum states. To be useful for practical applications, the ability to speed up the dissipative evolution is crucial. In this study, we focus on a Markovian dissipative state preparation scheme where the prepared state is one of the energy eigenstates. We derive an initial-state-dependent quantum speed limit (QSL) that offers a more refined measure of the actual evolution time compared to the commonly used initial-state-independent relaxation time. This allows for a passive optimization of dissipative evolution across different initial states. By minimizing the dissipated heat during the preparation process, conditioned on the minimization of evolution time using the QSL, we find that the preferred initial state has a specific permutation of diagonal elements with respect to an ordered energy eigenbasis of increasing eigenvalues. In this configuration, the population on the prepared state is the largest, and the remaining diagonal elements are sorted in an order resembling that of a passive state in the same ordered energy eigenbasis. 
We demonstrate the effectiveness of our strategy in a dissipative Rydberg atom system for preparing the Bell state. Our work provides new insights into the optimization of dissipative state preparation processes and could have significant implications for practical quantum technologies.	翻訳日:2023-03-27 11:24:13 公開日:2023-03-24
# 非トリミングビデオにおけるDense-Localizing Audio-Visual Events:大規模ベンチマークとベースライン Dense-Localizing Audio-Visual Events in Untrimmed Videos: A Large-Scale Benchmark and Baseline ( http://arxiv.org/abs/2303.12930v2 ) ライセンス: Link先を確認	Tiantian Geng, Teng Wang, Jinming Duan, Runmin Cong, Feng Zheng	(参考訳) 既存のオーディオ視覚イベントローカライゼーション(AVE)は、手動でトリミングされたビデオを処理する。しかし、この設定は非現実的であり、自然ビデオは様々なカテゴリーの多数のオーディオ視覚イベントを含むことが多い。本稿では,実生活の応用をよりよくするために,未編集映像に発生するすべての音声視覚イベントを共同でローカライズし,認識することを目的とした,密集した音声視覚イベントのタスクに焦点をあてる。この問題は、きめ細かいオーディオ視覚シーンとコンテキスト理解を必要とするため、難しい。この問題に対処するために,最初のUntrimmed Audio-Visual (UnAV-100)データセットを導入する。各ビデオには平均して2.8の映像イベントがあり、イベントは通常互いに関連しており、現実のシーンのように共起する可能性がある。次に,様々な長さの音声視覚イベントをローカライズし,それら間の依存関係をひとつのパスでキャプチャする,学習ベースの新しいフレームワークを用いてタスクを定式化する。提案手法の有効性と,マルチスケールクロスモーダル知覚と依存性モデリングの意義を実証する実験を行った。 Existing audio-visual event localization (AVE) handles manually trimmed videos with only a single instance in each of them. However, this setting is unrealistic as natural videos often contain numerous audio-visual events with different categories. To better adapt to real-life applications, in this paper we focus on the task of dense-localizing audio-visual events, which aims to jointly localize and recognize all audio-visual events occurring in an untrimmed video. The problem is challenging as it requires fine-grained audio-visual scene and context understanding. To tackle this problem, we introduce the first Untrimmed Audio-Visual (UnAV-100) dataset, which contains 10K untrimmed videos with over 30K audio-visual events. Each video has 2.8 audio-visual events on average, and the events are usually related to each other and might co-occur as in real-life scenes. Next, we formulate the task using a new learning-based framework, which is capable of fully integrating audio and visual modalities to localize audio-visual events with various lengths and capture dependencies between them in a single pass. 
Extensive experiments demonstrate the effectiveness of our method as well as the significance of multi-scale cross-modal perception and dependency modeling for this task.	翻訳日:2023-03-27 11:23:49 公開日:2023-03-24
# フェルミオン系およびボソニックガウス系におけるエントロピー生成に対する量子的および古典的貢献 Quantum and classical contributions to entropy production in fermionic and bosonic Gaussian systems ( http://arxiv.org/abs/2303.12749v2 ) ライセンス: Link先を確認	Krzysztof Ptaszynski, Massimiliano Esposito	(参考訳) 前述したように、熱力学過程の不可逆性を特徴づける重要な量であるエントロピー生成は、系の自由度と熱環境の間の相関関係の生成に関係している。これは、そのような相関が古典的か量子的か、すなわち測定によってアクセス可能であるかという疑問を提起する。フェルミオン系とボソニックガウス系を考えることでこの問題に対処する。フェルミオンの場合、エントロピー生成は、物理的に許容される測定のセットをフォック状態の射影に制限し、古典的にアクセス可能な相関の量を大幅に制限するパリティ超選択規則により、ほとんど量子的であることを示す。対照的に、ボソニック系では、ガウス測度によってはるかに多くの相関がアクセス可能である。具体的には、量子寄与は低温では重要であるが、高温ではエントロピー生成は純粋に古典的な位置-運動量相関に対応する。本研究は, エントロピー生成の微視的定式化における量子-古典遷移の存在に関して, フェルミオン系とボソニック系の重要な違いを示した。また、エントロピー生成は、弱いカップリング限界においても主に量子相関によって引き起こされる可能性があり、これは状態人口の古典的な速度方程式や、ボソンとフェルミオンの輸送特性が古典的な粒子のそれと収束する低粒子密度限界において記述される。 As previously demonstrated, the entropy production - a key quantity characterizing the irreversibility of thermodynamic processes - is related to generation of correlations between degrees of freedom of the system and its thermal environment. This raises the question of whether such correlations are of a classical or quantum nature, namely, whether they are accessible through measurements. We address this problem by considering fermionic and bosonic Gaussian systems. We show that for fermions the entropy production is mostly quantum due to the parity superselection rule which restricts the set of physically allowed measurements to projections on the Fock states, thus significantly limiting the amount of classically accessible correlations. In contrast, in bosonic systems a much larger amount of correlations can be accessed through Gaussian measurements. Specifically, while the quantum contribution may be important at low temperatures, in the high temperature limit the entropy production corresponds to purely classical position-momentum correlations. 
Our results demonstrate an important difference between fermionic and bosonic systems regarding the existence of a quantum-to-classical transition in the microscopic formulation of the entropy production. They also show that entropy production can be mainly caused by quantum correlations even in the weak coupling limit, which admits a description in terms of classical rate equations for state populations, as well as in the low particle density limit, where the transport properties of both bosons and fermions converge to those of classical particles.	翻訳日:2023-03-27 11:23:25 公開日:2023-03-24
# 人工知能の火花:GPT-4による初期の実験 Sparks of Artificial General Intelligence: Early experiments with GPT-4 ( http://arxiv.org/abs/2303.12712v2 ) ライセンス: Link先を確認	S\'ebastien Bubeck, Varun Chandrasekaran, Ronen Eldan, Johannes Gehrke, Eric Horvitz, Ece Kamar, Peter Lee, Yin Tat Lee, Yuanzhi Li, Scott Lundberg, Harsha Nori, Hamid Palangi, Marco Tulio Ribeiro, Yi Zhang	(参考訳) 人工知能(AI)の研究者たちは、さまざまなドメインやタスクにまたがる優れた能力を示す大規模な言語モデル(LLM)を開発し、洗練し、学習と認知の理解に挑戦しています。 OpenAIが開発した最新のモデルであるGPT-4は、前例のない規模の計算とデータを使って訓練された。本稿では,openaiによる開発が盛んであったgpt-4の初期バージョンについて報告する。 GPT-4は(例えばChatGPTやGoogleのPaLMとともに)従来のAIモデルよりも汎用的なインテリジェンスを示すLLMの新たなコホートの一部である、と私たちは主張する。我々は、これらのモデルの能力と影響について論じる。 GPT-4は、言語習得以外にも、数学、コーディング、ビジョン、医学、法、心理学など、特別なプロンプトを必要とせずに、新しくて困難なタスクを解くことができる。さらに、これらすべてのタスクにおいて、GPT-4のパフォーマンスは人間レベルのパフォーマンスに非常に近く、しばしばChatGPTのような以前のモデルを大きく上回っている。 GPT-4の能力の広さと深さを考えると、人工知能(AGI)システムの早期(まだ未完成)バージョンと見なすことができると信じている。我々は, GPT-4の探索において, 限界の発見に特に重点を置いており, 次世代の予測を超えて新たなパラダイムを追求する必要性を含む, より深く包括的なAGIバージョンに向けて進む上での課題について論じている。我々は,最近の技術的飛躍と今後の研究方向の社会的な影響を振り返って結論づける。 Artificial intelligence (AI) researchers have been developing and refining large language models (LLMs) that exhibit remarkable capabilities across a variety of domains and tasks, challenging our understanding of learning and cognition. The latest model developed by OpenAI, GPT-4, was trained using an unprecedented scale of compute and data. In this paper, we report on our investigation of an early version of GPT-4, when it was still in active development by OpenAI. We contend that (this early version of) GPT-4 is part of a new cohort of LLMs (along with ChatGPT and Google's PaLM for example) that exhibit more general intelligence than previous AI models. We discuss the rising capabilities and implications of these models. We demonstrate that, beyond its mastery of language, GPT-4 can solve novel and difficult tasks that span mathematics, coding, vision, medicine, law, psychology and more, without needing any special prompting. 
Moreover, in all of these tasks, GPT-4's performance is strikingly close to human-level performance, and often vastly surpasses prior models such as ChatGPT. Given the breadth and depth of GPT-4's capabilities, we believe that it could reasonably be viewed as an early (yet still incomplete) version of an artificial general intelligence (AGI) system. In our exploration of GPT-4, we put special emphasis on discovering its limitations, and we discuss the challenges ahead for advancing towards deeper and more comprehensive versions of AGI, including the possible need for pursuing a new paradigm that moves beyond next-word prediction. We conclude with reflections on societal influences of the recent technological leap and future research directions.	翻訳日:2023-03-27 11:22:58 公開日:2023-03-24
# RaBit: トポロジ一貫性データセットを用いた3次元二足歩行文字のパラメトリックモデリング RaBit: Parametric Modeling of 3D Biped Cartoon Characters with a Topological-consistent Dataset ( http://arxiv.org/abs/2303.12564v2 ) ライセンス: Link先を確認	Zhongjin Luo, Shengcai Cai, Jinguo Dong, Ruibo Ming, Liangdong Qiu, Xiaohang Zhan, Xiaoguang Han	(参考訳) 視覚的に可視な3D文字を効率的に作成する支援は、コンピュータビジョンとコンピュータグラフィックスの基本的な研究課題である。最近の学習に基づくアプローチは、3d現実の人間のデジタル化の領域において前例のない精度と効率を達成している。しかし、以前の作品ではゲームや撮影にも大きな需要がある3Dバイペッドの漫画キャラクターのモデリングに焦点を当てていなかった。本稿では,3D2Dアニメキャラクタの最初の大規模データセットである3DBiCarと,対応するパラメトリックモデルであるRaBitを紹介する。私たちのデータセットには1500のトポロジ的に一貫性のある高品質な3Dテクスチャモデルが含まれています。このデータに基づいて、RaBitはSMPLのような線形ブレンド形状モデルとStyleGANベースのニューラルUVテクスチャ生成器で設計され、形状、ポーズ、テクスチャを同時に表現する。 3DBiCarとRaBitの実用性を実証するため, シングルビュー再構成, スケッチベースモデリング, 3Dアニメーションアニメーションなど, 様々な応用が行われている。単一視点の再構成設定では、入力画像から出力されたuvベースのテクスチャマップへの直接的なグローバルマッピングは、いくつかのローカル部分(例えば鼻、耳)の詳細な外観を失う傾向がある。これにより、すべての重要な地域を知覚する部分感性テクスチャ推論器が採用される。さらに,本手法の有効性を定量的および定量的に実証する実験を行った。 3DBiCarとRaBitは gaplab.cuhk.edu.cn/projects/RaBitで利用可能である。 Assisting people in efficiently producing visually plausible 3D characters has always been a fundamental research topic in computer vision and computer graphics. Recent learning-based approaches have achieved unprecedented accuracy and efficiency in the area of 3D real human digitization. However, none of the prior works focus on modeling 3D biped cartoon characters, which are also in great demand in gaming and filming. In this paper, we introduce 3DBiCar, the first large-scale dataset of 3D biped cartoon characters, and RaBit, the corresponding parametric model. Our dataset contains 1,500 topologically consistent high-quality 3D textured models which are manually crafted by professional artists. Built upon the data, RaBit is thus designed with a SMPL-like linear blend shape model and a StyleGAN-based neural UV-texture generator, simultaneously expressing the shape, pose, and texture. 
To demonstrate the practicality of 3DBiCar and RaBit, various applications are conducted, including single-view reconstruction, sketch-based modeling, and 3D cartoon animation. For the single-view reconstruction setting, we find a straightforward global mapping from input images to the output UV-based texture maps tends to lose detailed appearances of some local parts (e.g., nose, ears). Thus, a part-sensitive texture reasoner is adopted to make all important local areas perceived. Experiments further demonstrate the effectiveness of our method both qualitatively and quantitatively. 3DBiCar and RaBit are available at gaplab.cuhk.edu.cn/projects/RaBit.	翻訳日:2023-03-27 11:22:32 公開日:2023-03-24
# 情報測度によるベイズリスクの下界 Lower Bounds on the Bayesian Risk via Information Measures ( http://arxiv.org/abs/2303.12497v3 ) ライセンス: Link先を確認	Amedeo Roberto Esposito, Adrien Vandenbroucque, Michael Gastpar	(参考訳) 本稿ではパラメータ推定に着目し、ベイズリスクの下界を与える新しい手法を提案する。この手法では、Rényi の $\alpha$-ダイバージェンス、$\varphi$-ダイバージェンス、Sibson の $\alpha$-相互情報量を含む、事実上あらゆる情報測度を利用できる。このアプローチはダイバージェンスを測度の汎関数と見なし、測度の空間と関数の空間の間の双対性を利用する。特に、マルコフの不等式を介して双対を上から抑えることで、あらゆる情報測度でリスクの下界が得られることを示す。したがって、ダイバージェンスが満たすデータ処理不等式により、推定量に依存しない不可能性結果を提供できる。結果は、「Hide-and-Seek」問題を含む、離散パラメータと連続パラメータの両方を含む関心のある設定に適用され、最先端技術と比較される。重要な観察は、サンプル数に対する下界の挙動が、情報測度の選択によって影響を受けることである。我々はこれを利用して、「Hockey-Stick」ダイバージェンスに着想を得た新しいダイバージェンスを導入し、検討したすべての設定で最大の下界を与えることを経験的に示す。観測がプライバシー保護処理の対象となる場合、強いデータ処理不等式によってより強い不可能性結果が得られる。論文はまた、いくつかの一般化と代替方向についても論じている。 This paper focuses on parameter estimation and introduces a new method for lower bounding the Bayesian risk. The method allows for the use of virtually any information measure, including Rényi's $\alpha$-Divergence, $\varphi$-Divergences, and Sibson's $\alpha$-Mutual Information. The approach considers divergences as functionals of measures and exploits the duality between spaces of measures and spaces of functions. In particular, we show that one can lower bound the risk with any information measure by upper bounding its dual via Markov's inequality. We are thus able to provide estimator-independent impossibility results thanks to the Data-Processing Inequalities that divergences satisfy. The results are then applied to settings of interest involving both discrete and continuous parameters, including the "Hide-and-Seek" problem, and compared to the state-of-the-art techniques. An important observation is that the behaviour of the lower bound in the number of samples is influenced by the choice of the information measure. 
We leverage this by introducing a new divergence inspired by the "Hockey-Stick" Divergence, which is demonstrated empirically to provide the largest lower bound across all considered settings. If the observations are subject to privatisation, stronger impossibility results can be obtained via Strong Data-Processing Inequalities. The paper also discusses some generalisations and alternative directions.	翻訳日:2023-03-27 11:22:06 公開日:2023-03-24
# エゴセントリックビュー合成のための平衡球面格子 Balanced Spherical Grid for Egocentric View Synthesis ( http://arxiv.org/abs/2303.12408v2 ) ライセンス: Link先を確認	Changwoon Choi, Sang Min Kim, Young Min Kim	(参考訳) EgoNeRFは、VR資産のための大規模実環境を再構築するための実用的なソリューションである。カジュアルにキャプチャされた360度ビデオの数秒を与えられたEgoNeRFは、ニューラルラジアンスフィールドを効率的に構築し、新しい視点から高品質なレンダリングを可能にする。特徴格子を用いた最近のNeRF高速化に着想を得て、従来のデカルト座標の代わりに球面座標を採用する。デカルト座標の特徴格子は、視聴者からの距離に関係なく空間的に均一な解像度を持つため、大規模な境界のないシーンを表現するのに非効率である。球面パラメタライゼーションは、エゴ中心画像の光線との整合性が良く、性能向上のための分解が可能である。しかし、素朴な球面格子は2つの極における不規則性に悩まされており、非有界なシーンも表現できない。極近傍の特異点を避けるため、2つの平衡格子を結合し、準一様角格子とする。また、ラジアルグリッドを指数関数的に分割し、無限遠に環境マップを配置して非有界シーンを表現する。さらに、グリッド方式の再サンプリング手法により、NeRFボリュームのトレーニングに有効なサンプル数を増やすことができる。今回紹介した合成および実世界エゴセントリック360度ビデオデータセットにおいて、本手法を広範囲に評価し、最先端の性能を一貫して達成した。 We present EgoNeRF, a practical solution to reconstruct large-scale real-world environments for VR assets. Given a few seconds of casually captured 360 video, EgoNeRF can efficiently build neural radiance fields which enable high-quality rendering from novel viewpoints. Motivated by the recent acceleration of NeRF using feature grids, we adopt spherical coordinates instead of conventional Cartesian coordinates. A Cartesian feature grid is inefficient for representing large-scale unbounded scenes because it has a spatially uniform resolution, regardless of distance from viewers. The spherical parameterization better aligns with the rays of egocentric images, and yet enables factorization for performance enhancement. However, the naive spherical grid suffers from irregularities at two poles, and also cannot represent unbounded scenes. To avoid singularities near poles, we combine two balanced grids, which results in a quasi-uniform angular grid. We also partition the radial grid exponentially and place an environment map at infinity to represent unbounded scenes. Furthermore, with our resampling technique for grid-based methods, we can increase the number of valid samples to train NeRF volume. 
We extensively evaluate our method in our newly introduced synthetic and real-world egocentric 360 video datasets, and it consistently achieves state-of-the-art performance.	翻訳日:2023-03-27 11:21:39 公開日:2023-03-24
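The exponential partitioning of the radial grid described above can be illustrated with geometrically spaced shell radii. This is a sketch under assumptions: the abstract does not give the exact parameterization, so `r_min`, `r_max`, `num_bins` and the formula r_i = r_min · (r_max / r_min)^(i / num_bins) are hypothetical choices that merely exhibit the intended property, namely fine resolution near the viewer and coarse resolution far away.

```python
import math


def radial_bin_edges(r_min, r_max, num_bins):
    """Geometrically spaced radii from r_min to r_max (num_bins + 1 edges)."""
    if not (0 < r_min < r_max) or num_bins < 1:
        raise ValueError("need 0 < r_min < r_max and num_bins >= 1")
    ratio = r_max / r_min
    return [r_min * ratio ** (i / num_bins) for i in range(num_bins + 1)]


def radial_bin_index(r, r_min, r_max, num_bins):
    """Index of the radial shell containing r, clamped to the valid range."""
    if r <= r_min:
        return 0
    if r >= r_max:
        return num_bins - 1
    # Position of r on a log scale, normalized to [0, 1).
    t = math.log(r / r_min) / math.log(r_max / r_min)
    return min(int(t * num_bins), num_bins - 1)
```

With `r_min=1`, `r_max=8` and three bins, the edges are approximately 1, 2, 4 and 8, so each shell is twice as thick as the previous one. In the abstract's design, content beyond the outermost shell is handled by an environment map at infinity, which this sketch does not model.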
# マルチエージェント軌道予測のための階層型ハイブリッド学習フレームワーク A Hierarchical Hybrid Learning Framework for Multi-agent Trajectory Prediction ( http://arxiv.org/abs/2303.12274v3 ) ライセンス: Link先を確認	Yujun Jiao, Mingze Miao, Zhishuai Yin, Chunyuan Lei, Xu Zhu, Linzhen Nie and Bo Tao	(参考訳) 近隣のエージェントの正確な軌道予測は、複雑な場面で走行する自動運転車にとって重要である。近年提案されている手法の多くは,複雑な相互作用のエンコーディングの強みから,深層学習に基づくものである。しかし、過去の観測に重きを置き、スパースサンプルからの過渡的および偶発的相互作用を効果的に捉えることができないため、賞賛できない予測がしばしば発生する。本稿では,マルチエージェント軌道予測のための階層型ハイブリッド・フレームワークである深層学習(DL)と強化学習(RL)を提案し,マルチスケール相互作用によって形成される動きを予測することの課題に対処する。 DL段階では、トラフィックシーンは、中間レベルとグローバルレベルの異種相互作用をエンコードするためにTransformerスタイルのGNNを採用する複数の中間スケール異種グラフに分割される。 rlステージでは、dlステージで予測される重要な将来ポイントを利用して、トラフィックシーンをローカルなサブシーンに分割する。運動計画手順をエミュレートし、軌道予測を生成するため、車載キネマティクスモデルに組み込んだトランスフォーマーベースのPPO(Pximal Policy Optimization)を設計し、微視的相互作用の圧倒的な影響下で動作を計画する。多目的報酬はエージェント中心の精度とシーンワイド互換性のバランスをとるように設計されている。実験の結果,本提案手法はargoverse forecasting benchmarkの最先端技術に適合することがわかった。また、階層的な学習フレームワークがマルチスケールのインタラクションをキャプチャし、予測されたトラジェクトリの実現性とコンプライアンスを改善することも可視化された結果から明らかになった。 Accurate and robust trajectory prediction of neighboring agents is critical for autonomous vehicles traversing in complex scenes. Most methods proposed in recent years are deep learning-based due to their strength in encoding complex interactions. However, unplausible predictions are often generated since they rely heavily on past observations and cannot effectively capture the transient and contingency interactions from sparse samples. In this paper, we propose a hierarchical hybrid framework of deep learning (DL) and reinforcement learning (RL) for multi-agent trajectory prediction, to cope with the challenge of predicting motions shaped by multi-scale interactions. In the DL stage, the traffic scene is divided into multiple intermediate-scale heterogenous graphs based on which Transformer-style GNNs are adopted to encode heterogenous interactions at intermediate and global levels. 
In the RL stage, we divide the traffic scene into local sub-scenes using the key future points predicted in the DL stage. To emulate the motion planning procedure and produce trajectory predictions, we devise a Transformer-based Proximal Policy Optimization (PPO) agent that incorporates a vehicle kinematics model and plans motions under the dominant influence of microscopic interactions. A multi-objective reward is designed to balance agent-centric accuracy and scene-wise compatibility. Experimental results show that our proposal matches the state of the art on the Argoverse forecasting benchmark. The visualized results also show that the hierarchical learning framework captures the multi-scale interactions and improves the feasibility and compliance of the predicted trajectories.	翻訳日:2023-03-27 11:21:14 公開日:2023-03-24
# スタイルRF:Zero-shot 3Dスタイルの神経放射場移動 StyleRF: Zero-shot 3D Style Transfer of Neural Radiance Fields ( http://arxiv.org/abs/2303.10598v3 ) ライセンス: Link先を確認	Kunhao Liu, Fangneng Zhan, Yiwen Chen, Jiahui Zhang, Yingchen Yu, Abdulmotaleb El Saddik, Shijian Lu, Eric Xing	(参考訳) 3dスタイル転送は、3dシーンのスタイル化されたノベルビューをマルチビュー一貫性で描画することを目的としている。しかし、既存の作品の多くは正確な幾何学的再構成、高品質なスタイライゼーション、任意の新しいスタイルに一般化された3方向のジレンマに苦しめられている。放射場の特徴空間内でスタイル変換を行うことで3方向ジレンマを解消する3次元スタイル転送技術であるStyleRF(Style Radiance Fields)を提案する。 StyleRFは3Dシーンを表現するために高精細な特徴の明示的なグリッドを採用しており、ボリュームレンダリングによって高精細な形状を確実に復元することができる。さらに、グリッド機能は参照スタイルに従って変換され、高品質なゼロショットスタイル転送に直接繋がる。 StyleRFは2つの革新的な設計で構成されている。 1つ目はサンプリング不変なコンテンツ変換であり、この変換はサンプル化された3D点の全体統計に不変であり、したがってマルチビュー整合性を保証する。 2つ目は、3Dポイントの変換と同等の2D特徴写像の遅延型変換であるが、マルチビューの一貫性を損なうことなくメモリフットプリントを大幅に削減する。広範な実験により、stylerfは正確な形状再構成により優れた3dスタイライゼーション品質を達成し、ゼロショット方式で様々な新しいスタイルに一般化できることを示した。 3D style transfer aims to render stylized novel views of a 3D scene with multi-view consistency. However, most existing work suffers from a three-way dilemma over accurate geometry reconstruction, high-quality stylization, and being generalizable to arbitrary new styles. We propose StyleRF (Style Radiance Fields), an innovative 3D style transfer technique that resolves the three-way dilemma by performing style transformation within the feature space of a radiance field. StyleRF employs an explicit grid of high-level features to represent 3D scenes, with which high-fidelity geometry can be reliably restored via volume rendering. In addition, it transforms the grid features according to the reference style which directly leads to high-quality zero-shot style transfer. StyleRF consists of two innovative designs. The first is sampling-invariant content transformation that makes the transformation invariant to the holistic statistics of the sampled 3D points and accordingly ensures multi-view consistency. 
The second is deferred style transformation of 2D feature maps which is equivalent to the transformation of 3D points but greatly reduces memory footprint without degrading multi-view consistency. Extensive experiments show that StyleRF achieves superior 3D stylization quality with precise geometry reconstruction and it can generalize to various new styles in a zero-shot manner.	翻訳日:2023-03-27 11:20:49 公開日:2023-03-24
# カラースタイル伝達のためのニューラルプリセット Neural Preset for Color Style Transfer ( http://arxiv.org/abs/2303.13511v2 ) ライセンス: Link先を確認	Zhanghan Ke, Yuhao Liu, Lei Zhu, Nanxuan Zhao, Rynson W.H. Lau	(参考訳) 本稿では,視覚アーチファクトや膨大なメモリ要求,スロースタイルスイッチング速度など,既存のカラースタイル転送方法の制限に対処するためのニューラルプリセット手法を提案する。我々の手法は2つのコア設計に基づいている。まず,画像適応色マッピングマトリクスを介して各画素に対して一貫して動作し,アーティファクトを回避し,少ないメモリフットプリントで高解像度入力をサポートする決定論的ニューラルカラーマッピング(DNCM)を提案する。次に,カラー正規化とスタイライゼーションにタスクを分割し,カラースタイルをプリセットとして抽出し,正規化入力画像で再利用することで,効率的なスタイル切り替えを実現する2段階パイプラインを開発した。ペアワイズデータセットが利用できないため、自己教師型戦略を用いてNeural Presetをトレーニングする方法を解説する。既存の手法に対するニューラル・プリセットの様々な利点は包括的評価によって示される。特にNeural Presetは、アーティファクトなしで安定した4Kカラースタイル転送をリアルタイムで可能にする。さらに,本モデルでは,低照度画像強調,水中画像補正,デハージング,画像調和など,微調整の必要なく複数のアプリケーションを自然にサポートできることが示されている。デモのあるプロジェクトページ: https://zhkkke.github.io/NeuralPreset 。 In this paper, we present a Neural Preset technique to address the limitations of existing color style transfer methods, including visual artifacts, vast memory requirement, and slow style switching speed. Our method is based on two core designs. First, we propose Deterministic Neural Color Mapping (DNCM) to consistently operate on each pixel via an image-adaptive color mapping matrix, avoiding artifacts and supporting high-resolution inputs with a small memory footprint. Second, we develop a two-stage pipeline by dividing the task into color normalization and stylization, which allows efficient style switching by extracting color styles as presets and reusing them on normalized input images. Due to the unavailability of pairwise datasets, we describe how to train Neural Preset via a self-supervised strategy. Various advantages of Neural Preset over existing methods are demonstrated through comprehensive evaluations. Notably, Neural Preset enables stable 4K color style transfer in real-time without artifacts. 
Besides, we show that our trained model can naturally support multiple applications without fine-tuning, including low-light image enhancement, underwater image correction, image dehazing, and image harmonization. Project page with demos: https://zhkkke.github.io/NeuralPreset .	翻訳日:2023-03-27 11:14:45 公開日:2023-03-24
# CLIP for All Things Zero-Shot Sketch-based Image Retrieval, Fine-Grained or not CLIP for All Things Zero-Shot Sketch-Based Image Retrieval, Fine-Grained or Not ( http://arxiv.org/abs/2303.13440v2 ) ライセンス: Link先を確認	Aneeshan Sain, Ayan Kumar Bhunia, Pinaki Nath Chowdhury, Subhadeep Koley, Tao Xiang, Yi-Zhe Song	(参考訳) 本稿では,ゼロショットスケッチに基づく画像検索(ZS-SBIR)にCLIPを利用する。私たちは、ファンデーションモデルにおける最近の進歩と、彼らが提供していると思われる非並列の一般化能力に大きく影響を受けています。我々は、このシナジーをいかに最適に達成するかという新しいデザインを、カテゴリー設定ときめ細かい設定("all")の両方のために提案した。私たちのソリューションの核心は、迅速な学習セットアップです。まず、スケッチ固有のプロンプトをファクタリングすることで、すでにカテゴリレベルのZS-SBIRシステムがあり、すべての先行芸術をオーバーシュートし(24.8%)、CLIPとZS-SBIRのシナジーを研究する上で大きな証拠となります。しかし、細かな設定に移行するのは難しく、このシナジーを深く掘り下げる必要がある。そのため、この問題のきめ細かいマッチング性に取り組むために、2つの具体的な設計を考え出した。 (i)スケッチと写真の相対的な分離がカテゴリ間で均一であることを保証するための追加の正規化損失。金本位制の三重項損失はそうではない。 (ii)スケッチとフォトのペア間のインスタンスレベルの構造的対応を確立するための巧妙なパッチシャッフル技術。これらの設計により、我々は以前の最先端よりも26.9%の領域での大幅な性能向上を再び観察する。提案されているクリップとプロンプト学習のパラダイムは、データ不足が大きな課題である他のスケッチ関連のタスク(zs-sbirに限らず)に取り組む上で、大きな可能性を秘めています。プロジェクトページ: https://aneeshan95.github.io/Sketch_LVM/ In this paper, we leverage CLIP for zero-shot sketch based image retrieval (ZS-SBIR). We are largely inspired by recent advances on foundation models and the unparalleled generalisation ability they seem to offer, but for the first time tailor it to benefit the sketch community. We put forward novel designs on how best to achieve this synergy, for both the category setting and the fine-grained setting ("all"). At the very core of our solution is a prompt learning setup. First we show just via factoring in sketch-specific prompts, we already have a category-level ZS-SBIR system that overshoots all prior arts, by a large margin (24.8%) - a great testimony on studying the CLIP and ZS-SBIR synergy. Moving onto the fine-grained setup is however trickier, and requires a deeper dive into this synergy. 
For that, we come up with two specific designs to tackle the fine-grained matching nature of the problem: (i) an additional regularisation loss to ensure the relative separation between sketches and photos is uniform across categories, which is not the case for the gold standard standalone triplet loss, and (ii) a clever patch shuffling technique to help establish instance-level structural correspondences between sketch-photo pairs. With these designs, we again observe significant performance gains in the region of 26.9% over previous state-of-the-art. The take-home message, if any, is that the proposed CLIP and prompt learning paradigm carries great promise in tackling other sketch-related tasks (not limited to ZS-SBIR) where data scarcity remains a great challenge. Project page: https://aneeshan95.github.io/Sketch_LVM/	翻訳日:2023-03-27 11:14:23 公開日:2023-03-24
# ゼロセグメントラベルを用いたゼロ誘導セグメンテーション Zero-guidance Segmentation Using Zero Segment Labels ( http://arxiv.org/abs/2303.13396v2 ) ライセンス: Link先を確認	Pitchaporn Rewatbowornwong, Nattanat Chatthee, Ekapol Chuangsuwanich, Supasorn Suwajanakorn	(参考訳) CLIPは新しくてエキサイティングな共同ビジョン言語アプリケーションを実現した。ひとつはオープン語彙セグメンテーションで、任意のテキストクエリの任意のセグメントを特定できる。本研究では,テキストクエリや事前定義されたクラスでユーザ誘導なしに意味セグメントを見つけ出し,自然言語で自動的にラベル付けすることができるか質問する。そこで本研究では,DINOとCLIPという2つの事前学習されたジェネラリストモデルを利用したゼロガイダンスセグメンテーションと第1ベースラインを提案する。一般的なアイデアは、まず画像を小さなオーバーセグメントに分割し、クリップのビジュアル言語空間にエンコードし、テキストラベルに変換し、意味的に類似したセグメントをマージすることだ。しかし、重要な課題は、視覚セグメントを、グローバルなコンテキスト情報とローカルなコンテキスト情報のバランスをとるセグメント固有の埋め込みにエンコードする方法だ。私たちの主な貢献は、CLIP内のアテンション層を分析することによって、2つのコンテキストのバランスをとる新しいアテンションマスキング技術です。この新しいタスクの評価のための指標もいくつか紹介する。 CLIPの生来の知識により、美術館の観衆の間でモナ・リザの絵を正確に見つけることができる。プロジェクトページ: https://zero-guide-seg.github.io/ CLIP has enabled new and exciting joint vision-language applications, one of which is open-vocabulary segmentation, which can locate any segment given an arbitrary text query. In our research, we ask whether it is possible to discover semantic segments without any user guidance in the form of text queries or predefined classes, and label them using natural language automatically? We propose a novel problem zero-guidance segmentation and the first baseline that leverages two pre-trained generalist models, DINO and CLIP, to solve this problem without any fine-tuning or segmentation dataset. The general idea is to first segment an image into small over-segments, encode them into CLIP's visual-language space, translate them into text labels, and merge semantically similar segments together. The key challenge, however, is how to encode a visual segment into a segment-specific embedding that balances global and local context information, both useful for recognition. Our main contribution is a novel attention-masking technique that balances the two contexts by analyzing the attention layers inside CLIP. 
We also introduce several metrics for the evaluation of this new task. With CLIP's innate knowledge, our method can precisely locate the Mona Lisa painting among a museum crowd. Project page: https://zero-guide-seg.github.io/.	翻訳日:2023-03-27 11:13:53 公開日:2023-03-24
# GETT-QA:知識グラフ質問応答のためのグラフ埋め込みベースのT2T変換器 GETT-QA: Graph Embedding based T2T Transformer for Knowledge Graph Question Answering ( http://arxiv.org/abs/2303.13284v2 ) ライセンス: Link先を確認	Debayan Banerjee, Pranav Ajit Nair, Ricardo Usbeck, Chris Biemann	(参考訳) 本稿では, GETT-QA というエンドツーエンドの知識グラフ質問応答システムを提案する。 GETT-QAは、人気のあるテキストからテキストまでの事前訓練言語モデルであるT5を使用している。このモデルは自然言語を入力とし、意図したSPARQLクエリのよりシンプルな形式を生成する。単純な形式では、モデルは直接エンティティと関係IDを生成しない。代わりに、対応するエンティティと関係ラベルを生成する。ラベルは、その後のステップでkgエンティティとリレーションシップidに接地される。結果をさらに改善するため、各エンティティに対してKG埋め込みの切り離されたバージョンを作成するようモデルに指示する。切断されたkg埋め込みは、曖昧さの目的をより細かく探索することができる。その結果,T5 は損失関数の変化を伴わずに絡み合った KG 埋め込みを学習でき,KGQA 性能が向上することがわかった。その結果, LC-QuAD 2.0 と SimpleQuestions-Wikidata のデータセットを Wikidata 上のエンドツーエンド KGQA 上に構築した。 In this work, we present an end-to-end Knowledge Graph Question Answering (KGQA) system named GETT-QA. GETT-QA uses T5, a popular text-to-text pre-trained language model. The model takes a question in natural language as input and produces a simpler form of the intended SPARQL query. In the simpler form, the model does not directly produce entity and relation IDs. Instead, it produces corresponding entity and relation labels. The labels are grounded to KG entity and relation IDs in a subsequent step. To further improve the results, we instruct the model to produce a truncated version of the KG embedding for each entity. The truncated KG embedding enables a finer search for disambiguation purposes. We find that T5 is able to learn the truncated KG embeddings without any change of loss function, improving KGQA performance. As a result, we report strong results for LC-QuAD 2.0 and SimpleQuestions-Wikidata datasets on end-to-end KGQA over Wikidata.	翻訳日:2023-03-27 11:13:30 公開日:2023-03-24
# 不完全ラベルを用いた複数ラベル認識のための構造化セマンティック先行探索 Exploring Structured Semantic Prior for Multi Label Recognition with Incomplete Labels ( http://arxiv.org/abs/2303.13223v2 ) ライセンス: Link先を確認	Zixuan Ding, Ao Wang, Hui Chen, Qiang Zhang, Pengzhang Liu, Yongjun Bao, Weipeng Yan, Jungong Han	(参考訳) 不完全なラベルを持つマルチラベル認識(MLR)は非常に難しい。近年、視覚言語モデル(すなわちCLIP)で画像とラベルの対応を探求し、アノテーションの不足を補う研究が進められている。有望なパフォーマンスにもかかわらず、彼らは一般にラベルとラベルの対応について価値ある事前知識を見落としている。本稿では,semantic prior prompter によるラベル間対応の構造化された意味的事前知識を導出することにより,不完全なラベルを持つMLRのラベル監督の不足を補うことを提唱する。次に、構造化された意味的事前知識を十分に探索できる新しいSemantic Correspondence Prompt Network(SCPNet)を提案する。さらに,事前知識の活用を促進するために,Prior-Enhanced Self-Supervised Learning法を導入する。広く使われている複数のベンチマークデータセットでの総合的な実験と解析により,提案手法が既存の手法を全データセットで大幅に上回っており,提案手法の有効性と優越性が実証されている。私たちのコードはhttps://github.com/jameslahm/SCPNetで公開予定です。 Multi-label recognition (MLR) with incomplete labels is very challenging. Recent works strive to explore the image-to-label correspondence in the vision-language model, i.e., CLIP, to compensate for insufficient annotations. In spite of promising performance, they generally overlook the valuable prior about the label-to-label correspondence. In this paper, we advocate remedying the deficiency of label supervision for the MLR with incomplete labels by deriving a structured semantic prior about the label-to-label correspondence via a semantic prior prompter. We then present a novel Semantic Correspondence Prompt Network (SCPNet), which can thoroughly explore the structured semantic prior. A Prior-Enhanced Self-Supervised Learning method is further introduced to enhance the use of the prior. Comprehensive experiments and analyses on several widely used benchmark datasets show that our method significantly outperforms existing methods on all datasets, well demonstrating the effectiveness and the superiority of our method. Our code will be available at https://github.com/jameslahm/SCPNet.	翻訳日:2023-03-27 11:13:16 公開日:2023-03-24
# チャネルワイズ変換による特徴蒸留のためのシンプルで汎用的なフレームワーク A Simple and Generic Framework for Feature Distillation via Channel-wise Transformation ( http://arxiv.org/abs/2303.13212v2 ) ライセンス: Link先を確認	Ziwei Liu, Yongtao Wang, Xiaojie Chu	(参考訳) 知識蒸留は、大きな教師モデルから小さな学生モデルに模倣して知識を伝達する一般的な手法である。しかし,教師と生徒間で特徴マップを直接調整することで,生徒に過度に厳格な制約を課すことができるため,学生モデルの性能は低下する。上記の特徴の不一致問題を軽減するため,既存の研究は教師と生徒の特徴マップをピクセルワイドな変換で空間的に整列させることに重点を置いている。本稿では,教師と生徒の特徴マップをチャネル次元に沿って整列させることが,特徴的不一致問題への対処に有効であることを新たに発見する。具体的には,教師モデルと教師モデルの特徴を整合させるために,学習可能な非線形チャネル回り変換を提案する。そこで,我々はさらに,蒸留損失とタスク固有損失のバランスをとるためのハイパーパラメータを1つだけ備えた,シンプルで汎用的な機能蒸留フレームワークを提案する。 Extensive experimental results show that our method achieves significant performance improvements in various computer vision tasks including image classification (+3.28% top-1 accuracy for MobileNetV1 on ImageNet-1K), object detection (+3.9% bbox mAP for ResNet50-based Faster-RCNN on MS COCO), instance segmentation (+2.8% Mask mAP for ResNet50-based Mask-RCNN), and semantic segmentation (+4.66% mIoU for ResNet18-based PSPNet in semantic segmentation on Cityscapes), which demonstrates the effectiveness and the versatility of the proposed method. コードは公開される予定だ。 Knowledge distillation is a popular technique for transferring the knowledge from a large teacher model to a smaller student model by mimicking. However, distillation by directly aligning the feature maps between teacher and student may enforce overly strict constraints on the student thus degrade the performance of the student model. To alleviate the above feature misalignment issue, existing works mainly focus on spatially aligning the feature maps of the teacher and the student, with pixel-wise transformation. In this paper, we newly find that aligning the feature maps between teacher and student along the channel-wise dimension is also effective for addressing the feature misalignment issue. 
Specifically, we propose a learnable nonlinear channel-wise transformation to align the features of the student and the teacher model. Based on it, we further propose a simple and generic framework for feature distillation, with only one hyper-parameter to balance the distillation loss and the task specific loss. Extensive experimental results show that our method achieves significant performance improvements in various computer vision tasks including image classification (+3.28% top-1 accuracy for MobileNetV1 on ImageNet-1K), object detection (+3.9% bbox mAP for ResNet50-based Faster-RCNN on MS COCO), instance segmentation (+2.8% Mask mAP for ResNet50-based Mask-RCNN), and semantic segmentation (+4.66% mIoU for ResNet18-based PSPNet in semantic segmentation on Cityscapes), which demonstrates the effectiveness and the versatility of the proposed method. The code will be made publicly available.	翻訳日:2023-03-27 11:12:55 公開日:2023-03-24
# クラス増分学習のための適応正規化 Adaptive Regularization for Class-Incremental Learning ( http://arxiv.org/abs/2303.13113v2 ) ライセンス: Link先を確認	Elif Ceren Gok Yildirim, Murat Onur Yildirim, Mert Kilickaya, Joaquin Vanschoren	(参考訳) クラスインクリメンタルラーニングは、以前に観測されたクラスの精度を維持しながら、新しいカテゴリで深い分類器を更新する。ニューラルネットワークの重み付けを正則化することは、新しいクラスを学習しながら学習したクラスを忘れることを防ぐ一般的な方法である。しかし、既存の正則化器は学習セッションを通して一定等級を使い、漸進的な学習で遭遇するタスクの難しさのレベルを反映していない可能性がある。本研究は,課題の複雑さに応じて動的に正則化強度を調節する授業インクリメンタルラーニングにおける適応正則化の必要性について検討する。ベイズ最適化に基づく学習タスクごとに最適な正則化量を自動的に決定する手法を提案する。 2つの正規化器による2つのデータセットの実験は、正確で忘れられない視覚的漸進学習を実現するための適応正規化の重要性を示している。 Class-Incremental Learning updates a deep classifier with new categories while maintaining the previously observed class accuracy. Regularizing the neural network weights is a common method to prevent forgetting previously learned classes while learning novel ones. However, existing regularizers use a constant magnitude throughout the learning sessions, which may not reflect the varying levels of difficulty of the tasks encountered during incremental learning. This study investigates the necessity of adaptive regularization in Class-Incremental Learning, which dynamically adjusts the regularization strength according to the complexity of the task at hand. We propose a Bayesian Optimization-based approach to automatically determine the optimal regularization magnitude for each learning task. Our experiments on two datasets via two regularizers demonstrate the importance of adaptive regularization for achieving accurate and less forgetful visual incremental learning.	翻訳日:2023-03-27 11:12:29 公開日:2023-03-24
# 容積型医用画像分割のための可変ハイブリッドネットワーク A Permutable Hybrid Network for Volumetric Medical Image Segmentation ( http://arxiv.org/abs/2303.13111v2 ) ライセンス: Link先を確認	Yi Lin, Xiao Fang, Dong Zhang, Kwang-Ting Cheng, Hao Chen	(参考訳) 視覚トランスフォーマー(vit)の出現は、3dボリュームベンチマーク、特に3d医療画像セグメンテーションの大幅な進歩をもたらした。同時に、Multi-Layer Perceptron(MLP)ネットワークは、重い自己保持モジュールを除外したにもかかわらず、ViTに匹敵する結果により、研究者の間で人気を取り戻している。本稿では,畳み込みニューラルネットワーク (CNN) と MLP の利点を利用する,PHNet という医用画像分割のための可変ハイブリッドネットワークを提案する。 PHNetは2次元CNNと3次元CNNの両方を用いて3次元ボリュームデータの固有等方性問題に対処する。また, 位置情報を保持しながら長距離依存を得ることにより, 元のmlpを増大させるmlppという, 効率的な多層透過型パーセプトロンモジュールを提案する。大規模な実験結果によると、PHNetは2つのパブリックデータセット、すなわちCOVID-19-20とSynapseで最先端の手法より優れている。さらに, PHNet が CNN および MLP の強度に有効であることを示す。コードは受理後、一般に公開されます。 The advent of Vision Transformer (ViT) has brought substantial advancements in 3D volumetric benchmarks, particularly in 3D medical image segmentation. Concurrently, Multi-Layer Perceptron (MLP) networks have regained popularity among researchers due to their comparable results to ViT, albeit with the exclusion of the heavy self-attention module. This paper introduces a permutable hybrid network for volumetric medical image segmentation, named PHNet, which exploits the advantages of convolution neural network (CNN) and MLP. PHNet addresses the intrinsic isotropy problem of 3D volumetric data by utilizing both 2D and 3D CNN to extract local information. Besides, we propose an efficient Multi-Layer Permute Perceptron module, named MLPP, which enhances the original MLP by obtaining long-range dependence while retaining positional information. Extensive experimental results validate that PHNet outperforms the state-of-the-art methods on two public datasets, namely, COVID-19-20 and Synapse. Moreover, the ablation study demonstrates the effectiveness of PHNet in harnessing the strengths of both CNN and MLP. The code will be accessible to the public upon acceptance.	翻訳日:2023-03-27 11:12:17 公開日:2023-03-24
# OCELOT:病理組織学のための組織データセット上のオーバーラップ細胞 OCELOT: Overlapped Cell on Tissue Dataset for Histopathology ( http://arxiv.org/abs/2303.13110v2 ) ライセンス: Link先を確認	Jeongun Ryu, Aaron Valero Puche, JaeWoong Shin, Seonwook Park, Biagio Brattoli, Jinhee Lee, Wonkyung Jung, Soo Ick Cho, Kyunghyun Paeng, Chan-Young Ock, Donggeun Yoo, Sérgio Pereira	(参考訳) 細胞検出は計算病理学の基本的な課題であり、全スライディング画像から高レベルの医療情報を抽出するのに使用できる。正確な細胞検出のために、病理学者は組織レベルの構造を理解するためにズームアウトし、その形態と周囲の状況に基づいて細胞を分類する。しかしながら、細胞検出モデルにおける病理学者のこのような行動を反映しようとする努力の欠如は、主に重複した領域を持つ細胞と組織の両方を含むデータセットの欠如によるものである。この制限を克服するために,組織学における細胞検出のための細胞間関係の研究を目的としたデータセットOCELOTを提案する。 OCELOTは複数の臓器から取得した画像に重複する細胞および組織アノテーションを提供する。この設定内では,細胞と組織の両方のタスクを同時に学習できるマルチタスク学習手法も提案する。細胞検出タスクのみで訓練されたモデルと比較すると,提案手法はOCELOT,パブリックTIGER,内部CARPデータセットの3つのデータセット上での細胞検出性能を向上させる。特にOCELOTテストセットでは、F1スコアが最大6.79改善されている。我々は,OCELOTデータセットをhttps://lunit-io.github.io/research/publications/ocelotでリリースすることを含め,本論文のコントリビューションは,計算病理学に細胞-組織関係を組み込む上で重要な研究方向への重要な出発点であると考えている。 Cell detection is a fundamental task in computational pathology that can be used for extracting high-level medical information from whole-slide images. For accurate cell detection, pathologists often zoom out to understand the tissue-level structures and zoom in to classify cells based on their morphology and the surrounding context. However, there is a lack of efforts to reflect such behaviors by pathologists in the cell detection models, mainly due to the lack of datasets containing both cell and tissue annotations with overlapping regions. To overcome this limitation, we propose and publicly release OCELOT, a dataset purposely dedicated to the study of cell-tissue relationships for cell detection in histopathology. OCELOT provides overlapping cell and tissue annotations on images acquired from multiple organs. Within this setting, we also propose multi-task learning approaches that benefit from learning both cell and tissue tasks simultaneously. 
When compared against a model trained only for the cell detection task, our proposed approaches improve cell detection performance on 3 datasets: proposed OCELOT, public TIGER, and internal CARP datasets. On the OCELOT test set in particular, we show up to a 6.79 improvement in F1-score. We believe the contributions of this paper, including the release of the OCELOT dataset at https://lunit-io.github.io/research/publications/ocelot , are a crucial starting point toward the important research direction of incorporating cell-tissue relationships in computational pathology.	翻訳日:2023-03-27 11:11:54 公開日:2023-03-24
# 合成分析によるトップダウン視覚注意 Top-Down Visual Attention from Analysis by Synthesis ( http://arxiv.org/abs/2303.13043v2 ) ライセンス: Link先を確認	Baifeng Shi, Trevor Darrell, Xin Wang	(参考訳) 現在の注意アルゴリズム(例えば、自己注意)は刺激駆動であり、画像内のすべての有能な物体をハイライトする。しかしながら、人間のような知的エージェントは、手前の高レベルなタスクに基づいて注意を誘導し、タスク関連のオブジェクトのみに焦点を当てることが多い。このタスク誘導トップダウンアテンションの能力は、タスク適応表現を提供し、モデルが様々なタスクに一般化するのに役立つ。本稿では,古典的分析合成(AbS)による視覚の視点からトップダウンの注意を考察する。先行研究は,視覚注意とスパース再構成との間の機能的等価性を示し,目標指向トップダウン信号によって変調される類似スパース再構築目標を最適化するabs視覚システムは,自然にトップダウン注意をシミュレートすることを示す。さらに、AbSを変動的に近似するトップダウン変調ViTモデルであるAbSViT(Analytic-by-Synthesis Vision Transformer)を提案する。現実世界のアプリケーションでは、AbSViTは、VQAやゼロショット検索などのビジョン言語タスクのベースラインを一貫して改善し、言語がトップダウンの注意を導く。 AbSViTは一般的なバックボーンとしても機能し、分類、セマンティックセグメンテーション、モデルロバスト性が改善される。 Current attention algorithms (e.g., self-attention) are stimulus-driven and highlight all the salient objects in an image. However, intelligent agents like humans often guide their attention based on the high-level task at hand, focusing only on task-related objects. This ability of task-guided top-down attention provides task-adaptive representation and helps the model generalize to various tasks. In this paper, we consider top-down attention from a classic Analysis-by-Synthesis (AbS) perspective of vision. Prior work indicates a functional equivalence between visual attention and sparse reconstruction; we show that an AbS visual system that optimizes a similar sparse reconstruction objective modulated by a goal-directed top-down signal naturally simulates top-down attention. We further propose Analysis-by-Synthesis Vision Transformer (AbSViT), which is a top-down modulated ViT model that variationally approximates AbS, and achieves controllable top-down attention. For real-world applications, AbSViT consistently improves over baselines on Vision-Language tasks such as VQA and zero-shot retrieval where language guides the top-down attention. 
AbSViT can also serve as a general backbone, improving performance on classification, semantic segmentation, and model robustness.	翻訳日:2023-03-27 11:11:31 公開日:2023-03-24

# ビデオ事前学習トランスフォーマー:事前学習済みエキスパートのマルチモーダル混合

Video Pre-trained Transformer: A Multimodal Mixture of Pre-trained Experts ( http://arxiv.org/abs/2304.10505v1 )

ライセンス: Link先を確認

Kastan Day, Daniel Christl, Rohan Salvi, Pranav Sriram

(参考訳) ビデオプリトレーニングトランスフォーマー(VPT)を提案する。 VPTは以前の作業から4つのSOTAエンコーダモデルを使用して、ビデオをコンパクトな埋め込みのシーケンスに変換する。我々のバックボーンは、参照Flan-T5-11Bアーキテクチャに基づいて、エンコーダモデルの非線形和であるビデオの普遍的な表現を学習する。自己回帰的な因果言語モデリングの損失を利用して学習し、YouTubeビデオで話される単語を予測する。最後に、各タスクの完全連結予測ヘッドをトレーニングすることにより、標準下流ベンチマークを評価する。私たちの知る限りでは、これは"embedding -> backbone -> prediction head"デザインパターンにおけるエンコーダとして、複数の凍結したSOTAモデルを用いた最初の例であり、他の研究はすべて独自の共同エンコーダモデルを学習している。さらに、明示的なScene Graph情報を追加することで、現在のSOTAであるMerlot Reserveよりも多くのモダリティが含まれています。これら2つの理由から、SOTAのパフォーマンスを達成するために、世界で最も優れたオープンソースモデルを組み合わせることができると考えています。最初の実験は、モデルが適切に学習していることを示しているが、より高い目標を実現するにはさらなる実験と計算が必要であり、すでに進行中である。この作業に加えて、私たちはYT-20Mデータセットを再現し、個人的に選定した25,000本のYouTubeビデオをコーパスに追加しました。すべてのコードとモデルチェックポイントは、標準のMITライセンスの下でオープンソース化されている。

We present Video Pre-trained Transformer (VPT). VPT uses four SOTA encoder models from prior work to convert a video into a sequence of compact embeddings. Our backbone, based on a reference Flan-T5-11B architecture, learns a universal representation of the video that is a non-linear sum of the encoder models. It learns using an autoregressive causal language modeling loss by predicting the words spoken in YouTube videos. Finally, we evaluate on standard downstream benchmarks by training fully connected prediction heads for each task. To the best of our knowledge, this is the first use of multiple frozen SOTA models as encoders in an "embedding -> backbone -> prediction head" design pattern - all others have trained their own joint encoder models. Additionally, we include more modalities than the current SOTA, Merlot Reserve, by adding explicit Scene Graph information. For these two reasons, we believe it could combine the world's best open-source models to achieve SOTA performance. Initial experiments demonstrate the model is learning appropriately, but more experimentation and compute are necessary, and already in progress, to realize our loftier goals. Alongside this work, we build on the YT-20M dataset, reproducing it and adding 25,000 personally selected YouTube videos to its corpus. All code and model checkpoints are open sourced under a standard MIT license.

翻訳日:2023-04-23 03:58:52 公開日:2023-03-24

# 特性曲線の変換グラフィカル特徴とCBAMモジュールによる畳み込みニューラルネットワークに基づく粉塵の影響を考慮したPVアレイの故障診断

Fault diagnosis for PV arrays considering dust impact based on transformed graphical feature of characteristic curves and convolutional neural network with CBAM modules ( http://arxiv.org/abs/2304.06493v1 )

ライセンス: Link先を確認

Jiaqi Qu, Lu Wei, Qiang Sun, Hamidreza Zareipour, Zheng Qian

(参考訳) PVアレイの動作中に様々な障害が発生し、粉塵の影響のある動作条件と様々なダイオード構成の両方が故障をより複雑にする。しかし、I-V特性曲線に基づく現在の故障診断法は部分的特徴情報しか利用せず、しばしばフィールド特性曲線を標準試験条件(STC)に校正することに頼っている。そのため実際に適用することは難しく、粉塵の影響下でPVアレイの異なるブロッキングダイオード構成において類似性を持つ複数の複雑な故障を正確に同定することも困難である。そこで, 粉塵の影響を考慮した新しいPVアレイ故障診断法を提案する。プリプロセッシング段階では、Isc-Voc正規化グラム角度差分場(GADF)法が提示され、I-VおよびP-Vを含むフィールドから再サンプリングされたPVアレイ特性曲線を正規化し変換し、変換されたグラフィカル特徴行列を得る。そして、故障診断段階では、畳み込みブロック注意モジュール(CBAM)を用いた畳み込みニューラルネットワーク(CNN)モデルが、完全な特徴情報を含む変換されたグラフィカル行列から故障識別情報を抽出し、故障を分類するように設計されている。また,シミュレーション事例を用いて異なる特徴変換法を比較し,CNNに基づく分類法も分析した。その結果,様々な動作条件下で異なるブロッキングダイオード構成を持つPVアレイに対して,開発した手法は高い故障診断精度と信頼性を有することがわかった。

Various faults can occur during the operation of PV arrays, and both dust-affected operating conditions and the variety of diode configurations make the faults more complicated. However, current fault diagnosis methods based on I-V characteristic curves use only partial feature information and often rely on calibrating the field characteristic curves to standard test conditions (STC). Such methods are difficult to apply in practice, and they struggle to accurately identify multiple complex faults that look similar across different blocking-diode configurations of PV arrays under the influence of dust. Therefore, a novel fault diagnosis method for PV arrays considering dust impact is proposed. In the preprocessing stage, the Isc-Voc normalized Gramian angular difference field (GADF) method is presented, which normalizes and transforms the resampled PV array characteristic curves from the field, including I-V and P-V curves, to obtain the transformed graphical feature matrices. Then, in the fault diagnosis stage, a convolutional neural network (CNN) model with convolutional block attention modules (CBAM) is designed to extract fault differentiation information from the transformed graphical matrices containing full feature information and to classify faults. Different graphical feature transformation methods are compared through simulation cases, and different CNN-based classification methods are also analyzed. The results indicate that the developed method has high fault diagnosis accuracy and reliability for PV arrays with different blocking-diode configurations under various operating conditions.

翻訳日:2023-04-16 21:48:28 公開日:2023-03-24
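
The Gramian angular difference field named in the abstract above can be sketched in a few lines. This is only an illustration of the standard GADF construction, under stated assumptions: the function name `gadf` and the sample values are hypothetical, and a plain min-max rescaling to [-1, 1] stands in for the paper's Isc-Voc normalization, which the abstract does not spell out.

```python
import math

def gadf(series):
    """Gramian angular difference field of a 1-D series (pure-Python sketch)."""
    lo, hi = min(series), max(series)
    if hi == lo:
        raise ValueError("series must not be constant")
    # Assumption: min-max rescale to [-1, 1] so every value is a valid cosine.
    scaled = [2.0 * (x - lo) / (hi - lo) - 1.0 for x in series]
    # Polar encoding: each value becomes an angle phi = arccos(x).
    phi = [math.acos(max(-1.0, min(1.0, x))) for x in scaled]
    # GADF[i][j] = sin(phi_i - phi_j); the resulting matrix is antisymmetric.
    return [[math.sin(a - b) for b in phi] for a in phi]

# Made-up current samples along an I-V curve, not measured data.
current = [8.0, 7.9, 7.5, 6.0, 3.0, 0.0]
image = gadf(current)  # 6 x 6 matrix that a CNN could take as an image
```

Each resampled curve becomes a square matrix whose size equals the number of samples, which is what lets an image classifier consume a one-dimensional characteristic curve.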

# PromptORE - 完全教師なし関係抽出に向けた新しいアプローチ

PromptORE -- A Novel Approach Towards Fully Unsupervised Relation Extraction ( http://arxiv.org/abs/2304.01209v1 )

ライセンス: Link先を確認

Pierre-Yves Genest (Alteca, DRIM), Pierre-Edouard Portier (DRIM), Elöd Egyed-Zsigmond (DRIM), Laurent-Walter Goix (Alteca)

(参考訳) unsupervised relation extraction (re)は、トレーニング中にラベル付きデータにアクセスせずに、テキスト内のエンティティ間の関係を識別することを目的としている。この設定は、アノテーション付きデータセットが利用できないドメイン固有のREと、関係のタイプが未知のオープンドメインREに特に関係している。最近のアプローチでは有望な結果が得られるが、チューニングがラベル付きデータを必要とすることが多いハイパーパラメータに大きく依存している。ハイパーパラメータへの依存を軽減するため,'Prompt-based Open Relation extract'モデルであるPromptOREを提案する。我々は,教師なし設定で作業するために,新しいプロンプト・チューニング・パラダイムを適用し,関係を表す文を埋め込む。次に、これらの埋め込みをクラスタ化して候補関係を発見し、適切なクラスタ数を自動的に見積もるさまざまな戦略を実験します。我々の知る限りでは、PromptOREはハイパーパラメータチューニングを必要としない最初の教師なしREモデルである。 3つの一般および特定のドメインデータセットの結果から、PromptOREはB3、V測定、ARIの40%以上の相対的なゲインを持つ最先端モデルよりも一貫して優れていた。定性的分析はまた、真の関係に非常に近い意味的コヒーレントなクラスタを特定できる PromptORE の能力を示している。

Unsupervised Relation Extraction (RE) aims to identify relations between entities in text, without having access to labeled data during training. This setting is particularly relevant for domain-specific RE where no annotated dataset is available and for open-domain RE where the types of relations are a priori unknown. Although recent approaches achieve promising results, they heavily depend on hyperparameters whose tuning would most often require labeled data. To mitigate the reliance on hyperparameters, we propose PromptORE, a "Prompt-based Open Relation Extraction" model. We adapt the novel prompt-tuning paradigm to work in an unsupervised setting, and use it to embed sentences expressing a relation. We then cluster these embeddings to discover candidate relations, and we experiment with different strategies to automatically estimate an adequate number of clusters. To the best of our knowledge, PromptORE is the first unsupervised RE model that does not need hyperparameter tuning. Results on three general and specific domain datasets show that PromptORE consistently outperforms state-of-the-art models with a relative gain of more than 40% in B³, V-measure and ARI. Qualitative analysis also indicates PromptORE's ability to identify semantically coherent clusters that are very close to true relations.

翻訳日:2023-04-09 05:34:10 公開日:2023-03-24
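
Of the three clustering scores reported in the abstract above, B³ (B-cubed) is the least commonly implemented, so a small sketch may help. It follows the usual per-item definition of B-cubed precision and recall; the function name `b_cubed` and the toy labels are hypothetical, and nothing here is taken from the PromptORE code.

```python
from collections import Counter

def b_cubed(predicted, gold):
    """Return (precision, recall, F1) of a clustering under the B-cubed metric."""
    if len(predicted) != len(gold) or not gold:
        raise ValueError("need two non-empty label lists of equal length")
    cluster_size = Counter(predicted)        # items per predicted cluster
    class_size = Counter(gold)               # items per gold relation
    overlap = Counter(zip(predicted, gold))  # items sharing cluster AND relation
    n = len(gold)
    # Per-item precision: share of the item's cluster that has its gold relation.
    precision = sum(overlap[(p, g)] / cluster_size[p]
                    for p, g in zip(predicted, gold)) / n
    # Per-item recall: share of the item's gold relation found in its cluster.
    recall = sum(overlap[(p, g)] / class_size[g]
                 for p, g in zip(predicted, gold)) / n
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Toy example: four sentences, two predicted clusters, two gold relations.
scores = b_cubed([0, 0, 1, 1], ["born_in", "born_in", "born_in", "works_for"])
```

Because the score is averaged over items, it needs no mapping between cluster ids and relation names, which suits a setting where the relation types are unknown in advance.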

# 機械学習に基づくスピン構造検出

Machine learning-based spin structure detection ( http://arxiv.org/abs/2303.16905v1 )

ライセンス: Link先を確認

Isaac Labrie-Boulay, Thomas Brian Winkler, Daniel Franzen, Alena Romanova, Hans Fangohr, Mathias Kläui

(参考訳) 最も重要な磁気スピン構造の一つは、トポロジカルに安定化されたスキルミオン準粒子である。その興味深い物理的性質により、メモリや効率的なニューロモルフィック計算スキームの候補となる。デバイス動作には、スキルミオンの位置、形状、大きさの検出が必要であり、一般的に磁気イメージングが用いられる。よく用いられる技術は磁気光学カー顕微鏡であるが、試料の材料組成、温度、材料成長手順などによって、測定はノイズ、低コントラスト、強度勾配などの光学的アーティファクトに悩まされる。従来の画像解析パッケージは手作業による処理が必要であり、より自動化されたソリューションが必要である。我々は,測定におけるスキルミオンの位置と形状を検出するために,セグメンテーション問題に特化して設計された畳み込みニューラルネットワークについて報告する。ネットワークは選択した手法で調整して予測を最適化しており、特に検出するクラス数が性能を左右することがわかった。本研究の結果から, よく訓練されたネットワークが, 磁気顕微鏡におけるデータ前処理の自動化に有効であることが示された。このアプローチは他のスピン構造や他の磁気イメージング手法に容易に拡張できる。

One of the most important magnetic spin structures is the topologically stabilised skyrmion quasi-particle. Its interesting physical properties make it a candidate for memory and efficient neuromorphic computation schemes. For device operation, detection of the position, shape, and size of skyrmions is required, and magnetic imaging is typically employed. A frequently used technique is magneto-optical Kerr microscopy, where, depending on the sample's material composition, temperature, material growth procedure, etc., the measurements suffer from noise, low contrast, intensity gradients, or other optical artifacts. Conventional image analysis packages require manual treatment, and a more automatic solution is required. We report a convolutional neural network specifically designed for segmentation problems to detect the position and shape of skyrmions in our measurements. The network is tuned using selected techniques to optimize predictions, and in particular the number of detected classes is found to govern the performance. The results of this study show that a well-trained network is a viable method of automating data pre-processing in magnetic microscopy. The approach is easily extendable to other spin structures and other magnetic imaging methods.

翻訳日:2023-04-02 18:16:22 公開日:2023-03-24

# 気候モデルにおけるサブグリッドパラメータ化のデータ駆動型マルチスケールモデリング

Data-driven multiscale modeling of subgrid parameterizations in climate models ( http://arxiv.org/abs/2303.17496v1 )

ライセンス: Link先を確認

Karl Otness, Laure Zanna, Joan Bruna

(参考訳) 現在の気候モデルの解像度以下の物理過程を表すサブグリッドパラメータ化は、気候の正確な長期予測を生成する上で重要な要素である。これらのコンポーネントを設計するための様々なアプローチがテストされている。本研究では,この予測問題に対する多元的アプローチを示す概念実証について評価する。テストベッドモデルのサブグリッド強制値を予測するためにニューラルネットワークを訓練し、細かな方向と粗い方向の両方で追加情報を用いて得られる予測精度の向上を検討する。

Subgrid parameterizations, which represent physical processes occurring below the resolution of current climate models, are an important component in producing accurate, long-term predictions for the climate. A variety of approaches have been tested to design these components, including deep learning methods. In this work, we evaluate a proof of concept illustrating a multiscale approach to this prediction problem. We train neural networks to predict subgrid forcing values on a testbed model and examine improvements in prediction accuracy that can be obtained by using additional information in both fine-to-coarse and coarse-to-fine directions.

翻訳日:2023-04-02 18:10:57 公開日:2023-03-24

# 非クリニカルテキスト情報検索による癌関連フォーラムポストの効率的なラベル付け

Computationally Efficient Labeling of Cancer Related Forum Posts by Non-Clinical Text Information Retrieval ( http://arxiv.org/abs/2303.16766v1 )

ライセンス: Link先を確認

Jimmi Agerskov, Kristian Nielsen, Christian Marius Lillelund, Christian Fischer Pedersen

(参考訳) 癌に関する情報はオンラインで豊富に存在するが、有用な情報を分類し抽出することは困難である。医療データ処理における研究のほとんどは、正式な臨床データに関するものだが、非臨床データにも貴重な情報がある。本研究は, 分散コンピューティング, テキスト検索, クラスタリング, 分類の手法をコヒーレントかつ計算効率の良いシステムに統合し, 非臨床的かつ自由に利用可能な情報に基づいて癌患者の軌跡を明らかにする。我々は,非クリニカルフォーラムポストから癌軌跡情報を検索し,収集し,提示できる完全機能プロトタイプを作成した。我々は3つのクラスタリングアルゴリズム (MR-DBSCAN, DBSCAN, HDBSCAN) を評価し, 得られたポスト数と近傍半径の関数として, 調整された乱数指数と総実行時間を比較した。クラスタリングの結果は, 周辺半径がクラスタリング性能に最も大きな影響を与えることを示している。小さな値の場合、データセットはそれに従って分割されるが、高い値は多数のパーティションを生成し、最適なパーティションを探すのに時間を要する。適切な推定半径で、MR-DBSCANは、DBSCAN (143.4) や HDBSCAN (282.3) と比較して、50000のフォーラムポストを46.1秒でクラスタリングすることができる。デンマーク癌学会とインタビューを行い,ソフトウェアプロトタイプについて紹介する。この組織は、がんに関するオンライン情報を民主化し、そのようなシステムが将来必要となると予測できるソフトウェアの可能性を見込んでいる。

An abundance of information about cancer exists online, but categorizing and extracting useful information from it is difficult. Almost all research within healthcare data processing is concerned with formal clinical data, but there is valuable information in non-clinical data too. The present study combines methods within distributed computing, text retrieval, clustering, and classification into a coherent and computationally efficient system that can clarify cancer patient trajectories based on non-clinical and freely available information. We produce a fully functional prototype that can retrieve, cluster, and present information about cancer trajectories from non-clinical forum posts. We evaluate three clustering algorithms (MR-DBSCAN, DBSCAN, and HDBSCAN) and compare them in terms of Adjusted Rand Index and total run time as a function of the number of posts retrieved and the neighborhood radius. Clustering results show that neighborhood radius has the most significant impact on clustering performance. For small values, the data set is split accordingly, but high values produce a large number of possible partitions, and searching for the best partition is therefore time-consuming. With a properly estimated radius, MR-DBSCAN can cluster 50000 forum posts in 46.1 seconds, compared to DBSCAN (143.4 seconds) and HDBSCAN (282.3 seconds). We conduct an interview with the Danish Cancer Society and present our software prototype. The organization sees potential in software that can democratize online information about cancer and foresees that such systems will be required in the future.
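
The Adjusted Rand Index used to compare the clustering algorithms can be computed from pair counts alone. The following is a minimal pure-Python sketch of the standard formula, not the authors' evaluation code; the function name and the toy labels are illustrative.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand Index between two labelings of the same items."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("labelings must have the same length")
    total_pairs = comb(len(labels_true), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two items: the partitions agree trivially

    # Pairs that share a cluster in both labelings, in the first, in the second.
    both = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    first = sum(comb(c, 2) for c in Counter(labels_true).values())
    second = sum(comb(c, 2) for c in Counter(labels_pred).values())

    expected = first * second / total_pairs  # agreement expected by chance
    maximum = (first + second) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. both labelings are one big cluster
    return (both - expected) / (maximum - expected)


if __name__ == "__main__":
    reference = [0, 0, 1, 1]
    print(adjusted_rand_index(reference, [1, 1, 0, 0]))  # same partition, renamed labels
    print(adjusted_rand_index(reference, [0, 1, 0, 1]))  # partition that splits every pair
```

The index is 1 for identical partitions regardless of label names, and it can go below 0 when two partitions agree less than chance would predict.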

翻訳日:2023-03-31 15:51:42 公開日:2023-03-24

# LLM for patient-Trial Matching: パフォーマンスと一般化性向上に向けたプライバシ対応データ拡張

LLM for Patient-Trial Matching: Privacy-Aware Data Augmentation Towards Better Performance and Generalizability ( http://arxiv.org/abs/2303.16756v1 )

ライセンス: Link先を確認

Jiayi Yuan, Ruixiang Tang, Xiaoqian Jiang, Xia Hu

(参考訳) 患者と適切な臨床試験を合わせるプロセスは、医学研究を進め、最適なケアを提供するために不可欠である。しかし、現在のアプローチでは、データの標準化、倫理的考察、電子健康記録(EHR)と臨床試験基準との相互運用性の欠如といった課題に直面している。本稿では,ehlsと臨床試験記述との互換性を改善するために,それらの高度な自然言語生成能力を活用することで,これらの課題に対処するための大規模言語モデル(llms)の可能性を検討する。本稿では,LLMに基づく患者心電図マッチング(LLM-PTM)のための革新的なプライバシ・アウェアなデータ拡張手法を提案する。本実験では, LLM-PTM法を用いて平均性能を7.32%向上させ, 新データの一般化性を12.12%向上させた。さらに,本手法の有効性をさらに説明し,基礎となる原理をより深く理解するためのケーススタディを提示する。

The process of matching patients with suitable clinical trials is essential for advancing medical research and providing optimal care. However, current approaches face challenges such as data standardization, ethical considerations, and a lack of interoperability between Electronic Health Records (EHRs) and clinical trial criteria. In this paper, we explore the potential of large language models (LLMs) to address these challenges by leveraging their advanced natural language generation capabilities to improve compatibility between EHRs and clinical trial descriptions. We propose an innovative privacy-aware data augmentation approach for LLM-based patient-trial matching (LLM-PTM), which balances the benefits of LLMs while ensuring the security and confidentiality of sensitive patient data. Our experiments demonstrate a 7.32% average improvement in performance using the proposed LLM-PTM method, and the generalizability to new data is improved by 12.12%. Additionally, we present case studies to further illustrate the effectiveness of our approach and provide a deeper understanding of its underlying principles.

翻訳日:2023-03-31 15:50:19 公開日:2023-03-24

# スクイーズと励磁によるスウィントランスを用いた表情認識

Facial Expression Recognition using Squeeze and Excitation-powered Swin Transformers ( http://arxiv.org/abs/2301.10906v3 )

ライセンス: Link先を確認

Arpita Vats, Aman Chadha

(参考訳) 本稿では,Swin Vision TransformerとSqueeze-and-Excitation (SE) ブロックを併用した表情感情認識フレームワークを提案する。近年,視覚課題に対処するために注意機構に基づくトランスフォーマーモデルが提案されている。本手法では,Squeeze-and-Excitation (SE) ブロックとsharpness-aware minimizer (SAM) を用いた視覚変換器を用いる。ハイブリッドデータセットを用いてモデルを訓練し,AffectNetデータセットを用いてモデルの結果を評価した。

We present a facial emotion recognition framework built upon Swin vision Transformers jointly with a squeeze-and-excitation (SE) block. A transformer model based on an attention mechanism has been presented recently to address vision tasks. Our method uses a vision transformer with a squeeze-and-excitation (SE) block and a sharpness-aware minimizer (SAM). We used a hybrid dataset to train our model and the AffectNet dataset to evaluate its results.

翻訳日:2023-03-29 18:50:35 公開日:2023-03-24

# 深層学習における数学的挑戦

Mathematical Challenges in Deep Learning ( http://arxiv.org/abs/2303.15464v1 )

ライセンス: Link先を確認

Vahid Partovi Nia, Guojun Zhang, Ivan Kobyzev, Michael R. Metel, Xinlin Li, Ke Sun, Sobhan Hemati, Masoud Asgharian, Linglong Kong, Wulong Liu, Boxing Chen

(参考訳) 2012年のImageNetチャレンジ以来、ディープモデルは人工知能(AI)業界を支配している。それ以来、深層モデルのサイズは増え続けており、携帯電話、パーソナルコンピュータ、自動運転車、無線基地局など、この分野に新たな課題をもたらしている。ここでは、これらの課題を数学者、統計学者、理論計算機科学者と伝達するために、トレーニング、推論、一般化境界、最適化を含む一連の問題を列挙する。これは、長期的な技術産業に利益をもたらすディープラーニングの研究課題の主観的な見解である。

Deep models have dominated the artificial intelligence (AI) industry since the ImageNet challenge in 2012. The size of deep models has been increasing ever since, which brings new challenges to this field with applications in cell phones, personal computers, autonomous cars, and wireless base stations. Here we list a set of problems, ranging from training and inference to generalization bounds and optimization, with some formalism to communicate these challenges to mathematicians, statisticians, and theoretical computer scientists. This is a subjective view of the research questions in deep learning that benefits the tech industry in the long run.

翻訳日:2023-03-29 18:05:06 公開日:2023-03-24

# ネットワーク特性を利用した誤入力検出

Utilizing Network Properties to Detect Erroneous Inputs ( http://arxiv.org/abs/2002.12520v3 )

ライセンス: Link先を確認

Matt Gorbett, Nathaniel Blanchard

(参考訳) ニューラルネットワークは、敵、腐敗、配布外、誤分類例など、幅広い誤入力に対して脆弱である。本研究では、線形SVM分類器をトレーニングし、事前学習ニューラルネットワークの隠れおよびソフトマックス特徴ベクトルを用いて、これらの4種類の誤データを検出する。以上の結果から,誤りデータ型は一般に,適切な例から線形に分離可能なアクティベーション特性を示し,余分なトレーニングやオーバーヘッドを伴わずに悪い入力を拒否できることがわかった。我々は、さまざまなデータセット、ドメイン、事前訓練されたモデル、および敵攻撃で、我々の発見を実験的に検証した。

Neural networks are vulnerable to a wide range of erroneous inputs such as adversarial, corrupted, out-of-distribution, and misclassified examples. In this work, we train a linear SVM classifier to detect these four types of erroneous data using hidden and softmax feature vectors of pre-trained neural networks. Our results indicate that these faulty data types generally exhibit linearly separable activation properties from correct examples, giving us the ability to reject bad inputs with no extra training or overhead. We experimentally validate our findings across a diverse range of datasets, domains, pre-trained models, and adversarial attacks.

翻訳日:2023-03-29 05:10:16 公開日:2023-03-24

# 学習に基づくデモサイシング,デノイジング,超解像パイプラインの再考

Rethinking Learning-based Demosaicing, Denoising, and Super-Resolution Pipeline ( http://arxiv.org/abs/1905.02538v3 )

ライセンス: Link先を確認

Guocheng Qian, Yuanhao Wang, Jinjin Gu, Chao Dong, Wolfgang Heidrich, Bernard Ghanem, Jimmy S. Ren

(参考訳) イメージングは通常、不完全色サンプリング、ノイズ劣化、解像度制限の混合問題である。この混合問題は典型的にはデモサイシング(DM)、デノイジング(DN)、スーパーレゾリューション(SR)を固定および事前定義されたパイプライン(タスクの実行順序)DM$\to$DN$\to$SRで順次適用する逐次解によって解決される。画像処理に関する最近の研究は、より高い画質を実現するためのより洗練されたアーキテクチャの開発に焦点を当てている。パイプラインの設計にはほとんど注意が払われておらず、パイプラインが画像品質にどの程度重要かは、まだ明らかではない。本研究では,学習ベースDN,DM,SRの混合問題に対するパイプラインの効果を,逐次的および共同解法の両方において包括的に研究する。一方で、シーケンシャルなソリューションでは、パイプラインが結果の画像品質に非自明な影響を与えていることが分かりました。我々の提案するパイプラインDN$\to$SR$\to$DMは、様々な実験設定やベンチマークにおいて、他のシーケンシャルパイプラインよりも一貫してパフォーマンスが向上する。一方,共同ソリューションでは,混合問題に対する最先端の性能を実現するエンドツーエンドTrinity Pixel Enhancement NETwork(TENet)を提案する。さらに,分離可能なヘッドを用いた中間管理を提供することにより,特定のパイプラインを所定のエンドツーエンドネットワークに統合する,新規でシンプルな手法を提案する。広範な実験により、提案するパイプラインとのエンドツーエンドネットワークは、一貫性があるが重要でない改善しか達成できないことが示された。私たちの研究は、パイプラインの調査はシーケンシャルなソリューションに適用できるが、エンドツーエンドネットワークではそれほど必要ではないことを示している。コード、モデル、および私たちが提供するPixelShift200データセットは https://github.com/guochengqian/TENet で公開されている。

Imaging is usually a mixture problem of incomplete color sampling, noise degradation, and limited resolution. This mixture problem is typically solved by a sequential solution that applies demosaicing (DM), denoising (DN), and super-resolution (SR) sequentially in a fixed and predefined pipeline (execution order of tasks), DM$\to$DN$\to$SR. The most recent work on image processing focuses on developing more sophisticated architectures to achieve higher image quality. Little attention has been paid to the design of the pipeline, and it is still not clear how significant the pipeline is to image quality. In this work, we comprehensively study the effects of pipelines on the mixture problem of learning-based DN, DM, and SR, in both sequential and joint solutions. On the one hand, in sequential solutions, we find that the pipeline has a non-trivial effect on the resulting image quality. Our suggested pipeline DN$\to$SR$\to$DM yields consistently better performance than other sequential pipelines in various experimental settings and benchmarks. On the other hand, in joint solutions, we propose an end-to-end Trinity Pixel Enhancement NETwork (TENet) that achieves state-of-the-art performance for the mixture problem. We further present a novel and simple method that can integrate a certain pipeline into a given end-to-end network by providing intermediate supervision using a detachable head. Extensive experiments show that an end-to-end network with the proposed pipeline can attain only a consistent but insignificant improvement. Our work indicates that the investigation of pipelines is applicable in sequential solutions, but is not very necessary in end-to-end networks. Code, models, and our contributed PixelShift200 dataset are available at https://github.com/guochengqian/TENet.

翻訳日:2023-03-29 05:09:57 公開日:2023-03-24

# 画像登録のための注意(air):教師なし変圧器アプローチ

Attention for Image Registration (AiR): an unsupervised Transformer approach ( http://arxiv.org/abs/2105.02282v2 )

ライセンス: Link先を確認

Zihao Wang, Hervé Delingette

(参考訳) 画像登録は信号処理において重要なタスクであるが、しばしば安定性と効率の面で問題に遭遇する。非学習登録アプローチは、時間と空間の複雑さの点で高価な固定画像と移動画像の類似度を最適化することに依存する。この問題は、画像が大きく、あるいはそれらの間に大きな変形がある場合、悪化する可能性がある。近年,ディープラーニング,特に畳み込みニューラルネットワーク(CNN)に基づく手法が,非学習アプローチの弱点に対する効果的な解決策として研究されている。画像登録における学習手法をさらに進めるために,変形可能な画像登録問題における注意機構を導入する。提案手法は,GPGPUデバイス上で効率的にトレーニング可能なTransformerフレームワークであるAiRに基づいている。画像登録問題を言語翻訳タスクとして扱い,変形場を学習するためにトランスを使用する。教師なし生成した変形マップを学習し、2つのベンチマークデータセットでテストする。要約すると,本手法は画像登録タスクにおける安定性と効率性の問題に対処する上で有望な効果を示す。 AiRのソースコードはGithubで公開されている。

Image registration is a crucial task in signal processing, but it often encounters issues with stability and efficiency. Non-learning registration approaches rely on optimizing similarity metrics between fixed and moving images, which can be expensive in terms of time and space complexity. This problem can be exacerbated when the images are large or there are significant deformations between them. Recently, deep learning, specifically convolutional neural network (CNN)-based methods, have been explored as an effective solution to the weaknesses of non-learning approaches. To further advance learning approaches in image registration, we introduce an attention mechanism in the deformable image registration problem. Our proposed approach is based on a Transformer framework called AiR, which can be efficiently trained on GPGPU devices. We treat the image registration problem as a language translation task and use the Transformer to learn the deformation field. The method learns an unsupervised generated deformation map and is tested on two benchmark datasets. In summary, our approach shows promising effectiveness in addressing stability and efficiency issues in image registration tasks. The source code of AiR is available on Github.

翻訳日:2023-03-29 05:06:23 公開日:2023-03-24

# エピソジックおよび慢性ホームレスシェルター使用の迅速同定のための最善の閾値

The Best Thresholds for Rapid Identification of Episodic and Chronic Homeless Shelter Use ( http://arxiv.org/abs/2105.01042v3 )

ライセンス: Link先を確認

Geoffrey Guy Messier, Leslie Tutty, Caleb John

(参考訳) 本稿では,ホームレスの避難所アクセスパターンに基づいて,住宅サービスにおけるクライアントの最適識別方法について検討する。我々は、時間枠内でクライアントのシェルター使用回数とシェルター使用回数を数えることに集中する。次に閾値がこれらの値に適用され、その個人が住宅支援のよい候補かどうかを判断する。新しい住宅基準衝撃測定値を用いて,どの組み合わせが影響を最大化するかをしきい値と時間窓値を用いて検討し,住宅候補をできるだけ早く特定する。また、通常、住宅支援について特定されていない「下層」顧客グループの特徴についても、新たな洞察が得られている。

This paper explores how to best identify clients for housing services based on their homeless shelter access patterns. We focus on counting the number of shelter stays and episodes of shelter use for a client within a time window. Thresholds are then applied to these values to determine if that individual is a good candidate for housing support. Using new housing referral impact metrics, we explore a range of threshold and time window values to determine which combination both maximizes impact and identifies good candidates for housing as soon as possible. New insights are also provided regarding the characteristics of the "under-the-radar" client group who are typically not identified for housing support.
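
The counting-and-threshold idea described above can be sketched in a few lines. The abstract does not define an episode or give threshold values, so the 30-day gap rule, the `gap_days` parameter, and the thresholds in the usage example are all hypothetical choices for illustration.

```python
from datetime import date


def count_stays_and_episodes(stay_dates, window_start, window_end, gap_days=30):
    """Count shelter stays and episodes of shelter use inside a time window.

    Assumption: a new episode starts whenever at least `gap_days` days have
    passed since the previous stay. The paper may define episodes differently.
    """
    in_window = sorted({d for d in stay_dates if window_start <= d <= window_end})
    episodes = 0
    previous = None
    for day in in_window:
        if previous is None or (day - previous).days >= gap_days:
            episodes += 1
        previous = day
    return len(in_window), episodes


def flag_for_housing(stays, episodes, stay_threshold, episode_threshold):
    """Flag a client when either count reaches its (hypothetical) threshold."""
    return stays >= stay_threshold or episodes >= episode_threshold


if __name__ == "__main__":
    history = [date(2023, 1, 1), date(2023, 1, 2), date(2023, 1, 3),
               date(2023, 3, 1), date(2023, 3, 2)]
    stays, episodes = count_stays_and_episodes(
        history, date(2023, 1, 1), date(2023, 3, 31))
    print(stays, episodes)
    print(flag_for_housing(stays, episodes, stay_threshold=4, episode_threshold=3))
```

Sweeping `stay_threshold`, `episode_threshold`, and the window length over a grid is one plausible way to explore the threshold and time-window combinations the abstract refers to.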

翻訳日:2023-03-29 05:06:07 公開日:2023-03-24

# 個人化フェデレーション学習のための共有表現のエクスプロイト

Exploiting Shared Representations for Personalized Federated Learning ( http://arxiv.org/abs/2102.07078v3 )

ライセンス: Link先を確認

Liam Collins, Hamed Hassani, Aryan Mokhtari, Sanjay Shakkottai

(参考訳) 深層ニューラルネットワークは、さまざまな学習タスクに有用な画像やテキストなどのデータから、普遍的な特徴表現を抽出する能力を示している。しかし、表現学習の成果はまだフェデレーション設定で完全に実現されていない。統合された設定におけるデータは、クライアント間では非単位であることが多いが、集中型ディープラーニングの成功は、データがグローバルな特徴表現を共有することが多いことを示唆している。この直感に基づいて,クライアント間の共有データ表現とクライアント毎のユニークなローカルヘッダを学習するための,新しいフェデレーション学習フレームワークとアルゴリズムを提案する。提案アルゴリズムは,クライアント間の分散計算能力を利用して,表現の更新毎に低次元局所パラメータに対して多くの局所更新を行う。本手法は,クライアント毎の問題次元を効率的に削減できることを示すために,最適に近いサンプル複雑性を持つ接地表現への線形収束を線形設定で得ることを実証する。この結果は,例えばメタラーニングやマルチタスクラーニングなどにおいて,データ分布間の共有低次元表現を学習することを目的とした,幅広い問題に対するフェデレートラーニング以上の関心を持っている。さらに,ヘテロジニアスデータを用いたフェデレーション環境において,代替型フェデレーション学習手法よりも経験的改善がみられた。

Deep neural networks have shown the ability to extract universal feature representations from data such as images and text that have been useful for a variety of learning tasks. However, the fruits of representation learning have yet to be fully-realized in federated settings. Although data in federated settings is often non-i.i.d. across clients, the success of centralized deep learning suggests that data often shares a global feature representation, while the statistical heterogeneity across clients or tasks is concentrated in the labels. Based on this intuition, we propose a novel federated learning framework and algorithm for learning a shared data representation across clients and unique local heads for each client. Our algorithm harnesses the distributed computational power across clients to perform many local-updates with respect to the low-dimensional local parameters for every update of the representation. We prove that this method obtains linear convergence to the ground-truth representation with near-optimal sample complexity in a linear setting, demonstrating that it can efficiently reduce the problem dimension for each client. This result is of interest beyond federated learning to a broad class of problems in which we aim to learn a shared low-dimensional representation among data distributions, for example in meta-learning and multi-task learning. Further, extensive experimental results show the empirical improvement of our method over alternative personalized federated learning approaches in federated environments with heterogeneous data.

翻訳日:2023-03-29 05:04:19 公開日:2023-03-24

# 決定論的画像分類器のシーン不確かさとウェリントン後方

Scene Uncertainty and the Wellington Posterior of Deterministic Image Classifiers ( http://arxiv.org/abs/2106.13870v2 )

ライセンス: Link先を確認

Stephanie Tsuei, Aditya Golatkar, Stefano Soatto

(参考訳) 本研究では,画像分類器の出力結果の不確実性を評価する手法を提案する。画像分類によく使用されるディープニューラルネットワークは、入力画像から出力クラスへの決定論的マップである。そのため、不確実性を定義し、測定し、解釈し、結果に「自信」を帰結させる際に、どのような変動性について言及しているかを明確にする必要がある。この目的のために、Wellington Posteriorは、与えられた画像を生成する同じシーンから生成される可能性のあるデータに応答して得られる結果の分布である。任意の画像を生成できるシーンは無限に多いため、ウェリントン・ポストミラーは描かれたもの以外のシーンから誘導的に移動する。本研究では,データ拡張,ドロップアウト,センシング,単一視点再構成,モデル線形化によるウェリントン後方の計算について検討する。その他の方法は、生成逆数ネットワーク、神経放射場、条件付き事前ネットワークなどの条件付き生成モデルの使用を含む。提案手法は,同じシーンの複数の画像に対して推論を行うことにより得られた経験的後部に対して検証する。これらの開発は、安全クリティカルなアプリケーションや人間の解釈と互換性のある方法でディープネットワーク分類器の信頼性を評価するための小さな一歩にすぎない。

We propose a method to estimate the uncertainty of the outcome of an image classifier on a given input datum. Deep neural networks commonly used for image classification are deterministic maps from an input image to an output class. As such, their outcome on a given datum involves no uncertainty, so we must specify what variability we are referring to when defining, measuring and interpreting uncertainty, and attributing "confidence" to the outcome. To this end, we introduce the Wellington Posterior, which is the distribution of outcomes that would have been obtained in response to data that could have been generated by the same scene that produced the given image. Since there are infinitely many scenes that could have generated any given image, the Wellington Posterior involves inductive transfer from scenes other than the one portrayed. We explore the use of data augmentation, dropout, ensembling, single-view reconstruction, and model linearization to compute a Wellington Posterior. Additional methods include the use of conditional generative models such as generative adversarial networks, neural radiance fields, and conditional prior networks. We test these methods against the empirical posterior obtained by performing inference on multiple images of the same underlying scene. These developments are only a small step towards assessing the reliability of deep network classifiers in a manner that is compatible with safety-critical applications and human interpretation.
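
As a rough illustration of the data-augmentation route to a Wellington Posterior, the sketch below tabulates the outcomes of a deterministic classifier over perturbed copies of one input. It is a toy under stated assumptions, not the authors' method: brightness shifts stand in for "other images the same scene could have produced", and `toy_classifier`, `shift_brightness`, and the pixel values are hypothetical.

```python
from collections import Counter


def empirical_outcome_distribution(classifier, image, perturbations):
    """Relative frequency of each class over perturbed versions of one image."""
    outcomes = Counter(classifier(perturb(image)) for perturb in perturbations)
    total = sum(outcomes.values())
    return {label: count / total for label, count in outcomes.items()}


def shift_brightness(delta):
    """Return a perturbation that adds `delta` to every pixel, clipped to [0, 1]."""
    def perturb(image):
        return [min(1.0, max(0.0, pixel + delta)) for pixel in image]
    return perturb


def toy_classifier(image):
    """Deterministic stand-in classifier: thresholds the mean intensity."""
    return "bright" if sum(image) / len(image) > 0.5 else "dark"


if __name__ == "__main__":
    image = [0.5, 0.6]
    perturbations = [shift_brightness(d) for d in (-0.2, -0.1, 0.0, 0.1)]
    print(empirical_outcome_distribution(toy_classifier, image, perturbations))
```

The classifier itself stays deterministic; the spread comes entirely from the set of perturbed inputs, which matches the abstract's point that one must specify which variability the uncertainty refers to.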

翻訳日:2023-03-29 04:09:45 公開日:2023-03-24

# 慢性ホームレスの予測:クライアント履歴を用いたアルゴリズム比較の重要性

Predicting Chronic Homelessness: The Importance of Comparing Algorithms using Client Histories ( http://arxiv.org/abs/2105.15080v2 )

ライセンス: Link先を確認

Geoffrey G. Messier, Caleb John, Ayush Malik

(参考訳) 本研究は, 住宅計画の好適候補を特定するために, 慢性ホームレスの予測アルゴリズムを最適に比較する方法を検討する。予測手法は、潜在的に慢性的なシェルター利用者を住居に迅速に参照することができるが、時には慢性的な(偽陽性の)個人を誤って識別することもある。私たちはシェルターアクセス履歴を使って、これらの偽陽性が住宅にとって良い候補であることを示す。本研究では,より複雑なロジスティック回帰アルゴリズムとニューラルネットワークアルゴリズムを用いて,慢性ホームレスの予測のための単純なしきい値法を比較した。従来の二分分類性能指標では、機械学習アルゴリズムはしきい値法よりも優れた性能を示すが、3つのアルゴリズムで同定されたコホートのシェルターアクセス履歴を調べると、非常に類似した特徴を持つグループを選択することが示される。しきい値技術は、機械学習アルゴリズムよりもはるかに単純な情報技術基盤を使って実装できるため、リソース制約のある非営利組織にとって重要な意味を持つ。

This paper investigates how to best compare algorithms for predicting chronic homelessness for the purpose of identifying good candidates for housing programs. Predictive methods can rapidly refer potentially chronic shelter users to housing but also sometimes incorrectly identify individuals who will not become chronic (false positives). We use shelter access histories to demonstrate that these false positives are often still good candidates for housing. Using this approach, we compare a simple threshold method for predicting chronic homelessness to the more complex logistic regression and neural network algorithms. While traditional binary classification performance metrics show that the machine learning algorithms perform better than the threshold technique, an examination of the shelter access histories of the cohorts identified by the three algorithms shows that they select groups with very similar characteristics. This has important implications for resource-constrained not-for-profit organizations since the threshold technique can be implemented using much simpler information technology infrastructure than the machine learning algorithms.

翻訳日:2023-03-29 04:08:25 公開日:2023-03-24

# SQUID: 教師なし異常検出のためのディープ・フィーチャー・イン・パインティング

SQUID: Deep Feature In-Painting for Unsupervised Anomaly Detection ( http://arxiv.org/abs/2111.13495v3 )

ライセンス: Link先を確認

Tiange Xiang, Yixiao Zhang, Yongyi Lu, Alan L. Yuille, Chaoyi Zhang, Weidong Cai, Zongwei Zhou

(参考訳) 放射線画像撮影プロトコルは特定の身体領域に焦点をあてるため、非常に類似した画像が生成され、患者全体の解剖学的構造が繰り返される。本研究では,この構造化情報を活用するために,空間認識型メモリキューを用いて,X線画像からの異常検出を行う(略してSQUID)。 SQUIDは, 微細な解剖学的構造を逐次パターンに分類でき, 推測では画像中の異常(見えない/修正されたパターン)を識別できる。 SQUIDは、AUC(Area Under the Curve)によって測定された2つの胸部X線ベンチマークデータセットにおいて、教師なし異常検出の13の最先端手法を少なくとも5ポイント超えた。さらに,胸部解剖学における空間相関と一貫した形状を合成する新しいデータセット (DigitAnatomy) も作成した。我々は,DigitAnatomyが異常検出手法の開発,評価,解釈を促進できることを期待している。

Radiography imaging protocols focus on particular body regions, therefore producing images of great similarity and yielding recurrent anatomical structures across patients. To exploit this structured information, we propose the use of Space-aware Memory Queues for In-painting and Detecting anomalies from radiography images (abbreviated as SQUID). We show that SQUID can taxonomize the ingrained anatomical structures into recurrent patterns; and in the inference, it can identify anomalies (unseen/modified patterns) in the image. SQUID surpasses 13 state-of-the-art methods in unsupervised anomaly detection by at least 5 points on two chest X-ray benchmark datasets measured by the Area Under the Curve (AUC). Additionally, we have created a new dataset (DigitAnatomy), which synthesizes the spatial correlation and consistent shape in chest anatomy. We hope DigitAnatomy can prompt the development, evaluation, and interpretability of anomaly detection methods.

翻訳日:2023-03-29 03:59:59 公開日:2023-03-24

# 教師付き学習のための情報理論枠組み

An Information-Theoretic Framework for Supervised Learning ( http://arxiv.org/abs/2203.00246v6 )

ライセンス: Link先を確認

Hong Jun Jeon and Yifan Zhu and Benjamin Van Roy

(参考訳) ディープラーニングは毎年、より深く広いニューラルネットワークを使って、新しい、そして改善された経験的な結果を示す。一方、既存の理論的枠組みでは、パラメータをカウントしたり、指数関数的なサンプル複雑性境界に遭遇することなく、2層以上のネットワークを解析することは困難である。おそらく、異なるレンズの下で現代の機械学習を分析するのは実りあるかもしれない。本稿では,機械学習のデータ要求を分析するために,後悔とサンプルの複雑さという独自の概念を持つ新しい情報理論フレームワークを提案する。このフレームワークでは,まずスカラー推定や線形回帰といった古典的な例を通して直観を構築し,一般的な手法を導入する。次に,このフレームワークを用いて,reluアクティベーションユニットを用いたディープニューラルネットワークが生成するデータから学習のサンプル複雑性を調べる。重みに関する特定の事前分布に対して、幅が独立で深さが線形なサンプル複雑性境界を確立する。この事前分布は、高い確率で合理的に正確な低次元近似を許容する高次元の潜在表現をもたらす。我々は、ランダム単一隠れ層ニューラルネットワークの実験解析により、理論結果を裏付ける。

Each year, deep learning demonstrates new and improved empirical results with deeper and wider neural networks. Meanwhile, with existing theoretical frameworks, it is difficult to analyze networks deeper than two layers without resorting to counting parameters or encountering sample complexity bounds that are exponential in depth. Perhaps it may be fruitful to try to analyze modern machine learning under a different lens. In this paper, we propose a novel information-theoretic framework with its own notions of regret and sample complexity for analyzing the data requirements of machine learning. With our framework, we first work through some classical examples such as scalar estimation and linear regression to build intuition and introduce general techniques. Then, we use the framework to study the sample complexity of learning from data generated by deep neural networks with ReLU activation units. For a particular prior distribution on weights, we establish sample complexity bounds that are simultaneously width independent and linear in depth. This prior distribution gives rise to high-dimensional latent representations that, with high probability, admit reasonably accurate low-dimensional approximations. We conclude by corroborating our theoretical results with experimental analysis of random single-hidden-layer neural networks.

翻訳日:2023-03-29 03:52:12 公開日:2023-03-24

# pgmax:離散確率グラフィカルモデルのための因子グラフとjaxにおけるループ的信念伝播

PGMax: Factor Graphs for Discrete Probabilistic Graphical Models and Loopy Belief Propagation in JAX ( http://arxiv.org/abs/2202.04110v4 )

ライセンス: Link先を確認

Guangyao Zhou, Antoine Dedieu, Nishanth Kumar, Wolfgang Lehrach, Miguel Lázaro-Gredilla, Shrinu Kushagra, Dileep George

(参考訳) PGMaxはオープンソースのPythonパッケージであり、(a) 離散確率グラフィカルモデル(PGM)を因子グラフとして容易に記述し、(b) JAXで効率的かつスケーラブルなループあり信念伝播(LBP)を自動的に実行する。PGMaxは扱いやすい因子を持つ一般的な因子グラフをサポートし、GPUのような現代的なアクセラレータを推論に活用している。PGMaxは既存の代替手法と比較して、推論時間を最大3桁高速化しつつ、より高品質な推論結果が得られる。PGMaxは急速に成長しているJAXエコシステムとシームレスに相互作用し、新しい研究可能性を開く。ソースコード、例、ドキュメントは https://github.com/deepmind/PGMax で公開されています。

PGMax is an open-source Python package for (a) easily specifying discrete Probabilistic Graphical Models (PGMs) as factor graphs; and (b) automatically running efficient and scalable loopy belief propagation (LBP) in JAX. PGMax supports general factor graphs with tractable factors, and leverages modern accelerators like GPUs for inference. Compared with existing alternatives, PGMax obtains higher-quality inference results with up to three orders-of-magnitude inference time speedups. PGMax additionally interacts seamlessly with the rapidly growing JAX ecosystem, opening up new research possibilities. Our source code, examples and documentation are available at https://github.com/deepmind/PGMax.

翻訳日:2023-03-29 03:50:36 公開日:2023-03-24

# PARC: エネルギー材料のメソスケール反応力学を同化するための物理対応リカレント畳み込みニューラルネットワーク

PARC: Physics-Aware Recurrent Convolutional Neural Networks to Assimilate Meso-scale Reactive Mechanics of Energetic Materials ( http://arxiv.org/abs/2204.07234v3 )

ライセンス: Link先を確認

Phong C.H. Nguyen, Yen-Thi Nguyen, Joseph B. Choi, Pradeep K. Seshadri, H.S. Udaykumar, and Stephen Baek

(参考訳) 衝撃開始エネルギー材料(EM)の熱力学的応答は、そのミクロ構造の影響を強く受け、「材料・バイ・デザイン」フレームワークでEMマイクロ構造を設計する機会を与える。しかし、複雑なem構造-プロパティー-パフォーマンスリンクを構築するには、多くのシミュレーションが必要となるため、現在の設計プラクティスは限られている。本稿では,高分解能直接数値シミュレーション(dns)からemのメソスケール熱力学を学習できるディープラーニングアルゴリズムである,物理量認識リカレント畳み込み(parc)ニューラルネットワークを提案する。検証結果によると、PARCは衝撃を受けたEMの機械的応答をDNSに匹敵する精度で予測できるが、計算時間は明らかに少ない。 PARCの物理学的認識はモデリング能力と一般化性を高める。また, PARCにおける人工ニューロンの可視化は, EM熱力学の重要な面に光を当てることや, EMを概念化するための追加レンズを提供することも実証した。

The thermo-mechanical response of shock-initiated energetic materials (EM) is highly influenced by their microstructures, presenting an opportunity to engineer EM microstructure in a "materials-by-design" framework. However, the current design practice is limited, as a large ensemble of simulations is required to construct the complex EM structure-property-performance linkages. We present the Physics-Aware Recurrent Convolutional (PARC) Neural Network, a deep-learning algorithm capable of learning the mesoscale thermo-mechanics of EM from a modest number of high-resolution direct numerical simulations (DNS). Validation results demonstrated that PARC could predict the thermo-mechanical response of shocked EM with accuracy comparable to DNS but with notably less computation time. The physics awareness of PARC enhances its modeling capabilities and generalizability, especially when challenged in unseen prediction scenarios. We also demonstrate that visualizing the artificial neurons in PARC can shed light on important aspects of EM thermo-mechanics and provide an additional lens for conceptualizing EM.

翻訳日:2023-03-29 03:42:32 公開日:2023-03-24

# SunStage: ライトステージとしての太陽を用いたポートレート再構築とリライティング

SunStage: Portrait Reconstruction and Relighting using the Sun as a Light Stage ( http://arxiv.org/abs/2204.03648v2 )

ライセンス: Link先を確認

Yifan Wang, Aleksander Holynski, Xiuming Zhang and Xuaner Zhang

(参考訳) ライトステージは一連のキャリブレーションされたカメラとライトを使用して、さまざまな照明と視点の下で被写体の顔の外観をキャプチャする。この捉えた情報は、顔の復元とリライトに欠かせない。残念なことに、ライトステージは高価であり、建設と運用に重要な技術的専門知識を必要とする。本稿では、スマートフォンカメラと太陽のみを使用して、同等のデータをキャプチャする、ライトステージの軽量な代替手段であるSunStageを紹介する。提案手法では, 自撮り動画を屋外で撮影し, 位置を回転させ, 顔形状, 反射率, カメラポーズ, 照明パラメータの同時再構成の指針として, 太陽と顔の角度の異なる角度を用いる。本手法は,未校正環境にもかかわらず,顔の外観や形状を再現し,リライティング,新しいビュー合成,リフレクタンス編集などの魅力的な効果を期待できる。結果とインタラクティブなデモはhttps://sunstage.cs.washington.edu/で見ることができる。

A light stage uses a series of calibrated cameras and lights to capture a subject's facial appearance under varying illumination and viewpoint. This captured information is crucial for facial reconstruction and relighting. Unfortunately, light stages are often inaccessible: they are expensive and require significant technical expertise for construction and operation. In this paper, we present SunStage: a lightweight alternative to a light stage that captures comparable data using only a smartphone camera and the sun. Our method only requires the user to capture a selfie video outdoors, rotating in place, and uses the varying angles between the sun and the face as guidance in joint reconstruction of facial geometry, reflectance, camera pose, and lighting parameters. Despite the in-the-wild un-calibrated setting, our approach is able to reconstruct detailed facial appearance and geometry, enabling compelling effects such as relighting, novel view synthesis, and reflectance editing. Results and interactive demos are available at https://sunstage.cs.washington.edu/.

翻訳日:2023-03-29 03:42:12 公開日:2023-03-24

# 慢性緊急時在宅シェルタークライアントの早期同定のためのルール検索フレームワーク

A Rule Search Framework for the Early Identification of Chronic Emergency Homeless Shelter Clients ( http://arxiv.org/abs/2205.09883v2 )

ライセンス: Link先を確認

Caleb John and Geoffrey G. Messier

(参考訳) 本稿では,長期ないし慢性的なシェルターユーザになるリスクのある緊急避難所クライアントの早期識別にルールサーチ手法を用いる。 4万人以上の個人との12年間のサービスインタラクションを含む、北米の主要シェルターのデータセットを使用して、unordered search(opus)アルゴリズムを最適化したpruningは、直感的かつ効果的なルールを開発するために使用される。ルールは、リスクの高いクライアントを支援的な住宅に移行するための住宅プログラムのリアルタイム配信と互換性のあるフレームワーク内で評価される。その結果, 本研究の手法を適用した場合, 慢性シェルター使用リスクのクライアント識別の中央値が297日から162日に低下することが認められた。

This paper uses rule search techniques for the early identification of emergency homeless shelter clients who are at risk of becoming long term or chronic shelter users. Using a data set from a major North American shelter containing 12 years of service interactions with over 40,000 individuals, the optimized pruning for unordered search (OPUS) algorithm is used to develop rules that are both intuitive and effective. The rules are evaluated within a framework compatible with the real-time delivery of a housing program meant to transition high risk clients to supportive housing. Results demonstrate that the median time to identification of clients at risk of chronic shelter use drops from 297 days to 162 days when the methods in this paper are applied.

翻訳日:2023-03-29 03:33:47 公開日:2023-03-24

# ランダム特徴回帰モデルのための最適活性化関数

Optimal Activation Functions for the Random Features Regression Model ( http://arxiv.org/abs/2206.01332v3 )

ライセンス: Link先を確認

Jianxin Wang and José Bento

(参考訳) 近年,ランダム特徴回帰モデル(rfr)の漸近的平均二乗検定誤差と感度が研究されている。我々はこの研究に基づいて、異なる関数parsimonyの概念の下でrfrのテストエラーと感度の組み合わせを最小化するアクティベーション関数ファミリー(afs)をクローズドフォームで特定する。最適afsが線形、飽和線型関数、あるいはエルミート多項式を用いて表現可能なシナリオを見いだす。最後に, 最適afsの利用が, 二重降下曲線などのrfrモデルの確立した特性や, 観測騒音レベルに対する最適正規化パラメータの依存性にどのように影響するかを示す。

The asymptotic mean squared test error and sensitivity of the Random Features Regression model (RFR) have been recently studied. We build on this work and identify in closed-form the family of Activation Functions (AFs) that minimize a combination of the test error and sensitivity of the RFR under different notions of functional parsimony. We find scenarios under which the optimal AFs are linear, saturated linear functions, or expressible in terms of Hermite polynomials. Finally, we show how using optimal AFs impacts well-established properties of the RFR model, such as its double descent curve, and the dependency of its optimal regularization parameter on the observation noise level.

翻訳日:2023-03-29 03:22:56 公開日:2023-03-24

# ジェネリックイベント境界キャプション用デュアルストリームトランス

Dual-Stream Transformer for Generic Event Boundary Captioning ( http://arxiv.org/abs/2207.03038v3 )

ライセンス: Link先を確認

Xin Gu, Hanhua Ye, Guang Chen, Yufei Wang, Libo Zhang, Longyin Wen

(参考訳) 本稿では,CVPR2022ジェネリックイベント境界キャプタリング(GEBC)コンペティションのチャンピオンソリューションについて述べる。 GEBCは、キャプションモデルに対して、所定のビデオ境界付近の即時的なステータス変更の理解を必要とするため、従来のビデオキャプションタスクよりもはるかに難しい。本稿では,映像コンテンツエンコーディングとキャプション生成の両面で改善したデュアルストリームトランスを提案する。さらに,境界の型をヒントとして活用し,モデルによるキャプション生成を支援する。 2) 境界キャプションの識別表現を学習するために,特にDual-Stream Transformerと呼ばれるモデルの設計を行う。 3) 内容関連文や人間ライクなキャプションの作成に向けて, 単語レベルのアンサンブル戦略をデザインし, 記述品質の向上を図る。 GEBCテストスプリットの有望な結果は,提案モデルの有効性を示すものである。

This paper describes our champion solution for the CVPR2022 Generic Event Boundary Captioning (GEBC) competition. GEBC requires the captioning model to comprehend instantaneous status changes around the given video boundary, which makes it much more challenging than the conventional video captioning task. In this paper, a Dual-Stream Transformer with improvements on both video content encoding and caption generation is proposed: (1) We utilize three pre-trained models to extract video features at different granularities. Moreover, we exploit the types of boundary as hints to help the model generate captions. (2) We design a model, termed Dual-Stream Transformer, specifically to learn discriminative representations for boundary captioning. (3) To generate content-relevant and human-like captions, we improve the description quality by designing a word-level ensemble strategy. The promising results on the GEBC test split demonstrate the efficacy of our proposed model.

翻訳日:2023-03-29 03:15:06 公開日:2023-03-24

# どこから始める? フェデレーション学習における事前学習と初期化の影響について

Where to Begin? On the Impact of Pre-Training and Initialization in Federated Learning ( http://arxiv.org/abs/2206.15387v3 )

ライセンス: Link先を確認

John Nguyen, Jianyu Wang, Kshitiz Malik, Maziar Sanjabi, Michael Rabbat

(参考訳) 連合学習でよく指摘される課題は不均一性の存在である。データ不均一性(data heterogeneity)は、異なるクライアントのデータが全く異なる分布に従う可能性があるという事実を指す。システム不均一性(system heterogeneity)は、異なるシステム能力を持つクライアントデバイスを指す。かなりの数のフェデレーション最適化手法がこの課題に対処する。文献では、経験的評価は通常ランダム初期化からフェデレーショントレーニングを開始する。しかし、フェデレーション学習の多くの実用的な応用において、サーバーは、フェデレーショントレーニングを開始する前にモデルの事前トレーニングに使用できるトレーニングタスクのプロキシデータにアクセスすることができる。 4つの標準フェデレーション学習ベンチマークデータセットを用いて、フェデレーション学習における事前学習モデルからの開始の影響を実証的に検討する。当然ながら、事前訓練されたモデルから始めると、目標エラー率に達するのに必要なトレーニング時間を短縮し、ランダム初期化から始める場合よりも正確なモデルのトレーニング(最大40%)を可能にする。驚くべきことに、事前訓練された初期化からフェデレーション学習を始めることで、データとシステムの両方の不均一性の影響が低減される。我々は,フェデレーション最適化手法を提案・評価する今後の研究に対し,ランダム初期化と事前学習初期化の両方から始めた場合の性能を評価することを推奨する。本研究は,フェデレーション最適化における不均一性の役割を理解するためのさらなる研究に向けて,いくつかの疑問を提起する。コードは https://github.com/facebookresearch/where_to_begin で公開されている。

An oft-cited challenge of federated learning is the presence of heterogeneity. Data heterogeneity refers to the fact that data from different clients may follow very different distributions. System heterogeneity refers to client devices having different system capabilities. A considerable number of federated optimization methods address this challenge. In the literature, empirical evaluations usually start federated training from random initialization. However, in many practical applications of federated learning, the server has access to proxy data for the training task that can be used to pre-train a model before starting federated training. Using four standard federated learning benchmark datasets, we empirically study the impact of starting from a pre-trained model in federated learning. Unsurprisingly, starting from a pre-trained model reduces the training time required to reach a target error rate and enables the training of more accurate models (up to 40%) than is possible when starting from random initialization. Surprisingly, we also find that starting federated learning from a pre-trained initialization reduces the effect of both data and system heterogeneity. We recommend future work proposing and evaluating federated optimization methods to evaluate the performance when starting from random and pre-trained initializations. This study raises several questions for further work on understanding the role of heterogeneity in federated optimization. Our code is available at: https://github.com/facebookresearch/where_to_begin

翻訳日:2023-03-29 03:14:03 公開日:2023-03-24

# 強電界電離におけるサブバリアリコライジョンとトンネル時間遅延の3つのクラス

Sub-barrier recollisions and the three classes of tunneling time delays in strong-field ionization ( http://arxiv.org/abs/2208.10946v2 )

ライセンス: Link先を確認

Michael Klaiber, Daniel Bakucz Canário, and Karen Z. Hatsagortsyan

(参考訳) トンネルイオン化は、サブバリア再衝突経路と直接イオン化経路の干渉によって引き起こされる光電子運動量分布の特定のシフトとして漸近的に観察される負の時間遅延によって特徴づけられる。対照的に、波動関数のピークを追跡する思考実験(Gedankenexperiment)は、直接イオン化経路のみを考慮した場合、トンネル出口で正のトンネル時間遅延を示す。本稿では,トンネル出口の時間遅延パターンに対するサブバリア再衝突の影響について検討する。その結果、直接軌道と再衝突軌道の干渉により、出口でのトンネル時間遅延は漸近時間遅延に等しい量だけ減少するが、依然としてかなりの大きさの正の値を保つことがわかった。最後に, 改良型2色アト秒時計(attoclock)装置でトンネル時間を扱った最近の実験 [Light: Science & Applications 11, 1 (2022)] について検討する。理論モデルを用いた実験結果の解析は,トンネルイオン化の新しい時間特性、すなわちトンネル波束の開始を記述する時間遅延を導入する物理的必要性を示している。

Tunneling ionization is characterized by a negative time delay, observed asymptotically as a specific shift of the photoelectron momentum distribution, which is caused by the interference of the sub-barrier recolliding and direct ionization paths. In contrast, a Gedankenexperiment following the peak of the wavefunction shows a positive tunneling time delay at the tunnel exit, considering only the direct ionization path. In this paper, we investigate the effects of sub-barrier recollisions on the time delay pattern at the tunnel exit. We conclude that the interference of the direct and recolliding trajectories decreases the tunneling time delay at the exit by an amount equal to the asymptotic time delay, while the delay at the exit nevertheless retains a sizeable positive value. Finally, we discuss the recent experiment [Light: Science & Applications 11, 1 (2022)] addressing the tunneling time in a modified two-color attoclock setup. The analysis of the experimental findings with our theoretical model indicates the physical necessity to introduce a new time characteristic for tunneling ionization: the time delay describing the initiation of the tunneling wave packet.

翻訳日:2023-03-29 02:54:26 公開日:2023-03-24

# 自然言語処理の効率的な手法に関する研究

Efficient Methods for Natural Language Processing: A Survey ( http://arxiv.org/abs/2209.00099v2 )

ライセンス: Link先を確認

Marcos Treviso, Ji-Ung Lee, Tianchu Ji, Betty van Aken, Qingqing Cao, Manuel R. Ciosici, Michael Hassid, Kenneth Heafield, Sara Hooker, Colin Raffel, Pedro H. Martins, André F. T. Martins, Jessica Zosa Forde, Peter Milder, Edwin Simpson, Noam Slonim, Jesse Dodge, Emma Strubell, Niranjan Balasubramanian, Leon Derczynski, Iryna Gurevych, Roy Schwartz

(参考訳) 自然言語処理(NLP)における最近の研究は、モデルパラメータのスケーリングとトレーニングデータから魅力的な結果を得ているが、性能向上のためにスケールのみを使用することで、資源消費も増大している。そのようなリソースには、データ、時間、ストレージ、エネルギーが含まれており、それらは自然に制限され、均等に分散している。これにより、同様の結果を得るのに少ないリソースを必要とする効率的な方法の研究が動機となる。本研究は, 効率的なNLPにおける現在の手法と知見を合成し, 関連づけるものである。我々は,限られた資源下でNLPを実施するためのガイダンスと,より効率的な手法を開発するための有望な研究方向性の両立を目指す。

Recent work in natural language processing (NLP) has yielded appealing results from scaling model parameters and training data; however, using only scale to improve performance means that resource consumption also grows. Such resources include data, time, storage, or energy, all of which are naturally limited and unevenly distributed. This motivates research into efficient methods that require fewer resources to achieve similar results. This survey synthesizes and relates current methods and findings in efficient NLP. We aim to provide both guidance for conducting NLP under limited resources, and point towards promising research directions for developing more efficient methods.

翻訳日:2023-03-29 02:44:33 公開日:2023-03-24

# Augraphy: ドキュメントイメージのためのデータ拡張ライブラリ

Augraphy: A Data Augmentation Library for Document Images ( http://arxiv.org/abs/2208.14558v2 )

ライセンス: Link先を確認

Alexander Groleau, Kok Wei Chee, Stefan Larson, Samay Maini, Jonathan Boarman

(参考訳) 本稿では,実際の文書画像データセットによく見られる歪みを生成するデータ拡張パイプラインを構築するためのPythonライブラリであるAugraphyを紹介する。 Augraphyは、印刷、スキャン、古いマシンや汚れたマシンによるファックス送信、時間の経過に伴うインクの劣化、手書きのマーキングなど、標準的なオフィス操作によって変更されたように見えるクリーンな文書画像の拡張版を作成するための多くの戦略を提供する点で、他のデータ拡張ツールとは異なっている。本稿では,Augraphyツールについて論じ,文書のノイズ除去などのタスクのための多様なトレーニングデータを生成するデータ拡張ツールとして,また文書画像モデリングタスクにおけるモデルのロバスト性を評価するための挑戦的なテストデータを生成するツールとしての利用方法を示す。

This paper introduces Augraphy, a Python library for constructing data augmentation pipelines which produce distortions commonly seen in real-world document image datasets. Augraphy stands apart from other data augmentation tools by providing many different strategies to produce augmented versions of clean document images that appear as if they have been altered by standard office operations, such as printing, scanning, and faxing through old or dirty machines, degradation of ink over time, and handwritten markings. This paper discusses the Augraphy tool, and shows how it can be used both as a data augmentation tool for producing diverse training data for tasks such as document denoising, and also for generating challenging test data to evaluate model robustness on document image modeling tasks.

翻訳日:2023-03-29 02:44:07 公開日:2023-03-24

# オンラインmin-sumセットカバーの改良アルゴリズム

An Improved Algorithm For Online Min-Sum Set Cover ( http://arxiv.org/abs/2209.04870v3 )

ライセンス: Link先を確認

Marcin Bienkowski, Marcin Mucha

(参考訳) 我々は、アルゴリズムが規則付き$n$要素のリストを維持するオンライン嗜好集約の基本的なモデルについて研究する。入力は望ましい集合 $r_1, r_2, \dots, r_t, \dots$ のストリームである。 R_t$を見た後、将来の集合の知識がなければ、アルゴリズムは要素を再ランクし(リストの順序を変更する)、リストフロントの少なくとも1つの要素を見つける必要がある。発生したコストは、リスト更新コスト(隣接するリスト要素のスワップ数)とアクセスコスト(リスト上の$R_t$の最初の要素の配置)の合計である。このシナリオは、オンラインショップにおける商品の注文のような、ショップ顧客の選好を集約したアプリケーションで自然に発生する。この問題の理論的基盤はMin-Sum Set Coverとして知られている。オンラインアルゴリズムALGの静的最適解(単一最適リスト順序付け)に対する性能を主に研究した以前の研究 (Fotakis et al., ICALP 2020, NIPS 2020) とは異なり、本論文では、ベンチマークが証明可能なより強力な最適動的解 OPT (リスト順序付けも変更できる) である、明らかに難しい変種について検討する。オンラインショップの観点では、ユーザーベース全体の嗜好が時間とともに進化することを意味している。我々は、競争比が$O(r^2)$である計算効率の良いランダム化アルゴリズムを構築し、決定論的な$O(r^4)$-競争性アルゴリズムの存在を証明する。ここで、$r$は集合の最大濃度$R_t$である。この問題に対する最善のアルゴリズムは$o(r^{3/2} \cdot \sqrt{n})$-競合であり、$\omega(r)$は任意の決定論的オンラインアルゴリズムのパフォーマンスに対する下限である。

We study a fundamental model of online preference aggregation, where an algorithm maintains an ordered list of $n$ elements. An input is a stream of preferred sets $R_1, R_2, \dots, R_t, \dots$. Upon seeing $R_t$ and without knowledge of any future sets, an algorithm has to rerank elements (change the list ordering), so that at least one element of $R_t$ is found near the list front. The incurred cost is a sum of the list update costs (the number of swaps of neighboring list elements) and access costs (position of the first element of $R_t$ on the list). This scenario occurs naturally in applications such as ordering items in an online shop using aggregated preferences of shop customers. The theoretical underpinning of this problem is known as Min-Sum Set Cover. Unlike previous work (Fotakis et al., ICALP 2020, NIPS 2020) that mostly studied the performance of an online algorithm ALG against the static optimal solution (a single optimal list ordering), in this paper, we study an arguably harder variant where the benchmark is the provably stronger optimal dynamic solution OPT (that may also modify the list ordering). In terms of an online shop, this means that the aggregated preferences of its user base evolve with time. We construct a computationally efficient randomized algorithm whose competitive ratio (ALG-to-OPT cost ratio) is $O(r^2)$ and prove the existence of a deterministic $O(r^4)$-competitive algorithm. Here, $r$ is the maximum cardinality of sets $R_t$. This is the first algorithm whose ratio does not depend on $n$: the previously best algorithm for this problem was $O(r^{3/2} \cdot \sqrt{n})$-competitive and $\Omega(r)$ is a lower bound on the performance of any deterministic online algorithm.
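
The cost model in this abstract (adjacent swaps for list updates, plus the position of the first requested element for access) can be made concrete with a small sketch. Everything below is illustrative: positions are assumed to be 1-indexed, the access cost is charged on the list as it stands when the request arrives, and the move-to-front rule is only a simple stand-in for a reranking policy, not the algorithm analyzed in the paper.

```python
def access_cost(order, request):
    """1-indexed position of the first list element that belongs to the request set."""
    for position, element in enumerate(order, start=1):
        if element in request:
            return position
    raise ValueError("request contains no element of the list")


def move_to_front(order, index):
    """Move order[index] to the front; return the new list and the adjacent swaps used."""
    new_order = [order[index]] + order[:index] + order[index + 1:]
    return new_order, index  # an element at 0-based index i needs i adjacent swaps


def total_cost(order, requests):
    """Serve requests one by one: pay the access cost, then rerank by move-to-front."""
    order = list(order)
    cost = 0
    for request in requests:
        position = access_cost(order, request)
        cost += position                      # access cost
        order, swaps = move_to_front(order, position - 1)
        cost += swaps                         # list update cost
    return cost
```

For instance, `total_cost(["a", "b", "c", "d"], [{"c"}, {"c", "d"}, {"b"}])` pays 3 + 2 for the first request, 1 + 0 for the second, and 3 + 2 for the third.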

翻訳日:2023-03-29 02:36:05 公開日:2023-03-24

# $\mathbb{R}^{d}$上のAdaGradの収束について:凸性を超えて、非漸近的レートと加速

On the Convergence of AdaGrad on $\mathbb{R}^{d}$: Beyond Convexity, Non-Asymptotic Rate and Acceleration ( http://arxiv.org/abs/2209.14827v2 )

ライセンス: Link先を確認

Zijian Liu, Ta Duy Nguyen, Alina Ene, Huy L. Nguyen

(参考訳) 滑らかな凸最適化のためのAdaGradや他の適応手法の既存の分析は、典型的には有界領域径を持つ関数に対して行われる。制約のない問題では、以前の研究は関数クラス全体に真となる明示的な定数因子を伴わない漸近収束率を保証する。さらに、確率的設定では、AdaGradの修正版のみが、一般的に使われているものと異なり、最新の勾配はステップサイズを更新するのに使われていない。本稿では,これらのギャップを埋め,AdaGradとその変種を滑らかな凸関数の標準設定およびより一般的なクエーサー凸関数の設定でより深く理解することを目的とする。まず,バニラAdaGradの収束率を決定論的,確率的両面の制約のない問題に明示的に拘束する手法を示す。第二に、平均的な反復ではなく、最後の反復の収束を示すことのできる AdaGrad の変種を提案する。最後に,問題パラメータに明示的に依存した決定論的設定において,新しい高速化適応アルゴリズムと収束保証を与え,先行研究で示された漸近速度を改善した。

Existing analysis of AdaGrad and other adaptive methods for smooth convex optimization is typically for functions with bounded domain diameter. In unconstrained problems, previous works guarantee an asymptotic convergence rate without an explicit constant factor that holds true for the entire function class. Furthermore, in the stochastic setting, only a modified version of AdaGrad, different from the one commonly used in practice, in which the latest gradient is not used to update the stepsize, has been analyzed. Our paper aims at bridging these gaps and developing a deeper understanding of AdaGrad and its variants in the standard setting of smooth convex functions as well as the more general setting of quasar convex functions. First, we demonstrate new techniques to explicitly bound the convergence rate of the vanilla AdaGrad for unconstrained problems in both deterministic and stochastic settings. Second, we propose a variant of AdaGrad for which we can show the convergence of the last iterate, instead of the average iterate. Finally, we give new accelerated adaptive algorithms and their convergence guarantee in the deterministic setting with explicit dependency on the problem parameters, improving upon the asymptotic rate shown in previous works.
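
The distinction drawn in this abstract between vanilla AdaGrad and the modified version whose stepsize ignores the latest gradient can be illustrated with a minimal sketch. The per-coordinate form, the parameter names, and the initial denominator `b0` are assumptions made for illustration; the paper may state the updates differently.

```python
def adagrad(grad, x0, eta=1.0, steps=100, b0=1.0, use_latest_gradient=True):
    """Per-coordinate AdaGrad on an unconstrained problem (illustrative sketch).

    grad: function mapping a list of floats to its gradient (a list of floats).
    b0:   initial value of the stepsize denominator (hypothetical default).
    use_latest_gradient:
        True  -> vanilla AdaGrad: the current gradient enters the stepsize.
        False -> modified variant: the stepsize uses only earlier gradients.
    """
    x = list(x0)
    accum = [b0 * b0] * len(x)  # b0^2 plus the running sum of squared gradients
    for _ in range(steps):
        g = grad(x)
        updated = [a + gi * gi for a, gi in zip(accum, g)]
        denom = updated if use_latest_gradient else accum
        x = [xi - eta * gi / (d ** 0.5) for xi, gi, d in zip(x, g, denom)]
        accum = updated
    return x
```

On the quadratic $f(x) = \tfrac{1}{2}\lVert x \rVert^2$, whose gradient is `x` itself, calling `adagrad(lambda x: list(x), [3.0, -4.0], steps=200)` drives both coordinates toward zero without any bound on the domain being supplied.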

翻訳日:2023-03-29 02:25:36 公開日:2023-03-24

# 異種対話生成のための等サイズハードEMアルゴリズム

An Equal-Size Hard EM Algorithm for Diverse Dialogue Generation ( http://arxiv.org/abs/2209.14627v2 )

ライセンス: Link先を確認

Yuqiao Wen, Yongchang Hao, Yanshuai Cao, Lili Mou

(参考訳) オープンドメイン対話システムは、自然言語テキストを通じて人間と対話することを目的としている。近年のChatGPTのような超大規模対話システムの成功にもかかわらず、中～小規模の対話システムの方が軽量でアクセスしやすいため、現在でも一般的な方法である。本研究では,多様な対話生成のためのマルチデコーダモデルをトレーニングするためのEqHard-EMアルゴリズムを提案する。このアルゴリズムはサンプルをハードな方法でデコーダに割り当て、さらに全てのデコーダが十分に訓練されていることを保証するために等割り当て制約を課す。我々はアプローチを正当化するために詳細な理論的分析を提供する。さらに,2つの大規模オープンドメイン対話データセットの実験により,我々のEqHard-EMアルゴリズムが高品質な多様な応答を生成することを確認した。

Open-domain dialogue systems aim to interact with humans through natural language texts in an open-ended fashion. Despite the recent success of super large dialogue systems such as ChatGPT, using medium-to-small-sized dialogue systems remains the common practice as they are more lightweight and accessible; however, generating diverse dialogue responses is challenging, especially with smaller models. In this work, we propose an Equal-size Hard Expectation-Maximization (EqHard-EM) algorithm to train a multi-decoder model for diverse dialogue generation. Our algorithm assigns a sample to a decoder in a hard manner and additionally imposes an equal-assignment constraint to ensure that all decoders are well-trained. We provide detailed theoretical analysis to justify our approach. Further, experiments on two large-scale open-domain dialogue datasets verify that our EqHard-EM algorithm generates high-quality diverse responses.
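
The difference between plain hard assignment and hard assignment under an equal-assignment constraint can be sketched as follows. This is only an illustration: the loss matrix is hypothetical, and the greedy rule used to respect the per-decoder capacity is a simple stand-in, not necessarily the assignment procedure used in the paper.

```python
def hard_assign(loss):
    """Plain hard assignment: each sample goes to its lowest-loss decoder."""
    return [min(range(len(row)), key=row.__getitem__) for row in loss]


def equal_size_hard_assign(loss):
    """Hard assignment in which every decoder receives the same number of samples.

    loss[i][j] is the loss of decoder j on sample i. Pairs are visited from
    lowest to highest loss (greedy heuristic); a pair is accepted when the
    sample is still unassigned and the decoder still has free capacity.
    """
    n, k = len(loss), len(loss[0])
    if n % k != 0:
        raise ValueError("number of samples must be divisible by number of decoders")
    capacity = [n // k] * k
    assignment = [None] * n
    pairs = sorted((loss[i][j], i, j) for i in range(n) for j in range(k))
    for _, i, j in pairs:
        if assignment[i] is None and capacity[j] > 0:
            assignment[i] = j
            capacity[j] -= 1
    return assignment
```

With the hypothetical losses `[[0.1, 0.9], [0.2, 0.8], [0.3, 0.4], [0.6, 0.5]]`, plain hard assignment sends three of the four samples to decoder 0, whereas the equal-size version gives each decoder two samples, so neither decoder is starved of training signal.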

翻訳日:2023-03-29 02:25:18 公開日:2023-03-24

# 硬質単分子非剛体3次元再構成技術の現状

State of the Art in Dense Monocular Non-Rigid 3D Reconstruction ( http://arxiv.org/abs/2210.15664v2 )

ライセンス: Link先を確認

Edith Tretschk, Navami Kairanda, Mallikarjun B R, Rishabh Dabral, Adam Kortylewski, Bernhard Egger, Marc Habermann, Pascal Fua, Christian Theobalt, Vladislav Golyanik

(参考訳) モノキュラーな2次元画像からの変形可能な(または非剛性)シーンの3次元再構成は、コンピュータビジョンとグラフィックスの長年にわたる活発な研究領域である。なぜなら、追加の仮定なしでは、入力された2D画像への正確な投影につながる無限に多くの解を許すからである。非剛性再構築は、ロボット工学、AR/VR、視覚コンテンツ作成といった下流アプリケーションのための基礎的なビルディングブロックである。単眼カメラを使用する主な利点は、全能性とエンドユーザへの可用性であり、ステレオやマルチビューシステムのようなより洗練されたカメラセットと比べて使いやすさである。本研究は, モノキュラ映像やモノキュラビューのセットから, 様々な変形可能な物体と複合シーンの密集した非剛性3次元再構成のための最先端手法に焦点をあてたものである。 2次元画像観察から3次元再構成と変形モデリングの基礎を考察する。次に、任意の場面を処理し、いくつかの前提を下す一般的な方法から始め、観察対象や変形の種類(例えば、人間の顔、体、手、動物)についてより強い仮定を行う技術へと進む。このSTARの重要な部分は、手法の分類と高レベルの比較、および、議論された手法のトレーニングと評価のためのデータセットの概要にも費やされている。本稿では,その分野におけるオープンな課題と,レビュー手法の活用に関連する社会的側面について論じる。

3D reconstruction of deformable (or non-rigid) scenes from a set of monocular 2D image observations is a long-standing and actively researched area of computer vision and graphics. It is an ill-posed inverse problem, since, without additional prior assumptions, it permits infinitely many solutions leading to accurate projection to the input 2D images. Non-rigid reconstruction is a foundational building block for downstream applications like robotics, AR/VR, or visual content creation. The key advantage of using monocular cameras is their omnipresence and availability to the end users as well as their ease of use compared to more sophisticated camera set-ups such as stereo or multi-view systems. This survey focuses on state-of-the-art methods for dense non-rigid 3D reconstruction of various deformable objects and composite scenes from monocular videos or sets of monocular views. It reviews the fundamentals of 3D reconstruction and deformation modeling from 2D image observations. We then start from general methods, which handle arbitrary scenes and make only a few prior assumptions, and proceed towards techniques making stronger assumptions about the observed objects and types of deformations (e.g., human faces, bodies, hands, and animals). A significant part of this STAR is also devoted to classification and a high-level comparison of the methods, as well as an overview of the datasets for training and evaluation of the discussed techniques. We conclude by discussing open challenges in the field and the social aspects associated with the usage of the reviewed methods.

翻訳日:2023-03-29 02:10:00 公開日:2023-03-24

# 緊急避難所アクセスパターンの簡易理解法

A Simpler Method for Understanding Emergency Shelter Access Patterns ( http://arxiv.org/abs/2210.13619v2 )

ライセンス: Link先を確認

Geoffrey G. Messier

(参考訳) Simplified Access Metric (SAM)は、シェルタークライアント脆弱性の尺度として、緊急シェルターアクセスパターンを特徴付ける新しいアプローチである。 SAMの目標は、スプレッドシート操作を使用して非技術スタッフが実装可能なアクセスパターンを直感的に理解するためのシェルターオペレータを提供することである。北米の大きなシェルターからのクライアントデータは、samが従来のトランジショナル、エピソディック、慢性的なクライアントクラスタ分析と同じような結果を生成することを示すために使用される。 SAMはクラスタ分析よりも少ないデータを必要とするため、外部要因によるシェルターアクセスパターンの影響のリアルタイムな画像を生成することもできる。 samを使った9年間のシェルタークライアントデータから生成されたタイムラインは、ハウジングファーストプログラミングとcovid-19ロックダウンがシェルターへのアクセス方法に与える影響を示しています。最後にSAMは、シェルタースタッフが移行、エピソード、慢性的なラベルを割り当てるだけでなく、SAMの"ソフト"出力を直接脆弱性の尺度として使うことができる。

The Simplified Access Metric (SAM) is a new approach for characterizing emergency shelter access patterns as a measure of shelter client vulnerability. The goal of SAM is to provide shelter operators with an intuitive way to understand access patterns that can be implemented by non-technical staff using spreadsheet operations. Client data from a large North American shelter will be used to demonstrate that SAM produces similar results to traditional transitional, episodic and chronic client cluster analysis. Since SAM requires less data than cluster analysis, it is also able to generate a real time picture of how shelter access patterns are affected by external factors. Timelines generated from nine years of shelter client data using SAM demonstrate the impact of Housing First programming and the COVID-19 lockdown on how people access shelter. Finally, SAM allows shelter staff to move beyond assigning transitional, episodic and chronic labels and instead use the "soft" output of SAM directly as a measure of vulnerability.

翻訳日:2023-03-29 02:08:30 公開日:2023-03-24

# 最低ランダウ準位における量子誤差補正

Quantum Error Correction in the Lowest Landau Level ( http://arxiv.org/abs/2210.16957v2 )

ライセンス: Link先を確認

Yale Fan, Willy Fischler, Eric Kubischta

(参考訳) 我々は、Albert, Covey, Preskill (ACP) によって提案された量子誤り訂正符号の有限次元バージョンを開発し、非アーベル対称性群を持つ構成空間上で連続変数量子計算を行う。我々の符号は、ゴッテマン、キタエフ、プレスキル(gkp)のクディット符号の平面的実現とは対照的に、球面幾何学上のランダウ準位の荷電粒子、またはより一般にスピンコヒーレント状態によって実現できる。我々の量子誤り訂正方式は本質的に近似しており、符号化状態はgkpやacpよりも準備が容易である。

We develop finite-dimensional versions of the quantum error-correcting codes proposed by Albert, Covey, and Preskill (ACP) for continuous-variable quantum computation on configuration spaces with nonabelian symmetry groups. Our codes can be realized by a charged particle in a Landau level on a spherical geometry, in contrast to the planar Landau level realization of the qudit codes of Gottesman, Kitaev, and Preskill (GKP), or more generally by spin coherent states. Our quantum error-correction scheme is inherently approximate, and the encoded states may be easier to prepare than those of GKP or ACP.

翻訳日:2023-03-29 01:57:59 公開日:2023-03-24

# パッキングとカバー制約を伴うコンテキストバンディット:回帰によるモジュールラグランジアンアプローチ

Contextual Bandits with Packing and Covering Constraints: A Modular Lagrangian Approach via Regression ( http://arxiv.org/abs/2211.07484v3 )

ライセンス: Link先を確認

Aleksandrs Slivkins and Karthik Abinav Sankararaman and Dylan J. Foster

(参考訳) 本稿では,線形制約付きコンテキスト帯域(CBwLC)について考察する。これは,アルゴリズムが全消費の線形制約を受ける複数のリソースを消費するコンテキスト帯域の変種である。この問題はknapsacks (CBwK) を用いてコンテキスト的帯域幅を一般化し、制約のパッケージ化とカバー、および正および負のリソース消費を可能にする。回帰オラクルに基づくCBwLC(CBwK)の最初のアルゴリズムを提案する。このアルゴリズムは単純で計算効率が良く、後悔は消える。 CBwKの変種には統計的に最適であり、ある制約が破られたらアルゴリズムは停止しなければならない。さらに,確率的環境を超えたCBwLC(CBwK)について,初めて消滅・回復保証を行う。私たちは、比較するより弱い(そしておそらく公平な)ベンチマークを特定することで、以前の作業から強い不可能性(impossibility)を回避します。我々のアルゴリズムは、CBwKのためのラグランジアンベースのテクニックであるLagrangeBwK(Immorlica et al., FOCS 2019)と、文脈的盗賊のための回帰ベースのテクニックであるSquareCB(Foster and Rakhlin, ICML 2020)に基づいて構築されている。我々の分析は、両方の技術の本質的なモジュラリティを活用する。

We consider contextual bandits with linear constraints (CBwLC), a variant of contextual bandits in which the algorithm consumes multiple resources subject to linear constraints on total consumption. This problem generalizes contextual bandits with knapsacks (CBwK), allowing for packing and covering constraints, as well as positive and negative resource consumption. We provide the first algorithm for CBwLC (or CBwK) that is based on regression oracles. The algorithm is simple, computationally efficient, and admits vanishing regret. It is statistically optimal for the variant of CBwK in which the algorithm must stop once some constraint is violated. Further, we provide the first vanishing-regret guarantees for CBwLC (or CBwK) that extend beyond the stochastic environment. We side-step strong impossibility results from prior work by identifying a weaker (and, arguably, fairer) benchmark to compare against. Our algorithm builds on LagrangeBwK (Immorlica et al., FOCS 2019), a Lagrangian-based technique for CBwK, and SquareCB (Foster and Rakhlin, ICML 2020), a regression-based technique for contextual bandits. Our analysis leverages the inherent modularity of both techniques.

翻訳日:2023-03-29 01:48:52 公開日:2023-03-24

# BiasBed - 厳密なテクスチャバイアス評価

BiasBed -- Rigorous Texture Bias Evaluation ( http://arxiv.org/abs/2211.13190v3 )

ライセンス: Link先を確認

Nikolai Kalischek, Rodrigo C. Daudt, Torben Peters, Reinhard Furrer, Jan D. Wegner, Konrad Schindler

(参考訳) 現代の畳み込みニューラルネットワークにおけるテクスチャバイアスの存在は、しばしば新しいドメインへの一般化を支援するために、シェイプキューに重点を置くアルゴリズムの多さにつながっている。しかし、一般的なデータセット、ベンチマーク、一般的なモデル選択戦略は欠落しており、合意された厳密な評価プロトコルは存在しない。本稿では,テクスチャバイアスを低減したトレーニングネットワークの困難さと限界について検討する。特に,手法間の適切な評価と有意義な比較は自明ではないことを示す。複数のデータセットや既存のアルゴリズムを含む、テクスチャとスタイルバイアスのトレーニングのためのテストベッドであるBiasBedを紹介します。スタイルバイアス法のかなりのトレーニング不安定さにもかかわらず、結果の重要度を測定するための厳密な仮説検証を含む広範な評価プロトコルが付属している。私たちの広範な実験は、慎重に統計的に確立されたスタイルバイアスの評価プロトコルの必要性に新たな光を当てました。例えば、文献で提案されているいくつかのアルゴリズムは、スタイルバイアスの影響を全く軽減しない。 BiasBedのリリースにより、一貫した意味のある比較の共通理解が促進され、その結果、テクスチャバイアスのない学習方法へのさらなる進歩が期待できる。コードはhttps://github.com/D1noFuzi/BiasBedで入手できる。

The well-documented presence of texture bias in modern convolutional neural networks has led to a plethora of algorithms that promote an emphasis on shape cues, often to support generalization to new domains. Yet, common datasets, benchmarks and general model selection strategies are missing, and there is no agreed, rigorous evaluation protocol. In this paper, we investigate difficulties and limitations when training networks with reduced texture bias. In particular, we also show that proper evaluation and meaningful comparisons between methods are not trivial. We introduce BiasBed, a testbed for texture- and style-biased training, including multiple datasets and a range of existing algorithms. It comes with an extensive evaluation protocol that includes rigorous hypothesis testing to gauge the significance of the results, despite the considerable training instability of some style bias methods. Our extensive experiments shed new light on the need for careful, statistically founded evaluation protocols for style bias (and beyond). For example, we find that some algorithms proposed in the literature do not significantly mitigate the impact of style bias at all. With the release of BiasBed, we hope to foster a common understanding of consistent and meaningful comparisons, and consequently faster progress towards learning methods free of texture bias. Code is available at https://github.com/D1noFuzi/BiasBed

翻訳日:2023-03-29 01:29:13 公開日:2023-03-24

# ObjectMatch: 標準オブジェクト対応を用いたロバスト登録

ObjectMatch: Robust Registration using Canonical Object Correspondences ( http://arxiv.org/abs/2212.01985v2 )

ライセンス: Link先を確認

Can Gümeli, Angela Dai, Matthias Nießner

(参考訳) rgb-d slamパイプライン用の意味的かつオブジェクト中心のカメラポーズ推定器objectmatchを提案する。現代のカメラポーズ推定装置は、フレーム間の重なり合う領域の直接対応に依存するが、カメラフレームをほとんどあるいは全く重なり合っていない。本研究では,意味オブジェクト識別によって得られる間接対応の活用を提案する。例えば、あるフレームの前面から、別のフレームの後方からオブジェクトが見える場合、標準オブジェクト対応を通じて追加のポーズ制約を与えることができる。まず,1ピクセルあたりの対応を予測するためのニューラルネットワークを提案し,これをエネルギー定式化と最先端キーポイントマッチングと組み合わせ,共同ガウス・ニュートン最適化で解いた。本手法は,24%から45%のペアと10%以下のフレームオーバーラップを含む,最先端機能マッチングの登録リコールを改善する。 RGB-Dシークエンスを登録する場合,本手法は困難かつ低フレームレートのシナリオにおいて最先端のSLAMベースラインよりも優れ,複数のシーンにおいて軌道誤差が35%以上減少する。

We present ObjectMatch, a semantic and object-centric camera pose estimator for RGB-D SLAM pipelines. Modern camera pose estimators rely on direct correspondences of overlapping regions between frames; however, they cannot align camera frames with little or no overlap. In this work, we propose to leverage indirect correspondences obtained via semantic object identification. For instance, when an object is seen from the front in one frame and from the back in another frame, we can provide additional pose constraints through canonical object correspondences. We first propose a neural network to predict such correspondences on a per-pixel level, which we then combine in our energy formulation with state-of-the-art keypoint matching solved with a joint Gauss-Newton optimization. In a pairwise setting, our method improves registration recall of state-of-the-art feature matching, including from 24% to 45% in pairs with 10% or less inter-frame overlap. In registering RGB-D sequences, our method outperforms cutting-edge SLAM baselines in challenging, low-frame-rate scenarios, achieving more than 35% reduction in trajectory error in multiple scenes.

翻訳日:2023-03-29 01:21:31 公開日:2023-03-24

# DAワンド:ニューラルメッシュパラメータ化を用いた歪み認識の選択

DA Wand: Distortion-Aware Selection using Neural Mesh Parameterization ( http://arxiv.org/abs/2212.06344v3 )

ライセンス: Link先を確認

Richard Liu, Noam Aigerman, Vladimir G. Kim, Rana Hanocka

(参考訳) 本稿では,メッシュパラメータ化に使用できる点周辺の局所部分領域を学習するためのニューラル手法を提案する。私たちのフレームワークの動機は、表面のデカリング、テキスト作成、ペイントに使用されるインタラクティブなワークフローにあります。我々の重要なアイデアは、ニューラルネットワークフレームワーク内で新しい微分可能パラメータ化層として実装された古典的なパラメータ化法の重みとしてセグメンテーション確率を組み込むことである。我々は,2次元にパラメータ化され,歪みによってペナル化される3次元領域を選択するようにセグメンテーションネットワークを訓練する。学習の後、ユーザは我々のシステムを使ってメッシュ上の点を対話的に選択し、低歪みパラメータ化を誘導する選択に関する大きな意味のある領域を得ることができる。私たちのコードとプロジェクトページは現在利用可能です。

We present a neural technique for learning to select a local sub-region around a point which can be used for mesh parameterization. The motivation for our framework is driven by interactive workflows used for decaling, texturing, or painting on surfaces. Our key idea is to incorporate segmentation probabilities as weights of a classical parameterization method, implemented as a novel differentiable parameterization layer within a neural network framework. We train a segmentation network to select 3D regions that are parameterized into 2D and penalized by the resulting distortion, giving rise to segmentations which are distortion-aware. Following training, a user can use our system to interactively select a point on the mesh and obtain a large, meaningful region around the selection which induces a low-distortion parameterization. Our code and project page are currently available.

翻訳日:2023-03-29 01:11:31 公開日:2023-03-24

# 2体系に対するプレボルン-オッペンハイマーディラック-クーロンブレット計算

Pre-Born-Oppenheimer Dirac-Coulomb-Breit computations for two-body systems ( http://arxiv.org/abs/2301.13477v2 )

ライセンス: Link先を確認

Dávid Ferenc and Edit Mátyus

(参考訳) bethe-salpeter方程式から導かれる16成分のno-pair dirac--coulomb-breit方程式は、ガウス型基底関数(例えば、ポジトロニウム、ミューオン、水素原子、ミューオン水素)を用いた変分法によって解かれる。変分エネルギーの$\alpha$ 微構造-定数依存は、$\alpha^n$ と $\alpha^n\text{ln}\alpha$ 項の関数を適合させることにより、(摂動的)非相対論的 qed フレームワークの関連するエネルギー表現と優れた一致を示し、したがって、計算相対論的 qed アプローチの開発のための確かな参照を確立する。

The sixteen-component, no-pair Dirac-Coulomb-Breit equation, derived from the Bethe-Salpeter equation, is solved in a variational procedure using Gaussian-type basis functions for the examples of positronium, muonium, the hydrogen atom, and muonic hydrogen. The dependence of the variational energies on the fine-structure constant $\alpha$, obtained by fitting a function of $\alpha^n$ and $\alpha^n\ln\alpha$ terms, shows excellent agreement with the relevant energy expressions of the (perturbative) non-relativistic QED framework and thereby establishes a solid reference for the development of a computational relativistic QED approach.

翻訳日:2023-03-29 00:45:16 公開日:2023-03-24

# K-Planes: 空間、時間、出現における露光場

K-Planes: Explicit Radiance Fields in Space, Time, and Appearance ( http://arxiv.org/abs/2301.10241v2 )

ライセンス: Link先を確認

Sara Fridovich-Keil, Giacomo Meanti, Frederik Warburg, Benjamin Recht, Angjoo Kanazawa

(参考訳) 任意の次元の放射場に対するホワイトボックスモデルであるk平面を導入する。我々のモデルは、D次元のシーンを表現するためにd choose 2平面を使用し、静的(d=3)から動的(d=4)までのシームレスな方法を提供する。この平面分解により、時間的滑らかさや多次元空間構造といった次元固有の先行要素を容易に追加でき、シーンの静的および動的成分の自然な分解を誘導する。学習カラーベースを持つ線形特徴デコーダを用いて,非線形ブラックボックスmlpデコーダと同様の性能を実現する。様々な合成、現実、静的、動的、固定、そして様々な外観シーンにおいて、kプレーンは競争力があり、しばしば最先端の再現フィリティを、メモリ使用量が少なく、完全な4Dグリッド上で1000倍の圧縮を実現し、純粋なPyTorch実装で高速な最適化を実現している。ビデオ結果とコードについては、https://sarafridov.github.io/K-Planesを参照してください。

We introduce k-planes, a white-box model for radiance fields in arbitrary dimensions. Our model uses d choose 2 planes to represent a d-dimensional scene, providing a seamless way to go from static (d=3) to dynamic (d=4) scenes. This planar factorization makes adding dimension-specific priors easy, e.g. temporal smoothness and multi-resolution spatial structure, and induces a natural decomposition of static and dynamic components of a scene. We use a linear feature decoder with a learned color basis that yields similar performance as a nonlinear black-box MLP decoder. Across a range of synthetic and real, static and dynamic, fixed and varying appearance scenes, k-planes yields competitive and often state-of-the-art reconstruction fidelity with low memory usage, achieving 1000x compression over a full 4D grid, and fast optimization with a pure PyTorch implementation. For video results and code, please see https://sarafridov.github.io/K-Planes.
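
The "d choose 2 planes" count, and the way it separates static from dynamic components, can be checked with a short sketch. The axis numbering (0, 1, 2 for space and 3 for time) and the function names are assumptions made for illustration; nothing here reproduces the authors' implementation.

```python
from itertools import combinations


def plane_axes(d):
    """All unordered axis pairs of a d-dimensional scene: one 2D plane per pair."""
    return list(combinations(range(d), 2))


def split_static_dynamic(d, time_axis):
    """Separate planes that do not involve the time axis from those that do."""
    planes = plane_axes(d)
    static = [p for p in planes if time_axis not in p]
    dynamic = [p for p in planes if time_axis in p]
    return static, dynamic
```

A static scene (d=3) yields three planes and a dynamic scene (d=4) yields six. In the dynamic case, `split_static_dynamic(4, time_axis=3)` returns the three purely spatial planes and the three space-time planes, which mirrors the decomposition into static and dynamic components mentioned in the abstract.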

翻訳日:2023-03-29 00:43:26 公開日:2023-03-24

# ゴール認識表現学習と適応水平予測によるオープンワールドマルチタスク制御

Open-World Multi-Task Control Through Goal-Aware Representation Learning and Adaptive Horizon Prediction ( http://arxiv.org/abs/2301.10034v2 )

ライセンス: Link先を確認

Shaofei Cai, Zihao Wang, Xiaojian Ma, Anji Liu, Yitao Liang

(参考訳) 我々は、人間レベルのマルチタスクエージェントを開発するために、普及し、広くアクセスしやすく、挑戦的なオープンエンド環境であるMinecraftの目標条件ポリシーを学習する問題について研究する。まず、このような政策を学ぶ上での2つの主な課題を特定します。 1)広い場面の多様性により、国家分布からタスクが区別できないこと、及び 2)部分的可観測性に起因する環境力学の非定常性。最初の課題に取り組むために,目標関連視覚状態表現の出現を促す政策として,目標感性バックボーン(GSB)を提案する。第2の課題に取り組むために、このポリシーは非定常力学による学習の不確実性を緩和する適応的な水平予測モジュールによってさらに加速される。 20のMinecraftタスクの実験では、我々のメソッドが今までで最高のベースラインを大幅に上回っていることが示されています。我々のアブレーションと探索研究は、我々のアプローチがどのように相手を圧倒するかを説明し、新しいシーン(バイオーム)にゼロショットの一般化の驚くべきボーナスを明らかにします。当社のエージェントが,minecraftのようなオープンな環境において,目標条件とマルチタスクエージェントの学習に光を当ててくれることを願っています。

We study the problem of learning goal-conditioned policies in Minecraft, a popular, widely accessible yet challenging open-ended environment for developing human-level multi-task agents. We first identify two main challenges of learning such policies: 1) the indistinguishability of tasks from the state distribution, due to the vast scene diversity, and 2) the non-stationary nature of environment dynamics caused by partial observability. To tackle the first challenge, we propose Goal-Sensitive Backbone (GSB) for the policy to encourage the emergence of goal-relevant visual state representations. To tackle the second challenge, the policy is further fueled by an adaptive horizon prediction module that helps alleviate the learning uncertainty brought by the non-stationary dynamics. Experiments on 20 Minecraft tasks show that our method significantly outperforms the best baseline so far; in many of them, we double the performance. Our ablation and exploratory studies then explain how our approach beat the counterparts and also unveil the surprising bonus of zero-shot generalization to new scenes (biomes). We hope our agent could help shed some light on learning goal-conditioned, multi-task agents in challenging, open-ended environments like Minecraft.

翻訳日:2023-03-29 00:43:05 公開日:2023-03-24

# LEGO-Net: 部屋の中のオブジェクトの規則的な再配置を学ぶ

LEGO-Net: Learning Regular Rearrangements of Objects in Rooms ( http://arxiv.org/abs/2301.09629v2 )

ライセンス: Link先を確認

Qiuhong Anna Wei, Sijie Ding, Jeong Joon Park, Rahul Sajnani, Adrien Poulenard, Srinath Sridhar, Leonidas Guibas

(参考訳) 人間は、散らかった部屋を片付ける作業を例外なく嫌う。機械がこの作業を支援するためには、数種類の対称性、共線性や共円性、直線状または円状のパターンにおける間隔の均一性、さらにスタイルや機能に関連するオブジェクト間の関係など、規則的な配置に関する人間の基準を理解する必要がある。従来のアプローチは、目標状態を明示的に指定するために人間の入力に依存するか、シーンをゼロから合成していたが、このような方法は、目標状態を与えずに既存の散らかったシーンを再配置する問題には対応していない。本稿では、散らかった部屋におけるオブジェクトの規則的な再配置を学習する、データ駆動でトランスフォーマーベースの反復的手法であるLEGO-Netを提案する。 LEGO-Netは拡散モデルに部分的に着想を得ており、初期の散らかった状態から始めて、移動距離を減らしながら、オブジェクトの位置と向きを規則的な状態へと反復的に「デノイズ」する。プロが配置したシーンの既存データセットにおいて、ランダムに摂動させたオブジェクトの位置と向きが与えられたとき、本手法は規則的な再配置を復元するように訓練される。結果は、本手法が部屋のシーンを確実に再配置でき、他の手法よりも優れていることを示している。さらに、数論的な手法を用いて部屋の配置の規則性を評価する指標を提案する。

Humans universally dislike the task of cleaning up a messy room. If machines were to help us with this task, they must understand human criteria for regular arrangements, such as several types of symmetry, co-linearity or co-circularity, spacing uniformity in linear or circular patterns, and further inter-object relationships that relate to style and functionality. Previous approaches for this task relied on human input to explicitly specify goal state, or synthesized scenes from scratch -- but such methods do not address the rearrangement of existing messy scenes without providing a goal state. In this paper, we present LEGO-Net, a data-driven transformer-based iterative method for LEarning reGular rearrangement of Objects in messy rooms. LEGO-Net is partly inspired by diffusion models -- it starts with an initial messy state and iteratively ''de-noises'' the position and orientation of objects to a regular state while reducing distance traveled. Given randomly perturbed object positions and orientations in an existing dataset of professionally-arranged scenes, our method is trained to recover a regular re-arrangement. Results demonstrate that our method is able to reliably rearrange room scenes and outperform other methods. We additionally propose a metric for evaluating regularity in room arrangements using number-theoretic machinery.

翻訳日:2023-03-29 00:42:27 公開日:2023-03-24

# ソースラベル適応による半教師付き領域適応

Semi-Supervised Domain Adaptation with Source Label Adaptation ( http://arxiv.org/abs/2302.02335v2 )

ライセンス: Link先を確認

Yu-Chu Yu and Hsuan-Tien Lin

(参考訳) Semi-Supervised Domain Adaptation (SSDA)は、少数のラベル付きターゲットデータと多数のラベルなしターゲットデータ、および関連ドメインからの多数のラベル付きソースデータを用いて、未知のターゲットデータを分類することを学習する問題である。現在のSSDAアプローチは、通常、特徴空間マッピングと擬似ラベル割り当てによって、ターゲットデータをラベル付きソースデータに整列させることを目的としている。しかし、そのようなソース指向のモデルは、ターゲットデータを誤ったクラスのソースデータに整列させてしまうことがあり、分類性能を低下させる。本稿では、ソースデータをターゲットデータに合うように適応させる、新しいソース適応パラダイムを提案する。鍵となるアイデアは、ソースデータを、理想的なターゲットデータにノイズのあるラベルが付いたものと見なすことである。そこで、ターゲットの視点から設計したロバストなクリーナーコンポーネントを用いて、ラベルノイズを動的に除去するSSDAモデルを提案する。このパラダイムは既存のSSDAアプローチの背後にある中心的なアイデアとは大きく異なるため、提案モデルはそれらと容易に組み合わせて性能を向上させることができる。 2つの最先端SSDAアプローチに対する実験結果は、提案モデルがソースラベル内のノイズを効果的に除去し、複数のベンチマークデータセットにわたってそれらのアプローチを上回る性能を示すことを実証している。コードは https://github.com/chu0802/SLA で利用可能である。

Semi-Supervised Domain Adaptation (SSDA) involves learning to classify unseen target data with a few labeled and lots of unlabeled target data, along with many labeled source data from a related domain. Current SSDA approaches usually aim at aligning the target data to the labeled source data with feature space mapping and pseudo-label assignments. Nevertheless, such a source-oriented model can sometimes align the target data to source data of the wrong classes, degrading the classification performance. This paper presents a novel source-adaptive paradigm that adapts the source data to match the target data. Our key idea is to view the source data as a noisily-labeled version of the ideal target data. Then, we propose an SSDA model that cleans up the label noise dynamically with the help of a robust cleaner component designed from the target perspective. Since the paradigm is very different from the core ideas behind existing SSDA approaches, our proposed model can be easily coupled with them to improve their performance. Empirical results on two state-of-the-art SSDA approaches demonstrate that the proposed model effectively cleans up the noise within the source labels and exhibits superior performance over those approaches across benchmark datasets. Our code is available at https://github.com/chu0802/SLA .

翻訳日:2023-03-29 00:32:55 公開日:2023-03-24

# 外れ値認識オブジェクト検出のための正規化フローに基づく特徴合成

Normalizing Flow based Feature Synthesis for Outlier-Aware Object Detection ( http://arxiv.org/abs/2302.07106v2 )

ライセンス: Link先を確認

Nishant Kumar, Siniša Šegvić, Abouzar Eslami, Stefan Gumhold

(参考訳) 自動運転のようなアプリケーションには、信頼性の高いオブジェクト検出器を実世界に展開することが不可欠である。しかし、Faster R-CNNのような汎用オブジェクト検出器は、外れ値オブジェクトに対して過信した予測を出す傾向がある。最近の外れ値認識オブジェクト検出手法は、クラス条件付きガウス分布でインスタンスワイドな特徴の密度を推定し、その低尤度領域から合成した外れ値特徴で訓練する。しかし、この戦略は、合成された外れ値特徴が他のクラス条件付きガウス分布のもとでも低い尤度を持つことを保証しない。そこで本研究では、すべてのインライアクラスの同時データ分布を可逆な正規化フローで学習することにより、外れ値オブジェクトとインライアオブジェクトを区別する、新しい外れ値認識オブジェクト検出フレームワークを提案する。フローモデルから適切にサンプリングすることで、合成された外れ値がすべてのオブジェクトクラスのインライアよりも低い尤度を持つことが保証され、インライアと外れ値の間のより良い決定境界がモデル化される。提案手法は、画像データセットとビデオデータセットの両方において、外れ値認識オブジェクト検出の最先端手法を大幅に上回る。

Real-world deployment of reliable object detectors is crucial for applications such as autonomous driving. However, general-purpose object detectors like Faster R-CNN are prone to providing overconfident predictions for outlier objects. Recent outlier-aware object detection approaches estimate the density of instance-wide features with class-conditional Gaussians and train on synthesized outlier features from their low-likelihood regions. However, this strategy does not guarantee that the synthesized outlier features will have a low likelihood according to the other class-conditional Gaussians. We propose a novel outlier-aware object detection framework that distinguishes outliers from inlier objects by learning the joint data distribution of all inlier classes with an invertible normalizing flow. The appropriate sampling of the flow model ensures that the synthesized outliers have a lower likelihood than inliers of all object classes, thereby modeling a better decision boundary between inlier and outlier objects. Our approach significantly outperforms the state-of-the-art for outlier-aware object detection on both image and video datasets.

翻訳日:2023-03-29 00:23:55 公開日:2023-03-24

# MEDBERT.de: 医学領域のための総合的なドイツのBERTモデル

MEDBERT.de: A Comprehensive German BERT Model for the Medical Domain ( http://arxiv.org/abs/2303.08179v2 )

ライセンス: Link先を確認

Keno K. Bressem and Jens-Michalis Papaioannou and Paul Grundmann and Florian Borchert and Lisa C. Adams and Leonhard Liu and Felix Busch and Lina Xu and Jan P. Loyen and Stefan M. Niehues and Moritz Augustin and Lennart Grosser and Marcus R. Makowski and Hugo JWL. Aerts and Alexander Löser

(参考訳) 本稿では、ドイツ語の医療領域に特化して設計された、事前訓練済みのドイツ語BERTモデルであるmedBERTdeを紹介する。このモデルは470万件のドイツ語医療文書からなる大規模コーパスで訓練されており、幅広い診療分野と医療文書の種類をカバーする8つの異なる医療ベンチマークにおいて、新たな最先端の性能を達成することが示されている。本稿では、モデルの全体的な性能を評価することに加えて、その能力についてより詳細な分析も行う。データの重複排除がモデルの性能に与える影響と、より効率的なトークン化手法を使用することの潜在的な利点を調査する。その結果、medBERTdeのようなドメイン固有モデルは長いテキストに対して特に有用であり、トレーニングデータの重複排除が必ずしも性能向上につながるとは限らないことが示された。さらに、効率的なトークン化はモデルの性能向上において小さな役割しか果たさないことがわかり、性能向上の大部分は大量のトレーニングデータによるものと考えられる。さらなる研究を促進するために、事前訓練済みのモデル重みと放射線科データに基づく新しいベンチマークを、科学コミュニティが利用できるよう公開している。

This paper presents medBERTde, a pre-trained German BERT model specifically designed for the German medical domain. The model has been trained on a large corpus of 4.7 Million German medical documents and has been shown to achieve new state-of-the-art performance on eight different medical benchmarks covering a wide range of disciplines and medical document types. In addition to evaluating the overall performance of the model, this paper also conducts a more in-depth analysis of its capabilities. We investigate the impact of data deduplication on the model's performance, as well as the potential benefits of using more efficient tokenization methods. Our results indicate that domain-specific models such as medBERTde are particularly useful for longer texts, and that deduplication of training data does not necessarily lead to improved performance. Furthermore, we found that efficient tokenization plays only a minor role in improving model performance, and attribute most of the improved performance to the large amount of training data. To encourage further research, the pre-trained model weights and new benchmarks based on radiological data are made publicly available for use by the scientific community.

翻訳日:2023-03-28 23:57:37 公開日:2023-03-24

# 聴衆は性役割の表現方法にどのように影響しますか?

How does the audience affect the way we express our gender roles? ( http://arxiv.org/abs/2303.12759v2 )

ライセンス: Link先を確認

Melody Sepahpour-Fard and Michael Quayle and Maria Schuld and Taha Yasseri

(参考訳) 人間は、やり取りの際に聴衆に合わせて言葉遣いを変える。オーディエンス効果は理論や小規模な研究では調べられてきたが、自然に生じるオーディエンス効果に関する大規模な研究は乏しい。本研究では、Reddit上のやり取りを分析することで、異なる社会的アイデンティティ(例えば母親、父親、親)を強調するジェンダー化された文脈でのやり取りにおけるオーディエンス効果を検討する。 3つの人気のある子育てサブレディット(r/Daddit、r/Mommit、r/Parenting)から投稿を収集した。これらはそれぞれ、自らを父親、母親と認識する人々(表向きはシングルジェンダー)と、親(明示的にミックスジェンダー)を対象としている。シングルジェンダーとミックスジェンダーの両方のサブレディットに投稿しているユーザのサンプルを選択することで、オーディエンス効果とジェンダー効果の両方を調べることができる。投稿を分析するために、単語埋め込みを用い、ユーザをトークンとしてコーパスに追加した。これにより、ユーザトークンと単語トークンを比較し、その類似度を測定できた。結果は、ミックスジェンダーの文脈では母親と父親が同様に振る舞い、多様な話題を議論し、教育や家族の問題について互いに助言することにより重点を置くことを示している。シングルジェンダーのサブレディットでは、母親と父親は特定の話題により集中している。 r/Mommitの母親は、医療、睡眠とトイレトレーニング、食べ物といった話題を議論することで、他のグループと一線を画している。母親も父親も、シングルジェンダーの聴衆の前では、子育ての節目を祝い、子供の外見について説明したりコメントしたりする。結論として、本研究は、母親と父親が異なる関心事を表現し、グループに基づく異なる聴衆に合わせて行動を適応させる様子を示している。また、Redditと単語埋め込みを用いて、自然な環境におけるオーディエンスとジェンダーのダイナミクスをよりよく理解できる可能性も強調している。

Human beings adapt their language to suit their audience when interacting. While audience effects have been studied in theory and small-scale research, there is a lack of large-scale studies on naturally occurring audience effects. In this study, we examine audience effects in interactions with gendered contexts that emphasize different social identities (e.g. mother, father, and parent) by analyzing interactions on Reddit. We collected posts from three popular parenting subreddits (r/Daddit, r/Mommit, and r/Parenting), which cater to self-identified fathers and mothers (ostensibly single-gender) and parents (explicitly mixed-gender) respectively. By selecting a sample of users who have published on both single-gender and mixed-gender subreddits, we are able to explore both audience and gender effects. To analyze the posts, we used word embeddings and added the user as a token in the corpus. This allowed us to compare user-tokens to word-tokens and measure their similarity. Our results show that mothers and fathers behave similarly and discuss a diverse range of topics in a mixed-gender context, focusing more on advising each other on educational and family matters. In single-gender subreddits, mothers and fathers are more focused on specific topics. Mothers in r/Mommit distinguish themselves from other groups by discussing topics such as medical care, sleep and potty training, and food. Both mothers and fathers celebrate parenting events and describe or comment on the physical appearance of their children in front of a single-gender audience. In conclusion, this study demonstrates how mothers and fathers express different concerns and adapt their behaviour to different group-based audiences. It also highlights the potential of using Reddit and word embeddings to better understand the dynamics of audience and gender in a natural setting.

翻訳日:2023-03-28 21:37:06 公開日:2023-03-24

# DiffuScene: 生成的屋内シーン合成のためのシーングラフデノイジング拡散確率モデル

DiffuScene: Scene Graph Denoising Diffusion Probabilistic Model for Generative Indoor Scene Synthesis ( http://arxiv.org/abs/2303.14207v1 )

ライセンス: Link先を確認

Jiapeng Tang, Yinyu Nie, Lev Markhasin, Angela Dai, Justus Thies, Matthias Nießner

(参考訳) 本研究では、新しいシーングラフデノイジング拡散確率モデルに基づく、屋内3Dシーン合成のためのDiffuSceneを提案する。このモデルは、全結合シーングラフに格納された3Dインスタンスの属性を生成し、次に各グラフノード、すなわち位置、サイズ、向き、セマンティクス、形状特徴といった異なる属性の連結として特徴付けられるオブジェクトインスタンスに対して、最も類似したオブジェクト形状を検索する。このシーングラフに基づいて、3Dインスタンスの配置と種類を決定する拡散モデルを設計した。本手法は、シーン補完、シーン配置、テキスト条件付きシーン合成など、多くの下流アプリケーションを容易にする。 3D-FRONTデータセットを用いた実験では、本手法が最先端の手法よりも物理的に妥当で多様な屋内シーンを合成できることが示された。広範なアブレーション研究により、シーン拡散モデルにおける設計上の選択の有効性が検証された。

We present DiffuScene for indoor 3D scene synthesis based on a novel scene graph denoising diffusion probabilistic model, which generates 3D instance properties stored in a fully-connected scene graph and then retrieves the most similar object geometry for each graph node i.e. object instance which is characterized as a concatenation of different attributes, including location, size, orientation, semantic, and geometry features. Based on this scene graph, we designed a diffusion model to determine the placements and types of 3D instances. Our method can facilitate many downstream applications, including scene completion, scene arrangement, and text-conditioned scene synthesis. Experiments on the 3D-FRONT dataset show that our method can synthesize more physically plausible and diverse indoor scenes than state-of-the-art methods. Extensive ablation studies verify the effectiveness of our design choice in scene diffusion models.

翻訳日:2023-03-28 21:26:40 公開日:2023-03-24

# 深層学習に基づく交通システムにおけるバックドアニュートラル化のための最適平滑分布探索

Optimal Smoothing Distribution Exploration for Backdoor Neutralization in Deep Learning-based Traffic Systems ( http://arxiv.org/abs/2303.14197v1 )

ライセンス: Link先を確認

Yue Wang, Wending Li, Michail Maniatakos, Saif Eddin Jabari

(参考訳) 深層強化学習(Deep Reinforcement Learning, DRL)は、自動運転車(AV)の効率を高める一方で、交通渋滞や衝突を引き起こしうるバックドア攻撃に対して脆弱にもする。バックドア機能は通常、秘密裏に悪意のあるデータでトレーニングデータセットを汚染することで組み込まれ、正規の入力に対しては高い精度を維持しつつ、攻撃者が選んだ特定の入力に対しては所望の(悪意のある)出力を誘発する。バックドアに対する現在の防御は、主に画像ベースの特徴を用いた画像分類に焦点を当てており、DRLベースのAVコントローラの回帰タスクには容易に転用できない。入力が連続的なセンサデータ、すなわちAVとその周辺車両の速度と距離の組み合わせだからである。提案手法は、バックドアを無力化するために、入念に設計されたノイズを入力に加える。このアプローチでは、正規の入力に対する通常の機能を保ちながらバックドアを無力化するような、最適な平滑化(ノイズ)分布を学習する。これにより、得られるモデルは、正規の入力に対して高い精度を維持しつつ、バックドア攻撃に対してより頑健になることが期待される。提案手法の有効性は、ミクロ交通シミュレータに基づくシミュレーション交通システム上で検証され、平滑化された交通コントローラがすべてのトリガサンプルを無力化し、交通渋滞を緩和する性能を維持できることが実験結果から示された。

Deep Reinforcement Learning (DRL) enhances the efficiency of Autonomous Vehicles (AV), but also makes them susceptible to backdoor attacks that can result in traffic congestion or collisions. Backdoor functionality is typically incorporated by contaminating training datasets with covert malicious data to maintain high precision on genuine inputs while inducing the desired (malicious) outputs for specific inputs chosen by adversaries. Current defenses against backdoors mainly focus on image classification using image-based features, which cannot be readily transferred to the regression task of DRL-based AV controllers since the inputs are continuous sensor data, i.e., the combinations of velocity and distance of AV and its surrounding vehicles. Our proposed method adds well-designed noise to the input to neutralize backdoors. The approach involves learning an optimal smoothing (noise) distribution to preserve the normal functionality of genuine inputs while neutralizing backdoors. By doing so, the resulting model is expected to be more resilient against backdoor attacks while maintaining high accuracy on genuine inputs. The effectiveness of the proposed method is verified on a simulated traffic system based on a microscopic traffic simulator, where experimental results showcase that the smoothed traffic controller can neutralize all trigger samples and maintain the performance of relieving traffic congestion.

翻訳日:2023-03-28 21:26:26 公開日:2023-03-24

# DeepEpiSolver: Covid、HIV、エボラおよび疾病伝播における逆問題の解明

DeepEpiSolver: Unravelling Inverse problems in Covid, HIV, Ebola and Disease Transmission ( http://arxiv.org/abs/2303.14194v1 )

ライセンス: Link先を確認

Ritam Majumdar, Shirish Karande, Lovekesh Vig

(参考訳) 多くの感染症の拡散は、連立微分方程式であるSIRコンパートメントモデルの変種を用いてモデル化される。 SIRモデルの係数は疾患の拡散軌道を決定し、それに基づいて予防的な対策を講じることができる。したがって、係数推定は高速かつ正確でなければならない。 Shaierらは論文"Disease Informed Neural Networks"の中で、SIRモデルのパラメータを推定するためにPhysics Informed Neural Networks(PINN)を使用した。このアプローチには2つの欠点がある。第一に、PINNの訓練時間は長く、疾患によっては訓練に90時間近くかかる。第二に、PINNは新しいSIDR軌道に汎化せず、対応するSIRパラメータを学習するにはPINNをゼロから再訓練する必要がある。本研究では、これら両方の欠点を解消することを目指す。 LSODAアルゴリズムを用いて、パラメータの広い分布に対して順問題を解くことで、ODEのパラメータと拡散軌道を対応付けるデータセットを生成する。次に、ニューラルネットワークを用いて、拡散軌道からSIDRの係数への写像をオフラインで学習する。これにより、再訓練することなく新しい拡散軌道のパラメータを学習でき、テスト時の汎化が可能になる。 11種類の感染力の高い疾患について、PINNと同等の精度で3～4桁の高速化を観察した。ニューラルネットワークが推定したODE係数をPINNを用いてさらに微調整すると、推定係数が2～3桁改善する。

The spread of many infectious diseases is modeled using variants of the SIR compartmental model, which is a coupled differential equation. The coefficients of the SIR model determine the spread trajectories of disease, on whose basis proactive measures can be taken. Hence, the coefficient estimates must be both fast and accurate. Shaier et al. in the paper "Disease Informed Neural Networks" used Physics Informed Neural Networks (PINNs) to estimate the parameters of the SIR model. There are two drawbacks to this approach. First, the training time for PINNs is high, with certain diseases taking close to 90 hrs to train. Second, PINNs don't generalize for a new SIDR trajectory, and learning its corresponding SIR parameters requires retraining the PINN from scratch. In this work, we aim to eliminate both of these drawbacks. We generate a dataset between the parameters of ODE and the spread trajectories by solving the forward problem for a large distribution of parameters using the LSODA algorithm. We then use a neural network to learn the mapping between spread trajectories and coefficients of SIDR in an offline manner. This allows us to learn the parameters of a new spread trajectory without having to retrain, enabling generalization at test time. We observe a speed-up of 3-4 orders of magnitude with accuracy comparable to that of PINNs for 11 highly infectious diseases. Further finetuning of neural network inferred ODE coefficients using PINN further leads to 2-3 orders improvement of estimated coefficients.
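
The forward problem described above (parameters in, spread trajectory out) can be sketched for the basic SIR model. This is a minimal sketch under stated assumptions, not the paper's pipeline: it uses the textbook SIR equations on a normalized population, the parameter names `beta` and `gamma` and all helper names are introduced here, and a fixed-step fourth-order Runge-Kutta integrator stands in for the LSODA solver named in the abstract so that the example needs only the standard library.

```python
def sir_rhs(state, beta, gamma):
    """Textbook SIR right-hand side on a normalized population (S + I + R = 1)."""
    s, i, r = state
    new_infections = beta * s * i
    recoveries = gamma * i
    return (-new_infections, new_infections - recoveries, recoveries)

def rk4_step(state, dt, beta, gamma):
    """One classical Runge-Kutta step (a stand-in for LSODA in this sketch)."""
    def shifted(base, slope, h):
        return tuple(b + h * k for b, k in zip(base, slope))
    k1 = sir_rhs(state, beta, gamma)
    k2 = sir_rhs(shifted(state, k1, dt / 2), beta, gamma)
    k3 = sir_rhs(shifted(state, k2, dt / 2), beta, gamma)
    k4 = sir_rhs(shifted(state, k3, dt), beta, gamma)
    return tuple(
        x + dt / 6 * (a + 2 * b + 2 * c + d)
        for x, a, b, c, d in zip(state, k1, k2, k3, k4)
    )

def simulate_sir(beta, gamma, s0=0.99, i0=0.01, r0=0.0, days=160, dt=0.1):
    """Solve the forward problem: return the (S, I, R) trajectory for given coefficients."""
    state = (s0, i0, r0)
    trajectory = [state]
    for _ in range(int(round(days / dt))):
        state = rk4_step(state, dt, beta, gamma)
        trajectory.append(state)
    return trajectory

def make_dataset(param_grid):
    """Pair each (beta, gamma) with its trajectory; a network could learn the reverse map."""
    return [((beta, gamma), simulate_sir(beta, gamma)) for beta, gamma in param_grid]

# Hypothetical parameter values chosen only for illustration.
dataset = make_dataset([(0.3, 0.1), (0.5, 0.1), (0.4, 0.2)])
```

Each dataset entry is a (coefficients, trajectory) pair. Training a regressor from trajectory to coefficients on many such pairs is the offline inverse mapping the abstract describes.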

翻訳日:2023-03-28 21:26:02 公開日:2023-03-24

# 臨床基礎モデルの揺るぎない基礎:EMRのための大規模言語モデルと基礎モデルに関する調査

The Shaky Foundations of Clinical Foundation Models: A Survey of Large Language Models and Foundation Models for EMRs ( http://arxiv.org/abs/2303.12961v2 )

ライセンス: Link先を確認

Michael Wornow, Yizhe Xu, Rahul Thapa, Birju Patel, Ethan Steinberg, Scott Fleming, Michael A. Pfeffer, Jason Fries, Nigam H. Shah

(参考訳) ChatGPTやAlphaFoldのような基礎モデルの成功は、患者ケアや病院運営を改善するために、電子医療記録(EMR)向けの同様のモデルを構築することへの大きな関心を呼び起こした。しかし、最近の過熱した期待は、これらのモデルの能力に関する我々の理解における重大なギャップを覆い隠している。我々は、非画像EMRデータ(すなわち臨床テキストおよび/または構造化データ)で訓練された80以上の基礎モデルをレビューし、そのアーキテクチャ、トレーニングデータ、潜在的なユースケースを整理した分類体系を作成する。ほとんどのモデルは、小規模で範囲の狭い臨床データセット(MIMIC-IIIなど)か、広範な公開バイオメディカルコーパス(PubMedなど)で訓練されており、医療システムに対する有用性について有意義な洞察を与えないタスクで評価されていることがわかった。これらの知見を踏まえて、医療において重要な指標により密接に基づいた、臨床基礎モデルの利点を測定するための改良された評価フレームワークを提案する。

The successes of foundation models such as ChatGPT and AlphaFold have spurred significant interest in building similar models for electronic medical records (EMRs) to improve patient care and hospital operations. However, recent hype has obscured critical gaps in our understanding of these models' capabilities. We review over 80 foundation models trained on non-imaging EMR data (i.e. clinical text and/or structured data) and create a taxonomy delineating their architectures, training data, and potential use cases. We find that most models are trained on small, narrowly-scoped clinical datasets (e.g. MIMIC-III) or broad, public biomedical corpora (e.g. PubMed) and are evaluated on tasks that do not provide meaningful insights on their usefulness to health systems. In light of these findings, we propose an improved evaluation framework for measuring the benefits of clinical foundation models that is more closely grounded to metrics that matter in healthcare.

翻訳日:2023-03-28 21:25:19 公開日:2023-03-24

# MaxSATによる量子色符号の復号

Decoding quantum color codes with MaxSAT ( http://arxiv.org/abs/2303.14237v1 )

ライセンス: Link先を確認

Lucas Berent, Lukas Burgholzer, Peter-Jan H.S. Derks, Jens Eisert, Robert Wille

(参考訳) 古典コンピューティングでは、誤り訂正符号は十分に確立されており、理論と実用の両面で広く使われている。量子コンピューティングにとっても誤り訂正は不可欠であるが、実現はより困難であり、相当なリソースオーバーヘッドを伴い、相当量の古典計算も必要とする。量子誤り訂正符号は、想定される近い将来の応用を超えて、フォールトトレラントな量子計算へ向かう道において中心的な役割を果たす。その中でもカラーコードは、他の符号に比べて有利な性質を持つことから近年関心を集めている、特に重要な量子符号のクラスである。古典コンピューティングと同様に、復号とは、破損した状態から破損していない状態を復元する操作を推定する問題であり、フォールトトレラントな量子デバイスの開発の中心にある。本稿では、カラーコードの復号問題を、よく知られたLightsOutパズルのわずかな変種に帰着できることを示す。この類似性に基づき、MaxSAT問題としての定式化を用いた、量子カラーコードのための新しいデコーダを提案する。さらに、MaxSATの構成を最適化し、提案するデコーダの復号性能がカラーコードにおいて最先端の水準に達することを数値的に示す。デコーダの実装と、数値実験を自動的に実行するツールは、GitHub上でMunich Quantum Toolkit(MQT)の一部として公開されている。

In classical computing, error-correcting codes are well established and are ubiquitous both in theory and practical applications. For quantum computing, error-correction is essential as well, but harder to realize, coming along with substantial resource overheads and being concomitant with needs for substantial classical computing. Quantum error-correcting codes play a central role on the avenue towards fault-tolerant quantum computation beyond presumed near-term applications. Among those, color codes constitute a particularly important class of quantum codes that have gained interest in recent years due to favourable properties over other codes. As in classical computing, decoding is the problem of inferring an operation to restore an uncorrupted state from a corrupted one and is central in the development of fault-tolerant quantum devices. In this work, we show how the decoding problem for color codes can be reduced to a slight variation of the well-known LightsOut puzzle. We propose a novel decoder for quantum color codes using a formulation as a MaxSAT problem based on this analogy. Furthermore, we optimize the MaxSAT construction and show numerically that the decoding performance of the proposed decoder achieves state-of-the-art decoding performance on color codes. The implementation of the decoder as well as tools to automatically conduct numerical experiments are publicly available as part of the Munich Quantum Toolkit (MQT) on GitHub.
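
One way to read the LightsOut analogy is that each check is a light, each qubit is a switch that toggles the checks it touches, and decoding asks for the fewest switch presses that turn off all lit checks. The sketch below illustrates that reading on a toy instance. It is not the authors' decoder: an exhaustive search stands in for a MaxSAT solver, the 7-qubit check layout is an assumption chosen for illustration, and `decode_lights_out` is a hypothetical helper name.

```python
from itertools import combinations

def decode_lights_out(checks, lit, n_qubits):
    """Return a smallest set of qubits whose toggles reproduce the lit checks.

    checks: list of sets, checks[i] is the set of qubits touching check i.
    lit:    set of check indices that are currently lit (the syndrome).
    Exhaustive search by increasing weight; only feasible for tiny instances.
    """
    lit = set(lit)
    for weight in range(n_qubits + 1):
        for presses in combinations(range(n_qubits), weight):
            pressed = set(presses)
            # A check is toggled when it touches an odd number of pressed qubits.
            toggled = {
                i for i, support in enumerate(checks)
                if len(support & pressed) % 2 == 1
            }
            if toggled == lit:
                return presses
    return None

# Assumed toy layout: three checks on seven qubits, qubit 6 shared by all three.
CHECKS = [{3, 4, 5, 6}, {1, 2, 5, 6}, {0, 2, 4, 6}]

print(decode_lights_out(CHECKS, {0, 1, 2}, 7))  # all three checks lit
print(decode_lights_out(CHECKS, {2}, 7))        # only the last check lit
```

The search enumerates candidate corrections by increasing weight, so the first hit is a minimum-weight one. A MaxSAT formulation would express the same parity constraints as hard clauses and the weight as soft clauses instead of enumerating.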

翻訳日:2023-03-28 21:17:48 公開日:2023-03-24

# SIGMORPHON 2023 Interlinear Glossing共有タスク: ベースラインモデル

SIGMORPHON 2023 Shared Task of Interlinear Glossing: Baseline Model ( http://arxiv.org/abs/2303.14234v1 )

ライセンス: Link先を確認

Michael Ginn

(参考訳) 言語ドキュメンテーションは言語保存の重要な側面であり、多くの場合、Interlinear Glossed Text (IGT) の作成を含む。 IGTの作成は時間と手間がかかるため、そのプロセスを自動化すれば、アノテータの貴重な労力を節約できる。本稿では、SIGMORPHON 2023のInterlinear Glossing共有タスクのベースラインシステムについて述べる。本システムでは、トランスフォーマーアーキテクチャを利用し、グロス生成を系列ラベリングタスクとして扱う。

Language documentation is a critical aspect of language preservation, often including the creation of Interlinear Glossed Text (IGT). Creating IGT is time-consuming and tedious, and automating the process can save valuable annotator effort. This paper describes the baseline system for the SIGMORPHON 2023 Shared Task of Interlinear Glossing. In our system, we utilize a transformer architecture and treat gloss generation as a sequence labelling task.

翻訳日:2023-03-28 21:17:29 公開日:2023-03-24

# ダガー線形論理とカテゴリー量子力学

Dagger linear logic and categorical quantum mechanics ( http://arxiv.org/abs/2303.14231v1 )

ライセンス: Link先を確認

Priyaa Varshinee Srinivasan

(参考訳) この論文は、非コンパクトな乗法的ダガー線形論理の圏論的証明論を展開し、圏論的量子力学(CQM)への応用を研究する。既存のCQMの枠組みはコンパクトなダガー線形論理の圏論的証明論であり、有限次元ヒルベルト空間の圏における量子系の解釈に動機づけられている。この論文では、無限次元の系を扱うことができるMixed Unitary Categoriesと呼ばれる新しい非コンパクトな枠組みを説明し、その枠組みのモデルを構築する。この目的のために、(非コンパクトな)乗法的線形論理の圏論的証明論である線形分配圏と$*$-自律圏を土台とする。非コンパクトなダガー線形論理の証明論は、LDCの基本設定に、適切なコヒーレンス条件を満たすダガー関手を追加してダガーLDCとすることで得られる。すべての(isomix)ダガーLDCから、標準的な「ユニタリコア」を取り出すことができ、これは同値を除いて、ダガーモノイダル圏という従来のCQMの枠組みである。これがMixed Unitary Categories(MUC)の枠組みにつながる。すなわち、すべてのMUCは(コンパクトな)ユニタリコアを含み、それが(非コンパクトな)isomixダガーLDCによって拡張される。この論文では、Finiteness空間、Chu空間、Hopf加群などに基づくMUCの様々なモデルを構築する。また、可観測量、測定、相補性といったCQMの主要な代数構造をMUCの枠組みに一般化する。さらに、MUCの枠組みを用いて、量子力学の相補的な可観測量と線形論理の指数的モダリティとの間の関係を確立する。

This thesis develops the categorical proof theory for the non-compact multiplicative dagger linear logic, and investigates its applications to Categorical Quantum Mechanics (CQM). The existing frameworks of CQM are categorical proof theories of compact dagger linear logic, and are motivated by the interpretation of quantum systems in the category of finite dimensional Hilbert spaces. This thesis describes a new non-compact framework called Mixed Unitary Categories which can accommodate infinite dimensional systems, and develops models for the framework. To this end, it builds on linearly distributive categories, and $*$-autonomous categories which are categorical proof theories of (non-compact) multiplicative linear logic. The proof theory of non-compact dagger-linear logic is obtained from the basic setting of an LDC by adding a dagger functor satisfying appropriate coherences to give a dagger-LDC. From every (isomix) dagger-LDC one can extract a canonical "unitary core" which up to equivalence is the traditional CQM framework of dagger-monoidal categories. This leads to the framework of Mixed Unitary Categories (MUCs): every MUC contains a (compact) unitary core which is extended by a (non-compact) isomix dagger-LDC. Various models of MUCs based on Finiteness Spaces, Chu spaces, Hopf modules, etc., are developed in this thesis. This thesis also generalizes the key algebraic structures of CQM, such as observables, measurement, and complementarity, to MUC framework. Furthermore, using the MUC framework, this thesis establishes a connection between the complementary observables of quantum mechanics and the exponential modalities of linear logic.

翻訳日:2023-03-28 21:17:22 公開日:2023-03-24

# 効率的なマルチエージェント強化学習のための因果検出

Causality Detection for Efficient Multi-Agent Reinforcement Learning ( http://arxiv.org/abs/2303.14227v1 )

ライセンス: Link先を確認

Rafael Pina, Varuna De Silva and Corentin Artaud

(参考訳) チームとしてタスクを学習するとき、マルチエージェント強化学習(MARL)のエージェントの中には、チームのパフォーマンスに対する自身の真の影響を理解できないものがある。このようなエージェントは準最適な方策を学習してしまい、望ましくない怠惰な振る舞いを示す。この問題を調べるために、まずMARL問題に適用する時間的因果関係の利用を定式化する。次に、因果関係を用いてこのような怠惰なエージェントにペナルティを与え、その振る舞いを改善できることを示す。自身のローカルな観測がチーム報酬とどのように因果関係にあるかを理解することで、チームの各エージェントは、報酬の発生に寄与したかどうかに基づいて個々のクレジットを調整できる。 MARLにおいて因果推定を用いると、チーム全体のパフォーマンスだけでなく、各エージェントの個々の能力も向上することを実験的に示す。この改善は複数の異なる環境で一貫していることが観察された。

When learning a task as a team, some agents in Multi-Agent Reinforcement Learning (MARL) may fail to understand their true impact in the performance of the team. Such agents end up learning sub-optimal policies, demonstrating undesired lazy behaviours. To investigate this problem, we start by formalising the use of temporal causality applied to MARL problems. We then show how causality can be used to penalise such lazy agents and improve their behaviours. By understanding how their local observations are causally related to the team reward, each agent in the team can adjust their individual credit based on whether they helped to cause the reward or not. We show empirically that using causality estimations in MARL improves not only the holistic performance of the team, but also the individual capabilities of each agent. We observe that the improvements are consistent in a set of different environments.

翻訳日:2023-03-28 21:16:52 公開日:2023-03-24

# 合成結合:組合せ介入のための因果推論フレームワーク

Synthetic Combinations: A Causal Inference Framework for Combinatorial Interventions ( http://arxiv.org/abs/2303.14226v1 )

ライセンス: Link先を確認

Abhineet Agarwal, Anish Agarwal, Suhas Vijaykumar

(参考訳) 我々は、$N$個の異質なユニットと$p$個の介入がある設定を考える。我々の目標は、これら$p$個の介入の任意の組み合わせについて、ユニット固有の潜在的結果、すなわち$N \times 2^p$個の因果パラメータを学習することである。介入の組み合わせの選択は、要因計画実験、レコメンデーションエンジン(例えば、ユーザのエンゲージメントを最大化する映画の集合を表示する)、医学における併用療法、MLモデルの重要な特徴の選択など、多くのアプリケーションで自然に生じる問題である。各パラメータを推定するために$N \times 2^p$回の実験を行うことは、$N$と$p$が大きくなるにつれて実行不可能になる。さらに、観測データには交絡が存在する可能性が高い。すなわち、あるユニットがある組み合わせのもとで観測されるかどうかが、その組み合わせのもとでの潜在的結果と相関している。これらの課題に対処するため、ユニットと組み合わせの両方にわたって潜在構造を課す新しいモデルを提案する。ユニット間の潜在的な類似性(すなわち、潜在的結果の行列のランクが$r$であること)と、組み合わせの相互作用の規則性(すなわち、潜在的結果のフーリエ展開の係数が$s$スパースであること)を仮定する。未観測の交絡があるにもかかわらず、すべての因果パラメータの識別可能性を確立する。推定手順としてSynthetic Combinationsを提案し、観測パターンに関する厳密な条件のもとで有限サンプルでの一致性を確立する。この結果から、Synthetic Combinationsは$\text{poly}(r) \times (N + s^2p)$個の観測が与えられれば、ユニット固有の潜在的結果を一致推定できる。これに対し、ユニットと組み合わせの両方にわたる構造を活用しない従来の手法では、サンプル複雑度が$\min(N \times s^2p, \ \ r \times (N + 2^p))$でスケールする。 Synthetic Combinationsを用いて、組み合わせ的因果推論のためのデータ効率の高い実験計画メカニズムを提案する。数値シミュレーションによって理論的な知見を裏付ける。

We consider a setting with $N$ heterogeneous units and $p$ interventions. Our goal is to learn unit-specific potential outcomes for any combination of these $p$ interventions, i.e., $N \times 2^p$ causal parameters. Choosing combinations of interventions is a problem that naturally arises in many applications such as factorial design experiments, recommendation engines (e.g., showing a set of movies that maximizes engagement for users), combination therapies in medicine, selecting important features for ML models, etc. Running $N \times 2^p$ experiments to estimate the various parameters is infeasible as $N$ and $p$ grow. Further, with observational data there is likely confounding, i.e., whether or not a unit is seen under a combination is correlated with its potential outcome under that combination. To address these challenges, we propose a novel model that imposes latent structure across both units and combinations. We assume latent similarity across units (i.e., the potential outcomes matrix is rank $r$) and regularity in how combinations interact (i.e., the coefficients in the Fourier expansion of the potential outcomes is $s$ sparse). We establish identification for all causal parameters despite unobserved confounding. We propose an estimation procedure, Synthetic Combinations, and establish finite-sample consistency under precise conditions on the observation pattern. Our results imply Synthetic Combinations consistently estimates unit-specific potential outcomes given $\text{poly}(r) \times (N + s^2p)$ observations. In comparison, previous methods that do not exploit structure across both units and combinations have sample complexity scaling as $\min(N \times s^2p, \ \ r \times (N + 2^p))$. We use Synthetic Combinations to propose a data-efficient experimental design mechanism for combinatorial causal inference. We corroborate our theoretical findings with numerical simulations.
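
The sparsity assumption on the Fourier expansion can be made concrete with a small worked example. The sketch below is an illustration under stated assumptions rather than the paper's estimator: each combination of $p$ interventions is encoded as a vector in $\{-1, +1\}^p$ (+1 meaning "applied", an encoding assumed here), the outcome function `toy_outcome` is invented for the example, and the coefficients are computed by brute force over all $2^p$ combinations.

```python
from itertools import combinations, product
from math import prod

def fourier_coefficients(outcome, p):
    """Boolean Fourier coefficients of `outcome` over all 2^p combinations.

    The coefficient for a subset S is the average of outcome(x) * prod(x[i] for i in S)
    over every x in {-1, +1}^p.
    """
    points = list(product((-1, 1), repeat=p))
    coeffs = {}
    for size in range(p + 1):
        for subset in combinations(range(p), size):
            total = sum(outcome(x) * prod(x[i] for i in subset) for x in points)
            coeffs[subset] = total / len(points)
    return coeffs

def toy_outcome(x):
    """Invented outcome: a baseline, one main effect, and one pairwise interaction."""
    return 2.0 + 3.0 * x[0] - 1.5 * x[1] * x[2]

coeffs = fourier_coefficients(toy_outcome, p=3)
nonzero = {s: c for s, c in coeffs.items() if abs(c) > 1e-12}
print(len(coeffs), nonzero)
```

Here only 3 of the 8 possible coefficients are nonzero, so this toy outcome is $s$-sparse with $s = 3$. The abstract's sample-complexity claim relies on $s$ staying small while $2^p$ grows.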

翻訳日:2023-03-28 21:16:37 公開日:2023-03-24

# 新規炭素捕集溶媒の機械ガイドによる発見

Machine Guided Discovery of Novel Carbon Capture Solvents ( http://arxiv.org/abs/2303.14223v1 )

ライセンス: Link先を確認

James L. McDonagh, Benjamin H. Wunsch, Stamatia Zavitsanou, Alexander Harrison, Bruce Elmegreen, Stacey Gifford, Theodore van Kessel, and Flaviu Cipcigan

(参考訳) CO2排出の低減に向けて炭素捕集技術の重要性が高まっており、それに伴ってスケーラビリティと効率性を実現するために捕集材料を改良する必要があるが、材料開発には多大なコストと時間がかかりうるという課題がある。機械学習は、構造と特性の関係を効率的に相関付けることで、候補の絞り込みと有望な候補への集中を可能にし、材料開発の時間とリソースの負担を減らす有望な方法を提供する。これを実証するために、商業的に実用可能な酸性ガススクラビングによる炭素捕集に適合する新しい水性アミンを選定する、エンドツーエンドの「発見サイクル」を開発した。 CO2吸収に関する簡便で迅速な実験室アッセイと、機械学習に基づく分子フィンガープリントモデルのアプローチを組み合わせる。予測プロセスは、外部テストセットにおいて、両方の材料パラメータについては実験に対して60%、単一のパラメータについては80%の精度を示す。発見サイクルにより、実験的に検証され、これまで炭素捕集に応用されていなかった有望なアミンがいくつか特定された。その過程で、炭素捕集用アミンに関する大規模な単一ソースのデータセットを構築し、アミン分子候補を特定するためのオープンソースの機械学習ツールを作成した(https://github.com/IBM/Carbon-capture-fingerprint-generation)。

The increasing importance of carbon capture technologies for deployment in remediating CO2 emissions, and thus the necessity to improve capture materials to allow scalability and efficiency, faces the challenge of materials development, which can require substantial costs and time. Machine learning offers a promising method for reducing the time and resource burdens of materials development through efficient correlation of structure-property relationships to allow down-selection and focusing on promising candidates. Towards demonstrating this, we have developed an end-to-end "discovery cycle" to select new aqueous amines compatible with the commercially viable acid gas scrubbing carbon capture. We combine a simple, rapid laboratory assay for CO2 absorption with a machine learning based molecular fingerprinting model approach. The prediction process shows 60% accuracy against experiment for both material parameters and 80% for a single parameter on an external test set. The discovery cycle determined several promising amines that were verified experimentally, and which had not been applied to carbon capture previously. In the process we have compiled a large, single-source data set for carbon capture amines and produced an open source machine learning tool for the identification of amine molecule candidates (https://github.com/IBM/Carbon-capture-fingerprint-generation).

Translated: 2023-03-28 21:16:01 Published: 2023-03-24

# Lay Text Summarisation Using Natural Language Processing: A Narrative Literature Review ( http://arxiv.org/abs/2303.14222v1 )

License: see the linked page

Oliver Vinzelberg, Mark David Jenkins, Gordon Morison, David McMinn and Zoe Tieges

Summarisation of research results in plain language is crucial for promoting public understanding of research findings. The use of Natural Language Processing to generate lay summaries has the potential to relieve researchers' workload and bridge the gap between science and society. The aim of this narrative literature review is to describe and compare the different text summarisation approaches used to generate lay summaries. We searched the databases Web of Science, Google Scholar, IEEE Xplore, Association for Computing Machinery Digital Library and arXiv for articles published until 6 May 2022. We included original studies on automatic text summarisation methods to generate lay summaries. We screened 82 articles and included eight relevant papers published between 2020 and 2021, all using the same dataset. The results show that transformer-based methods such as Bidirectional Encoder Representations from Transformers (BERT) and Pre-training with Extracted Gap-sentences for Abstractive Summarization (PEGASUS) dominate the landscape of lay text summarisation, with all but one study using these methods. A combination of extractive and abstractive summarisation methods in a hybrid approach was found to be most effective. Furthermore, pre-processing approaches to input text (e.g. applying extractive summarisation), or determining which sections of a text to include, appear critical. Evaluation metrics such as Recall-Oriented Understudy for Gisting Evaluation (ROUGE) were used, which do not consider readability. To conclude, automatic lay text summarisation is under-explored. Future research should consider long document lay text summarisation, including clinical trial reports, and the development of evaluation metrics that consider readability of the lay summary.
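
The remark that ROUGE does not consider readability can be made concrete with a small sketch. The function below is a simplified unigram-recall variant (ROUGE-1 recall with whitespace tokenisation), written for illustration only; it is not the evaluation code of any reviewed study, and the example sentences are invented.

```python
from collections import Counter


def rouge1_recall(reference: str, candidate: str) -> float:
    """Unigram recall: share of reference tokens that also occur in the candidate."""
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    # Clipped overlap: a token counts at most as often as it appears in both texts.
    overlap = sum(min(count, cand_counts[token]) for token, count in ref_counts.items())
    total = sum(ref_counts.values())
    return overlap / total if total else 0.0


# Invented example: two candidate summaries of the same (invented) reference.
reference = "the drug lowers blood pressure in adults"
plain = "the drug lowers blood pressure"
jargon = "the drug lowers blood pressure via angiotensin receptor antagonism"

score_plain = rouge1_recall(reference, plain)
score_jargon = rouge1_recall(reference, jargon)
print(score_plain, score_jargon)
```

Both candidates share the same five reference tokens, so they receive the same score even though the second is clearly harder for a lay reader, which is the limitation the review points to.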

Translated: 2023-03-28 21:15:39 Published: 2023-03-24

# The Battle of Information Representations: Comparing Sentiment and Semantic Features for Forecasting Market Trends ( http://arxiv.org/abs/2303.14221v1 )

License: see the linked page

Andrei Zaichenko, Aleksei Kazakov, Elizaveta Kovtun, and Semen Budennyy

Studying the stock market with machine learning approaches is a major direction for revealing hidden market regularities. This knowledge contributes to a profound understanding of financial market dynamics and yields behavioural insights that could hardly be discovered with traditional analytical methods. Stock prices are inherently interrelated with world events and social perception. Thus, in constructing a model for stock price prediction, the critical stage is to incorporate such information on the outside world, reflected through news and social media posts. To accommodate this, researchers leverage implicit or explicit knowledge representations: (1) sentiments extracted from the texts or (2) raw text embeddings. However, little research attention has been paid to the direct comparison of these approaches in terms of their influence on the predictive power of financial models. In this paper, we aim to close this gap and determine whether semantic features in the form of contextual embeddings are more valuable than sentiment attributes for forecasting market trends. We consider a corpus of Twitter posts related to the largest NASDAQ companies by capitalization, together with their closing prices. To start, we demonstrate the connection between tweet sentiments and the volatility of companies' stock prices. Convinced of the existing relationship, we train Temporal Fusion Transformer models for price prediction supplemented with either tweet sentiments or tweet embeddings. Our results show that in a substantial majority of cases, the use of sentiment features leads to higher metrics. Notably, these conclusions are justifiable within the considered scenario involving Twitter posts and stocks of the biggest tech companies.

Translated: 2023-03-28 21:15:10 Published: 2023-03-24

# Variational Inference for Longitudinal Data Using Normalizing Flows ( http://arxiv.org/abs/2303.14220v1 )

License: see the linked page

Clément Chadebec and Stéphanie Allassonnière

This paper introduces a new latent variable generative model that can handle high-dimensional longitudinal data and relies on variational inference. The time dependency between the observations of an input sequence is modelled using normalizing flows over the associated latent variables. The proposed method can be used to generate either fully synthetic longitudinal sequences or trajectories that are conditioned on several data in a sequence, and it demonstrates good robustness to missing data. We test the model on 6 datasets of different complexity and show that it can achieve better likelihood estimates than some competitors as well as more reliable missing data imputation. Code is available at https://github.com/clementchadebec/variational_inference_for_longitudinal_data.

Translated: 2023-03-28 21:14:45 Published: 2023-03-24

# Curricular Contrastive Regularization for Physics-aware Single Image Dehazing ( http://arxiv.org/abs/2303.14218v1 )

License: see the linked page

Yu Zheng, Jiahui Zhan, Shengfeng He, Junyu Dong, and Yong Du

Considering the ill-posed nature of the problem, contrastive regularization has been developed for single image dehazing, introducing the information from negative images as a lower bound. However, the contrastive samples are nonconsensual, as the negatives are usually represented distantly from the clear (i.e., positive) image, leaving the solution space still under-constricted. Moreover, the interpretability of deep dehazing models is underexplored with respect to the physics of the hazing process. In this paper, we propose a novel curricular contrastive regularization targeted at a consensual contrastive space as opposed to a non-consensual one. Our negatives, which provide better lower-bound constraints, can be assembled from 1) the hazy image, and 2) corresponding restorations by other existing methods. Further, due to the different similarities between the embeddings of the clear image and negatives, the learning difficulty of the multiple components is intrinsically imbalanced. To tackle this issue, we customize a curriculum learning strategy to reweight the importance of different negatives. In addition, to improve the interpretability in the feature space, we build a physics-aware dual-branch unit according to the atmospheric scattering model. With the unit, as well as curricular contrastive regularization, we establish our dehazing network, named C2PNet. Extensive experiments demonstrate that our C2PNet significantly outperforms state-of-the-art methods, with extreme PSNR boosts of 3.94dB and 1.50dB, respectively, on SOTS-indoor and SOTS-outdoor datasets.
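
The atmospheric scattering model that the physics-aware unit refers to is commonly written as I = J·t + A·(1 − t), with transmission t = exp(−β·d). The following is a minimal NumPy sketch of that standard formula, not of C2PNet itself; the function names, the scalar airlight, and the assumption that depth is known are all illustrative choices.

```python
import numpy as np


def add_haze(clear, depth, beta=1.0, airlight=0.9):
    """Synthesize a hazy image: I = J * t + A * (1 - t), with t = exp(-beta * depth)."""
    t = np.exp(-beta * depth)[..., None]  # transmission, broadcast over colour channels
    return clear * t + airlight * (1.0 - t)


def remove_haze(hazy, depth, beta=1.0, airlight=0.9, t_min=0.05):
    """Invert the model for J, assuming depth, beta and airlight are known."""
    # Clamp the transmission to avoid dividing by values close to zero.
    t = np.maximum(np.exp(-beta * depth), t_min)[..., None]
    return (hazy - airlight * (1.0 - t)) / t


rng = np.random.default_rng(0)
clear = rng.random((4, 4, 3))        # toy RGB image with values in [0, 1)
depth = rng.random((4, 4)) * 2.0     # toy depth map, so t stays above t_min
hazy = add_haze(clear, depth)
recovered = remove_haze(hazy, depth)
print(np.abs(recovered - clear).max())
```

In practice neither the depth nor the airlight is known, which is why the problem is ill-posed and why learned models are used; the sketch only shows the forward model and its exact inverse when everything is given.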

Translated: 2023-03-28 21:14:33 Published: 2023-03-24

# SU(2) Non-Abelian Gauge Theory on a Plaquette Chain Obeys Eigenstate Thermalization Hypothesis ( http://arxiv.org/abs/2303.14264v1 )

License: see the linked page

Xiaojun Yao

We test the eigenstate thermalization hypothesis (ETH) for 2+1 dimensional SU(2) lattice gauge theory. By considering the theory on a chain of plaquettes and truncating basis states for link variables at $j=1/2$, we can map it onto an Ising chain and numerically exactly diagonalize the Hamiltonian for reasonably large lattice sizes. We find energy level repulsion in momentum sectors with no remaining discrete symmetry. We study two local observables made up of Wilson loops and calculate their matrix elements in the energy eigenbasis, which are shown to be consistent with the ETH. Our study implies that a subset of states in the physical Hilbert space of Quantum Chromodynamics (QCD) obeys the ETH.

Translated: 2023-03-28 21:07:18 Published: 2023-03-24

# PACE: Data-Driven Virtual Agent Interaction in Dense and Cluttered Environments ( http://arxiv.org/abs/2303.14255v1 )

License: see the linked page

James Mullen, Dinesh Manocha

We present PACE, a novel method for modifying motion-captured virtual agents to interact with and move throughout dense, cluttered 3D scenes. Our approach changes a given motion sequence of a virtual agent as needed to adjust to the obstacles and objects in the environment. We first take the individual frames of the motion sequence most important for modeling interactions with the scene and pair them with the relevant scene geometry, obstacles, and semantics such that interactions in the agent's motion match the affordances of the scene (e.g., standing on a floor or sitting in a chair). We then optimize the motion of the human by directly altering the high-DOF pose at each frame in the motion to better account for the unique geometric constraints of the scene. Our formulation uses novel loss functions that maintain a realistic flow and natural-looking motion. We compare our method with prior motion generating techniques and highlight the benefits of our method with a perceptual study and physical plausibility metrics. Human raters preferred our method over the prior approaches. Specifically, they preferred our method 57.1% of the time versus the state-of-the-art method using existing motions, and 81.0% of the time versus a state-of-the-art motion synthesis method. Additionally, our method scores significantly higher on established physical plausibility and interaction metrics. Specifically, we outperform competing methods by over 1.2% in terms of the non-collision metric and by over 18% in terms of the contact metric. We have integrated our interactive system with Microsoft HoloLens and demonstrate its benefits in real-world indoor scenes. Our project website is available at https://gamma.umd.edu/pace/.

Translated: 2023-03-28 21:07:05 Published: 2023-03-24

# Towards Diverse and Coherent Augmentation for Time-Series Forecasting ( http://arxiv.org/abs/2303.14254v1 )

License: see the linked page

Xiyuan Zhang, Ranak Roy Chowdhury, Jingbo Shang, Rajesh Gupta, Dezhi Hong

Time-series data augmentation mitigates the issue of insufficient training data for deep learning models. Yet, existing augmentation methods are mainly designed for classification, where class labels can be preserved even if augmentation alters the temporal dynamics. We note that augmentation designed for forecasting requires diversity as well as coherence with the original temporal dynamics. As time-series data generated by real-life physical processes exhibit characteristics in both the time and frequency domains, we propose to combine Spectral and Time Augmentation (STAug) for generating more diverse and coherent samples. Specifically, in the frequency domain, we use the Empirical Mode Decomposition to decompose a time series and reassemble the subcomponents with random weights. This way, we generate diverse samples that remain coherent with the original temporal relationships, as they contain the same set of base components. In the time domain, we adapt a mix-up strategy that generates diverse as well as linearly in-between coherent samples. Experiments on five real-world time-series datasets demonstrate that STAug outperforms the base models without data augmentation as well as state-of-the-art augmentation methods.
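
The time-domain mix-up step can be sketched as a convex combination of two training windows. This is a generic mix-up sketch, not the STAug implementation: the Beta(α, α) sampling of the mixing weight is the usual mix-up convention and is an assumption here, as are the function name and the toy series. The frequency-domain step (Empirical Mode Decomposition) is not shown.

```python
import numpy as np


def mixup_series(series_a, series_b, alpha=0.5, rng=None):
    """Return lam * a + (1 - lam) * b with lam ~ Beta(alpha, alpha), plus lam itself."""
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)
    # Each window holds history and forecast horizon together, so both are
    # mixed with the same weight and stay consistent with each other.
    return lam * series_a + (1.0 - lam) * series_b, lam


rng = np.random.default_rng(0)
time = np.linspace(0.0, 4.0 * np.pi, 64)
window_a = np.sin(time)                   # toy window 1
window_b = 0.5 * np.sin(time) + 0.1 * time  # toy window 2
mixed, lam = mixup_series(window_a, window_b, rng=rng)
print(round(float(lam), 3), mixed.shape)
```

Because the result is a pointwise convex combination, every mixed value lies between the two originals, which is one way to read "linearly in-between" in the abstract.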

Translated: 2023-03-28 21:06:39 Published: 2023-03-24

# A-MuSIC: An Adaptive Ensemble System For Visual Place Recognition In Changing Environments ( http://arxiv.org/abs/2303.14247v1 )

License: see the linked page

Bruno Arcanjo, Bruno Ferrarini, Michael Milford, Klaus D. McDonald-Maier and Shoaib Ehsan

Visual place recognition (VPR) is an essential component of robot navigation and localization systems that allows them to identify a place using only image data. VPR is challenging due to the significant changes in a place's appearance under different illumination throughout the day, with seasonal weather and when observed from different viewpoints. Currently, no single VPR technique excels in every environmental condition, each exhibiting unique benefits and shortcomings. As a result, VPR systems combining multiple techniques achieve more reliable VPR performance in changing environments, at the cost of higher computational loads. Addressing this shortcoming, we propose an adaptive VPR system dubbed Adaptive Multi-Self Identification and Correction (A-MuSIC). We start by developing a method to collect information on the runtime performance of a VPR technique by analysing the frame-to-frame continuity of matched queries. We then demonstrate how to operate the method on a static ensemble of techniques, generating data on which techniques contribute the most in the current environment. A-MuSIC uses the collected information both to select a minimal subset of techniques and to decide when a re-selection is required during navigation. A-MuSIC matches or beats state-of-the-art VPR performance across all tested benchmark datasets while maintaining its computational load on par with individual techniques.

Translated: 2023-03-28 21:06:22 Published: 2023-03-24

# Photocounting measurements with dead time and afterpulses in the continuous-wave regime ( http://arxiv.org/abs/2303.14246v1 )

License: see the linked page

A. A. Semenov, J. Samelin, Ch. Boldt, M. Schünemann, C. Reiher, W. Vogel, and B. Hage

The widely used experimental technique of continuous-wave detection assumes counting pulses of photocurrent from a click-type detector inside a given measurement time window. With such a procedure we miss the photons arriving after each photocurrent pulse during the detector dead time. Additionally, each pulse may trigger a so-called afterpulse, which is not associated with real photons. We derive the corresponding quantum photocounting formula and experimentally verify its validity. The statistics of photocurrent pulses appear to be nonlinear with respect to the quantum state, which is explained by the memory effect of the previous measurement time windows. Expressions, in general nonlinear, connecting the statistics of photons and pulses are derived for different measurement scenarios. We also consider an application of the obtained results to quantum state reconstruction with unbalanced homodyne detection.

Translated: 2023-03-28 21:05:56 Published: 2023-03-24

# Implicit Balancing and Regularization: Generalization and Convergence Guarantees for Overparameterized Asymmetric Matrix Sensing ( http://arxiv.org/abs/2303.14244v1 )

License: see the linked page

Mahdi Soltanolkotabi, Dominik Stöger, Changzhi Xie

Recently, there has been significant progress in understanding the convergence and generalization properties of gradient-based methods for training overparameterized learning models. However, many aspects, including the role of small random initialization and how the various parameters of the model are coupled during gradient-based updates to facilitate good generalization, remain largely mysterious. A series of recent papers have begun to study this role for non-convex formulations of symmetric Positive Semi-Definite (PSD) matrix sensing problems, which involve reconstructing a low-rank PSD matrix from a few linear measurements. The underlying symmetry/PSDness is crucial to existing convergence and generalization guarantees for this problem. In this paper, we study a general overparameterized low-rank matrix sensing problem where one wishes to reconstruct an asymmetric rectangular low-rank matrix from a few linear measurements. We prove that an overparameterized model trained via factorized gradient descent converges to the low-rank matrix generating the measurements. We show that in this setting, factorized gradient descent enjoys two implicit properties: (1) coupling of the trajectory of gradient descent where the factors are coupled in various ways throughout the gradient update trajectory and (2) an algorithmic regularization property where the iterates show a propensity towards low-rank models despite the overparameterized nature of the factorized model. These two implicit properties in turn allow us to show that the gradient descent trajectory from small random initialization moves towards solutions that are both globally optimal and generalize well.
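
Factorized gradient descent from a small random initialization can be illustrated on a toy problem. The sketch below is an assumption-laden illustration, not the paper's experimental setup: the dimensions, step size, initialization scale, Gaussian measurement matrices, and the least-squares loss (1/2m)·Σ(⟨A_i, UVᵀ⟩ − y_i)² are all chosen here for convenience, and the number of measurements is deliberately generous so that the toy run is stable.

```python
import numpy as np

rng = np.random.default_rng(0)
n1, n2, rank, width, m = 8, 6, 1, 4, 600   # width > rank: overparameterized factors

# Low-rank ground truth with unit Frobenius norm, and Gaussian measurement matrices A_i.
x_true = rng.standard_normal((n1, rank)) @ rng.standard_normal((rank, n2))
x_true /= np.linalg.norm(x_true)
sensing = rng.standard_normal((m, n1, n2))
y = np.einsum("mij,ij->m", sensing, x_true)   # y_i = <A_i, X*>

# Small random initialization of both factors.
init_scale, step, iters = 1e-3, 0.1, 1000
U = init_scale * rng.standard_normal((n1, width))
V = init_scale * rng.standard_normal((n2, width))

for _ in range(iters):
    residual = np.einsum("mij,ij->m", sensing, U @ V.T) - y
    grad_x = np.einsum("m,mij->ij", residual, sensing) / m   # gradient w.r.t. X = U V^T
    # Chain rule gives the factor gradients; both factors are updated simultaneously.
    U, V = U - step * grad_x @ V, V - step * grad_x.T @ U

x_hat = U @ V.T
rel_err = np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true)
singular_values = np.linalg.svd(x_hat, compute_uv=False)
print(f"relative error: {rel_err:.2e}")
print("singular values:", np.round(singular_values, 4))
```

Although the factors have four columns, the printed singular values should show that the iterate stays close to rank one, which is the kind of low-rank propensity the abstract describes. The toy run says nothing about the sample-complexity regime covered by the paper's guarantees.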

Translated: 2023-03-28 21:05:45 Published: 2023-03-24

# DyLiN: Making Light Field Networks Dynamic ( http://arxiv.org/abs/2303.14243v1 )

License: see the linked page

Heng Yu, Joel Julin, Zoltan A. Milacski, Koichiro Niinuma, Laszlo A. Jeni

Light Field Networks, the re-formulations of radiance fields to oriented rays, are orders of magnitude faster than their coordinate network counterparts, and provide higher fidelity with respect to representing 3D structures from 2D observations. They would be well suited for generic scene representation and manipulation, but suffer from one problem: they are limited to holistic and static scenes. In this paper, we propose the Dynamic Light Field Network (DyLiN) method that can handle non-rigid deformations, including topological changes. We learn a deformation field from input rays to canonical rays, and lift them into a higher dimensional space to handle discontinuities. We further introduce CoDyLiN, which augments DyLiN with controllable attribute inputs. We train both models via knowledge distillation from pretrained dynamic radiance fields. We evaluated DyLiN using both synthetic and real world datasets that include various non-rigid deformations. DyLiN qualitatively outperformed and quantitatively matched state-of-the-art methods in terms of visual fidelity, while being 25-71x computationally faster. We also tested CoDyLiN on attribute annotated data and it surpassed its teacher model. Project page: https://dylin2023.github.io .

Translated: 2023-03-28 21:05:18 Published: 2023-03-24

# IDGI: A Framework to Eliminate Explanation Noise from Integrated Gradients ( http://arxiv.org/abs/2303.14242v1 )

License: see the linked page

Ruo Yang, Binghui Wang, Mustafa Bilgic

Integrated Gradients (IG) as well as its variants are well-known techniques for interpreting the decisions of deep neural networks. While IG-based approaches attain state-of-the-art performance, they often integrate noise into their explanation saliency maps, which reduces their interpretability. To minimize the noise, we examine the source of the noise analytically and propose a new approach to reduce the explanation noise based on our analytical findings. We propose the Important Direction Gradient Integration (IDGI) framework, which can be easily incorporated into any IG-based method that uses Riemann integration for integrated gradient computation. Extensive experiments with three IG-based methods show that IDGI improves them drastically on numerous interpretability metrics.
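
The Riemann integration that IDGI hooks into is the standard way plain Integrated Gradients is approximated. The sketch below shows only that baseline computation, not IDGI; the midpoint rule, the toy function `f`, and its hand-written gradient `grad_f` are assumptions made for a self-contained example.

```python
import numpy as np


def integrated_gradients(grad_fn, x, baseline, steps=50):
    """Midpoint Riemann-sum approximation of IG along the straight path baseline -> x."""
    alphas = (np.arange(steps) + 0.5) / steps
    total = np.zeros_like(x, dtype=float)
    for a in alphas:
        total += grad_fn(baseline + a * (x - baseline))
    # Average gradient along the path, scaled by the input difference.
    return (x - baseline) * total / steps


def f(z):
    """Toy scalar model: sum of squares plus one interaction term."""
    return float(np.sum(z ** 2) + z[0] * z[1])


def grad_f(z):
    """Analytic gradient of f."""
    g = 2.0 * z
    g[0] += z[1]
    g[1] += z[0]
    return g


x = np.array([1.0, -2.0, 0.5])
baseline = np.zeros(3)
attributions = integrated_gradients(grad_f, x, baseline)
print(attributions, attributions.sum(), f(x) - f(baseline))
```

The attributions sum to f(x) − f(baseline), the completeness property of IG. For a real network the gradient comes from automatic differentiation, and the per-step gradients are where the explanation noise discussed in the abstract can enter.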

Translated: 2023-03-28 21:05:00 Published: 2023-03-24

# Core-based Trend Detection in Blockchain Networks ( http://arxiv.org/abs/2303.14241v1 )

License: see the linked page

Jason Zhu, Arijit Khan, Cuneyt Gurcan Akcora

Blockchains are now significantly easing trade finance, with billions of dollars worth of assets being transacted daily. However, analyzing these networks remains challenging due to the large size and complexity of the data. We introduce a scalable approach called "InnerCore" for identifying key actors in blockchain-based networks and providing a sentiment indicator for the networks using data depth-based core decomposition and centered-motif discovery. InnerCore is a computationally efficient, unsupervised approach suitable for analyzing large temporal graphs. We demonstrate its effectiveness through case studies on the recent collapse of LunaTerra and the Proof-of-Stake (PoS) switch of Ethereum, using external ground truth collected by a leading blockchain analysis company. Our experiments show that InnerCore can match the qualified analysis accurately without human involvement, automating blockchain analysis and its trend detection in a scalable manner.
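
For readers unfamiliar with core decomposition, the sketch below computes classic degree-based k-core numbers by repeatedly peeling the lowest-degree node. It is a stand-in for illustration only: InnerCore uses a data depth-based decomposition, which is not reproduced here, and the toy graph is invented.

```python
def core_numbers(adjacency):
    """Core number of every node in an undirected graph given as {node: set(neighbours)}."""
    degree = {node: len(nbrs) for node, nbrs in adjacency.items()}
    remaining = set(adjacency)
    core = {}
    k = 0
    while remaining:
        # Peel the node with the smallest remaining degree.
        node = min(remaining, key=degree.get)
        k = max(k, degree[node])
        core[node] = k
        remaining.remove(node)
        for nbr in adjacency[node]:
            if nbr in remaining:
                degree[nbr] -= 1
    return core


# Toy graph: a triangle a-b-c with one pendant node d attached to a.
graph = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
cores = core_numbers(graph)
print(cores)
```

Nodes in the triangle end up in the 2-core while the pendant node sits in the 1-core, so the innermost core isolates the most densely connected actors. This quadratic-time version is for clarity; bucket-based implementations run in linear time.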

Translated: 2023-03-28 21:04:49 Published: 2023-03-24

# Adaptive Base-class Suppression and Prior Guidance Network for One-Shot Object Detection ( http://arxiv.org/abs/2303.14240v1 )

License: see the linked page

Wenwen Zhang, Xinyu Xiao, Hangguan Shan and Eryun Liu

One-shot object detection (OSOD) aims to detect all object instances of the category specified by a query image. Most existing studies in OSOD endeavor to explore effective cross-image correlation and alleviate semantic feature misalignment, but ignore the model's bias towards the base classes and the resulting generalization degradation on the novel classes. Observing this, we propose a novel framework, namely the Base-class Suppression and Prior Guidance (BSPG) network, to overcome the problem. Specifically, the objects of base categories can be explicitly detected by a base-class predictor and adaptively eliminated by our base-class suppression module. Moreover, a prior guidance module is designed to calculate the correlation of high-level features in a non-parametric manner, producing a class-agnostic prior map to provide the target features with rich semantic cues and guide the subsequent detection process. Equipped with the proposed two modules, we endow the model with a strong discriminative ability to distinguish the target objects from distractors belonging to the base classes. Extensive experiments show that our method outperforms the previous techniques by a large margin and achieves new state-of-the-art performance under various evaluation settings.

Translated: 2023-03-28 21:04:33 Published: 2023-03-24

# Efficient Lipschitzian Global Optimization of Hölder Continuous Multivariate Functions ( http://arxiv.org/abs/2303.14293v1 )

License: see the linked page

Kaan Gokcesu, Hakan Gokcesu

This study presents an effective global optimization technique designed for multivariate functions that are Hölder continuous. Unlike traditional methods that construct lower bounding proxy functions, this algorithm employs a predetermined query creation rule that makes it computationally superior. The algorithm's performance is assessed using the average or cumulative regret, which also implies a bound for the simple regret and reflects the overall effectiveness of the approach. The results show that with appropriate parameters the algorithm attains an average regret bound of $O(T^{-\frac{\alpha}{n}})$ for optimizing a Hölder continuous target function with Hölder exponent $\alpha$ in an $n$-dimensional space within a given time horizon $T$. We demonstrate that this bound is minimax optimal.
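
For reference, the quantities named in the abstract can be written out as follows. These are the standard textbook definitions for a minimization problem; the symbols $L$, $\mathcal{X}$, $x_t$ and $\bar{R}_T$ are notation assumed here and are not taken from the abstract.

```latex
% Hölder continuity with exponent \alpha \in (0, 1] and constant L > 0:
\[
  |f(x) - f(y)| \le L \, \|x - y\|^{\alpha}
  \quad \text{for all } x, y \in \mathcal{X} \subset \mathbb{R}^n .
\]
% Average regret after T queries x_1, \dots, x_T, and the rate stated in the abstract:
\[
  \bar{R}_T = \frac{1}{T} \sum_{t=1}^{T} \Bigl( f(x_t) - \min_{x \in \mathcal{X}} f(x) \Bigr)
  = O\!\left( T^{-\alpha/n} \right).
\]
% The best query is at least as good as the average one, so the simple regret obeys
\[
  \min_{1 \le t \le T} f(x_t) - \min_{x \in \mathcal{X}} f(x) \le \bar{R}_T .
\]
```

One way to see why the exponent $\alpha/n$ is natural: $T$ points placed on a regular grid in $[0,1]^n$ are spaced roughly $T^{-1/n}$ apart, so some grid point lies within about that distance of a minimizer, and Hölder continuity bounds its excess value by a constant times $T^{-\alpha/n}$.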

Translated: 2023-03-28 20:58:18 Published: 2023-03-24

# Applications of Gaussian Processes at Extreme Lengthscales: From Molecules to Black Holes ( http://arxiv.org/abs/2303.14291v1 )

License: see the linked page

Ryan-Rhys Griffiths

In many areas of the observational and experimental sciences, data is scarce. Data observation in high-energy astrophysics is disrupted by celestial occlusions and limited telescope time, while data derived from laboratory experiments in synthetic chemistry and materials science is time- and cost-intensive to collect. On the other hand, knowledge about the data-generation mechanism is often available in the sciences, such as the measurement error of a piece of laboratory apparatus. Both characteristics, small data and knowledge of the underlying physics, make Gaussian processes (GPs) ideal candidates for fitting such datasets. GPs can make predictions with consideration of uncertainty, for example in the virtual screening of molecules and materials, and can also make inferences about incomplete data such as the latent emission signature from a black hole accretion disc. Furthermore, GPs are currently the workhorse model for Bayesian optimisation, a methodology foreseen to be a guide for laboratory experiments in scientific discovery campaigns. The first contribution of this thesis is to use GP modelling to reason about the latent emission signature from the Seyfert galaxy Markarian 335, and by extension, to reason about the applicability of various theoretical models of black hole accretion discs. The second contribution is to extend the GP framework to molecular and chemical reaction representations and to provide an open-source software library to enable the framework to be used by scientists. The third contribution is to leverage GPs to discover novel and performant photoswitch molecules. The fourth contribution is to introduce a Bayesian optimisation scheme capable of modelling aleatoric uncertainty to facilitate the identification of material compositions that possess intrinsic robustness to large scale fabrication processes.

翻訳日:2023-03-28 20:58:05 公開日:2023-03-24

# 生活支援におけるニュース検索改善のための音声対話エージェントと知識グラフ

Voice-Based Conversational Agents and Knowledge Graphs for Improving News Search in Assisted Living ( http://arxiv.org/abs/2303.14286v1 )

ライセンス: Link先を確認

Phillip Schneider, Nils Rehtanz, Kristiina Jokinen and Florian Matthes

(参考訳) 医療分野は高齢化、スタッフ不足、一般的な慢性疾患といった大きな課題に直面しており、個人に高品質なケアを提供することは非常に難しくなっている。会話エージェントは、これらの問題の一部を緩和する有望な技術であることが示されている。デジタルヘルスアシスタントの形で、高齢者や慢性疾患患者の日常生活を改善する可能性を秘めている。これには例えば、服薬リマインダー、定期的なチェック、雑談が含まれる。さらに、会話エージェントは、日々のニュースや地域のイベントに関する情報にアクセスしたいという基本的なニーズを満たすことができ、それによって個人は情報を得続け、周囲の世界とつながり続けることができる。しかし、関連するニュースソースを見つけ、オンラインで利用可能な膨大なニュース記事の中を探し回ることは、特に技術リテラシーが限られている人や健康上の障害を抱える人にとっては大きな負担になりうる。この課題に対処するために、生活支援におけるニュース検索のために知識グラフと会話エージェントを組み合わせた革新的なソリューションを提案する。グラフデータベースを利用してニュースデータを意味的に構造化し、直感的な音声ベースのインターフェースを実装することで、本システムは介護を必要とする人々が関連するニュース記事を容易に見つけられるよう支援し、パーソナライズされた推薦を提供できる。設計上の選択を説明し、システムアーキテクチャを示し、初期ユーザテストから得られた知見を共有し、今後の計画について展望を述べる。

As the healthcare sector is facing major challenges, such as aging populations, staff shortages, and common chronic diseases, delivering high-quality care to individuals has become very difficult. Conversational agents have been shown to be a promising technology to alleviate some of these issues. In the form of digital health assistants, they have the potential to improve the everyday life of elderly and chronically ill people. This includes, for example, medication reminders, routine checks, or social chit-chat. In addition, conversational agents can satisfy the fundamental need of having access to information about daily news or local events, which enables individuals to stay informed and connected with the world around them. However, finding relevant news sources and navigating the plethora of news articles available online can be overwhelming, particularly for those who may have limited technological literacy or health-related impairments. To address this challenge, we propose an innovative solution that combines knowledge graphs and conversational agents for news search in assisted living. By leveraging graph databases to semantically structure news data and implementing an intuitive voice-based interface, our system can help care-dependent people to easily discover relevant news articles and give personalized recommendations. We explain our design choices, provide a system architecture, share insights from an initial user test, and give an outlook on planned future work.

翻訳日:2023-03-28 20:57:36 公開日:2023-03-24

# ロジスティック回帰のための特徴空間スケッチ

Feature Space Sketching for Logistic Regression ( http://arxiv.org/abs/2303.14284v1 )

ライセンス: Link先を確認

Gregory Dexter, Rajiv Khanna, Jawad Raheel, and Petros Drineas

(参考訳) 本稿では、ロジスティック回帰のためのコアセット構築、特徴選択、次元削減に関する新しい境界を提示する。これら3つのアプローチはいずれも、ロジスティック回帰の入力のスケッチと見なすことができる。コアセット構築に関しては、先行研究の未解決問題を解決し、コアセット構築手法の複雑さに対する新たな境界を提示する。特徴選択と次元削減に関しては、ロジスティック回帰に対する前方誤差境界の研究に着手する。我々の境界は定数倍を除いてタイトであり、前方誤差境界は一般化線形モデルに拡張できる。

We present novel bounds for coreset construction, feature selection, and dimensionality reduction for logistic regression. All three approaches can be thought of as sketching the logistic regression inputs. On the coreset construction front, we resolve open problems from prior work and present novel bounds for the complexity of coreset construction methods. On the feature selection and dimensionality reduction front, we initiate the study of forward error bounds for logistic regression. Our bounds are tight up to constant factors and our forward error bounds can be extended to Generalized Linear Models.

翻訳日:2023-03-28 20:57:12 公開日:2023-03-24

# 強化学習における変数選択のための逐次ノックオフ

Sequential Knockoffs for Variable Selection in Reinforcement Learning ( http://arxiv.org/abs/2303.14281v1 )

ライセンス: Link先を確認

Tao Ma, Hengrui Cai, Zhengling Qi, Chengchun Shi, Eric B. Laber

(参考訳) 強化学習の実世界への応用では、事前知識なしに、簡潔でマルコフ性を満たす状態表現を得ることはしばしば困難である。そのため、例えば連続する時点の測定値を連結することで、必要以上に大きな状態を構成するのが一般的である。しかし、状態の次元を不必要に増やすと、学習が遅くなり、学習された方策が分かりにくくなる。我々は、マルコフ決定過程(MDP)における最小十分状態の概念を、その下でも過程がMDPのままであり、元の過程と同じ最適方策を共有するような、元の状態の最小の部分ベクトルとして導入する。高次元で複雑な非線形ダイナミクスを持つ系において最小十分状態を推定する、新しい逐次ノックオフ(SEEK)アルゴリズムを提案する。大標本において、提案手法は偽発見率を制御し、1に近づく確率ですべての十分変数を選択する。本手法は適用する強化学習アルゴリズムに依存しないため、方策最適化などの下流タスクに有益である。実験により理論的結果が検証され、提案手法が変数選択精度とリグレットの点で複数の競合手法を上回ることが示された。

In real-world applications of reinforcement learning, it is often challenging to obtain a state representation that is parsimonious and satisfies the Markov property without prior knowledge. Consequently, it is common practice to construct a state which is larger than necessary, e.g., by concatenating measurements over contiguous time points. However, needlessly increasing the dimension of the state can slow learning and obfuscate the learned policy. We introduce the notion of a minimal sufficient state in a Markov decision process (MDP) as the smallest subvector of the original state under which the process remains an MDP and shares the same optimal policy as the original process. We propose a novel sequential knockoffs (SEEK) algorithm that estimates the minimal sufficient state in a system with high-dimensional complex nonlinear dynamics. In large samples, the proposed method controls the false discovery rate, and selects all sufficient variables with probability approaching one. As the method is agnostic to the reinforcement learning algorithm being applied, it benefits downstream tasks such as policy optimization. Empirical experiments verify theoretical results and show the proposed approach outperforms several competing methods in terms of variable selection accuracy and regret.

翻訳日:2023-03-28 20:57:04 公開日:2023-03-24

# 感情的・社会的規範的特徴を用いたソーシャルメディア投稿の抑うつ検出

Depression detection in social media posts using affective and social norm features ( http://arxiv.org/abs/2303.14279v1 )

ライセンス: Link先を確認

Ilias Triantafyllopoulos, Georgios Paraskevopoulos, Alexandros Potamianos

(参考訳) ソーシャルメディア投稿から抑うつを検出するための深層アーキテクチャを提案する。提案アーキテクチャはBERTに基づいてソーシャルメディア投稿から言語表現を抽出し、アテンション付き双方向GRUネットワークを用いてこれらの表現を組み合わせる。事前学習済みの感情分類器から抽出した特徴でテキスト表現を拡張することにより、感情情報を取り込む。心理学の文献に動機づけられ、投稿と単語の冒涜性(profanity)および道徳性の特徴を、後期融合方式でアーキテクチャに組み込むことを提案する。分析の結果、道徳性と冒涜性は抑うつ検出にとって重要な特徴になりうることが示された。我々はPirinaデータセットのReddit投稿に対して抑うつ検出モデルを適用し、さらにReddit RSDDデータセットで提案された、ユーザごとに複数の投稿が与えられたときに抑うつ状態のユーザを検出する設定も検討する。提案した特徴を加えることで、両方の設定で最先端の結果が得られ、F1スコアの絶対改善はそれぞれ2.65%と6.73%である。索引語: 抑うつ検出、BERT、特徴融合、感情認識、冒涜性、道徳性

We propose a deep architecture for depression detection from social media posts. The proposed architecture builds upon BERT to extract language representations from social media posts and combines these representations using an attentive bidirectional GRU network. We incorporate affective information, by augmenting the text representations with features extracted from a pretrained emotion classifier. Motivated by psychological literature we propose to incorporate profanity and morality features of posts and words in our architecture using a late fusion scheme. Our analysis indicates that morality and profanity can be important features for depression detection. We apply our model for depression detection on Reddit posts on the Pirina dataset, and further consider the setting of detecting depressed users, given multiple posts per user, proposed in the Reddit RSDD dataset. The inclusion of the proposed features yields state-of-the-art results in both settings, namely 2.65% and 6.73% absolute improvement in F1 score respectively. Index Terms: Depression detection, BERT, Feature fusion, Emotion recognition, profanity, morality

翻訳日:2023-03-28 20:56:42 公開日:2023-03-24

# プランニングモデルの適用によるオープンワールドでの運用の学習

Learning to Operate in Open Worlds by Adapting Planning Models ( http://arxiv.org/abs/2303.14272v1 )

ライセンス: Link先を確認

Wiktor Piotrowski and Roni Stern and Yoni Sher and Jacob Le and Matthew Klenk and Johan deKleer and Shiwali Mohan

(参考訳) プランニングエージェントは、ドメインモデルがもはや世界を正確に表現していない新規な状況ではうまく行動できない。オープンワールドで動作するそのようなエージェントのために、新規性の存在を検知し、ドメインモデルとそれに基づく行動選択を効果的に適応させるアプローチを提案する。このアプローチは、行動実行の観測を用い、環境モデルから期待されるものとの乖離を測定して、新規性の存在を推測する。そして、モデル変更の空間に対するヒューリスティック誘導探索によってモデルを修正する。標準的な強化学習(RL)ベンチマークであるCartPole問題での実験的評価を報告する。その結果、本手法はある種の新規性に非常に迅速かつ解釈可能な形で対処できることが示された。

Planning agents are ill-equipped to act in novel situations in which their domain model no longer accurately represents the world. We introduce an approach for such agents operating in open worlds that detects the presence of novelties and effectively adapts their domain models and consequent action selection. It uses observations of action execution and measures their divergence from what is expected, according to the environment model, to infer existence of a novelty. Then, it revises the model through a heuristics-guided search over model changes. We report empirical evaluations on the CartPole problem, a standard Reinforcement Learning (RL) benchmark. The results show that our approach can deal with a class of novelties very quickly and in an interpretable fashion.
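
The detection step described above (compare what the model expects with what is observed, then infer a novelty from the divergence) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the Euclidean distance, the fixed threshold, and the function names `divergence` and `novelty_detected` are all assumptions, and the state tuples are made-up CartPole-like values.

```python
import math

def divergence(expected_state, observed_state):
    """Euclidean distance between the predicted and the observed state (assumed metric)."""
    return math.dist(expected_state, observed_state)

def novelty_detected(expected_trajectory, observed_trajectory, threshold):
    """Flag a novelty when any step deviates from the model by more than `threshold`."""
    worst = max(
        divergence(e, o)
        for e, o in zip(expected_trajectory, observed_trajectory)
    )
    return worst > threshold

# Hypothetical states: (position, velocity, angle, angular velocity)
expected = [(0.00, 0.10, 0.02, 0.00), (0.01, 0.12, 0.02, -0.01)]
nominal = [(0.00, 0.10, 0.02, 0.00), (0.01, 0.12, 0.021, -0.01)]
shifted = [(0.00, 0.10, 0.02, 0.00), (0.05, 0.40, 0.10, -0.30)]

print(novelty_detected(expected, nominal, threshold=0.05))
print(novelty_detected(expected, shifted, threshold=0.05))
```

A detector of this kind only signals that the model is off; the model revision itself (the heuristics-guided search over model changes) is a separate step that the sketch does not cover.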

翻訳日:2023-03-28 20:56:23 公開日:2023-03-24

# 多様体上のカーネル回帰における不変性がもたらすサンプル複雑度の厳密な改善

The Exact Sample Complexity Gain from Invariances for Kernel Regression on Manifolds ( http://arxiv.org/abs/2303.14269v1 )

ライセンス: Link先を確認

Behrooz Tahmasebi, Stefanie Jegelka

(参考訳) 実用上、不変性をモデルに組み込むことはサンプル複雑度の改善に役立つ。本研究では、不変性がサンプル複雑度をどのように改善するかに関する理論的結果を厳密化し、一般化する。特に、多様体上の任意の群作用に対して不変な目的関数を持つ、任意の多様体上のカーネルリッジ回帰に対するミニマックス最適レートを与える。我々の結果は(ほぼ)任意の群作用に対して成り立ち、正の次元を持つ群に対しても成り立つ。有限群の場合、この利得は「有効」サンプル数を群のサイズ倍に増やす。正の次元の群の場合、利得は商空間の体積に比例する因子に加えて、多様体の次元の減少として現れる。我々の証明は、不変多項式を用いるより一般的な戦略とは対照的に、微分幾何学の観点を取る。したがって、不変性を伴う学習に対するこの新しい幾何学的視点は、それ自体として興味深いものとなりうる。

In practice, encoding invariances into models helps sample complexity. In this work, we tighten and generalize theoretical results on how invariances improve sample complexity. In particular, we provide minimax optimal rates for kernel ridge regression on any manifold, with a target function that is invariant to an arbitrary group action on the manifold. Our results hold for (almost) any group action, even groups of positive dimension. For a finite group, the gain increases the "effective" number of samples by the group size. For groups of positive dimension, the gain is observed by a reduction in the manifold's dimension, in addition to a factor proportional to the volume of the quotient space. Our proof takes the viewpoint of differential geometry, in contrast to the more common strategy of using invariant polynomials. Hence, this new geometric viewpoint on learning with invariances may be of independent interest.
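
One common way to encode invariance to a finite group into a kernel method is to average a base kernel over the group. The sketch below illustrates that idea only; it is not taken from the paper, whose contribution is the rate analysis. The RBF base kernel, the choice of the four planar rotations by multiples of 90 degrees as the group, and all function names are assumptions.

```python
import numpy as np

def rbf(x, y, gamma=1.0):
    """Gaussian (RBF) base kernel."""
    return float(np.exp(-gamma * np.sum((x - y) ** 2)))

def group_averaged_kernel(x, y, group, base=rbf):
    """Average the base kernel over a finite group acting on the second argument."""
    return sum(base(x, g(y)) for g in group) / len(group)

def rotation(k):
    """Rotation of the plane by k * 90 degrees, returned as a function."""
    c, s = np.cos(k * np.pi / 2), np.sin(k * np.pi / 2)
    R = np.array([[c, -s], [s, c]])
    return lambda v: R @ v

# Hypothetical example group: the four rotations by multiples of 90 degrees.
C4 = [rotation(k) for k in range(4)]

x = np.array([1.0, 0.5])
y = np.array([-0.3, 2.0])
k_xy = group_averaged_kernel(x, y, C4)
k_rot = group_averaged_kernel(C4[1](x), y, C4)  # rotate the first input by 90 degrees
print(np.isclose(k_xy, k_rot))
```

Because the RBF kernel depends only on the distance between its arguments, and rotations preserve distances, the averaged kernel is unchanged when either input is rotated by an element of the group. Any function fitted with it therefore inherits the invariance.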

翻訳日:2023-03-28 20:56:12 公開日:2023-03-24

# マルチモーダル受動センシングによるデータ駆動型ストレスモニタリングのための自己教師付きフレームワーク

A Self-supervised Framework for Improved Data-Driven Monitoring of Stress via Multi-modal Passive Sensing ( http://arxiv.org/abs/2303.14267v1 )

ライセンス: Link先を確認

Shayan Fazeli, Lionel Levine, Mehrab Beikzadeh, Baharan Mirzasoleiman, Bita Zadeh, Tara Peris, Majid Sarrafzadeh

(参考訳) 近年の遠隔医療モニタリングの進歩は, 患者の生活の質向上に重要な役割を担っている。しかしながら、生理学的な健康に焦点を当てたソリューションは成功と成熟度の向上を実証しているが、ストレスや不安障害が日常生活で人々が扱う最も一般的な問題であるにもかかわらず、メンタルヘルスに焦点を当てたアプリケーションは、比較的限られた成功を収めている。メンタルヘルスの指標を測定するためのより堅牢な分析フレームワークの開発を通じて、この領域のさらなる進展を期待するために、ストレス応答の生理的前駆体を追跡するための多モード半教師付きフレームワークを提案する。本手法は,ウェアラブル端末と異なる領域と解像度のマルチモーダルデータを利用して,短時間のエピソードを意味的に効率的な埋め込みにマッピングする。さらに、モジュラーとスケーラブルの両方でフレームワークをレンダリングする利点があるため、モダリティ間の対照的な目的も活用しています。階層構造による埋め込みの局所的側面とグローバルな側面の最適化に注力することで、知識の伝達と他のデバイスとの互換性の達成が容易になります。私たちのパイプラインでは、各モードのインスタンスレベルでの寄与を推定するアテンションメカニズムに基づくタスク固有のプーリングが、観測のための最終埋め込みを計算する。これはまた、データ特性に関する詳細な診断の洞察を提供し、メンタルヘルスステータス毎に注釈付けされたエピソードを予測するというより広い視点における信号の重要性を強調します。本研究は,実世界のデータを用いて,知覚的ストレスに対する学習実験を行い,提案手法の有効性を実証した。

Recent advances in remote health monitoring systems have significantly benefited patients and played a crucial role in improving their quality of life. However, while physiological health-focused solutions have demonstrated increasing success and maturity, mental health-focused applications have seen comparatively limited success in spite of the fact that stress and anxiety disorders are among the most common issues people deal with in their daily lives. In the hopes of furthering progress in this domain through the development of a more robust analytic framework for the measurement of indicators of mental health, we propose a multi-modal semi-supervised framework for tracking physiological precursors of the stress response. Our methodology enables utilizing multi-modal data of differing domains and resolutions from wearable devices and leveraging them to map short-term episodes to semantically efficient embeddings for a given task. Additionally, we leverage an inter-modality contrastive objective, with the advantages of rendering our framework both modular and scalable. The focus on optimizing both local and global aspects of our embeddings via a hierarchical structure renders transferring knowledge and compatibility with other devices easier to achieve. In our pipeline, a task-specific pooling based on an attention mechanism, which estimates the contribution of each modality on an instance level, computes the final embeddings for observations. This additionally provides a thorough diagnostic insight into the data characteristics and highlights the importance of signals in the broader view of predicting episodes annotated per mental health status. We perform training experiments using a corpus of real-world data on perceived stress, and our results demonstrate the efficacy of the proposed approach in performance improvements.

翻訳日:2023-03-28 20:56:00 公開日:2023-03-24

# クラスター型動的環境のための安全・サンプル効率強化学習

Safe and Sample-efficient Reinforcement Learning for Clustered Dynamic Environments ( http://arxiv.org/abs/2303.14265v1 )

ライセンス: Link先を確認

Hongyi Chen and Changliu Liu

(参考訳) 本研究では、実用的なRLアルゴリズムの開発における2つの大きな課題、すなわち安全制約を満たすことと限られたサンプルで効率的に学習することに対処する、安全かつサンプル効率のよい強化学習(RL)フレームワークを提案する。実世界の複雑な環境で安全性を保証するために、safe set algorithm(SSA)を用いてノミナル制御を監視・修正し、既存のRLアルゴリズムでは解くことが難しいクラスター化された動的環境でSSA+RLを評価する。しかし、SSA+RLフレームワークは通常、特に報酬が疎な環境ではサンプル効率が良くなく、この点は従来の安全なRLの研究では扱われてこなかった。学習効率を向上させるために、(1)SSAを適応させて過度に保守的な行動を避けること、(2)安全制約付きのランダムネットワーク蒸留を用いて安全な探索を促進すること、(3)SSAを専門家のデモンストレーションとして扱い、そこから直接学習することで方策の収束を改善すること、の3つの手法を提案する。実験の結果、我々のフレームワークは、他の安全なRL手法と比較して訓練中により良い安全性能を達成し、大幅に少ないエピソードでタスクを解けることが示された。プロジェクトWebサイト: https://hychen-naza.github.io/projects/Safe_RL/

This study proposes a safe and sample-efficient reinforcement learning (RL) framework to address two major challenges in developing applicable RL algorithms: satisfying safety constraints and learning efficiently with limited samples. To guarantee safety in real-world complex environments, we use the safe set algorithm (SSA) to monitor and modify the nominal controls, and evaluate SSA+RL in a clustered dynamic environment that is challenging for existing RL algorithms to solve. However, the SSA+RL framework is usually not sample-efficient, especially in reward-sparse environments, which has not been addressed in previous safe RL works. To improve the learning efficiency, we propose three techniques: (1) avoiding overly conservative behavior by adapting the SSA; (2) encouraging safe exploration using random network distillation with safety constraints; (3) improving policy convergence by treating SSA as expert demonstrations and learning directly from them. The experimental results show that our framework achieves better safety performance than other safe RL methods during training and solves the task with substantially fewer episodes. Project website: https://hychen-naza.github.io/projects/Safe_RL/.

翻訳日:2023-03-28 20:55:33 公開日:2023-03-24

# VILA:Vision-Language Pretrainingによるユーザコメントからイメージ美学を学ぶ

VILA: Learning Image Aesthetics from User Comments with Vision-Language Pretraining ( http://arxiv.org/abs/2303.14302v1 )

ライセンス: Link先を確認

Junjie Ke, Keren Ye, Jiahui Yu, Yonghui Wu, Peyman Milanfar, Feng Yang

(参考訳) 画像の審美性を評価することは、構成、色、スタイル、高レベルの意味論など、複数の要因に影響されるため、難しい。既存の画像美的評価法(IAA)は、人間が知覚する視覚的美的情報を過度に単純化する人間のラベル付き評価スコアに依存している。逆に、ユーザーコメントはより包括的な情報を提供し、画像美学に関する人間の意見や好みを表現する自然な方法である。そこで本研究では,ユーザのコメントからイメージ美学を学ぶこと,マルチモーダル美学表現を学習するための視覚言語事前学習法を提案する。具体的には、コントラスト的および生成的目的を用いて画像テキストエンコーダ-デコーダモデルを事前訓練し、人間のラベルなしでリッチで汎用的な美的意味学を学習する。下流のiaaタスクに事前学習したモデルを効率的に適応させるために,テキストをアンカーとして使用する軽量なランクベースアダプタを提案する。以上の結果から,AVA-Captionsデータセットによる画像の美的字幕化は従来よりも優れており,ゼロショットスタイル分類やゼロショットIAAなどの美的タスクには強力なゼロショット機能を備えており,多くの教師付きベースラインを超えていることがわかった。提案するアダプタモジュールを用いた最小限の微調整パラメータのみを用いて,AVAデータセット上での最先端IAA性能を実現する。

Assessing the aesthetics of an image is challenging, as it is influenced by multiple factors including composition, color, style, and high-level semantics. Existing image aesthetic assessment (IAA) methods primarily rely on human-labeled rating scores, which oversimplify the visual aesthetic information that humans perceive. Conversely, user comments offer more comprehensive information and are a more natural way to express human opinions and preferences regarding image aesthetics. In light of this, we propose learning image aesthetics from user comments, and exploring vision-language pretraining methods to learn multimodal aesthetic representations. Specifically, we pretrain an image-text encoder-decoder model with image-comment pairs, using contrastive and generative objectives to learn rich and generic aesthetic semantics without human labels. To efficiently adapt the pretrained model for downstream IAA tasks, we further propose a lightweight rank-based adapter that employs text as an anchor to learn the aesthetic ranking concept. Our results show that our pretrained aesthetic vision-language model outperforms prior works on image aesthetic captioning over the AVA-Captions dataset, and it has powerful zero-shot capability for aesthetic tasks such as zero-shot style classification and zero-shot IAA, surpassing many supervised baselines. With only minimal finetuning parameters using the proposed adapter module, our model achieves state-of-the-art IAA performance over the AVA dataset.

翻訳日:2023-03-28 20:46:38 公開日:2023-03-24

# repliclust: クラスター分析のための合成データ

repliclust: Synthetic Data for Cluster Analysis ( http://arxiv.org/abs/2303.14301v1 )

ライセンス: Link先を確認

Michael J. Zellinger and Peter Bühlmann

(参考訳) repliclust(repli-cateおよびclust-erより)は、クラスタを持つ合成データセットを生成するPythonパッケージである。本手法は、データセットアーキタイプ、すなわち高レベルな幾何学的記述に基づいており、ユーザはそこから、所望の幾何学的特徴を備えた多数の異なるデータセットを作成できる。ソフトウェアアーキテクチャはモジュール式かつオブジェクト指向であり、データ生成を、クラスタ中心の配置、クラスタ形状のサンプリング、クラスタごとのデータ点数の選択、クラスタへの確率分布の割り当てを行うアルゴリズムに分解している。プロジェクトのWebページ repliclust.org は、簡潔なユーザーガイドと詳細なドキュメントを提供する。

We present repliclust (from repli-cate and clust-er), a Python package for generating synthetic data sets with clusters. Our approach is based on data set archetypes, high-level geometric descriptions from which the user can create many different data sets, each possessing the desired geometric characteristics. The architecture of our software is modular and object-oriented, decomposing data generation into algorithms for placing cluster centers, sampling cluster shapes, selecting the number of data points for each cluster, and assigning probability distributions to clusters. The project webpage, repliclust.org, provides a concise user guide and thorough documentation.
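
The four-stage decomposition named in the abstract (place centers, sample shapes, choose per-cluster counts, assign distributions) can be mirrored in a few lines of plain NumPy. The sketch below deliberately does not use the repliclust package and makes no claim about its API. The function `make_clusters`, its parameters, and the specific choices (uniform centers, axis-aligned Gaussian clusters, Dirichlet-weighted counts) are all illustrative assumptions.

```python
import numpy as np

def make_clusters(n_clusters=3, dim=2, total_points=300, seed=0):
    """Generate a toy clustered data set in four stages (illustrative only)."""
    rng = np.random.default_rng(seed)
    # 1. Place cluster centers uniformly in a box.
    centers = rng.uniform(-10.0, 10.0, size=(n_clusters, dim))
    # 2. Sample cluster shapes: one standard deviation per axis and cluster.
    scales = rng.uniform(0.5, 2.0, size=(n_clusters, dim))
    # 3. Choose how many points each cluster receives.
    weights = rng.dirichlet(np.ones(n_clusters))
    counts = rng.multinomial(total_points, weights)
    # 4. Assign a distribution (here always Gaussian) and draw the points.
    X = np.vstack([
        rng.normal(centers[k], scales[k], size=(counts[k], dim))
        for k in range(n_clusters)
    ])
    y = np.repeat(np.arange(n_clusters), counts)
    return X, y

X, y = make_clusters()
print(X.shape, y.shape)
```

Keeping the stages separate is what makes such a generator modular: each one can be swapped (for example, heavy-tailed distributions in stage 4) without touching the others.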

翻訳日:2023-03-28 20:46:11 公開日:2023-03-24

# agilegan3d: 拡張トランスファー学習による3dポートレートスタイライゼーション

AgileGAN3D: Few-Shot 3D Portrait Stylization by Augmented Transfer Learning ( http://arxiv.org/abs/2303.14297v1 )

ライセンス: Link先を確認

Guoxian Song and Hongyi Xu and Jing Liu and Tiancheng Zhi and Yichun Shi and Jianfeng Zhang and Zihang Jiang and Jiashi Feng and Shen Sang and Linjie Luo

(参考訳) 自動2Dポートレートのスタイライズは大きく進歩しているが、1枚のユーザ写真から見栄えのよい3Dポートレートをスタイライズすることは未解決の課題のままである。主な障害の一つは、高品質なスタイライズ済み3D訓練データがないことである。本稿では、詳細な形状を持ち、芸術的に魅力的でパーソナライズされた3Dポートレートを生成できる新しいフレームワーク AgileGAN3D を提案する。新しいスタイライズは、わずか数枚(約20枚)のペアになっていない2Dの例から得られる。これを実現するために、まず既存の2Dスタイライズ機能を活用する「style prior creation」により、大量の拡張2Dスタイル例を生成する。これらの拡張例は、正確なカメラポーズラベルと、対になる実顔画像とともに生成され、これらは下流の3Dスタイライズタスクにとって重要であることが分かった。近年の3D対応GANモデルの進歩を活かし、事前学習済みの3D GANジェネレータ上で「guided transfer learning」を行い、マルチビューで一貫したスタイライズ済みレンダリングを生成する。被写体のアイデンティティをよく保持できる3D GAN逆変換を実現するために、エンコーダの訓練に「multi-view consistency loss」を組み込む。我々のパイプラインは、ユーザ写真を多様な3Dアートポートレートに変換する強力な能力を示す。本手法の優れた性能を示すために、定性的結果と定量的評価の両方を示した。コードと事前学習済みモデルは、再現のために公開される予定である。

While substantial progress has been made in automated 2D portrait stylization, admirable 3D portrait stylization from a single user photo remains an unresolved challenge. One primary obstacle is the lack of high-quality stylized 3D training data. In this paper, we propose a novel framework, AgileGAN3D, that can produce artistically appealing and personalized 3D portraits with detailed geometry. New stylization can be obtained with just a few (around 20) unpaired 2D exemplars. We achieve this by first leveraging existing 2D stylization capabilities, which we call style prior creation, to produce a large number of augmented 2D style exemplars. These augmented exemplars are generated with accurate camera pose labels, as well as paired real face images, which prove to be critical for the downstream 3D stylization task. Capitalizing on recent advances in 3D-aware GAN models, we perform guided transfer learning on a pretrained 3D GAN generator to produce multi-view-consistent stylized renderings. To achieve 3D GAN inversion that preserves the subject's identity well, we incorporate a multi-view consistency loss in the training of our encoder. Our pipeline demonstrates strong capability in turning user photos into a diverse range of 3D artistic portraits. Both qualitative results and quantitative evaluations have been conducted to show the superior performance of our method. Code and pretrained models will be released for reproducibility.

翻訳日:2023-03-28 20:46:00 公開日:2023-03-24

# グローバル感度解析と機械学習説明可能性のための導出型シェープリー値

Derivative-based Shapley value for global sensitivity analysis and machine learning explainability ( http://arxiv.org/abs/2303.15183v1 )

ライセンス: Link先を確認

Hui Duan and Giray Ökten

(参考訳) 我々は、グローバル感度分析と機械学習説明可能性のための新しいShapley値アプローチを導入する。この方法は基礎関数の1階部分微分に基づいている。この方法の計算複雑性は、文献における他のシェープリー値アプローチの指数複雑性とは対照的に、次元(特徴数)において線型である。グローバルな感度分析や機械学習の例を用いて、この手法をアクティビティスコア、SHAP、KernelSHAPと数値的に比較する。

We introduce a new Shapley value approach for global sensitivity analysis and machine learning explainability. The method is based on the first-order partial derivatives of the underlying function. The computational complexity of the method is linear in dimension (number of features), as opposed to the exponential complexity of other Shapley value approaches in the literature. Examples from global sensitivity analysis and machine learning are used to compare the method numerically with activity scores, SHAP, and KernelSHAP.

翻訳日:2023-03-28 15:24:09 公開日:2023-03-24

# グラフ自動コントラスト学習のハイブリッド化

Hybrid Augmented Automated Graph Contrastive Learning ( http://arxiv.org/abs/2303.15182v1 )

ライセンス: Link先を確認

Yifu Chen and Qianqian Ren and Liu Yong

(参考訳) グラフコントラスト学習にはグラフ拡張が不可欠である。既存研究の多くは事前に定義されたランダムな拡張を用いているが、これらは通常、異なる入力グラフに適応できず、異なるノードやエッジがグラフの意味に与える影響を考慮できない。この問題に対処するため、Hybrid Augmented Automated Graph Contrastive Learning(HAGCL)というフレームワークを提案する。HAGCLは、特徴レベルの学習可能なビュージェネレータとエッジレベルの学習可能なビュージェネレータで構成される。ビュージェネレータはエンドツーエンドで微分可能であり、入力グラフに条件付けられたビューの確率分布を学習する。これにより、特徴とトポロジーそれぞれの観点から、意味的に最も重要な構造が学習されるようにする。さらに、下流タスクにおける弱いラベル情報や追加作業の広範な評価に頼ることなく、従来研究より良い結果を達成できる改良された共同訓練戦略を提案する。

Graph augmentations are essential for graph contrastive learning. Most existing works use pre-defined random augmentations, which are usually unable to adapt to different input graphs and fail to consider the impact of different nodes and edges on graph semantics. To address this issue, we propose a framework called Hybrid Augmented Automated Graph Contrastive Learning (HAGCL). HAGCL consists of a feature-level learnable view generator and an edge-level learnable view generator. The view generators are end-to-end differentiable and learn the probability distribution of views conditioned on the input graph. This ensures that the most semantically meaningful structure is learned in terms of features and topology, respectively. Furthermore, we propose an improved joint training strategy, which can achieve better results than previous works without resorting to any weak label information in the downstream tasks and extensive evaluation of additional work.

翻訳日:2023-03-28 15:24:01 公開日:2023-03-24

# ISS++:テキストガイドによる3D形状生成のためのステッピングストーンとしてのイメージ

ISS++: Image as Stepping Stone for Text-Guided 3D Shape Generation ( http://arxiv.org/abs/2303.15181v1 )

ライセンス: Link先を確認

Zhengzhe Liu, Peng Dai, Ruihui Li, Xiaojuan Qi, Chi-Wing Fu

(参考訳) 本稿では、画像を足がかり(stepping stone)として用いてテキストと形状のモダリティ間のギャップを埋め、ペアになったテキストと3Dデータを必要とせずに3D形状を生成する、新しいテキスト誘導3D形状生成手法(ISS++)を提案する。提案手法の中核は、事前学習済みの単一視点再構成(SVR)モデルを活用してCLIP特徴を形状にマッピングする2段階の特徴空間アライメント戦略である。まずCLIP画像特徴をSVRモデルの詳細に富んだ3D形状空間にマッピングし、次にレンダリング画像と入力テキストの間のCLIP一貫性を促すことでCLIPテキスト特徴を3D形状空間にマッピングする。さらに、SVRモデルの生成能力を超えるために、新規な構造やテクスチャで出力形状を強化できるテキスト誘導3D形状スタイライズモジュールを設計する。加えて、事前学習済みのテキストから画像への拡散モデルを活用して、生成の多様性、忠実度、スタイライズ能力を高める。我々のアプローチは汎用的で柔軟かつスケーラブルであり、様々なSVRモデルと容易に統合して生成空間を拡大し、生成の忠実度を向上させることができる。広範な実験結果から、本手法は生成品質と入力テキストとの一貫性の点で最先端手法を上回ることが示された。コードとモデルは https://github.com/liuzhengzhe/ISS-Image-as-Stepping-Stone-for-Text-Guided-3D-Shape-Generation で公開されている。

In this paper, we present a new text-guided 3D shape generation approach (ISS++) that uses images as a stepping stone to bridge the gap between text and shape modalities for generating 3D shapes without requiring paired text and 3D data. The core of our approach is a two-stage feature-space alignment strategy that leverages a pre-trained single-view reconstruction (SVR) model to map CLIP features to shapes: to begin with, map the CLIP image feature to the detail-rich 3D shape space of the SVR model, then map the CLIP text feature to the 3D shape space through encouraging the CLIP-consistency between rendered images and the input text. Besides, to extend beyond the generative capability of the SVR model, we design a text-guided 3D shape stylization module that can enhance the output shapes with novel structures and textures. Further, we exploit pre-trained text-to-image diffusion models to enhance the generative diversity, fidelity, and stylization capability. Our approach is generic, flexible, and scalable, and it can be easily integrated with various SVR models to expand the generative space and improve the generative fidelity. Extensive experimental results demonstrate that our approach outperforms the state-of-the-art methods in terms of generative quality and consistency with the input text. Codes and models are released at https://github.com/liuzhengzhe/ISS-Image-as-Stepping-Stone-for-Text-Guided-3D-Shape-Generation.

翻訳日:2023-03-28 15:23:46 公開日:2023-03-24

# ブートストラップ強化学習による河川のロバストパス追従

Robust Path Following on Rivers Using Bootstrapped Reinforcement Learning ( http://arxiv.org/abs/2303.15178v1 )

ライセンス: Link先を確認

Niklas Paulig, Ostap Okhrin

(参考訳) 本稿では、内陸水路における自律水上船(ASV)の航行と制御のための深層強化学習(DRL)エージェントを開発する。水路の形状による空間的制約と、そこから生じる高い流速や浅い岸辺などの課題のため、ASVには制御された正確な動きが求められる。最先端のブートストラップ型Q学習アルゴリズムと汎用性の高い訓練環境ジェネレータを組み合わせることで、頑健で正確な舵コントローラが得られる。結果を検証するために、ライン川下流域および中流域の実世界の河川データ上で、提案手法の経路追従能力を船舶固有のPIDコントローラと比較した。その結果、DRLアルゴリズムは、未知のシナリオにおいても効果的に汎化しつつ、同時に高い航行精度を達成できることが示された。

This paper develops a Deep Reinforcement Learning (DRL) agent for navigation and control of autonomous surface vessels (ASVs) on inland waterways. Spatial restrictions due to waterway geometry and the resulting challenges, such as high flow velocities or shallow banks, require controlled and precise movement of the ASV. A state-of-the-art bootstrapped Q-learning algorithm in combination with a versatile training environment generator leads to a robust and accurate rudder controller. To validate our results, we compare the path-following capabilities of the proposed approach to a vessel-specific PID controller on real-world river data from the Lower and Middle Rhine. The comparison indicates that the DRL algorithm can generalize effectively even to previously unseen scenarios while simultaneously attaining high navigational accuracy.

翻訳日:2023-03-28 15:23:02 公開日:2023-03-24

# 正面視のためのNeRFおよびニューラルビュー合成法の知覚的品質評価

Perceptual Quality Assessment of NeRF and Neural View Synthesis Methods for Front-Facing Views ( http://arxiv.org/abs/2303.15206v1 )

ライセンス: Link先を確認

Hanxue Liang, Tianhao Wu, Param Hanji, Francesco Banterle, Hongyun Gao, Rafal Mantiuk, Cengiz Oztireli

(参考訳) ニューラルビュー合成(NVS)は、自由視点映像を合成する最も成功した手法の一つであり、疎な撮影画像の集合だけから高い忠実度を達成できる。この成功から多くの派生手法が生まれ、それぞれが通常、PSNR、SSIM、LPIPSといった画像品質指標を用いてテストビューの集合上で評価されている。一方、NVS手法が知覚される映像品質の面でどの程度の性能を示すかについては、研究が不足している。本研究は、NVSおよびNeRF派生手法の知覚的評価に関する最初の研究である。本研究のために、制御された実験室環境と実環境(in-the-wild)の両方で撮影したシーンからなる2つのデータセットを収集した。既存のデータセットとは対照的に、これらのシーンには参照映像シーケンスが付属しており、静止画像だけを見ると見落とされやすい時間的アーティファクトや微妙な歪みを検証できる。複数のNVS手法で合成した映像の品質を、よく制御された知覚品質評価実験と、多数の既存の最先端の画像・映像品質指標の両方で測定した。結果の詳細な分析と、NVS評価のためのデータセットおよび指標の選択に関する推奨事項を示す。

Neural view synthesis (NVS) is one of the most successful techniques for synthesizing free viewpoint videos, capable of achieving high fidelity from only a sparse set of captured images. This success has led to many variants of the techniques, each evaluated on a set of test views typically using image quality metrics such as PSNR, SSIM, or LPIPS. There has been a lack of research on how NVS methods perform with respect to perceived video quality. We present the first study on perceptual evaluation of NVS and NeRF variants. For this study, we collected two datasets of scenes captured in a controlled lab environment as well as in-the-wild. In contrast to existing datasets, these scenes come with reference video sequences, allowing us to test for temporal artifacts and subtle distortions that are easily overlooked when viewing only static images. We measured the quality of videos synthesized by several NVS methods in a well-controlled perceptual quality assessment experiment as well as with many existing state-of-the-art image/video quality metrics. We present a detailed analysis of the results and recommendations for dataset and metric selection for NVS evaluation.
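
For reference, PSNR, one of the per-image metrics named above, can be computed as in the following sketch. The helper names `psnr` and `mean_video_psnr` and the assumption that pixel values lie in [0, 1] are illustrative and not taken from the paper.

```python
import numpy as np

def psnr(reference, test, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two images of equal shape."""
    reference = np.asarray(reference, dtype=np.float64)
    test = np.asarray(test, dtype=np.float64)
    mse = np.mean((reference - test) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_value ** 2 / mse))

def mean_video_psnr(ref_frames, test_frames, max_value=1.0):
    """Average of per-frame PSNR values over a video."""
    return float(np.mean([psnr(r, t, max_value) for r, t in zip(ref_frames, test_frames)]))

reference = np.zeros((4, 4))
distorted = np.full((4, 4), 0.1)  # constant error of 0.1, so MSE = 0.01
print(psnr(reference, distorted))
```

Because such a score is computed frame by frame and then averaged, it is blind to how errors change between consecutive frames. That is consistent with the abstract's point that temporal artifacts are easy to miss without reference video sequences and perceptual experiments.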

翻訳日:2023-03-28 15:14:00 公開日:2023-03-24

# アウトカム駆動サブグループに向けて:6つのうつ病治療研究にわたる機械学習分析

Towards Outcome-Driven Patient Subgroups: A Machine Learning Analysis Across Six Depression Treatment Studies ( http://arxiv.org/abs/2303.15202v1 )

ライセンス: Link先を確認

David Benrimoh, Akiva Kleinerman, Toshi A. Furukawa, Charles F. Reynolds III, Eric Lenze, Jordan Karp, Benoit Mulsant, Caitrin Armstrong, Joseph Mehltretter, Robert Fratila, Kelly Perlman, Sonia Israel, Christina Popescu, Grace Golden, Sabrina Qassim, Alexandra Anacleto, Adam Kapelner, Ariel Rosenfeld, Gustavo Turecki

(参考訳) 主要なうつ病性障害(mdd)は不均一な疾患であり、複数の基礎となる神経生物学的基質が治療反応の変動と関連している可能性がある。この可変性と予測結果の源泉を理解することは明白である。機械学習はmddで治療反応を予測することが期待されているが、機械学習モデルの臨床的解釈性の欠如が制限されている。うつ病に対する薬理学的治療(total n = 5438)の6つの臨床試験から,治療関連患者クラスターの導出に使用可能なニューラルネットワークモデルである差分原型ニューラルネットワーク(DPNN)を用いて,差分処理応答の確率を学習しながら分析した。臨床および人口統計データを用いて, 寛解・個別寛解確率を分類し, 5本の単眼および3種類の組み合わせ治療を訓練した。モデルの妥当性と臨床的有用性は,AUC (Area under the curve) とモデル誘導治療による試料送還率の改善に基づいて測定した。ポストホック分析は、トレーニング中に学んだ患者プロトタイプに基づいてクラスター(サブグループ)を得た。特徴分布と治療特異的な結果の違いを評価することにより, 解釈可能性の評価を行った。 3-プロトタイプモデルではAUCは0.66であり、標本再送率に比べて絶対的な人口再送率の向上が期待された。臨床的に解釈可能な3つの治療関連患者クラスターを同定した。機械学習モデルを用いて新しい治療関連患者のプロファイルを作成することが可能であり、うつ病の精密医療を改善することができる。注:このモデルは、現在、アクティブな臨床試験の対象ではなく、臨床用途を意図していない。

Major depressive disorder (MDD) is a heterogeneous condition; multiple underlying neurobiological substrates could be associated with treatment response variability. Understanding the sources of this variability and predicting outcomes has been elusive. Machine learning has shown promise in predicting treatment response in MDD, but one limitation has been the lack of clinical interpretability of machine learning models. We analyzed data from six clinical trials of pharmacological treatment for depression (total n = 5438) using the Differential Prototypes Neural Network (DPNN), a neural network model that derives patient prototypes which can be used to derive treatment-relevant patient clusters while learning to generate probabilities for differential treatment response. A model classifying remission and outputting individual remission probabilities for five first-line monotherapies and three combination treatments was trained using clinical and demographic data. Model validity and clinical utility were measured based on area under the curve (AUC) and expected improvement in sample remission rate with model-guided treatment, respectively. Post-hoc analyses yielded clusters (subgroups) based on patient prototypes learned during training. Prototypes were evaluated for interpretability by assessing differences in feature distributions and treatment-specific outcomes. A 3-prototype model achieved an AUC of 0.66 and an expected absolute improvement in population remission rate compared to the sample remission rate. We identified three treatment-relevant patient clusters which were clinically interpretable. It is possible to produce novel treatment-relevant patient profiles using machine learning models; doing so may improve precision medicine for depression. Note: This model is not currently the subject of any active clinical trials and is not intended for clinical use.
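The AUC quoted above can be read as the probability that a randomly chosen remitting patient receives a higher predicted remission probability than a randomly chosen non-remitting patient. A minimal sketch of that pairwise definition follows; the function name and the toy inputs are illustrative only, and this is not the authors' evaluation code.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for the positive class (e.g. remission), 0 otherwise.
    scores: predicted probabilities; tied pairs count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))
```

Under this reading, an AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking, so 0.66 indicates moderate discrimination.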

翻訳日:2023-03-28 15:13:43 公開日:2023-03-24

# Transformerの特性の詳細な探索による画像デブラーリング

Image Deblurring by Exploring In-depth Properties of Transformer ( http://arxiv.org/abs/2303.15198v1 )

ライセンス: Link先を確認

Pengwei Liang, Junjun Jiang, Xianming Liu, Jiayi Ma

(参考訳) 画像デブラリングは生成モデルの開発によって印象的な性能を保ち続けている。それでも、回復した画像の知覚的品質と定量的スコアを同時に向上させたい場合、いまだに不快な問題が残っている。本研究では, 変圧器特性の研究から着想を得て, 予め学習した変圧器を導入し, この問題に対処する。特に,事前訓練された視覚トランスフォーマ(vit)から抽出された深部特徴を活用して,定量的測定で測定した性能を犠牲にすることなく,復元画像のシャープ化を奨励する。事前学習した変換器は画像のグローバルなトポロジカルな関係(すなわち自己相似性)を捉えることができ、鮮明な画像に関する捕獲されたトポロジカルな関係は、ぼかしが発生すると変化する。復元画像と目標画像とのトランスフォーマー特性を比較することにより、予め訓練されたトランスフォーマーは高分解能のぼやけ感のある意味情報を提供する。優位性に基づいて、画像の劣化をガイドする2種類の新しい知覚的損失を提示する。特徴をベクトルとみなし、抽出された画像から抽出された表現とユークリッド空間における対象表現との差を計算する。他の型は、画像から抽出した特徴を分布とみなし、回収した画像と対象画像との分布差を比較する。そこで本研究では,uformer,restormer,nafnetなど,最も競争の激しいモデルに対する定量的スコア(psnr)を犠牲にすることなく,知覚品質向上におけるトランスフォーマ特性の有効性を実証する。

Image deblurring continues to achieve impressive performance with the development of generative models. Nonetheless, there still remains a displeasing problem if one wants to improve perceptual quality and quantitative scores of recovered image at the same time. In this study, drawing inspiration from the research of transformer properties, we introduce the pretrained transformers to address this problem. In particular, we leverage deep features extracted from a pretrained vision transformer (ViT) to encourage recovered images to be sharp without sacrificing the performance measured by the quantitative metrics. The pretrained transformer can capture the global topological relations (i.e., self-similarity) of image, and we observe that the captured topological relations about the sharp image will change when blur occurs. By comparing the transformer features between recovered image and target one, the pretrained transformer provides high-resolution blur-sensitive semantic information, which is critical in measuring the sharpness of the deblurred image. On the basis of the advantages, we present two types of novel perceptual losses to guide image deblurring. One regards the features as vectors and computes the discrepancy between representations extracted from recovered image and target one in Euclidean space. The other type considers the features extracted from an image as a distribution and compares the distribution discrepancy between recovered image and target one. We demonstrate the effectiveness of transformer properties in improving the perceptual quality while not sacrificing the quantitative scores (PSNR) over the most competitive models, such as Uformer, Restormer, and NAFNet, on defocus deblurring and motion deblurring tasks.

翻訳日:2023-03-28 15:12:54 公開日:2023-03-24

# プロンプトチューニングに基づく視覚言語モデル適応用アダプタ

Prompt Tuning based Adapter for Vision-Language Model Adaption ( http://arxiv.org/abs/2303.15234v1 )

ライセンス: Link先を確認

Jingchen Sun, Jiayu Qin, Zihao Lin, Changyou Chen

(参考訳) 大規模な事前学習型視覚言語(VL)モデルは、様々な下流タスクに適応する上で大きな可能性を示している。しかし,モデルパラメータの多さからネットワーク全体の微調整は困難である。この問題に対処するため,プロンプトチューニングなどの効率的な適応手法が提案されている。我々は,マルチタスク事前学習した初期化によるプロンプトチューニングのアイデアを探求し,モデル性能を著しく向上できることを示す。そこで本研究では,事前学習されたプロンプトチューニングと効率的な適応ネットワークを組み合わせた新しいモデルであるPrompt-Adapterを提案する。特に1ショット、2ショット、4ショット、8ショット画像といった限られたデータインスタンスの設定では、我々のアプローチは、11の公開データセットにおける数ショット画像分類で最先端の手法を上回った。提案手法は,効率的な視覚言語モデル適応のために,プロンプトチューニングとパラメータ効率のよいネットワークを組み合わせることの有望性を示している。コードは https://github.com/Jingchensun/prompt_adapter で公開されている。

Large pre-trained vision-language (VL) models have shown significant promise in adapting to various downstream tasks. However, fine-tuning the entire network is challenging due to the massive number of model parameters. To address this issue, efficient adaptation methods such as prompt tuning have been proposed. We explore the idea of prompt tuning with multi-task pre-trained initialization and find it can significantly improve model performance. Based on our findings, we introduce a new model, termed Prompt-Adapter, that combines pre-trained prompt tuning with an efficient adaptation network. Our approach beats the state-of-the-art methods in few-shot image classification on 11 public datasets, especially in settings with limited data instances such as 1-shot, 2-shot, 4-shot, and 8-shot images. Our proposed method demonstrates the promise of combining prompt tuning and parameter-efficient networks for efficient vision-language model adaptation. The code is publicly available at: https://github.com/Jingchensun/prompt_adapter.

翻訳日:2023-03-28 14:54:15 公開日:2023-03-24

# Pitchclass2vec: コード埋め込みによるシンボリック音楽構造セグメンテーション

Pitchclass2vec: Symbolic Music Structure Segmentation with Chord Embeddings ( http://arxiv.org/abs/2303.15306v1 )

ライセンス: Link先を確認

Nicolas Lazzari, Andrea Poltronieri, Valentina Presutti

(参考訳) 構造知覚は人間の音楽認知の基本的な側面である。歴史的に、音楽の階層構造は、意味を伝達し、期待を作り、リスナーの感情を喚起するための物語装置として機能した。これにより、作曲者が自分の考えを整理する音楽的談話を形成するため、音楽構造は作曲において重要な役割を担っている。本稿では,自然言語処理技術とカスタムメイド符号化技術の両方を用いて,連続ベクトル表現に埋め込まれた記号コードアノテーションに基づく新しい楽曲セグメンテーション手法であるpitchclass2vecを提案する。提案アルゴリズムは,long short-term memory(LSTM)ニューラルネットワークをベースとして,この分野における記号コードアノテーションに基づく最先端技術より優れている。

Structure perception is a fundamental aspect of music cognition in humans. Historically, the hierarchical organization of music into structures served as a narrative device for conveying meaning, creating expectancy, and evoking emotions in the listener. Thereby, musical structures play an essential role in music composition, as they shape the musical discourse through which the composer organises his ideas. In this paper, we present a novel music segmentation method, pitchclass2vec, based on symbolic chord annotations, which are embedded into continuous vector representations using both natural language processing techniques and custom-made encodings. Our algorithm is based on a long short-term memory (LSTM) neural network and outperforms the state-of-the-art techniques based on symbolic chord annotations in the field.

翻訳日:2023-03-28 14:35:41 公開日:2023-03-24

# PeakNet: U-Netを用いたX線結晶学実験におけるブラッグピーク発見

PeakNet: Bragg peak finding in X-ray crystallography experiments with U-Net ( http://arxiv.org/abs/2303.15301v1 )

ライセンス: Link先を確認

Cong Wang, Po-Nan Li, Jana Thayer and Chun Hong Yoon

(参考訳) X線自由電子レーザー(XFEL)のシリアル結晶学は、近年、高いデータ速度を達成するために著しく進歩している。この開発は、対数的時間スケールでの分子イベントのイメージングなど、新しい科学的研究を可能にする可能性があるが、ディスク上の科学に関連する特徴や画像だけを保存するために、ある程度のデータ削減を伴うリアルタイムデータ分析に関する課題も生んでいる。データ削減が効果的でない場合、施設の予算要件が大幅に増加するか、あるいはデータ分析を不安定にする超高繰り返しイメージング技術の利用を妨げる可能性がある。さらに、リアルタイムデータ分析からユーザーへリアルタイムフィードバックを提供するという課題もある。連続結晶学の文脈では、リアルタイムデータ解析における初期および臨界ステップは、回折画像からx線ブラッグピークを見つけることである。この課題に対処するために、ニューラルネットワークを活用し、Psocakeのピークファインダの約4倍の速度で実行されるBraggのピークファインダであるPeakNetを紹介します。従来のU-Netアーキテクチャとして実装されたセマンティックセグメンテーション問題にピーク探索のタスクを定式化した。 PeakNetの重要な利点は、データボリュームに関して線形にスケールできることであり、リアルタイムの連続結晶データ解析に高いデータレートで適している。

Serial crystallography at X-ray free electron laser (XFEL) sources has experienced tremendous progress in achieving high data rate in recent times. While this development offers potential to enable novel scientific investigations, such as imaging molecular events at logarithmic timescales, it also poses challenges in regards to real-time data analysis, which involves some degree of data reduction to only save those features or images pertaining to the science on disks. If data reduction is not effective, it could directly result in a substantial increase in facility budgetary requirements, or even hinder the utilization of ultra-high repetition imaging techniques making data analysis unwieldy. Furthermore, an additional challenge involves providing real-time feedback to users derived from real-time data analysis. In the context of serial crystallography, the initial and critical step in real-time data analysis is finding X-ray Bragg peaks from diffraction images. To tackle this challenge, we present PeakNet, a Bragg peak finder that utilizes neural networks and runs about four times faster than Psocake peak finder, while delivering significantly better indexing rates and comparable number of indexed events. We formulated the task of peak finding into a semantic segmentation problem, which is implemented as a classical U-Net architecture. A key advantage of PeakNet is its ability to scale linearly with respect to data volume, making it well-suited for real-time serial crystallography data analysis at high data rates.

翻訳日:2023-03-28 14:35:27 公開日:2023-03-24

# ビジュアルプロンプティングの理解と改善 - ラベルマッピングの視点から

Understanding and Improving Visual Prompting: A Label-Mapping Perspective ( http://arxiv.org/abs/2211.11635v5 )

ライセンス: Link先を確認

Aochuan Chen, Yuguang Yao, Pin-Yu Chen, Yihua Zhang, Sijia Liu

(参考訳) 我々は視覚タスクの入力プロンプト技術である視覚プロンプト(VP)を再検討し前進する。 VPは、(入力摂動パターンの観点で)普遍的なプロンプトを下流のデータポイントに組み込むことで、固定されたトレーニング済みのソースモデルをプログラムして、ターゲットドメインの下流タスクを達成できる。しかし、なぜVPが、ソースクラスとターゲットクラスの間のルールレスラベルマッピング(LM)でさえ有効であるのかは、いまだ解明されていない。 LMはVPとどのように関連していますか? そして、そのような関係を利用してターゲットタスクの精度を向上する方法。我々は、LMがVPに与える影響を考察し、LMのより良い「品質」(マッピング精度と説明による評価)がVPの有効性を一貫して改善できるという肯定的な回答を提供する。これは、LMの要素が欠落していた以前の技術とは対照的である。 LMを最適化するために、新たなVPフレームワークであるILM-VP(iterative label mapping-based visual prompting)を提案し、ソースラベルをターゲットラベルに自動的に再マップし、VPの目標タスク精度を徐々に改善する。さらに,コントラッシブ言語画像事前訓練(CLIP)モデルを用いて,CLIPのテキスト選択を支援するためのLMプロセスの統合と,目標タスクの精度の向上を提案する。広範な実験により,提案手法が最先端vp法を大きく上回ることを示した。以下に示すように、ImageNet-pretrained ResNet-18を13のターゲットタスクに再プログラミングする場合、我々の手法はベースラインをかなり上回り、例えば、ターゲットのFlowers102とCIFAR100データセットへの変換学習の精度が7.9%と6.7%向上している。さらに、CLIPベースのVPに関する提案では、Flowers102とDTDの精度がそれぞれ13.7%と7.1%向上している。私たちのコードはhttps://github.com/OPTML-Group/ILM-VPで利用可能です。

We revisit and advance visual prompting (VP), an input prompting technique for vision tasks. VP can reprogram a fixed, pre-trained source model to accomplish downstream tasks in the target domain by simply incorporating universal prompts (in terms of input perturbation patterns) into downstream data points. Yet, it remains elusive why VP stays effective even given a ruleless label mapping (LM) between the source classes and the target classes. Inspired by the above, we ask: How is LM interrelated with VP? And how to exploit such a relationship to improve its accuracy on target tasks? We peer into the influence of LM on VP and provide an affirmative answer that a better 'quality' of LM (assessed by mapping precision and explanation) can consistently improve the effectiveness of VP. This is in contrast to the prior art where the factor of LM was missing. To optimize LM, we propose a new VP framework, termed ILM-VP (iterative label mapping-based visual prompting), which automatically re-maps the source labels to the target labels and progressively improves the target task accuracy of VP. Further, when using a contrastive language-image pretrained (CLIP) model, we propose to integrate an LM process to assist the text prompt selection of CLIP and to improve the target task accuracy. Extensive experiments demonstrate that our proposal significantly outperforms state-of-the-art VP methods. As highlighted below, we show that when reprogramming an ImageNet-pretrained ResNet-18 to 13 target tasks, our method outperforms baselines by a substantial margin, e.g., 7.9% and 6.7% accuracy improvements in transfer learning to the target Flowers102 and CIFAR100 datasets. Besides, our proposal on CLIP-based VP provides 13.7% and 7.1% accuracy improvements on Flowers102 and DTD respectively. Our code is available at https://github.com/OPTML-Group/ILM-VP.
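To make the idea of label mapping more concrete, the sketch below builds a one-to-one map from target classes to source classes using prediction frequencies: each target class is paired with the source class that the frozen model predicts most often for it, greedily and without reusing a source class. This is only one plausible way to construct such a mapping; the abstract does not spell out the authors' procedure, and the function name and inputs are hypothetical.

```python
from collections import Counter


def greedy_label_mapping(source_predictions, target_labels):
    """Map each target class to a distinct source class by co-occurrence count.

    source_predictions[i] is the source-class index predicted by the frozen
    model for (prompted) sample i; target_labels[i] is its true target class.
    """
    counts = Counter(zip(target_labels, source_predictions))
    mapping = {}
    used_sources = set()
    # Visit (target, source) pairs from most to least frequent; ties are
    # broken by the pair itself so the result is deterministic.
    for (target, source), _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if target not in mapping and source not in used_sources:
            mapping[target] = source
            used_sources.add(source)
    return mapping
```

In an iterative scheme such as the one the abstract describes, a mapping of this kind would be recomputed as the prompt is updated, so that the prompt and the mapping can improve together.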

翻訳日:2023-03-28 11:56:08 公開日:2023-03-24

# 破壊的ニューラルスケーリング法則

Broken Neural Scaling Laws ( http://arxiv.org/abs/2210.14891v10 )

ライセンス: Link先を確認

Ethan Caballero, Kshitij Gupta, Irina Rish, David Krueger

(参考訳) We present a smoothly broken power law functional form (referred to by us as a Broken Neural Scaling Law (BNSL)) that accurately models and extrapolates the scaling behaviors of deep neural networks (i.e. how the evaluation metric of interest varies as the amount of compute used for training, number of model parameters, training dataset size, model input size, number of training steps, or upstream performance varies) for various architectures and for each of various tasks within a large and diverse set of upstream and downstream tasks, in zero-shot, prompted, and fine-tuned settings. This set includes large-scale vision, language, audio, video, diffusion, generative modeling, multimodal learning, contrastive learning, AI alignment, robotics, out-of-distribution (OOD) generalization, continual learning, transfer learning, uncertainty estimation / calibration, out-of-distribution detection, adversarial robustness, distillation, sparsity, retrieval, quantization, pruning, molecules, computer programming/coding, math word problems, "emergent" "phase transitions / changes", arithmetic, unsupervised/self-supervised learning, and reinforcement learning (single agent and multi-agent). 神経スケーリング行動の他の機能形式と比較すると、この関数形式は、この集合においてかなり正確なスケーリング行動の外挿をもたらす。さらに、この関数形式は、二重降下のような現象のスケーリング挙動に存在する非単調遷移や、算術のようなタスクのスケーリング挙動に存在する遅延、鋭いインフレクションポイントなど、他の関数形式が表現できないスケーリング挙動を正確にモデル化し、外挿する。最後に、この関数形式を使用して、スケーリング動作の予測可能性の限界に関する洞察を得ます。コードはhttps://github.com/ethancaballero/broken_neural_scaling_lawsで入手できる。

We present a smoothly broken power law functional form (referred to by us as a Broken Neural Scaling Law (BNSL)) that accurately models and extrapolates the scaling behaviors of deep neural networks (i.e. how the evaluation metric of interest varies as the amount of compute used for training, number of model parameters, training dataset size, model input size, number of training steps, or upstream performance varies) for various architectures and for each of various tasks within a large and diverse set of upstream and downstream tasks, in zero-shot, prompted, and fine-tuned settings. This set includes large-scale vision, language, audio, video, diffusion, generative modeling, multimodal learning, contrastive learning, AI alignment, robotics, out-of-distribution (OOD) generalization, continual learning, transfer learning, uncertainty estimation / calibration, out-of-distribution detection, adversarial robustness, distillation, sparsity, retrieval, quantization, pruning, molecules, computer programming/coding, math word problems, "emergent" "phase transitions / changes", arithmetic, unsupervised/self-supervised learning, and reinforcement learning (single agent and multi-agent). When compared to other functional forms for neural scaling behavior, this functional form yields extrapolations of scaling behavior that are considerably more accurate on this set. Moreover, this functional form accurately models and extrapolates scaling behavior that other functional forms are incapable of expressing such as the non-monotonic transitions present in the scaling behavior of phenomena such as double descent and the delayed, sharp inflection points present in the scaling behavior of tasks such as arithmetic. Lastly, we use this functional form to glean insights about the limit of the predictability of scaling behavior. Code is available at https://github.com/ethancaballero/broken_neural_scaling_laws
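The abstract does not state the functional form itself. As a hedged illustration, the sketch below implements a generic smoothly broken power law with an arbitrary number of breaks. The parameterisation (`a`, `b`, `c0`, and per-break triples `(c_i, d_i, f_i)`) is an assumption made for this example and should be checked against the paper and the linked repository before being relied on.

```python
import math


def broken_power_law(x, a, b, c0, breaks):
    """Smoothly broken power law (illustrative parameterisation).

    a      : value the curve approaches as the power-law term vanishes
    b, c0  : scale and exponent of the first power-law segment
    breaks : list of (c_i, d_i, f_i); the log-log slope changes by -c_i
             around x = d_i, and f_i controls how sharp the transition is
    """
    y = b * x ** (-c0)
    for c_i, d_i, f_i in breaks:
        y *= (1.0 + (x / d_i) ** (1.0 / f_i)) ** (-c_i * f_i)
    return a + y


def log_log_slope(fn, x, factor=10.0):
    """Finite-difference slope of log fn against log x between x and factor * x."""
    return (math.log(fn(x * factor)) - math.log(fn(x))) / math.log(factor)
```

With `a = 0` and a single break `(c_1, d_1, f_1)`, the log-log slope is about `-c0` well below `d_1` and about `-(c0 + c_1)` well above it, which is the sense in which the power law is "broken".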

翻訳日:2023-03-28 11:55:38 公開日:2023-03-24

# 文脈付き多腕バンディットにおける局所クラスタリング

Local Clustering in Contextual Multi-Armed Bandits ( http://arxiv.org/abs/2103.00063v3 )

ライセンス: Link先を確認

Yikun Ban, Jingrui He

(参考訳) 本研究では,コンテキスト型マルチアームバンディット(MAB)におけるユーザクラスタの識別について検討する。コンテキストMABは、コンテンツレコメンデーションやオンライン広告など、多くの実アプリケーションにとって効果的なツールである。実際には、ユーザ依存はユーザのアクション、つまり報酬において重要な役割を果たす。類似したユーザーをクラスタリングすることで報酬の質が向上し、結果としてより効果的なコンテンツレコメンデーションとターゲット広告につながる。従来のクラスタリング設定とは異なり、未知のbanditパラメータに基づいてユーザをクラスタ化します。特に、コンテキストMABにおけるクラスタ検出の問題を定義し、局所クラスタリング手法を組み込んだ帯域幅アルゴリズム、LOCBを提案する。また,クラスタリングの正しさと効率,およびその後悔境界の観点から,LOCBに関する理論的解析を行った。最後に,提案アルゴリズムを,最先端のベースラインを上回る様々な側面から評価する。

We study identifying user clusters in contextual multi-armed bandits (MAB). Contextual MAB is an effective tool for many real applications, such as content recommendation and online advertisement. In practice, user dependency plays an essential role in the user's actions, and thus the rewards. Clustering similar users can improve the quality of reward estimation, which in turn leads to more effective content recommendation and targeted advertising. Different from traditional clustering settings, we cluster users based on the unknown bandit parameters, which will be estimated incrementally. In particular, we define the problem of cluster detection in contextual MAB, and propose a bandit algorithm, LOCB, embedded with a local clustering procedure. We also provide a theoretical analysis of LOCB in terms of the correctness and efficiency of clustering and its regret bound. Finally, we evaluate the proposed algorithm from various aspects and show that it outperforms state-of-the-art baselines.

翻訳日:2023-03-27 19:16:54 公開日:2023-03-24

# バイナリラテントを用いた変分オートエンコーダの直接進化最適化

Direct Evolutionary Optimization of Variational Autoencoders With Binary Latents ( http://arxiv.org/abs/2011.13704v2 )

ライセンス: Link先を確認

Enrico Guiraud, Jakob Drefs, Jörg Lücke

(参考訳) 離散潜在変数は実世界のデータにとって重要であると考えられており、離散潜在変数を持つ変分オートエンコーダ(VAE)の研究の動機となっている。しかし、この場合、標準的なVAEの学習は適用できないため、離散VAEを従来のVAEと同様に学習できるよう、離散分布を操作するさまざまな戦略が提案されてきた。ここでは、符号化モデルに直接的な離散最適化を適用することで、潜在変数の離散性を完全に保つことも可能かどうかを問う。その結果、この手法はサンプリング近似、再パラメータ化トリック、償却を回避する点で、標準的なVAEの学習から大きく逸脱する。離散最適化は、切断事後分布(truncated posteriors)と進化的アルゴリズムを組み合わせた変分設定で実現される。バイナリ潜在変数を持つVAEに対して、(A)このような離散変分法がネットワーク重みに対する勾配上昇とどのように結びつくか、および(B)学習に用いる潜在状態の選択にデコーダがどのように使われるかを示す。従来の償却学習はより効率的で、大規模なニューラルネットワークに適用できる。しかし、より小さなネットワークを用いる場合、直接的な離散最適化は数百の潜在変数まで効率よくスケールすることがわかった。さらに重要なことに、直接最適化の有効性は「ゼロショット」学習において非常に競争力が高い。大規模な教師付きネットワークとは対照的に、本研究で調べたVAEは、例えば、クリーンなデータでの事前学習や大規模な画像データセットでの学習なしに、1枚の画像のノイズ除去を行うことができる。より一般的には、本手法は、サンプリングに基づく近似や再パラメータ化なしにVAEの学習が実際に可能であることを示しており、これはVAE学習全般の解析にとって興味深い可能性がある。さらに「ゼロショット」設定では、直接最適化により、これまで非生成的アプローチに劣っていた場面でVAEが競争力を持つようになる。

Discrete latent variables are considered important for real world data, which has motivated research on Variational Autoencoders (VAEs) with discrete latents. However, standard VAE training is not possible in this case, which has motivated different strategies to manipulate discrete distributions in order to train discrete VAEs similarly to conventional ones. Here we ask if it is also possible to keep the discrete nature of the latents fully intact by applying a direct discrete optimization for the encoding model. The approach is consequently strongly diverting from standard VAE-training by sidestepping sampling approximation, reparameterization trick and amortization. Discrete optimization is realized in a variational setting using truncated posteriors in conjunction with evolutionary algorithms. For VAEs with binary latents, we (A) show how such a discrete variational method ties into gradient ascent for network weights, and (B) how the decoder is used to select latent states for training. Conventional amortized training is more efficient and applicable to large neural networks. However, using smaller networks, we here find direct discrete optimization to be efficiently scalable to hundreds of latents. More importantly, we find the effectiveness of direct optimization to be highly competitive in `zero-shot' learning. In contrast to large supervised networks, the here investigated VAEs can, e.g., denoise a single image without previous training on clean data and/or training on large image datasets. More generally, the studied approach shows that training of VAEs is indeed possible without sampling-based approximation and reparameterization, which may be interesting for the analysis of VAE-training in general. For `zero-shot' settings a direct optimization, furthermore, makes VAEs competitive where they have previously been outperformed by non-generative approaches.

翻訳日:2023-03-27 19:16:40 公開日:2023-03-24

# クロスU統計を用いた次元非依存推論

Dimension-agnostic inference using cross U-statistics ( http://arxiv.org/abs/2011.05068v6 )

ライセンス: Link先を確認

Ilmun Kim, Aaditya Ramdas

(参考訳) 統計的推論に対する古典的な漸近理論は、通常、次元$d$を固定し、サンプルサイズ$n$を無限大に増やすことで統計学を校正する。最近、これらのメソッドが高次元設定でどのように振る舞うかを理解するために多くの努力が払われており、$d$と$n$は共に無限大へと増加する。これはしばしば、次元に関する仮定によって異なる推論手順をもたらし、実践者はバインドに残される: 20次元に100のサンプルを持つデータセットが与えられたら、$n \gg d$、または$d/n \approx 0.2$を仮定してキャリブレーションすべきだろうか? 本論文は次元非依存推論の目的を考察し,$d$ と $n$ の仮定に依存しない手法の開発について述べる。サンプル分割と自己正規化とともに既存のテスト統計の変動表現を用いて、$d$が$n$でスケールするかどうかに関わらず、ガウス極限分布を持つ洗練されたテスト統計値を生成するアプローチを導入する。結果の統計学は、縮退したU統計を慎重に修正し、対角ブロックを落とし、対角ブロックを外したままにすると見なすことができる。我々は,一サンプル平均値と共分散テストを含む古典的な問題に対して,本手法を例示し,本試験が局所的代替品に対して最小速度最適化力を有することを示す。ほとんどの設定では、我々の交差U統計は対応する(退化)U統計の高次元のパワーと$\sqrt{2}$因子と一致する。

Classical asymptotic theory for statistical inference usually involves calibrating a statistic by fixing the dimension $d$ while letting the sample size $n$ increase to infinity. Recently, much effort has been dedicated towards understanding how these methods behave in high-dimensional settings, where $d$ and $n$ both increase to infinity together. This often leads to different inference procedures, depending on the assumptions about the dimensionality, leaving the practitioner in a bind: given a dataset with 100 samples in 20 dimensions, should they calibrate by assuming $n \gg d$, or $d/n \approx 0.2$? This paper considers the goal of dimension-agnostic inference; developing methods whose validity does not depend on any assumption on $d$ versus $n$. We introduce an approach that uses variational representations of existing test statistics along with sample splitting and self-normalization to produce a refined test statistic with a Gaussian limiting distribution, regardless of how $d$ scales with $n$. The resulting statistic can be viewed as a careful modification of degenerate U-statistics, dropping diagonal blocks and retaining off-diagonal blocks. We exemplify our technique for some classical problems including one-sample mean and covariance testing, and show that our tests have minimax rate-optimal power against appropriate local alternatives. In most settings, our cross U-statistic matches the high-dimensional power of the corresponding (degenerate) U-statistic up to a $\sqrt{2}$ factor.
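For the one-sample mean problem (null hypothesis: the mean is zero), the recipe of sample splitting plus self-normalization can be sketched as follows. The data are split into two halves, each point in the first half is scored by its inner product with the mean of the second half, and the scores are studentized. The even split, the use of the unbiased sample variance, and all names are assumptions made for this example rather than details confirmed by the abstract.

```python
import math
import random


def cross_u_mean_statistic(samples):
    """Studentized cross U-statistic for testing that the mean is zero.

    samples: list of equal-length lists (n points in d dimensions).
    Under the null the statistic is approximately standard normal, so a
    one-sided level-0.05 test would reject when it exceeds about 1.645.
    """
    n = len(samples)
    m = n // 2
    if m < 2:
        raise ValueError("need at least four samples")
    first, second = samples[:m], samples[m:]
    d = len(samples[0])
    second_mean = [sum(x[k] for x in second) / len(second) for k in range(d)]
    # f[i] averages the kernel h(X_i, X_j) = <X_i, X_j> over the second half.
    f = [sum(x[k] * second_mean[k] for k in range(d)) for x in first]
    f_bar = sum(f) / m
    variance = sum((v - f_bar) ** 2 for v in f) / (m - 1)
    if variance == 0:
        raise ValueError("degenerate sample: scores have zero variance")
    return math.sqrt(m) * f_bar / math.sqrt(variance)


def simulate_gaussian(n, d, shift, seed):
    """Hypothetical helper: n Gaussian points in d dimensions with mean `shift`."""
    rng = random.Random(seed)
    return [[rng.gauss(shift, 1.0) for _ in range(d)] for _ in range(n)]
```

For instance, `cross_u_mean_statistic(simulate_gaussian(100, 20, 1.0, 0))` uses the 100-samples-in-20-dimensions setting mentioned in the abstract with a shifted mean, and the same critical value applies without deciding whether $n \gg d$ or $d/n \approx 0.2$ is the right regime.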

翻訳日:2023-03-27 19:16:07 公開日:2023-03-24

# UCBバンディットに対する準最適な敵対的攻撃

Near Optimal Adversarial Attack on UCB Bandits ( http://arxiv.org/abs/2008.09312v3 )

ライセンス: Link先を確認

Shiliang Zuo

(参考訳) 我々は,報酬が敵対的な改ざんを受ける確率的多腕バンディット問題を考える。本稿では,UCB原理に基づくアルゴリズムを操作し,最適でないターゲットアームを $T - o(T)$ 回引かせる新たな攻撃戦略を提案する。その累積コストは $\sqrt{\log T}$ のオーダーであり,$T$ はラウンド数である。また,累積攻撃コストに対する最初の下界も証明する。この下界は $\log \log T$ 因子を除いて上界と一致し,攻撃が最適に近いことを示している。

We consider a stochastic multi-arm bandit problem where rewards are subject to adversarial corruption. We propose a novel attack strategy that manipulates a UCB principle into pulling some non-optimal target arm $T - o(T)$ times with a cumulative cost that scales as $\sqrt{\log T}$, where $T$ is the number of rounds. We also prove the first lower bound on the cumulative attack cost. Our lower bound matches our upper bound up to $\log \log T$ factors, showing our attack to be near optimal.

翻訳日:2023-03-27 19:15:40 公開日:2023-03-24

# フィッシャー情報量を用いた量子ステアリングの検出

Witnessing quantum steering by means of the Fisher information ( http://arxiv.org/abs/2107.14730v2 )

ライセンス: Link先を確認

Ilaria Gianani, Vincenzo Berardi and Marco Barbieri

(参考訳) 量子ネットワークでは、特定の種類の量子相関を捉えることが重要となる。この課題を達成するために異なる経路をとることができ、そのような量子相関の異なる新しい側面を強調している。 yadin, fadel, gessner [nat. commun. 12, 2410 (2021)] による最近の理論結果に従い、二成分状態のメトロロジー能力においてステアリングがどのように現れるかを実験的に示す。本研究は,本手法の有効性を確認し,既存の代替案と比較した。

Capturing specific kinds of quantum correlation is of paramount importance for quantum networking. Different routes can be taken to achieve this task, highlighting different novel aspects of such quantum correlations. Following the recent theoretical results by Yadin, Fadel and Gessner [Nat. Commun. 12, 2410 (2021)], we demonstrate experimentally how steering manifests in the metrological abilities of a bipartite state. Our results confirm the relevance of this novel approach, and compare the outcome with already employed alternatives.

翻訳日:2023-03-27 19:10:26 公開日:2023-03-24

# 歩行認識のためのシルエットと骨格データの組み合わせ

Combining the Silhouette and Skeleton Data for Gait Recognition ( http://arxiv.org/abs/2202.10645v3 )

ライセンス: Link先を確認

Likai Wang, Ruize Han, Wei Feng

(参考訳) 長距離バイオメトリック技術である歩行認識は近年、強い関心を集めている。現在、主要な2つの歩行認識作業は外観ベースとモデルベースであり、それぞれシルエットと骨格から特徴を抽出する。しかし, 着替えや搬送条件では外観ベースが大きな影響を受け, モデルベースではポーズ推定の精度が制限される。そこで,本研究では,シルエットを入力とするcnn系分枝と,スケルトンを入力とするgcn系分枝を含む,簡便かつ効果的な二分枝ネットワークを提案する。さらに,GCN系分岐における歩行表現の改善のために,マルチスケールグラフ畳み込みを統合する完全連結グラフ畳み込み演算子を提案し,自然関節接続への依存を軽減する。また,stc-attと呼ばれる多次元アテンションモジュールを配置し,空間的,時間的,チャネル的アテンションを同時に学習する。 CASIA-BとOUMVLPの実験結果から, 各種条件下での最先端性能が得られた。

Gait recognition, a long-distance biometric technology, has aroused intense interest recently. Currently, the two dominant gait recognition works are appearance-based and model-based, which extract features from silhouettes and skeletons, respectively. However, appearance-based methods are greatly affected by clothes-changing and carrying conditions, while model-based methods are limited by the accuracy of pose estimation. To tackle this challenge, a simple yet effective two-branch network is proposed in this paper, which contains a CNN-based branch taking silhouettes as input and a GCN-based branch taking skeletons as input. In addition, for better gait representation in the GCN-based branch, we present a fully connected graph convolution operator to integrate multi-scale graph convolutions and alleviate the dependence on natural joint connections. Also, we deploy a multi-dimension attention module named STC-Att to learn spatial, temporal and channel-wise attention simultaneously. The experimental results on CASIA-B and OUMVLP show that our method achieves state-of-the-art performance in various conditions.

翻訳日:2023-03-27 19:02:28 公開日:2023-03-24

# ProxSkip: そのとおり! 局所勾配ステップは証明可能な形で通信の高速化につながる! ついに!

ProxSkip: Yes! Local Gradient Steps Provably Lead to Communication Acceleration! Finally! ( http://arxiv.org/abs/2202.09357v2 )

ライセンス: Link先を確認

Konstantin Mishchenko, Grigory Malinovsky, Sebastian Stich and Peter Richtárik

(参考訳) 滑らかな関数($f$)と、評価コストは高いが近接演算子を計算可能な非滑らかな関数($\psi$)の和を最小化する、驚くほどシンプルで証明可能な効率性を持つ手法 ProxSkip を紹介する。このような問題を解く標準的なアプローチは近接勾配降下法(ProxGD)であり、各反復で $f$ の勾配と $\psi$ のprox演算子を評価する。本研究で特に注目するのは、proxの評価が勾配の評価に比べて高コストとなる状況であり、多くの応用がこれに該当する。ProxSkipでは、高コストなprox演算子をほとんどの反復でスキップできる。反復複雑性は $\mathcal{O}\left(\kappa \log \frac{1}{\varepsilon}\right)$($\kappa$ は $f$ の条件数)である一方、proxの評価回数は $\mathcal{O}\left(\sqrt{\kappa} \log \frac{1}{\varepsilon}\right)$ で済む。主な動機は連合学習にある。そこでは、勾配演算子の評価は全デバイスで独立に局所的なGDステップをとることに対応し、proxの評価は勾配平均化という形の(高コストな)通信に対応する。この文脈において、ProxSkipは通信複雑性を効果的に加速する。FedAvg、SCAFFOLD、S-Local-GD、FedLinといった他の局所勾配型手法は、データが不均質な状況での理論的な通信複雑性がバニラGDより悪いか、せいぜい同等である。これに対し本手法では、不均質性を抑える仮定を一切置かずに、証明可能で大きな改善が得られる。

We introduce ProxSkip -- a surprisingly simple and provably efficient method for minimizing the sum of a smooth ($f$) and an expensive nonsmooth proximable ($\psi$) function. The canonical approach to solving such problems is via the proximal gradient descent (ProxGD) algorithm, which is based on the evaluation of the gradient of $f$ and the prox operator of $\psi$ in each iteration. In this work we are specifically interested in the regime in which the evaluation of prox is costly relative to the evaluation of the gradient, which is the case in many applications. ProxSkip allows for the expensive prox operator to be skipped in most iterations: while its iteration complexity is $\mathcal{O}\left(\kappa \log \frac{1}{\varepsilon}\right)$, where $\kappa$ is the condition number of $f$, the number of prox evaluations is $\mathcal{O}\left(\sqrt{\kappa} \log \frac{1}{\varepsilon}\right)$ only. Our main motivation comes from federated learning, where evaluation of the gradient operator corresponds to taking a local GD step independently on all devices, and evaluation of prox corresponds to (expensive) communication in the form of gradient averaging. In this context, ProxSkip offers an effective acceleration of communication complexity. Unlike other local gradient-type methods, such as FedAvg, SCAFFOLD, S-Local-GD and FedLin, whose theoretical communication complexity is worse than, or at best matching, that of vanilla GD in the heterogeneous data regime, we obtain a provable and large improvement without any heterogeneity-bounding assumptions.
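The abstract does not list the update rule, so the sketch below follows the form in which ProxSkip is commonly stated: a gradient step corrected by a control variate `h`, a prox step taken only with probability `p`, and an update of `h` whenever the prox is applied. Treat the exact rule as an assumption to verify against the paper. The toy problem (a diagonal quadratic plus an $\ell_1$ penalty, whose solution is known in closed form) is made up for this example.

```python
import math
import random


def soft_threshold(v, t):
    """Prox of t * |.| evaluated at v."""
    return math.copysign(max(abs(v) - t, 0.0), v)


def proxskip(grad, prox, x0, gamma, p, iters, rng):
    """Minimize f + psi, calling prox only with probability p per iteration.

    grad(x) returns the gradient of f; prox(v, step) returns prox_{step * psi}(v).
    Returns the final iterate and the number of prox evaluations.
    """
    x = list(x0)
    h = [0.0] * len(x)  # control variate, tracks the gradient of f at the solution
    prox_calls = 0
    for _ in range(iters):
        g = grad(x)
        x_hat = [xi - gamma * (gi - hi) for xi, gi, hi in zip(x, g, h)]
        if rng.random() < p:
            step = gamma / p
            x = prox([xh - step * hi for xh, hi in zip(x_hat, h)], step)
            prox_calls += 1
        else:
            x = x_hat  # prox skipped; h stays unchanged below
        h = [hi + (p / gamma) * (xi - xh) for hi, xi, xh in zip(h, x, x_hat)]
    return x, prox_calls


def make_toy_problem():
    """f(x) = 0.5 * sum a_k (x_k - b_k)^2 and psi(x) = lam * ||x||_1 (illustrative)."""
    a, b, lam = [1.0, 10.0], [3.0, -0.05], 1.0

    def grad(x):
        return [ak * (xk - bk) for ak, xk, bk in zip(a, x, b)]

    def prox(v, step):
        return [soft_threshold(vk, lam * step) for vk in v]

    solution = [soft_threshold(bk, lam / ak) for ak, bk in zip(a, b)]
    return grad, prox, solution, min(a), max(a)
```

In this toy problem the strong-convexity and smoothness constants are $\mu = 1$ and $L = 10$, so $\kappa = 10$. Running with `gamma = 1 / L` and `p = 1 / sqrt(kappa)` evaluates the prox in roughly a third of the iterations. In the federated reading of the abstract, each skipped prox corresponds to a communication round that is avoided.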

翻訳日:2023-03-27 19:02:08 公開日:2023-03-24

# ブラックボックス深層学習モデルに対するクエリ効率の高い決定ベースのスパース攻撃

Query Efficient Decision Based Sparse Attacks Against Black-Box Deep Learning Models ( http://arxiv.org/abs/2202.00091v2 )

ライセンス: Link先を確認

Viet Quoc Vo, Ehsan Abbasnejad, Damith C. Ranasinghe

(参考訳) 最善の努力にもかかわらず、ディープラーニングモデルは入力に適用される小さな逆さまの摂動にも非常に弱いままです。機械学習モデルの出力のみから情報を抽出し、ブラックボックスモデルに敵対的な摂動を発生させる能力は、自律車や機械学習モデルがサービスとして公開するMLaaSのような現実のシステムに対する現実的な脅威である。特に興味深いのは、スパース攻撃である。ブラックボックスモデルにおけるスパース攻撃の実現は、機械学習モデルが私たちが信じているよりも脆弱であることを示している。これらの攻撃は、l_0標準条件で測定された摂動画素の数を最小限に抑え、決定(予測ラベル)のみをモデルクエリに返却することで、モデルを誤解させる。しかし、このような攻撃はNPハード最適化の問題につながる。本研究では,畳み込み型ディープニューラルネットワークと視覚トランスフォーマの両方に対して,問題に対する進化に基づくアルゴリズムスパーセボを開発した。特に、視覚変換器は、決定に基づく攻撃条件下ではまだ調査されていない。 SparseEvoは、未ターゲットとターゲットの両方の攻撃に対して、最先端のスパース攻撃よりもはるかに少ないモデルクエリを必要とする。攻撃アルゴリズムは概念的には単純ではあるが、ImageNetのような標準的なコンピュータビジョンタスクにおける最先端の勾配ベースのホワイトボックス攻撃に対して、限られたクエリ予算で競合する。重要なことは、クエリ効率のよいSparseEvoと、一般的には意思決定ベースの攻撃は、デプロイされたシステムの安全性に関する新たな疑問を提起し、機械学習モデルの堅牢性を研究し、理解するための新たな方向性を示す。

Despite our best efforts, deep learning models remain highly vulnerable to even tiny adversarial perturbations applied to the inputs. The ability to extract information from solely the output of a machine learning model to craft adversarial perturbations to black-box models is a practical threat against real-world systems, such as autonomous cars or machine learning models exposed as a service (MLaaS). Of particular interest are sparse attacks. The realization of sparse attacks in black-box models demonstrates that machine learning models are more vulnerable than we believe, because these attacks aim to minimize the number of perturbed pixels (measured by the $l_0$ norm) required to mislead a model by solely observing the decision (the predicted label) returned to a model query: the so-called decision-based attack setting. However, such an attack leads to an NP-hard optimization problem. We develop an evolution-based algorithm, SparseEvo, for the problem and evaluate it against both convolutional deep neural networks and vision transformers. Notably, vision transformers are yet to be investigated under a decision-based attack setting. SparseEvo requires significantly fewer model queries than the state-of-the-art sparse attack Pointwise for both untargeted and targeted attacks. The attack algorithm, although conceptually simple, is also competitive with only a limited query budget against the state-of-the-art gradient-based whitebox attacks in standard computer vision tasks such as ImageNet. Importantly, the query-efficient SparseEvo, along with decision-based attacks in general, raises new questions regarding the safety of deployed systems and poses new directions to study and understand the robustness of machine learning models.

翻訳日:2023-03-27 19:01:30 公開日:2023-03-24

# 分数最小化のための座標降下法

Coordinate Descent Methods for Fractional Minimization ( http://arxiv.org/abs/2201.12691v3 )

ライセンス: Link先を確認

Ganzhao Yuan

(参考訳) 目的の数値部が微分可能凸関数と凸非滑らか関数の和であり、分母部が凸あるいは凹関数であるような構成された分数最小化問題のクラスを考える。非凸であるため、この問題は解決が難しい。問題の構造を利用して,この問題を解決するための2つのコーディネートDescent法を提案する。提案手法は1次元のsubproblem \textit{globally} を反復的に解き、座標の定常点に収束することが保証される。凸分母の場合、弱 \textit{locally bounded non-convexity condition} の下では、座標次定常点の最適性が標準臨界点と方向点の最適点よりも強いことが証明される。追加の適切な条件下では、cd法は座標的に静止点に q-線型収束する。凹分母の場合、任意の臨界点が大域的最小値であり、cd法は大域的最小値に劣線形収束率で収束することを示す。提案手法をいくつかの機械学習および信号処理モデルに適用する可能性を示す。実世界のデータを用いた実験により,提案手法は精度において既存手法よりも著しく優れていた。

We consider a class of structured fractional minimization problems, in which the numerator part of the objective is the sum of a differentiable convex function and a convex non-smooth function, while the denominator part is a convex or concave function. This problem is difficult to solve since it is non-convex. By exploiting the structure of the problem, we propose two Coordinate Descent (CD) methods for solving this problem. The proposed methods iteratively solve a one-dimensional subproblem globally, and they are guaranteed to converge to coordinate-wise stationary points. In the case of a convex denominator, under a weak locally bounded non-convexity condition, we prove that the optimality of a coordinate-wise stationary point is stronger than that of the standard critical point and directional point. Under additional suitable conditions, CD methods converge Q-linearly to coordinate-wise stationary points. In the case of a concave denominator, we show that any critical point is a global minimum, and CD methods converge to the global minimum with a sublinear convergence rate. We demonstrate the applicability of the proposed methods to some machine learning and signal processing models. Our experiments on real-world data show that our method significantly and consistently outperforms existing methods in terms of accuracy.
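For orientation, the problem class and a plain coordinate step can be written out as below. The symbols $f$, $h$, $g$, $F$, $\eta$ and $e_i$ are notation chosen here for illustration, and the positivity of $g$ is an assumption; the paper's two CD methods may differ in detail from this plain version.

```latex
% Problem class: f differentiable convex, h convex (possibly non-smooth),
% g convex or concave; g(x) > 0 is assumed here so the ratio is well defined.
\min_{x \in \mathbb{R}^n} \; F(x) := \frac{f(x) + h(x)}{g(x)}

% Plain coordinate descent step on coordinate i_t (e_i is the i-th unit vector):
% the one-dimensional subproblem in \eta is solved globally.
\eta^{t} \in \arg\min_{\eta \in \mathbb{R}} \; F\bigl(x^{t} + \eta\, e_{i_t}\bigr),
\qquad
x^{t+1} = x^{t} + \eta^{t} e_{i_t}
```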

翻訳日:2023-03-27 19:01:06 公開日:2023-03-24

# RBMLE-UCBによる線形二次系の適応制御

Augmented RBMLE-UCB Approach for Adaptive Control of Linear Quadratic Systems ( http://arxiv.org/abs/2201.10542v2 )

ライセンス: Link先を確認

Akshay Mete, Rahul Singh and P. R. Kumar

(参考訳) 適応LQ制御問題(Adaptive LQ control problem)と呼ばれる2次コストで未知の確率線形系を制御する問題を考える。我々は40年以上前に提案された「RBMLE(Reward Biased Maximum Likelihood Estimate)」という手法を再検討し、盗賊問題に対する「Regret」の定義とともに「Upper Confidence Bound(UCB)」法を先行した。単にパラメータ推定の基準により大きな報酬を持つパラメータを好む項を追加しただけである。本研究では,rbmle法とucb法を両立させる方法を示し,rbmle法のペナルティとucb法の制約を組み合わせた拡張rbmle-ucbアルゴリズムを提案する。理論的には、この手法はこれまでの最もよく知られている$\tilde{\mathcal{O}}(\sqrt{T})$ regretを保っている。さらに,提案する拡張型rbmle-ucbと標準のrbmleを,ucb,トンプソンサンプリング,入力摂動,ランダム化された確実性等価性,stablで比較し,ボーイング747の飛行制御や無人航空機などの実世界の実例と比較した。拡張 RBMLE は UCB, Thompson Sampling および StabL を大差で一貫した性能を保ちながら, 入力摂動よりも極端に優れ, ランダム化された不確実性等価性よりも適度に優れていることを示した。

We consider the problem of controlling an unknown stochastic linear system with quadratic costs, called the adaptive LQ control problem. We re-examine an approach called "Reward Biased Maximum Likelihood Estimate" (RBMLE) that was proposed more than forty years ago, and which predates the "Upper Confidence Bound" (UCB) method as well as the definition of "regret" for bandit problems. It simply added a term favoring parameters with larger rewards to the criterion for parameter estimation. We show how the RBMLE and UCB methods can be reconciled, and thereby propose an Augmented RBMLE-UCB algorithm that combines the penalty of the RBMLE method with the constraints of the UCB method, uniting the two approaches to optimism in the face of uncertainty. We establish that, theoretically, this method retains $\tilde{\mathcal{O}}(\sqrt{T})$ regret, the best known so far. We further compare the empirical performance of the proposed Augmented RBMLE-UCB and the standard RBMLE (without the augmentation) with UCB, Thompson Sampling, Input Perturbation, Randomized Certainty Equivalence and StabL on many real-world examples, including flight control of a Boeing 747 and an unmanned aerial vehicle. We perform extensive simulation studies showing that the Augmented RBMLE consistently outperforms UCB, Thompson Sampling and StabL by a huge margin, while it is marginally better than Input Perturbation and moderately better than Randomized Certainty Equivalence.

翻訳日:2023-03-27 19:00:44 公開日:2023-03-24

# 分解型量子グラフニューラルネットワーク

Decompositional Quantum Graph Neural Network ( http://arxiv.org/abs/2201.05158v2 )

ライセンス: Link先を確認

Xing Ai, Zhihong Zhang, Luzhe Sun, Junchi Yan, Edwin Hancock

(参考訳) 量子機械学習(quantum machine learning)は、量子アルゴリズムと量子コンピューティングを用いた機械学習に取り組むことを目的とした、急速に進化する分野である。物理量子ビットの欠如とユークリッド空間からヒルベルト空間に実世界のデータをマッピングする効果的な手段のため、これらの手法のほとんどは量子類似性やプロセスシミュレーションに焦点をあてる。本稿では,ego-graphベースの量子グラフニューラルネットワーク (egoqgnn) と呼ぶ,グラフ構造データのためのハイブリッド量子古典アルゴリズムを提案する。 egoQGNNはテンソル積とユニティ行列表現を用いてGNN理論フレームワークを実装し、必要なモデルパラメータの数を大幅に削減する。古典的コンピュータによって制御される場合、egoQGNNは、適度な大きさの量子デバイスを用いて入力グラフからエゴグラフを処理することにより、任意の大きさのグラフを調整できる。このアーキテクチャは、現実世界のデータからヒルベルト空間への新しいマッピングに基づいている。このマッピングは、データに存在する距離関係を維持し、情報損失を低減する。実験の結果,提案手法はこれらのモデルと比較して1.68 %のパラメータしか持たない競争状態モデルよりも優れていた。

Quantum machine learning is a fast-emerging field that aims to tackle machine learning using quantum algorithms and quantum computing. Due to the lack of physical qubits and an effective means to map real-world data from Euclidean space to Hilbert space, most of these methods focus on quantum analogies or process simulations rather than devising concrete architectures based on qubits. In this paper, we propose a novel hybrid quantum-classical algorithm for graph-structured data, which we refer to as the Ego-graph based Quantum Graph Neural Network (egoQGNN). egoQGNN implements the GNN theoretical framework using the tensor product and unity matrix representation, which greatly reduces the number of model parameters required. When controlled by a classical computer, egoQGNN can accommodate arbitrarily sized graphs by processing ego-graphs from the input graph using a modestly sized quantum device. The architecture is based on a novel mapping from real-world data to Hilbert space. This mapping maintains the distance relations present in the data and reduces information loss. Experimental results show that the proposed method outperforms competitive state-of-the-art models while using only 1.68% of their parameters.

翻訳日:2023-03-27 19:00:10 公開日:2023-03-24

# RamBoAttack: 効率的なディープニューラルネットワーク決定エクスプロイトのロバストクエリ

RamBoAttack: A Robust Query Efficient Deep Neural Network Decision Exploit ( http://arxiv.org/abs/2112.05282v3 )

ライセンス: Link先を確認

Viet Quoc Vo and Ehsan Abbasnejad and Damith C. Ranasinghe

(参考訳) 機械学習モデルは、敵の例からの回避攻撃に極めて敏感である。一般的に、元の入力と知覚的に類似した修正された入力は、モデルに完全にアクセス可能な敵によってホワイトボックス設定で構築される。しかし、最近の攻撃では、ブラックボックス攻撃を使って敵の例を作るためにクエリ数が著しく減少している。特にアラームは、google、microsoft、ibmを含む多くの機械学習サービスプロバイダによって提供されるトレーニングされたモデルのアクセスインターフェースから、これらのモデルを組み込んだ多数のアプリケーションによって使用される分類決定を活用できる能力である。モデルから予測されたラベルのみを利用して敵の例を作る能力は、決定に基づく攻撃として区別される。本研究では,iclrとspにおける最近の最先端意思決定に基づく攻撃を深く掘り下げ,勾配推定手法を用いた低歪み逆検出の費用対効果を強調する。我々は,局所的な最小値の侵入を回避し,勾配推定法で見られる雑音勾配からの誤方向を回避できる,堅牢なクエリ効率的な攻撃を開発する。提案する攻撃手法であるRamBoAttackは、ランダム化ブロック座標 Descent の概念を利用して隠れた分類器多様体を探索し、局所化入力のみを演算して勾配推定法の問題に対処する摂動を目標とする。重要なことは、RamBoAttackは、敵とターゲットクラスに利用可能な異なるサンプル入力に対してより堅牢である。全体として、特定のターゲットクラスに対して、RamBoAttackは、所定のクエリ予算内で低い歪みを達成するために、より堅牢であることが示されている。大規模な高解像度imagenetデータセットを使用して広範な結果をキュレーションし、攻撃、テストサンプル、アーティファクトをgithubでオープンソースにしました。

Machine learning models are critically susceptible to evasion attacks from adversarial examples. Generally, adversarial examples, modified inputs deceptively similar to the original input, are constructed under whitebox settings by adversaries with full access to the model. However, recent attacks have shown a remarkable reduction in the number of queries needed to craft adversarial examples using blackbox attacks. Particularly alarming is the ability to exploit the classification decision from the access interface of a trained model provided by a growing number of Machine Learning as a Service providers, including Google, Microsoft and IBM, and used by a plethora of applications incorporating these models. The ability of an adversary to exploit only the predicted label from a model to craft adversarial examples is distinguished as a decision-based attack. In our study, we first take a deep dive into recent state-of-the-art decision-based attacks in ICLR and SP to highlight the costly nature of discovering low-distortion adversarial examples employing gradient estimation methods. We develop a robust, query-efficient attack capable of avoiding entrapment in a local minimum and misdirection from noisy gradients seen in gradient estimation methods. The attack method we propose, RamBoAttack, exploits the notion of Randomized Block Coordinate Descent to explore the hidden classifier manifold, targeting perturbations to manipulate only localized input features to address the issues of gradient estimation methods. Importantly, RamBoAttack is more robust to the different sample inputs available to an adversary and to the targeted class. Overall, for a given target class, RamBoAttack is demonstrated to be more robust at achieving a lower distortion within a given query budget. We curate our extensive results using the large-scale high-resolution ImageNet dataset and open-source our attack, test samples and artifacts on GitHub.

翻訳日:2023-03-27 18:59:50 公開日:2023-03-24

# DeBERTaV3: ELECTRA-Style Pre-TrainingによるDeBERTaの改善

DeBERTaV3: Improving DeBERTa using ELECTRA-Style Pre-Training with Gradient-Disentangled Embedding Sharing ( http://arxiv.org/abs/2111.09543v4 )

ライセンス: Link先を確認

Pengcheng He, Jianfeng Gao and Weizhu Chen

(参考訳) 本稿では,マスク言語モデリング(MLM)を,よりサンプル効率の高い事前学習タスクであるRTDに置き換えることで,従来のDeBERTaモデルを改善する新しい事前学習言語モデルであるDeBERTaV3を提案する。 ELECTRAにおけるバニラ埋め込み共有は,訓練効率とモデル性能を損なうことが示された。これは、ディスクリミネータとジェネレータのプルトークンのトレーニング損失が異なる方向に埋め込み、"綱引き"のダイナミクスを生成するためである。そこで本研究では,タッグ・オブ・ウォーのダイナミクスを回避し,トレーニング効率と事前学習モデルの質を両立させる,新しい勾配偏角埋め込み共有法を提案する。我々はDeBERTaV3をDeBERTaと同じ設定で事前訓練し、広範囲の下流自然言語理解(NLU)タスクにおいて例外的な性能を示す。 GLUEベンチマークを例に挙げると、DeBERTaV3 Largeモデルは平均スコア91.37%で、DeBERTaは1.37%、ELECTRAは1.91%で、同様の構造を持つモデルに新しい最先端(SOTA)が設定されている。さらに,多言語モデルmdebertaを事前学習し,英語モデルに比べて強いベースラインよりも大きな改善が見られた。例えば、mDeBERTa Baseは、XNLIで79.8%のゼロショットのクロスランガル精度を達成し、XLM-R Baseで3.6%改善した。トレーニング済みのモデルと推論コードをhttps://github.com/microsoft/DeBERTaで公開しました。

This paper presents a new pre-trained language model, DeBERTaV3, which improves the original DeBERTa model by replacing mask language modeling (MLM) with replaced token detection (RTD), a more sample-efficient pre-training task. Our analysis shows that vanilla embedding sharing in ELECTRA hurts training efficiency and model performance. This is because the training losses of the discriminator and the generator pull token embeddings in different directions, creating the "tug-of-war" dynamics. We thus propose a new gradient-disentangled embedding sharing method that avoids the tug-of-war dynamics, improving both training efficiency and the quality of the pre-trained model. We have pre-trained DeBERTaV3 using the same settings as DeBERTa to demonstrate its exceptional performance on a wide range of downstream natural language understanding (NLU) tasks. Taking the GLUE benchmark with eight tasks as an example, the DeBERTaV3 Large model achieves a 91.37% average score, which is 1.37% over DeBERTa and 1.91% over ELECTRA, setting a new state-of-the-art (SOTA) among the models with a similar structure. Furthermore, we have pre-trained a multi-lingual model mDeBERTa and observed a larger improvement over strong baselines compared to English models. For example, the mDeBERTa Base achieves a 79.8% zero-shot cross-lingual accuracy on XNLI and a 3.6% improvement over XLM-R Base, creating a new SOTA on this benchmark. We have made our pre-trained models and inference code publicly available at https://github.com/microsoft/DeBERTa.
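The "tug-of-war" described above can be made concrete with a short formula sketch. The symbols $E_G$, $E_D$, $E_\Delta$ and the stop-gradient operator $\mathrm{sg}$ are notation assumed here for illustration; the abstract itself does not define them, so this is one reading of "gradient-disentangled" rather than a statement of the exact method.

```latex
% Vanilla embedding sharing (as in ELECTRA): one table E receives gradients
% from both the generator (MLM) loss and the discriminator (RTD) loss.
E_G = E_D = E

% Gradient-disentangled sharing, sketched with a stop-gradient operator sg:
% the discriminator reuses the generator table but cannot push gradients
% into it, and learns only a residual table E_\Delta.
E_D = \mathrm{sg}(E_G) + E_\Delta
```

Under this reading, the generator embeddings are updated only by the MLM loss, which is what would remove the opposing pulls on the shared table.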

翻訳日:2023-03-27 18:59:18 公開日:2023-03-24

# 最適フェルミオン-量子マッピング

Optimal fermion-qubit mappings ( http://arxiv.org/abs/2110.12792v4 )

ライセンス: Link先を確認

Mitchell Chiew, Sergii Strelchuk

(参考訳) 量子コンピュータ上のフェルミオン系をシミュレーションするには、フェルミオン状態の量子ビットへの高速なマッピングが必要である。効率的なマッピングの重要な特徴は、局所的なフェルミオン相互作用を局所的な量子ビット相互作用に変換する能力である。すべてのフェルミオン・クビット写像は、クビット演算への変換のためにフェルミオンモードの番号スキームを使用する必要がある。フェルミオン系が写像する量子ビットの順序付けされていない記号ラベリングと順序付けされた数値ラベリングとを区別する。この分離は、フェルミオンモードの列挙スキームである追加の自由度を利用してフェルミオン-量子マッピングを設計する新しい方法に光を当てる。これにより、最適なフェルミオン量子ビット写像の概念をよく定義することができる:例えば、本論文の主な焦点は、彼らが生成する量子ハミルトニアンの項において、パウリ行列の平均数を最小化するジョルダン・ウィグナー変換を特定することである。フェルミオン系が与えられたとき、ターゲットの量子ハミルトニアンの実用的コスト関数に対する最適なフェルミオン列挙スキームを選択するためのリソースは一切与えられない。ミッチソンとダービンの列挙パターンが、正方形フェルミオン格子で相互作用するシステムのヨルダン・ウィグナー変換の平均ポーリ重みを最小化するために最適であることを示す。これにより、クビット・ハミルトニアン(qubit hamiltonian)は、パウリの平均重量が13.9%短くなる。さらに、2つのアンシラ量子ビットのみを加えることで、新しいフェルミオン・クビット写像のクラスを導入し、以前の方法と比較してハミルトン項の平均パウリ重量を37.9%削減する。他の自然な$n$モードフェルミオン系では、平均ポーリ重みがnaive列挙スキームに対して$n^{1/4}$向上する列挙パターンが見つかる。

Simulating fermionic systems on a quantum computer requires a high-performing mapping of fermionic states to qubits. The key characteristic of an efficient mapping is its ability to translate local fermionic interactions into local qubit interactions, leading to easy-to-simulate qubit Hamiltonians. All fermion-qubit mappings must use a numbering scheme for the fermionic modes in order to translate them into qubit operations. We make a distinction between the unordered, symbolic labelling of fermions and the ordered, numeric labelling of the qubits to which the fermionic system maps. This separation shines light on a new way to design fermion-qubit mappings by making use of the extra degree of freedom: the enumeration scheme for the fermionic modes. This allows well-defined concepts of optimal fermion-qubit mappings: for example, the main focus of this paper is on identifying Jordan-Wigner transformations that minimise the average number of Pauli matrices in the terms of the qubit Hamiltonians they produce. Given a fermionic system, choosing the optimal fermionic enumeration scheme for practical cost functions of the target qubit Hamiltonian does not expend any resources. We demonstrate how Mitchison and Durbin's enumeration pattern is optimal for minimising the average Pauli weight of Jordan-Wigner transformations of systems interacting in square fermionic lattices. This leads to qubit Hamiltonians consisting of terms with average Pauli weights 13.9% shorter than previously known. Furthermore, by adding only two ancilla qubits we introduce a new class of fermion-qubit mappings, and reduce the average Pauli weight of Hamiltonian terms by 37.9% compared to previous methods. For other natural $n$-mode fermionic systems we find enumeration patterns which result in an $n^{1/4}$ improvement in average Pauli weight over naive enumeration schemes.
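To make the cost function in the abstract above concrete, the following minimal sketch computes the average Pauli weight of nearest-neighbour hopping terms on a square lattice under a given enumeration. It assumes the standard Jordan-Wigner property that a hopping term between modes numbered i < j maps to Pauli strings of weight j - i + 1. The function names and the two enumerations (row-major and snake) are illustrative stand-ins; the Mitchison-Durbin pattern itself is not implemented here.

```python
from itertools import product


def lattice_edges(n):
    """Nearest-neighbour edges of an n x n square lattice as pairs of (row, col) sites."""
    edges = []
    for r, c in product(range(n), range(n)):
        if c + 1 < n:
            edges.append(((r, c), (r, c + 1)))  # horizontal neighbour
        if r + 1 < n:
            edges.append(((r, c), (r + 1, c)))  # vertical neighbour
    return edges


def row_major(n):
    """Number the sites row by row, always left to right."""
    return {(r, c): r * n + c for r, c in product(range(n), range(n))}


def snake(n):
    """Number the sites row by row, reversing direction on every odd row."""
    return {
        (r, c): r * n + (c if r % 2 == 0 else n - 1 - c)
        for r, c in product(range(n), range(n))
    }


def average_pauli_weight(n, enumeration):
    """Mean Jordan-Wigner Pauli weight over lattice edges.

    Assumption: an edge between modes i < j contributes weight j - i + 1.
    """
    weights = [abs(enumeration[u] - enumeration[v]) + 1 for u, v in lattice_edges(n)]
    return sum(weights) / len(weights)
```

For a 4 x 4 lattice, `average_pauli_weight(4, row_major(4))` and `average_pauli_weight(4, snake(4))` both evaluate to 3.5, so an enumeration has to depart from simple row-by-row orderings to lower this average.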

翻訳日:2023-03-27 18:58:45 公開日:2023-03-24

# 粗いラベルから学習する効率的なアルゴリズム

Efficient Algorithms for Learning from Coarse Labels ( http://arxiv.org/abs/2108.09805v2 )

ライセンス: Link先を確認

Dimitris Fotakis, Alkis Kalavasis, Vasilis Kontonis, Christos Tzamos

(参考訳) 多くの学習問題では、細かなラベル情報にアクセスできない場合がある。例えば、画像は注釈の専門知識によっては、ハスキー、犬、さらには動物と分類できる。本研究では,これらの設定を定式化し,粗いデータから学習する問題を考察する。設定された$\mathcal{Z}$から実際のラベルを観察する代わりに、$\mathcal{Z}$(またはパーティションの混合)のパーティションに対応する粗いラベルを観察します。私たちのアルゴリズムの主な結果は、粗いデータが十分に有益であるときに、きめ細かいラベルから学べるどんな問題も効率的に学習できるということです。粗いラベルのみを付与したきめ細かなラベルに対して、統計的クエリ(SQ)に応答する一般的なリダクションにより、この結果を得る。要求される粗いラベルの数は、粗さによる情報歪みと $|\mathcal{Z}|$ の細かいラベルの数に多項式的に依存する。また、検閲された統計学における中心的な問題に焦点をあてた(無限に多くの)実価値ラベルについても検討する: ガウス平均は粗いデータから推定される。分割中の集合が凸であるときに効率的なアルゴリズムを提供し、非常に単純な非凸集合に対してもNPハードであることを示す。

For many learning problems one may not have access to fine-grained label information; e.g., an image can be labeled as husky, dog, or even animal depending on the expertise of the annotator. In this work, we formalize these settings and study the problem of learning from such coarse data. Instead of observing the actual labels from a set $\mathcal{Z}$, we observe coarse labels corresponding to a partition of $\mathcal{Z}$ (or a mixture of partitions). Our main algorithmic result is that essentially any problem learnable from fine-grained labels can also be learned efficiently when the coarse data are sufficiently informative. We obtain our result through a generic reduction for answering Statistical Queries (SQ) over fine-grained labels given only coarse labels. The number of coarse labels required depends polynomially on the information distortion due to coarsening and the number of fine labels $|\mathcal{Z}|$. We also investigate the case of (infinitely many) real-valued labels, focusing on a central problem in censored and truncated statistics: Gaussian mean estimation from coarse data. We provide an efficient algorithm when the sets in the partition are convex and establish that the problem is NP-hard even for very simple non-convex sets.

翻訳日:2023-03-27 18:58:11 公開日:2023-03-24

# 呪いを祝福に変える - 安定化モデルインバージョンによる分散データ不要バックドアの除去を可能にする

Turning a Curse into a Blessing: Enabling In-Distribution-Data-Free Backdoor Removal via Stabilized Model Inversion ( http://arxiv.org/abs/2206.07018v3 )

ライセンス: Link先を確認

Si Chen, Yi Zeng, Jiachen T.Wang, Won Park, Xun Chen, Lingjuan Lyu, Zhuoqing Mao, Ruoxi Jia

(参考訳) 機械学習モデルにおける多くのバックドア除去技術は、きれいな配布データを必要とするが、プロプライエタリなデータセットのために常に利用できるとは限らない。モデル反転技術は、しばしばプライバシーの脅威と見なされるが、現実的なトレーニングサンプルを再構築し、配布データの必要性をなくす可能性がある。バックドア除去とモデル逆転を組み合わせた以前の試みは、限られた結果をもたらした。本研究は, モデルインバージョンを有効なバックドア除去に活用する手法として, 再構成されたサンプルの特性, 知覚的類似性, バックドアトリガの潜在的な存在に関する重要な疑問に対処する。強固な防御には知覚的類似性のみに依存することは不十分であり、入力とパラメータの摂動に対するモデル予測の安定性も重要である。そこで本研究では,モデルインバージョンと安定性,視覚的品質向上のための2段階最適化フレームワークを提案する。興味深いことに、事前訓練された発電機の潜伏空間からの再構成サンプルは、バックドアモデルからの信号を利用する場合でも、バックドアフリーであることが判明した。この発見を支持する理論的分析を提供する。その結果,本手法は,同一量のクリーンサンプルを用いた性能の一致や超過を伴わずに,最先端のバックドア除去性能を実現した。

Many backdoor removal techniques in machine learning models require clean in-distribution data, which may not always be available due to proprietary datasets. Model inversion techniques, often considered privacy threats, can reconstruct realistic training samples, potentially eliminating the need for in-distribution data. Prior attempts to combine backdoor removal and model inversion yielded limited results. Our work is the first to provide a thorough understanding of leveraging model inversion for effective backdoor removal by addressing key questions about reconstructed samples' properties, perceptual similarity, and the potential presence of backdoor triggers. We establish that relying solely on perceptual similarity is insufficient for robust defenses, and the stability of model predictions in response to input and parameter perturbations is also crucial. To tackle this, we introduce a novel bi-level optimization-based framework for model inversion, promoting stability and visual quality. Interestingly, we discover that reconstructed samples from a pre-trained generator's latent space are backdoor-free, even when utilizing signals from a backdoored model. We provide a theoretical analysis to support this finding. Our evaluation demonstrates that our stabilized model inversion technique achieves state-of-the-art backdoor removal performance without clean in-distribution data, matching or surpassing performance using the same amount of clean samples.

翻訳日:2023-03-27 18:51:36 公開日:2023-03-24

# 深層学習モデルの対称性とその内部表現について

On the Symmetries of Deep Learning Models and their Internal Representations ( http://arxiv.org/abs/2205.14258v5 )

ライセンス: Link先を確認

Charles Godfrey, Davis Brown, Tegan Emerson, Henry Kvinge

(参考訳) 対称性は、幅広い複雑なシステムの探索における基本的な道具である。機械学習の対称性はモデルとデータの両方で研究されている。本稿では,モデルファミリーのアーキテクチャから生じる対称性と,そのファミリーの内部データ表現の対称性を結びつける。これを基本対称群の集合を計算し、モデルのインターツウィナー群(英語版)と呼ぶ。我々は、同じアーキテクチャを持つモデル間の隠れた状態間の類似性を調べる一連の実験を通して、データの内部表現に相互に結合する。我々の研究は、ネットワークの対称性が、そのネットワークのデータ表現の対称性に伝播されることを示唆し、アーキテクチャが学習と予測プロセスにどのように影響するかをよりよく理解する。最後に、ReLUネットワークでは、任意の線形結合ではなく、隠れ層における活性化に基づくモデル解釈可能性探索を集中させる一般的な手法の正当性を推測する。

Symmetry is a fundamental tool in the exploration of a broad range of complex systems. In machine learning symmetry has been explored in both models and data. In this paper we seek to connect the symmetries arising from the architecture of a family of models with the symmetries of that family's internal representation of data. We do this by calculating a set of fundamental symmetry groups, which we call the intertwiner groups of the model. We connect intertwiner groups to a model's internal representations of data through a range of experiments that probe similarities between hidden states across models with the same architecture. Our work suggests that the symmetries of a network are propagated into the symmetries in that network's representation of data, providing us with a better understanding of how architecture affects the learning and prediction process. Finally, we speculate that for ReLU networks, the intertwiner groups may provide a justification for the common practice of concentrating model interpretability exploration on the activation basis in hidden layers rather than arbitrary linear combinations thereof.

翻訳日:2023-03-27 18:50:53 公開日:2023-03-24

# Penguins Don't Fly: Instantiationsと例外によるジェネリックの推論

Penguins Don't Fly: Reasoning about Generics through Instantiations and Exceptions ( http://arxiv.org/abs/2205.11658v3 )

ライセンス: Link先を確認

Emily Allaway, Jena D. Hwang, Chandra Bhagavatula, Kathleen McKeown, Doug Downey, Yejin Choi

(参考訳) 属は、普遍的に真実ではない世界(例えば、鳥は飛ぶことができる)に関する一般化を表現する(例えば、新生児の鳥やペンギンは飛べない)。共通センス知識ベース(Commonsense knowledge bases)は、NLPで広く使われ、いくつかの一般的な知識を符号化するが、そのような例外を列挙することは滅多にない。我々は、言語理論から情報を得た新しい枠組みを提示する -- ジェネリックが真または偽を持っている特定の場合。我々は、約650のジェネリックに対して19kの例を生成し、我々のフレームワークが強力なGPT-3ベースラインを12.8精度で上回っていることを示す。分析では,例文生成における言語理論に基づく制御可能性の重要性,例文の源としての知識基盤の不足,自然言語推論の課題について考察した。

Generics express generalizations about the world (e.g., birds can fly) that are not universally true (e.g., newborn birds and penguins cannot fly). Commonsense knowledge bases, used extensively in NLP, encode some generic knowledge but rarely enumerate such exceptions. Knowing when a generic statement holds or does not hold true is crucial for developing a comprehensive understanding of generics. We present a novel framework informed by linguistic theory to generate exemplars -- specific cases when a generic holds true or false. We generate ~19k exemplars for ~650 generics and show that our framework outperforms a strong GPT-3 baseline by 12.8 precision points. Our analysis highlights the importance of linguistic theory-based controllability for generating exemplars, the insufficiency of knowledge bases as a source of exemplars, and the challenges exemplars pose for the task of natural language inference.

翻訳日:2023-03-27 18:50:37 公開日:2023-03-24

# 量子電池の充電における触媒

Catalysis in Charging Quantum Batteries ( http://arxiv.org/abs/2205.05018v3 )

ライセンス: Link先を確認

Ricard Ravell Rodriguez, Borhan Ahmadi, Pawel Mazurek, Shabir Barzanjeh, Robert Alicki, and Pawel Horodecki

(参考訳) 本稿では,高調波発振器 (量子電池) と高調波発振器 (チャージャー) をレーザー磁場で駆動する方式を提案する。触媒系の存在下では, 充電器と量子電池の間を媒介し, エネルギー伝達限界を著しく緩和できることを示す。これらの触媒系は量子ビットまたは高調波振動子であり、量子電池へのエネルギー移動量を増加させるが、それ自体はほとんどエネルギーを蓄積しない。これにより、電池と充電器の結合強度に依存するベアセッティングにおける最適値である帯電レーザ場の周波数を最適化する必要がなくなる。

We propose a novel approach for optimizing the charging of harmonic oscillators (quantum batteries) coupled to a harmonic oscillator (charger) driven by a laser field. We demonstrate that energy transfer limitations can be significantly mitigated in the presence of catalyst systems mediating between the charger and the quantum batteries. We show that these catalyst systems, either qubits or harmonic oscillators, enhance the amount of energy transferred to the quantum batteries, while they themselves store almost no energy. This eliminates the need for optimizing the frequency of the charging laser field, whose optimal value in the bare setting depends on the coupling strengths between the charger and the batteries.

翻訳日:2023-03-27 18:49:55 公開日:2023-03-24

# 道路側LiDAR物体検出のための重み付きベイズガウス混合モデル

Weighted Bayesian Gaussian Mixture Model for Roadside LiDAR Object Detection ( http://arxiv.org/abs/2204.09804v4 )

ライセンス: Link先を確認

Tianya Zhang, Yi Ge, Peter J. Jin

(参考訳) 背景モデリングは、静的な背景成分を減じることで移動目標を検出するインテリジェントな監視システムに広く利用されている。多くの道端lidarオブジェクト検出手法は、多くのフレーム(ボクセル密度、近傍数、最大距離など)の記述統計に基づいて、新しいデータポイントと事前訓練された背景参照を比較して、前景ポイントをフィルタリングする。しかし、これらのソリューションは高トラフィック下では非効率であり、パラメータ値はあるシナリオから別のシナリオへの転送が困難である。初期の研究では、ビデオベースシステムに広く用いられている確率論的背景モデリング手法は、疎小で非構造化のクラウドデータのため、ロードサイドのLiDAR監視システムには適していないと考えられていた。本稿では,各LiDAR点の標高と方位値に基づいて,生のLiDARデータを構造化表現に変換する。この高次テンソル表現により、道路側LiDAR背景モデリングのための効率的な高次元多変量解析を可能にする障壁を破る。ベイズ非パラメトリック(BNP)アプローチは、強度値と3D計測を統合し、3Dと強度情報を完全に活用する。提案手法は,2つの最先端の道路背景モデル,コンピュータビジョンベンチマーク,深層学習ベースラインを比較し,交通量と難易度で評価された点,対象,経路レベルを比較した。このマルチモーダル重み付きベイズ混合モデル(GMM)は、ノイズ測定により動的バックグラウンドを処理でき、インフラベースのLiDARオブジェクト検出を大幅に強化し、スマートシティアプリケーションのための様々な3Dモデリングを作成することができる。

Background modeling is widely used in intelligent surveillance systems to detect moving targets by subtracting the static background components. Most roadside LiDAR object detection methods filter out foreground points by comparing new data points to pre-trained background references based on descriptive statistics over many frames (e.g., voxel density, number of neighbors, maximum distance). However, these solutions are inefficient under heavy traffic, and parameter values are hard to transfer from one scenario to another. In early studies, the probabilistic background modeling methods widely used for video-based systems were considered unsuitable for roadside LiDAR surveillance systems due to the sparse and unstructured point cloud data. In this paper, the raw LiDAR data were transformed into a structured representation based on the elevation and azimuth value of each LiDAR point. With this high-order tensor representation, we break the barrier to allow efficient high-dimensional multivariate analysis for roadside LiDAR background modeling. The Bayesian Nonparametric (BNP) approach integrates the intensity value and 3D measurements to fully exploit the 3D and intensity information in the measurement data. The proposed method was compared against two state-of-the-art roadside LiDAR background models, a computer vision benchmark, and deep learning baselines, evaluated at point, object, and path levels under heavy traffic and challenging weather. This multimodal Weighted Bayesian Gaussian Mixture Model (GMM) can handle dynamic backgrounds with noisy measurements and substantially enhances infrastructure-based LiDAR object detection, whereby various 3D modeling for smart city applications could be created.
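The structured representation mentioned above can be sketched as a simple binning of each return by elevation and azimuth. This is a minimal illustration under stated assumptions: the function name, the bin counts, and the uniform binning of both angles are hypothetical choices, not the paper's actual sensor layout or tensor format.

```python
import math


def to_elevation_azimuth_grid(points, n_elev=32, n_azim=1800):
    """Bin LiDAR returns (x, y, z, intensity) into (elevation, azimuth) cells.

    Returns a dict mapping (elev_bin, azim_bin) -> (range, intensity).
    Bin counts and uniform angular binning are illustrative assumptions.
    """
    grid = {}
    for x, y, z, intensity in points:
        rng = math.sqrt(x * x + y * y + z * z)
        azim = math.atan2(y, x)                 # in [-pi, pi]
        elev = math.atan2(z, math.hypot(x, y))  # in [-pi/2, pi/2]
        a_idx = min(int((azim + math.pi) / (2 * math.pi) * n_azim), n_azim - 1)
        e_idx = min(int((elev + math.pi / 2) / math.pi * n_elev), n_elev - 1)
        # If several returns land in one cell, the later one overwrites the earlier.
        grid[(e_idx, a_idx)] = (rng, intensity)
    return grid
```

Collecting the value of a fixed cell over many frames gives a per-cell time series of range and intensity, which is the kind of input a per-cell mixture model for background modeling could consume.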

翻訳日:2023-03-27 18:49:41 公開日:2023-03-24

# 連続時空間グラフ畳み込みネットワーク

Continual Spatio-Temporal Graph Convolutional Networks ( http://arxiv.org/abs/2203.11009v2 )

ライセンス: Link先を確認

Lukas Hedegaard and Negar Heidari and Alexandros Iosifidis

(参考訳) スケルトンデータによるグラフに基づく推論は、人間の行動認識に有望なアプローチとして現れてきた。しかし、オンライン推論の設定に主に時間列全体を入力として利用する従来のグラフベースの手法は、かなりの計算冗長性を必要とする。本稿では,時空間グラフ畳み込みニューラルネットワークを連続推論ネットワークとして再構成することで,フレーム処理を繰り返すことなく段階的に予測を行う。提案手法を評価するため,ST-GCN,CoST-GCN,CoAGCN,CoS-TRの2つの自己保持機構を持つ導出法を連続的に生成する。提案手法は, NTU RGB+D 60, NTU RGB+D 120, Kinetics Skeleton 400 データセットを用いて, 重量移動戦略と予測加速度のアーキテクチャ修正について検討した。同様の予測精度を維持しながら、時間複雑性の最大109倍の削減、26倍のハードウェア上のアクセラレーション、オンライン推論中の最大割り当てメモリの最大52%の削減を観察する。

Graph-based reasoning over skeleton data has emerged as a promising approach for human action recognition. However, the application of prior graph-based methods, which predominantly employ whole temporal sequences as their input, to the setting of online inference entails considerable computational redundancy. In this paper, we tackle this issue by reformulating the Spatio-Temporal Graph Convolutional Neural Network as a Continual Inference Network, which can perform step-by-step predictions in time without repeated frame processing. To evaluate our method, we create a continual version of ST-GCN, CoST-GCN, alongside two derived methods with different self-attention mechanisms, CoAGCN and CoS-TR. We investigate weight transfer strategies and architectural modifications for inference acceleration, and perform experiments on the NTU RGB+D 60, NTU RGB+D 120, and Kinetics Skeleton 400 datasets. Retaining similar predictive accuracy, we observe up to a 109x reduction in time complexity, on-hardware accelerations of 26x, and reductions in maximum allocated memory of 52% during online inference.
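The idea of step-by-step prediction without reprocessing frames can be illustrated with a much simpler stand-in: a causal 1D temporal convolution over scalar frame features that caches recent inputs. The class and function names are hypothetical, and the scalar features are a simplification; the paper's networks apply graph convolutions over skeleton joints and would cache feature tensors instead.

```python
from collections import deque


class ContinualTemporalConv:
    """Causal 1D temporal convolution evaluated one frame at a time."""

    def __init__(self, kernel):
        self.kernel = list(kernel)  # kernel[0] multiplies the oldest cached frame
        # The cache starts zero-filled, which corresponds to zero padding in time.
        self.buffer = deque([0.0] * len(self.kernel), maxlen=len(self.kernel))

    def step(self, frame_feature):
        """Consume one new frame feature and return the output for this time step."""
        self.buffer.append(frame_feature)  # the oldest cached frame is dropped
        return sum(w * x for w, x in zip(self.kernel, self.buffer))


def sliding_window_conv(sequence, kernel):
    """Reference: rebuild every zero-padded window from the full sequence."""
    k = len(kernel)
    padded = [0.0] * (k - 1) + list(sequence)
    return [
        sum(w * x for w, x in zip(kernel, padded[t:t + k]))
        for t in range(len(sequence))
    ]
```

Feeding frames one by one through `step` yields the same outputs as `sliding_window_conv` on the whole sequence. For a single layer the arithmetic per step is the same; the savings would arise when such layers are stacked, because intermediate features of overlapping windows no longer need to be recomputed.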

翻訳日:2023-03-27 18:49:10 公開日:2023-03-24

# Euler State Networks - 非散逸型貯留層コンピューティング

Euler State Networks: Non-dissipative Reservoir Computing ( http://arxiv.org/abs/2203.09382v3 )

ライセンス: Link先を確認

Claudio Gallicchio

(参考訳) 本稿では, 常微分方程式の数値解に着想を得て, オイラー状態ネットワーク(EuSN)と呼ばれる新しい貯留層計算(RC)モデルを提案する。提案手法は, 構造によって安定かつ非散逸性を有する貯留層ダイナミクスを設計するために, 前方オイラー離散化と反対称再帰行列を用いる。我々の数学的解析は、結果のモデルが一元的有効スペクトル半径とゼロ局所リアプノフ指数に偏り、本質的に安定性の端近くで動作していることを示している。長期記憶課題実験の結果,複数の時間ステップにわたる入力情報の効果的な伝播を必要とする問題において,標準rcモデルよりも提案手法が優れていることが明らかとなった。さらに、時系列分類ベンチマークの結果から、eusnは、rcファミリーのトレーニング効率を保ちながら、トレーニング可能なリカレントニューラルネットワークの精度をマッチ(あるいは超過)することができ、計算時間の最大490倍の節約と、エネルギー消費量の約1750倍の節約が可能であることが示されている。

Inspired by the numerical solution of ordinary differential equations, in this paper we propose a novel Reservoir Computing (RC) model, called the Euler State Network (EuSN). The presented approach makes use of forward Euler discretization and antisymmetric recurrent matrices to design reservoir dynamics that are both stable and non-dissipative by construction. Our mathematical analysis shows that the resulting model is biased towards a unitary effective spectral radius and zero local Lyapunov exponents, intrinsically operating near the edge of stability. Experiments on long-term memory tasks show the clear superiority of the proposed approach over standard RC models in problems requiring effective propagation of input information over multiple time-steps. Furthermore, results on time-series classification benchmarks indicate that EuSN is able to match (or even exceed) the accuracy of trainable Recurrent Neural Networks, while retaining the training efficiency of the RC family, resulting in up to $\approx$ 490-fold savings in computation time and $\approx$ 1750-fold savings in energy consumption.
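A minimal sketch of the two ingredients named above, forward Euler discretization and an antisymmetric recurrent matrix, might look as follows. The function names, the step size, the small diffusion term, and the omission of a bias vector are assumptions made here for illustration and are not taken from the abstract.

```python
import math
import random


def make_antisymmetric(n, scale=1.0, seed=0):
    """Return A = W - W^T for a random n x n matrix W, so that A^T = -A."""
    rnd = random.Random(seed)
    w = [[rnd.uniform(-scale, scale) for _ in range(n)] for _ in range(n)]
    return [[w[i][j] - w[j][i] for j in range(n)] for i in range(n)]


def eusn_step(h, x, a, w_in, step_size=0.01, diffusion=0.001):
    """One forward Euler step of the reservoir state.

    h_new = h + step_size * tanh((A - diffusion * I) h + W_in x)
    The diffusion term and the missing bias are assumptions of this sketch.
    """
    n = len(h)
    new_h = []
    for i in range(n):
        pre = sum(a[i][j] * h[j] for j in range(n)) - diffusion * h[i]
        pre += sum(w_in[i][k] * x[k] for k in range(len(x)))
        new_h.append(h[i] + step_size * math.tanh(pre))
    return new_h
```

As in other reservoir computing models, the recurrent and input weights here would stay fixed after random initialization, with only a readout trained on the collected states.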

翻訳日:2023-03-27 18:48:51 公開日:2023-03-24

# 量子及び古典的時間結晶における揺らぎの役割

The role of fluctuations in quantum and classical time crystals ( http://arxiv.org/abs/2203.05577v4 )

ライセンス: Link先を確認

Toni L. Heugel, Alexander Eichler, R. Chitra, and Oded Zilberberg

(参考訳) 離散時間結晶(dtc)は、動力学的に作用する力よりも遅い多体状態である。時代が2倍の古典的なシステムにも当てはまる。したがって、この問題は古典と量子 DTC の区別が自然に生じる。ここでは、Bose-Hubbardモデルの変種を分析し、多くの物理現象を記述し、古典的および量子的時間結晶的極限を持つ。システムの安定性におけるゆらぎの役割を考察し、量子と古典的dtcの区別を見いださない。これにより、古典的雑音を受ける2つの強結合パラメトリック共振器を用いて実験におけるゆらぎを調べることができる。

Discrete time crystals (DTCs) are a many-body state of matter whose dynamics are slower than the forces acting on it. The same is true for classical systems with period-doubling bifurcations. Hence, the question naturally arises what differentiates classical from quantum DTCs. Here, we analyze a variant of the Bose-Hubbard model, which describes a plethora of physical phenomena and has both a classical and a quantum time-crystalline limit. We study the role of fluctuations on the stability of the system and find no distinction between quantum and classical DTCs. This allows us to probe the fluctuations in an experiment using two strongly coupled parametric resonators subject to classical noise.

翻訳日:2023-03-27 18:48:32 公開日:2023-03-24

# クロスモーダル・ディエンタングルメントによるフォトリアリスティックな仮想ヒトの合成

Synthesizing Photorealistic Virtual Humans Through Cross-modal Disentanglement ( http://arxiv.org/abs/2209.01320v2 )

ライセンス: Link先を確認

Siddarth Ravichandran, Ondřej Texler, Dimitar Dinev, Hyun Jae Kang

(参考訳) 過去数十年にわたって、AmazonのAlexaやAppleのSiriといったデジタルアシスタントの登場から、Metaブランドの最新のメタバース活動に至るまで、人間の生活の多くの側面が仮想ドメインで強化されてきた。これらの傾向は、人間を写実的に描写することの重要性を強調する。これは近年、いわゆるディープフェイクとトーキーヘッド生成手法の急速な成長につながっている。その印象的な結果と人気にもかかわらず、通常はテクスチャの品質、唇の同期、解像度といった定性的側面や、リアルタイムに走る能力といった実用的側面を欠いている。仮想人間のアバターを実用的なシナリオで使用できるようにするために,高性能な仮想顔合成のためのエンドツーエンドフレームワークを提案する。本稿では,ビセムを中間音声表現として利用する新たなネットワークと,大域的な頭部運動を制御するために使用される異なるモーダルのばらつきを解消する階層的画像合成手法を用いた新しいデータ拡張戦略を提案する。提案手法はリアルタイムに動作し,現在の最先端技術と比較して優れた結果が得られる。

Over the last few decades, many aspects of human life have been enhanced with virtual domains, from the advent of digital assistants such as Amazon's Alexa and Apple's Siri to the latest metaverse efforts of the rebranded Meta. These trends underscore the importance of generating photorealistic visual depictions of humans. This has led to the rapid growth of so-called deepfake and talking-head generation methods in recent years. Despite their impressive results and popularity, they usually lack certain qualitative aspects such as texture quality, lip synchronization, or resolution, and practical aspects such as the ability to run in real time. To allow virtual human avatars to be used in practical scenarios, we propose an end-to-end framework for synthesizing high-quality virtual human faces capable of speaking with accurate lip motion, with a special emphasis on performance. We introduce a novel network utilizing visemes as an intermediate audio representation and a novel data augmentation strategy employing a hierarchical image synthesis approach that allows disentanglement of the different modalities used to control the global head motion. Our method runs in real time and is able to deliver superior results compared to the current state of the art.

翻訳日:2023-03-27 18:42:44 公開日:2023-03-24

# Transformerベースの物体検出器におけるマルチスケール特徴の効率的な利用に向けて

Towards Efficient Use of Multi-Scale Features in Transformer-Based Object Detectors ( http://arxiv.org/abs/2208.11356v2 )

ライセンス: Link先を確認

Gongjie Zhang, Zhipeng Luo, Zichen Tian, Jingyi Zhang, Xiaoqin Zhang, Shijian Lu

(参考訳) マルチスケール特徴はオブジェクト検出に非常に効果的であることが証明されているが、特に最近のTransformerベースの検出器では、膨大で、時には法外な追加の計算コストを伴うことが多い。本稿では、Transformerベースのオブジェクト検出器においてマルチスケール特徴の効率的な利用を可能にする汎用パラダイムとして、Iterative Multi-scale Feature Aggregation (IMFA)を提案する。中心となるアイデアは、少数の重要な位置から得られるスパースなマルチスケール特徴を活用することであり、これは2つの新しい設計によって実現される。第一に、IMFAはTransformerのエンコーダ・デコーダパイプラインを再構成し、検出予測に基づいて符号化された特徴を反復的に更新できるようにする。第二に、IMFAは事前の検出予測のガイダンスのもと、わずか数箇所のキーポイント位置から、検出を精緻化するためのスケール適応的特徴をスパースにサンプリングする。その結果、サンプリングされたマルチスケール特徴はスパースでありながら、オブジェクト検出に非常に有益である。広範な実験により、提案するIMFAは、わずかな計算オーバーヘッドだけで、複数のTransformerベースのオブジェクト検出器の性能を大幅に向上させることが示された。

Multi-scale features have been proven highly effective for object detection but often come with huge and even prohibitive extra computation costs, especially for the recent Transformer-based detectors. In this paper, we propose Iterative Multi-scale Feature Aggregation (IMFA) -- a generic paradigm that enables efficient use of multi-scale features in Transformer-based object detectors. The core idea is to exploit sparse multi-scale features from just a few crucial locations, and it is achieved with two novel designs. First, IMFA rearranges the Transformer encoder-decoder pipeline so that the encoded features can be iteratively updated based on the detection predictions. Second, IMFA sparsely samples scale-adaptive features for refined detection from just a few keypoint locations under the guidance of prior detection predictions. As a result, the sampled multi-scale features are sparse yet still highly beneficial for object detection. Extensive experiments show that the proposed IMFA boosts the performance of multiple Transformer-based object detectors significantly yet with only slight computational overhead.

翻訳日:2023-03-27 18:41:50 公開日:2023-03-24

# LayoutFormer++: 制約シリアライゼーションとデコード空間制限による条件付きグラフィックレイアウト生成

LayoutFormer++: Conditional Graphic Layout Generation via Constraint Serialization and Decoding Space Restriction ( http://arxiv.org/abs/2208.08037v2 )

ライセンス: Link先を確認

Zhaoyun Jiang, Jiaqi Guo, Shizhao Sun, Huayu Deng, Zhongkai Wu, Vuksan Mijovic, Zijiang James Yang, Jian-Guang Lou, Dongmei Zhang

(参考訳) ユーザ制約に従ってリアルなレイアウトを生成する条件付きグラフィックレイアウト生成は、まだ十分に研究されていない困難な課題である。第一に、多様なユーザ制約を柔軟かつ統一的に扱う方法についての議論が限られている。第二に、レイアウトをユーザ制約に適合させるために、既存の研究は生成品質を著しく犠牲にすることが多い。本稿では、上記の問題に対処するためにLayoutFormer++を提案する。まず、多様な制約を柔軟に処理するために、異なるユーザ制約を予め定義されたフォーマットのトークン列として表現する制約シリアライズスキームを提案する。次に、条件付きレイアウト生成をシーケンスからシーケンスへの変換として定式化し、Transformerを基本アーキテクチャとするエンコーダ・デコーダフレームワークを利用する。さらに、品質を損なうことなくレイアウトをユーザの要求により良く合致させるため、デコード空間制限戦略を提案する。具体的には、ユーザ制約に確実に違反する選択肢や、低品質なレイアウトをもたらす可能性の高い選択肢を無視することで予測分布を刈り込み、制限された分布からモデルにサンプリングさせる。実験によると、LayoutFormer++は、生成品質の向上と制約違反の低減の両面で、すべてのタスクにおいて既存のアプローチを上回っている。

Conditional graphic layout generation, which generates realistic layouts according to user constraints, is a challenging task that has not been well-studied yet. First, there is limited discussion about how to handle diverse user constraints flexibly and uniformly. Second, to make the layouts conform to user constraints, existing work often sacrifices generation quality significantly. In this work, we propose LayoutFormer++ to tackle the above problems. First, to flexibly handle diverse constraints, we propose a constraint serialization scheme, which represents different user constraints as sequences of tokens with a predefined format. Then, we formulate conditional layout generation as a sequence-to-sequence transformation, and leverage an encoder-decoder framework with Transformer as the basic architecture. Furthermore, to make the layout better meet user requirements without harming quality, we propose a decoding space restriction strategy. Specifically, we prune the predicted distribution by ignoring the options that definitely violate user constraints and likely result in low-quality layouts, and make the model sample from the restricted distribution. Experiments demonstrate that LayoutFormer++ outperforms existing approaches on all the tasks in terms of both better generation quality and less constraint violation.
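
The decoding space restriction described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `restrict_and_sample`, the `allowed` index set, and the toy probabilities are all hypothetical, and the sketch only shows the general idea of zeroing out inadmissible options, renormalising, and sampling from what remains.

```python
import random


def restrict_and_sample(probs, allowed, rng):
    """Prune a predicted distribution to `allowed` indices, then sample from it."""
    # Zero out every option that the (hypothetical) constraint check rejects.
    masked = [p if i in allowed else 0.0 for i, p in enumerate(probs)]
    total = sum(masked)
    if total == 0.0:
        raise ValueError("no admissible option left after pruning")
    # Renormalise so the restricted distribution sums to one.
    restricted = [p / total for p in masked]
    # Draw a single index from the restricted distribution.
    choice = rng.choices(range(len(restricted)), weights=restricted, k=1)[0]
    return choice, restricted


rng = random.Random(0)
probs = [0.5, 0.2, 0.2, 0.1]  # toy next-token distribution (made up)
token, restricted = restrict_and_sample(probs, allowed={1, 3}, rng=rng)
```

Whatever the seed, `token` can only be one of the admissible indices, because pruned options carry zero weight.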

翻訳日:2023-03-27 18:41:32 公開日:2023-03-24

# 言語モデルの「事実」予測に対するデータ統計量の因果効果の測定

Measuring Causal Effects of Data Statistics on Language Model's `Factual' Predictions ( http://arxiv.org/abs/2207.14251v2 )

ライセンス: Link先を確認

Yanai Elazar, Nora Kassner, Shauli Ravfogel, Amir Feder, Abhilasha Ravichander, Marius Mosbach, Yonatan Belinkov, Hinrich Schütze, Yoav Goldberg

(参考訳) 大量のトレーニングデータは、最先端のNLPモデルが高い性能を示す主な理由の1つである。しかし、トレーニングデータの中の何が、モデルに特定の予測をさせるのだろうか。我々は、トレーニングデータが予測にどのように影響するかを因果フレームワークを通じて記述する言語を提供することで、この質問に答えようとする。重要なのは、このフレームワークが高価なモデルの再トレーニングを必要とせず、観測データのみに基づいて因果効果を推定できることである。事前学習された言語モデル(PLM)から事実知識を抽出する問題を取り上げ、共起数などの単純なデータ統計に焦点をあて、これらの統計がPLMの予測に実際に影響を及ぼすことを示す。これは、こうしたモデルが浅いヒューリスティックに依存していることを示唆している。我々の因果フレームワークと結果は、データセットを研究することの重要性と、NLPモデルを理解する上での因果性の利点を示すものである。

Large amounts of training data are one of the major reasons for the high performance of state-of-the-art NLP models. But what exactly in the training data causes a model to make a certain prediction? We seek to answer this question by providing a language for describing how training data influences predictions, through a causal framework. Importantly, our framework bypasses the need to retrain expensive models and allows us to estimate causal effects based on observational data alone. Addressing the problem of extracting factual knowledge from pretrained language models (PLMs), we focus on simple data statistics such as co-occurrence counts and show that these statistics do influence the predictions of PLMs, suggesting that such models rely on shallow heuristics. Our causal framework and our results demonstrate the importance of studying datasets and the benefits of causality for understanding NLP models.
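
As a rough illustration of the kind of simple data statistic mentioned above, the sketch below counts sentence-level co-occurrences of subject/object pairs. The toy corpus, the pair list, and the helper name `cooccurrence_counts` are invented for this example. The paper's actual counting procedure and its causal estimation are not reproduced here.

```python
from collections import Counter


def cooccurrence_counts(sentences, pairs):
    """Count in how many sentences both members of each pair appear."""
    counts = Counter()
    for sentence in sentences:
        # Lowercased whitespace tokens; a real pipeline would tokenise properly.
        tokens = set(sentence.lower().split())
        for subj, obj in pairs:
            if subj in tokens and obj in tokens:
                counts[(subj, obj)] += 1
    return counts


corpus = [  # toy sentences, made up for illustration
    "Paris is the capital of France",
    "France and Paris host many museums",
    "Berlin is the capital of Germany",
    "Paris and Berlin are both capitals",
]
pairs = [("paris", "france"), ("berlin", "germany"), ("paris", "germany")]
counts = cooccurrence_counts(corpus, pairs)
```

A pair that never shares a sentence simply has a count of zero, since `Counter` returns 0 for missing keys.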

翻訳日:2023-03-27 18:41:13 公開日:2023-03-24

# Omni3D: 実環境における3Dオブジェクト検出のための大規模ベンチマークとモデル

Omni3D: A Large Benchmark and Model for 3D Object Detection in the Wild ( http://arxiv.org/abs/2207.10660v2 )

ライセンス: Link先を確認

Garrick Brazil, Abhinav Kumar, Julian Straub, Nikhila Ravi, Justin Johnson, Georgia Gkioxari

(参考訳) 単一の画像から3Dのシーンやオブジェクトを認識することは、ロボット工学やAR/VRに応用されるコンピュータビジョンの長年の目標である。 2D認識では、大規模なデータセットとスケーラブルなソリューションが前例のない進歩をもたらした。 3Dでは、既存のベンチマークは規模が小さく、手法は少数のオブジェクトカテゴリや特定のドメイン(例えば都市の運転シーン)に特化している。 2D認識の成功に動機づけられて、Omni3Dと呼ばれる大規模ベンチマークを導入することで、3Dオブジェクト検出の課題を再検討する。 Omni3Dは既存のデータセットを再利用して統合したもので、300万以上のインスタンスと98のカテゴリが注釈付けされた234k枚の画像からなる。このような規模での3D検出は、カメラの内部パラメータのばらつきや、シーンおよびオブジェクトの種類の豊富な多様性のために困難である。本稿では、カメラとシーンの種類をまたいで統一的なアプローチで一般化するように設計された、Cube R-CNNというモデルを提案する。 Cube R-CNNは、より大規模なOmni3Dと既存のベンチマークの両方で従来の研究を上回ることを示す。最後に、Omni3Dが3Dオブジェクト認識のための強力なデータセットであることを示し、単一データセットでの性能を改善し、事前学習によって新しい小規模データセットでの学習を加速できることを示す。

Recognizing scenes and objects in 3D from a single image is a longstanding goal of computer vision with applications in robotics and AR/VR. For 2D recognition, large datasets and scalable solutions have led to unprecedented advances. In 3D, existing benchmarks are small in size and approaches specialize in few object categories and specific domains, e.g. urban driving scenes. Motivated by the success of 2D recognition, we revisit the task of 3D object detection by introducing a large benchmark, called Omni3D. Omni3D re-purposes and combines existing datasets resulting in 234k images annotated with more than 3 million instances and 98 categories. 3D detection at such scale is challenging due to variations in camera intrinsics and the rich diversity of scene and object types. We propose a model, called Cube R-CNN, designed to generalize across camera and scene types with a unified approach. We show that Cube R-CNN outperforms prior works on the larger Omni3D and existing benchmarks. Finally, we prove that Omni3D is a powerful dataset for 3D object recognition and show that it improves single-dataset performance and can accelerate learning on new smaller datasets via pre-training.

翻訳日:2023-03-27 18:40:58 公開日:2023-03-24

# 量子乱流理論に向けて:渦ループの相互作用を伴う単純なモデル

Towards quantum turbulence theory: A simple model with interaction of the vortex loops ( http://arxiv.org/abs/2207.05414v2 )

ライセンス: Link先を確認

Sergei V. Talalov

(参考訳) 本稿では内部構造を持つ量子化された薄い渦輪について検討する。この力学系の量子化スキームは、著者が以前に提案したアプローチに基づいている。エネルギースペクトルと循環スペクトルの両方が計算される。例として、許容循環値の集合がフラクタル構造を持つことを示す。提案されたモデルにより、孤立渦環と相互作用を持つ渦環の系を記述することができる。さらに、量子乱流理論への応用についても論じる。乱流の分配関数の一般表現を提案する。

This paper investigates quantized thin vortex rings with an internal structure. The quantization scheme of this dynamical system is based on an approach proposed earlier by the author. Both the energy spectrum and the circulation spectrum are calculated. Examples show that the set of permissible circulation values has a fractal structure. The suggested model allows us to describe a system of isolated vortex rings as well as vortex rings with interaction. Furthermore, the application to quantum turbulence theory is discussed. A general expression for the partition function of a turbulent flow is suggested.

翻訳日:2023-03-27 18:40:39 公開日:2023-03-24

# DualAfford: デュアルグリッパーによるオブジェクト操作のための協調的視覚アフォーダンスの学習

DualAfford: Learning Collaborative Visual Affordance for Dual-gripper Object Manipulation ( http://arxiv.org/abs/2207.01971v5 )

ライセンス: Link先を確認

Yan Zhao, Ruihai Wu, Zhehuan Chen, Yourong Zhang, Qingnan Fan, Kaichun Mo, Hao Dong

(参考訳) 未来のホームアシストロボットにとって、人間の日常環境において多様な3Dオブジェクトを理解し、操作することは不可欠であるが、困難でもある。様々な3D形状に対して多様な操作タスクを実行できるスケーラブルなシステムの構築に向けて、最近の研究は、入力された3D形状上のすべての点に、下流のタスク(例えば、押す、持ち上げる)を達成するアクションの尤度をラベル付けする、視覚的アクショナブル・アフォーダンスの学習を提唱し、有望な結果を示してきた。しかし、これらの研究はシングルグリッパーの操作タスクしか扱っておらず、現実のタスクの多くは両手で協調して達成する必要がある。本研究では、デュアルグリッパー操作タスクのための協調的アフォーダンスを学習する新しい学習フレームワークであるDualAffordを提案する。この手法の中核となる設計は、2つのグリッパーに対する二次的な問題を、分離されつつも相互に接続された2つのサブタスクに還元し、効率的な学習を可能にすることである。大規模なPartNet-MobilityデータセットとShapeNetデータセットを使用して、デュアルグリッパー操作のための4つのベンチマークタスクを設定した。実験により、3つのベースラインに対する提案手法の有効性と優位性が示された。

It is essential yet challenging for future home-assistant robots to understand and manipulate diverse 3D objects in daily human environments. Towards building scalable systems that can perform diverse manipulation tasks over various 3D shapes, recent works have advocated and demonstrated promising results learning visual actionable affordance, which labels every point over the input 3D geometry with an action likelihood of accomplishing the downstream task (e.g., pushing or picking-up). However, these works only studied single-gripper manipulation tasks, yet many real-world tasks require two hands to achieve collaboratively. In this work, we propose a novel learning framework, DualAfford, to learn collaborative affordance for dual-gripper manipulation tasks. The core design of the approach is to reduce the quadratic problem for two grippers into two disentangled yet interconnected subtasks for efficient learning. Using the large-scale PartNet-Mobility and ShapeNet datasets, we set up four benchmark tasks for dual-gripper manipulation. Experiments prove the effectiveness and superiority of our method over three baselines.

翻訳日:2023-03-27 18:40:34 公開日:2023-03-24

# EventNeRF: 単一カラーイベントカメラからのニューラル放射場

EventNeRF: Neural Radiance Fields from a Single Colour Event Camera ( http://arxiv.org/abs/2206.11896v3 )

ライセンス: Link先を確認

Viktor Rudnev and Mohamed Elgharib and Christian Theobalt and Vladislav Golyanik

(参考訳) 非同期に動作するイベントカメラは、高いダイナミックレンジ、極めて小さいモーションブラー、低レイテンシ、低データ帯域幅のために多くの応用がある。この分野はここ数年で著しく進歩し、既存のイベントベースの3D再構成アプローチは、シーンのスパースな点群を復元する。しかし、このようなスパース性は、特にコンピュータビジョンやグラフィックスにおいて多くの場合に制限要因であり、これまで十分に対処されていない。そこで本稿では、単一のカラーイベントストリームのみを入力として、3D整合性のある、高密度でフォトリアリスティックな新規ビュー合成を行う最初のアプローチを提案する。その中核は、カラーイベントチャンネルの元の解像度を維持しながら、イベントから完全に自己教師ありで訓練されるニューラル放射場である。さらに、レイサンプリング戦略をイベントに合わせて調整し、データ効率の良いトレーニングを可能にする。テスト時には、RGB空間において前例のない品質の結果が得られる。提案手法を、いくつかの難易度の高い合成シーンと実シーンで定性的かつ定量的に評価し、既存の手法よりもはるかに高密度で視覚的に魅力的なレンダリングを生成することを示す。また、高速な動きや低照度条件といった困難なシナリオにおける頑健性も示す。研究の促進のため、新たに記録したデータセットとソースコードを公開する(https://4dqv.mpi-inf.mpg.de/EventNeRF を参照)。

Asynchronously operating event cameras find many applications due to their high dynamic range, vanishingly low motion blur, low latency and low data bandwidth. The field saw remarkable progress during the last few years, and existing event-based 3D reconstruction approaches recover sparse point clouds of the scene. However, such sparsity is a limiting factor in many cases, especially in computer vision and graphics, that has not been addressed satisfactorily so far. Accordingly, this paper proposes the first approach for 3D-consistent, dense and photorealistic novel view synthesis using just a single colour event stream as input. At its core is a neural radiance field trained entirely in a self-supervised manner from events while preserving the original resolution of the colour event channels. Next, our ray sampling strategy is tailored to events and allows for data-efficient training. At test time, our method produces results in the RGB space at unprecedented quality. We evaluate our method qualitatively and numerically on several challenging synthetic and real scenes and show that it produces significantly denser and more visually appealing renderings than the existing methods. We also demonstrate robustness in challenging scenarios with fast motion and under low lighting conditions. We release the newly recorded dataset and our source code to facilitate research in the field, see https://4dqv.mpi-inf.mpg.de/EventNeRF.

翻訳日:2023-03-27 18:40:11 公開日:2023-03-24

# 自動テスト生成への機械学習の統合: 体系的マッピング研究

The Integration of Machine Learning into Automated Test Generation: A Systematic Mapping Study ( http://arxiv.org/abs/2206.10210v4 )

ライセンス: Link先を確認

Afonso Fontes and Gregory Gay

(参考訳) コンテキスト: 機械学習(ML)は効果的な自動テスト生成を可能にする可能性がある。目的: 我々は、この新しい研究領域を特徴づけ、テストプラクティス、研究者の目標、適用されたML技術、評価、課題を調査する。方法: 102の出版物のサンプルに対して体系的なマッピングを行う。結果: MLは、システム、GUI、ユニット、パフォーマンス、組合せテストの入力を生成するか、既存の生成手法の性能を向上させる。 MLはまた、テスト判定、プロパティベース、期待出力のオラクルを生成するためにも使用される。教師あり学習(ニューラルネットワークをベースとすることが多い)と強化学習(Q学習をベースとすることが多い)が一般的であり、一部の出版物では教師なし学習や半教師あり学習も採用されている。教師あり・半教師あり・教師なしのアプローチは、従来のテストメトリクスとML関連のメトリクス(例えば精度)の両方を用いて評価される一方、強化学習は報酬関数に結びついたテストメトリクスを用いて評価されることが多い。結論: これまでの研究は大きな将来性を示しているが、トレーニングデータ、再トレーニング、スケーラビリティ、評価の複雑さ、採用されるMLアルゴリズムとその適用方法、ベンチマーク、再現性に関する未解決の課題がある。我々の発見は、この分野の研究者にとってロードマップとインスピレーションとなり得る。

Context: Machine learning (ML) may enable effective automated test generation. Objective: We characterize emerging research, examining testing practices, researcher goals, ML techniques applied, evaluation, and challenges. Methods: We perform a systematic mapping on a sample of 102 publications. Results: ML generates input for system, GUI, unit, performance, and combinatorial testing or improves the performance of existing generation methods. ML is also used to generate test verdicts, property-based, and expected output oracles. Supervised learning - often based on neural networks - and reinforcement learning - often based on Q-learning - are common, and some publications also employ unsupervised or semi-supervised learning. (Semi-/Un-)Supervised approaches are evaluated using both traditional testing metrics and ML-related metrics (e.g., accuracy), while reinforcement learning is often evaluated using testing metrics tied to the reward function. Conclusion: Work-to-date shows great promise, but there are open challenges regarding training data, retraining, scalability, evaluation complexity, ML algorithms employed - and how they are applied - benchmarks, and replicability. Our findings can serve as a roadmap and inspiration for researchers in this field.

翻訳日:2023-03-27 18:39:48 公開日:2023-03-24

# 変分オートエンコーダと1クラスサポートベクターマシンによる構造損傷の半教師あり検出

Semi-supervised detection of structural damage using Variational Autoencoder and a One-Class Support Vector Machine ( http://arxiv.org/abs/2210.05674v3 )

ライセンス: Link先を確認

Andrea Pollastro, Giusiana Testa, Antonio Bilotta, Roberto Prevete

(参考訳) 近年、構造ヘルスモニタリング(SHM)システムに人工ニューラルネットワーク(ANN)が導入されている。データ駆動アプローチによる半教師あり手法では、損傷のない構造状態から取得したデータでANNを訓練し、構造的損傷を検出することができる。標準的なアプローチでは、訓練段階の後、異常なデータを検出するための決定ルールを手動で定義する。しかし、このプロセスは、ハイパーパラメータ最適化技術によって性能を最大化した機械学習手法を用いることで自動化できる。本稿では、構造異常を検出するためのデータ駆動アプローチによる半教師あり手法を提案する。方法論は以下から構成される。 (i)損傷のないデータの分布を近似する変分オートエンコーダ(VAE)、および (ii)VAEの信号再構成から抽出した損傷に敏感な特徴を用いて異なる健全状態を判別する1クラスサポートベクターマシン(OC-SVM)。本手法を、IASC-ASCE 構造ヘルスモニタリングタスクグループによって9つの損傷シナリオで試験された縮尺鋼構造物に適用した。

In recent years, Artificial Neural Networks (ANNs) have been introduced in Structural Health Monitoring (SHM) systems. A semi-supervised method with a data-driven approach allows the ANN to be trained on data acquired from an undamaged structural condition in order to detect structural damage. In standard approaches, after the training stage, a decision rule is manually defined to detect anomalous data. However, this process could be made automatic using machine learning methods, whose performance is maximised using hyperparameter optimization techniques. The paper proposes a semi-supervised method with a data-driven approach to detect structural anomalies. The methodology consists of: (i) a Variational Autoencoder (VAE) to approximate the undamaged data distribution and (ii) a One-Class Support Vector Machine (OC-SVM) to discriminate different health conditions using damage-sensitive features extracted from the VAE's signal reconstruction. The method is applied to a scale steel structure that was tested in nine damage scenarios by the IASC-ASCE Structural Health Monitoring Task Group.

翻訳日:2023-03-27 18:32:44 公開日:2023-03-24

# 量子6-および19-頂点モデルからのFredkinとMotzkinの結合鎖

Coupled Fredkin and Motzkin chains from quantum six- and nineteen-vertex models ( http://arxiv.org/abs/2210.03038v3 )

ライセンス: Link先を確認

Zhao Zhang, Israel Klich

(参考訳) 我々は、面積則を破るフレドキンおよびモツキンのスピン鎖のモデルを、相関相互作用を持つ量子6頂点モデルと19頂点モデルを構築することで2次元に一般化する。ハミルトニアンはフラストレーションフリーであり、その射影演算子は非負の高さ配置の部分空間内でエルゴード的なダイナミクスを生成する。基底状態は、バルクで非負の高さ、境界で高さゼロを持つ古典的な二色頂点配置の、体積および色で重み付けされた重ね合わせである。サブシステム間のエンタングルメントエントロピーは、$q$-変形パラメータを調整すると相転移を示し、これは色の自由度に作用する外場の存在下でも頑健であることが示される。基底状態は、面積則と体積則のエンタングルメント相の間で量子相転移を起こし、その臨界点ではエンタングルメントエントロピーが線形系サイズ $L$ の関数として $L\log L$ でスケールする。 $L\log L$ と $L^2$ の間の中間的なべき乗則スケーリングは、熱力学極限で異なる速度で 1 に近づく不均一な変形パラメータによって実現できる。 $q>1$ 相に対しては、スペクトルギャップの上限が $q^{-L^3/8}$ でスケールすることを示す変分波動関数を構築する。

We generalize the area-law violating models of Fredkin and Motzkin spin chains into two dimensions by building quantum six- and nineteen-vertex models with correlated interactions. The Hamiltonian is frustration free, and its projectors generate ergodic dynamics within the subspace of height configurations that are non-negative. The ground state is a volume- and color-weighted superposition of classical bi-color vertex configurations with non-negative heights in the bulk and zero height on the boundary. The entanglement entropy between subsystems has a phase transition as the $q$-deformation parameter is tuned, which is shown to be robust in the presence of an external field acting on the color degree of freedom. The ground state undergoes a quantum phase transition between area- and volume-law entanglement phases with a critical point where entanglement entropy scales as a function $L\log L$ of the linear system size $L$. Intermediate power law scalings between $L\log L$ and $L^2$ can be achieved with an inhomogeneous deformation parameter that approaches 1 at different rates in the thermodynamic limit. For the $q>1$ phase, we construct a variational wave function that establishes an upper bound on the spectral gap that scales as $q^{-L^3/8}$.
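
As background for the height constraint mentioned above, the following sketch counts one-dimensional lattice paths that stay at non-negative height and return to zero at the end. These are the Motzkin paths (steps up, down, or flat) and Dyck paths (steps up or down only) associated with the original one-dimensional Motzkin and Fredkin chains. The function name `count_paths` is hypothetical, and the sketch does not attempt to model the two-dimensional, colored vertex configurations of the paper.

```python
def count_paths(length, allow_flat):
    """Count paths of `length` steps with non-negative heights ending at height 0."""
    # heights[h] holds the number of valid partial paths currently at height h.
    heights = [1] + [0] * length
    for _ in range(length):
        nxt = [0] * (length + 1)
        for h, ways in enumerate(heights):
            if ways == 0:
                continue
            if h + 1 <= length:
                nxt[h + 1] += ways  # step up
            if h >= 1:
                nxt[h - 1] += ways  # step down, never below zero
            if allow_flat:
                nxt[h] += ways      # flat step (Motzkin paths only)
        heights = nxt
    return heights[0]


motzkin = [count_paths(n, allow_flat=True) for n in range(7)]
dyck = [count_paths(2 * n, allow_flat=False) for n in range(5)]
```

With flat steps allowed the counts are the Motzkin numbers, and without them (on even lengths) they are the Catalan numbers.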

翻訳日:2023-03-27 18:32:28 公開日:2023-03-24

# Inverse Unscented Kalman Filter による対敵対学習

Counter-Adversarial Learning with Inverse Unscented Kalman Filter ( http://arxiv.org/abs/2210.00359v2 )

ライセンス: Link先を確認

Himali Singh, Kumar Vijay Mishra and Arpan Chattopadhyay

(参考訳) 対敵対システムでは、知的な敵対エージェントの戦略を推測するために、防御側エージェントは、敵対者が防御側について収集した情報を認知的に把握する必要がある。この問題に関する先行研究は、線形ガウス状態空間モデルを採用し、逆確率フィルタを設計することでこの逆認知問題を解いている。しかし、実際には対敵対システムは一般に高度に非線形である。本稿では、逆認知を非線形ガウス状態空間モデルとして定式化することでこのシナリオに対処する。このモデルでは、敵対者は線形化誤差を低減して防御側の状態を推定するために、unscented Kalman filter (UKF) を用いる。防御側に関する敵対者の推定値を推定するために、逆UKF(IUKF)システムを提案し、開発する。次に、平均二乗有界性の意味でのIUKFの確率的安定性に関する理論的保証を導出する。複数の実用的応用に対する数値実験により、IUKFの推定誤差が収束し、再帰的 Cramér-Rao 下界に密接に従うことが示される。

In counter-adversarial systems, to infer the strategy of an intelligent adversarial agent, the defender agent needs to cognitively sense the information that the adversary has gathered about the latter. Prior works on the problem employ linear Gaussian state-space models and solve this inverse cognition problem by designing inverse stochastic filters. However, in practice, counter-adversarial systems are generally highly nonlinear. In this paper, we address this scenario by formulating inverse cognition as a nonlinear Gaussian state-space model, wherein the adversary employs an unscented Kalman filter (UKF) to estimate the defender's state with reduced linearization errors. To estimate the adversary's estimate of the defender, we propose and develop an inverse UKF (IUKF) system. We then derive theoretical guarantees for the stochastic stability of IUKF in the mean-squared boundedness sense. Numerical experiments for multiple practical applications show that the estimation error of IUKF converges and closely follows the recursive Cramér-Rao lower bound.

翻訳日:2023-03-27 18:31:33 公開日:2023-03-24

# 断熱量子回路におけるノイズと深さのトレードオフの探索

Navigating the noise-depth tradeoff in adiabatic quantum circuits ( http://arxiv.org/abs/2209.11245v2 )

ライセンス: Link先を確認

Daniel Azses, Maxime Dupont, Bram Evert, Matthew J. Reagor, Emanuele G. Dalla Torre

(参考訳) 断熱量子アルゴリズムは、自明な状態を所望の解へとゆっくりと発展させることで計算問題を解く。理想的な量子コンピュータでは、解の質は回路深さの増加とともに単調に向上する。対照的に、現在のノイズの多いコンピュータでは、深さを増やすとより多くのノイズが入り込み、最終的には計算上の優位性が損なわれる。最良の解を与える最適な回路深さはどれだけか。ここでは、1次元量子イジングモデルの常磁性基底状態と強磁性基底状態の間を補間する断熱回路を調査することで、この問題に取り組む。最終的な出力の品質を、回路深さ $N$ と雑音強度 $\sigma$ の関数としての欠陥密度 $d$ によって特徴づける。 $d$ は単純な形 $d_\mathrm{ideal}+d_\mathrm{noise}$ でよく記述され、理想的な場合の $d_\mathrm{ideal}\sim N^{-1/2}$ は Kibble-Zurek 機構によって支配され、ノイズの寄与は $d_\mathrm{noise}\sim N\sigma^2$ のようにスケールすることがわかった。したがって、欠陥の数を最小化する最適なステップ数は $\sim\sigma^{-4/3}$ のように振る舞う。このアルゴリズムをノイズのある超伝導量子プロセッサに実装し、欠陥密度の回路深さ依存性が予測された非単調な振る舞いに従い、ノイズを含むシミュレーションとよく一致することを確認した。我々の研究により、量子デバイスを効率的にベンチマークし、その実効的なノイズ強度 $\sigma$ を抽出することが可能になる。

Adiabatic quantum algorithms solve computational problems by slowly evolving a trivial state to the desired solution. On an ideal quantum computer, the solution quality improves monotonically with increasing circuit depth. By contrast, increasing the depth in current noisy computers introduces more noise and eventually deteriorates any computational advantage. What is the optimal circuit depth that provides the best solution? Here, we address this question by investigating an adiabatic circuit that interpolates between the paramagnetic and ferromagnetic ground states of the one-dimensional quantum Ising model. We characterize the quality of the final output by the density of defects $d$, as a function of the circuit depth $N$ and noise strength $\sigma$. We find that $d$ is well-described by the simple form $d_\mathrm{ideal}+d_\mathrm{noise}$, where the ideal case $d_\mathrm{ideal}\sim N^{-1/2}$ is controlled by the Kibble-Zurek mechanism, and the noise contribution scales as $d_\mathrm{noise}\sim N\sigma^2$. It follows that the optimal number of steps minimizing the number of defects goes as $\sim\sigma^{-4/3}$. We implement this algorithm on a noisy superconducting quantum processor and find that the dependence of the density of defects on the circuit depth follows the predicted non-monotonic behavior and agrees well with noisy simulations. Our work allows one to efficiently benchmark quantum devices and extract their effective noise strength $\sigma$.
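
The $\sigma^{-4/3}$ scaling follows from a one-line minimisation, sketched here under the assumption that the two contributions have the exact form $d(N) = a N^{-1/2} + b N \sigma^2$. The prefactors `a` and `b` are hypothetical placeholders, since the abstract only gives the scalings. Setting $\mathrm{d}d/\mathrm{d}N = -\tfrac{a}{2} N^{-3/2} + b\sigma^2 = 0$ gives $N^\ast = \left(a / (2 b \sigma^2)\right)^{2/3} \propto \sigma^{-4/3}$, which the code checks numerically.

```python
def defect_density(n_steps, sigma, a=1.0, b=1.0):
    """Assumed model: Kibble-Zurek term plus a noise term linear in depth."""
    return a * n_steps ** -0.5 + b * n_steps * sigma ** 2


def optimal_depth(sigma, a=1.0, b=1.0):
    """Depth minimising defect_density, from setting its derivative to zero."""
    return (a / (2.0 * b * sigma ** 2)) ** (2.0 / 3.0)


sigma = 0.01  # illustrative noise strength, not a measured value
n_star = optimal_depth(sigma)

# Reducing sigma by a factor of 8 should multiply the optimum by 8**(4/3) = 16.
ratio = optimal_depth(sigma / 8.0) / n_star
```

Evaluating `defect_density` slightly below and above `n_star` confirms that it is a minimum of the assumed model.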

翻訳日:2023-03-27 18:30:56 公開日:2023-03-24

# トラクタブル確率モデルの連続混合

Continuous Mixtures of Tractable Probabilistic Models ( http://arxiv.org/abs/2209.10584v3 )

ライセンス: Link先を確認

Alvaro H.C. Correia, Gennaro Gala, Erik Quaeghebeur, Cassio de Campos, Robert Peharz

(参考訳) 変分オートエンコーダのような連続潜在空間に基づく確率モデルは、成分が潜在コードに連続的に依存する非可算混合モデルとして理解できる。これらは生成的・確率的モデリングの表現力の高いツールであることが証明されているが、扱いやすい確率的推論、すなわち表現された確率分布の周辺分布や条件付き分布の計算とは相容れない。一方、確率回路(PC)のような扱いやすい確率モデルは、階層的な離散混合モデルとして理解でき、したがって厳密な推論を効率的に実行できるが、連続潜在空間モデルと比較して性能が劣ることが多い。本稿では、潜在次元の小さい扱いやすいモデルの連続混合というハイブリッドアプローチを検討する。これらのモデルは解析的には扱えないが、有限個の積分点に基づく数値積分スキームに適している。積分点の数が十分に多ければ、近似は事実上厳密になる。さらに、有限個の積分点に対して、この積分法は連続混合を標準的なPCへと実質的にコンパイルする。実験では、このようにして学習されたPCが多くの標準的な密度推定ベンチマークにおいて扱いやすいモデルの新たな最高性能を達成することから、この単純なスキームが極めて効果的であることを示す。

Probabilistic models based on continuous latent spaces, such as variational autoencoders, can be understood as uncountable mixture models where components depend continuously on the latent code. They have proven to be expressive tools for generative and probabilistic modelling, but are at odds with tractable probabilistic inference, that is, computing marginals and conditionals of the represented probability distribution. Meanwhile, tractable probabilistic models such as probabilistic circuits (PCs) can be understood as hierarchical discrete mixture models, and thus are capable of performing exact inference efficiently but often show subpar performance in comparison to continuous latent-space models. In this paper, we investigate a hybrid approach, namely continuous mixtures of tractable models with a small latent dimension. While these models are analytically intractable, they are well amenable to numerical integration schemes based on a finite set of integration points. With a large enough number of integration points the approximation becomes de-facto exact. Moreover, for a finite set of integration points, the integration method effectively compiles the continuous mixture into a standard PC. In experiments, we show that this simple scheme proves remarkably effective, as PCs learnt this way set new state of the art for tractable models on many standard density estimation benchmarks.
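
The idea of turning a continuous mixture into a finite one by numerical integration can be sketched with a toy case whose answer is known in closed form. The assumptions are all mine: a one-dimensional standard normal latent $z$, Gaussian components $p(x \mid z) = \mathcal{N}(x; z, s^2)$ standing in for tractable models, and a midpoint rule on a truncated grid. The exact marginal is then $\mathcal{N}(x; 0, 1 + s^2)$, which lets the finite mixture be checked.

```python
import math


def normal_pdf(x, mean, std):
    """Density of a univariate normal distribution."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def compile_to_finite_mixture(num_points, z_min=-6.0, z_max=6.0):
    """Midpoint-rule integration points and normalised weights for z ~ N(0, 1)."""
    step = (z_max - z_min) / num_points
    points = [z_min + (i + 0.5) * step for i in range(num_points)]
    weights = [normal_pdf(z, 0.0, 1.0) * step for z in points]
    total = sum(weights)
    return points, [w / total for w in weights]


def mixture_density(x, points, weights, component_std):
    """Finite mixture: one Gaussian component per integration point."""
    return sum(w * normal_pdf(x, z, component_std) for z, w in zip(points, weights))


component_std = 0.5
points, weights = compile_to_finite_mixture(64)
approx = mixture_density(0.7, points, weights, component_std)
exact = normal_pdf(0.7, 0.0, math.sqrt(1.0 + component_std ** 2))
```

In this toy setting the 64-component discrete mixture already matches the exact marginal very closely, which mirrors the abstract's remark that enough integration points make the approximation practically exact.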

翻訳日:2023-03-27 18:30:25 公開日:2023-03-24

# スマートCCTVカメラとセマンティックセグメンテーションを用いたCNNベースの知的街灯管理

CNN based Intelligent Streetlight Management Using Smart CCTV Camera and Semantic Segmentation ( http://arxiv.org/abs/2209.08633v2 )

ライセンス: Link先を確認

Md Sakib Ullah Sourav, Huidong Wang, Mohammad Raziuddin Chowdhury, Rejwan Bin Sulaiman

(参考訳) 最も見過ごされているエネルギー損失源の1つは街灯であり、不要な場所で過剰な光を発生させている。エネルギーの浪費は経済と環境に多大な影響を及ぼす。また、従来の運用が手動であるため、街灯が昼間に点灯(ON)され、夕方に消灯(OFF)されている様子が頻繁に見られ、これは21世紀になっても残念なことである。これらの問題を解決するには、街灯の自動制御が必要である。本研究の目的は、コンピュータビジョン技術を利用したスマート交通監視システムと閉回路テレビ(CCTV)カメラを組み合わせた新しい街灯制御手法を開発することである。この手法は、CCTVのビデオストリームに対するセマンティック画像セグメンテーションを用いて、歩行者や車両の存在を検知すると発光ダイオード(LED)街灯を適切な明るさで自動的に点灯させ、それらが存在しないときは街灯を減光する。その結果、本モデルは昼と夜を区別し、街灯のオン/オフを自動化して消費エネルギーコストを節約することが可能となった。上記のアプローチによれば、位置情報センサーのデータを利用して、より情報に基づいた街灯管理の決定を行うことができる。これらのタスクを遂行するために、ResNet-34をバックボーンとするU-netモデルの訓練を検討する。モデルの妥当性は評価行列を用いて保証される。提案する概念は、従来の代替手段よりも単純で、経済的で、エネルギー効率が高く、長持ちし、耐障害性が高い。

One of the most neglected sources of energy loss is streetlights which generate too much light in areas where it is not required. Energy waste has enormous economic and environmental effects. In addition, due to the conventional manual nature of the operation, streetlights are frequently seen being turned ON during the day and OFF in the evening, which is regrettable even in the twenty-first century. These issues require automated streetlight control in order to be resolved. This study aims to develop a novel streetlight controlling method by combining a smart transport monitoring system powered by computer vision technology with a closed circuit television (CCTV) camera that allows the light-emitting diode (LED) streetlight to automatically light up with the appropriate brightness by detecting the presence of pedestrians or vehicles and dimming the streetlight in their absence using semantic image segmentation from the CCTV video streaming. Consequently, our model distinguishes daylight and nighttime, which made it feasible to automate the process of turning the streetlight 'ON' and 'OFF' to save energy consumption costs. According to the aforementioned approach, geolocation sensor data could be utilized to make more informed streetlight management decisions. To complete the tasks, we consider training the U-net model with ResNet-34 as its backbone. The validity of the models is guaranteed with the use of assessment matrices. The suggested concept is straightforward, economical, energy-efficient, long-lasting, and more resilient than conventional alternatives.

翻訳日:2023-03-27 18:30:07 公開日:2023-03-24

# OCTET: オブジェクトを考慮した反事実的説明

OCTET: Object-aware Counterfactual Explanations ( http://arxiv.org/abs/2211.12380v2 )

ライセンス: Link先を確認

Mehdi Zemni, Mickaël Chen, Éloi Zablocki, Hédi Ben-Younes, Patrick Pérez, Matthieu Cord

(参考訳) 近年、ディープビジョンモデルは、自動運転のような安全性が重要なアプリケーションに広く導入されており、そのようなモデルの説明可能性が差し迫った関心事になりつつある。説明手法の中でも、反事実的説明は、説明対象のモデルの出力を変化させるような、入力画像に対する最小限かつ解釈可能な変更を見つけることを目的とする。このような説明は、モデルの決定に影響を与える主要な要因をエンドユーザーに示す。しかし、従来の手法は、都市シーンのように多くのオブジェクトを含む画像で訓練された決定モデルを説明するのに苦労している。こうした画像は扱いがより難しいが、説明する重要性もおそらくより高い。本稿では、反事実的説明の生成のためのオブジェクト中心のフレームワークによってこの問題に取り組むことを提案する。近年の生成モデリングの研究に着想を得た本手法は、オブジェクトレベルの操作が容易になるように構造化された潜在空間にクエリ画像を符号化する。これにより、反事実の生成中にどの探索方向(例えば、オブジェクトの空間的変位、スタイルの変更など)を探索するかをエンドユーザーが制御できる。運転シーンの反事実的説明ベンチマークに関する一連の実験を行い、提案手法がセマンティックセグメンテーションモデルの説明など、分類以外にも適用可能であることを示す。分析を補完するために、決定モデルの理解における反事実的説明の有用性を測定するユーザスタディを設計し、実施する。コードは https://github.com/valeoai/OCTET で入手できる。

Nowadays, deep vision models are being widely deployed in safety-critical applications, e.g., autonomous driving, and explainability of such models is becoming a pressing concern. Among explanation methods, counterfactual explanations aim to find minimal and interpretable changes to the input image that would also change the output of the model to be explained. Such explanations point end-users at the main factors that impact the decision of the model. However, previous methods struggle to explain decision models trained on images with many objects, e.g., urban scenes, which are more difficult to work with but also arguably more critical to explain. In this work, we propose to tackle this issue with an object-centric framework for counterfactual explanation generation. Our method, inspired by recent generative modeling works, encodes the query image into a latent space that is structured in a way to ease object-level manipulations. Doing so, it provides the end-user with control over which search directions (e.g., spatial displacement of objects, style modification, etc.) are to be explored during the counterfactual generation. We conduct a set of experiments on counterfactual explanation benchmarks for driving scenes, and we show that our method can be adapted beyond classification, e.g., to explain semantic segmentation models. To complete our analysis, we design and run a user study that measures the usefulness of counterfactual explanations in understanding a decision model. Code is available at https://github.com/valeoai/OCTET.

翻訳日:2023-03-27 18:24:25 公開日:2023-03-24

# SPARF: スパースかつノイズの多いポーズからのニューラル放射場

SPARF: Neural Radiance Fields from Sparse and Noisy Poses ( http://arxiv.org/abs/2211.11738v2 )

ライセンス: Link先を確認

Prune Truong and Marie-Julie Rakotosaona and Fabian Manhardt and Federico Tombari

(参考訳) ニューラル放射場(NeRF)は近年、フォトリアリスティックな新規ビューを合成するための強力な表現として登場した。印象的な性能を示す一方で、高精度のカメラポーズを伴う密な入力ビューが利用できることを前提としているため、実世界のシナリオでの応用は制限される。本研究では Sparse Pose Adjusting Radiance Field (SPARF) を導入し、ノイズの多いカメラポーズを伴う、ごく少数(最少で3枚)のワイドベースライン入力画像のみが与えられた場合の新規ビュー合成という課題に取り組む。本手法は、多視点幾何の制約を利用して、NeRFの学習とカメラポーズの改善を同時に行う。入力ビュー間で抽出されたピクセル対応に基づくことで、多視点対応の目的関数は、最適化されるシーンとカメラポーズが大域的かつ幾何学的に正確な解に収束するよう強制する。さらに、深度一貫性損失によって、再構成されたシーンがどの視点から見ても一貫するよう促す。提案手法は、複数の難易度の高いデータセットにおいて、スパースビュー設定での新たな最高性能を達成する。

Neural Radiance Field (NeRF) has recently emerged as a powerful representation to synthesize photorealistic novel views. While showing impressive performance, it relies on the availability of dense input views with highly accurate camera poses, thus limiting its application in real-world scenarios. In this work, we introduce Sparse Pose Adjusting Radiance Field (SPARF), to address the challenge of novel-view synthesis given only few wide-baseline input images (as low as 3) with noisy camera poses. Our approach exploits multi-view geometry constraints in order to jointly learn the NeRF and refine the camera poses. By relying on pixel matches extracted between the input views, our multi-view correspondence objective enforces the optimized scene and camera poses to converge to a global and geometrically accurate solution. Our depth consistency loss further encourages the reconstructed scene to be consistent from any viewpoint. Our approach sets a new state of the art in the sparse-view regime on multiple challenging datasets.

翻訳日:2023-03-27 18:24:00 公開日:2023-03-24

# 視覚言語モデルのチューニングのためのタスク残差

Task Residual for Tuning Vision-Language Models ( http://arxiv.org/abs/2211.10277v2 )

ライセンス: Link先を確認

Tao Yu, Zhihe Lu, Xin Jin, Zhibo Chen, Xinchao Wang

(参考訳) 数十億レベルのデータに事前訓練された大規模視覚言語モデル(VLM)は、一般的な視覚表現と広い視覚概念を学んだ。原則として、VLMの知識構造は、限られたデータで下流タスクに転送される際に適切に継承されるべきである。しかしながら、VLMの既存の効率的な転写学習(ETL)アプローチは、損傷するか、事前知識に過度に偏っている。例えば、即時チューニング(PT)は、事前訓練されたテキストベースの分類器を捨て、新しいものを構築する。そこで本研究では,テキストベース分類器上で直接動作し,事前学習したモデルの事前知識と目標タスクに関する新たな知識を明示的に分離するタスク残差調整(TaskRes)という,VLMの効率的なチューニング手法を提案する。具体的には、TaskResは、元の分類器の重みをVLMから凍結させ、初期独立パラメータのセットを元のパラメータの残余としてチューニングすることで、目標タスクの新しい分類器を取得し、信頼性の高い事前知識保存と柔軟なタスク固有の知識探索を可能にする。提案するtaskresは単純かつ効果的であり、実装に最小限の労力を要しながら、11のベンチマークデータセットで以前のetlメソッド(例えばptとat)を著しく上回っている。私たちのコードはhttps://github.com/geekyutao/taskresで利用可能です。

Large-scale vision-language models (VLMs) pre-trained on billion-level data have learned general visual representations and broad visual concepts. In principle, the well-learned knowledge structure of the VLMs should be inherited appropriately when being transferred to downstream tasks with limited data. However, most existing efficient transfer learning (ETL) approaches for VLMs either damage or are excessively biased towards the prior knowledge, e.g., prompt tuning (PT) discards the pre-trained text-based classifier and builds a new one while adapter-style tuning (AT) fully relies on the pre-trained features. To address this, we propose a new efficient tuning approach for VLMs named Task Residual Tuning (TaskRes), which performs directly on the text-based classifier and explicitly decouples the prior knowledge of the pre-trained models and new knowledge regarding a target task. Specifically, TaskRes keeps the original classifier weights from the VLMs frozen and obtains a new classifier for the target task by tuning a set of prior-independent parameters as a residual to the original one, which enables reliable prior knowledge preservation and flexible task-specific knowledge exploration. The proposed TaskRes is simple yet effective, which significantly outperforms previous ETL methods (e.g., PT and AT) on 11 benchmark datasets while requiring minimal effort for the implementation. Our code is available at https://github.com/geekyutao/TaskRes.

翻訳日:2023-03-27 18:23:40 公開日:2023-03-24
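
The TaskRes abstract above describes keeping the pre-trained text-based classifier weights frozen and tuning a separate set of parameters as a residual on top of them. The following is a minimal sketch of that idea in plain Python, not the authors' implementation. The scaling factor `alpha`, the zero initialization of the residual, the toy 2-D features, and the helper names (`logits`, `sgd_step`, `loss`) are all assumptions made for illustration.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def logits(feat, base, residual, alpha):
    # New classifier row = frozen base row + alpha * trainable residual row.
    return [dot(feat, [b + alpha * r for b, r in zip(brow, rrow)])
            for brow, rrow in zip(base, residual)]

def loss(feat, label, base, residual, alpha):
    # Cross-entropy of the true label under the combined classifier.
    return -math.log(softmax(logits(feat, base, residual, alpha))[label])

def sgd_step(feat, label, base, residual, alpha, lr):
    """One cross-entropy SGD step that updates the residual only; base is never written."""
    p = softmax(logits(feat, base, residual, alpha))
    new_residual = []
    for k, rrow in enumerate(residual):
        g = p[k] - (1.0 if k == label else 0.0)  # dLoss/dlogit_k
        new_residual.append([r - lr * g * alpha * f for r, f in zip(rrow, feat)])
    return new_residual

# Hypothetical toy setup: two classes, 2-D features.
base = [[1.0, 0.0], [0.0, 1.0]]      # stands in for frozen text embeddings
residual = [[0.0, 0.0], [0.0, 0.0]]  # zero init: starts from the original classifier
feat, label, alpha = [0.6, 0.8], 0, 0.5

before = loss(feat, label, base, residual, alpha)
residual = sgd_step(feat, label, base, residual, alpha, lr=1.0)
after = loss(feat, label, base, residual, alpha)
```

With a zero residual the logits equal those of the frozen classifier, so the prior knowledge is the starting point and only the residual moves during tuning.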

# ビデオインスタンスセグメンテーションのための一般化フレームワーク

A Generalized Framework for Video Instance Segmentation ( http://arxiv.org/abs/2211.08834v2 )

ライセンス: Link先を確認

Miran Heo, Sukjun Hwang, Jeongseok Hyun, Hanjung Kim, Seoung Wug Oh, Joon-Young Lee, Seon Joo Kim

(参考訳) 近年,ビデオインスタンスセグメンテーション(VIS)コミュニティにおいて,複雑なシーケンスと隠蔽シーケンスによる長いビデオの処理が新たな課題として浮上している。しかし、既存の手法はこの課題に対処するのに限界がある。現在のアプローチの最大のボトルネックは、トレーニングと推論の相違にある、と私たちは主張する。このギャップを効果的に埋めるために、複雑なアーキテクチャを設計したり、余分な後処理を必要とせずに、挑戦的なベンチマークで最先端のパフォーマンスを実現する、VISの汎用フレームワーク、すなわちGenVISを提案する。 GenVISの重要なコントリビューションは、新しいターゲットラベル割り当てによるシーケンシャルラーニングのためのクエリベースのトレーニングパイプラインを含む、学習戦略である。さらに,従来の状態から情報を効果的に取得するメモリを導入する。異なるフレームやクリップ間の関係を構築することに焦点を当てた新しい視点のおかげで、GenVISはオンラインと半オンラインの両方で柔軟に実行できる。提案手法は,YouTube-VIS 2019/2021/2022とOccluded VIS (OVIS) で最先端の結果を得られる。特に、ロングVISベンチマーク(OVIS)の最先端性能を大きく上回り、ResNet-50のバックボーンで5.6 APを改善した。コードは https://github.com/miranheo/GenVIS で入手できる。

The handling of long videos with complex and occluded sequences has recently emerged as a new challenge in the video instance segmentation (VIS) community. However, existing methods have limitations in addressing this challenge. We argue that the biggest bottleneck in current approaches is the discrepancy between training and inference. To effectively bridge this gap, we propose a Generalized framework for VIS, namely GenVIS, that achieves state-of-the-art performance on challenging benchmarks without designing complicated architectures or requiring extra post-processing. The key contribution of GenVIS is the learning strategy, which includes a query-based training pipeline for sequential learning with a novel target label assignment. Additionally, we introduce a memory that effectively acquires information from previous states. Thanks to the new perspective, which focuses on building relationships between separate frames or clips, GenVIS can be flexibly executed in both online and semi-online manner. We evaluate our approach on popular VIS benchmarks, achieving state-of-the-art results on YouTube-VIS 2019/2021/2022 and Occluded VIS (OVIS). Notably, we greatly outperform the state-of-the-art on the long VIS benchmark (OVIS), improving 5.6 AP with ResNet-50 backbone. Code is available at https://github.com/miranheo/GenVIS.

翻訳日:2023-03-27 18:23:15 公開日:2023-03-24

# FAPM: リアルタイム産業異常検出のための高速適応パッチメモリ

FAPM: Fast Adaptive Patch Memory for Real-time Industrial Anomaly Detection ( http://arxiv.org/abs/2211.07381v2 )

ライセンス: Link先を確認

Donghyeong Kim, Chaewon Park, Suhwan Cho and Sangyoun Lee

(参考訳) 特徴埋め込みに基づく手法は, 対象画像の特徴と正常画像とを比較することで, 産業異常の検出において, 例外的な性能を示した。しかし,いくつかの手法は実世界のアプリケーションにとって重要なリアルタイム推論の速度要件を満たしていない。そこで本研究では,リアルタイム産業的異常検出のための高速適応パッチメモリ(fast adaptive patch memory, fapm)という新しい手法を提案する。 FAPMはパッチワイドとレイヤワイドのメモリバンクを使用して,イメージの埋め込み機能をそれぞれパッチレベルとレイヤレベルに格納する。また,より高速かつ正確な検出のためのパッチアダプティブコアセットサンプリングを提案する。 FAPMは、他の最先端手法と比較して精度と速度の両方で良好に機能する

Feature embedding-based methods have shown exceptional performance in detecting industrial anomalies by comparing features of target images with normal images. However, some methods do not meet the speed requirements of real-time inference, which is crucial for real-world applications. To address this issue, we propose a new method called Fast Adaptive Patch Memory (FAPM) for real-time industrial anomaly detection. FAPM utilizes patch-wise and layer-wise memory banks that store the embedding features of images at the patch and layer level, respectively, which eliminates unnecessary repetitive computations. We also propose patch-wise adaptive coreset sampling for faster and more accurate detection. FAPM performs well in both accuracy and speed compared to other state-of-the-art methods.

翻訳日:2023-03-27 18:22:53 公開日:2023-03-24
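
The FAPM abstract above describes storing embedding features of normal images in patch-wise memory banks and comparing target features against them. The sketch below illustrates only that general idea, under the assumption that each patch position has its own bank and that the anomaly score is the Euclidean distance to the nearest stored feature. The layer-wise banks and the adaptive coreset sampling are not modeled, and the function names and toy features are hypothetical.

```python
import math

def build_memory(normal_images):
    """normal_images: list of images, each a list of per-patch feature vectors.

    Returns one bank per patch position holding that position's normal features.
    """
    n_patches = len(normal_images[0])
    return [[img[p] for img in normal_images] for p in range(n_patches)]

def anomaly_scores(memory, image):
    """Distance from each patch feature to its nearest stored normal feature."""
    return [min(math.dist(feat, ref) for ref in bank)
            for bank, feat in zip(memory, image)]

# Hypothetical toy data: two normal images, two patches each, 2-D features.
normal = [
    [[0.0, 0.0], [1.0, 1.0]],
    [[0.0, 1.0], [1.0, 0.0]],
]
memory = build_memory(normal)
scores = anomaly_scores(memory, [[0.0, 0.0], [4.0, 5.0]])
```

Because a query patch is compared only against the bank for its own position, the per-patch search stays small, which is one plausible reading of how a patch-wise memory can help inference speed.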

# particlenerf: online neural radiance fieldsのための粒子ベースのエンコーディング

ParticleNeRF: A Particle-Based Encoding for Online Neural Radiance Fields ( http://arxiv.org/abs/2211.04041v4 )

ライセンス: Link先を確認

Jad Abou-Chakra, Feras Dayoub, Niko Sünderhauf

(参考訳) 動的シーンに対する既存のNeural Radiance Fields(NeRF)は、視覚的忠実度を重視したオフライン手法であるが、本稿は、リアルタイム適応性を優先するオンラインユースケースに対処する。我々は200ミリ秒毎に最新の表現をオンラインで学習することで、シーン形状の変化に動的に適応する新しいアプローチであるParticleNeRFを提案する。 ParticleNeRFは、新しい粒子ベースのパラメトリック符号化を用いてこれを実現する。我々は,空間内の粒子に特徴を結合し,光計測による再構成損失を粒子の位置勾配にバックプロパゲートし,速度ベクトルとして解釈する。衝突に対処するための軽量な物理システムによって守られ、地形の変化とともに自由に動きます。本研究では, 翻訳, 回転, 調音, 変形可能な物体を含む様々な動的シーンでParticleNeRFを実演する。 ParticleNeRFは初めてのオンライン動的NeRFであり、ブルートフォースオンラインInstantNGPや他のオンライン制約のある動的シーンに対するベースラインアプローチよりも優れた視覚的忠実度で高速な適応性を実現する。私たちのシステムのビデオは、プロジェクトのWebサイト https://sites.google.com/view/particlenerf で見ることができる。

While existing Neural Radiance Fields (NeRFs) for dynamic scenes are offline methods with an emphasis on visual fidelity, our paper addresses the online use case that prioritises real-time adaptability. We present ParticleNeRF, a new approach that dynamically adapts to changes in the scene geometry by learning an up-to-date representation online, every 200ms. ParticleNeRF achieves this using a novel particle-based parametric encoding. We couple features to particles in space and backpropagate the photometric reconstruction loss into the particles' position gradients, which are then interpreted as velocity vectors. Governed by a lightweight physics system to handle collisions, this lets the features move freely with the changing scene geometry. We demonstrate ParticleNeRF on various dynamic scenes containing translating, rotating, articulated, and deformable objects. ParticleNeRF is the first online dynamic NeRF and achieves fast adaptability with better visual fidelity than brute-force online InstantNGP and other baseline approaches on dynamic scenes with online constraints. Videos of our system can be found at our project website https://sites.google.com/view/particlenerf.

翻訳日:2023-03-27 18:22:42 公開日:2023-03-24

# 言葉なし学習によるオープン語彙テキスト・トゥ・モーション生成

Being Comes from Not-being: Open-vocabulary Text-to-Motion Generation with Wordless Training ( http://arxiv.org/abs/2210.15929v3 )

ライセンス: Link先を確認

Junfan Lin, Jianlong Chang, Lingbo Liu, Guanbin Li, Liang Lin, Qi Tian, Chang Wen Chen

(参考訳) テキストから動きへの生成は、入力テキストと同じ意味で動きを合成することを目的とした、新しくて困難な問題である。しかしながら、多種多様なラベル付きトレーニングデータがないため、ほとんどのアプローチは特定のタイプのテキストアノテーションに制限するか、効率と安定性の犠牲で推論中のテキストに対応するためにオンライン最適化を必要とする。本稿では,ゼロショット学習方式でオフラインのオープン語彙テキスト・トゥ・モーション生成を検証し,ペアトレーニングデータや,見当たらないテキストに適応するための追加のオンライン最適化を必要としない。 NLPの即時学習にインスパイアされ、マスクされた動きから全動作を再構築する動き生成装置を事前訓練する。推論中,動作生成装置を変更する代わりに,動作生成装置が動作を「再構成」するプロンプトとして入力テキストをマスクされた動作に再構成する。プロンプトを構築する際、プロンプトの未マストポーズをテキスト対ポス発生器で合成する。テキスト対ポーズ生成器の最適化を監督するために,テキストと3dポーズのアライメントを測定するための最初のテキスト対ポーズアライメントモデルを提案する。また、ポーズ生成器が限られたトレーニングテキストに過度に適合することを防止するため、トレーニングテキストを必要とせず、テキスト対ポーズ生成器を最適化する新しいワードレストレーニング機構を提案する。総合実験の結果,本手法はベースライン法に対して有意な改善が得られた。コードはhttps://github.com/junfanlin/oohmgで入手できる。

Text-to-motion generation is an emerging and challenging problem, which aims to synthesize motion with the same semantics as the input text. However, due to the lack of diverse labeled training data, most approaches are either limited to specific types of text annotations or require online optimization to accommodate the texts during inference, at the cost of efficiency and stability. In this paper, we investigate offline open-vocabulary text-to-motion generation in a zero-shot learning manner that requires neither paired training data nor extra online optimization to adapt to unseen texts. Inspired by prompt learning in NLP, we pretrain a motion generator that learns to reconstruct the full motion from the masked motion. During inference, instead of changing the motion generator, our method reformulates the input text into a masked motion as the prompt for the motion generator to "reconstruct" the motion. In constructing the prompt, the unmasked poses of the prompt are synthesized by a text-to-pose generator. To supervise the optimization of the text-to-pose generator, we propose the first text-pose alignment model for measuring the alignment between texts and 3D poses. To prevent the pose generator from overfitting to limited training texts, we further propose a novel wordless training mechanism that optimizes the text-to-pose generator without any training texts. Comprehensive experimental results show that our method obtains a significant improvement over the baseline methods. The code is available at https://github.com/junfanlin/oohmg.

翻訳日:2023-03-27 18:22:22 公開日:2023-03-24

# 破壊的ニューラルスケーリング法則

Broken Neural Scaling Laws ( http://arxiv.org/abs/2210.14891v9 )

ライセンス: Link先を確認

Ethan Caballero, Kshitij Gupta, Irina Rish, David Krueger

(参考訳) We present a smoothly broken power law functional form (referred to by us as a Broken Neural Scaling Law (BNSL)) that accurately models and extrapolates the scaling behaviors of deep neural networks (i.e. how the evaluation metric of interest varies as the amount of compute used for training, number of model parameters, training dataset size, model input size, number of training steps, or upstream performance varies) for various architectures and for each of various tasks within a large and diverse set of upstream and downstream tasks, in zero-shot, prompted, and fine-tuned settings. This set includes large-scale vision, language, audio, video, diffusion, generative modeling, multimodal learning, contrastive learning, AI alignment, robotics, out-of-distribution (OOD) generalization, continual learning, transfer learning, uncertainty estimation / calibration, out-of-distribution detection, adversarial robustness, distillation, sparsity, retrieval, quantization, pruning, molecules, computer programming/coding, math word problems, arithmetic, unsupervised/self-supervised learning, and reinforcement learning (single agent and multi-agent). 神経スケーリング行動の他の機能形式と比較すると、この関数形式は、この集合においてかなり正確なスケーリング行動の外挿をもたらす。さらに、この関数形式は、二重降下のような現象のスケーリング挙動に存在する非単調遷移や、算術のようなタスクのスケーリング挙動に存在する遅延した鋭いインフレクション点(しばしば「創発的な位相遷移」と呼ばれる)など、他の関数形式が表現できないスケーリング挙動を正確にモデル化し、外挿する。最後に、この関数形式を使用して、スケーリング動作の予測可能性の限界に関する洞察を得ます。コードはhttps://github.com/ethancaballero/broken_neural_scaling_lawsで入手できる。

We present a smoothly broken power law functional form (referred to by us as a Broken Neural Scaling Law (BNSL)) that accurately models and extrapolates the scaling behaviors of deep neural networks (i.e. how the evaluation metric of interest varies as the amount of compute used for training, number of model parameters, training dataset size, model input size, number of training steps, or upstream performance varies) for various architectures and for each of various tasks within a large and diverse set of upstream and downstream tasks, in zero-shot, prompted, and fine-tuned settings. This set includes large-scale vision, language, audio, video, diffusion, generative modeling, multimodal learning, contrastive learning, AI alignment, robotics, out-of-distribution (OOD) generalization, continual learning, transfer learning, uncertainty estimation / calibration, out-of-distribution detection, adversarial robustness, distillation, sparsity, retrieval, quantization, pruning, molecules, computer programming/coding, math word problems, arithmetic, unsupervised/self-supervised learning, and reinforcement learning (single agent and multi-agent). When compared to other functional forms for neural scaling behavior, this functional form yields extrapolations of scaling behavior that are considerably more accurate on this set. Moreover, this functional form accurately models and extrapolates scaling behavior that other functional forms are incapable of expressing such as the non-monotonic transitions present in the scaling behavior of phenomena such as double descent and the delayed, sharp inflection points (often called "emergent phase transitions") present in the scaling behavior of tasks such as arithmetic. Lastly, we use this functional form to glean insights about the limit of the predictability of scaling behavior. Code is available at https://github.com/ethancaballero/broken_neural_scaling_laws

翻訳日:2023-03-27 18:21:35 公開日:2023-03-24
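
The BNSL abstract above refers to a smoothly broken power law but does not write the functional form out. The sketch below uses the form as commonly quoted from the paper, y = a + b·x^(-c0) · Π_i (1 + (x/d_i)^(1/f_i))^(-c_i·f_i). Treat this as an assumption to check against the paper itself. The parameter values and the `local_slope` helper are purely illustrative.

```python
import math

def bnsl(x, a, b, c0, breaks):
    """Smoothly broken power law.

    breaks is a list of (c_i, d_i, f_i): change in log-log slope, location of
    the break, and sharpness of the break (smaller f_i means a sharper break).
    """
    y = b * x ** (-c0)
    for c, d, f in breaks:
        y *= (1.0 + (x / d) ** (1.0 / f)) ** (-c * f)
    return a + y

def local_slope(fn, x, eps=1e-3):
    """Finite-difference slope of fn on log-log axes at x."""
    return (math.log(fn(x * (1 + eps))) - math.log(fn(x))) / math.log(1 + eps)

def curve(x):
    # Hypothetical parameters: one break at x = 1e3 that steepens the slope by 1.
    return bnsl(x, a=0.0, b=1.0, c0=0.5, breaks=[(1.0, 1e3, 0.1)])

slope_before = local_slope(curve, 1.0)   # far below the break
slope_after = local_slope(curve, 1e6)    # far above the break
```

With a = 0, the log-log slope is about -c0 well below the break and about -(c0 + c1) well above it, which is the "broken" behavior that a single power law cannot express.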

# 困難層領域の少ない強調:特異な対流拡散反応問題に対するカリキュラム学習

Less Emphasis on Difficult Layer Regions: Curriculum Learning for Singularly Perturbed Convection-Diffusion-Reaction Problems ( http://arxiv.org/abs/2210.12685v2 )

ライセンス: Link先を確認

Yufeng Wang, Cong Xu, Min Yang, Jin Zhang

(参考訳) 物理情報ニューラルネットワーク(PINN)は、様々な科学・工学分野に応用されているが、わずかに困難な対流拡散反応問題において、基礎となる解を正確に予測できない。本稿では,この障害の原因をドメイン分布の観点から検討し,マルチスケールフィールドの学習を同時に行うことで,ネットワークがトレーニングを前進させることができず,ローカル・ミニマで簡単に立ち往生することを明らかにする。高損失層領域でより多くのコロケーションポイントをサンプリングした経験が、最適化に役立たず、結果が悪化する可能性も示唆した。これらの知見は、ニューラルネットワークがより容易な非層領域での学習を優先し、より難しい層領域での学習を軽視する新しいカリキュラム学習手法の開発を動機付けている。提案手法は,学習強調を自動的に調整し,最適化作業を容易にする。典型的なベンチマーク式における数値的な結果から,提案したカリキュラム学習手法はPINNの故障モードを緩和し,極めて鋭い境界層と内部層に対して正確な結果が得られることが示された。本研究は,大規模に異なる解を持つ方程式に対して,高損失領域に注意を払わないことが,それらを正確に学習するための効果的な戦略であることを示す。

Although Physics-Informed Neural Networks (PINNs) have been successfully applied in a wide variety of science and engineering fields, they can fail to accurately predict the underlying solution in slightly challenging convection-diffusion-reaction problems. In this paper, we investigate the reason for this failure from a domain distribution perspective, and identify that learning multi-scale fields simultaneously makes the network unable to advance its training and prone to getting stuck in poor local minima. We show that the widespread practice of sampling more collocation points in high-loss layer regions hardly helps optimization and may even worsen the results. These findings motivate the development of a novel curriculum learning method that encourages neural networks to prioritize learning on easier non-layer regions while downplaying learning on harder layer regions. The proposed method helps PINNs automatically adjust the learning emphasis and thereby facilitates the optimization procedure. Numerical results on typical benchmark equations show that the proposed curriculum learning approach mitigates the failure modes of PINNs and can produce accurate results for very sharp boundary and interior layers. Our work reveals that for equations whose solutions have large scale differences, paying less attention to high-loss regions can be an effective strategy for learning them accurately.

翻訳日:2023-03-27 18:21:11 公開日:2023-03-24

# 畳み込み・集約・注意に基づく深層ニューラルネットワークによる力学シミュレーションの高速化

Convolution, aggregation and attention based deep neural networks for accelerating simulations in mechanics ( http://arxiv.org/abs/2212.01386v2 )

ライセンス: Link先を確認

Saurabh Deshpande, Raúl I. Sosa, Stéphane P.A. Bordas, Jakub Lengiewicz

(参考訳) ディープラーニングサロゲートモデルは、コストのかかる従来の数値手法の代替として、科学シミュレーションの加速にますます利用されている。しかし、実世界の複雑な例を扱う場合、それらの使用は依然として大きな課題である。本研究では,固体の非線形変形を効率的に学習するための3種類のニューラルネットワークアーキテクチャを示す。最初の2つのアーキテクチャは、最近提案されたCNN U-NETとMagNET(グラフ U-NET)フレームワークに基づいている。第3のアーキテクチャであるPerceiver IOは、注目に基づくニューラルネットワークのファミリーに属する、非常に最近のアーキテクチャである。 3つのネットワークの性能を2つのベンチマーク例で比較し,ソフトボディの非線形機械的応答を正確に予測する能力を示した。

Deep learning surrogate models are being increasingly used in accelerating scientific simulations as a replacement for costly conventional numerical techniques. However, their use remains a significant challenge when dealing with real-world complex examples. In this work, we demonstrate three types of neural network architectures for efficient learning of highly non-linear deformations of solid bodies. The first two architectures are based on the recently proposed CNN U-NET and MAgNET (graph U-NET) frameworks which have shown promising performance for learning on mesh-based data. The third architecture is Perceiver IO, a very recent architecture that belongs to the family of attention-based neural networks--a class that has revolutionised diverse engineering fields and is still unexplored in computational mechanics. We study and compare the performance of all three networks on two benchmark examples, and show their capabilities to accurately predict the non-linear mechanical responses of soft bodies.

翻訳日:2023-03-27 18:15:05 公開日:2023-03-24

# MIC:コンテキスト拡張ドメイン適応のためのマスク付き画像整合性

MIC: Masked Image Consistency for Context-Enhanced Domain Adaptation ( http://arxiv.org/abs/2212.01322v2 )

ライセンス: Link先を確認

Lukas Hoyer, Dengxin Dai, Haoran Wang, Luc Van Gool

(参考訳) unsupervised domain adaptation(uda)では、ソースデータ(例えばsynthetic)に基づいてトレーニングされたモデルは、ターゲットのアノテーションにアクセスせずにターゲットデータ(例えば実世界)に適応される。従来のUDA手法は、視覚的外観が類似したクラスと競合することが多いが、外観の違いを学習するための基礎的な真実は存在しない。この問題に対処するために、ターゲット領域の空間的コンテキスト関係を頑健な視覚認識のための追加の手がかりとして学習することにより、UDAを強化するMasked Image Consistency (MIC)モジュールを提案する。 MICは、ランダムパッチが保持されないマスクされたターゲット画像の予測と、指数移動平均教師による完全な画像に基づいて生成された擬似ラベルとの一貫性を強制する。一貫性損失を最小限に抑えるために、ネットワークは、そのコンテキストからマスキングされた領域の予測を推測することを学ぶ必要がある。シンプルで普遍的な概念のため、MICは画像分類、セマンティックセグメンテーション、オブジェクト検出など、さまざまな視覚認識タスクにまたがる様々なUDAメソッドに統合することができる。 MICは、合成からリアルタイム、日夜、クリア・ツー・リバース・ウェザーUDAの様々な認識タスクにおいて、最先端の性能を著しく向上させる。例えば、MICは、GTA-to-Cityscapes と VisDA-2017 の75.9 mIoU と92.8%という前例のない UDA のパフォーマンスを達成した。実装はhttps://github.com/lhoyer/micで利用可能である。

In unsupervised domain adaptation (UDA), a model trained on source data (e.g. synthetic) is adapted to target data (e.g. real-world) without access to target annotation. Most previous UDA methods struggle with classes that have a similar visual appearance on the target domain as no ground truth is available to learn the slight appearance differences. To address this problem, we propose a Masked Image Consistency (MIC) module to enhance UDA by learning spatial context relations of the target domain as additional clues for robust visual recognition. MIC enforces the consistency between predictions of masked target images, where random patches are withheld, and pseudo-labels that are generated based on the complete image by an exponential moving average teacher. To minimize the consistency loss, the network has to learn to infer the predictions of the masked regions from their context. Due to its simple and universal concept, MIC can be integrated into various UDA methods across different visual recognition tasks such as image classification, semantic segmentation, and object detection. MIC significantly improves the state-of-the-art performance across the different recognition tasks for synthetic-to-real, day-to-nighttime, and clear-to-adverse-weather UDA. For instance, MIC achieves an unprecedented UDA performance of 75.9 mIoU and 92.8% on GTA-to-Cityscapes and VisDA-2017, respectively, which corresponds to an improvement of +2.1 and +3.0 percentage points over the previous state of the art. The implementation is available at https://github.com/lhoyer/MIC.

翻訳日:2023-03-27 18:14:50 公開日:2023-03-24
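
The MIC abstract above names three ingredients: random patches withheld from the target image, pseudo-labels produced by an exponential moving average (EMA) teacher from the complete image, and a consistency loss between the two. The sketch below shows each piece in isolation in plain Python. The patch size, mask ratio, decay value, and function names are assumptions for illustration, and no actual network is involved.

```python
import math
import random

def patch_mask(height, width, patch, ratio, rng):
    """Binary mask (1 = keep) that withholds whole patch x patch blocks at random."""
    rows = (height + patch - 1) // patch
    cols = (width + patch - 1) // patch
    keep = [[1 if rng.random() >= ratio else 0 for _ in range(cols)]
            for _ in range(rows)]
    return [[keep[y // patch][x // patch] for x in range(width)]
            for y in range(height)]

def apply_mask(image, mask):
    """Zero out the withheld pixels of a single-channel image."""
    return [[p * m for p, m in zip(irow, mrow)] for irow, mrow in zip(image, mask)]

def ema_update(teacher, student, decay):
    """teacher <- decay * teacher + (1 - decay) * student, parameter-wise."""
    return [decay * t + (1.0 - decay) * s for t, s in zip(teacher, student)]

def consistency_loss(student_probs, pseudo_label):
    """Cross-entropy of the student's masked-image prediction against the teacher's pseudo-label."""
    return -math.log(student_probs[pseudo_label])

rng = random.Random(0)
mask = patch_mask(8, 8, patch=4, ratio=0.5, rng=rng)
teacher = ema_update([1.0, 0.0], [0.0, 1.0], decay=0.9)
```

In a full training loop the student would see `apply_mask(image, mask)`, the teacher would see the unmasked image, and the teacher weights would be refreshed with `ema_update` after each student step.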

# MHCCL:多変量時系列のための階層型クラスタワイズコントラスト学習

MHCCL: Masked Hierarchical Cluster-Wise Contrastive Learning for Multivariate Time Series ( http://arxiv.org/abs/2212.01141v3 )

ライセンス: Link先を確認

Qianwen Meng, Hangwei Qian, Yong Liu, Lizhen Cui, Yonghui Xu, Zhiqi Shen

(参考訳) 未ラベルの時系列データから意味豊かな表現を学習することは、分類や予測といった下流のタスクに不可欠である。対照的な学習は、最近、専門家のアノテーションがない場合に有望な表現学習能力を示している。しかし、既存の対照的なアプローチは一般的に各インスタンスを独立に扱い、同じ意味論を共有する偽の負のペアを生み出す。この問題に対処するために,多変量時系列の複数の潜在パーティションからなる階層構造から得られた意味情報を利用する,マスケッド階層クラスタ単位のコントラスト学習モデルであるMHCCLを提案する。細粒度クラスタリングが高純度を維持しつつ、粗粒度が高レベルのセマンティクスを反映しているという観察に動機づけられ、クラスタリング階層から複数の粒度情報を取り入れることで偽陰性をフィルタリングし、正を補う新しい下方マスキング戦略を提案する。加えて、mhcclで新しい上向きマスキング戦略が設計され、各パーティションのクラスタの異常を取り除き、プロトタイプを洗練し、階層的クラスタリングプロセスを高速化し、クラスタリング品質を向上させる。広帯域多変量時系列データセットの実験的評価を行う。その結果,教師なし時系列表現学習における最先端手法よりもmhcclが優れていることが示された。

Learning semantic-rich representations from raw unlabeled time series data is critical for downstream tasks such as classification and forecasting. Contrastive learning has recently shown its promising representation learning capability in the absence of expert annotations. However, existing contrastive approaches generally treat each instance independently, which leads to false negative pairs that share the same semantics. To tackle this problem, we propose MHCCL, a Masked Hierarchical Cluster-wise Contrastive Learning model, which exploits semantic information obtained from the hierarchical structure consisting of multiple latent partitions for multivariate time series. Motivated by the observation that fine-grained clustering preserves higher purity while coarse-grained one reflects higher-level semantics, we propose a novel downward masking strategy to filter out fake negatives and supplement positives by incorporating the multi-granularity information from the clustering hierarchy. In addition, a novel upward masking strategy is designed in MHCCL to remove outliers of clusters at each partition to refine prototypes, which helps speed up the hierarchical clustering process and improves the clustering quality. We conduct experimental evaluations on seven widely-used multivariate time series datasets. The results demonstrate the superiority of MHCCL over the state-of-the-art approaches for unsupervised time series representation learning.

翻訳日:2023-03-27 18:14:12 公開日:2023-03-24
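
The MHCCL abstract above motivates filtering out false negatives, meaning samples that share the anchor's semantics, and using them as additional positives. The sketch below illustrates that idea for a single partition of the hierarchy with an InfoNCE-style loss. It is a simplified illustration rather than the authors' loss: the similarity matrix, the temperature, and the name `masked_info_nce` are assumptions, and the multi-granularity and upward-masking parts are not modeled.

```python
import math

def masked_info_nce(sim, clusters, anchor, temperature=0.5):
    """Contrastive loss for one anchor.

    Samples in the anchor's cluster are treated as positives and are removed
    from the negative set, so they can never act as false negatives.
    """
    n = len(sim)
    pos = [j for j in range(n) if j != anchor and clusters[j] == clusters[anchor]]
    neg = [j for j in range(n) if clusters[j] != clusters[anchor]]
    if not pos:
        return 0.0
    neg_sum = sum(math.exp(sim[anchor][j] / temperature) for j in neg)
    losses = []
    for j in pos:
        e = math.exp(sim[anchor][j] / temperature)
        losses.append(-math.log(e / (e + neg_sum)))
    return sum(losses) / len(losses)

# Hypothetical similarities between three samples; samples 0 and 1 share a cluster.
sim = [
    [1.0, 0.9, 0.1],
    [0.9, 1.0, 0.2],
    [0.1, 0.2, 1.0],
]
loss_value = masked_info_nce(sim, clusters=[0, 0, 1], anchor=0)
```

If sample 1 were instead treated as an ordinary negative, the loss would push two semantically similar series apart, which is the failure mode the abstract describes.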

# 希少事象による動的因果発見に向けて:非パラメトリック条件独立試験

Towards Dynamic Causal Discovery with Rare Events: A Nonparametric Conditional Independence Test ( http://arxiv.org/abs/2211.16596v3 )

ライセンス: Link先を確認

Chih-Yuan Chiu, Kshitij Kulkarni, Shankar Sastry

(参考訳) 稀な事象に関連する因果現象は、危険に敏感な安全分析、事故解析と予防、極端な価値理論など、幅広い工学的問題にまたがる。しかし、因果発見の現在の手法は、変数が最初に低確率の実現を経験したときにのみ現れる、動的環境におけるランダム変数間の因果関係を発見できないことが多い。そこで本研究では, 時間不変力学系から収集されたデータに対して, 稀ではあるが連続的な事象が発生する新しい統計独立性テストを提案する。特に,システム状態の重畳されたデータセットを,異なるタイミングで発生する前に構築するために,基礎となるデータの時間的不変性を利用する。次に、再構成データに基づいて条件付き独立試験を設計する。本手法の一貫性のために非漸近的なサンプル複雑性境界を提供し,caltrans performance measurement system (pems) から収集したインシデントデータを含む様々なシミュレーションおよび実世界のデータセットでその性能を検証する。データセットと実験を含むコードは公開されている。

Causal phenomena associated with rare events occur across a wide range of engineering problems, such as risk-sensitive safety analysis, accident analysis and prevention, and extreme value theory. However, current methods for causal discovery are often unable to uncover causal links, between random variables in a dynamic setting, that manifest only when the variables first experience low-probability realizations. To address this issue, we introduce a novel statistical independence test on data collected from time-invariant dynamical systems in which rare but consequential events occur. In particular, we exploit the time-invariance of the underlying data to construct a superimposed dataset of the system state before rare events happen at different timesteps. We then design a conditional independence test on the reorganized data. We provide non-asymptotic sample complexity bounds for the consistency of our method, and validate its performance across various simulated and real-world datasets, including incident data collected from the Caltrans Performance Measurement System (PeMS). Code containing the datasets and experiments is publicly available.

翻訳日:2023-03-27 18:13:48 公開日:2023-03-24
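
The abstract above describes using time-invariance to build a superimposed dataset of the system state before rare events that happen at different timesteps. The sketch below shows one simple way such a realignment could look. The fixed `lookback` window, the boolean event flags, and the function name are assumptions, and the conditional independence test that would then be run on the reorganized data is not implemented here.

```python
def superimpose(states, is_rare, lookback):
    """Collect the `lookback` states preceding each rare event.

    Every returned window is aligned by time-to-event, so windows taken from
    different timesteps can be pooled into a single dataset.
    """
    windows = []
    for t, rare in enumerate(is_rare):
        if rare and t >= lookback:
            windows.append(states[t - lookback:t])
    return windows

# Hypothetical trajectory with rare events at t = 1, 3 and 7.
states = list(range(10))
is_rare = [t in (1, 3, 7) for t in range(10)]
windows = superimpose(states, is_rare, lookback=2)
```

The event at t = 1 is skipped because it has fewer than `lookback` preceding states. Pooling the remaining windows relies on the dynamics being time-invariant, as the abstract assumes.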

# 能率的単一画像超解像のための特徴領域適応型コントラスト蒸留

Feature-domain Adaptive Contrastive Distillation for Efficient Single Image Super-Resolution ( http://arxiv.org/abs/2211.15951v2 )

ライセンス: Link先を確認

HyeonCheol Moon, JinWoo Jeong, SungJei Kim

(参考訳) 近年,CNN ベースの SISR には多くのパラメータがあり,性能向上のための計算コストが高い。ネットワークを効率的にする方法の1つとして、教師の有用な知識を学生に伝達する知識蒸留(KD)が現在研究されている。近年では,教師と生徒のネットワーク間における特徴マップのユークリッド距離損失を最小限に抑えるために特徴蒸留(fd)が用いられているが,ネットワーク容量制約により教師の知識を効果的かつ有意義に提供し,生徒のパフォーマンスを向上させる方法を十分に検討していない。本稿では,軽量なSISRネットワークを効率的に訓練するための特徴領域適応型コントラスト蒸留(FACD)手法を提案する。本稿では, ユークリッド距離損失を用いた既存のfd手法の限界を示し, 生徒ネットワークが特徴領域における教師の表現からよりリッチな情報を学習させる特徴領域コントラスト損失を提案する。また, トレーニングパッチの条件に応じて選択的に蒸留を施す適応蒸留法を提案する。実験結果から,提案方式による学生EDSRとRCANネットワークは,ベンチマークデータセット全体のPSNR性能だけでなく,従来のFD手法と比較して主観的画質も向上することが示された。

Recently, CNN-based SISR networks have relied on numerous parameters and high computational cost to achieve better performance, limiting their applicability to resource-constrained devices such as mobile phones. As one of the methods to make the network efficient, Knowledge Distillation (KD), which transfers the teacher's useful knowledge to the student, is currently being studied. More recently, KD for SISR has utilized Feature Distillation (FD) to minimize the Euclidean distance loss of feature maps between teacher and student networks, but it does not sufficiently consider how to effectively and meaningfully deliver knowledge from the teacher to improve the student's performance under given network capacity constraints. In this paper, we propose a feature-domain adaptive contrastive distillation (FACD) method for efficiently training lightweight student SISR networks. We show the limitations of the existing FD methods using Euclidean distance loss, and propose a feature-domain contrastive loss that makes a student network learn richer information from the teacher's representation in the feature domain. In addition, we propose an adaptive distillation that selectively applies distillation depending on the conditions of the training patches. The experimental results show that the student EDSR and RCAN networks trained with the proposed FACD scheme improve not only the PSNR performance across all benchmark datasets and scales, but also the subjective image quality compared to the conventional FD approaches.

翻訳日:2023-03-27 18:13:29 公開日:2023-03-24

# 創発的言語の語彙エントロピーを数学的にモデル化する

Mathematically Modeling the Lexicon Entropy of Emergent Language ( http://arxiv.org/abs/2211.15783v2 )

ライセンス: Link先を確認

Brendon Boldt, David Mortensen

(参考訳) 深層学習に基づく創発言語システムにおける語彙エントロピーの数学的モデルとして確率過程FiLexを定式化する。モデルを数学的に定義することで、直接かつ決定的にテスト可能な明確な予測を生成することができる。本研究は,FiLexがハイパーパラメータ(トレーニングステップ,レキシコンサイズ,学習速度,ロールアウトバッファサイズ,Gumbel-Softmax温度)と,20の環境-ハイパーパラメータの組み合わせのうち20の創発言語エントロピーの正確な相関を予測できる4つの環境を実証的に検証した。さらに, 実験により, 異なる環境が過度パラメータとエントロピーの関係を多様に示し, 精度の高い粒度の予測を行うモデルの必要性が示された。

We formulate a stochastic process, FiLex, as a mathematical model of lexicon entropy in deep learning-based emergent language systems. Defining a model mathematically allows it to generate clear predictions which can be directly and decisively tested. We empirically verify across four different environments that FiLex predicts the correct correlation between hyperparameters (training steps, lexicon size, learning rate, rollout buffer size, and Gumbel-Softmax temperature) and the emergent language's entropy in 20 out of 20 environment-hyperparameter combinations. Furthermore, our experiments reveal that different environments show diverse relationships between their hyperparameters and entropy which demonstrates the need for a model which can make well-defined predictions at a precise level of granularity.

翻訳日:2023-03-27 18:13:04 公開日:2023-03-24

# ディープニューラルネットワークにおけるフォアリング説明

Foiling Explanations in Deep Neural Networks ( http://arxiv.org/abs/2211.14860v2 )

ライセンス: Link先を確認

Snir Vitrack Tamam, Raz Lapid, Moshe Sipper

(参考訳) ディープニューラルネットワーク(DNN)は、過去10年間に多くの分野に大きな影響を与えてきた。しかし、多くの問題に対して優れたパフォーマンスを示すにもかかわらず、ブラックボックスの性質は説明可能性に関して依然として大きな課題となっている。実際、説明可能な人工知能(XAI)はいくつかの分野で重要である。本稿では、画像ベースDNNにおける説明手法の厄介な性質を明らかにする: 入力画像に小さな視覚的変化を加えることで、ネットワークの出力に影響を与えることがほとんどなく、進化戦略を用いて、どのように説明が任意に操作されるかを実証する。我々の新しいアルゴリズムであるAttaXAIは、XAIアルゴリズムに対するモデルに依存しない、敵対的な攻撃であり、分類器の出力ロジットと説明マップへのアクセスしか必要としない。ベンチマークデータセットであるcifar100とimagenetのパフォーマンスを,vgg16-cifar100,vgg16-imagenet,mobilenet-cifar100,inception-v3-imagenetの4つの異なるディープラーニングモデルを用いて比較した。 XAI法は勾配やモデル内部を使わずに操作できることがわかった。我々の新しいアルゴリズムは、XAI法が特定の説明図を出力するように、人間の目では認識できない方法で画像を操作できる。我々の知る限り、これはブラックボックス設定における最初の方法であり、説明責任が望まれ、必要であり、法的に義務付けられている重要な価値があると考えている。

Deep neural networks (DNNs) have greatly impacted numerous fields over the past decade. Yet despite exhibiting superb performance over many problems, their black-box nature still poses a significant challenge with respect to explainability. Indeed, explainable artificial intelligence (XAI) is crucial in several fields, wherein the answer alone -- sans a reasoning of how said answer was derived -- is of little value. This paper uncovers a troubling property of explanation methods for image-based DNNs: by making small visual changes to the input image -- hardly influencing the network's output -- we demonstrate how explanations may be arbitrarily manipulated through the use of evolution strategies. Our novel algorithm, AttaXAI, a model-agnostic, adversarial attack on XAI algorithms, only requires access to the output logits of a classifier and to the explanation map; these weak assumptions render our approach highly useful where real-world models and data are concerned. We compare our method's performance on two benchmark datasets -- CIFAR100 and ImageNet -- using four different pretrained deep-learning models: VGG16-CIFAR100, VGG16-ImageNet, MobileNet-CIFAR100, and Inception-v3-ImageNet. We find that the XAI methods can be manipulated without the use of gradients or other model internals. Our novel algorithm is successfully able to manipulate an image in a manner imperceptible to the human eye, such that the XAI method outputs a specific explanation map. To our knowledge, this is the first such method in a black-box setting, and we believe it has significant value where explainability is desired, required, or legally mandatory.

翻訳日:2023-03-27 18:12:48 公開日:2023-03-24

# RUST:未提示画像からの潜在神経シーン表現

RUST: Latent Neural Scene Representations from Unposed Imagery ( http://arxiv.org/abs/2211.14306v2 )

ライセンス: Link先を確認

Mehdi S. M. Sajjadi, Aravindh Mahendran, Thomas Kipf, Etienne Pot, Daniel Duckworth, Mario Lucic, Klaus Greff

(参考訳) 2次元の観察から3dシーンの構造を推測することは、コンピュータビジョンにおける根本的な課題である。近年,ニューラルシーン表現に基づくアプローチが広く普及し,様々なアプリケーションに適用されている。この領域で残っている大きな課題の1つは、1つのシーンを超えて効果的に一般化する潜在表現を提供する単一のモデルを訓練することである。 SRT(Scene Representation Transformer)はこの方向を約束しているが、より広い範囲の多様なシーンにスケールすることは困難であり、正確な地上真実データを必要とする。この問題に対処するために,RGB画像だけで訓練された新規ビュー合成のためのポーズレスアプローチであるRUST(Really Unposed Scene representation Transformer)を提案する。我々の主な洞察は、ターゲット画像を覗き見し、デコーダがビュー合成に使用する潜伏ポーズの埋め込みを学習するPose Encoderを訓練できるということです。我々は,学習された潜在ポーズ構造について経験的調査を行い,有意義なテスト時間カメラ変換と正確なポーズ読み出しを可能にすることを示す。おそらく意外なことに、RUSTは完璧なカメラポーズにアクセスできる方法と同じような品質を実現し、それによって、償却されたニューラルシーン表現の大規模トレーニングの可能性を解き放ちます。

Inferring the structure of 3D scenes from 2D observations is a fundamental challenge in computer vision. Recently popularized approaches based on neural scene representations have achieved tremendous impact and have been applied across a variety of applications. One of the major remaining challenges in this space is training a single model that provides latent representations which generalize effectively beyond a single scene. Scene Representation Transformer (SRT) has shown promise in this direction, but scaling it to a larger set of diverse scenes is challenging and necessitates accurately posed ground truth data. To address this problem, we propose RUST (Really Unposed Scene representation Transformer), a pose-free approach to novel view synthesis trained on RGB images alone. Our main insight is that one can train a Pose Encoder that peeks at the target image and learns a latent pose embedding which is used by the decoder for view synthesis. We perform an empirical investigation into the learned latent pose structure and show that it allows meaningful test-time camera transformations and accurate explicit pose readouts. Perhaps surprisingly, RUST achieves similar quality to methods which have access to perfect camera pose, thereby unlocking the potential for large-scale training of amortized neural scene representations.

翻訳日:2023-03-27 18:12:19 公開日:2023-03-24

# ニューラルレンダリングによる教師なし連続意味適応

Unsupervised Continual Semantic Adaptation through Neural Rendering ( http://arxiv.org/abs/2211.13969v2 )

ライセンス: Link先を確認

Zhizheng Liu, Francesco Milano, Jonas Frey, Roland Siegwart, Hermann Blum, Cesar Cadena

(参考訳) アプリケーションの増加は、シーンのシーケンスにわたって知覚タスクにデプロイされるデータ駆動モデルに依存している。トレーニングデータとデプロイメントデータのミスマッチのため、新しいシーンでモデルを適用することは、しばしば優れたパフォーマンスを得るために重要である。本研究では,セマンティクスセグメンテーションのタスクに対して,セマンティクスセグメンテーションを行うための連続的マルチシーン適応について検討する。セグメンテーションモデルの予測を融合させ,ビュー一貫性のあるセマンティックラベルを擬似ラベルとして使用することにより,シーン毎にセマンティック・NeRFネットワークをトレーニングする。セグメンテーションモデルとのジョイントトレーニングにより,セマンティック・ニューラルフモデルにより2次元3次元の知識伝達が可能となる。さらに、サイズが小さく、長期記憶に保存でき、その後、任意の視点からデータをレンダリングして忘れることを減らすことができる。我々は,Voxelベースのベースラインと最先端の教師なしドメイン適応手法の両方より優れているScanNetに対するアプローチを評価する。

An increasing number of applications rely on data-driven models that are deployed for perception tasks across a sequence of scenes. Due to the mismatch between training and deployment data, adapting the model to the new scenes is often crucial to obtain good performance. In this work, we study continual multi-scene adaptation for the task of semantic segmentation, assuming that no ground-truth labels are available during deployment and that performance on the previous scenes should be maintained. We propose training a Semantic-NeRF network for each scene by fusing the predictions of a segmentation model and then using the view-consistent rendered semantic labels as pseudo-labels to adapt the model. Through joint training with the segmentation model, the Semantic-NeRF model effectively enables 2D-3D knowledge transfer. Furthermore, due to its compact size, it can be stored in a long-term memory and subsequently used to render data from arbitrary viewpoints to reduce forgetting. We evaluate our approach on ScanNet, where we outperform both a voxel-based baseline and a state-of-the-art unsupervised domain adaptation method.

翻訳日:2023-03-27 18:11:57 公開日:2023-03-24

# FFHQ-UV:3次元顔再構成のための正常顔面UVテクスチャデータセット

FFHQ-UV: Normalized Facial UV-Texture Dataset for 3D Face Reconstruction ( http://arxiv.org/abs/2211.13874v2 )

ライセンス: Link先を確認

Haoran Bai, Di Kang, Haoxian Zhang, Jinshan Pan, Linchao Bao

(参考訳) 本稿では,5万以上の高品質なテクスチャuvマップと,照度,中性表現,清浄された顔領域を含む大規模顔用uvテクスチャデータセットを提案する。データセットはFFHQという大規模な顔画像データセットから派生したもので、完全に自動で堅牢なUVテクスチャ生産パイプラインの助けを借りています。我々のパイプラインは、最近のStyleGANベースの顔画像編集手法を利用して、画像入力から多視点正規化顔画像を生成する。次に、精巧なUVテクスチャ抽出、補正、完了手順を適用し、正規化顔画像から高品質なUVマップを生成する。既存のUVテクスチャデータセットと比較して、データセットはより多様で高品質なテクスチャマップを持っています。さらに,パラメトリックフィッティングに基づく3次元顔再構成のための非線形テクスチャベースとしてganベースのテクスチャデコーダを訓練する。実験の結果,本手法は最先端の手法よりも再構成精度が向上し,さらに,現実的なレンダリングが可能な高品質なテクスチャマップが得られた。データセット、コード、トレーニング済みテクスチャデコーダはhttps://github.com/csbhr/FFHQ-UVで公開されている。

We present a large-scale facial UV-texture dataset that contains over 50,000 high-quality texture UV-maps with even illumination, neutral expressions, and cleaned facial regions, which are desired characteristics for rendering realistic 3D face models under different lighting conditions. The dataset is derived from FFHQ, a large-scale face image dataset, with the help of our fully automatic and robust UV-texture production pipeline. Our pipeline utilizes recent advances in StyleGAN-based facial image editing to generate multi-view normalized face images from single-image inputs. An elaborate UV-texture extraction, correction, and completion procedure is then applied to produce high-quality UV-maps from the normalized face images. Compared with existing UV-texture datasets, our dataset has more diverse and higher-quality texture maps. We further train a GAN-based texture decoder as the nonlinear texture basis for 3D face reconstruction based on parametric fitting. Experiments show that our method improves the reconstruction accuracy over state-of-the-art approaches, and more importantly, produces high-quality texture maps that are ready for realistic renderings. The dataset, code, and pre-trained texture decoder are publicly available at https://github.com/csbhr/FFHQ-UV.

翻訳日:2023-03-27 18:11:25 公開日:2023-03-24

# Flow-Lenia: 大量保存とパラメータ局在による細胞オートマトンのオープンエンド進化に向けて

Flow-Lenia: Towards open-ended evolution in cellular automata through mass conservation and parameter localization ( http://arxiv.org/abs/2212.07906v2 )

ライセンス: Link先を確認

Erwan Plantec, Gautier Hamon, Mayalen Etcheverry, Pierre-Yves Oudeyer, Clément Moulin-Frier and Bert Wang-Chak Chan

(参考訳) 仮想生物のオープンエンド進化のような生命に似た現象を生み出す複雑な自己組織化システムの設計は、人工生命の主要な目標の1つである。コンウェイの生命を連続空間、時間、状態に一般化するセル・オートマトン(ca)のファミリーであるレニアは、それが生成できる幅広い自己組織化パターンのために多くの注目を集めている。その中でも、空間的局所化パターン(slp)は生命のような人工生物に似ており、複雑な行動を示す。しかし、これらの生物は、レニアパラメータ空間の小さな部分空間にのみ存在し、発見し、先進的な探索アルゴリズムを必要とする。さらに、これらの生物は特定の更新規則によって制御された世界でのみ存在し、したがって同一の世界では相互作用できない。本稿ではこれらの問題を解決するために,フローレニアと呼ばれるレニアの大量保存的拡張を提案する。本稿では,複雑な動作を伴うSLPの生成の有効性を示す実験を行い,関心を示すSLPを生成するために更新ルールパラメータを最適化できることを示す。最後に、フローレニアはCAのダイナミックス内でCAの更新ルールのパラメータの統合を可能にし、動的かつ局所化し、複数種のシミュレーションを可能にし、出現する生物の性質を定義する局所的コヒーレントな更新ルールを持ち、近隣の規則と混同できることを示す。これは連続casにおける自己組織型人工生命の形態の本質的進化への道を開くと論じている。

The design of complex self-organising systems producing life-like phenomena, such as the open-ended evolution of virtual creatures, is one of the main goals of artificial life. Lenia, a family of cellular automata (CA) generalizing Conway's Game of Life to continuous space, time and states, has attracted a lot of attention because of the wide diversity of self-organizing patterns it can generate. Among those, some spatially localized patterns (SLPs) resemble life-like artificial creatures and display complex behaviors. However, those creatures are found in only a small subspace of the Lenia parameter space and are not trivial to discover, necessitating advanced search algorithms. Furthermore, each of these creatures exists only in worlds governed by specific update rules, so creatures from different worlds cannot interact in the same one. This paper proposes a mass-conservative extension of Lenia, called Flow Lenia, that solves both of these issues. We present experiments demonstrating its effectiveness in generating SLPs with complex behaviors and show that the update rule parameters can be optimized to generate SLPs showing behaviors of interest. Finally, we show that Flow Lenia enables the integration of the parameters of the CA update rules within the CA dynamics, making them dynamic and localized, allowing for multi-species simulations, with locally coherent update rules that define properties of the emerging creatures, and that can be mixed with neighbouring rules. We argue that this paves the way for the intrinsic evolution of self-organized artificial life forms within continuous CAs.

翻訳日:2023-03-27 18:05:15 公開日:2023-03-24

# 制御可能なアバターの再構成のための構造的3次元特徴

Structured 3D Features for Reconstructing Controllable Avatars ( http://arxiv.org/abs/2212.06820v2 )

ライセンス: Link先を確認

Enric Corona, Mihai Zanfir, Thiemo Alldieck, Eduard Gabriel Bazavan, Andrei Zanfir, Cristian Sminchisescu

(参考訳) パラメトリックな統計的メッシュ表面からサンプリングされた高密度な3次元点に画素整列画像特徴をプールする,新しい暗黙の3次元表現に基づくモデルであるStructured 3D Featuresを紹介する。 3Dポイントは関連する意味を持ち、3D空間で自由に移動することができる。これにより、身体の形状だけでなく、興味のある人物の最適なカバーが可能になり、さらにアクセサリー、髪、ゆるい衣服のモデリングにも役立ちます。そこで本研究では,アルベドと照明分解を併用したアニマタブルな3次元再構成を,一方のエンド・ツー・エンドモデル,訓練された半教師付きセミプロセッサ,追加のポストプロセッシングを伴わない,完全な3次元トランスフォーマーベースのアテンション・フレームワークを提案する。本研究では,S3Fモデルがモノクロ3D再構成やアルベド,シェーディング推定など,これまでの課題を超越していることを示す。さらに,提案手法では,新しい視点合成,リライト,再構成が可能であり,複数の入力画像(例えば,人物の異なる視点,あるいは同じ視点を異なるポーズで,映像内で)を自然に処理できるように拡張できることを示す。最後に,3次元仮想トライオンアプリケーションのためのモデルの編集機能を示す。

We introduce Structured 3D Features, a model based on a novel implicit 3D representation that pools pixel-aligned image features onto dense 3D points sampled from a parametric, statistical human mesh surface. The 3D points have associated semantics and can move freely in 3D space. This allows for optimal coverage of the person of interest, beyond just the body shape, which in turn, additionally helps modeling accessories, hair, and loose clothing. Owing to this, we present a complete 3D transformer-based attention framework which, given a single image of a person in an unconstrained pose, generates an animatable 3D reconstruction with albedo and illumination decomposition, as a result of a single end-to-end model, trained semi-supervised, and with no additional postprocessing. We show that our S3F model surpasses the previous state-of-the-art on various tasks, including monocular 3D reconstruction, as well as albedo and shading estimation. Moreover, we show that the proposed methodology allows novel view synthesis, relighting, and re-posing the reconstruction, and can naturally be extended to handle multiple input images (e.g. different views of a person, or the same view, in different poses, in video). Finally, we demonstrate the editing capabilities of our model for 3D virtual try-on applications.

翻訳日:2023-03-27 18:04:47 公開日:2023-03-24

# 構造化知識強化によるオープンワールドストーリー生成:包括的調査

Open-world Story Generation with Structured Knowledge Enhancement: A Comprehensive Survey ( http://arxiv.org/abs/2212.04634v2 )

ライセンス: Link先を確認

Yuxin Wang, Jieru Lin, Zhiwei Yu, Wei Hu, Börje F. Karlsson

(参考訳) ストーリーテリングと物語は人間体験の基本であり、社会と文化の関わりに絡み合っている。そのため、研究者は長い間、物語を自動生成できるシステムを作ろうとしてきた。近年,ディープラーニングと大量のデータリソースを活用して,自動ストーリ生成が大きな進歩を見せている。しかし、生成したストーリーのグローバルコヒーレンスの必要性など、かなりの課題は、生成モデルが人間のナレーターと同じストーリーテリング能力に達することを妨げている。これらの課題に取り組むために、多くの研究は構造的知識を生成プロセスに注入し、構造的知識強化ストーリー生成(structured knowledge-enhanced story generation)と呼ばれる。外部知識の導入は、ストーリーイベント間の論理的一貫性を高め、より良い知識基盤化を達成し、ストーリーにおける過剰な一般化と反復問題を緩和することができる。この調査は、この研究分野の最新かつ包括的なレビューを提供する。 (i)既存の手法がいかに構造化された知識をストーリー生成に組み込むかに関する体系的分類法を提示する。 (ii)ストーリーコーパス、構造化知識データセット、評価指標をまとめる。 (iii)知識強化ストーリー生成の課題を多次元的に把握し,将来的な研究の方向性に光を当てる。

Storytelling and narrative are fundamental to human experience, intertwined with our social and cultural engagement. As such, researchers have long attempted to create systems that can generate stories automatically. In recent years, powered by deep learning and massive data resources, automatic story generation has shown significant advances. However, considerable challenges, like the need for global coherence in generated stories, still hamper generative models from reaching the same storytelling ability as human narrators. To tackle these challenges, many studies seek to inject structured knowledge into the generation process, which is referred to as structured knowledge-enhanced story generation. Incorporating external knowledge can enhance the logical coherence among story events, achieve better knowledge grounding, and alleviate over-generalization and repetition problems in stories. This survey provides an up-to-date and comprehensive review of this research field: (i) we present a systematic taxonomy of how existing methods integrate structured knowledge into story generation; (ii) we summarize the story corpora, structured knowledge datasets, and evaluation metrics involved; (iii) we give multidimensional insights into the challenges of knowledge-enhanced story generation and cast light on promising directions for future study.

翻訳日:2023-03-27 18:04:07 公開日:2023-03-24

# 静止空間における運動拡散によるコマンドの実行

Executing your Commands via Motion Diffusion in Latent Space ( http://arxiv.org/abs/2212.04048v2 )

ライセンス: Link先を確認

Xin Chen, Biao Jiang, Wen Liu, Zilong Huang, Bin Fu, Tao Chen, Jingyi Yu, Gang Yu

(参考訳) 本稿では,アクションクラスやテキスト記述子など,様々な条件入力に応じて人間の動作シーケンスを生成する課題である条件付きヒューマンモーション生成について検討する。人間の動きは多様であり、自然言語におけるテキスト記述子のような条件付きモダリティとは全く異なる性質を持つため、所望の条件付きモダリティから人間の動き列への確率的マッピングを学ぶことは困難である。さらに、モーションキャプチャシステムからの生のモーションデータはシーケンスが冗長でノイズも含んでいる可能性があり、生のモーションシーケンスと条件付きモダリティのジョイント分布を直接モデル化するには、重い計算オーバーヘッドが必要となり、キャプチャされたノイズによって引き起こされるアーティファクトを発生させる可能性がある。人間の動作シーケンスをよりよく表現するために、我々はまず強力な変分オートエンコーダ(VAE)を設計し、人間の動作シーケンスを代表的で低次元の遅延コードに到達する。次に, 動き列と条件入力との接続を確立するために拡散モデルを用いる代わりに, 動き潜在空間上で拡散過程を行う。提案した動作遅延に基づく拡散モデル(MLD)は、与えられた条件入力に対応する鮮明な動き列を生成し、トレーニングおよび推論段階の計算オーバーヘッドを大幅に低減する。様々な人体運動生成タスクに対する広範囲な実験により、我々のMLDは、広範囲な人体運動生成タスクにおける最先端の手法よりも大幅に改善され、原動列上の従来の拡散モデルよりも2桁高速であることが示された。

We study a challenging task, conditional human motion generation, which produces plausible human motion sequences according to various conditional inputs, such as action classes or textual descriptors. Since human motions are highly diverse and follow a distribution quite different from that of conditional modalities, such as textual descriptors in natural language, it is hard to learn a probabilistic mapping from the desired conditional modality to the human motion sequences. Besides, the raw motion data from a motion capture system might be redundant and noisy; directly modeling the joint distribution over the raw motion sequences and conditional modalities would incur heavy computational overhead and might result in artifacts introduced by the captured noise. To learn a better representation of the various human motion sequences, we first design a powerful Variational AutoEncoder (VAE) to obtain a representative, low-dimensional latent code for a human motion sequence. Then, instead of using a diffusion model to establish the connections between the raw motion sequences and the conditional inputs, we perform a diffusion process on the motion latent space. Our proposed Motion Latent-based Diffusion model (MLD) can produce vivid motion sequences conforming to the given conditional inputs and substantially reduce the computational overhead in both the training and inference stages. Extensive experiments on various human motion generation tasks demonstrate that our MLD achieves significant improvements over state-of-the-art methods while being two orders of magnitude faster than previous diffusion models on raw motion sequences.

翻訳日:2023-03-27 18:03:47 公開日:2023-03-24
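
As a rough illustration of the latent-space diffusion idea in the abstract above, the sketch below applies the standard DDPM-style forward noising step to a low-dimensional latent code rather than to a raw motion sequence. The linear noise schedule, the function names, and the 4-dimensional latent vector are assumptions made for illustration; none of them is taken from the MLD paper or its code.

```python
import math
import random


def linear_alpha_bar(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for a linear beta schedule (assumed)."""
    alpha_bar = 1.0
    schedule = []
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / max(num_steps - 1, 1)
        alpha_bar *= 1.0 - beta
        schedule.append(alpha_bar)
    return schedule


def noise_latent(z0, alpha_bar_t, rng):
    """Forward diffusion: z_t = sqrt(a) * z0 + sqrt(1 - a) * eps, eps ~ N(0, I)."""
    signal = math.sqrt(alpha_bar_t)
    noise = math.sqrt(1.0 - alpha_bar_t)
    return [signal * z + noise * rng.gauss(0.0, 1.0) for z in z0]


rng = random.Random(0)
schedule = linear_alpha_bar(1000)
z0 = [0.5, -1.2, 0.3, 2.0]  # hypothetical latent code from a motion VAE encoder
z_mid = noise_latent(z0, schedule[499], rng)  # partially noised latent
z_end = noise_latent(z0, schedule[-1], rng)   # almost pure noise
```

Because the noising acts on a handful of latent dimensions instead of every joint in every frame, each diffusion step touches far fewer values, which is the intuition behind the reduced overhead the abstract reports.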

# 逆強化学習における誤特定

Misspecification in Inverse Reinforcement Learning ( http://arxiv.org/abs/2212.03201v2 )

ライセンス: Link先を確認

Joar Skalse, Alessandro Abate

(参考訳) 逆強化学習(IRL)の目的は、ポリシー$\pi$から報酬関数$R$を推論することである。これを行うには、$\pi$と$R$の関係のモデルが必要です。現在の文献では、最も一般的なモデルは最適性、ボルツマン合理性、因果エントロピー最大化である。 IRLの主な動機の1つは、人間の行動から人間の嗜好を推測することである。しかしながら、人間の嗜好と人間の行動の関係は、現在IRLで使われているどのモデルよりもはるかに複雑である。これは、それらが誤って特定され、現実世界のデータに適用された場合、不適切な推測につながる恐れが生じることを意味する。本稿では,異なるirlモデルが不特定化に対していかに頑健であるかを数学的に解析し,そのモデルが報酬関数 $r$ に関する誤った推論につながる前に,各標準モデルとデモストラクタポリシーがどのように異なるかを正確に答える。また、IRLの誤特定を推論するためのフレームワークと、新しいIRLモデルの誤特定堅牢性を容易に導き出すためのフォーマルなツールも導入する。

The aim of Inverse Reinforcement Learning (IRL) is to infer a reward function $R$ from a policy $\pi$. To do this, we need a model of how $\pi$ relates to $R$. In the current literature, the most common models are optimality, Boltzmann rationality, and causal entropy maximisation. One of the primary motivations behind IRL is to infer human preferences from human behaviour. However, the true relationship between human preferences and human behaviour is much more complex than any of the models currently used in IRL. This means that they are misspecified, which raises the worry that they might lead to unsound inferences if applied to real-world data. In this paper, we provide a mathematical analysis of how robust different IRL models are to misspecification, and answer precisely how the demonstrator policy may differ from each of the standard models before that model leads to faulty inferences about the reward function $R$. We also introduce a framework for reasoning about misspecification in IRL, together with formal tools that can be used to easily derive the misspecification robustness of new IRL models.

翻訳日:2023-03-27 18:03:18 公開日:2023-03-24
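
To make one of the behavioural models named in the abstract above concrete, here is a minimal sketch of a Boltzmann-rational policy: actions are chosen with probability proportional to the exponential of their value, scaled by an inverse temperature. The Q-values, the parameter name `beta`, and the function name are hypothetical and chosen only for illustration; the paper's formal definitions may differ in detail.

```python
import math


def boltzmann_policy(q_values, beta=1.0):
    """Return action probabilities proportional to exp(beta * Q(s, a)).

    beta >= 0 is the inverse temperature: beta = 0 gives a uniform policy,
    and large beta approaches the optimal (greedy) policy.
    """
    m = max(q_values)  # subtract the max for numerical stability
    weights = [math.exp(beta * (q - m)) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


q = [1.0, 2.0, 0.5]  # hypothetical Q-values for three actions in one state
uniform = boltzmann_policy(q, beta=0.0)
soft = boltzmann_policy(q, beta=1.0)
near_optimal = boltzmann_policy(q, beta=50.0)
```

An IRL method that assumes one value of `beta` while the demonstrator actually behaves according to another, or according to a different model altogether, is misspecified in the sense the abstract discusses.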

# State Space Closure: 強化学習による無限のオンラインレベル生成の再考

State Space Closure: Revisiting Endless Online Level Generation via Reinforcement Learning ( http://arxiv.org/abs/2212.02951v2 )

ライセンス: Link先を確認

Ziqi Wang, Tianye Shu, Jialin Liu

(参考訳) 本稿では,最近提案されている強化学習(edrl)フレームワークによる経験駆動プロシージャコンテンツ生成を用いて,エンドレスオンラインレベル生成を再考する。 EDRLは繰り返しパターンを生成する傾向にあるという観察から着想を得た状態空間閉包の概念を定式化し、無限水平オンライン生成プロセスにおいて、任意の確率状態が有限水平内で見られるようにした。理論的解析により、状態空間の閉包が多様性に関する懸念を生じても、コンテンツ品質の劣化を伴わずに有限水平で訓練されたEDRLを無限水平シナリオに一般化する。さらに,広範に使用されているSuper Mario Bros.ベンチマークを用いて,EDRLが生成するコンテンツの品質と多様性を実証研究により検証した。実験結果から,EDRLが生成するレベルの多様性は状態空間の閉鎖によって制限されるが,その品質はトレーニングで指定されたものよりも長い水平線では劣化しないことがわかった。結果と分析をまとめると、強化学習による無限のオンラインレベル生成に関する今後の取り組みは、状態空間の閉鎖と品質の発生を保証しながら多様性の問題に対処すべきである。

In this paper, we revisit endless online level generation with the recently proposed experience-driven procedural content generation via reinforcement learning (EDRL) framework. Inspired by the observation that EDRL tends to generate recurrent patterns, we formulate a notion of state space closure, under which any stochastic state that may appear in an infinite-horizon online generation process can already be found within a finite horizon. Through theoretical analysis, we find that even though state space closure raises a concern about diversity, it generalises EDRL trained with a finite horizon to the infinite-horizon scenario without deterioration of content quality. Moreover, we verify the quality and the diversity of content generated by EDRL via empirical studies on the widely used Super Mario Bros. benchmark. Experimental results reveal that the diversity of levels generated by EDRL is limited due to the state space closure, whereas their quality does not deteriorate over a horizon longer than the one specified in training. In light of our results and analysis, future work on endless online level generation via reinforcement learning should address the issue of diversity while ensuring that state space closure occurs and quality is maintained.

翻訳日:2023-03-27 18:03:00 公開日:2023-03-24

# イメージが画像で話す: 文脈内ビジュアル学習のためのジェネラリスト・ペインティング

Images Speak in Images: A Generalist Painter for In-Context Visual Learning ( http://arxiv.org/abs/2212.02499v2 )

ライセンス: Link先を確認

Xinlong Wang, Wen Wang, Yue Cao, Chunhua Shen, Tiejun Huang

(参考訳) インコンテキスト学習は、NLPの新しいパラダイムとして、少数のプロンプトと例だけで、モデルが様々なタスクに迅速に適応できるようにする。しかし、コンピュータビジョンでは、文脈内学習の難しさは、タスクが出力表現で大きく異なるため、ビジョンモデルがドメイン外のタスクを理解し、転送できる汎用的なタスクプロンプトをどのように定義すればよいかは明らかではない。本稿では,コアビジョンタスクの出力をイメージとして再定義する"イメージ"中心のソリューションを用いて,これらの障害に対処するジェネラリストモデルであるpaintを提案し,タスクプロンプトをイメージとして指定する。この考え方では、トレーニングプロセスは非常にシンプルで、入力と出力のイメージペアを縫い合わせることで、標準的なマスク画像モデリングを実行します。これにより、モデルは可視像パッチで条件付きタスクを実行することができる。したがって、推論中に入力条件と同じタスクから一対の入出力画像を適用でき、どのタスクを実行するかを示すことができる。ベルやホイッスルがなければ,高レベルの視覚的理解から低レベルの画像処理に至るまでの7つの視覚的タスクにおいて,精確に確立されたタスク固有モデルと比較して,競争性能が向上する。加えて、paintはいくつかの困難なタスクで最近のジェネラリストモデルを大きく上回っている。

In-context learning, as a new paradigm in NLP, allows the model to rapidly adapt to various tasks with only a handful of prompts and examples. But in computer vision, the difficulty of in-context learning is that tasks vary significantly in their output representations, so it is unclear how to define general-purpose task prompts that the vision model can understand and transfer to out-of-domain tasks. In this work, we present Painter, a generalist model which addresses these obstacles with an "image"-centric solution, that is, to redefine the output of core vision tasks as images, and to specify task prompts as images as well. With this idea, our training process is extremely simple: it performs standard masked image modeling on stitched pairs of input and output images. This makes the model capable of performing tasks conditioned on visible image patches. Thus, during inference, we can adopt a pair of input and output images from the same task as the input condition, to indicate which task to perform. Without bells and whistles, our generalist Painter can achieve competitive performance compared to well-established task-specific models, on seven representative vision tasks ranging from high-level visual understanding to low-level image processing. In addition, Painter significantly outperforms recent generalist models on several challenging tasks.

翻訳日:2023-03-27 18:02:38 公開日:2023-03-24

# 単一カメラからのシーン認識型3次元マルチヒューマンモーションキャプチャ

Scene-Aware 3D Multi-Human Motion Capture from a Single Camera ( http://arxiv.org/abs/2301.05175v2 )

ライセンス: Link先を確認

Diogo Luvizon, Marc Habermann, Vladislav Golyanik, Adam Kortylewski, Christian Theobalt

(参考訳) 本研究では,静的カメラで記録された1枚のRGBビデオから,シーン内の複数の人間の3次元位置を推定する問題と,その身体形状と調音性について考察する。高価なマーカーベースやマルチビューシステムとは対照的に、当社の軽量なセットアップは、インストールが容易で専門家の知識を必要としない安価な3dモーションキャプチャを可能にするため、プライベートユーザにとって理想的です。この困難な状況に対処するため,我々は,2次元身体関節,関節角度,正規化格差マップ,ヒトセグメンテーションマスクなど,様々な形態の大規模事前学習モデルを用いて,コンピュータビジョンの最近の進歩を活用している。そこで,本稿では,人間の絶対3次元位置,関節的なポーズ,個々の形状,シーンのスケールについて共同で解く,非線形最適化に基づく最初のアプローチを提案する。特に, 2次元身体関節と関節角度を用いた正規化不等式予測から, シーンの奥行きと人別尺度を推定した。フレームあたりのシーン深度を考慮し、3次元空間の静的シーンの点雲を再構成する。最後に、人間のフレーム当たりの3D推定値とシーンポイントクラウドを考慮し、時間的、空間的、物理的妥当性を確保するために、ビデオ上で時空間コヒーレントな最適化を行う。本手法は,従来手法を一貫して上回る多人数3次元ポーズベンチマークを用いて評価し,異なる大きさの人物による挑戦シーンを含む実環境条件にロバストな手法であることを定性的に証明した。

In this work, we consider the problem of estimating the 3D position of multiple humans in a scene as well as their body shape and articulation from a single RGB video recorded with a static camera. In contrast to expensive marker-based or multi-view systems, our lightweight setup is ideal for private users as it enables an affordable 3D motion capture that is easy to install and does not require expert knowledge. To deal with this challenging setting, we leverage recent advances in computer vision using large-scale pre-trained models for a variety of modalities, including 2D body joints, joint angles, normalized disparity maps, and human segmentation masks. Thus, we introduce the first non-linear optimization-based approach that jointly solves for the absolute 3D position of each human, their articulated pose, their individual shapes as well as the scale of the scene. In particular, we estimate the scene depth and person unique scale from normalized disparity predictions using the 2D body joints and joint angles. Given the per-frame scene depth, we reconstruct a point-cloud of the static scene in 3D space. Finally, given the per-frame 3D estimates of the humans and scene point-cloud, we perform a space-time coherent optimization over the video to ensure temporal, spatial and physical plausibility. We evaluate our method on established multi-person 3D human pose benchmarks where we consistently outperform previous methods and we qualitatively demonstrate that our method is robust to in-the-wild conditions including challenging scenes with people of different sizes.

翻訳日:2023-03-27 17:56:06 公開日:2023-03-24

# frustumformer:マルチビュー3d検出のための適応型インスタンスアウェアリサンプリング

FrustumFormer: Adaptive Instance-aware Resampling for Multi-view 3D Detection ( http://arxiv.org/abs/2301.04467v2 )

ライセンス: Link先を確認

Yuqi Wang, Yuntao Chen, and Zhaoxiang Zhang

(参考訳) 2次元視点空間から3次元空間への特徴の変換は、多視点3次元オブジェクト検出に不可欠である。近年のアプローチでは、視界を3D空間に引き上げる画素ワイジングや、3DプロジェクションによってBEV機能をグリッドワイジングで構築し、すべてのピクセルやグリッドを等しく扱うという視点変換の設計に重点を置いている。しかし、トランスフォーメーションの選択も重要だが、これまで議論されることはめったにない。動く車のピクセルは、空のピクセルよりも情報的です。画像に含まれる情報を十分に活用するためには、ビュー変換はその内容に応じて異なる画像領域に適応できる必要がある。本稿では,アダプティブ・インスタンス・アウェア・リサンプリング(adaptive instance-aware resampling)によってインスタンス領域の機能にさらに注目する,frustumformerという新しいフレームワークを提案する。具体的には、画像ビューオブジェクトの提案を利用して、鳥の視線上のインスタンスフラストレーションを取得する。インスタンスの場所を洗練するために、インスタンスフラスタム内のアダプティブ占有マスクが学習される。さらに、時間的フラストタル交叉は、物体の局在不確実性をさらに減少させる可能性がある。 nuScenesデータセットに関する総合的な実験はFrustumFormerの有効性を示し、ベンチマークで新しい最先端性能を実現する。コードとモデルはhttps://github.com/Robertwyq/Frustum.comで公開される。

The transformation of features from 2D perspective space to 3D space is essential to multi-view 3D object detection. Recent approaches mainly focus on the design of view transformation, either lifting perspective-view features pixel-wise into 3D space with estimated depth or constructing BEV features grid-wise via 3D projection, treating all pixels or grids equally. However, choosing what to transform is also important but has rarely been discussed before. The pixels of a moving car are more informative than the pixels of the sky. To fully utilize the information contained in images, the view transformation should be able to adapt to different image regions according to their contents. In this paper, we propose a novel framework named FrustumFormer, which pays more attention to the features in instance regions via adaptive instance-aware resampling. Specifically, the model obtains instance frustums on the bird's eye view by leveraging image view object proposals. An adaptive occupancy mask within the instance frustum is learned to refine the instance location. Moreover, the temporal frustum intersection could further reduce the localization uncertainty of objects. Comprehensive experiments on the nuScenes dataset demonstrate the effectiveness of FrustumFormer, and we achieve a new state-of-the-art performance on the benchmark. Codes and models will be made available at https://github.com/Robertwyq/Frustum.

翻訳日:2023-03-27 17:55:41 公開日:2023-03-24

# 二元性ニューロン活性化パターンを用いた分布外サンプルの検出

Detection of out-of-distribution samples using binary neuron activation patterns ( http://arxiv.org/abs/2212.14268v2 )

ライセンス: Link先を確認

Bartlomiej Olber, Krystian Radlak, Adam Popowicz, Michal Szczepankiewicz, Krystian Chachuła

(参考訳) ディープニューラルネットワーク(DNN)は、様々なアプリケーションで優れた性能を発揮する。研究コミュニティの多くの努力にもかかわらず、アウト・オブ・ディストリビューション(OOD)サンプルはDNN分類器の重要な制限として残っている。未発見の入力を新規に識別する能力は、自動運転車、無人航空機、ロボットといった安全上重要な応用において不可欠である。 OODサンプルを検出するための既存のアプローチでは、DNNをブラックボックスとして扱い、出力予測の信頼性スコアを評価する。残念ながら、DNNはOOD入力に対する信頼を減らすために訓練されていないため、この方法は頻繁に失敗する。本研究では,OOD検出のための新しい手法を提案する。この手法は、ReLUアーキテクチャにおけるニューロン活性化パターン(NAP)の理論解析によって動機づけられる。提案手法では,畳み込み層から抽出した活性化パターンのバイナリ表現による計算オーバーヘッドが高まることはない。広範な実証評価により、様々なDNNアーキテクチャと7つの画像データセットの性能が証明された。

Deep neural networks (DNNs) have outstanding performance in various applications. Despite numerous efforts by the research community, out-of-distribution (OOD) samples remain a significant limitation of DNN classifiers. The ability to identify previously unseen inputs as novel is crucial in safety-critical applications such as self-driving cars, unmanned aerial vehicles, and robots. Existing approaches to detecting OOD samples treat a DNN as a black box and evaluate the confidence score of the output predictions. Unfortunately, this method frequently fails, because DNNs are not trained to reduce their confidence for OOD inputs. In this work, we introduce a novel method for OOD detection. Our method is motivated by theoretical analysis of neuron activation patterns (NAP) in ReLU-based architectures. The proposed method does not introduce a high computational overhead due to the binary representation of the activation patterns extracted from convolutional layers. An extensive empirical evaluation demonstrates its high performance on various DNN architectures and seven image datasets.

翻訳日:2023-03-27 17:55:19 公開日:2023-03-24
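
The binary activation-pattern idea in the abstract above can be sketched as follows: each ReLU activation is reduced to one bit (active or inactive), and a new input is scored by how far its bit pattern lies from patterns recorded on training data. Scoring by minimum Hamming distance to stored patterns is an assumption made for this sketch, as are all function names and the toy activation values; the abstract does not specify the exact scoring rule.

```python
def binarize(activations):
    """Pack post-ReLU activations into an int bitmask: bit i is 1 if unit i fired."""
    bits = 0
    for i, a in enumerate(activations):
        if a > 0:
            bits |= 1 << i
    return bits


def hamming(a, b):
    """Number of units whose on/off state differs between two patterns."""
    return bin(a ^ b).count("1")


def ood_score(pattern, known_patterns):
    """Distance to the closest stored training pattern (higher = more novel)."""
    return min(hamming(pattern, k) for k in known_patterns)


# Hypothetical activations of a 4-unit layer for two training samples.
known = [binarize([1.0, 0.0, 2.0, 0.0]), binarize([0.0, 3.0, 0.0, 1.0])]

seen_score = ood_score(binarize([0.7, 0.0, 0.1, 0.0]), known)   # same on/off pattern
novel_score = ood_score(binarize([1.0, 1.0, 1.0, 1.0]), known)  # unseen pattern
```

Storing one bit per unit and comparing patterns with XOR and a bit count is cheap, which is consistent with the low overhead the abstract attributes to the binary representation.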

# 作業前を対象とする統一オブジェクトカウントネットワーク

A Unified Object Counting Network with Object Occupation Prior ( http://arxiv.org/abs/2212.14193v2 )

ライセンス: Link先を確認

Shengqin Jiang, Qing Wang, Fengna Cheng, Yuankai Qi, Qingshan Liu

(参考訳) 多数のアプリケーション(例えば、群衆数、トラフィック統計)で基本的な役割を果たすカウントタスクは、さまざまな密度を持つオブジェクトの数を予測することを目的としている。既存のオブジェクトカウントタスクは単一のオブジェクトクラスのために設計されます。しかし、私たちの現実世界で新しいクラスで新しいデータに遭遇するのは避けられない。このシナリオを \textit{evolving object counting} と命名します。本稿では,最初の進化するオブジェクト計数データセットを構築し,この課題に対する最初の試みとして統一オブジェクト計数ネットワークを提案する。提案モデルは,クラスに依存しないマスクモジュールとクラスインクリメンタルモジュールの2つの重要なコンポーネントから構成される。クラス非依存マスクモジュールは、クラス非依存なバイナリマスクを予測して、汎用オブジェクトの占有を事前に学習する(例えば、1は、画像中の考慮位置にあるオブジェクトが存在し、それ以外は0であることを示す)。 class-incrementalモジュールは新しい来るべきクラスを扱うために使われ、密度マップ予測のための判別クラスガイダンスを提供する。クラス非依存マスクモジュールと画像特徴抽出器の組合せ出力を用いて最終密度マップを予測する。新しいクラスが来たら、まずクラスインクリメンタルモジュールの最後の回帰層と分類層に新しいニューラルネットワークを追加します。そして、モデルをスクラッチから再トレーニングするのではなく、モデルが以前のオブジェクトクラスについて既に学んだことを思い出すのに役立つ知識蒸留を利用する。また、各クラスの典型的なトレーニングサンプルを少数のサポートサンプルバンクに格納することで、モデルが古いデータのキー情報を忘れないようにしています。この設計により,大規模再トレーニングを行わずに,既存のデータのパフォーマンスを維持しつつ,新しいクラスに効率的に適応することができる。収集したデータセットに関する広範な実験は、優れたパフォーマンスを示している。

The counting task, which plays a fundamental role in numerous applications (e.g., crowd counting, traffic statistics), aims to predict the number of objects with various densities. Existing object counting tasks are designed for a single object class. However, in the real world it is inevitable to encounter new data with new classes. We name this scenario "evolving object counting". In this paper, we build the first evolving object counting dataset and propose a unified object counting network as the first attempt to address this task. The proposed model consists of two key components: a class-agnostic mask module and a class-incremental module. The class-agnostic mask module learns a generic object occupation prior by predicting a class-agnostic binary mask (e.g., 1 denotes that an object exists at the considered position in an image, and 0 otherwise). The class-incremental module is used to handle newly arriving classes and provides discriminative class guidance for density map prediction. The combined outputs of the class-agnostic mask module and the image feature extractor are used to predict the final density map. When new classes arrive, we first add new neural nodes to the last regression and classification layers of the class-incremental module. Then, instead of retraining the model from scratch, we utilize knowledge distillation to help the model remember what it has already learned about previous object classes. We also employ a support sample bank to store a small number of typical training samples of each class, which are used to prevent the model from forgetting key information of old data. With this design, our model can efficiently and effectively adapt to newly arriving classes while keeping good performance on already seen data without large-scale retraining. Extensive experiments on the collected dataset demonstrate favorable performance.

翻訳日:2023-03-27 17:55:06 公開日:2023-03-24

# スケーラブルな物理的一貫性のあるニューラルネットワークに向けて:データ駆動型マルチゾーンサーマルビルディングモデルへの応用

Towards Scalable Physically Consistent Neural Networks: an Application to Data-driven Multi-zone Thermal Building Models ( http://arxiv.org/abs/2212.12380v2 )

ライセンス: Link先を確認

Loris Di Natale, Bratislav Svetozarevic, Philipp Heer, and Colin Neil Jones

(参考訳) 収集されるデータが増えるにつれて、データ駆動モデリングの手法が近年人気が高まっている。物理的に健全であるが、古典的なグレーボックスモデルはしばしば識別とスケールが困難であり、その正確さは表現力の制限によって妨げられる可能性がある。一方で、現在ではニューラルネットワーク(nns)に依存する古典的なブラックボックス法は、データから統計的パターンを導出することで、大規模でも印象的なパフォーマンスを達成していることが多い。しかし、それらは基礎となる物理法則に完全に従わないままであり、現実世界の物理システムに対する決定がそれらに基づく場合、破滅的な失敗につながる可能性がある。物理的に一貫性のあるニューラルネットワーク(PCNN)は最近、前述の問題に対処するために開発された。そこで本研究では,PCNNを用いて建築温度動態をモデル化し,従来のグレーボックス法とブラックボックス法とを徹底的に比較する。より正確には、3つの異なるpcnn拡張を設計し、アーキテクチャのモジュラリティと柔軟性を例示し、その物理的一貫性を正式に証明します。実例では,PCNNは最先端の精度を達成でき,制約構造にもかかわらず従来のNNモデルよりも優れていた。さらに、我々の調査は、完全に物理に依存しないまま、NNが優れたパフォーマンスを達成していることを示す明確なイラストを提供している。この性能は計算複雑性のコストがかかるが、pcnnは他の物理的に一貫性のある手法と比較して17-35%の精度向上を示し、最先端の性能を持つスケーラブルな物理的一貫性モデルへの道を開く。

With more and more data being collected, data-driven modeling methods have been gaining in popularity in recent years. While physically sound, classical gray-box models are often cumbersome to identify and scale, and their accuracy might be hindered by their limited expressiveness. On the other hand, classical black-box methods, typically relying on Neural Networks (NNs) nowadays, often achieve impressive performance, even at scale, by deriving statistical patterns from data. However, they remain completely oblivious to the underlying physical laws, which may lead to potentially catastrophic failures if decisions for real-world physical systems are based on them. Physically Consistent Neural Networks (PCNNs) were recently developed to address the aforementioned issues, ensuring physical consistency while still leveraging NNs to attain state-of-the-art accuracy. In this work, we scale PCNNs to model building temperature dynamics and provide a thorough comparison with classical gray-box and black-box methods. More precisely, we design three distinct PCNN extensions, thereby exemplifying the modularity and flexibility of the architecture, and formally prove their physical consistency. In the presented case study, PCNNs are shown to achieve state-of-the-art accuracy, even outperforming classical NN-based models despite their constrained structure. Our investigations furthermore provide a clear illustration of NNs achieving seemingly good performance while remaining completely physics-agnostic, which can be misleading in practice. While this performance comes at the cost of computational complexity, PCNNs show accuracy improvements of 17-35% compared to all other physically consistent methods, paving the way for scalable physically consistent models with state-of-the-art performance.

翻訳日:2023-03-27 17:54:38 公開日:2023-03-24

# 波動関数密度勾配に伴う粒子運動

Particle motion associated with wave function density gradients ( http://arxiv.org/abs/2212.11575v3 )

ライセンス: Link先を確認

Jan Klaers, Violetta Sharoglazova, Chris Toebes

(参考訳) 2つの結合導波管電位の系における大粒子の量子力学的運動について検討し、導波管間の集団移動が時計として効果的に働き、粒子速度を決定できることを示す。反射ステップポテンシャルにおけるエバネッセント現象へのこのスキームの適用は、古典的に禁止された運動に対するエネルギー-速度関係を明らかにする。獲得と損失の領域は、想像上のポテンシャルによって説明され、粒子の運動を加速させる。量子力学的波動関数の位相および密度勾配は粒子の速度を示すのに相補的な役割を果たす。

We study the quantum mechanical motion of massive particles in a system of two coupled waveguide potentials, where the population transfer between the waveguides effectively acts as a clock and allows particle velocities to be determined. Application of this scheme to evanescent phenomena at a reflective step potential reveals an energy-velocity relationship for classically forbidden motion. Regions of gain and loss, as described by imaginary potentials, are shown to speed up the motion of particles. We argue that phase and density gradients in quantum mechanical wave functions play complementary roles in indicating the speed of particles.
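
As a rough illustration of how population transfer can act as a clock, consider a textbook two-mode model. This is a sketch under stated assumptions, not the authors' derivation: the amplitudes $c_1, c_2$, the constant coupling energy $J$, and the initial condition (all population in waveguide 1) are introduced here for illustration only.

```latex
i\hbar \frac{d}{dt}
\begin{pmatrix} c_1 \\ c_2 \end{pmatrix}
=
\begin{pmatrix} 0 & J \\ J & 0 \end{pmatrix}
\begin{pmatrix} c_1 \\ c_2 \end{pmatrix},
\qquad c_1(0) = 1,\; c_2(0) = 0
\;\Longrightarrow\;
|c_2(t)|^2 = \sin^2\!\left(\frac{J t}{\hbar}\right)

% Inverting on the first quarter period, 0 \le J\tau/\hbar \le \pi/2:
\tau = \frac{\hbar}{J} \arcsin\sqrt{|c_2(\tau)|^2},
\qquad
v = \frac{\Delta x}{\tau}
```

In this simplified picture, measuring the transferred population $|c_2|^2$ after the particle has travelled a distance $\Delta x$ gives the elapsed time $\tau$ and hence a speed $v$.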

翻訳日:2023-03-27 17:54:06 公開日:2023-03-24

# layoutdetr: detection transformerは優れたマルチモーダルレイアウトデザイナである

LayoutDETR: Detection Transformer Is a Good Multimodal Layout Designer ( http://arxiv.org/abs/2212.09877v3 )

ライセンス: Link先を確認

Ning Yu, Chia-Chih Chen, Zeyuan Chen, Rui Meng, Gang Wu, Paul Josel, Juan Carlos Niebles, Caiming Xiong, Ran Xu

(参考訳) グラフィックレイアウト設計は視覚コミュニケーションにおいて重要な役割を果たす。しかし、手作りのレイアウト設計は、スキル要求、時間消費、バッチ生産への非スカラブルである。生成モデルは、デザインの自動化をスケーラブルにするために出現するが、デザイナーのマルチモーダルな願望、すなわち背景画像によって制約され、前景コンテンツによって駆動されるデザインを作成することは、いまだに自明ではない。本研究では,生成モデルから高品質かつ現実性を継承するLayoutDETRを提案するとともに,コンテンツ認識要求を検出問題として再定義し,背景画像から適切な位置,スケール,空間的関係をレイアウトで検出する。当社のソリューションは、パブリックベンチマークと新たに調達したad bannerデータセットで、レイアウト生成のための新たな最先端のパフォーマンスを設定します。ユーザの学習を促進するグラフィカルなシステムにソリューションを統合することで,ユーザがベースラインよりもデザインを好むことを示す。私たちのコード、モデル、データセット、グラフィカルシステム、デモはhttps://github.com/salesforce/LayoutDETRで公開されています。

Graphic layout designs play an essential role in visual communication. Yet handcrafting layout designs is skill-demanding, time-consuming, and non-scalable to batch production. Generative models emerge to make design automation scalable but it remains non-trivial to produce designs that comply with designers' multimodal desires, i.e., constrained by background images and driven by foreground content. We propose LayoutDETR that inherits the high quality and realism from generative modeling, while reformulating content-aware requirements as a detection problem: we learn to detect in a background image the reasonable locations, scales, and spatial relations for multimodal foreground elements in a layout. Our solution sets a new state-of-the-art performance for layout generation on public benchmarks and on our newly-curated ad banner dataset. We integrate our solution into a graphical system that facilitates user studies, and show that users prefer our designs over baselines by significant margins. Our code, models, dataset, graphical system, and demos are available at https://github.com/salesforce/LayoutDETR.

翻訳日:2023-03-27 17:53:32 公開日:2023-03-24

# MM拡散:共同音声・ビデオ生成のための多モード拡散モデル学習

MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation ( http://arxiv.org/abs/2212.09478v2 )

ライセンス: Link先を確認

Ludan Ruan, Yiyang Ma, Huan Yang, Huiguo He, Bei Liu, Jianlong Fu, Nicholas Jing Yuan, Qin Jin, Baining Guo

(参考訳) 本稿では,高品質でリアルなビデオに向けて,魅力的な視聴体験と聴取体験を同時にもたらす,初の共同音声ビデオ生成フレームワークを提案する。音声とビデオのペアを共同で生成するために,2つの結合されたデノイジングオートエンコーダを用いたマルチモーダル拡散モデル(MM-Diffusion)を提案する。既存の単一モード拡散モデルとは対照的に、MM-Diffusionは共同デノイジングプロセスのために設計された逐次多モードU-Netで構成されている。音声とビデオの2つのサブネットは、ガウス雑音から徐々にアライメントされたオーディオビデオペアを生成することを学習する。モダリティ間の意味的一貫性を確保するために,2つのサブネットを橋渡しするランダムシフトに基づくアテンションブロックを提案する。これにより効率的なクロスモーダルアライメントが可能となり,音声と映像の忠実度を相互に高める。広範な実験は、無条件のオーディオビデオ生成やゼロショット条件タスク(例えば、ビデオからオーディオ)において優れた結果を示す。特にLandscapeとAIST++のダンスデータセットで最高のFVDとFADを実現する。 10k票のチューリングテストは、我々のモデルに対する支配的な選好を示す。コードと事前訓練されたモデルは https://github.com/researchmm/MM-Diffusion でダウンロードできる。

We propose the first joint audio-video generation framework that brings engaging watching and listening experiences simultaneously, towards high-quality realistic videos. To generate joint audio-video pairs, we propose a novel Multi-Modal Diffusion model (i.e., MM-Diffusion), with two-coupled denoising autoencoders. In contrast to existing single-modal diffusion models, MM-Diffusion consists of a sequential multi-modal U-Net for a joint denoising process by design. Two subnets for audio and video learn to gradually generate aligned audio-video pairs from Gaussian noises. To ensure semantic consistency across modalities, we propose a novel random-shift based attention block bridging over the two subnets, which enables efficient cross-modal alignment, and thus reinforces the audio-video fidelity for each other. Extensive experiments show superior results in unconditional audio-video generation, and zero-shot conditional tasks (e.g., video-to-audio). In particular, we achieve the best FVD and FAD on Landscape and AIST++ dancing datasets. Turing tests of 10k votes further demonstrate dominant preferences for our model. The code and pre-trained models can be downloaded at https://github.com/researchmm/MM-Diffusion.

翻訳日:2023-03-27 17:53:14 公開日:2023-03-24

# 大規模言語モデルにおける創発的類推

Emergent Analogical Reasoning in Large Language Models ( http://arxiv.org/abs/2212.09196v2 )

ライセンス: Link先を確認

Taylor Webb, Keith J. Holyoak, Hongjing Lu

(参考訳) 近年の大規模言語モデルの出現は、十分な訓練データを得た一般的なモデルに人間の認知能力が出現するかどうかという議論を再燃させた。特に興味深いのは、これらのモデルが直接訓練することなく、ゼロショットで新しい問題を推論する能力である。人間の認知では、この能力は類推による推論能力と密接に結びついている。本稿では,ラヴェンのプログレッシブ・マトリクスをモデルとした新しいテキストベースマトリクス推論タスクを含む,様々な類推的タスクについて,人間推論者と大言語モデル(gpt-3のテキストダヴィンチ-003変種)の直接比較を行った。その結果、GPT-3は、多くの設定において、抽象パターン誘導、マッチング、さらには人間の能力を超える、驚くほど強力な能力を示した。以上の結果から, GPT-3のような大規模言語モデルでは, 幅広い類似問題に対するゼロショット解を求める能力が得られている。

The recent advent of large language models has reinvigorated debate over whether human cognitive capacities might emerge in such generic models given sufficient training data. Of particular interest is the ability of these models to reason about novel problems zero-shot, without any direct training. In human cognition, this capacity is closely tied to an ability to reason by analogy. Here, we performed a direct comparison between human reasoners and a large language model (the text-davinci-003 variant of GPT-3) on a range of analogical tasks, including a novel text-based matrix reasoning task closely modeled on Raven's Progressive Matrices. We found that GPT-3 displayed a surprisingly strong capacity for abstract pattern induction, matching or even surpassing human capabilities in most settings. Our results indicate that large language models such as GPT-3 have acquired an emergent ability to find zero-shot solutions to a broad range of analogy problems.

翻訳日:2023-03-27 17:52:52 公開日:2023-03-24

# User-Centered Design (IX):人工知能時代の"User Experience 3.0"パラダイムフレームワーク

User-Centered Design (IX): A "User Experience 3.0" Paradigm Framework in the Intelligence Era ( http://arxiv.org/abs/2302.06681v6 )

ライセンス: Link先を確認

Wei Xu

(参考訳) 「ユーザ中心設計」のデザイン哲学に基づくユーザエクスペリエンス(UX)の分野は、インテリジェンスの時代に向かっている。それでも、既存のUXパラダイムは主にインテリジェントでないシステムを対象としており、インテリジェントなシステムに対するUXに対する体系的なアプローチが欠けている。 UXの開発を通じて、UXパラダイムは技術横断時代の進化特性を示している。現在、インテリジェンス時代はUXパラダイムに対する新たな要求を提起している。そこで本稿では,インテリジェンス時代の"UX 3.0"パラダイムフレームワークと,それに対応するUX方法論システムを提案する。 "UX 3.0"パラダイムフレームワークには、エコロジーエクスペリエンス、イノベーション対応エクスペリエンス、AI対応エクスペリエンス、ヒューマン-AIインタラクションベースエクスペリエンス、ヒューマン-AIコラボレーションベースのエクスペリエンスメソッドの5つのカテゴリが含まれており、それぞれが対応する複数のUXパラダイム指向を提供する。 "UX 3.0"パラダイムの提案は、既存のUXメソッドの改善を支援し、インテリジェントシステム開発におけるUXの研究と応用に対する方法論的なサポートを提供する。最後に、この論文は"UX 3.0"パラダイムの今後の研究と応用を展望する。

The field of user experience (UX) based on the design philosophy of "user-centered design" is moving towards the intelligence era. Still, the existing UX paradigm mainly aims at non-intelligent systems and lacks a systematic approach to UX for intelligent systems. Throughout the development of UX, the UX paradigm shows the evolution characteristics of the cross-technology era. At present, the intelligence era has put forward new demands on the UX paradigm. For this reason, this paper proposes a "UX 3.0" paradigm framework and the corresponding UX methodology system in the intelligence era. The "UX 3.0" paradigm framework includes five categories of UX methods: ecological experience, innovation-enabled experience, AI-enabled experience, human-AI interaction-based experience, and human-AI collaboration-based experience methods, each providing corresponding multiple UX paradigmatic orientations. The proposal of the "UX 3.0" paradigm helps improve the existing UX methods and provides methodological support for the research and applications of UX in developing intelligent systems. Finally, this paper looks forward to future research and applications of the "UX 3.0" paradigm.

翻訳日:2023-03-27 17:46:58 公開日:2023-03-24

# 多様性が必要である:安定拡散によるモデル非依存なゼロショット分類の改善

Diversity is Definitely Needed: Improving Model-Agnostic Zero-shot Classification via Stable Diffusion ( http://arxiv.org/abs/2302.03298v3 )

ライセンス: Link先を確認

Jordan Shipard, Arnold Wiliem, Kien Nguyen Thanh, Wei Xiang, Clinton Fookes

(参考訳) 本研究では,実画像を用いずに実画像の分類を行うための非特異的分類アーキテクチャ(ダウンストリームモデル)を訓練することを目的とした,モデル非依存ゼロショット分類(ma-zsc)の問題を検討する。近年の研究では、拡散モデルを用いた合成訓練画像の生成は、ma-zscに対処する潜在的な解決策となることが示されている。しかし、現在のこのアプローチの性能は、大規模なビジョン言語モデルによって達成されるものには及ばない。考えられる説明の1つは、合成画像と実画像の間の潜在的な領域ギャップである。我々の研究は、生成したデータセット内の画像の多様性を改善することにより、MA-ZSCの性能を改善することができるという最初の洞察を提供することで、この問題に対する新たな視点を提供する。我々は,事前学習した拡散モデルを用いてテキストから画像への生成プロセスを改良し,多様性を高めることを提案する。提案手法は,CLIPなどの最先端モデルに匹敵する,様々な分類アーキテクチャにおける顕著な改善を示す。 CIFAR10, CIFAR100, EuroSATの衛星画像領域によるゼロショット分類は特に困難である。我々はResNetとViTを含む5つの分類アーキテクチャでアプローチを評価した。本研究は拡散モデルを用いたma-zsc問題の初期知見を提供する。すべてのコードはGitHubで入手できる。

In this work, we investigate the problem of Model-Agnostic Zero-Shot Classification (MA-ZSC), which refers to training non-specific classification architectures (downstream models) to classify real images without using any real images during training. Recent research has demonstrated that generating synthetic training images using diffusion models provides a potential solution to address MA-ZSC. However, the performance of this approach currently falls short of that achieved by large-scale vision-language models. One possible explanation is a potential significant domain gap between synthetic and real images. Our work offers a fresh perspective on the problem by providing initial insights that MA-ZSC performance can be improved by improving the diversity of images in the generated dataset. We propose a set of modifications to the text-to-image generation process using a pre-trained diffusion model to enhance diversity, which we refer to as our bag of tricks. Our approach shows notable improvements in various classification architectures, with results comparable to state-of-the-art models such as CLIP. To validate our approach, we conduct experiments on CIFAR10, CIFAR100, and EuroSAT, which is particularly difficult for zero-shot classification due to its satellite image domain. We evaluate our approach with five classification architectures, including ResNet and ViT. Our findings provide initial insights into the problem of MA-ZSC using diffusion models. All code will be available on GitHub.

翻訳日:2023-03-27 17:46:40 公開日:2023-03-24

# ロバスト多視点三角測量のための半定値緩和

Semidefinite Relaxations for Robust Multiview Triangulation ( http://arxiv.org/abs/2301.11431v2 )

ライセンス: Link先を確認

Linus Härenstam-Nielsen, Niclas Zeller, Daniel Cremers

(参考訳) 本稿では,凸緩和に基づく最適ロバスト多視点三角測量のアプローチを提案する。この目的のために、最小二乗コスト関数を組み込むことで、既存の緩和アプローチを非ロバスト多視点三角測量に拡張する。本稿では,エピポーラ制約に基づく2つの定式化と,分数再投影制約に基づく2つの定式化を提案する。 1つ目は低次元であり、中程度の騒音と降圧レベルの下ではきつく、もう1つ目は高次元であり、したがって遅いが、極端な騒音と降圧レベルでもきつい。提案手法は,大きな雑音と大容量の異常の下でも,証明可能な最適再構成を計算できることを実証する。

We propose an approach based on convex relaxations for certifiably optimal robust multiview triangulation. To this end, we extend existing relaxation approaches to non-robust multiview triangulation by incorporating a least squares cost function. We propose two formulations, one based on epipolar constraints and one based on fractional reprojection constraints. The first is lower dimensional and remains tight under moderate noise and outlier levels, while the second is higher dimensional and therefore slower but remains tight even under extreme noise and outlier levels. We demonstrate through extensive experiments that the proposed approaches allow us to compute provably optimal reconstructions even under significant noise and a large percentage of outliers.

翻訳日:2023-03-27 17:45:25 公開日:2023-03-24

# 音声言語理解におけるフィラー : 計算的・心理言語学的視点

Fillers in Spoken Language Understanding: Computational and Psycholinguistic Perspectives ( http://arxiv.org/abs/2301.10761v4 )

ライセンス: Link先を確認

Tanvi Dinkar, Chloé Clavel, Ioana Vasilescu

(参考訳) 発話の通常の流れにおける中断(disfluencies)は、話し言葉に対してユビキタスである。フィラー("uh", "um")は、他の種類の不均衡と比較して最も頻繁に発生する不規則である。しかし、私たちの知る限りでは、これらのスピーチイベントにおいてSpoken Language Understanding(SLU)に影響を与える研究の視点をまとめるリソースは存在しない。本論文の目的は,基本(心理学)言語理論の考察から,自動音声認識(asr)とsluシステムにおける注釈と考察から,世代的観点からの研究まで,幅広い視点を総合的に調査することである。この記事では、SLUと会話型AIコミュニティにアプローチ可能な方法で視点を提示し、前進、各分野のトレンドと課題を議論することを目的としています。

Disfluencies (i.e., interruptions in the regular flow of speech) are ubiquitous in spoken discourse. Fillers ("uh", "um") are the disfluencies that occur most frequently compared to other kinds of disfluencies. Yet, to the best of our knowledge, there isn't a resource that brings together the research perspectives influencing Spoken Language Understanding (SLU) on these speech events. The aim of this article is to survey a breadth of perspectives in a holistic way, i.e., from considering underlying (psycho)linguistic theory, to their annotation and consideration in Automatic Speech Recognition (ASR) and SLU systems, and, lastly, to their study from a generation standpoint. This article aims to present the perspectives in an approachable way to the SLU and Conversational AI community, and to discuss, moving forward, what we believe are the trends and challenges in each area.

翻訳日:2023-03-27 17:45:14 公開日:2023-03-24

# すべてのドメインに対する1つのモデル:クロスドメインnerのためのコラボレーティブなドメインプリフィックスチューニング

One Model for All Domains: Collaborative Domain-Prefix Tuning for Cross-Domain NER ( http://arxiv.org/abs/2301.10410v3 )

ライセンス: Link先を確認

Xiang Chen, Lei Li, Shuofei Qiao, Ningyu Zhang, Chuanqi Tan, Yong Jiang, Fei Huang, Huajun Chen

(参考訳) クロスドメインNERは、実践シナリオにおける低リソースの問題に対処する上で難しいタスクである。従来の典型的なソリューションは主に、リッチリソースドメインのデータを持つ事前学習言語モデル(PLM)を用いてNERモデルを取得し、ターゲットドメインに適応する。異なるドメインのエンティティタイプ間のミスマッチの問題のため、従来のアプローチは通常、PLMのすべてのパラメータをチューニングし、最終的に各ドメインに対して全く新しいNERモデルになる。さらに、現在のモデルは、複数のソースからターゲットへの知識の転送に失敗しながら、単一のソースドメインにおける知識の活用にのみ焦点を当てている。この問題に対処するために,テキスト対テキスト生成plmに基づくクロスドメインner(cp-ner)のための協調型ドメインプリフィックスチューニングを導入する。具体的には、ドメイン関連インストラクターを対象に、構造変更なしに知識を新しいドメインNERタスクに転送するテキスト・ツー・テキスト生成を提案する。凍結したPLMを利用して協調的なドメイン-プレフィックスチューニングを行い、PLMのポテンシャルを刺激し、NERタスクを様々なドメインで処理する。 Cross-NERベンチマークによる実験結果から,提案手法はフレキシブルトランスファー能力を有し,単一ソースと複数ソースのクロスドメインNERタスクにおいて優れた性能を発揮することが示された。コードはhttps://github.com/zjunlp/DeepKE/tree/main/example/ner/crossで提供される。

Cross-domain NER is a challenging task to address the low-resource problem in practical scenarios. Previous typical solutions mainly obtain a NER model by pre-trained language models (PLMs) with data from a rich-resource domain and adapt it to the target domain. Owing to the mismatch issue among entity types in different domains, previous approaches normally tune all parameters of PLMs, ending up with an entirely new NER model for each domain. Moreover, current models only focus on leveraging knowledge in one general source domain while failing to successfully transfer knowledge from multiple sources to the target. To address these issues, we introduce Collaborative Domain-Prefix Tuning for cross-domain NER (CP-NER) based on text-to-text generative PLMs. Specifically, we present text-to-text generation grounding domain-related instructors to transfer knowledge to new domain NER tasks without structural modifications. We utilize frozen PLMs and conduct collaborative domain-prefix tuning to stimulate the potential of PLMs to handle NER tasks across various domains. Experimental results on the Cross-NER benchmark show that the proposed approach has flexible transfer ability and performs better on both one-source and multiple-source cross-domain NER tasks. Code will be available at https://github.com/zjunlp/DeepKE/tree/main/example/ner/cross.

翻訳日:2023-03-27 17:44:59 公開日:2023-03-24

# 深部測定量子化

Deep Conditional Measure Quantization ( http://arxiv.org/abs/2301.06907v2 )

ライセンス: Link先を確認

Gabriel Turinici

(参考訳) 確率測度の量子化は、それを有限個のディラック質量で表し、入力分布を十分に近似することを意味する(確率測度の幾らかの計量空間において)。様々な方法が存在するが、条件付き法則の定量化の状況は調査されていない。本稿では,深層ニューラルネットワークアーキテクチャと組み合わされたフーバーエネルギーカーネルベースアプローチを用いたdcmqと呼ばれる手法を提案する。この方法はいくつかの例でテストされ、有望な結果が得られる。

Quantization of a probability measure means representing it with a finite set of Dirac masses that approximates the input distribution well enough (in some metric space of probability measures). Various methods exist to do so, but the situation of quantizing a conditional law has been less explored. We propose a method, called DCMQ, involving a Huber-energy kernel-based approach coupled with a deep neural network architecture. The method is tested on several examples and obtains promising results.
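
To make the quantization objective concrete, the following is a minimal sketch of an energy-type squared distance between two weighted sets of Dirac masses on the real line. It is not the DCMQ implementation: the kernel form `sqrt(a^2 + (x - y)^2) - a`, the parameter `a`, and all function names are assumptions made for illustration, and the neural network that would output the quantization points is omitted.

```python
import math


def huber_energy_kernel(x, y, a=1.0):
    """Assumed Huber-energy kernel: smooth near 0, grows like |x - y| far away."""
    return math.sqrt(a * a + (x - y) ** 2) - a


def cross_term(xs, wx, ys, wy, a):
    """Weighted double sum of the kernel over all pairs of Dirac masses."""
    return sum(
        p * q * huber_energy_kernel(x, y, a)
        for x, p in zip(xs, wx)
        for y, q in zip(ys, wy)
    )


def squared_distance(xs, wx, ys, wy, a=1.0):
    """Energy-type squared distance between sum_i wx[i]*delta(xs[i])
    and sum_j wy[j]*delta(ys[j]); both weight lists should sum to 1."""
    return (
        cross_term(xs, wx, ys, wy, a)
        - 0.5 * cross_term(xs, wx, xs, wx, a)
        - 0.5 * cross_term(ys, wy, ys, wy, a)
    )


def uniform_weights(n):
    """Equal weights for n Dirac masses."""
    return [1.0 / n] * n
```

A quantizer would then search for the points `ys` that minimise `squared_distance` to the target sample. For a conditional law, those points would additionally depend on the conditioning variable.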

翻訳日:2023-03-27 17:44:18 公開日:2023-03-24

# セマンティック・トレラント・コントラスト・ロスによる自己監督型イメージ・ツー・ポイント蒸留

Self-Supervised Image-to-Point Distillation via Semantically Tolerant Contrastive Loss ( http://arxiv.org/abs/2301.05709v2 )

ライセンス: Link先を確認

Anas Mahmoud, Jordan S. K. Hu, Tianshu Kuai, Ali Harakeh, Liam Paull, and Steven L. Waslander

(参考訳) 知覚タスクの3D表現を学習するための効果的なフレームワークは、コントラスト学習を通じて、リッチな自己教師付き画像特徴を抽出することである。しかし、自律運転データセットのイメージ・ツー・ポイント表現学習は2つの大きな課題に直面している。 1) 自己相似性の豊富さは、意味的に類似した点や画像領域を押し出し、学習した表現の局所的な意味構造を乱す、対照的な損失をもたらす。 2)プリトレーニングとしての厳しいクラス不均衡は,過度に表現されたクラスに支配される。本稿では,画像領域と画像領域の対比を最小化するために,正と負の領域間の意味距離を考慮した,新しい意味論的に寛容な画像対点コントラスト損失法を提案する。さらに,クラス不均衡度を,集合的なサンプルとサンプル間のセマンティック類似度によって近似するクラス非均衡損失を設計することで,クラス不均衡に対処する。クラスバランスによるセマンティック・トレラントなコントラスト損失は,3次元セマンティックセグメンテーションのすべての評価設定において,最先端の2D-to-3D表現学習を改善することを示す。提案手法は,最先端の2D-to-3D表現学習フレームワークを多種多様な自己教師付き事前学習モデルで一貫した性能を発揮する。

An effective framework for learning 3D representations for perception tasks is distilling rich self-supervised image features via contrastive learning. However, image-to-point representation learning for autonomous driving datasets faces two main challenges: 1) the abundance of self-similarity, which results in the contrastive losses pushing away semantically similar point and image regions and thus disturbing the local semantic structure of the learned representations, and 2) severe class imbalance as pretraining gets dominated by over-represented classes. We propose to alleviate the self-similarity problem through a novel semantically tolerant image-to-point contrastive loss that takes into consideration the semantic distance between positive and negative image regions to minimize contrasting semantically similar point and image regions. Additionally, we address class imbalance by designing a class-agnostic balanced loss that approximates the degree of class imbalance through an aggregate sample-to-samples semantic similarity measure. We demonstrate that our semantically tolerant contrastive loss with class balancing improves state-of-the-art 2D-to-3D representation learning in all evaluation settings on 3D semantic segmentation. Our method consistently outperforms state-of-the-art 2D-to-3D representation learning frameworks across a wide range of 2D self-supervised pretrained models.

翻訳日:2023-03-27 17:44:10 公開日:2023-03-24

# 音声アシスタントの親制御に向けて

Towards Usable Parental Control for Voice Assistants ( http://arxiv.org/abs/2303.04957v2 )

ライセンス: Link先を確認

Peiyi Yang, Jie Fan, Zice Wei, Haoqian Li, Tu Le, and Yuan Tian

(参考訳) ボイスパーソナルアシスタント(VPA)は一般的な家電製品となっている。 VPA技術の主要なプラットフォームのひとつとして、AmazonはAlexaを開発し、子供向けのAmazon Kidsを設計し、VPAの豊富な機能を安全に享受し、親がペアレントダッシュボードを通じて子供の活動を監視するようにした。このエコシステムは存在するが、親ダッシュボードの利用は親にはまだ普及していない。本稿では,親による調査を行い,親のコントロール機能について,親の好みや嫌いについて調査する。親は、子どもの活動、子どものセキュリティ機能へのアクセスの容易化、ユーザーインターフェースの改善など、より視覚的な情報を必要としている。本調査から得られた知見をもとに,親の期待を鑑み,親のダッシュボードに新たなデザインを提案する。

Voice Personal Assistants (VPA) have become a common household appliance. As one of the leading platforms for VPA technology, Amazon created Alexa and designed Amazon Kids for children to safely enjoy the rich functionalities of VPA and for parents to monitor their kids' activities through the Parent Dashboard. Although this ecosystem is in place, the usage of Parent Dashboard is not yet popularized among parents. In this paper, we conduct a parent survey to find out what they like and dislike about the current parental control features. We find that parents need more visuals about their children's activity, easier access to security features for their children, and a better user interface. Based on the insights from our survey, we present a new design for the Parent Dashboard considering the parents' expectations.

翻訳日:2023-03-27 17:37:59 公開日:2023-03-24

# HairStep:シングルビュー3次元ヘアモデリングのためのストランドマップと深さマップを用いた実写合成

HairStep: Transfer Synthetic to Real Using Strand and Depth Maps for Single-View 3D Hair Modeling ( http://arxiv.org/abs/2303.02700v2 )

ライセンス: Link先を確認

Yujian Zheng, Zirong Jin, Moran Li, Haibin Huang, Chongyang Ma, Shuguang Cui, Xiaoguang Han

(参考訳) 本研究では,学習型単一視点3Dヘアモデリングの課題に対処する。実画像と3Dヘアデータを集めることの難しさから, 合成データを用いて, 実領域の事前知識を提供する手法が主流となっている。残念ながら、これはドメインギャップの課題をもたらします。現実的なヘアレンダリングが本質的に困難であるため、既存の手法では、ギャップを埋める入力としてヘアイメージの代わりに方向マップを使用するのが一般的である。中間表現は不可欠であると考えるが、支配的なフィルタリングに基づく手法を用いた方向マップは不確定なノイズに敏感であり、有能な表現とは程遠い。そこで本研究では,まずこの問題を提起し,ストランドマップと深さマップからなるヘアステップと呼ばれる新しい中間表現を提案する。 HairStepは正確な3Dヘアモデリングに十分な情報を提供するだけでなく、実際の画像から推測できる。具体的には、2種類のアノテーションで1,250枚の肖像画画像のデータセットを収集する。さらに学習フレームワークは、実際の画像をストランドマップと深さマップに転送するように設計されている。新たなデータセットの付加的なボーナスが3Dヘアモデリングの最初の定量的指標であることに注意が必要だ。実験の結果, ヘアステップは合成とリアルの領域ギャップを狭くし, 単視点3dヘアリコンストラクションの最先端性能を実現することがわかった。

In this work, we tackle the challenging problem of learning-based single-view 3D hair modeling. Because paired real-image and 3D hair data are very difficult to collect, using synthetic data to provide prior knowledge for the real domain has become a leading solution. This unfortunately introduces the challenge of a domain gap. Due to the inherent difficulty of realistic hair rendering, existing methods typically use orientation maps instead of hair images as input to bridge the gap. We firmly believe an intermediate representation is essential, but we argue that orientation maps produced by the dominant filtering-based methods are sensitive to uncertain noise and far from a competent representation. Thus, we first raise this issue and propose a novel intermediate representation, termed HairStep, which consists of a strand map and a depth map. We find that HairStep not only provides sufficient information for accurate 3D hair modeling but can also feasibly be inferred from real images. Specifically, we collect a dataset of 1,250 portrait images with two types of annotations. A learning framework is further designed to transfer real images to the strand map and depth map. Notably, an extra bonus of our new dataset is the first quantitative metric for 3D hair modeling. Our experiments show that HairStep narrows the domain gap between synthetic and real data and achieves state-of-the-art performance on single-view 3D hair reconstruction.

翻訳日:2023-03-27 17:37:46 公開日:2023-03-24

# PixMIM:マスク画像モデリングにおけるピクセル再構成の再考

PixMIM: Rethinking Pixel Reconstruction in Masked Image Modeling ( http://arxiv.org/abs/2303.02416v2 )

ライセンス: Link先を確認

Yuan Liu, Songyang Zhang, Jiacheng Chen, Kai Chen, Dahua Lin

(参考訳) Masked Image Modeling (MIM) は Masked Autoencoders (MAE) と BEiT の出現によって有望な進歩を遂げた。しかし、その後の作業は、新しい補助タスクや追加の事前訓練済みモデルでフレームワークを複雑化し、必然的に計算オーバーヘッドを増加させた。本稿では、画素再構成の観点からMIMの基本的な解析を行い、入力画像パッチと再構成ターゲットを調べ、これまで見過ごされてきた2つの重要なボトルネックを強調する。この分析に基づいて, 2つの戦略を包含する非常に単純で効果的な方法, PixMIM を提案する。 1) 再構成対象から高周波成分をフィルタリングし、テクスチャに富む詳細へのネットワークの焦点を弱める。 2)MIMトレーニングにおける前景不足の問題を軽減するため,保守的なデータ変換戦略を採用する。 PixMIM は、既存のほとんどのピクセルベースのMIMアプローチ（すなわち、生の画像を再構成対象とする手法）に、無視できる追加計算で簡単に統合できる。特別な工夫を加えなくても,提案手法は様々な下流タスクにおいて,MAE,ConvMAE,LSMAEの3つのMIMアプローチを一貫して改善する。我々は,この効果的なプラグアンドプレイ方式が,自己教師付き学習の強力なベースラインとなり,MIMフレームワークの今後の改良に対する洞察を提供すると考えている。コードとモデルは https://github.com/open-mmlab/mmselfsup/tree/dev-1.x/configs/selfsup/pixmim で利用可能である。

Masked Image Modeling (MIM) has achieved promising progress with the advent of Masked Autoencoders (MAE) and BEiT. However, subsequent works have complicated the framework with new auxiliary tasks or extra pre-trained models, inevitably increasing computational overhead. This paper undertakes a fundamental analysis of MIM from the perspective of pixel reconstruction, which examines the input image patches and reconstruction target, and highlights two critical but previously overlooked bottlenecks. Based on this analysis, we propose a remarkably simple and effective method, PixMIM, that entails two strategies: 1) filtering the high-frequency components from the reconstruction target to de-emphasize the network's focus on texture-rich details and 2) adopting a conservative data transform strategy to alleviate the problem of missing foreground in MIM training. PixMIM can be easily integrated into most existing pixel-based MIM approaches (i.e., using raw images as reconstruction target) with negligible additional computation. Without bells and whistles, our method consistently improves three MIM approaches, MAE, ConvMAE, and LSMAE, across various downstream tasks. We believe this effective plug-and-play method will serve as a strong baseline for self-supervised learning and provide insights for future improvements of the MIM framework. Code and models are available at https://github.com/open-mmlab/mmselfsup/tree/dev-1.x/configs/selfsup/pixmim.
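
The first strategy, removing high-frequency components from the reconstruction target, can be sketched with an ideal low-pass filter in the Fourier domain. This is an illustrative sketch only and is not taken from the released code: the function names, the circular cutoff, and the naive pure-Python DFT (suitable only for tiny square images) are all assumptions.

```python
import cmath


def dft2(img):
    """Naive 2D DFT of a square list-of-lists image; returns spec[u][v]."""
    n = len(img)
    return [
        [
            sum(
                img[y][x] * cmath.exp(-2j * cmath.pi * (u * y + v * x) / n)
                for y in range(n)
                for x in range(n)
            )
            for v in range(n)
        ]
        for u in range(n)
    ]


def idft2(spec):
    """Inverse of dft2; returns the real part as out[y][x]."""
    n = len(spec)
    return [
        [
            (
                sum(
                    spec[u][v] * cmath.exp(2j * cmath.pi * (u * y + v * x) / n)
                    for u in range(n)
                    for v in range(n)
                )
                / (n * n)
            ).real
            for x in range(n)
        ]
        for y in range(n)
    ]


def low_pass_target(img, cutoff):
    """Zero every frequency whose radius exceeds `cutoff` (hypothetical
    parameter) and return the smoothed image as the reconstruction target."""
    n = len(img)
    spec = dft2(img)
    for u in range(n):
        for v in range(n):
            fu = min(u, n - u)  # wrap indices to signed frequencies
            fv = min(v, n - v)
            if (fu * fu + fv * fv) ** 0.5 > cutoff:
                spec[u][v] = 0
    return idft2(spec)
```

With this sketch, a 4x4 checkerboard filtered at `cutoff=1` collapses to its mean value, while a constant image passes through unchanged. A practical pipeline would use an FFT library and apply the filter per colour channel.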

翻訳日:2023-03-27 17:37:21 公開日:2023-03-24

# オンライン討論におけるヘイト、毒性、過激な集団的モデレーション

Collective moderation of hate, toxicity, and extremity in online discussions ( http://arxiv.org/abs/2303.00357v2 )

ライセンス: Link先を確認

Jana Lasser, Alina Herderich, Joshua Garland, Segun Taofeek Aroyehun, David Garcia, Mirta Galesic

(参考訳) ネット上でのヘイト、毒性、過激主義を市民はどうやって抑えられるのか? 我々は、移民危機と政治的混乱が続く4年間にわたる混乱の中で、ドイツTwitterに関する13万人以上の議論の大規模なコーパスを分析した。人間の注釈、言語モデル、機械学習分類器、および縦断統計分析の助けを借りて、言論の異なる次元のダイナミクスを識別する。単純な意見を表現することは、必ずしも事実によって支持されるのではなく、侮辱なしでも、後続の議論において、憎悪、毒性、言論の過激さに関係している。サルカズムはこれらの成果、特に組織化された極端な集団の存在の達成にも貢献する。事実の提供や矛盾の露呈といったより建設的なコメントは、反発し、より過激さを惹きつける可能性がある。アウトグループまたはイングループへの言及は、一般的に長期的にの談話の悪化と関連している。怒りや恐怖などの否定的な感情のトーンや、熱意やプライドなどの肯定的な感情のトーンも、より悪い結果をもたらす。会話の小さなサンプルをワンショット分析するだけでなく,集合的市民モデレーションによるオンラインコモンズの管理が成功に繋がる可能性が示唆された。

How can citizens moderate hate, toxicity, and extremism in online discourse? We analyze a large corpus of more than 130,000 discussions on German Twitter over the turbulent four years marked by the migrant crisis and political upheavals. With the help of human annotators, language models, machine learning classifiers, and longitudinal statistical analyses, we discern the dynamics of different dimensions of discourse. We find that expressing simple opinions, not necessarily supported by facts but also without insults, relates to the least hate, toxicity, and extremity of speech and speakers in subsequent discussions. Sarcasm also helps in achieving those outcomes, in particular in the presence of organized extreme groups. More constructive comments such as providing facts or exposing contradictions can backfire and attract more extremity. Mentioning either outgroups or ingroups is typically related to a deterioration of discourse in the long run. A pronounced emotional tone, either negative such as anger or fear, or positive such as enthusiasm and pride, also leads to worse outcomes. Going beyond one-shot analyses on smaller samples of discourse, our findings have implications for the successful management of online commons through collective civic moderation.

翻訳日:2023-03-27 17:36:55 公開日:2023-03-24

# 汎用的映像モーメント検索に向けて:画像テキスト事前学習へのビジュアルダイナミックインジェクション

Towards Generalisable Video Moment Retrieval: Visual-Dynamic Injection to Image-Text Pre-Training ( http://arxiv.org/abs/2303.00040v2 )

ライセンス: Link先を確認

Dezhao Luo, Jiabo Huang, Shaogang Gong, Hailin Jin, Yang Liu

(参考訳) 視覚とテキストの相関関係はビデオモーメント検索(VMR)において重要であるが,既存の手法では視覚とテキストの理解のために,個別の事前学習機能抽出器に大きく依存している。十分な時間境界アノテーションがなければ、ユニバーサルなビデオテキストアライメントを学ぶことは簡単ではない。本研究では,大規模画像テキストデータから派生したマルチモーダル相関を探索し,vmrの一般化を容易にする。映像変化のキャプチャにおける画像テキスト事前学習モデルの限界に対処するため,映像モーメントの理解を促進するため,視覚動的インジェクション(vdi)と呼ばれる汎用的な手法を提案する。既存のvmr手法は時相認識ビデオ機能の構築に重点を置いているが、時相変化に関するテキスト記述を認識することも重要であるが、元々は静的画像と文をマッチングして事前学習では見過ごされていた。そこで,映像フレームから映像コンテキストと空間動的情報を抽出し,映像変化を表すフレーズ(例えば動詞)とのアライメントを明示的に強制する。これにより、ビデオ中の可能性のある視覚および動きパターンを対応するテキスト埋め込み(インジェクション)にエンコードし、より正確なビデオテキストアライメントを可能にする。我々は2つのVMRベンチマークデータセット(Charades-STAとActivityNet-Captions)で広範な実験を行い、最先端のパフォーマンスを実現した。特に、VDIは、新規なシーンと語彙を含むテストサンプルが配布外分割でテストされる際、顕著な利点をもたらす。

The correlation between vision and text is essential for video moment retrieval (VMR); however, existing methods rely heavily on separately pre-trained feature extractors for visual and textual understanding. Without sufficient temporal boundary annotations, it is non-trivial to learn universal video-text alignments. In this work, we explore multi-modal correlations derived from large-scale image-text data to facilitate generalisable VMR. To address the limitations of image-text pre-training models in capturing video changes, we propose a generic method, referred to as Visual-Dynamic Injection (VDI), to empower the model's understanding of video moments. Whilst existing VMR methods focus on building temporal-aware video features, being aware of the text descriptions of temporal changes is also critical but was originally overlooked in pre-training by matching static images with sentences. Therefore, we extract visual context and spatial dynamic information from video frames and explicitly enforce their alignment with the phrases describing video changes (e.g., verbs). By doing so, the potentially relevant visual and motion patterns in videos are encoded in the corresponding text embeddings (injected) so as to enable more accurate video-text alignments. We conduct extensive experiments on two VMR benchmark datasets (Charades-STA and ActivityNet-Captions) and achieve state-of-the-art performance. In particular, VDI yields notable advantages when tested on the out-of-distribution splits, where the testing samples involve novel scenes and vocabulary.

翻訳日:2023-03-27 17:36:34 公開日:2023-03-24

# Masked Image Modeling (MIM) を用いたリモートセンシングシーン分類

Remote Sensing Scene Classification with Masked Image Modeling (MIM) ( http://arxiv.org/abs/2302.14256v2 )

ライセンス: Link先を確認

Liya Wang, Alex Tien

(参考訳) リモートセンシングシーンの分類は、地質調査、石油探査、交通管理、地震予知、山火事モニタリング、情報監視において重要な役割を果たしている。過去には、タスクを実行する機械学習(ML)メソッドは、主に教師あり学習(SL)の方法で事前訓練されたバックボーンを使用していた。自己教師付き学習(SSL)技術であるMasked Image Modeling(MIM)が視覚特徴表現学習のより良い方法として示されたため、シーン分類タスクにおけるMLパフォーマンスを改善する新たな機会が提示された。本研究では,merced, aid, nwpu-resisc45, optimal-31の4つの分類データセットにおいて,mim事前学習されたバックボーンの可能性を検討することを目的とした。公開ベンチマークと比較すると,mimプリトレーニング視覚トランスフォーマ(vits)バックボーンは,他の選択肢(トップ1の精度では最大18%)よりも優れており,mimテクニックは教師あり学習よりも優れた特徴表現(トップ1の精度では最大5%)を学習できることが示されている。さらに, 汎用MIM-Pretrained ViTsは, リモートセンシング(TRS)フレームワークとして設計されながら複雑なトランスフォーマーとして, 競争力を発揮することを示す。実験結果は,今後の研究における性能ベースラインも提供する。

Remote sensing scene classification has been extensively studied for its critical roles in geological survey, oil exploration, traffic management, earthquake prediction, wildfire monitoring, and intelligence monitoring. In the past, the Machine Learning (ML) methods for performing the task mainly used backbones pretrained in the manner of supervised learning (SL). As Masked Image Modeling (MIM), a self-supervised learning (SSL) technique, has been shown to be a better way of learning visual feature representations, it presents a new opportunity for improving ML performance on the scene classification task. This research aims to explore the potential of MIM-pretrained backbones on four well-known classification datasets: Merced, AID, NWPU-RESISC45, and Optimal-31. Compared to the published benchmarks, we show that the MIM-pretrained Vision Transformer (ViT) backbones outperform other alternatives (by up to 18% in top-1 accuracy) and that the MIM technique can learn better feature representations than its supervised learning counterparts (by up to 5% in top-1 accuracy). Moreover, we show that the general-purpose MIM-pretrained ViTs can achieve performance competitive with the specially designed yet complicated Transformer for Remote Sensing (TRS) framework. Our experimental results also provide a performance baseline for future studies.

翻訳日:2023-03-27 17:36:06 公開日:2023-03-24

# CBA:物理世界における光学的空中検出に対する文脈的背景攻撃

CBA: Contextual Background Attack against Optical Aerial Detection in the Physical World ( http://arxiv.org/abs/2302.13519v3 )

ライセンス: Link先を確認

Jiawei Lian, Xiaofei Wang, Yuru Su, Mingyang Ma, Shaohui Mei

(参考訳) パッチベースの物理的攻撃はますます懸念を喚起している。しかし、既存の手法のほとんどは地上で捕獲された目標を無視することに焦点を当てており、これらの方法のいくつかは単に空中探知機を欺くために拡張されている。物理的に標的となる物体を精巧な対向パッチで削り、これは空中検出器の予測をわずかに妨げ、攻撃の伝達性が弱いだけである。以上の課題に対処するため,本研究では,空中検出に対する新たな物理的攻撃フレームワークであるコンテキスト背景攻撃(CBA)を提案する。特に、関心の対象、すなわち航空画像における航空機は、敵のパッチをマスキングするために採用されている。マスク領域の外の画素は、生成した対向パッチが検出の重要背景領域を密にカバーするように最適化されており、これは現実世界においてより堅牢で移動可能な攻撃力を持つ対向パッチの贈与に寄与する。攻撃性能をさらに強化するため、敵パッチはトレーニング中に外部目標とされ、検出された対象物(オン・アンド・アウト・パッチ)は攻撃効果の蓄積に寄与する。これにより、高度に設計されたパッチは、対向パッチの上と外の両方のオブジェクトに対して、しっかりとした騙し効果を同時に付与される。大規模にスケールされた実験は、物理的なシナリオにおいて行われ、提案した物理攻撃フレームワークの優位性と可能性を示す。提案手法は,多様な航空検出器と防衛手法の対角的ロバスト性を評価するための指標として期待できる。

Patch-based physical attacks have increasingly aroused concerns. However, most existing methods focus on obscuring targets captured on the ground, and some of these methods are simply extended to deceive aerial detectors. They smear the targeted objects in the physical world with elaborate adversarial patches, which only slightly sway the aerial detectors' predictions and have weak attack transferability. To address the above issues, we propose Contextual Background Attack (CBA), a novel physical attack framework against aerial detection, which can achieve strong attack efficacy and transferability in the physical world even without smudging the objects of interest at all. Specifically, the targets of interest, i.e., the aircraft in aerial images, are adopted to mask adversarial patches. The pixels outside the mask area are optimized so that the generated adversarial patches closely cover the critical contextual background area for detection, which gives the adversarial patches more robust and transferable attack potency in the real world. To further strengthen the attack performance, the adversarial patches are forced to be outside targets during training, so that the detected objects of interest, both on and outside patches, contribute to the accumulation of attack efficacy. Consequently, the carefully designed patches achieve solid fooling efficacy against objects both on and outside the adversarial patches simultaneously. Extensive proportionally scaled experiments are performed in physical scenarios, demonstrating the superiority and potential of the proposed framework for physical attacks. We expect that the proposed physical attack method will serve as a benchmark for assessing the adversarial robustness of diverse aerial detectors and defense methods.

翻訳日:2023-03-27 17:35:41 公開日:2023-03-24

# 位置依存有効質量を持つ半閉じ込め調和振動子モデルのウィグナー関数

The Wigner function of a semiconfined harmonic oscillator model with a position-dependent effective mass ( http://arxiv.org/abs/2302.12673v2 )

ライセンス: Link先を確認

S.M. Nagiyev, A.M. Jafarova and E.I. Jafarov

(参考訳) 我々は、量子調和振動子モデルに対するウィグナー関数の観点から位相空間表現の概念を開発し、その位置によって変化する質量を通して半収束効果を示す。このような半閉じ込め量子系に対するウィグナー分布関数の解析計算に新たな手法を適用した。この方法では、量子分布関数の定義における積分のばらつきを抑えることができ、半収束振動子モデルの定常状態に対する解析式の計算に繋がる。この量子系に対する応用外等質場の存在と欠如の両方のケースについて研究した。得られたウィグナー分布関数の正確な表現は、第一種およびラゲール多項式のベッセル関数を介して表現される。さらに、いくつかの特殊な事例と限界を詳細に論じる。

We develop a phase-space representation concept in terms of the Wigner function for a quantum harmonic oscillator model that exhibits the semiconfinement effect through its mass varying with the position. The new method is applied to the analytical computation of the Wigner distribution function for such a semiconfined quantum system. The method allows for suppression of the divergence of the integrand in the definition of the quantum distribution function and leads to the computation of its analytical expressions for the stationary states of the semiconfined oscillator model. Both the presence and the absence of an applied external homogeneous field are studied for this quantum system. The exact expressions obtained for the Wigner distribution function are given in terms of the Bessel function of the first kind and Laguerre polynomials. Further, some of the special cases and limits are discussed in detail.
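
For orientation, the Wigner function of a pure state $\psi(x)$ is commonly written in the textbook form below. This is the standard definition and normalization, given here as an assumption; it is not necessarily the convention used in the paper, and it does not include the modification the abstract describes for suppressing the divergence of the integrand in the semiconfined model.

```latex
W(x,p) = \frac{1}{2\pi\hbar} \int_{-\infty}^{\infty}
  \psi^{*}\!\left(x - \tfrac{y}{2}\right)\,
  \psi\!\left(x + \tfrac{y}{2}\right)\,
  e^{-ipy/\hbar}\, dy ,
\qquad
\int\!\!\int W(x,p)\, dx\, dp = 1 .
```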

翻訳日:2023-03-27 17:35:14 公開日:2023-03-24

# 修士論文:エネルギーベースモデルによる分布外検出

Master's Thesis: Out-of-distribution Detection with Energy-based Models ( http://arxiv.org/abs/2302.12002v2 )

ライセンス: Link先を確認

Sven Elflein

(参考訳) 現在、ディープラーニングは、自動運転や医療診断のようなセキュリティクリティカルな状況にますます適用されている。その成功にもかかわらず、ディープネットワークの振る舞いと堅牢性はまだ完全には理解されておらず、重大なリスクをもたらしている。特に最近研究者たちは、ニューラルネットワークは、これまで見たことのないデータでも、その予測に過度に自信を持っていることを発見しました。この問題に取り組むために、文献における2つのアプローチを区別することができる。 1つは予測の不確実性を考慮し、もう1つはトレーニングデータの基盤となる密度を推定し、与えられた入力がトレーニングデータに近いかどうか、すなわちネットワークが期待通りに動作できるかどうかを判断する。本論文では、トレーニングデータ分布を適合させるタスクにおけるエネルギーベースモデル(EBM)の能力を調査し、分布外(OOD)入力の検出を行う。ほとんどのデータセットでは、EBMは柔軟性にもかかわらず、OODデータの検出において、本質的に他の密度推定器よりも優れているわけではない。そこで本研究では,EBMの性能に対する監督,次元削減,アーキテクチャ変更の影響についても検討した。 さらに、OOD検出問題に対処する2つのアプローチのギャップを埋め、分類のためのEBM内の様々な不確かさを推定できるEnergy-Prior Network(EPN)を提案する。 EBMにおけるディリクレ分布の濃度パラメータと結合エネルギーとの間の関係を同定する。さらに、これにより、一部のアプリケーションでは利用できない、あるいは収集コストのかかるOODデータセットを保持せずに最適化できる。最後に, Energy-Prior Network(EPN)がOOD入力, データセットシフト, 敵対的サンプルを検出できることを実証的に示す。理論的には、EPNは、入力がトレーニングデータから遠く離れた場合、漸近的ケースに対して好ましい特性を提供する。

Today, deep learning is increasingly applied in security-critical situations such as autonomous driving and medical diagnosis. Despite its success, the behavior and robustness of deep networks are not fully understood yet, posing a significant risk. In particular, researchers recently found that neural networks are overly confident in their predictions, even on data they have never seen before. To tackle this issue, one can differentiate two approaches in the literature. One accounts for uncertainty in the predictions, while the second estimates the underlying density of the training data to decide whether a given input is close to the training data, and thus whether the network is able to perform as expected. In this thesis, we investigate the capabilities of energy-based models (EBMs) at the task of fitting the training data distribution to perform detection of out-of-distribution (OOD) inputs. We find that on most datasets, EBMs do not inherently outperform other density estimators at detecting OOD data despite their flexibility. Thus, we additionally investigate the effects of supervision, dimensionality reduction, and architectural modifications on the performance of EBMs. Further, we propose the Energy-Prior Network (EPN), which enables estimation of various uncertainties within an EBM for classification, bridging the gap between the two approaches for tackling the OOD detection problem. We identify a connection between the concentration parameters of the Dirichlet distribution and the joint energy in an EBM. Additionally, this allows optimization without a held-out OOD dataset, which might not be available or might be costly to collect in some applications. Finally, we empirically demonstrate that the Energy-Prior Network (EPN) is able to detect OOD inputs, dataset shifts, and adversarial examples. Theoretically, EPN offers favorable properties for the asymptotic case when inputs are far from the training data.
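
As a rough illustration of how an energy can serve as an OOD score, the sketch below treats classifier logits as defining an energy via a log-sum-exp, a common construction in the energy-based OOD literature. It is not the thesis' Energy-Prior Network; the function names, the example logits, and the threshold are all hypothetical.

```python
import math

def logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def energy_score(logits, temperature=1.0):
    """Energy of an input when a classifier is viewed as an EBM:
    E(x) = -T * logsumexp(f(x) / T). Lower energy looks more in-distribution."""
    return -temperature * logsumexp([z / temperature for z in logits])

def is_ood(logits, threshold):
    """Flag an input as OOD when its energy exceeds a (hypothetical) threshold."""
    return energy_score(logits) > threshold

# Hypothetical logits: one confident prediction, one nearly flat prediction.
confident = [9.0, 0.5, -1.0]
flat = [0.1, 0.0, -0.1]
print(energy_score(confident), energy_score(flat))
```

In practice the threshold would be chosen on held-out in-distribution data, for example at a fixed true-positive rate.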

翻訳日:2023-03-27 17:35:03 公開日:2023-03-24

# 高速対人訓練における破滅的オーバーフィッティングの考察--自己適合の視点から

Investigating Catastrophic Overfitting in Fast Adversarial Training: A Self-fitting Perspective ( http://arxiv.org/abs/2302.11963v2 )

ライセンス: Link先を確認

Zhengbao He, Tao Li, Sizhe Chen and Xiaolin Huang

(参考訳) 高速対向トレーニングは、堅牢なネットワークを構築するための効率的なアプローチを提供するが、多段階の堅牢な精度が突然0に崩壊する破滅的なオーバーフィッティング(CO)と呼ばれる深刻な問題に悩まされる。本稿では,データ情報と自己情報に一段階の逆転例を分離し,この現象を「自己適合」と呼ぶ興味深い現象を明らかにした。自己適合、すなわち、ネットワークは単一ステップの摂動に埋め込まれた自己情報を学び、自然にCOが発生する。自己適合が発生すると、ネットワークは明らかな「チャネル分化」現象を経験し、自己情報を認識するための畳み込みチャネルが支配的になり、一方、データ情報のチャンネルは抑圧される。このようにして、ネットワークは十分な自己情報を持つ画像のみを認識でき、他の種類のデータに対する一般化能力を失う。自己適合に基づいて,COを緩和し,COを多段階の対人訓練に拡張する既存手法に関する新たな知見を提供する。本研究は, 対人訓練における自己学習のメカニズムを明らかにし, 異なる種類の情報を抑制してCOを緩和するための新たな視点を開く。

Although fast adversarial training provides an efficient approach for building robust networks, it may suffer from a serious problem known as catastrophic overfitting (CO), where multi-step robust accuracy suddenly collapses to zero. In this paper, we decouple, for the first time, single-step adversarial examples into data-information and self-information, which reveals an interesting phenomenon called "self-fitting". Self-fitting, i.e., the network learning the self-information embedded in single-step perturbations, naturally leads to the occurrence of CO. When self-fitting occurs, the network experiences an obvious "channel differentiation" phenomenon in which some convolution channels responsible for recognizing self-information become dominant, while others for data-information are suppressed. In this way, the network can only recognize images with sufficient self-information and loses generalization ability to other types of data. Based on self-fitting, we provide new insights into the existing methods to mitigate CO and extend CO to multi-step adversarial training. Our findings reveal a self-learning mechanism in adversarial training and open up new perspectives for suppressing different kinds of information to mitigate CO.

翻訳日:2023-03-27 17:34:35 公開日:2023-03-24

# NISQデバイスにおける変分ギブス状態生成

Variational Gibbs State Preparation on NISQ devices ( http://arxiv.org/abs/2303.11276v2 )

ライセンス: Link先を確認

Mirko Consiglio, Jacopo Settino, Andrea Giordano, Carlo Mastroianni, Francesco Plastina, Salvatore Lorenzo, Sabrina Maniscalco, John Goold, Tony J. G. Apollaro

(参考訳) ノイズのある中間スケール(NISQ)デバイス上での量子多体系の平衡熱状態の生成は、量子計算の応用範囲を広げるために重要な課題である。忠実なギブス状態準備は、熱化や平衡外熱力学などのプロトコルを調査する方法と、ギブス状態からのサンプリングが重要なサブルーチンを構成する量子アルゴリズムに有用なリソースを提供する。量子多体系のギブス状態を作成するための変分量子アルゴリズム(VQA)を提案する。我々のVQAの新規性は、2つの異なる接続された量子レジスタに作用するパラメータ化量子回路を実装することである。 vqaはヘルムホルツ自由エネルギーを評価し、フォン・ノイマンエントロピーは1つのレジスタ上の計算基底測定の事後処理によって得られ、ギブス状態はエネルギー基底のユニタリ回転を介して他のレジスタで作成される。最後に, 逆場イジングモデルのギブズ状態を作成してVQAをベンチマークし, 状態ベクトルシミュレーションにおいて, 広範囲の温度で極めて高い忠実性を実現する。また、IBM量子コンピュータにおけるVQAの性能を評価し、現在のNISQデバイスで実現可能であることを示す。

The preparation of an equilibrium thermal state of a quantum many-body system on noisy intermediate-scale quantum (NISQ) devices is an important task for extending the range of applications of quantum computation. Faithful Gibbs state preparation would pave the way to investigate protocols such as thermalization and out-of-equilibrium thermodynamics, as well as providing useful resources for quantum algorithms, where sampling from Gibbs states constitutes a key subroutine. We propose a variational quantum algorithm (VQA) to prepare Gibbs states of a quantum many-body system. The novelty of our VQA consists in implementing a parameterized quantum circuit acting on two distinct, yet connected, quantum registers. The VQA evaluates the Helmholtz free energy, where the von Neumann entropy is obtained via post-processing of computational basis measurements on one register, while the Gibbs state is prepared on the other register, via a unitary rotation in the energy basis. Finally, we benchmark our VQA by preparing Gibbs states of the transverse field Ising model and achieve remarkably high fidelities across a broad range of temperatures in statevector simulations. We also assess the performance of the VQA on IBM quantum computers, showcasing its feasibility on current NISQ devices.
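
The variational principle behind this kind of algorithm is that the Gibbs state minimizes the Helmholtz free energy F = <H> - S / beta. The classical sketch below checks this for a state that is diagonal in the energy basis, where the von Neumann entropy reduces to the Shannon entropy of the eigenvalue distribution. The spectrum and inverse temperature are hypothetical, and nothing here reproduces the paper's circuit or its two-register layout.

```python
import math

def gibbs_probabilities(energies, beta):
    """Boltzmann weights p_i = exp(-beta * E_i) / Z over the energy eigenbasis."""
    e_min = min(energies)  # shift energies for numerical stability
    weights = [math.exp(-beta * (e - e_min)) for e in energies]
    z = sum(weights)
    return [w / z for w in weights]

def entropy(probs):
    """Shannon entropy; equals the von Neumann entropy of a state
    that is diagonal with these eigenvalues."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def free_energy(probs, energies, beta):
    """Helmholtz free energy F = <H> - S / beta for a state diagonal in the energy basis."""
    mean_energy = sum(p * e for p, e in zip(probs, energies))
    return mean_energy - entropy(probs) / beta

# Hypothetical spectrum and inverse temperature, chosen only for illustration.
energies = [-1.5, -0.5, 0.5, 1.5]
beta = 0.8

gibbs = gibbs_probabilities(energies, beta)
uniform = [0.25] * 4
f_gibbs = free_energy(gibbs, energies, beta)
f_uniform = free_energy(uniform, energies, beta)
# Closed form: F = -(1 / beta) * ln Z at the Gibbs state.
f_exact = -math.log(sum(math.exp(-beta * e) for e in energies)) / beta
print(f_gibbs, f_uniform, f_exact)
```

A variational algorithm would replace the fixed distributions above with ones produced by a parameterized circuit and minimize the free energy over the circuit parameters.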

翻訳日:2023-03-27 17:29:10 公開日:2023-03-24

# 自己監督学習のためのオープンセットからのコアセットサンプリング

Coreset Sampling from Open-Set for Fine-Grained Self-Supervised Learning ( http://arxiv.org/abs/2303.11101v2 )

ライセンス: Link先を確認

Sungnyun Kim, Sangmin Bae, Se-Young Yun

(参考訳) 一般領域におけるディープラーニングは、きめ細かい特徴の認識を必要とするドメイン固有のタスクに絶えず拡張されてきた。しかし、細かなタスクに対する現実世界のアプリケーションは、2つの課題に直面している: アノテーションの専門知識に高い依存と、特定のドメインにおける様々な下流タスクの汎用モデルの必要性(例えば、カテゴリの予測、バウンディングボックス、ピクセル単位でのアノテーションなど)。幸いなことに、最近の自己教師型学習(SSL)は、アノテーションなしでモデルを事前トレーニングするための有望なアプローチであり、下流タスクの効果的な初期化として役立ちます。 SSLはアノテーションの存在に依存しないので、一般に、オープンセットと呼ばれる大規模なラベルなしデータセットを使用する。この意味では,事前学習段階において,大規模無ラベルオープンセットと細粒度目標データセットが利用可能であることを前提として,新しいオープンセット自己教師付き学習問題を導入する。問題設定では、オープンセットとターゲットデータセットの分布ミスマッチを考慮することが重要である。そこで我々はSimCoreアルゴリズムを用いて、潜在空間内のターゲットデータセットに最小距離を持つオープンセットのサブセットであるコアセットをサンプリングする。また,SimCoreは,11個の細粒度データセットと7つのオープンセットを含む広範囲な実験的な設定により,表現学習性能を著しく向上することを示した。

Deep learning in general domains has constantly been extended to domain-specific tasks requiring the recognition of fine-grained characteristics. However, real-world applications for fine-grained tasks suffer from two challenges: a high reliance on expert knowledge for annotation and the necessity of a versatile model for various downstream tasks in a specific domain (e.g., prediction of categories, bounding boxes, or pixel-wise annotations). Fortunately, recent self-supervised learning (SSL) is a promising approach to pretrain a model without annotations, serving as an effective initialization for any downstream task. Since SSL does not rely on the presence of annotation, in general, it utilizes a large-scale unlabeled dataset, referred to as an open-set. In this sense, we introduce a novel Open-Set Self-Supervised Learning problem under the assumption that a large-scale unlabeled open-set is available, as well as the fine-grained target dataset, during a pretraining phase. In our problem setup, it is crucial to consider the distribution mismatch between the open-set and target dataset. Hence, we propose the SimCore algorithm to sample a coreset, the subset of an open-set that has a minimum distance to the target dataset in the latent space. We demonstrate that SimCore significantly improves representation learning performance through extensive experimental settings, including eleven fine-grained datasets and seven open-sets in various downstream tasks.
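
To make the idea of picking open-set samples that lie close to the target dataset in latent space concrete, here is a simplified nearest-neighbour selection. It is only a sketch and not the SimCore procedure itself; the embeddings, the cosine-similarity criterion, and the fixed budget are assumptions made for illustration.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two non-zero vectors given as lists."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def select_coreset(open_set, target_set, budget):
    """Return indices of the `budget` open-set embeddings whose best cosine
    similarity to any target embedding is highest (closest in latent space)."""
    scored = []
    for idx, emb in enumerate(open_set):
        best = max(cosine_similarity(emb, t) for t in target_set)
        scored.append((best, idx))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # deterministic tie-break
    return sorted(idx for _, idx in scored[:budget])

# Hypothetical 2-D embeddings standing in for encoder outputs.
target_set = [[1.0, 0.0], [0.9, 0.1]]
open_set = [[0.0, 1.0], [1.0, 0.05], [-1.0, 0.0], [0.8, 0.2]]
coreset = select_coreset(open_set, target_set, budget=2)
print(coreset)
```

The selected subset would then be merged with the target data for self-supervised pretraining; how the budget is set is left open here.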

翻訳日:2023-03-27 17:28:47 公開日:2023-03-24

# ERSAM: エネルギー効率とリアルタイムソーシャルアンビアンス測定のためのニューラルアーキテクチャ検索

ERSAM: Neural Architecture Search For Energy-Efficient and Real-Time Social Ambiance Measurement ( http://arxiv.org/abs/2303.10727v2 )

ライセンス: Link先を確認

Chaojian Li, Wenwan Chen, Jiayi Yuan, Yingyan Lin, Ashutosh Sabharwal

(参考訳) ソーシャル・アンビアンス(social ambiance)は、社会的相互作用が起こるコンテキストを記述し、同時話者数を数えることで音声を用いて測定することができる。この測定により、さまざまなメンタルヘルストラッキングと人間中心のIoTアプリケーションが可能になる。デバイス上のSocial Ambiance Measurement(SAM)は、ユーザのプライバシの確保と、前述のアプリケーションの広範な採用を促進するために非常に望ましいものだが、最先端のディープニューラルネットワーク(DNN)を使用したSAMソリューションに必要な計算複雑性は、モバイルデバイス上の制約の多いリソースとは相反する。さらに、様々なプライバシの制約と必要な人的努力により、SAMの臨床的設定下において限られたラベル付きデータのみが利用可能または実用的であり、オンデバイスSAMソリューションの達成可能な正確性に挑戦する。そこで本研究では,エネルギー効率とリアルタイムSAM(ERSAM)のためのニューラルネットワーク検索フレームワークを提案する。具体的には、当社のERSAMフレームワークは、モバイルSAMソリューションのハードウェア効率フロンティアに対して達成可能な精度を推し進めるDNNを自動的に検索することができる。例えば、ERSAMが配信するDNNは、Pixel 3の5秒の音声セグメントで40mW x 12hエネルギーと0.05秒の処理レイテンシしか消費せず、LibriSpeechから生成した社会環境データセットでは14.3%のエラー率しか達成していない。当社のERSAMフレームワークは、需要が増大しているデバイス上のSAMソリューションをユビキタスに構築できることを期待しています。

Social ambiance describes the context in which social interactions happen, and can be measured using speech audio by counting the number of concurrent speakers. This measurement has enabled various mental health tracking and human-centric IoT applications. While on-device Social Ambiance Measurement (SAM) is highly desirable to ensure user privacy and thus facilitate wide adoption of the aforementioned applications, the computational complexity required by state-of-the-art SAM solutions powered by deep neural networks (DNNs) stands at odds with the often constrained resources of mobile devices. Furthermore, only limited labeled data is available or practical when it comes to SAM under clinical settings, due to various privacy constraints and the required human effort, further challenging the achievable accuracy of on-device SAM solutions. To this end, we propose a dedicated neural architecture search framework for Energy-efficient and Real-time SAM (ERSAM). Specifically, our ERSAM framework can automatically search for DNNs that push forward the achievable accuracy vs. hardware efficiency frontier of mobile SAM solutions. For example, ERSAM-delivered DNNs consume only 40 mW x 12 h of energy and 0.05 seconds of processing latency for a 5-second audio segment on a Pixel 3 phone, while achieving an error rate of only 14.3% on a social ambiance dataset generated from LibriSpeech. We expect that our ERSAM framework can pave the way for ubiquitous on-device SAM solutions, which are in growing demand.

翻訳日:2023-03-27 17:28:22 公開日:2023-03-24

# 映像予測のための動的マルチスケールVoxel Flow Network

A Dynamic Multi-Scale Voxel Flow Network for Video Prediction ( http://arxiv.org/abs/2303.09875v2 )

ライセンス: Link先を確認

Xiaotao Hu, Zhewei Huang, Ailin Huang, Jun Xu, Shuchang Zhou

(参考訳) ビデオ予測の性能は、高度なディープニューラルネットワークによって大幅に向上している。しかし、現在の手法のほとんどは大きなモデルサイズに悩まされており、将来性のある性能のためにセマンティック/深度マップのような追加の入力を必要とする。本稿では,RGB画像のみを用いて,より少ない計算コストでより優れた映像予測性能を実現するための動的マルチスケールVoxel Flow Network(DMVFN)を提案する。 DMVFNの中核は、ビデオフレームの運動スケールを効果的に知覚できる、微分可能なルーティングモジュールである。トレーニングが完了すると、DMVFNは推論段階で異なる入力に対する適応サブネットワークを選択する。いくつかのベンチマーク実験により、DMVFNはDeep Voxel Flowよりも桁違いに高速であり、生成した画像の品質に対して最先端の反復型OPTを超えることが示されている。コードとデモはhttps://huxiaotaostasy.github.io/DMVFN/で閲覧できます。

The performance of video prediction has been greatly boosted by advanced deep neural networks. However, most of the current methods suffer from large model sizes and require extra inputs, e.g., semantic/depth maps, for promising performance. For efficiency, in this paper we propose a Dynamic Multi-scale Voxel Flow Network (DMVFN) to achieve better video prediction performance than previous methods at lower computational cost, using only RGB images. The core of our DMVFN is a differentiable routing module that can effectively perceive the motion scales of video frames. Once trained, our DMVFN selects adaptive sub-networks for different inputs at the inference stage. Experiments on several benchmarks demonstrate that our DMVFN is an order of magnitude faster than Deep Voxel Flow and surpasses the state-of-the-art iterative-based OPT on generated image quality. Our code and demo are available at https://huxiaotaostasy.github.io/DMVFN/.

翻訳日:2023-03-27 17:27:33 公開日:2023-03-24

# ニューラルネットワークトレーニングのためのカスケードフォワードアルゴリズム

The Cascaded Forward Algorithm for Neural Network Training ( http://arxiv.org/abs/2303.09728v2 )

ライセンス: Link先を確認

Gongpei Zhao, Tao Wang, Yidong Li, Yi Jin, Congyan Lang, Haibin Ling

(参考訳) バックプロパゲーションアルゴリズムは、過去10年間、ニューラルネットワークの主流となる学習手順として広く使われてきた。しかし、このアルゴリズムにはいくつかの制限があり、例えば局所的な極小さに固執し、その生物学的な可能性に関する疑問を引き起こした。これらの制限に対処するために、バックプロパゲーションの代替アルゴリズムが事前に検討されており、フォワードフォワード(ff)アルゴリズムがよく知られている。本稿では,ニューラルネットワークのための新しい学習フレームワークであるCascaded Forward(CaFo)アルゴリズムを提案する。 FFとは異なり、我々のフレームワークは各カスケードブロックのラベル分布を直接出力するが、これは追加の負のサンプルの生成を必要としないため、トレーニングとテストの両方においてより効率的なプロセスにつながる。さらに,我々のフレームワークでは,各ブロックを独立して訓練することが可能であり,並列加速度システムに容易に展開できる。提案手法を4つの公開画像分類ベンチマークで評価し, 実験結果から, ベースラインと比較した場合の予測精度が有意に向上することを示した。

The backpropagation (BP) algorithm has been widely used as the mainstream learning procedure for neural networks in the past decade, and has played a significant role in the development of deep learning. However, there exist some limitations associated with this algorithm, such as getting stuck in local minima and experiencing vanishing/exploding gradients, which have led to questions about its biological plausibility. To address these limitations, alternative algorithms to backpropagation have been preliminarily explored, with the Forward-Forward (FF) algorithm being one of the most well-known. In this paper we propose a new learning framework for neural networks, namely the Cascaded Forward (CaFo) algorithm, which, as in FF, does not rely on BP optimization. Unlike FF, our framework directly outputs label distributions at each cascaded block, which does not require generation of additional negative samples and thus leads to a more efficient process at both training and testing. Moreover, in our framework each block can be trained independently, so it can be easily deployed into parallel acceleration systems. The proposed method is evaluated on four public image classification benchmarks, and the experimental results illustrate significant improvement in prediction accuracy in comparison with the baseline.

翻訳日:2023-03-27 17:27:16 公開日:2023-03-24

# 3次テンソル用マルチスライスクラスタリングのDBSCAN

DBSCAN of Multi-Slice Clustering for Third-Order Tensors ( http://arxiv.org/abs/2303.07768v3 )

ライセンス: Link先を確認

Dina Faneva Andriantsiory, Joseph Ben Geloun, Mustapha Lebbah

(参考訳) 3次元データのトリクラスタリングには、各次元のクラスタサイズやクラスタ数を指定する必要がある。この問題に対処するために、3階テンソルのマルチスライスクラスタリング(msc)は、しきい値の類似性に基づいてクラスタを見つけるために、ランク1テンソルデータセットの低次元部分空間にある信号スライスを見つける。データセットがrランク1テンソル(r > 1)の和である場合、データから異なる部分空間にある異なるスライス群を抽出するMSC-DBSCANという拡張アルゴリズムを提案する。我々のアルゴリズムはMSCアルゴリズムと同じ入力を使い、MSCとランクワンテンソルデータの解を見つけることができる。

Several methods for triclustering three-dimensional data require the cluster size or the number of clusters in each dimension to be specified. To address this issue, Multi-Slice Clustering (MSC) for third-order tensors finds signal slices that lie in a low-dimensional subspace for a rank-one tensor dataset in order to find a cluster based on a similarity threshold. We propose an extension algorithm called MSC-DBSCAN to extract the different clusters of slices that lie in different subspaces of the data when the dataset is a sum of r rank-one tensors (r > 1). Our algorithm uses the same input as the MSC algorithm and can find the same solution as MSC for rank-one tensor data.

翻訳日:2023-03-27 17:26:31 公開日:2023-03-24

# VMCDL: ソース制御フロー下のカスケードディープラーニングに基づく脆弱性マイニング

VMCDL: Vulnerability Mining Based on Cascaded Deep Learning Under Source Control Flow ( http://arxiv.org/abs/2303.07128v2 )

ライセンス: Link先を確認

Wen Zhou

(参考訳) コンピュータ産業とコンピュータソフトウェアの急速な発展により、ソフトウェアの脆弱性が悪用されるリスクは大きく増大した。しかし、漏洩源調査のための既存の鉱業技術には、高い誤報率、粗粒度検出、専門家の経験への依存など、多くの欠点がある。本稿では,主にSARDデータセットのC/C++ソースコードデータを使用し,CWE476,CWE469,CWE516,CWE570脆弱性型のソースコードを処理し,最先端ツールのJoern脆弱性スキャン機能をテストするとともに,ソースコード制御フローに基づく新たなカスケード深層学習モデルVMCDLを提案する。まず,感性のある関数や文の探索と抽出にJoernを用い,脆弱なコードの文ライブラリを形成する。そして、CFGフロー脆弱性コードスニペットを双方向の幅優先トラバーサルで生成し、Doc2vecでベクトル化する。最後に、ソースコード制御フローに基づくカスケードディープラーニングモデルを用いて分類を行い、分類結果を得る。実験評価では,特定の脆弱性についてJoernのテスト結果を与え,単一脆弱性型ソースコード上でモデルアルゴリズムのバイナリ分類結果の混乱行列とラベルデータを与え,FPR,FNR,ACC,P,F1の5指標をそれぞれ10.30%,5.20%,92.50%,85.10%,85.40%とし,静的解析の誤報率を効果的に低減できることを示した。

With the rapid development of the computer industry and computer software, the risk of software vulnerabilities being exploited has greatly increased. However, there are still many shortcomings in the existing mining techniques for leakage source research, such as high false alarm rate, coarse-grained detection, and dependence on expert experience. In this paper, we mainly use the C/C++ source code data of the SARD dataset, process the source code of the CWE476, CWE469, CWE516 and CWE570 vulnerability types, test the vulnerability scanning function of the cutting-edge tool Joern, and propose a new cascading deep learning model, VMCDL, based on source code control flow to effectively detect vulnerabilities. First, this paper uses Joern to locate and extract sensitive functions and statements to form a sensitive statement library of vulnerable code. Then, the CFG flow vulnerability code snippets are generated by bidirectional breadth-first traversal and vectorized by Doc2vec. Finally, the cascade deep learning model based on source code control flow is used for classification to obtain the classification results. In the experimental evaluation, we give the test results of Joern on specific vulnerabilities, give the confusion matrix and label data of the binary classification results of the model algorithm on single-vulnerability-type source code, and compare and verify the five indicators FPR, FNR, ACC, P and F1, which reach 10.30%, 5.20%, 92.50%, 85.10% and 85.40% respectively, showing that the model can effectively reduce the false alarm rate of static analysis.
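
The five indicators named above are all derived from a binary confusion matrix. The sketch below shows the usual definitions; the counts are hypothetical and are not taken from the paper, so the printed values are unrelated to the reported percentages.

```python
def binary_metrics(tp, fp, tn, fn):
    """Compute FPR, FNR, accuracy, precision and F1 from confusion-matrix counts."""
    fpr = fp / (fp + tn)            # false positive rate (false alarms)
    fnr = fn / (fn + tp)            # false negative rate (missed vulnerabilities)
    acc = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"FPR": fpr, "FNR": fnr, "ACC": acc, "P": precision, "F1": f1}

# Hypothetical counts for illustration only; they are not taken from the paper.
metrics = binary_metrics(tp=90, fp=10, tn=190, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```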

翻訳日:2023-03-27 17:26:17 公開日:2023-03-24

# 合成結晶を用いたニューラルネットワークによるICSD粉末X線回折法による構造情報の抽出

Neural networks trained on synthetically generated crystals can extract structural information from ICSD powder X-ray diffractograms ( http://arxiv.org/abs/2303.11699v2 )

ライセンス: Link先を確認

Henrik Schopmans, Patrick Reiser, Pascal Friederich

(参考訳) 機械学習技術は粉末X線回折から結晶空間群などの構造情報を抽出するのに成功している。しかし、ICSDのようなデータベースからシミュレーションされたディフラクトグラムを直接トレーニングすることは、その限られたサイズ、クラス不均一性、特定の構造タイプに対するバイアスのために困難である。本稿では,各空間群の対称性演算を用いてランダム座標を持つ合成結晶を生成する方法を提案する。このアプローチに基づいて,1時間に数百万のオンザフライ生成された合成ディフラクトグラムに対して,Deep ResNetライクなモデルのオンライントレーニングを実演する。選択した空間群分類のタスクに対して、ほとんどの空間群からの未確認ICSD構造タイプに対して、79.9%の精度を達成した。これはICSD結晶のトレーニングにおける現在の最先端のアプローチの56.1%を超える。その結果, 合成した結晶は, ICSD粉末回折から構造情報を抽出でき, 粉末X線回折の領域において, 最先端の機械学習モデルを適用することが可能となった。また、特に高スループット環境では、自動XRDデータ分析が不可欠である実験データに適用するための第一歩を示す。空間群の予測に焦点をあてる一方で、我々のアプローチは将来、関連するタスクにまで拡張される可能性がある。

Machine learning techniques have successfully been used to extract structural information such as the crystal space group from powder X-ray diffractograms. However, training directly on simulated diffractograms from databases such as the ICSD is challenging due to its limited size, class-inhomogeneity, and bias toward certain structure types. We propose an alternative approach of generating synthetic crystals with random coordinates by using the symmetry operations of each space group. Based on this approach, we demonstrate online training of deep ResNet-like models on up to a few million unique on-the-fly generated synthetic diffractograms per hour. For our chosen task of space group classification, we achieved a test accuracy of 79.9% on unseen ICSD structure types from most space groups. This surpasses the 56.1% accuracy of the current state-of-the-art approach of training on ICSD crystals directly. Our results demonstrate that synthetically generated crystals can be used to extract structural information from ICSD powder diffractograms, which makes it possible to apply very large state-of-the-art machine learning models in the area of powder X-ray diffraction. We further show first steps toward applying our methodology to experimental data, where automated XRD data analysis is crucial, especially in high-throughput settings. While we focused on the prediction of the space group, our approach has the potential to be extended to related tasks in the future.
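
As a toy illustration of generating atom positions from random coordinates plus space-group symmetry operations, the sketch below uses only space group P-1 (identity and inversion). The paper's generator covers many space groups and further details such as lattice parameters and element choices, none of which are modeled here; the function names and the seed are hypothetical.

```python
import random

# Symmetry operations of space group P-1 (No. 2) on fractional coordinates:
# the identity and the inversion through the origin.
P_MINUS_1_OPS = [
    lambda x, y, z: (x, y, z),
    lambda x, y, z: (-x, -y, -z),
]

def wrap(value):
    """Map a fractional coordinate back into [0, 1)."""
    return value % 1.0

def generate_orbit(position, operations):
    """Apply every symmetry operation to one position and wrap into the unit cell."""
    orbit = []
    for op in operations:
        image = tuple(wrap(round(c, 6)) for c in op(*position))
        if image not in orbit:  # skip duplicates arising on special positions
            orbit.append(image)
    return orbit

def random_crystal(n_sites, operations, seed=0):
    """Place n_sites random positions and expand each by the symmetry operations."""
    rng = random.Random(seed)
    atoms = []
    for _ in range(n_sites):
        site = (rng.random(), rng.random(), rng.random())
        atoms.extend(generate_orbit(site, operations))
    return atoms

atoms = random_crystal(n_sites=3, operations=P_MINUS_1_OPS)
for atom in atoms:
    print(atom)
```

A diffractogram simulator would then take such positions, together with a lattice and atomic species, as input; that step is outside this sketch.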

翻訳日:2023-03-27 17:16:49 公開日:2023-03-24

# BoPR:人体形状とポーズ推定のための身体認識部レグレッサ

BoPR: Body-aware Part Regressor for Human Shape and Pose Estimation ( http://arxiv.org/abs/2303.11675v2 )

ライセンス: Link先を確認

Yongkang Cheng, Shaoli Huang, Jifeng Ning, Ying Shan

(参考訳) 本稿では,単眼画像から人体の形状とポーズを推定し,オクルージョン(遮蔽)と深度あいまいさの課題に効果的に対処する新しいアプローチを提案する。提案手法であるBoPR(Body-Aware Part Regressor)は,まず注意誘導機構を用いて身体と部分の両方の特徴を抽出する。次に,クエリとして部分的特徴,参照として身体的特徴を含む部分的レグレッションに対する余分な部分的依存をエンコードするために,これらの機能を利用する。これにより,目に見える部分や身体参照情報を利用することで,身体とオクルードされた部分の空間的関係を推定できる。提案手法は2つのベンチマークデータセット上で既存の最先端手法よりも優れており,提案手法は深度あいまいさや閉塞処理の点で既存手法をはるかに上回っていることを示す。コードとデータは、https://github.com/cyk990422/BoPR で研究目的で公開されている。

This paper presents a novel approach for estimating human body shape and pose from monocular images that effectively addresses the challenges of occlusions and depth ambiguity. Our proposed method BoPR, the Body-aware Part Regressor, first extracts features of both the body and part regions using an attention-guided mechanism. We then utilize these features to encode extra part-body dependency for per-part regression, with part features as queries and the body feature as a reference. This allows our network to infer the spatial relationship of occluded parts with the body by leveraging visible parts and body reference information. Our method outperforms existing state-of-the-art methods on two benchmark datasets, and our experiments show that it significantly surpasses existing methods in terms of depth ambiguity and occlusion handling. These results provide strong evidence of the effectiveness of our approach. The code and data are available for research purposes at https://github.com/cyk990422/BoPR.

翻訳日:2023-03-27 17:16:28 公開日:2023-03-24

# パラメータ化球面上の確率勾配勾配の収束と変分モンテカルロシミュレーションへの応用

Convergence of stochastic gradient descent on parameterized sphere with applications to variational Monte Carlo simulation ( http://arxiv.org/abs/2303.11602v2 )

ライセンス: Link先を確認

Nilin Abrahamsen and Zhiyan Ding and Gil Goldshlager and Lin Lin

(参考訳) ニューラルネットワークによってパラメータ化される高次元球面上の確率勾配勾配(SGD)型アルゴリズムを正規化定数まで解析する。教師付き学習の設定のための新しいアルゴリズムを提供し,その収束を理論的および数値的に示す。また、量子物理学において広く用いられている変分モンテカルロ法(VMC)に対応する教師なし設定に対する収束の最初の証明も提供する。

We analyze stochastic gradient descent (SGD) type algorithms on a high-dimensional sphere which is parameterized by a neural network up to a normalization constant. We provide a new algorithm for the setting of supervised learning and show its convergence both theoretically and numerically. We also provide the first proof of convergence for the unsupervised setting, which corresponds to the widely used variational Monte Carlo (VMC) method in quantum physics.

翻訳日:2023-03-27 17:16:09 公開日:2023-03-24

# ラベルノイズ学習のためのダイナミクス・アウェアロス

Dynamics-Aware Loss for Learning with Label Noise ( http://arxiv.org/abs/2303.11562v2 )

ライセンス: Link先を確認

Xiu-Chuan Li, Xiaobo Xia, Fei Zhu, Tongliang Liu, Xu-Yao Zhang, Cheng-Lin Liu

(参考訳) ラベルノイズはディープニューラルネットワーク(DNN)に深刻な脅威をもたらす。堅牢性で適合性を調整できるロバスト損失関数を採用することは、この問題に対処するための単純だが効果的な戦略である。しかし、これらの2つの要因間の広く使われている静的トレードオフは、ラベルノイズによって学習されるDNNの動的性質と矛盾し、性能が低下する。そこで本稿では,この問題を解決するためにDAL(Dynamics-Aware Los)を提案する。 DNNはまず一般化されたパターンを学習し、ラベルノイズを徐々に過度にオーバーフィットする傾向があるので、DALは最初は適合性を強化し、その後徐々に頑丈さの重みを増す。さらに、後段では、DNNは硬いものよりも正確にラベル付けされる可能性が高い簡単な例に重点を置いて、ラベルノイズの負の影響をさらに低減するためにブートストラップ項を導入する。詳細な理論解析と広範な実験結果の両方が本手法の優越性を示している。

Label noise poses a serious threat to deep neural networks (DNNs). Employing a robust loss function that reconciles fitting ability with robustness is a simple but effective strategy to handle this problem. However, the widely used static trade-off between these two factors contradicts the dynamic nature of DNNs learning with label noise, leading to inferior performance. Therefore, we propose a dynamics-aware loss (DAL) to solve this problem. Considering that DNNs tend to first learn generalized patterns and then gradually overfit label noise, DAL strengthens the fitting ability initially, then gradually increases the weight of robustness. Moreover, at the later stage, we let DNNs put more emphasis on easy examples, which are more likely to be correctly labeled than hard ones, and introduce a bootstrapping term to further reduce the negative impact of label noise. Both the detailed theoretical analyses and extensive experimental results demonstrate the superiority of our method.
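
The idea of shifting weight from fitting ability to robustness over training can be sketched with a generic time-dependent blend of cross-entropy (fits hard, unbounded) and mean absolute error (bounded, more tolerant of wrong labels). This is only an illustration of a dynamic trade-off: the blend, the linear schedule, and the function names are assumptions and are not the DAL formulation from the paper.

```python
import math

def cross_entropy(prob_true):
    """Cross-entropy for the probability assigned to the labeled class (strong fitting)."""
    return -math.log(prob_true)

def mean_absolute_error(prob_true):
    """MAE between a softmax output and a one-hot label, which equals
    2 * (1 - p); it is bounded and therefore more robust to label noise."""
    return 2.0 * (1.0 - prob_true)

def robustness_weight(epoch, total_epochs):
    """Linear schedule from 0 (pure fitting) to 1 (pure robustness); an assumption."""
    return min(1.0, max(0.0, epoch / max(1, total_epochs - 1)))

def dynamic_loss(prob_true, epoch, total_epochs):
    """Blend the two losses with a weight that grows during training."""
    w = robustness_weight(epoch, total_epochs)
    return (1.0 - w) * cross_entropy(prob_true) + w * mean_absolute_error(prob_true)

# A sample the model finds hard (p = 0.01), for example a possibly mislabeled one.
early = dynamic_loss(0.01, epoch=0, total_epochs=100)
late = dynamic_loss(0.01, epoch=99, total_epochs=100)
print(early, late)
```

Early in training the hard sample contributes a large loss, while late in training its contribution is capped, so a suspicious label pulls less on the model.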

翻訳日:2023-03-27 17:16:01 公開日:2023-03-24

# 調和ベースと新しいクラス:一般化Few-Shotセグメンテーションのためのクラスコントラストアプローチ

Harmonizing Base and Novel Classes: A Class-Contrastive Approach for Generalized Few-Shot Segmentation ( http://arxiv.org/abs/2303.13724v1 )

ライセンス: Link先を確認

Weide Liu, Zhonghua Wu, Yang Zhao, Yuming Fang, Chuan-Sheng Foo, Jun Cheng and Guosheng Lin

(参考訳) 少ショットセグメンテーション(FSSeg)の現在の手法は,基本クラスの性能を無視しながら,新しいクラスの性能向上に重点を置いている。この制限を克服するために、ベースクラスと新規クラスのセグメンテーションマスクの予測を目的とした、一般化された小ショットセグメンテーション(GFSSeg)のタスクが導入された。しかし、現在のプロトタイプベースの手法では、プロトタイプを更新する際にベースクラスと新規クラスの関係を明示的に考慮していないため、真のカテゴリを識別する性能は限られている。この課題に対処するために,プロトタイプ更新を規制し,異なるクラスからのプロトタイプ間の距離を広く促進するため,ベースクラスの性能を維持しながらクラスを区別するクラスコントラスト損失とクラス関係損失を提案する。提案手法は,PASCAL VOC および MS COCO データセット上での汎用小ショットセグメンテーションタスクに対して,新しい最先端性能を実現する。

Current methods for few-shot segmentation (FSSeg) have mainly focused on improving the performance of novel classes while neglecting the performance of base classes. To overcome this limitation, the task of generalized few-shot semantic segmentation (GFSSeg) has been introduced, aiming to predict segmentation masks for both base and novel classes. However, the current prototype-based methods do not explicitly consider the relationship between base and novel classes when updating prototypes, leading to a limited performance in identifying true categories. To address this challenge, we propose a class contrastive loss and a class relationship loss to regulate prototype updates and encourage a large distance between prototypes from different classes, thus distinguishing the classes from each other while maintaining the performance of the base classes. Our proposed approach achieves new state-of-the-art performance for the generalized few-shot segmentation task on PASCAL VOC and MS COCO datasets.

翻訳日:2023-03-27 16:23:38 公開日:2023-03-24
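
The abstract above says the proposed losses encourage a large distance between prototypes of different classes, without stating their form. As a rough illustration only, and not the paper's class contrastive loss, the sketch below penalizes cosine similarity between every pair of class prototypes. The hinge form, the `margin` parameter, and the function names are assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def prototype_separation_loss(prototypes, margin=0.0):
    """Mean hinge penalty on pairwise similarity between class prototypes.

    `prototypes` maps a class name to its prototype vector (hypothetical
    layout); at least two classes are required.
    """
    names = sorted(prototypes)
    penalties = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            similarity = cosine(prototypes[first], prototypes[second])
            penalties.append(max(0.0, similarity - margin))
    return sum(penalties) / len(penalties)


print(prototype_separation_loss({"base": [1.0, 0.0], "novel": [0.9, 0.1]}))
```

Minimizing such a term pushes prototypes of different classes apart; a method following the abstract would need an additional term to keep base-class performance from degrading.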

# ホモロジー量子ローター符号:トーションからの論理量子ビット

Homological Quantum Rotor Codes: Logical Qubits from Torsion ( http://arxiv.org/abs/2303.13723v1 )

ライセンス: Link先を確認

Christophe Vuillot and Alessandro Ciani and Barbara M. Terhal

(参考訳) 複数の量子ローターを用いて論理情報を符号化するホモロジー量子ローター符号を正式に定義する。これらの符号は、論理振動子を符号化する線形振動子符号と同様に、量子ビットや量子ビットのホモロジーまたはCSS量子符号を一般化する。量子ビットや振動子とは異なり、ホモロジー量子ローター符号は、下層の鎖複体のホモロジーによって、論理ローターと論理キューディットの両方を符号化することができる。特に、実射影平面またはM\ "{o}bius strip" が量子ビットを符号化することによって得られる鎖複体に基づくコードである。本稿では, 連続安定器位相シフトによって拡散する論理演算子の概念により, 量子ビットの場合よりも微妙な符号間の距離スケーリングについて考察する。 2次元および3次元多様体に基づくホモロジー量子ロータ符号の構成と連鎖錯体の積を与える。我々は、キータエフの現在のミラー量子ビット(m\"{o}bius strip qubit)と同様に$0$-$\pi$-qubitが、そのようなコードの小さな例であり、拡張の可能性について議論している。

We formally define homological quantum rotor codes which use multiple quantum rotors to encode logical information. These codes generalize homological or CSS quantum codes for qubits or qudits, as well as linear oscillator codes which encode logical oscillators. Unlike for qubits or oscillators, homological quantum rotor codes allow one to encode both logical rotors and logical qudits, depending on the homology of the underlying chain complex. In particular, such a code based on the chain complex obtained from tessellating the real projective plane or a M\"{o}bius strip encodes a qubit. We discuss the distance scaling for such codes which can be more subtle than in the qubit case due to the concept of logical operator spreading by continuous stabilizer phase-shifts. We give constructions of homological quantum rotor codes based on 2D and 3D manifolds as well as products of chain complexes. Superconducting devices being composed of islands with integer Cooper pair charges could form a natural hardware platform for realizing these codes: we show that the $0$-$\pi$-qubit as well as Kitaev's current-mirror qubit -- also known as the M\"{o}bius strip qubit -- are indeed small examples of such codes and discuss possible extensions.

翻訳日:2023-03-27 16:23:22 公開日:2023-03-24
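
As a standard textbook illustration of where torsion comes from (this is the minimal cellular structure of the real projective plane, not the tessellation used in the paper), take one cell in each dimension:

```latex
$$
0 \longrightarrow \mathbb{Z} \xrightarrow{\;\partial_2 = 2\;} \mathbb{Z}
  \xrightarrow{\;\partial_1 = 0\;} \mathbb{Z} \longrightarrow 0,
\qquad
H_1 = \ker \partial_1 / \operatorname{im} \partial_2
    = \mathbb{Z} / 2\mathbb{Z}.
$$
```

Here $H_1$ has no free part and consists only of torsion of order 2. According to the abstract above, this is the kind of homology that yields an encoded qubit, whereas a free $\mathbb{Z}$ summand would correspond to a logical rotor.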

# 放射線治療中の患者の食道炎の存在と重症度を自動的に抽出する自然言語処理

Natural language processing to automatically extract the presence and severity of esophagitis in notes of patients undergoing radiotherapy ( http://arxiv.org/abs/2303.13722v1 )

ライセンス: Link先を確認

Shan Chen, Marco Guevara, Nicolas Ramirez, Arpi Murray, Jeremy L. Warner, Hugo JWL Aerts, Timothy A. Miller, Guergana K. Savova, Raymond H. Mak, Danielle S. Bitterman

(参考訳) 放射線治療(RT)毒性は生存と生活の質を損なうことがあるが、未研究のままである。実世界の証拠は毒性の理解を改善する可能性を秘めているが、毒性情報はしばしば臨床記録に残されている。胸部RT治療患者における食道炎の存在と重症度を判定するための自然言語処理(NLP)モデルを開発した。 3つの食道炎分類タスクの統計的および事前訓練されたBERTモデル 1)食道炎の存在,課題 2)重症食道炎の有無、及び課題 3) 食道炎と1学年対2-3。 RTを施行した食道癌患者345名を対象に移植性試験を行った。微調整のPubmedBERTは最高のパフォーマンスを得た。最も優れたマクロF1は、タスク1、2、3それぞれ0.92、0.82、0.74であった。微調整中に最も情報性の高いノートセクションを選択すると、すべてのタスクでマクロF1が2%以上向上した。シルバーラベルデータのマクロF1は全タスクで3%以上改善された。食道癌注記では,第1節,第2節,第3節のマクロF1は0.73,第74,第0.65であった。当科におけるCTCAEガイドラインから食道炎毒性の重症度を自動的に抽出する試みとしては,これが初めてである。有望なパフォーマンスは、拡張されたドメインにおけるNLPベースの自動詳細な毒性監視のための概念実証を提供する。

Radiotherapy (RT) toxicities can impair survival and quality-of-life, yet remain under-studied. Real-world evidence holds potential to improve our understanding of toxicities, but toxicity information is often only in clinical notes. We developed natural language processing (NLP) models to identify the presence and severity of esophagitis from notes of patients treated with thoracic RT. We fine-tuned statistical and pre-trained BERT-based models for three esophagitis classification tasks: Task 1) presence of esophagitis, Task 2) severe esophagitis or not, and Task 3) no esophagitis vs. grade 1 vs. grade 2-3. Transferability was tested on 345 notes from patients with esophageal cancer undergoing RT. Fine-tuning PubmedBERT yielded the best performance. The best macro-F1 was 0.92, 0.82, and 0.74 for Task 1, 2, and 3, respectively. Selecting the most informative note sections during fine-tuning improved macro-F1 by over 2% for all tasks. Silver-labeled data improved the macro-F1 by over 3% across all tasks. For the esophageal cancer notes, the best macro-F1 was 0.73, 0.74, and 0.65 for Task 1, 2, and 3, respectively, without additional fine-tuning. To our knowledge, this is the first effort to automatically extract esophagitis toxicity severity according to CTCAE guidelines from clinic notes. The promising performance provides proof-of-concept for NLP-based automated detailed toxicity monitoring in expanded domains.

翻訳日:2023-03-27 16:22:58 公開日:2023-03-24
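
The results above are reported as macro-F1, which averages per-class F1 scores with equal weight regardless of class frequency. The sketch below shows that computation in plain Python. The label values are made up for illustration (0 = no esophagitis, 1 = grade 1, 2 = grade 2-3, mirroring Task 3) and are not data from the paper.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denominator = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denominator if denominator else 0.0)
    return sum(scores) / len(scores)


# Hypothetical gold labels and predictions for six notes.
gold = [0, 0, 1, 1, 2, 2]
predicted = [0, 1, 1, 1, 2, 0]
print(round(macro_f1(gold, predicted), 3))
```

Because each class counts equally, a model that ignores a rare severity grade is penalized more under macro-F1 than under accuracy.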

# ReCOGS:セマンティック解釈の評価における論理形式の詳細について

ReCOGS: How Incidental Details of a Logical Form Overshadow an Evaluation of Semantic Interpretation ( http://arxiv.org/abs/2303.13716v1 )

ライセンス: Link先を確認

Zhengxuan Wu, Christopher D. Manning, Christopher Potts

(参考訳) 構成一般化ベンチマークは、モデルが新しい文の意味を正確に計算できるかどうかを評価するが、論理形式(LF)予測の観点からこれを運用する。これにより、選択されたLFの意味的に無関係な詳細がモデルのパフォーマンスを形作るという懸念が持ち上がる。この懸念はCOGSベンチマーク(Kim and Linzen, 2020)で実現されていると論じる。 COGSは、現在のモデルでは不可能と思われる一般化分割を呈し、これらのモデルの起訴と見なすことができる。しかし, COGS LFs の偶発的特徴に負の相関がみられた。これらのLFを意味論的に等価なものに変換し、意味論的解釈とは無関係な能力を分解すると、ベースラインモデルでさえ牽引される。近年の COGS LF の変数自由翻訳では同様の結論が示唆されているが,この形式は意味論的に等価ではなく,COGS の意味を正確に表現することはできない。これらの結果から,COGSの改良版であるReCOGSの提案が示唆された。全体として,構成一般化と注意深いベンチマークタスク設計の重要性を再確認した。

Compositional generalization benchmarks seek to assess whether models can accurately compute meanings for novel sentences, but operationalize this in terms of logical form (LF) prediction. This raises the concern that semantically irrelevant details of the chosen LFs could shape model performance. We argue that this concern is realized for the COGS benchmark (Kim and Linzen, 2020). COGS poses generalization splits that appear impossible for present-day models, which could be taken as an indictment of those models. However, we show that the negative results trace to incidental features of COGS LFs. Converting these LFs to semantically equivalent ones and factoring out capabilities unrelated to semantic interpretation, we find that even baseline models get traction. A recent variable-free translation of COGS LFs suggests similar conclusions, but we observe this format is not semantically equivalent; it is incapable of accurately representing some COGS meanings. These findings inform our proposal for ReCOGS, a modified version of COGS that comes closer to assessing the target semantic capabilities while remaining very challenging. Overall, our results reaffirm the importance of compositional generalization and careful benchmark task design.

翻訳日:2023-03-27 16:22:37 公開日:2023-03-24

# 逆量子アニールとh-ゲインによる初期状態符号化

Initial state encoding via reverse quantum annealing and h-gain features ( http://arxiv.org/abs/2303.13748v1 )

ライセンス: Link先を確認

Elijah Pelofske, Georg Hahn, Hristo Djidjev

(参考訳) 量子アニーリング(quantum annealing)は、組合せ最適化問題の大域的最小解を得るために量子揺らぎを利用する特殊な量子計算手法である。 d-wave systems, inc.はクラウドコンピューティングリソースとして利用可能なquantum annealersを製造し、アニーリング計算で使用されるアニールスケジュールをプログラムできる。本稿では,量子アニーラタによって返される解の質を初期状態の符号化により改善することに関心を寄せる。このような初期状態を符号化できる2つのD-Wave機能、リバースアニーリングとh-ゲイン機能について検討する。 Reverse annealing (RA) は、良い解を表す古典的な状態から始まり、逆の場が存在する点へ後退し、前方のアニールでアニール処理を終了する既知の解を洗練することを目的としている。 h-ゲイン(HG)機能により、ハミルトニアンの線形(h$)バイアスに時間依存重み付けスキームを配置することができる。また,RAに類似した後方位相とHG初期状態符号化を用いた前方位相のハイブリッド手法も検討した。問題に対してRAとHGを反復的に適用するという考え方を,最適でない初期状態を単調に改善することを目的として検討する。 HGエンコーディング技術は、重み付き最大カット問題や重み付き最大斜め問題など、様々な入力問題に対して評価され、いくつかの問題に対してHG手法がRAの代替となることを示す。また, D-Wave Chimera チップと Pegasus チップをネイティブ接続したランダムスピングラス上で, RA および HG 初期状態での繰り返し処理の動作について検討した。

Quantum annealing is a specialized type of quantum computation that aims to use quantum fluctuations in order to obtain global minimum solutions of combinatorial optimization problems. D-Wave Systems, Inc., manufactures quantum annealers, which are available as cloud computing resources, and allow users to program the anneal schedules used in the annealing computation. In this paper, we are interested in improving the quality of the solutions returned by a quantum annealer by encoding an initial state. We explore two D-Wave features allowing one to encode such an initial state: the reverse annealing and the h-gain features. Reverse annealing (RA) aims to refine a known solution following an anneal path starting with a classical state representing a good solution, going backwards to a point where a transverse field is present, and then finishing the annealing process with a forward anneal. The h-gain (HG) feature allows one to put a time-dependent weighting scheme on linear ($h$) biases of the Hamiltonian, and we demonstrate that this feature likewise can be used to bias the annealing to start from an initial state. We also consider a hybrid method consisting of a backward phase resembling RA, and a forward phase using the HG initial state encoding. Importantly, we investigate the idea of iteratively applying RA and HG to a problem, with the goal of monotonically improving on an initial state that is not optimal. The HG encoding technique is evaluated on a variety of input problems including the weighted Maximum Cut problem and the weighted Maximum Clique problem, demonstrating that the HG technique is a viable alternative to RA for some problems. We also investigate how the iterative procedures perform for both RA and HG initial state encoding on random spin glasses with the native connectivity of the D-Wave Chimera and Pegasus chips.

翻訳日:2023-03-27 16:15:23 公開日:2023-03-24
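
The reverse annealing path described above (start in a classical state, go backwards until a transverse field is present, then anneal forward) can be written as a piecewise-linear schedule. The sketch below assumes the schedule is a list of `[time, s]` points, where `s` is the normalized anneal fraction and `s = 1` is the classical end. The helper name, the pause segment, and the time units are assumptions, and submitting the schedule to hardware is not shown.

```python
def reverse_anneal_schedule(s_target, ramp_time, pause_time):
    """Build [time, s] points for a reverse anneal.

    The schedule starts at s = 1 (classical initial state), ramps back to
    s_target, holds there, then ramps forward to s = 1 again.
    """
    if not 0.0 < s_target < 1.0:
        raise ValueError("s_target must lie strictly between 0 and 1")
    return [
        [0.0, 1.0],
        [ramp_time, s_target],
        [ramp_time + pause_time, s_target],
        [2 * ramp_time + pause_time, 1.0],
    ]


for point in reverse_anneal_schedule(s_target=0.4, ramp_time=2.0, pause_time=10.0):
    print(point)
```

A smaller `s_target` reintroduces more quantum fluctuations and searches further from the initial state, while a larger one stays closer to it.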

# FixFit:過剰決定モデルにおける逆問題の解法にパラメータ圧縮を用いる

FixFit: using parameter-compression to solve the inverse problem in overdetermined models ( http://arxiv.org/abs/2303.13746v1 )

ライセンス: Link先を確認

Botond B Antal, Anthony G Chesebro, Helmut H Strey, Lilianne R Mujica-Parodi, Corey Weistuch

(参考訳) 科学のあらゆる分野は数学的モデルに依存する。複雑な非線形モデルを使用する際の根本的な問題の一つは、モデルパラメータ間の相互作用がデータに等しく適合する複数のパラメータセットにつながるため、データ駆動パラメータ推定が失敗することが多いことである。そこで本研究では、与えられた数学的モデルのパラメータをモデル出力に固有の潜在表現に圧縮する、この問題に対処する新しい手法であるFixFitを開発する。この表現は、モデルパラメータとモデル出力のデータ対上にボトルネック層を持つニューラルネットワークをトレーニングすることで得られる。ボトルネック層ノードはユニークな潜在パラメータに対応し、その次元はモデルの情報内容を示す。トレーニングされたニューラルネットワークは、ボトルネック層をエンコーダに分割して冗長性とデコーダを特徴付けて、測定値から潜在パラメータを一意に推定することができる。古典物理学と神経科学の2つのユースケースでFixFitを実証する。

All fields of science depend on mathematical models. One of the fundamental problems with using complex nonlinear models is that data-driven parameter estimation often fails because interactions between model parameters lead to multiple parameter sets fitting the data equally well. Here, we develop a new method to address this problem, FixFit, which compresses a given mathematical model's parameters into a latent representation unique to model outputs. We acquire this representation by training a neural network with a bottleneck layer on data pairs of model parameters and model outputs. The bottleneck layer nodes correspond to the unique latent parameters, and their dimensionality indicates the information content of the model. The trained neural network can be split at the bottleneck layer into an encoder to characterize the redundancies and a decoder to uniquely infer latent parameters from measurements. We demonstrate FixFit in two use cases drawn from classical physics and neuroscience.

翻訳日:2023-03-27 16:14:47 公開日:2023-03-24
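
The failure mode described above, where several parameter sets fit the data equally well, shows up even in a toy model. The sketch below is not FixFit: it uses a hand-picked model in which two parameters affect the output only through their product, and it writes the latent parameter by hand where FixFit would learn it with a bottleneck network. All names are hypothetical.

```python
def model(a, b, x):
    """Toy model: the output depends on a and b only through a * b."""
    return a * b * x


def latent(a, b):
    """Hand-written latent parameter that the outputs determine uniquely."""
    return a * b


xs = [0.0, 1.0, 2.0]
set_1 = (2.0, 3.0)
set_2 = (6.0, 1.0)

out_1 = [model(*set_1, x) for x in xs]
out_2 = [model(*set_2, x) for x in xs]

# Different parameter sets, identical outputs: a and b cannot be inferred
# individually from data, but their product can.
print(out_1 == out_2, latent(*set_1), latent(*set_2))
```

In this example the two-dimensional parameter space collapses to one latent dimension, which corresponds to what the abstract calls the information content of the model.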

# EdgeTran: モバイルエッジプラットフォーム上での効率的な推論のための共設計トランスフォーマー

EdgeTran: Co-designing Transformers for Efficient Inference on Mobile Edge Platforms ( http://arxiv.org/abs/2303.13745v1 )

ライセンス: Link先を確認

Shikhar Tuli and Niraj K. Jha

(参考訳) 効率的なトランスモデルの自動設計は、最近、産業や学術から大きな注目を集めている。しかしながら、ほとんどの研究は、最高のパフォーマンスのトランスフォーマーアーキテクチャを探しながら、特定のメトリクスのみに焦点を当てている。さらに、従来の複雑で大規模なトランスフォーマーモデルを低スループットのエッジプラットフォーム上で実行することは難しい問題である。本研究では,トランスアーキテクチャとエッジデバイスの多種多様なセットの設計空間におけるハードウェア性能測定をプロファイリングするProTranというフレームワークを提案する。このプロファイラを,提案する共同設計手法と組み合わせて,与えられたタスクの精度が高く,レイテンシ,エネルギー消費,ピーク電力ドローを最小化し,エッジ展開を可能にする最善のモデルを得る。精度とハードウェア性能を協調最適化するためのフレームワークをEdgeTranと呼ぶ。最高のトランスフォーマーモデルとエッジデバイスペアを検索します。最後にgptranを提案する。gptranは、ハードウェアを意識した方法で精度をさらに向上させる、マルチステージのブロックレベルの成長後処理ステップである。得られたトランスモデルは2.8$\times$小さく、ベースライン(BERT-Base)よりも0.8%高いGLUEスコアを持つ。選択されたエッジデバイス上での推論により、15.0%のレイテンシ、10.0$\times$低エネルギー、および10.8$\times$低ピークパワードローが可能となる。

Automated design of efficient transformer models has recently attracted significant attention from industry and academia. However, most works only focus on certain metrics while searching for the best-performing transformer architecture. Furthermore, running traditional, complex, and large transformer models on low-compute edge platforms is a challenging problem. In this work, we propose a framework, called ProTran, to profile the hardware performance measures for a design space of transformer architectures and a diverse set of edge devices. We use this profiler in conjunction with the proposed co-design technique to obtain the best-performing models that have high accuracy on the given task and minimize latency, energy consumption, and peak power draw to enable edge deployment. We refer to our framework for co-optimizing accuracy and hardware performance measures as EdgeTran. It searches for the best transformer model and edge device pair. Finally, we propose GPTran, a multi-stage block-level grow-and-prune post-processing step that further improves accuracy in a hardware-aware manner. The obtained transformer model is 2.8$\times$ smaller and has a 0.8% higher GLUE score than the baseline (BERT-Base). Inference with it on the selected edge device enables 15.0% lower latency, 10.0$\times$ lower energy, and 10.8$\times$ lower peak power draw compared to an off-the-shelf GPU.

翻訳日:2023-03-27 16:14:31 公開日:2023-03-24

# 潜流拡散モデルを用いた条件付き画像・映像生成

Conditional Image-to-Video Generation with Latent Flow Diffusion Models ( http://arxiv.org/abs/2303.13744v1 )

ライセンス: Link先を確認

Haomiao Ni, Changhao Shi, Kai Li, Sharon X. Huang, Martin Renqiang Min

(参考訳) 条件付き画像合成(cI2V)は、画像(例えば、人の顔)と条件(例えば、笑顔のようなアクションクラスラベル)から始まる新しい可視ビデオの合成を目的としている。 cI2Vタスクの鍵となる課題は、与えられた画像と条件に対応する現実的な空間的外観と時間的ダイナミクスの同時生成である。本稿では,所定の条件に基づいて潜時空間内の光流列を合成し,所定の画像をワープする新しい潜時流拡散モデル(LFDM)を用いたcI2Vのアプローチを提案する。従来の直接合成法と比較して,提案するLFDMは,与えられた画像の空間的内容を完全に活用し,生成した時間的コヒーレントな流れに応じて潜時空間でワープすることで,空間的詳細と時間的動きをよりよく合成することができる。 LFDMの訓練は,(1)映像フレーム間の潜時流を推定するフロー予測器を含む空間コンテンツ生成のための潜時流自動エンコーダを訓練する教師なし学習段階と,(2)時間潜時流生成のための3D-UNetベースの拡散モデル(DM)を訓練する条件付き学習段階とからなる。従来の画素空間や時間的情報を扱う潜在特徴空間で動作するDMとは異なり、われわれのLFDMのDMは動作生成のための低次元の潜在フロー空間を学習するだけで、より計算効率がよい。複数のデータセットに対して総合的な実験を行い、LFDMは先行技術より一貫して優れています。さらに,LFDMは画像デコーダを微調整することで,新しい領域に容易に適応できることを示す。私たちのコードはhttps://github.com/nihaomiao/CVPR23_LFDMで利用可能です。

Conditional image-to-video (cI2V) generation aims to synthesize a new plausible video starting from an image (e.g., a person's face) and a condition (e.g., an action class label like smile). The key challenge of the cI2V task lies in the simultaneous generation of realistic spatial appearance and temporal dynamics corresponding to the given image and condition. In this paper, we propose an approach for cI2V using novel latent flow diffusion models (LFDM) that synthesize an optical flow sequence in the latent space based on the given condition to warp the given image. Compared to previous direct-synthesis-based works, our proposed LFDM can better synthesize spatial details and temporal motion by fully utilizing the spatial content of the given image and warping it in the latent space according to the generated temporally coherent flow. The training of LFDM consists of two separate stages: (1) an unsupervised learning stage to train a latent flow auto-encoder for spatial content generation, including a flow predictor to estimate latent flow between pairs of video frames, and (2) a conditional learning stage to train a 3D-UNet-based diffusion model (DM) for temporal latent flow generation. Unlike previous DMs operating in pixel space or in a latent feature space that couples spatial and temporal information, the DM in our LFDM only needs to learn a low-dimensional latent flow space for motion generation, thus being more computationally efficient. We conduct comprehensive experiments on multiple datasets, where LFDM consistently outperforms prior art. Furthermore, we show that LFDM can be easily adapted to new domains by simply finetuning the image decoder. Our code is available at https://github.com/nihaomiao/CVPR23_LFDM.

翻訳日:2023-03-27 16:14:06 公開日:2023-03-24
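
The central operation in the abstract above is warping a given image according to a flow field. The sketch below shows backward warping on a small 2D grid with nearest-neighbor sampling. It is a simplification: LFDM warps latent feature maps with a generated flow, whereas this example uses plain Python lists, a hand-written flow, and border clamping chosen for illustration.

```python
def warp(image, flow):
    """Backward-warp a 2D grid.

    output[y][x] = image[y + dy][x + dx], where (dx, dy) = flow[y][x].
    Source coordinates are rounded to the nearest cell and clamped to the
    image borders.
    """
    height, width = len(image), len(image[0])
    output = []
    for y in range(height):
        row = []
        for x in range(width):
            dx, dy = flow[y][x]
            source_x = min(max(int(round(x + dx)), 0), width - 1)
            source_y = min(max(int(round(y + dy)), 0), height - 1)
            row.append(image[source_y][source_x])
        output.append(row)
    return output


image = [[1, 2, 3], [4, 5, 6]]
shift_left = [[(1, 0)] * 3 for _ in range(2)]  # every pixel samples its right neighbor
print(warp(image, shift_left))
```

Generating one such flow field per output frame, rather than the pixels themselves, is what lets the given image supply the spatial detail.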

# teglo: シングルビュー画像からの高忠実度標準テクスチャマッピング

TEGLO: High Fidelity Canonical Texture Mapping from Single-View Images ( http://arxiv.org/abs/2303.13743v1 )

ライセンス: Link先を確認

Vishal Vinod, Tanmay Shah, Dmitry Lagun

(参考訳) 最近のneural fields(nfs)の研究は、クラス固有のシングルビューイメージコレクションから3d表現を学習している。しかし、高周波情報を保存した入力データを再構築することはできない。さらに, これらの手法は外観を幾何学から切り離さないため, テクスチャ転送や編集といった作業には適さない。本研究では,オブジェクトのクラスに対して,単一のビューから3次元表現を学習するためのEGLO(Textured EG3D-GLO)を提案する。我々は, 条件付きニューラルレージアンスフィールド(NeRF)を, 明示的な3次元監視なしに訓練することで実現した。 2次元正準空間への密接な対応写像を作成することにより,この手法を編集機能と対応づける。このようなマッピングは,共有トポロジを持つメッシュを必要とせずに,テクスチャ転送とテクスチャ編集を可能にする。我々の重要な洞察は、入力画像ピクセルをテクスチャ空間にマッピングすることで、ほぼ完璧に再現できる(>=74dB PSNR at 1024^2 resolution)。提案方式により,メガピクセル画像解像度で高周波数詳細を持つ高品質な3次元一貫性を有する新規ビュー合成が可能となる。

Recent work in Neural Fields (NFs) learns 3D representations from class-specific single-view image collections. However, these methods are unable to reconstruct the input data while preserving high-frequency details. Further, they do not disentangle appearance from geometry and hence are not suitable for tasks such as texture transfer and editing. In this work, we propose TEGLO (Textured EG3D-GLO) for learning 3D representations from single-view in-the-wild image collections for a given class of objects. We accomplish this by training a conditional Neural Radiance Field (NeRF) without any explicit 3D supervision. We equip our method with editing capabilities by creating a dense correspondence mapping to a 2D canonical space. We demonstrate that such a mapping enables texture transfer and texture editing without requiring meshes with shared topology. Our key insight is that by mapping the input image pixels onto the texture space we can achieve near-perfect reconstruction (>= 74 dB PSNR at 1024^2 resolution). Our formulation allows for high-quality, 3D-consistent novel view synthesis with high-frequency details at megapixel image resolution.

翻訳日:2023-03-27 16:13:31 公開日:2023-03-24
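
The reconstruction quality above is quoted in dB of PSNR, defined as 10 log10(MAX^2 / MSE). The sketch below computes it for flat lists of pixel values. The peak value of 255 assumes 8-bit images; the abstract does not state the range used in the paper.

```python
import math


def psnr(reference, reconstruction, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally long pixel lists."""
    squared_errors = [(a - b) ** 2 for a, b in zip(reference, reconstruction)]
    mse = sum(squared_errors) / len(squared_errors)
    if mse == 0:
        return math.inf  # identical signals
    return 10.0 * math.log10(max_value ** 2 / mse)


reference = [10, 20, 30, 40]
reconstruction = [11, 19, 30, 42]
print(round(psnr(reference, reconstruction), 2))
```

Each additional 10 dB corresponds to a tenfold reduction in mean squared error, so a value around 74 dB implies a very small reconstruction error relative to the peak value.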

# 量子鍵分布プロトコルの準備と測定における安全な距離範囲向上のためのデッドタイム最適化

Dead-time optimization to increase secure distance range in prepare and measure quantum key distribution protocols ( http://arxiv.org/abs/2303.13742v1 )

ライセンス: Link先を確認

Carlos Wiechers, J.L. Lucio, X\'ochitl S\'anchez-Lozano, Rafael G\'omez-Medina, Mariana Salado-Mej\'ia

(参考訳) アフターパルスは、離散可変量子鍵分布系がセキュアである距離を制限する要因であり、単光子検出器で共通の特徴である。この現象の関連性は、確率的かつ自己相互作用的な性質と、その速度が雪崩の回数に比例して上昇し、量子ビット誤差率が増加するという事実に起因している。ここでは、残差補正がデッドタイム値に依存するデッドタイムおよびアフターパルス補正を含む効果的な解析モデルを提案する。このモデルは、ゲート単一光子検出器を用いた量子鍵分布プロトコル(標準およびデコイ版)の準備と測定の性能を評価するのに有用である。このモデルは、安全な通信のために全距離範囲にわたって秘密鍵レートを数値的に最適化する表現を提供し、量子ビット誤り率とセキュア鍵レートの計算を可能にする。従来の方法では、デッドタイム値は距離に関係なく固定され、高い動作周波数でより関係のある残脈効果によりチャネルの距離範囲が制限される。ここでは、デッドタイム値の最適化により、秘密鍵を共有するためのチャネル距離が増加することを示す。

Afterpulsing is a factor limiting the distance over which discrete-variable quantum key distribution systems are secure, and a common feature in single-photon detectors. The relevance of this phenomenon stems from its stochastic, self-interacting nature and the fact that its rate rises with the number of avalanche events, which increases the quantum bit error rate. Here we introduce an effective analytic model, including dead-time and afterpulsing corrections, where afterpulsing correction depends on dead-time value. This model is useful to evaluate the performance of prepare and measure quantum key distribution protocols (standard and decoy versions) that use gated single photon detectors. The model provides an expression to numerically optimize the secret key rate over the full distance range for secure communication, enabling in this way the calculation of quantum bit error rate and secure key rate. In the conventional procedure, the dead-time value is fixed regardless of distance, limiting the distance range of the channel due to remaining afterpulsing effects, which are more relevant at higher operating frequencies. Here we demonstrate that optimizing the dead-time values increases the distance range of the channel to share secret keys.

翻訳日:2023-03-27 16:13:13 公開日:2023-03-24

# mowe:複数の悪天候除去のための気象専門家の混合

MoWE: Mixture of Weather Experts for Multiple Adverse Weather Removal ( http://arxiv.org/abs/2303.13739v1 )

ライセンス: Link先を確認

Yulin Luo, Rui Zhao, Xiaobao Wei, Jinwei Chen, Yijie Lu, Shenghao Xie, Tianyu Wang, Ruiqin Xiong, Ming Lu, Shanghang Zhang

(参考訳) 現在、ほとんどの悪天候除去タスクは、デライニング、デリーディング、デヘイジングなど、独立して処理されている。しかし、自律運転の場合、天候の種類、強度、混合度は不明であり、分離されたタスク設定はこれらの複雑な条件をうまく扱えない。さらに、自動運転におけるビジョンアプリケーションは、しばしば高レベルなタスクを目標としているが、既存の気象除去手法では、知覚的タスクのパフォーマンスと信号の忠実度の関係を無視している。この目的のために,上流タスクにおいて,複雑な気象除去を扱うための新しい気象専門家(mowe)トランスフォーマフレームワークである \textbf{mixture of weather experts (mowe)を提案する。我々は,天気予報時に天気ラベルを必要とせずに,天気予報に関係のある専門家を対象とする「気象予報ルーター」を設計した。多様な気象条件に対処するため,我々は隣接するトークン間で情報を融合する \textbf{multi-scale experts} を提案する。下流タスクでは, セマンティックラベルを必要とせずに, 画像処理モデルの出力が高レベルの認識タスクに適しているかを測定するために, テキストbf{Label-free Perception-aware Metric}を提案する。我々は、既存の手法の複数の天候除去性能をベンチマークするために、自律運転シナリオに対して構文データセット \textbf{MAW-Sim} を収集する。私たちのmoweは,提案するデータセットと2つのパブリックデータセット,すなわち全天候と降雨/フォグ・シティスケープにおける上流タスクにおけるsoma性能を実現し,他の手法と比較して下流セグメンテーションタスクにおける知覚結果も向上する。私たちのコードとデータセットは受け入れてからリリースされます。

Currently, most adverse weather removal tasks, such as deraining, desnowing, and dehazing, are handled independently. However, in autonomous driving scenarios, the type, intensity, and mixing degree of the weather are unknown, so the separated task setting cannot deal with these complex conditions well. Besides, the vision applications in autonomous driving often aim at high-level tasks, but existing weather removal methods neglect the connection between performance on perceptual tasks and signal fidelity. To this end, for the upstream task, we propose a novel Mixture of Weather Experts (MoWE) Transformer framework to handle complex weather removal in a perception-aware fashion. We design a Weather-aware Router to make the experts more relevant to specific weather types without requiring weather type labels during inference. To handle diverse weather conditions, we propose Multi-scale Experts to fuse information among neighboring tokens. For the downstream task, we propose a Label-free Perception-aware Metric to measure whether the outputs of image processing models are suitable for high-level perception tasks without requiring semantic labels. We collect a synthetic dataset, MAW-Sim, for autonomous driving scenarios to benchmark the multiple weather removal performance of existing methods. Our MoWE achieves SOTA performance on the upstream task on the proposed dataset and two public datasets, i.e., All-Weather and Rain/Fog-Cityscapes, and also has better perceptual results on the downstream segmentation task compared to other methods. Our code and datasets will be released after acceptance.

翻訳日:2023-03-27 16:12:52 公開日:2023-03-24
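
The abstract above describes routing inputs to weather-specific experts but does not say how the Weather-aware Router is built. The sketch below shows generic top-k mixture-of-experts gating, which is one common way to realize such routing and is not claimed to be the paper's design. Scalar expert outputs, the `top_k` parameter, and the function names are assumptions.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route(router_scores, expert_outputs, top_k=2):
    """Combine the top_k experts' outputs, weighted by renormalized gates."""
    gates = softmax(router_scores)
    chosen = sorted(range(len(gates)), key=lambda i: gates[i], reverse=True)[:top_k]
    norm = sum(gates[i] for i in chosen)
    return sum(gates[i] / norm * expert_outputs[i] for i in chosen)


# Hypothetical router scores for three experts (e.g. rain, snow, haze).
print(route([2.0, 0.5, -1.0], [0.8, 0.3, 0.1], top_k=2))
```

Since the gates come from the input itself, no weather label is needed at inference time, which matches the property the abstract emphasizes.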

# GQMモデルに基づく機械学習のためのデータセットのライセンスに関する研究

An investigation of licensing of datasets for machine learning based on the GQM model ( http://arxiv.org/abs/2303.13735v1 )

ライセンス: Link先を確認

Junyu Chen, Norihiro Yoshida, Hiroaki Takada

(参考訳) データセットのライセンスは現在、機械学習システムの開発において問題となっている。そして、機械学習システムの開発において、最も広く使われているのは、利用可能なデータセットである。しかし、公開されているデータセット内の画像は主にインターネットから取得されているため、いくつかの画像は商業的に利用できない。さらに、機械学習システムの開発者は、機械学習モデルをトレーニングする際にデータセットのライセンスを気にしないことが多い。要約すると、機械学習システムのためのデータセットのライセンスは、この段階であらゆる面で不完全である。 2つのコレクションデータセットを調査した結果、現在のデータセットのほとんどはライセンスが欠如しており、ライセンスが欠如しているため、データセットの商用可用性が決定できないことが分かった。そこで、より科学的かつ体系的なアプローチで、データセットのライセンスと、データセットを用いた機械学習システムのライセンスについて調査し、機械学習システムの将来の開発者にとって、より簡単かつコンプライアンスの高いものにすることを決定した。

Dataset licensing is currently an issue in the development of machine learning systems, where publicly available datasets are the most widely used. However, since the images in publicly available datasets are mainly obtained from the Internet, some images are not available for commercial use. Furthermore, developers of machine learning systems often do not pay attention to the license of a dataset when training machine learning models with it. In summary, the licensing of datasets for machine learning systems is currently incomplete in all aspects. Our investigation of two collection datasets revealed that most of the current datasets lacked licenses, and the lack of licenses made it impossible to determine whether the datasets could be used commercially. Therefore, we decided to take a more scientific and systematic approach to investigating the licensing of datasets and the licensing of machine learning systems that use them, to make licensing easier and more compliant for future developers of machine learning systems.

翻訳日:2023-03-27 16:12:19 公開日:2023-03-24

# 視覚トランスフォーマーの注意はどのように働くのか? Visual Analyticsの試み

How Does Attention Work in Vision Transformers? A Visual Analytics Attempt ( http://arxiv.org/abs/2303.13731v1 )

ライセンス: Link先を確認

Yiran Li, Junpeng Wang, Xin Dai, Liang Wang, Chin-Chia Michael Yeh, Yan Zheng, Wei Zhang, Kwan-Liu Ma

(参考訳) vision transformer (vit) は、逐次データから画像へトランスフォーマーモデルの成功を広げる。モデルは画像を多数の小さなパッチに分解し、それらをシーケンスに配置する。マルチヘッドの自己注意をシーケンスに適用し、パッチ間の注意を学習する。シーケンシャルデータに対するトランスフォーマーの解釈は成功したが、ViTの解釈にはほとんど取り組みがなく、多くの疑問は未解決のままである。例えば、多くの注目層の中で、どちらが重要なのか? 個々のパッチは、異なる頭の空間的隣人にどれだけ強いか? 個々の頭がどのような注意パターンを学んだか? 本研究では、視覚分析手法を用いてこれらの質問に答える。具体的には、まず、複数のプルーニングベースのメトリクスを導入することで、ViTにおいてどのヘッドがより重要かを特定する。次に,各頭部のパッチ間における注目強度の空間分布と,注目層間における注目強度の傾向を考察した。第3に、オートエンコーダに基づく学習ソリューションを用いて、個々の頭が学習できるすべての注意パターンを要約する。重要な頭部の注意力とパターンを調べることで、なぜ重要なのかを答える。複数のViTについて経験豊富な深層学習の専門家との具体的なケーススタディを通じて、頭の重要性、注意力、注意パターンからViTの理解を深めるソリューションの有効性を検証する。

Vision transformer (ViT) expands the success of transformer models from sequential data to images. The model decomposes an image into many smaller patches and arranges them into a sequence. Multi-head self-attention is then applied to the sequence to learn the attention between patches. Despite many successful interpretations of transformers on sequential data, little effort has been devoted to the interpretation of ViTs, and many questions remain unanswered. For example, among the numerous attention heads, which ones are more important? How strongly do individual patches attend to their spatial neighbors in different heads? What attention patterns have individual heads learned? In this work, we answer these questions through a visual analytics approach. Specifically, we first identify which heads are more important in ViTs by introducing multiple pruning-based metrics. Then, we profile the spatial distribution of attention strengths between patches inside individual heads, as well as the trend of attention strengths across attention layers. Third, using an autoencoder-based learning solution, we summarize all possible attention patterns that individual heads could learn. By examining the attention strengths and patterns of the important heads, we explain why they are important. Through concrete case studies with experienced deep learning experts on multiple ViTs, we validate the effectiveness of our solution, which deepens the understanding of ViTs in terms of head importance, head attention strength, and head attention pattern.

翻訳日:2023-03-27 16:12:05 公開日:2023-03-24
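
The first step described above, decomposing an image into patches and arranging them into a sequence, can be sketched in a few lines. This minimal example assumes a single-channel image stored as nested lists and a patch size that divides both sides evenly. Real ViT implementations additionally project each patch to an embedding and add positional information, which is omitted here.

```python
def to_patches(image, patch):
    """Split a 2D grid into non-overlapping patch x patch tiles.

    Tiles are returned in row-major order, each flattened row by row.
    """
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image sides must be multiples of the patch size")
    return [
        [image[y + dy][x + dx] for dy in range(patch) for dx in range(patch)]
        for y in range(0, height, patch)
        for x in range(0, width, patch)
    ]


image = [[4 * r + c for c in range(4)] for r in range(4)]
sequence = to_patches(image, 2)
for token in sequence:
    print(token)
```

Self-attention then operates on this sequence, so an attention weight between two tokens can be read as attention between two image regions, which is what makes the spatial-neighbor question in the abstract well defined.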

# ブロックチェーンを用いたセキュア・プライベートフェデレーション学習に関する調査--資源制約型コンピューティングの理論と応用

A Survey on Secure and Private Federated Learning Using Blockchain: Theory and Application in Resource-constrained Computing ( http://arxiv.org/abs/2303.13727v1 )

ライセンス: Link先を確認

Ervin Moore, Ahmed Imteaj, Shabnam Rezapour, M. Hadi Amini

(参考訳) 近年、高度な機械学習と人工知能の急速なブームと、新たなセキュリティとプライバシーの脅威によって、連合学習(federated learning:fl)が広く普及している。 FLは、機密データをエンティティに公開することなく、エッジデバイスのローカルデータストレージから効率的なモデル生成を可能にする。このパラダイムは、ユーザの機密データのプライバシー問題を部分的に緩和するが、FLプロセスのパフォーマンスは脅威となり、サイバー脅威やプライバシー侵害技術の増加によりボトルネックに達する。 FLプロセスの普及を早めるために、FL環境のためのブロックチェーンの統合は、アカデミックや業界の人々から多くの注目を集めている。ブロックチェーンは、分散化、不変性、コンセンサス、透明性特性によって、セキュリティとプライバシの脅威を防止する可能性がある。しかし、ブロックチェーンメカニズムが高価な計算リソースを必要とする場合、リソースに制約のあるFLクライアントはトレーニングに関わらない。これを踏まえて、この調査は、リソース制約付きfl環境におけるブロックチェーンの展開成功の課題、ソリューション、今後の方向性のレビューに焦点を当てている。 FLプロセスに適したさまざまなブロックチェーンメカニズムを包括的にレビューし、限られたリソース予算に対するトレードオフについて議論する。さらに、リソース制限されたFL環境で観測できるサイバー脅威を広範囲に分析し、ブロックチェーンがサイバー攻撃を阻止する重要な役割を担っているかを分析します。この目的のために、高レベルの信頼性、データプライバシ、分散コンピューティングパフォーマンスを提供するブロックチェーンとフェデレーション付き学習の結合に対する潜在的なソリューションを強調します。

Federated Learning (FL) has gained widespread popularity in recent years due to the rapid growth of advanced machine learning and artificial intelligence, along with emerging security and privacy threats. FL enables efficient model generation from the local data storage of edge devices without revealing sensitive data to any other entity. While this paradigm partly mitigates the privacy issues of users' sensitive data, the performance of the FL process can be threatened and can reach a bottleneck due to growing cyber threats and privacy violation techniques. To expedite the proliferation of the FL process, the integration of blockchain into FL environments has drawn considerable attention from academia and industry. Blockchain has the potential to prevent security and privacy threats through its decentralization, immutability, consensus, and transparency characteristics. However, if the blockchain mechanism requires costly computational resources, then resource-constrained FL clients cannot be involved in the training. Considering that, this survey focuses on reviewing the challenges, solutions, and future directions for the successful deployment of blockchain in resource-constrained FL environments. We comprehensively review various blockchain mechanisms that are suitable for the FL process and discuss their trade-offs under a limited resource budget. Further, we extensively analyze the cyber threats that could be observed in a resource-constrained FL environment and how blockchain can play a key role in blocking those cyber attacks. To this end, we highlight some potential solutions for coupling blockchain and federated learning that can offer high levels of reliability, data privacy, and distributed computing performance.

翻訳日:2023-03-27 16:11:45 公開日:2023-03-24
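
The abstract above notes that FL builds a model from data that stays on edge devices. A common way to do this is federated averaging, where a server combines locally trained weights in proportion to each client's data size. The sketch below shows that aggregation step only. It is a generic illustration and is not taken from the survey; the flat weight lists and function name are assumptions, and no blockchain component is modeled.

```python
def federated_average(client_weights, client_sizes):
    """Average client weight vectors, weighted by local dataset size."""
    total = sum(client_sizes)
    dimension = len(client_weights[0])
    return [
        sum(weights[j] * size for weights, size in zip(client_weights, client_sizes)) / total
        for j in range(dimension)
    ]


# Two hypothetical clients: only weights and sample counts leave the device.
clients = [[1.0, 2.0], [3.0, 4.0]]
sizes = [1, 3]
print(federated_average(clients, sizes))
```

Only model weights and sample counts are shared in this step, which is why the raw data need not be revealed, although the weights themselves can still leak information.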

# イベント誘導ビデオスーパーリゾリューションのための空間的暗黙的ニューラル表現の学習

Learning Spatial-Temporal Implicit Neural Representations for Event-Guided Video Super-Resolution ( http://arxiv.org/abs/2303.13767v1 )

ライセンス: Link先を確認

Yunfan Lu, Zipeng Wang, Minjie Liu, Hongjian Wang, Lin Wang

(参考訳) イベントカメラは、強度変化を非同期に検知し、高いダイナミックレンジと低レイテンシでイベントストリームを生成する。これは、挑戦的なビデオ超解像(VSR)タスクを導くためにイベントを利用する研究にインスピレーションを与えている。本稿では,イベントの高時間分解能の利点を生かして,ランダムスケールでのVSRの実現という新たな課題に対処する試みを行う。これは、VSRを導く際の事象の時空間的情報を表現することが困難である。そこで本稿では,イベントの時空間補間を統合されたフレームワークでVSRに組み込む新しいフレームワークを提案する。我々のキーとなる考え方は、探索された時空間座標とRGBフレームとイベントの両方の特徴から暗黙の神経表現を学ぶことである。本手法は3つの部分を含む。具体的には、Spatial-Temporal Fusion (STF)モジュールは、まずイベントとRGBフレームから3D特徴を学習する。そして、時間フィルタ(TF)モジュールは、クエリされたタイムスタンプ近くのイベントからより明示的な動作情報をアンロックし、2D特徴を生成する。最後に、Spatial Temporal Implicit Representation (STIR)モジュールは、これらの2つのモジュールの出力から任意の解像度でSRフレームを復元する。さらに、空間的に整列したイベントとRGBフレームを持つ実世界のデータセットを収集する。大規模な実験により,本手法は先行技術を大きく上回り,ランダムスケールのVSR(例えば6.5。コードとデータセットはhttps: //vlis2022.github.io/cvpr23/egvsrで入手できる。

Event cameras sense intensity changes asynchronously and produce event streams with high dynamic range and low latency. This has inspired research endeavors utilizing events to guide the challenging video super-resolution (VSR) task. In this paper, we make the first attempt to address a novel problem of achieving VSR at random scales by taking advantage of the high temporal resolution of events. This is hampered by the difficulty of representing the spatial-temporal information of events when guiding VSR. To this end, we propose a novel framework that incorporates the spatial-temporal interpolation of events into VSR in a unified framework. Our key idea is to learn implicit neural representations from queried spatial-temporal coordinates and features from both RGB frames and events. Our method contains three parts. Specifically, the Spatial-Temporal Fusion (STF) module first learns the 3D features from events and RGB frames. Then, the Temporal Filter (TF) module unlocks more explicit motion information from the events near the queried timestamp and generates the 2D features. Lastly, the Spatial-Temporal Implicit Representation (STIR) module recovers the SR frame at arbitrary resolutions from the outputs of these two modules. In addition, we collect a real-world dataset with spatially aligned events and RGB frames. Extensive experiments show that our method significantly surpasses prior art and achieves VSR with random scales, e.g., 6.5. Code and dataset are available at https://vlis2022.github.io/cvpr23/egvsr.

Translated: 2023-03-27 16:05:08 Published: 2023-03-24

# GQE-Net: A Graph-based Quality Enhancement Network for Point Cloud Color Attribute ( http://arxiv.org/abs/2303.13764v1 )

License: see the linked page

Jinrui Xing, Hui Yuan, Raouf Hamzaoui, Hao Liu, and Junhui Hou

In recent years, point clouds have become increasingly popular for representing three-dimensional (3D) visual objects and scenes. To efficiently store and transmit point clouds, compression methods have been developed, but they often degrade quality. To reduce color distortion in point clouds, we propose a graph-based quality enhancement network (GQE-Net) that uses geometry information as an auxiliary input and graph convolution blocks to extract local features efficiently. Specifically, we use a parallel-serial graph attention module with a multi-head graph attention mechanism to focus on important points or features and help them fuse together. Additionally, we design a feature refinement module that takes into account the normals and geometric distance between points. To work within the limits of GPU memory capacity, the distorted point cloud is divided into overlap-allowed 3D patches, which are sent to GQE-Net for quality enhancement. To account for differences in data distribution among the color components, three models are trained for the three color components. Experimental results show that our method achieves state-of-the-art performance. For example, when implementing GQE-Net on the recent G-PCC coding standard test model, 0.43 dB, 0.25 dB, and 0.36 dB Bjontegaard delta (BD)-peak-signal-to-noise ratio (PSNR), corresponding to 14.0%, 9.3%, and 14.5% BD-rate savings, can be achieved on dense point clouds for the Y, Cb, and Cr components, respectively.
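The gains quoted in this abstract are differences in PSNR per color component. As a minimal sketch of that underlying metric only (not of the Bjontegaard delta computation and not of GQE-Net itself), the following computes PSNR for one 8-bit component. The function name `psnr` and the sample values are illustrative assumptions.

```python
import math

def psnr(reference, distorted, peak=255.0):
    """PSNR in dB between two equal-length sequences of one color component."""
    if len(reference) != len(distorted) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all points of this component.
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals
    return 10.0 * math.log10(peak ** 2 / mse)

# Hypothetical luma (Y) values of four points before and after compression.
y_ref = [52, 60, 61, 200]
y_dec = [50, 63, 61, 198]
print(round(psnr(y_ref, y_dec), 2))
```

A BD-PSNR figure would then be obtained by comparing two rate-PSNR curves built from several such measurements, which is outside the scope of this sketch.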

Translated: 2023-03-27 16:04:43 Published: 2023-03-24

# Edge-free but Structure-aware: Prototype-Guided Knowledge Distillation from GNNs to MLPs ( http://arxiv.org/abs/2303.13763v1 )

License: see the linked page

Taiqiang Wu, Zhe Zhao, Jiahao Wang, Xingyu Bai, Lei Wang, Ngai Wong, Yujiu Yang

Distilling high-accuracy Graph Neural Networks (GNNs) to low-latency multilayer perceptrons (MLPs) on graph tasks has become a hot research topic. However, MLPs rely exclusively on node features and fail to capture graph structural information. Previous methods address this issue by processing graph edges into extra inputs for MLPs, but such graph structures may be unavailable in various scenarios. To this end, we propose a Prototype-Guided Knowledge Distillation (PGKD) method, which does not require graph edges (edge-free) yet learns structure-aware MLPs. Specifically, we analyze the graph structural information in GNN teachers, and distill such information from GNNs to MLPs via prototypes in an edge-free setting. Experimental results on popular graph benchmarks demonstrate the effectiveness and robustness of the proposed PGKD.
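The abstract does not say how the prototypes are built. A common choice, assumed here purely for illustration, is the per-class mean of the teacher's node embeddings, which needs labels but no edges. The helper names and the toy embeddings below are hypothetical.

```python
from collections import defaultdict

def class_prototypes(embeddings, labels):
    """Return {label: mean embedding}; no graph edges are used."""
    sums = {}
    counts = defaultdict(int)
    for vec, lab in zip(embeddings, labels):
        if lab not in sums:
            sums[lab] = [0.0] * len(vec)
        sums[lab] = [s + v for s, v in zip(sums[lab], vec)]
        counts[lab] += 1
    return {lab: [s / counts[lab] for s in total] for lab, total in sums.items()}

def squared_distance(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

# Hypothetical 2-D teacher embeddings for four nodes in two classes.
teacher_embeddings = [[1.0, 0.0], [3.0, 0.0], [0.0, 2.0], [0.0, 4.0]]
labels = [0, 0, 1, 1]
protos = class_prototypes(teacher_embeddings, labels)
inter_class = squared_distance(protos[0], protos[1])
print(protos, inter_class)
```

A student MLP could then be encouraged to reproduce the teacher's node-to-prototype and prototype-to-prototype distances, though the exact losses used by PGKD are not given in the abstract.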

Translated: 2023-03-27 16:04:08 Published: 2023-03-24

# Unsupervised Hierarchical Domain Adaptation for Adverse Weather Optical Flow ( http://arxiv.org/abs/2303.13761v1 )

License: see the linked page

Hanyu Zhou, Yi Chang, Gang Chen, Luxin Yan

Optical flow estimation has made great progress, but usually degrades under adverse weather. Although semi-supervised and fully supervised methods have made good attempts, the domain shift between synthetic and real adverse weather images deteriorates their performance. To alleviate this issue, our starting point is to transfer knowledge from the clean source domain to the degraded target domain in an unsupervised manner. Our key insight is that adverse weather does not change the intrinsic optical flow of the scene, but causes a significant difference in the warp error between clean and degraded images. In this work, we propose the first unsupervised framework for adverse weather optical flow via hierarchical motion-boundary adaptation. Specifically, we first employ image translation to construct the transformation relationship between clean and degraded domains. In motion adaptation, we use flow consistency knowledge to align the cross-domain optical flows in a motion-invariant common space, where the optical flow from clean weather is used as guidance knowledge to obtain a preliminary optical flow for adverse weather. Furthermore, we leverage the warp error inconsistency, which measures the motion misalignment of the boundary between the clean and degraded domains, and propose a joint intra- and inter-scene boundary contrastive adaptation to refine the motion boundary. The hierarchical motion and boundary adaptation jointly promote optical flow in a unified framework. Extensive quantitative and qualitative experiments have been performed to verify the superiority of the proposed method.

Translated: 2023-03-27 16:03:53 Published: 2023-03-24

# Structural Imbalance Aware Graph Augmentation Learning ( http://arxiv.org/abs/2303.13757v1 )

License: see the linked page

Zulong Liu, Kejia-Chen, Zheng Liu

Graph machine learning (GML) has made great progress in node classification, link prediction, graph classification, and so on. However, real-world graphs are often structurally imbalanced; that is, only a few hub nodes have a denser local structure and higher influence. This imbalance may compromise the robustness of existing GML models, especially when learning tail nodes. This paper proposes a selective graph augmentation method (SAug) to solve this problem. First, a PageRank-based sampling strategy is designed to identify hub nodes and tail nodes in the graph. Second, a selective augmentation strategy is proposed, which drops the noisy neighbors of hub nodes on one side, and discovers latent neighbors and generates pseudo neighbors for tail nodes on the other side. It can also alleviate the structural imbalance between the two types of nodes. Finally, a GNN model is retrained on the augmented graph. Extensive experiments demonstrate that SAug significantly improves the backbone GNNs and achieves superior performance to competing graph augmentation methods and hub/tail-aware methods.
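The first step, separating hub nodes from tail nodes by PageRank, can be sketched as follows. This is a minimal illustration under stated assumptions: plain power-iteration PageRank, and a fixed top fraction of nodes treated as hubs. The abstract does not describe the actual sampling rule, so `split_hub_tail` and `hub_fraction` are hypothetical.

```python
def pagerank(adjacency, damping=0.85, iterations=100):
    """Power-iteration PageRank on a graph given as {node: [neighbors]}."""
    nodes = list(adjacency)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        new_rank = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            neighbors = adjacency[v]
            if neighbors:
                # Each node passes an equal share of its rank to every neighbor.
                share = damping * rank[v] / len(neighbors)
                for u in neighbors:
                    new_rank[u] += share
            else:
                # Dangling node: spread its rank uniformly over all nodes.
                for u in nodes:
                    new_rank[u] += damping * rank[v] / n
        rank = new_rank
    return rank

def split_hub_tail(adjacency, hub_fraction=0.2):
    """Treat the top `hub_fraction` of nodes by PageRank as hubs, the rest as tails."""
    rank = pagerank(adjacency)
    ordered = sorted(rank, key=rank.get, reverse=True)
    num_hubs = max(1, int(len(ordered) * hub_fraction))
    return ordered[:num_hubs], ordered[num_hubs:]

# Toy star graph: node 0 is connected to every other node.
graph = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}
hubs, tails = split_hub_tail(graph)
print(hubs, sorted(tails))
```

The augmentation step would then remove edges around the nodes in `hubs` and add edges around the nodes in `tails`.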

Translated: 2023-03-27 16:03:28 Published: 2023-03-24

# GP-VTON: Towards General Purpose Virtual Try-on via Collaborative Local-Flow Global-Parsing Learning ( http://arxiv.org/abs/2303.13756v1 )

License: see the linked page

Zhenyu Xie and Zaiyu Huang and Xin Dong and Fuwei Zhao and Haoye Dong and Xijin Zhang and Feida Zhu and Xiaodan Liang

Image-based Virtual Try-ON aims to transfer an in-shop garment onto a specific person. Existing methods employ a global warping module to model the anisotropic deformation of different garment parts, which fails to preserve the semantic information of different parts when receiving challenging inputs (e.g., intricate human poses, difficult garments). Moreover, most of them directly warp the input garment to align with the boundary of the preserved region, which usually requires texture squeezing to meet the boundary shape constraint and thus leads to texture distortion. This inferior performance keeps existing methods from real-world applications. To address these problems and take a step towards real-world virtual try-on, we propose a General-Purpose Virtual Try-ON framework, named GP-VTON, by developing an innovative Local-Flow Global-Parsing (LFGP) warping module and a Dynamic Gradient Truncation (DGT) training strategy. Specifically, compared with the previous global warping mechanism, LFGP employs local flows to warp garment parts individually, and assembles the locally warped results via global garment parsing, resulting in reasonable warped parts and a semantically correct, intact garment even with challenging inputs. On the other hand, our DGT training strategy dynamically truncates the gradient in the overlap area, so the warped garment is no longer required to meet the boundary constraint, which effectively avoids the texture squeezing problem. Furthermore, GP-VTON can be easily extended to the multi-category scenario and jointly trained using data from different garment categories. Extensive experiments on two high-resolution benchmarks demonstrate our superiority over existing state-of-the-art methods.

Translated: 2023-03-27 16:03:10 Published: 2023-03-24

# Sparsifiner: Learning Sparse Instance-Dependent Attention for Efficient Vision Transformers ( http://arxiv.org/abs/2303.13755v1 )

License: see the linked page

Cong Wei and Brendan Duke and Ruowei Jiang and Parham Aarabi and Graham W. Taylor and Florian Shkurti

Vision Transformers (ViT) have shown competitive performance advantages over convolutional neural networks (CNNs), though they often come with high computational costs. To this end, previous methods explore different attention patterns by limiting attention to a fixed number of spatially nearby tokens to accelerate the ViT's multi-head self-attention (MHSA) operations. However, such structured attention patterns limit the token-to-token connections to their spatial relevance, which disregards learned semantic connections from a full attention mask. In this work, we propose a novel approach to learn instance-dependent attention patterns, by devising a lightweight connectivity predictor module to estimate the connectivity score of each pair of tokens. Intuitively, two tokens have high connectivity scores if the features are considered relevant either spatially or semantically. As each token only attends to a small number of other tokens, the binarized connectivity masks are often very sparse by nature and therefore provide the opportunity to accelerate the network via sparse computations. Equipped with the learned unstructured attention pattern, sparse attention ViT (Sparsifiner) produces a superior Pareto-optimal trade-off between FLOPs and top-1 accuracy on ImageNet compared to token sparsity. Our method reduces the FLOPs of MHSA by 48% to 69% while the accuracy drop is within 0.4%. We also show that combining attention and token sparsity reduces ViT FLOPs by over 60%.
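The masking idea can be sketched for a single query token as follows. This is a toy illustration, not the paper's implementation. The connectivity scores are given as plain numbers rather than produced by a learned predictor, and binarizing them by keeping the top `keep` entries is an assumption made here for simplicity.

```python
import math

def sparse_attention_row(scores, connectivity, keep):
    """Softmax over only the `keep` highest-connectivity tokens; others get weight 0."""
    n = len(scores)
    kept = sorted(range(n), key=lambda j: connectivity[j], reverse=True)[:keep]
    mask = [1 if j in kept else 0 for j in range(n)]  # binarized connectivity mask
    max_score = max(scores[j] for j in kept)          # subtract max for numerical stability
    # exp() is evaluated only for kept tokens, which is where the savings come from.
    exps = [math.exp(scores[j] - max_score) if mask[j] else 0.0 for j in range(n)]
    total = sum(exps)
    return mask, [e / total for e in exps]

attention_scores = [2.0, 0.5, 1.0, -1.0]      # query-key dot products for one query token
connectivity_scores = [0.9, 0.1, 0.8, 0.05]   # hypothetical predictor output
mask, weights = sparse_attention_row(attention_scores, connectivity_scores, keep=2)
print(mask, [round(w, 4) for w in weights])
```

In a real model the speed-up would depend on sparse kernels that skip masked token pairs entirely instead of building dense lists as this sketch does.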

Translated: 2023-03-27 16:02:43 Published: 2023-03-24

# EMS-Net: Efficient Multi-Temporal Self-Attention For Hyperspectral Change Detection ( http://arxiv.org/abs/2303.13753v1 )

License: see the linked page

Meiqi Hu, Chen Wu, Bo Du

Hyperspectral change detection plays an essential role in monitoring dynamic urban development and detecting fine-grained object evolution and alteration. In this paper, we propose an original Efficient Multi-temporal Self-attention Network (EMS-Net) for hyperspectral change detection. The designed EMS module cuts the redundancy of similar feature maps that contain no changes, computing efficient multi-temporal change information for a precise binary change map. Besides, to explore the clustering characteristics of change detection, a novel supervised contrastive loss is provided to enhance the compactness of the unchanged. Experiments on two hyperspectral change detection datasets demonstrate the outstanding performance and validity of the proposed method.

Translated: 2023-03-27 16:02:14 Published: 2023-03-24

# Leveraging Old Knowledge to Continually Learn New Classes in Medical Images ( http://arxiv.org/abs/2303.13752v1 )

License: see the linked page

Evelyn Chee, Mong Li Lee, Wynne Hsu

Class-incremental continual learning is a core step towards developing artificial intelligence systems that can continuously adapt to changes in the environment by learning new concepts without forgetting those previously learned. This is especially needed in the medical domain, where continually learning from new incoming data is required to classify an expanded set of diseases. In this work, we focus on how old knowledge can be leveraged to learn new classes without catastrophic forgetting. We propose a framework that comprises two main components: (1) a dynamic architecture with expanding representations to preserve previously learned features and accommodate new features; and (2) a training procedure alternating between two objectives to balance the learning of new features while maintaining the model's performance on old classes. Experimental results on multiple medical datasets show that our solution achieves superior performance over state-of-the-art baselines in terms of class accuracy and forgetting.

Translated: 2023-03-27 16:02:02 Published: 2023-03-24

# LONGNN: Spectral GNNs with Learnable Orthonormal Basis ( http://arxiv.org/abs/2303.13750v1 )

License: see the linked page

Qian Tao, Zhen Wang, Wenyuan Yu, Yaliang Li, Zhewei Wei

In recent years, a plethora of spectral graph neural network (GNN) methods have used polynomial bases with learnable coefficients to achieve top-tier performance on many node-level tasks. Although various kinds of polynomial bases have been explored, each such method adopts a fixed polynomial basis, which might not be the optimal choice for the given graph. Besides, we identify the so-called over-passing issue of these methods and show that it is somewhat rooted in their less-principled regularization strategy and unnormalized basis. In this paper, we make the first attempts to address these two issues. Leveraging Jacobi polynomials, we design a novel spectral GNN, LON-GNN, with Learnable OrthoNormal bases, and prove that regularizing the coefficients then becomes equivalent to regularizing the norm of the learned filter function. We conduct extensive experiments on diverse graph datasets to evaluate the fitting and generalization capability of LON-GNN, where the results imply its superiority.
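To make the orthonormal Jacobi basis concrete, the sketch below evaluates Jacobi polynomials with the standard three-term recurrence and divides each by its L2 norm under the weight (1 - x)^alpha (1 + x)^beta on [-1, 1]. It is an illustration under assumptions, not LON-GNN itself. Here `alpha`, `beta` and the coefficients are fixed inputs rather than learned parameters, the function names are hypothetical, and the mapping of graph Laplacian eigenvalues onto [-1, 1] is not shown.

```python
import math

def jacobi(n, alpha, beta, x):
    """Evaluate the Jacobi polynomial P_n^(alpha, beta)(x) by the three-term recurrence."""
    if n == 0:
        return 1.0
    p_prev = 1.0
    p_curr = (alpha + 1) + (alpha + beta + 2) * (x - 1) / 2.0
    for k in range(2, n + 1):
        c = 2 * k + alpha + beta
        a1 = 2 * k * (k + alpha + beta) * (c - 2)
        a2 = (c - 1) * (c * (c - 2) * x + alpha ** 2 - beta ** 2)
        a3 = 2 * (k + alpha - 1) * (k + beta - 1) * c
        p_prev, p_curr = p_curr, (a2 * p_curr - a3 * p_prev) / a1
    return p_curr

def jacobi_norm(n, alpha, beta):
    """L2 norm of P_n^(alpha, beta); assumes alpha, beta > -1 and alpha + beta != -1."""
    s = alpha + beta
    squared = (2.0 ** (s + 1) / (2 * n + s + 1)) * (
        math.gamma(n + alpha + 1) * math.gamma(n + beta + 1)
        / (math.gamma(n + s + 1) * math.factorial(n))
    )
    return math.sqrt(squared)

def filter_response(coefficients, alpha, beta, x):
    """Filter value at x as a weighted sum of orthonormal Jacobi basis functions."""
    return sum(
        c * jacobi(k, alpha, beta, x) / jacobi_norm(k, alpha, beta)
        for k, c in enumerate(coefficients)
    )

# With alpha = beta = 0 the Jacobi polynomials reduce to Legendre polynomials.
print(jacobi(2, 0, 0, 0.5), filter_response([0.5, 0.3, 0.2], 0, 0, 0.5))
```

Because the basis functions are orthonormal, the sum of squared coefficients equals the squared norm of the filter, which is the equivalence the abstract refers to.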

Translated: 2023-03-27 16:01:45 Published: 2023-03-24

# Progressively Optimized Local Radiance Fields for Robust View Synthesis ( http://arxiv.org/abs/2303.13791v1 )

License: see the linked page

Andreas Meuleman and Yu-Lun Liu and Chen Gao and Jia-Bin Huang and Changil Kim and Min H. Kim and Johannes Kopf

We present an algorithm for reconstructing the radiance field of a large-scale scene from a single casually captured video. The task poses two core challenges. First, most existing radiance field reconstruction approaches rely on accurate pre-estimated camera poses from Structure-from-Motion algorithms, which frequently fail on in-the-wild videos. Second, using a single, global radiance field with finite representational capacity does not scale to longer trajectories in an unbounded scene. To handle unknown poses, we jointly estimate the camera poses and the radiance field in a progressive manner. We show that progressive optimization significantly improves the robustness of the reconstruction. To handle large unbounded scenes, we dynamically allocate new local radiance fields trained with frames within a temporal window. This further improves robustness (e.g., performs well even under moderate pose drift) and allows us to scale to large scenes. Our extensive evaluation on the Tanks and Temples dataset and our collected outdoor dataset, Static Hikes, shows that our approach compares favorably with the state of the art.

Translated: 2023-03-27 15:56:48 Published: 2023-03-24

# Towards Fair Patient-Trial Matching via Patient-Criterion Level Fairness Constraint ( http://arxiv.org/abs/2303.13790v1 )

License: see the linked page

Chia-Yuan Chang, Jiayi Yuan, Sirui Ding, Qiaoyu Tan, Kai Zhang, Xiaoqian Jiang, Xia Hu, Na Zou

Clinical trials are indispensable in developing new treatments, but they face obstacles in patient recruitment and retention, hindering the enrollment of necessary participants. To tackle these challenges, deep learning frameworks have been created to match patients to trials. These frameworks calculate the similarity between patients and clinical trial eligibility criteria, considering the discrepancy between inclusion and exclusion criteria. Recent studies have shown that these frameworks outperform earlier approaches. However, deep learning models may raise fairness issues in patient-trial matching when certain sensitive groups of individuals are underrepresented in clinical trials, leading to incomplete or inaccurate data and potential harm. To tackle the issue of fairness, this work proposes a fair patient-trial matching framework by generating a patient-criterion level fairness constraint. The proposed framework considers the inconsistency between the embeddings of inclusion and exclusion criteria among patients of different sensitive groups. The experimental results on real-world patient-trial and patient-criterion matching tasks demonstrate that the proposed framework can successfully alleviate predictions that tend to be biased.

Translated: 2023-03-27 15:56:28 Published: 2023-03-24

# Application-Driven AI Paradigm for Person Counting in Various Scenarios ( http://arxiv.org/abs/2303.13788v1 )

License: see the linked page

Minjie Hua, Yibing Nan, Shiguo Lian

Person counting is considered a fundamental task in video surveillance. However, the diversity of scenarios in practical applications makes it difficult to use a single person counting model for general use. Consequently, engineers must preview the video stream and manually specify an appropriate person counting model based on the scenario of the camera shot, which is time-consuming, especially for large-scale deployments. In this paper, we propose a person counting paradigm that uses a scenario classifier to automatically select a suitable person counting model for each captured frame. First, the input image is passed through the scenario classifier to obtain a scenario label, which is then used to allocate the frame to one of five fine-tuned models for person counting. Additionally, we present five augmentation datasets collected from different scenarios, including side-view, long-shot, top-view, customized, and crowd, which are also integrated to form a scenario classification dataset containing 26323 samples. In our comparative experiments, the proposed paradigm achieves a better balance than any single model on the integrated dataset, which supports its generalization across various scenarios.
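The classify-then-dispatch flow described above can be sketched as a simple routing table. Everything below is a stand-in. The classifier and the five counters are stubs, and the frame is a plain dictionary. Only the five scenario names come from the abstract.

```python
SCENARIOS = ("side-view", "long-shot", "top-view", "customized", "crowd")

def make_stub_counter(scenario):
    """Placeholder for a fine-tuned counting model; it reports which model ran."""
    def count(frame):
        return {"scenario": scenario, "count": len(frame.get("people", []))}
    return count

# One counter per scenario label, looked up by the classifier's output.
COUNTERS = {name: make_stub_counter(name) for name in SCENARIOS}

def classify_scenario(frame):
    """Placeholder classifier: reads a precomputed hint instead of analysing pixels."""
    return frame.get("scenario_hint", "customized")

def count_people(frame):
    """Route the frame to the counter that matches its predicted scenario."""
    label = classify_scenario(frame)
    return COUNTERS[label](frame)

result = count_people({"scenario_hint": "crowd", "people": ["a", "b", "c"]})
print(result)
```

Keeping the routing in a dictionary means a sixth scenario would only need a new label and a new counter, without touching `count_people`.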

Translated: 2023-03-27 15:56:08 Published: 2023-03-24

# A quantum router model for directional single-photon routing based on pathway coherence and phase difference ( http://arxiv.org/abs/2303.13784v1 )

License: see the linked page

Xu Yang, Lei Tan

A multi-channel quantum router with four nodal cavities is constructed from two coupled-resonator waveguides and four single cavities. We can achieve directional routing by adjusting the probability of the photon exiting from a specified port to close to 100%. There are multiple pathways for the photon from the incident port to the outgoing port in this hybrid system. Under the effect of the phase difference between two classical light fields, the mutual interference between different pathways can be tuned to be destructive or constructive, which lays the foundation for increasing and decreasing the routing probability. The influence of different parameter values on the single-photon routing probability is also studied. By studying the analytic formula for the probability amplitude, we obtain the physical mechanism by which exit ports are closed under certain parameter conditions.
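The role of the phase difference can be illustrated with a generic two-pathway interference toy model. It is not the router's actual amplitude formula, which depends on couplings and detunings that the abstract does not give. The amplitudes `amp_a` and `amp_b` are arbitrary assumed values.

```python
import cmath
import math

def exit_probability(amp_a, amp_b, phase):
    """|amp_a + amp_b * exp(i * phase)|^2 for two interfering pathways."""
    return abs(amp_a + amp_b * cmath.exp(1j * phase)) ** 2

amp_a = amp_b = 0.5
constructive = exit_probability(amp_a, amp_b, 0.0)      # pathways add in phase
destructive = exit_probability(amp_a, amp_b, math.pi)   # pathways cancel
print(constructive, destructive)
```

With equal amplitudes, sweeping the phase from 0 to pi moves the port continuously from fully open to closed, which is the kind of tuning the abstract describes.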

Translated: 2023-03-27 15:55:50 Published: 2023-03-24

# Towards Making the Most of ChatGPT for Machine Translation ( http://arxiv.org/abs/2303.13780v1 )

License: see the linked page

Keqin Peng, Liang Ding, Qihuang Zhong, Li Shen, Xuebo Liu, Min Zhang, Yuanxin Ouyang, Dacheng Tao

ChatGPT shows remarkable capabilities for machine translation (MT). Several prior studies have shown that it achieves results comparable to commercial systems for high-resource languages, but lags behind in complex tasks, e.g., low-resource and distant-language-pair translation. However, they usually adopt simple prompts, which cannot fully elicit the capability of ChatGPT. In this report, we aim to further mine ChatGPT's translation ability by revisiting several aspects: temperature, task information, and domain information, and correspondingly propose two (simple but effective) prompts: Task-Specific Prompts (TSP) and Domain-Specific Prompts (DSP). We show that: 1) The performance of ChatGPT depends largely on temperature, and a lower temperature usually achieves better performance; 2) Emphasizing the task information further improves ChatGPT's performance, particularly in complex MT tasks; 3) Introducing domain information can elicit ChatGPT's generalization ability and improve its performance in the specific domain; 4) ChatGPT tends to generate hallucinations for non-English-centric MT tasks, which can be partially addressed by our proposed prompts but still needs to be highlighted for the MT/NLP community. We also explore the effects of advanced in-context learning strategies and find a (negative but interesting) observation: the powerful chain-of-thought prompt leads to word-by-word translation behavior, thus bringing significant translation degradation.
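For readers unfamiliar with the temperature setting, the sketch below shows how sampling temperature is conventionally applied to next-token scores. It is a generic illustration with made-up logits, not ChatGPT's internals, which are not public. Linking this mechanism to the translation results above is an interpretation rather than something the abstract states.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; lower temperature sharpens the distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.0]  # hypothetical scores for three candidate tokens
low = softmax_with_temperature(logits, 0.2)
high = softmax_with_temperature(logits, 2.0)
print([round(p, 3) for p in low], [round(p, 3) for p in high])
```

At low temperature nearly all probability mass sits on the top-scoring token, so outputs are more deterministic. At high temperature the alternatives are sampled more often.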

Translated: 2023-03-27 15:55:39 Published: 2023-03-24

# Exploiting Unlabelled Photos for Stronger Fine-Grained SBIR ( http://arxiv.org/abs/2303.13779v1 )

License: see the linked page

Aneeshan Sain, Ayan Kumar Bhunia, Subhadeep Koley, Pinaki Nath Chowdhury, Soumitri Chattopadhyay, Tao Xiang, Yi-Zhe Song

This paper advances the fine-grained sketch-based image retrieval (FG-SBIR) literature by putting forward a strong baseline that overshoots the prior state of the art by ~11%. This is not achieved via complicated design, but by addressing two critical issues facing the community: (i) the gold-standard triplet loss does not enforce holistic latent space geometry, and (ii) there are never enough sketches to train a high-accuracy model. For the former, we propose a simple modification to the standard triplet loss that explicitly enforces separation amongst photo/sketch instances. For the latter, we put forward a novel knowledge distillation module that can leverage photo data for model training. Both modules are then plugged into a novel plug-n-playable training paradigm that allows for more stable training. More specifically, for (i) we employ an intra-modal triplet loss amongst sketches to bring sketches of the same instance closer together and away from others, and one more amongst photos to push away different photo instances while bringing closer a structurally augmented version of the same photo (offering a gain of ~4-6%). To tackle (ii), we first pre-train a teacher on the large set of unlabelled photos using the aforementioned intra-modal photo triplet loss. Then we distill the contextual similarity present amongst the instances in the teacher's embedding space to that in the student's embedding space, by matching the distribution over inter-feature distances of respective samples in both embedding spaces (delivering a further gain of ~4-5%). Apart from significantly outperforming prior art, our model also yields satisfactory results when generalising to new classes. Project page: https://aneeshan95.github.io/Sketch_PVT/
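For reference, one common form of the standard triplet loss that this abstract builds on is sketched below. It is the baseline loss only, not the paper's modified version. The margin of 0.2, the use of Euclidean distance, and the toy embeddings are assumptions.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Zero once the positive is closer to the anchor than the negative by `margin`."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)

# Hypothetical 2-D embeddings: a sketch, its matching photo, and an unrelated photo.
sketch = [0.0, 0.0]
matching_photo = [0.0, 1.0]
other_photo = [3.0, 4.0]
satisfied = triplet_loss(sketch, matching_photo, other_photo)
violated = triplet_loss(sketch, other_photo, matching_photo)
print(satisfied, violated)
```

An intra-modal variant in the spirit of the abstract would reuse the same function with all three inputs drawn from one modality, for example a photo as anchor, its augmented copy as positive, and a different photo as negative.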

Translated: 2023-03-27 15:55:12 Published: 2023-03-24

# GM-NeRF: Learning Generalizable Model-based Neural Radiance Fields from Multi-view Images ( http://arxiv.org/abs/2303.13777v1 )

License: see the linked page

Jianchuan Chen, Wentao Yi, Liqian Ma, Xu Jia, Huchuan Lu

In this work, we focus on synthesizing high-fidelity novel view images for arbitrary human performers, given a set of sparse multi-view images. It is a challenging task due to the large variation among articulated body poses and heavy self-occlusions. To alleviate this, we introduce an effective generalizable framework, Generalizable Model-based Neural Radiance Fields (GM-NeRF), to synthesize free-viewpoint images. Specifically, we propose a geometry-guided attention mechanism to register the appearance code from multi-view 2D images to a geometry proxy, which can alleviate the misalignment between the inaccurate geometry prior and pixel space. On top of that, we further conduct neural rendering and partial gradient backpropagation for efficient perceptual supervision and improved perceptual quality of synthesis. To evaluate our method, we conduct experiments on the synthesized datasets THuman2.0 and Multi-garment, and the real-world datasets Genebody and ZJUMocap. The results demonstrate that our approach outperforms state-of-the-art methods in terms of novel view synthesis and geometric reconstruction.

翻訳日:2023-03-27 15:54:21 公開日:2023-03-24

# GSplit: スプリット並列性による大規模グラフ上のグラフニューラルネットワークトレーニングのスケーリング

GSplit: Scaling Graph Neural Network Training on Large Graphs via Split-Parallelism ( http://arxiv.org/abs/2303.13775v1 )

ライセンス: Link先を確認

Sandeep Polisetty, Juelin Liu, Kobi Falus, Yi Ren Fung, Seung-Hwan Lim, Hui Guan, Marco Serafini

(参考訳) 数十億のエッジを持つ大規模グラフは、レコメンデーションシステム、社会グラフ分析、知識ベース、物質科学、生物学など、多くの産業、科学、工学の分野で広く使われている。機械学習モデルの新たなクラスであるgraph neural networks(gnn)は、さまざまなグラフ分析タスクのパフォーマンスが優れているため、これらのグラフを学習するためにますます採用されている。ミニバッチトレーニングは大規模グラフのトレーニングに一般的に採用されており、データ並列処理は複数のGPUにミニバッチトレーニングをスケールするための標準的なアプローチである。本稿では、GNNトレーニングシステムの基本的な性能ボトルネックは、データ並列アプローチの固有の制限と関係している、と論じる。次に,新しい並列ミニバッチトレーニングパラダイムであるsplit parallelismを提案する。我々は、gsplitと呼ばれる新しいシステムで分割並列性を実装し、DGL、Quiver、PaGraphといった最先端システムより優れていることを示す。

Large-scale graphs with billions of edges are ubiquitous in many industrial, scientific, and engineering fields such as recommendation systems, social graph analysis, knowledge bases, materials science, and biology. Graph neural networks (GNNs), an emerging class of machine learning models, are increasingly adopted to learn on these graphs due to their superior performance in various graph analytics tasks. Mini-batch training is commonly adopted to train on large graphs, and data parallelism is the standard approach to scale mini-batch training to multiple GPUs. In this paper, we argue that several fundamental performance bottlenecks of GNN training systems have to do with inherent limitations of the data parallel approach. We then propose split parallelism, a novel parallel mini-batch training paradigm. We implement split parallelism in a novel system called GSplit and show that it outperforms state-of-the-art systems such as DGL, Quiver, and PaGraph.

翻訳日:2023-03-27 15:53:50 公開日:2023-03-24

# ナノサテライトタスクスケジューリングへのグラフニューラルネットワークアプローチ:混合整数モデル学習への洞察

A Graph Neural Network Approach to Nanosatellite Task Scheduling: Insights into Learning Mixed-Integer Models ( http://arxiv.org/abs/2303.13773v1 )

ライセンス: Link先を確認

Bruno Machado Pacheco, Laio Oriel Seman, Cezar Antônio Rigo, Eduardo Camponogara, Eduardo Augusto Bezerra, Leandro dos Santos Coelho

(参考訳) 本研究では,グラフニューラルネットワーク(GNN)を用いて,ナノサテライトタスクをより効率的にスケジュールする方法を検討する。オフライン・ナノサテライト・タスク・スケジューリング(onts)問題では、優先度、最小および最大アクティベーションイベント、実行時間枠、期間、実行ウィンドウといったqos(quality-of-service)の考慮事項や、衛星の電力資源の制約、エネルギーの収穫および管理の複雑さを考慮して、軌道上で実行するタスクの最適なスケジュールを見出すことが目的である。 ONTS問題は、従来の数学的定式化や正確な方法を用いてアプローチされてきたが、問題の挑戦事例への適用性は限られている。本研究は,旅行セールスマン問題,スケジューリング問題,施設配置問題など,多くの最適化問題に効果的に適用されたgnnの利用について検討する。ここでは、二部グラフにおけるONTS問題のMILPインスタンスを完全に表現する。 reluアクティベーション機能と連携した特徴集約とメッセージパッシング手法を適用し、古典的なディープラーニングモデルを用いて学習し、最適なパラメータセットを得る。さらに、新たな研究分野である Explainable AI (XAI) を適用して、どの機能 -- ノード、制約 -- が学習のパフォーマンスに最も大きな影響を与え、それらのモデルの内部動作と決定プロセスに光を当てています。また, 最適解における解の実現可能性と決定変数値の確率の予測において, 80\%以上の精度を得ることにより, 初期固定手法を検討した。以上の結果から,gnnはナノサテライトタスクのスケジューリングに有効な手法であり,組合せ最適化問題に対する説明可能な機械学習モデルの利点を浮き彫りにした。

This study investigates how to schedule nanosatellite tasks more efficiently using Graph Neural Networks (GNNs). In the Offline Nanosatellite Task Scheduling (ONTS) problem, the goal is to find the optimal schedule for tasks to be carried out in orbit while taking into account Quality-of-Service (QoS) considerations such as priority, minimum and maximum activation events, execution time-frames, periods, and execution windows, as well as constraints on the satellite's power resources and the complexity of energy harvesting and management. The ONTS problem has been approached using conventional mathematical formulations and precise methods, but their applicability to challenging cases of the problem is limited. This study examines the use of GNNs in this context; GNNs have been effectively applied to many optimization problems, including traveling salesman problems, scheduling problems, and facility placement problems. Here, we fully represent MILP instances of the ONTS problem as bipartite graphs. We apply a feature aggregation and message-passing methodology allied to a ReLU activation function to learn using a classic deep learning model, obtaining an optimal set of parameters. Furthermore, we apply Explainable AI (XAI), another emerging field of research, to determine which features -- nodes, constraints -- had the most significant impact on learning performance, shedding light on the inner workings and decision process of such models. We also explore an early-fixing approach, obtaining an accuracy above 80% both in predicting the feasibility of a solution and in predicting the probability that a decision variable value appears in the optimal solution. Our results point to GNNs as a potentially effective method for scheduling nanosatellite tasks and shed light on the advantages of explainable machine learning models for challenging combinatorial optimization problems.
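
To make the bipartite representation concrete, the sketch below turns a tiny MILP of the form min c^T x subject to A x <= b into variable nodes, constraint nodes, and weighted edges. This follows a common convention for graph encodings of MILPs and is only an assumption about how such a graph could be built; the function name `milp_to_bipartite`, the node features, and the toy instance are hypothetical and not taken from the paper.

```python
def milp_to_bipartite(A, b, c):
    """Encode  min c^T x  s.t.  A x <= b  as a bipartite graph.

    One node per variable (feature: objective coefficient), one node per
    constraint (feature: right-hand side), and an edge wherever a variable
    appears in a constraint with a non-zero coefficient.
    """
    variable_nodes = [{"id": f"x{j}", "obj_coeff": c[j]} for j in range(len(c))]
    constraint_nodes = [{"id": f"c{i}", "rhs": b[i]} for i in range(len(b))]
    edges = [
        (f"c{i}", f"x{j}", coeff)
        for i, row in enumerate(A)
        for j, coeff in enumerate(row)
        if coeff != 0
    ]
    return variable_nodes, constraint_nodes, edges


# Toy instance with 3 variables and 2 constraints (hypothetical numbers).
A = [[1, 1, 0],
     [0, 2, 3]]
b = [1, 4]
c = [-1, -2, -3]

variables, constraints, edges = milp_to_bipartite(A, b, c)
for edge in edges:
    print(edge)
```

A message-passing GNN would then alternate updates between the constraint side and the variable side along these edges.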

翻訳日:2023-03-27 15:53:33 公開日:2023-03-24

# 物体検出のための未知のスニッファー: 未知の物体に盲目を向けるな

Unknown Sniffer for Object Detection: Don't Turn a Blind Eye to Unknown Objects ( http://arxiv.org/abs/2303.13769v1 )

ライセンス: Link先を確認

Wenteng Liang, Feng Xue, Yihao Liu, Guofeng Zhong, Anlong Ming

(参考訳) 最近提案されたオープンワールド物体検出とオープンセット検出は、これまで見たことのない物体を発見し、クラス既知の物体と区別する点でブレークスルーを達成している。しかし、既知クラスから未知クラスへの知識伝達に関する検討は十分に深くなく、その結果、背景に隠れた未知物体を検出する能力が乏しい。本稿では,未知のオブジェクトと既知のオブジェクトの両方を見つけるための未知のスニファー(UnSniffer)を提案する。まず、一般化オブジェクト信頼度(GOC)スコアを導入する。これはクラス既知のサンプルのみを教師信号として用い、背景にある未知物体の不適切な抑制を回避する。重要な点として、クラス既知のオブジェクトから学習した信頼度スコアは未知のオブジェクトにも一般化できる。さらに,背景の非物体サンプルを更に抑制するために,負のエネルギー抑制損失を提案する。次に、未知物体は学習時に意味情報を欠くため、推論時に各未知物体の最良のボックスを得ることが難しい。この問題を解決するために,手動設計の非最大抑制(NMS)後処理を置き換えるグラフベースの決定手法を導入する。最後に、我々の知る限り、未知物体検出の精度評価を含む最初の公開ベンチマークであるUnknown Object Detection Benchmarkを提示する。実験の結果,本手法は既存の最先端手法よりもはるかに優れていることがわかった。コードは https://github.com/Went-Liang/UnSniffer で入手できる。

The recently proposed open-world object detection and open-set detection have achieved a breakthrough in finding never-seen-before objects and distinguishing them from class-known ones. However, their study of knowledge transfer from known classes to unknown ones is not deep enough, resulting in a limited capability for detecting unknowns hidden in the background. In this paper, we propose the unknown sniffer (UnSniffer) to find both unknown and known objects. Firstly, the generalized object confidence (GOC) score is introduced, which only uses class-known samples for supervision and avoids improper suppression of unknowns in the background. Significantly, such a confidence score learned from class-known objects can be generalized to unknown ones. Additionally, we propose a negative energy suppression loss to further limit the non-object samples in the background. Next, the best box of each unknown is hard to obtain during inference because its semantic information is lacking in training. To solve this issue, we introduce a graph-based determination scheme to replace hand-designed non-maximum suppression (NMS) post-processing. Finally, we present the Unknown Object Detection Benchmark, which to our knowledge is the first public benchmark that encompasses precision evaluation for unknown object detection. Experiments show that our method is far better than the existing state-of-the-art methods. Code is available at: https://github.com/Went-Liang/UnSniffer.
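
For context, the hand-designed post-processing that the abstract says is replaced is standard greedy non-maximum suppression. The sketch below shows that baseline only; it does not reproduce the paper's graph-based determination scheme, and the IoU threshold of 0.5 and the example boxes are assumed values.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the highest-scoring box, drop boxes that overlap it too much."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)
print(kept)  # indices of the boxes that survive suppression
```

Greedy NMS ranks boxes by a detector score, and the abstract notes that unknown objects lack semantic supervision during training, which is its stated motivation for replacing this step.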

翻訳日:2023-03-27 15:52:58 公開日:2023-03-24

# UniTS: 自己教師型表現学習を備えたユニバーサル時系列分析フレームワーク

UniTS: A Universal Time Series Analysis Framework with Self-supervised Representation Learning ( http://arxiv.org/abs/2303.13804v1 )

ライセンス: Link先を確認

Zhiyu Liang, Chen Liang, Zheng Liang, Hongzhi Wang

(参考訳) 機械学習は時系列分析の強力なツールとして登場した。既存のメソッドは通常、異なる分析タスク用にカスタマイズされ、部分的なラベリングやドメインシフトといった実用的な問題に取り組む際の課題に直面します。上記の問題を解決するために,自己指導型表現学習(あるいは事前学習)を取り入れた新しいフレームワークであるUniTSを開発した。 UniTSのコンポーネントは、柔軟な拡張を可能にするためにsklearnのようなAPIを使って設計されている。ユーザがユーザフレンドリなGUIを使って分析タスクを簡単に実行できることを示し、従来のタスク固有の手法よりもUniTSの方が5つのメインストリームタスクと2つの実践的な設定で自己教師付き事前学習なしで優れた性能を示す。

Machine learning has emerged as a powerful tool for time series analysis. Existing methods are usually customized for different analysis tasks and face challenges in tackling practical problems such as partial labeling and domain shift. To achieve universal analysis and address the aforementioned problems, we develop UniTS, a novel framework that incorporates self-supervised representation learning (or pre-training). The components of UniTS are designed using sklearn-like APIs to allow flexible extensions. We demonstrate how users can easily perform an analysis task using the user-friendly GUIs, and show the superior performance of UniTS over the traditional task-specific methods without self-supervised pre-training on five mainstream tasks and two practical settings.

翻訳日:2023-03-27 15:46:22 公開日:2023-03-24

# 感情認識のための分離マルチモーダル蒸留

Decoupled Multimodal Distilling for Emotion Recognition ( http://arxiv.org/abs/2303.13802v1 )

ライセンス: Link先を確認

Yong Li, Yuanzhi Wang, Zhen Cui

(参考訳) ヒトのマルチモーダル感情認識(mer)は、言語、視覚、音響的モダリティを通じて人間の感情を知覚することを目的としている。以前のMERアプローチの印象的な性能にもかかわらず、固有の多モード不均一性はまだ残っており、異なるモダリティの寄与は著しく異なる。本研究では,自由で適応的なクロスモーダル知識蒸留を容易にする脱共役マルチモーダル蒸留(dmd)アプローチを提案し,各モーダルの識別的特徴を高めることを目的とした。特に、各モダリティの表現は、自己回帰的な方法で、2つの部分、すなわちモダリティ-非関係/排他的な空間に分解される。 DMDはグラフ蒸留ユニット(GD-Unit)を各分離部に使用し、より専門的で効果的な方法で各GDを実行できる。 GD-Unitは動的グラフで構成され、各頂点はモダリティを表し、各エッジは動的知識蒸留を示す。このようなgdパラダイムは、蒸留重みを自動的に学習できる柔軟な知識伝達方法を提供し、多様なクロスモーダル知識伝達パターンを可能にする。実験結果からDMDは最先端のMER法よりも優れた性能を示した。 DMDのグラフエッジは、モダリティ非関連かつ排他的な特徴空間に意味のある分布パターンを示す。コードは \url{https://github.com/mdswyz/DMD} でリリースされる。

Human multimodal emotion recognition (MER) aims to perceive human emotions via language, visual, and acoustic modalities. Despite the impressive performance of previous MER approaches, the inherent multimodal heterogeneities persist and the contribution of different modalities varies significantly. In this work, we mitigate this issue by proposing a decoupled multimodal distillation (DMD) approach that facilitates flexible and adaptive crossmodal knowledge distillation, aiming to enhance the discriminative features of each modality. Specifically, the representation of each modality is decoupled into two parts, i.e., modality-irrelevant/-exclusive spaces, in a self-regression manner. DMD utilizes a graph distillation unit (GD-Unit) for each decoupled part so that each GD can be performed in a more specialized and effective manner. A GD-Unit consists of a dynamic graph where each vertex represents a modality and each edge indicates a dynamic knowledge distillation. Such a GD paradigm provides a flexible knowledge transfer manner where the distillation weights can be automatically learned, thus enabling diverse crossmodal knowledge transfer patterns. Experimental results show that DMD consistently obtains better performance than state-of-the-art MER methods. Visualization results show that the graph edges in DMD exhibit meaningful distributional patterns w.r.t. the modality-irrelevant/-exclusive feature spaces. Codes are released at https://github.com/mdswyz/DMD.

翻訳日:2023-03-27 15:46:09 公開日:2023-03-24

# 自己教師付き共訓練によるオープンドメインスロット充填に向けて

Toward Open-domain Slot Filling via Self-supervised Co-training ( http://arxiv.org/abs/2303.13801v1 )

ライセンス: Link先を確認

Adib Mosharrof, Moghis Fereidouni, A.B. Siddique

(参考訳) スロットフィリングは現代の会話システムにおいて重要なタスクの1つである。既存の文献の大部分は、新しいドメインごとにラベル付きトレーニングデータを必要とする教師付き学習手法を採用している。ゼロショット学習や弱い監督アプローチなどは、手動ラベリングの代替としてpromiseが示されている。それでも、これらの学習パラダイムは、パフォーマンスの観点から教師あり学習アプローチよりもかなり劣っている。この性能ギャップを最小化し、オープンドメインスロットフィリングの可能性を示すために、SCotと呼ばれる自己教師付き協調学習フレームワークを提案する。フェーズ1は2つの補完的な擬似ラベルを自動取得する。フェーズ2は、これらの擬似ラベルセットを使用してスロットフィリングタスクに適応することにより、事前訓練された言語モデルBERTのパワーを活用する。フェーズ3では,両モデルが高信頼度ソフトラベルを自動選択し,他のモデルの性能を反復的に向上する自己教師付き協調機構を導入する。 SCotは,SGDデータセットとMultiWoZデータセットでそれぞれ45.57%,37.56%,最先端モデルよりも優れていた。さらに,提案するフレームワークであるSCotは,最先端の完全教師付きモデルと比較して,同等のパフォーマンスを実現する。

Slot filling is one of the critical tasks in modern conversational systems. The majority of existing literature employs supervised learning methods, which require labeled training data for each new domain. Zero-shot learning and weak supervision approaches, among others, have shown promise as alternatives to manual labeling. Nonetheless, these learning paradigms are significantly inferior to supervised learning approaches in terms of performance. To minimize this performance gap and demonstrate the possibility of open-domain slot filling, we propose a Self-supervised Co-training framework, called SCot, that requires zero in-domain manually labeled training examples and works in three phases. Phase one acquires two sets of complementary pseudo labels automatically. Phase two leverages the power of the pre-trained language model BERT by adapting it for the slot filling task using these sets of pseudo labels. In phase three, we introduce a self-supervised co-training mechanism, where both models automatically select high-confidence soft labels to further improve the performance of the other in an iterative fashion. Our thorough evaluations show that SCot outperforms state-of-the-art models by 45.57% and 37.56% on SGD and MultiWoZ datasets, respectively. Moreover, our proposed framework SCot achieves comparable performance when compared to state-of-the-art fully supervised models.

翻訳日:2023-03-27 15:45:45 公開日:2023-03-24

# ビデオデモへのステップバイステップインストラクショナルダイアグラムの適応

Aligning Step-by-Step Instructional Diagrams to Video Demonstrations ( http://arxiv.org/abs/2303.13800v1 )

ライセンス: Link先を確認

Jiahao Zhang, Anoop Cherian, Yanbin Liu, Yizhak Ben-Shabat, Cristian Rodriguez, Stephen Gould

(参考訳) マルチモーダルアライメントは、あるモダリティから別のモダリティを使ってクエリする際のインスタンスの検索を容易にする。本稿では,このようなアライメントが中間にある新しい設定を考える。 (i)組み立て図(イケアの組立マニュアルによく見られる)として表される指示ステップ、及び (ii)内装ビデオの映像セグメント(実世界の組立動作の制定を含む。) このアライメントを学習するために,新しい教師付きコントラスト学習手法を導入する。そこで本研究では,本手法の有効性を実証するために,多様な家具組立コレクションからの183時間のビデオと,関連する指導マニュアルからの8,300点近いイラストと,それらの真実のアライメントに注釈を付したイケア組立用IAWを提案する。第1に,ビデオセグメントとイラストレーション間の最寄りの隣接検索,第2に,各ビデオの指示ステップとセグメントのアラインメント,という2つのタスクを定義した。 iawに関する広範な実験は、代替案に対する我々のアプローチの優れた性能を示している。

Multimodal alignment facilitates the retrieval of instances from one modality when queried using another. In this paper, we consider a novel setting where such an alignment is between (i) instruction steps that are depicted as assembly diagrams (commonly seen in Ikea assembly manuals) and (ii) video segments from in-the-wild videos, which comprise an enactment of the assembly actions in the real world. To learn this alignment, we introduce a novel supervised contrastive learning method that learns to align videos with the subtle details in the assembly diagrams, guided by a set of novel losses. To study this problem and demonstrate the effectiveness of our method, we introduce a novel dataset, IAW (Ikea assembly in the wild), consisting of 183 hours of videos from diverse furniture assembly collections and nearly 8,300 illustrations from their associated instruction manuals, annotated with their ground-truth alignments. We define two tasks on this dataset: first, nearest neighbor retrieval between video segments and illustrations, and second, alignment of instruction steps and the segments for each video. Extensive experiments on IAW demonstrate the superior performance of our approach against alternatives.

翻訳日:2023-03-27 15:45:23 公開日:2023-03-24

# 収束対策と創発モデル:人間自動信頼アンケートのメタ分析

Converging Measures and an Emergent Model: A Meta-Analysis of Human-Automation Trust Questionnaires ( http://arxiv.org/abs/2303.13799v1 )

ライセンス: Link先を確認

Yosef S. Razin and Karen M. Feigh

(参考訳) 人自動信頼度を測定する上で重要な課題は、高度に可変性のある構成拡散、モデル、アンケートの量である。しかし、全員が信頼が技術受容、継続的な使用、流動性、チームワークの重要な要素であることに同意する。そこで本研究では,信頼度・信頼度調査機器のメタ分析を行い,信頼度評価のためのコンセンサスモデルを合成する。この目的を達成するために、この研究は、最も頻繁に引用され、最も有望な人間自動化と人間ロボット信頼のアンケートと、そのような信頼の次元と先行を形成する最も確立された要因を識別する。混乱と人口増加を両立させるため,質問紙間の用語の詳細なマッピングを行う。さらに,多因子サーベイ機器を用いた実験から得られた回帰モデルのメタ分析を行った。このメタアナリシスに基づいて,人間-自律信頼の実験的検証モデルを示す。この収束モデルは、将来の研究のための統合フレームワークを確立する。信頼度測定の現在の境界と、さらなる調査が必要かを特定する。我々は、適切な信頼調査機器の選択と設計を議論することで締めくくります。信頼度調査機器の比較,マッピング,分析により,人間-自律相互作用における信頼のコンセンサス構造を同定する。そうすることで、信頼を測定するためのより完全な基盤が、広く適用できるようになります。信頼という学問的考えと、口語的で常識的な概念を統合する。信頼の重要性がますます認識され、特に人間と自律的な相互作用において、この研究は、それを理解し、測定するためのより良い位置を与えてくれます。

A significant challenge to measuring human-automation trust is the amount of construct proliferation, models, and questionnaires with highly variable validation. However, all agree that trust is a crucial element of technological acceptance, continued usage, fluency, and teamwork. Herein, we synthesize a consensus model for trust in human-automation interaction by performing a meta-analysis of validated and reliable trust survey instruments. To accomplish this objective, this work identifies the most frequently cited and best-validated human-automation and human-robot trust questionnaires, as well as the most well-established factors, which form the dimensions and antecedents of such trust. To reduce both confusion and construct proliferation, we provide a detailed mapping of terminology between questionnaires. Furthermore, we perform a meta-analysis of the regression models that emerged from those experiments which used multi-factorial survey instruments. Based on this meta-analysis, we demonstrate a convergent experimentally validated model of human-automation trust. This convergent model establishes an integrated framework for future research. It identifies the current boundaries of trust measurement and where further investigation is necessary. We close by discussing how to choose and design an appropriate trust survey instrument. By comparing, mapping, and analyzing well-constructed trust survey instruments, a consensus structure of trust in human-automation interaction is identified. Doing so discloses a more complete basis for measuring trust that is widely applicable. It integrates the academic idea of trust with the colloquial, common-sense one. Given the increasingly recognized importance of trust, especially in human-automation interaction, this work leaves us better positioned to understand and measure it.

翻訳日:2023-03-27 15:45:02 公開日:2023-03-24

# ダウンサンプリングに基づく2次元フロアプランセグメンテーション

2D Floor Plan Segmentation Based on Down-sampling ( http://arxiv.org/abs/2303.13798v1 )

ライセンス: Link先を確認

Mohammadreza Sharif, Kiran Mohan, Sarath Suvarna

(参考訳) 近年、フロアプランのセグメンテーションは、フロアプランの再構築やロボット工学における幅広い応用により、注目されている。本稿では,ダウンサンプリング方式に基づく新しい2次元フロアプランセグメンテーション手法を提案する。本手法では,フロアプラン上で連続的なダウンサンプリングを行い,その複雑度を低減しつつ構造情報を維持する。掃除ロボットが未知の環境下で生成した散在するフロアプランとフロアプランのベンチマークから得られた結果を提示することにより,提案手法の有効性を実証する。本手法はフロアプランセグメンテーションの計算と実装の複雑さを大幅に減らし,現実のアプリケーションに適している。さらに,セグメンテーション結果を評価するための適切な指標について検討する。提案手法は, 乱雑な環境下での2次元フロアプランセグメンテーションに有望な結果をもたらす。

In recent years, floor plan segmentation has gained significant attention due to its wide range of applications in floor plan reconstruction and robotics. In this paper, we propose a novel 2D floor plan segmentation technique based on a down-sampling approach. Our method employs continuous down-sampling on a floor plan to maintain its structural information while reducing its complexity. We demonstrate the effectiveness of our approach by presenting results obtained from both cluttered floor plans generated by a vacuum cleaning robot in unknown environments and a benchmark of floor plans. Our technique considerably reduces the computational and implementation complexity of floor plan segmentation, making it more suitable for real-world applications. Additionally, we discuss the appropriate metric for evaluating segmentation results. Overall, our approach yields promising results for 2D floor plan segmentation in cluttered environments.

翻訳日:2023-03-27 15:44:36 公開日:2023-03-24

# ゼロショット一般化リワード関数によるタスク指向対話システムのパーソナライズ

Personalizing Task-oriented Dialog Systems via Zero-shot Generalizable Reward Function ( http://arxiv.org/abs/2303.13797v1 )

ライセンス: Link先を確認

A.B. Siddique, M.H. Maqbool, Kshitija Taywade, Hassan Foroosh

(参考訳) タスク指向対話システムは、自然言語を使ってタスクを達成できる。最新システムは、個性に関係なくユーザーに対して同じように反応するが、対話のパーソナライズは、より高いレベルの採用とより良いユーザーエクスペリエンスをもたらす可能性がある。パーソナライズされたダイアログシステムの構築は重要だが、挑戦的な取り組みであり、その課題にはほんの一握りの作業しかなかった。既存の作業の多くは教師付き学習アプローチに依存しており、各ユーザプロファイルに対して、厳格で高価なラベル付きトレーニングデータを必要とする。さらに、各ユーザプロファイルのデータ収集とラベル付けは事実上不可能である。本研究では、ゼロショット一般化報酬関数を用いて、広範囲のユーザプロファイルに適応可能なタスク指向対話システムを、教師なしでパーソナライズする新しいフレームワークP-ToDを提案する。 P-ToDは、トレーニング済みのGPT-2をバックボーンモデルとして使用し、3つのフェーズで動作する。第1段階はタスク固有の訓練を行う。フェーズ2は、ゼロショット一般化報酬関数で導かれるポリシー勾配を実行する近似ポリシー最適化アルゴリズムを活用することにより、教師なしのパーソナライゼーションを開始する。新たな報酬機能は,未発見のプロファイルにおいても生成した応答の品質を定量化することができる。オプションの最終フェーズは、いくつかのラベル付きトレーニング例を使用してパーソナライズされたモデルを微調整する。パーソナライズされたbAbIダイアログベンチマークを用いて,5つのタスクと最大180種類のユーザプロファイルに対して,広範な実験分析を行う。実験結果から,P-ToDはラベル付きサンプルがゼロであっても,最先端の教師付きパーソナライゼーションモデルより優れ,強力な完全教師付きGPT-2ベースラインと比較してBLEUおよびROUGEメトリクス上での競争性能が向上することが示された。

Task-oriented dialog systems enable users to accomplish tasks using natural language. State-of-the-art systems respond to users in the same way regardless of their personalities, although personalizing dialogues can lead to higher levels of adoption and better user experiences. Building personalized dialog systems is an important yet challenging endeavor, and only a handful of works have taken on the challenge. Most existing works rely on supervised learning approaches and require laborious and expensive labeled training data for each user profile. Additionally, collecting and labeling data for each user profile is virtually impossible. In this work, we propose a novel framework, P-ToD, to personalize task-oriented dialog systems capable of adapting to a wide range of user profiles in an unsupervised fashion using a zero-shot generalizable reward function. P-ToD uses a pre-trained GPT-2 as a backbone model and works in three phases. Phase one performs task-specific training. Phase two kicks off unsupervised personalization by leveraging the proximal policy optimization algorithm that performs policy gradients guided by the zero-shot generalizable reward function. Our novel reward function can quantify the quality of the generated responses even for unseen profiles. The optional final phase fine-tunes the personalized model using a few labeled training examples. We conduct extensive experimental analysis using the personalized bAbI dialogue benchmark for five tasks and up to 180 diverse user profiles. The experimental results demonstrate that P-ToD, even when it had access to zero labeled examples, outperforms state-of-the-art supervised personalization models and achieves competitive performance on BLEU and ROUGE metrics when compared to a strong fully-supervised GPT-2 baseline.
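
Phase two relies on proximal policy optimization (PPO). As a reminder of what that algorithm optimizes, the sketch below evaluates the standard PPO clipped surrogate for a single sample. It is a generic illustration, not the P-ToD training code: the clipping range of 0.2 is a commonly used default, and how the advantage is derived from the paper's reward function is left open here.

```python
def ppo_clip_objective(ratio, advantage, clip_eps=0.2):
    """Standard PPO clipped surrogate for one sample.

    ratio     : pi_new(action | state) / pi_old(action | state)
    advantage : advantage estimate (assumed here to come from some reward signal)
    """
    clipped_ratio = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
    return min(ratio * advantage, clipped_ratio * advantage)


# A large policy change on a good response is capped by the clip.
print(ppo_clip_objective(ratio=1.5, advantage=1.0))
# The pessimistic (more negative) of the two terms is kept for a bad response.
print(ppo_clip_objective(ratio=0.5, advantage=-1.0))
```

The clip limits how far a single update can move the policy away from the one that generated the responses, which is the usual reason for choosing PPO when fine-tuning a language model against a learned reward.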

翻訳日:2023-03-27 15:44:23 公開日:2023-03-24

# Zolly:人間のメッシュ再建のためのズーム焦点長の補正

Zolly: Zoom Focal Length Correctly for Perspective-Distorted Human Mesh Reconstruction ( http://arxiv.org/abs/2303.13796v1 )

ライセンス: Link先を確認

Wenjia Wang, Yongtao Ge, Haiyi Mei, Zhongang Cai, Qingping Sun, Yanjun Wang, Chunhua Shen, Lei Yang, Taku Komura

(参考訳) 野生での単視RGB画像のキャリブレーションが難しいため、既存の3次元メッシュ再構成(3DHMR)手法では、焦点距離が一定であり、背景環境の文脈に基づいて推定することは困難であり、カメラが人体に近づいたときの視界カメラ投影による胴体、手、顔の歪みの問題に対処できない。単純焦点距離の仮定は、不正確な定式化された射影行列でこの課題を害することができる。そこで本稿では,遠近像に着目した最初の3dhmr法であるzollyを提案する。私たちのアプローチは、主に人体のカメラセンターへの相対的な位置によって引き起こされる遠近的歪みの理由を分析することから始まります。本研究では,人体の2次元密歪スケールを記述する新しいカメラモデルと,新しい2次元表現である歪み画像を提案する。次に,環境文脈特徴よりも歪みスケール特徴から距離を推定する。その後、歪み特徴と画像特徴を統合し、ボディメッシュを再構築する。正しい投影行列を定式化し、人体の位置を特定するために、遠近法と弱視投影損失を同時に利用する。既存のデータセットは、このタスクを処理できないため、最初の合成データセットPDHumanを提案し、このタスクに適した2つの実世界のデータセットを拡張する。広範な実験により、zollyはパースペクティブディストリクトデータセットと標準ベンチマーク(3dpw)の両方において、既存の最先端のメソッドよりも優れていることが示されている。

As it is hard to calibrate single-view RGB images in the wild, existing 3D human mesh reconstruction (3DHMR) methods either use a constant large focal length or estimate one based on the background environment context, neither of which can tackle the torso, limb, hand, or face distortion caused by perspective camera projection when the camera is close to the human body. The naive focal length assumptions can harm this task through incorrectly formulated projection matrices. To solve this, we propose Zolly, the first 3DHMR method focusing on perspective-distorted images. Our approach begins with analysing the reason for perspective distortion, which we find is mainly caused by the relative location of the human body to the camera center. We propose a new camera model and a novel 2D representation, termed distortion image, which describes the 2D dense distortion scale of the human body. We then estimate the distance from distortion scale features rather than environment context features. Afterwards, we integrate the distortion feature with image features to reconstruct the body mesh. To formulate the correct projection matrix and locate the human body position, we simultaneously use perspective and weak-perspective projection losses. Since existing datasets cannot handle this task, we propose the first synthetic dataset, PDHuman, and extend two real-world datasets tailored for this task, all containing perspective-distorted human images. Extensive experiments show that Zolly outperforms existing state-of-the-art methods on both perspective-distorted datasets and the standard benchmark (3DPW).
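
The gap between perspective and weak-perspective projection can be illustrated with a few lines of arithmetic. In the sketch below, a body point that is 0.3 m closer to the camera than the body's mean depth is projected with both models. The focal length, depths, and the ratio used as a "distortion" measure are hypothetical illustration values, not the paper's camera model or its definition of distortion scale.

```python
def perspective(point, focal):
    """Pinhole projection: each point is divided by its own depth."""
    x, y, z = point
    return (focal * x / z, focal * y / z)


def weak_perspective(point, focal, mean_depth):
    """Weak-perspective projection: every point shares one depth."""
    x, y, _ = point
    return (focal * x / mean_depth, focal * y / mean_depth)


def distortion_ratio(point, focal, mean_depth):
    """How much larger the point appears under true perspective (x axis)."""
    return perspective(point, focal)[0] / weak_perspective(point, focal, mean_depth)[0]


focal = 1000.0  # hypothetical focal length in pixels

# Camera close to the body: mean depth 1.0 m, an outstretched hand at 0.7 m.
near = distortion_ratio((0.3, 0.0, 0.7), focal, mean_depth=1.0)
# Camera far from the body: mean depth 10.0 m, the same hand at 9.7 m.
far = distortion_ratio((0.3, 0.0, 9.7), focal, mean_depth=10.0)

print(near, far)
```

With these numbers the hand is magnified by roughly 43% in the close-up case and by about 3% in the distant case, which is why a single shared depth is a reasonable approximation only when the camera is far away.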

翻訳日:2023-03-27 15:43:54 公開日:2023-03-24

# マッチングキーポイントクロップ(MKPC)を用いた効率的かつ高精度な共視領域の定位: 画像マッチング性能向上のための2段階パイプライン

Efficient and Accurate Co-Visible Region Localization with Matching Key-Points Crop (MKPC): A Two-Stage Pipeline for Enhancing Image Matching Performance ( http://arxiv.org/abs/2303.13794v1 )

ライセンス: Link先を確認

Hongjian Song, Yuki Kashiwaba, Shuai Wu, Canming Wang

(参考訳) 画像マッチングはコンピュータビジョンにおける古典的かつ基本的なタスクである。本稿では,共視領域(co-visible region)の外側の領域はほとんど情報を持たないという仮説の下で,マッチングキーポイントクロップ(MKPC)アルゴリズムを提案する。MKPCは、共視領域である重要領域を高い効率と精度で特定し、提案し、切り出す。さらに,MKPCを基盤として,任意の画像マッチングモデルやその組み合わせと互換性のある、画像マッチングのための汎用的な2段階パイプラインを提案する。2段階パイプラインにSuperPoint + SuperGlueを組み込む実験を行い,提案手法が屋外ポーズ推定の性能を向上させることを示した。さらに,公平な比較条件の下で,本手法は、現在最も難しい屋外画像マッチングベンチマークであるImage Matching Challenge 2022ベンチマークにおいてSOTAを上回っている。

Image matching is a classic and fundamental task in computer vision. In this paper, under the hypothesis that the areas outside the co-visible regions carry little information, we propose a matching key-points crop (MKPC) algorithm. MKPC locates, proposes, and crops the critical regions, which are the co-visible areas, with great efficiency and accuracy. Furthermore, building upon MKPC, we propose a general two-stage pipeline for image matching, which is compatible with any image matching model or combination of models. We experimented with plugging SuperPoint + SuperGlue into the two-stage pipeline, and the results show that our method enhances the performance of outdoor pose estimation. Moreover, under fair comparative conditions, our method outperforms the SOTA on the Image Matching Challenge 2022 Benchmark, which is currently the hardest outdoor benchmark for image matching.
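
One plausible reading of the crop step is sketched below: after a first matching pass, take the matched key-points in an image, compute their padded bounding box, and crop to it before matching again. The abstract does not specify how the region is computed, so the bounding-box rule, the padding of 16 pixels, and the function name `covisible_crop` are assumptions made for illustration.

```python
def covisible_crop(matched_points, image_size, pad=16):
    """Padded bounding box (x1, y1, x2, y2) around matched key-points, clamped to the image.

    matched_points : list of (x, y) key-point coordinates that found a match
    image_size     : (width, height) of the image in pixels
    """
    width, height = image_size
    xs = [p[0] for p in matched_points]
    ys = [p[1] for p in matched_points]
    x1 = max(0, int(min(xs)) - pad)
    y1 = max(0, int(min(ys)) - pad)
    x2 = min(width, int(max(xs)) + pad)
    y2 = min(height, int(max(ys)) + pad)
    return x1, y1, x2, y2


# Hypothetical matched key-points from a first matching pass on a 640x480 image.
points = [(120, 80), (300, 95), (210, 240)]
box = covisible_crop(points, image_size=(640, 480))
print(box)
```

In a two-stage setup, the same matcher (for example SuperPoint + SuperGlue, as in the abstract) would be run once on the full images to obtain `matched_points` and once more on the two crops.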

翻訳日:2023-03-27 15:43:26 公開日:2023-03-24

# 関連イベントによるコンペティション予測

Forecasting Competitions with Correlated Events ( http://arxiv.org/abs/2303.13793v1 )

ライセンス: Link先を確認

Rafael Frongillo, Manuel Lladser, Anish Thilagar, Bo Waggoner

(参考訳) Witkowskiら[2022]に始まる近年の予測コンペティションに関する研究は、一般的な勝者総取り方式が抱えるインセンティブの問題に取り組んできた。Frongilloら[2021]は、オンライン学習の枠組みであるfollow-the-regularized-leader(FTRL)に基づくコンペティション機構を提案している。彼らは、この機構が$O(\log(n)/\epsilon^2)$個のイベントのみを用いて、高い確率で$\epsilon$-最適な予測者を選択することを示した。これらの研究は、この問題に関するこれまでのすべての先行研究と同様に、イベントが独立であると仮定している。我々は、相関のあるイベントに対する予測コンペティションの研究に着手する。相関を定量化するために、ブロック相関の概念を導入する。これは、各イベントが最大$b$個の他のイベントと強く相関することを許すものである。このような相関を持つ分布の下で、FTRL機構は$O(b^2 \log(n)/\epsilon^2)$個のイベントを用いて$\epsilon$-最適性の保証を維持することを示す。我々の証明には、相関のある確率変数に対する新しい集中不等式が含まれており、これはより広い関心を集める可能性がある。

Beginning with Witkowski et al. [2022], recent work on forecasting competitions has addressed incentive problems with the common winner-take-all mechanism. Frongillo et al. [2021] propose a competition mechanism based on follow-the-regularized-leader (FTRL), an online learning framework. They show that their mechanism selects an $\epsilon$-optimal forecaster with high probability using only $O(\log(n)/\epsilon^2)$ events. These works, together with all prior work on this problem thus far, assume that events are independent. We initiate the study of forecasting competitions for correlated events. To quantify correlation, we introduce a notion of block correlation, which allows each event to be strongly correlated with up to $b$ others. We show that under distributions with this correlation, the FTRL mechanism retains its $\epsilon$-optimal guarantee using $O(b^2 \log(n)/\epsilon^2)$ events. Our proof involves a novel concentration bound for correlated random variables which may be of broader interest.
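
For readers unfamiliar with FTRL, the sketch below shows one textbook instance: with a negative-entropy regularizer over the probability simplex, the FTRL solution is the exponential-weights distribution, where each forecaster's weight decays with its cumulative loss. This is a generic illustration rather than the competition mechanism analysed in the paper; the quadratic scoring rule, the learning rate `eta`, and the forecasts are hypothetical.

```python
import math


def ftrl_entropic_weights(cumulative_losses, eta=1.0):
    """FTRL with a negative-entropy regularizer: weights proportional to exp(-eta * loss)."""
    lowest = min(cumulative_losses)  # shift for numerical stability; does not change the result
    unnormalized = [math.exp(-eta * (loss - lowest)) for loss in cumulative_losses]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


def quadratic_loss(forecast, outcome):
    """Squared error between a probability forecast and a binary outcome."""
    return (forecast - outcome) ** 2


# Two hypothetical forecasters predicting three binary events.
forecasts = [
    [0.9, 0.2, 0.8],  # informed forecaster
    [0.5, 0.5, 0.5],  # uninformative forecaster
]
outcomes = [1, 0, 1]

losses = [
    sum(quadratic_loss(f, y) for f, y in zip(row, outcomes))
    for row in forecasts
]
weights = ftrl_entropic_weights(losses)
print(weights)
```

The more events are scored, the more the weight concentrates on the best forecaster. The bounds in the abstract describe how many events are needed for that concentration: $O(\log(n)/\epsilon^2)$ under independence and $O(b^2 \log(n)/\epsilon^2)$ under block correlation, a factor of $b^2$ more.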

翻訳日:2023-03-27 15:43:10 公開日:2023-03-24

# テキスト・画像合成のための因子分解型生成逆数ネットワーク

Factor Decomposed Generative Adversarial Networks for Text-to-Image Synthesis ( http://arxiv.org/abs/2303.13821v1 )

ライセンス: Link先を確認

Jiguo Li, Xiaobin Liu, Lirong Zheng

(参考訳) テキストから画像への合成に関する従来の研究は、通常、文埋め込みとノイズベクトルを連結していた。しかし、文埋め込みとノイズベクトルは、生成の異なる側面を制御する2つの異なる因子である。単純に連結すると、潜在因子がもつれ、生成モデルの負担になる。本稿では,これら2つの因子の分解を試み,Factor Decomposed Generative Adversarial Networks(FDGAN)を提案する。これを実現するために、まずノイズベクトルから画像を生成し、その後、生成器と識別器の両方の正規化層で文埋め込みを適用する。また,テキストと画像の特徴を整列・融合するための加法型ノルム層も設計した。実験の結果,ノイズと文埋め込みを分解することで,テキストから画像への合成における潜在因子を分離でき,生成モデルがより効率的になることが示された。ベースラインと比較すると、FDGANはより少ないパラメータでより良い性能を達成できる。

Prior works on text-to-image synthesis typically concatenated the sentence embedding with the noise vector, although the sentence embedding and the noise vector are two different factors that control different aspects of the generation. Simply concatenating them will entangle the latent factors and encumber the generative model. In this paper, we attempt to decompose these two factors and propose Factor Decomposed Generative Adversarial Networks (FDGAN). To achieve this, we first generate images from the noise vector and then apply the sentence embedding in the normalization layers of both the generator and the discriminators. We also design an additive norm layer to align and fuse the text-image features. The experimental results show that decomposing the noise and the sentence embedding can disentangle latent factors in text-to-image synthesis and make the generative model more efficient. Compared with the baseline, FDGAN can achieve better performance while using fewer parameters.
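
Applying a sentence embedding inside a normalization layer is commonly done by letting the text control the scale and shift that follow normalization. The sketch below shows that general pattern on a single feature vector. It is an assumption about the family of techniques involved, not FDGAN's actual layer: in a real model `gamma` and `beta` would be predicted from the sentence embedding by learned projections, whereas here they are passed in directly as hypothetical values.

```python
import math


def conditional_norm(features, gamma, beta, eps=1e-5):
    """Normalize `features` to zero mean and unit variance, then apply a
    condition-dependent scale (`gamma`) and shift (`beta`)."""
    mean = sum(features) / len(features)
    var = sum((f - mean) ** 2 for f in features) / len(features)
    normalized = [(f - mean) / math.sqrt(var + eps) for f in features]
    return [g * n + b for g, n, b in zip(gamma, normalized, beta)]


# Features produced from the noise vector alone (hypothetical values).
image_features = [1.0, 2.0, 3.0, 4.0]

# Scale and shift that a text encoder might produce for one sentence (hypothetical).
gamma_from_text = [2.0, 2.0, 2.0, 2.0]
beta_from_text = [1.0, 1.0, 1.0, 1.0]

modulated = conditional_norm(image_features, gamma_from_text, beta_from_text)
print(modulated)
```

Because the text only enters through `gamma` and `beta`, the noise vector and the sentence embedding stay in separate pathways, which matches the decomposition the abstract argues for.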

翻訳日:2023-03-27 15:37:19 公開日:2023-03-24

# Prior-RadGraphFormer: X線から放射線学グラフを生成するための事前知識強化トランスフォーマー

Prior-RadGraphFormer: A Prior-Knowledge-Enhanced Transformer for Generating Radiology Graphs from X-Rays ( http://arxiv.org/abs/2303.13818v1 )

ライセンス: Link先を確認

Yiheng Xiong, Jingsong Liu, Kamilia Zaripova, Sahand Sharifzadeh, Matthias Keicher, Nassir Navab

(参考訳) 自由記述の放射線レポートから放射線学グラフの形で構造化された臨床情報を抽出することは、レポート生成手法の臨床的正確性を評価する上で有用なアプローチであることが示されている。しかし、胸部X線(CXR)画像から放射線学グラフを直接生成することは、これまで試みられていない。このギャップに対処するために,確率的知識グラフ(PKG)の形で事前知識を持つトランスフォーマーモデルを用いて,CXR画像から直接放射線学グラフを生成する,Prior-RadGraphFormerという新しい手法を提案する。PKGは、解剖学的構造や医学的所見を含む放射線学のエンティティ間の統計的関係をモデル化する。この追加の文脈情報は、エンティティおよび関係の抽出精度を高める。生成された放射線学グラフは、自由記述または構造化レポートの生成や病変の多ラベル分類など、様々な下流タスクに適用することができる。提案手法は,CXR画像から直接放射線学グラフを生成するための有望な手法であり,医用画像解析や臨床意思決定の改善に大きな可能性を秘めている。

The extraction of structured clinical information from free-text radiology reports in the form of radiology graphs has been demonstrated to be a valuable approach for evaluating the clinical correctness of report-generation methods. However, the direct generation of radiology graphs from chest X-ray (CXR) images has not been attempted. To address this gap, we propose a novel approach called Prior-RadGraphFormer that utilizes a transformer model with prior knowledge in the form of a probabilistic knowledge graph (PKG) to generate radiology graphs directly from CXR images. The PKG models the statistical relationship between radiology entities, including anatomical structures and medical observations. This additional contextual information enhances the accuracy of entity and relation extraction. The generated radiology graphs can be applied to various downstream tasks, such as free-text or structured reports generation and multi-label classification of pathologies. Our approach represents a promising method for generating radiology graphs directly from CXR images, and has significant potential for improving medical image analysis and clinical decision-making.

翻訳日:2023-03-27 15:37:03 公開日:2023-03-24

# ABLE-NeRF:ニューラルラジアンスフィールドのための学習可能な埋め込みによる注意に基づくレンダリング

ABLE-NeRF: Attention-Based Rendering with Learnable Embeddings for Neural Radiance Field ( http://arxiv.org/abs/2303.13817v1 )

ライセンス: Link先を確認

Zhe Jun Tang, Tat-Jen Cham, Haiyu Zhao

(参考訳) Neural Radiance Field(NeRF)は、連続的なボリュームシーン関数を最適化して3Dシーンを表現する一般的な手法である。その大きな成功はボリュームレンダリング(VR)の適用によるものだが、それはビュー依存効果を生成する際のアキレス腱でもある。その結果、光沢のある表面や透明な表面はしばしば濁って見える。これらのアーティファクトを減らす対策の一つは、法線が後ろ向きのボリュームを除外することでVR方程式を制約することである。このアプローチは光沢のある表面のレンダリングではある程度成功しているが、半透明なオブジェクトは依然としてうまく表現されない。本稿では、光線に沿ったボリュームに自己注意ベースのフレームワークを導入することで、物理ベースのVRアプローチに代わる手法を提案する。また、ライトプローブを利用してシーンを通過する局所的な照明を記憶する現代のゲームエンジンに着想を得て、Learnable Embeddingsを組み込んでシーン内のビュー依存効果を捉える。 ABLE-NeRFと呼ぶ本手法は、レンダリングにおける「ぼやけた」光沢表面を著しく低減し、先行技術では得られなかった現実的な半透明表面を生成する。 Blenderデータセットでは、ABLE-NeRFはSOTAの結果を達成し、3つの画像品質指標PSNR、SSIM、LPIPSのすべてでRef-NeRFを上回っている。

Neural Radiance Field (NeRF) is a popular method in representing 3D scenes by optimising a continuous volumetric scene function. Its large success which lies in applying volumetric rendering (VR) is also its Achilles' heel in producing view-dependent effects. As a consequence, glossy and transparent surfaces often appear murky. A remedy to reduce these artefacts is to constrain this VR equation by excluding volumes with back-facing normal. While this approach has some success in rendering glossy surfaces, translucent objects are still poorly represented. In this paper, we present an alternative to the physics-based VR approach by introducing a self-attention-based framework on volumes along a ray. In addition, inspired by modern game engines which utilise Light Probes to store local lighting passing through the scene, we incorporate Learnable Embeddings to capture view dependent effects within the scene. Our method, which we call ABLE-NeRF, significantly reduces 'blurry' glossy surfaces in rendering and produces realistic translucent surfaces which lack in prior art. In the Blender dataset, ABLE-NeRF achieves SOTA results and surpasses Ref-NeRF in all 3 image quality metrics PSNR, SSIM, LPIPS.

翻訳日:2023-03-27 15:36:44 公開日:2023-03-24
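
For context, the volumetric rendering (VR) equation mentioned in the ABLE-NeRF abstract is commonly evaluated as a discrete sum over samples along each ray. The sketch below shows that standard NeRF-style quadrature in plain Python. It illustrates the physics-based baseline only, not ABLE-NeRF's attention-based renderer; the function name `render_ray` and all sample values are assumptions made for illustration.

```python
import math

def render_ray(densities, colors, deltas):
    """Composite samples along one ray with the standard volume rendering sum.

    densities: volume density (sigma) at each sample.
    colors: RGB triple at each sample.
    deltas: distance between consecutive samples.
    Returns the rendered RGB pixel and the per-sample weights.
    """
    transmittance = 1.0  # fraction of light that reaches the current sample
    pixel = [0.0, 0.0, 0.0]
    weights = []
    for sigma, color, delta in zip(densities, colors, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this segment
        weight = transmittance * alpha
        weights.append(weight)
        for channel in range(3):
            pixel[channel] += weight * color[channel]
        transmittance *= 1.0 - alpha  # light left after this segment
    return pixel, weights

# Hypothetical ray: empty space first, then a dense green sample hiding a blue one.
pixel, weights = render_ray(
    densities=[0.0, 50.0, 50.0],
    colors=[(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)],
    deltas=[1.0, 1.0, 1.0],
)
```

Because each weight depends only on the accumulated density in front of a sample, a sample's colour is blended in proportion to how much light reaches it, which is one way to see why glossy and translucent surfaces are hard for this formulation.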

# 人物識別のためのキーレスアテンションに基づくディープニューラルネットワークを用いた顔と歩容特徴のマルチモーダル適応融合

Multimodal Adaptive Fusion of Face and Gait Features using Keyless attention based Deep Neural Networks for Human Identification ( http://arxiv.org/abs/2303.13814v1 )

ライセンス: Link先を確認

Ashwin Prakash, Thejaswin S, Athira Nambiar and Alexandre Bernardino

(参考訳) バイオメトリクスは、視覚ベースの監視アプリケーションにおいて重要な役割を果たす。歩容(gait)のようなソフトバイオメトリクスは、人物認識や再同定といった監視タスクにおいて顔とともに広く使われている。しかし、実用的なシナリオでは、古典的な融合技術は個々のユーザや外部環境の変化にうまく対応できない。そこで本研究では、キーレスアテンション深層ニューラルネットワークを活用し、歩容と顔のバイオメトリック手がかりを動的に取り込む新しい適応型マルチバイオメトリック融合戦略を提案する。本研究では、視点やカメラまでの距離など様々な外部要因について検討した。広範な実験により、提案モデルは最先端モデルと比較して優れた性能を示した。

Biometrics plays a significant role in vision-based surveillance applications. Soft biometrics such as gait are widely used with face in surveillance tasks like person recognition and re-identification. Nevertheless, in practical scenarios, classical fusion techniques respond poorly to changes in individual users and in the external environment. To this end, we propose a novel adaptive multi-biometric fusion strategy for the dynamic incorporation of gait and face biometric cues by leveraging keyless attention deep neural networks. Various external factors, such as viewpoint and distance to the camera, are investigated in this study. Extensive experiments have shown superior performance of the proposed model compared with the state-of-the-art model.

翻訳日:2023-03-27 15:36:22 公開日:2023-03-24

# Generalist:自然とロバストな一般化の分離

Generalist: Decoupling Natural and Robust Generalization ( http://arxiv.org/abs/2303.13813v1 )

ライセンス: Link先を確認

Hongjun Wang, Yisen Wang

(参考訳) 標準的な訓練によって得られた深層ニューラルネットワークは、常に敵対的サンプルに悩まされてきた。敵対的訓練は敵対的サンプルに対する防御能力を示すが、残念ながら自然な一般化の低下が避けられない。この問題に対処するため、共同訓練から自然な一般化と堅牢な一般化を分離し、それぞれに異なる訓練戦略を定式化する。具体的には、これら2つの一般化誤差の期待値に対するグローバルな損失を最小化する代わりに、Generalistと呼ぶ2エキスパート(bi-expert)フレームワークを提案する。このフレームワークでは、ベース学習者をタスク認識戦略で同時に訓練し、それぞれの分野に特化できるようにする。ベース学習者のパラメータは、訓練中に一定の間隔で収集・結合され、グローバル学習者を形成する。グローバル学習者は、その後の訓練を続けるための初期化パラメータとしてベース学習者に配布される。理論的には、ベース学習者が十分に訓練されると、Generalistのリスクが低下することを証明する。広範な実験により、自然なサンプルに対して高い精度を達成しつつ、敵対的サンプルに対するかなりの堅牢性を維持するというGeneralistの有効性を検証する。コードはhttps://github.com/PKU-ML/Generalistで入手できる。

Deep neural networks obtained by standard training have been constantly plagued by adversarial examples. Although adversarial training demonstrates its capability to defend against adversarial examples, unfortunately, it leads to an inevitable drop in the natural generalization. To address the issue, we decouple the natural generalization and the robust generalization from joint training and formulate different training strategies for each one. Specifically, instead of minimizing a global loss on the expectation over these two generalization errors, we propose a bi-expert framework called Generalist where we simultaneously train base learners with task-aware strategies so that they can specialize in their own fields. The parameters of base learners are collected and combined to form a global learner at intervals during the training process. The global learner is then distributed to the base learners as initialized parameters for continued training. Theoretically, we prove that the risks of Generalist will get lower once the base learners are well trained. Extensive experiments verify the applicability of Generalist to achieve high accuracy on natural examples while maintaining considerable robustness to adversarial ones. Code is available at https://github.com/PKU-ML/Generalist.

翻訳日:2023-03-27 15:36:13 公開日:2023-03-24
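
The collect, combine, and redistribute cycle described in the Generalist abstract can be sketched as a toy loop. This is a minimal illustration under stated assumptions, not the authors' implementation: the mixing weight `gamma`, the simple weighted average, the scalar parameter `w`, and the two quadratic objectives standing in for the natural and robust training losses are all hypothetical.

```python
def mix_parameters(natural, robust, gamma=0.5):
    """Combine two base learners' parameters into a global learner (assumed weighted average)."""
    return {name: gamma * natural[name] + (1.0 - gamma) * robust[name] for name in natural}

def train_generalist(steps=20, interval=5, lr=0.1, gamma=0.5):
    """Train two toy base learners and merge them every `interval` steps."""
    natural = {"w": 0.0}
    robust = {"w": 0.0}
    global_learner = dict(natural)
    for step in range(1, steps + 1):
        # Toy stand-ins for the task-aware strategies:
        # the natural learner minimises (w - 1)^2, the robust learner (w - 3)^2.
        natural["w"] -= lr * 2.0 * (natural["w"] - 1.0)
        robust["w"] -= lr * 2.0 * (robust["w"] - 3.0)
        if step % interval == 0:
            # Collect and combine the base learners, then redistribute the result
            # as the initial parameters for continued training.
            global_learner = mix_parameters(natural, robust, gamma)
            natural = dict(global_learner)
            robust = dict(global_learner)
    return global_learner

global_learner = train_generalist()
```

In this toy setting the global parameter settles between the two base learners' optima, which mirrors the stated goal of keeping both kinds of generalization rather than trading one for the other inside a single loss.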

# Evidence-aware multi-modal data fusion と人工膝置換予測への応用

Evidence-aware multi-modal data fusion and its application to total knee replacement prediction ( http://arxiv.org/abs/2303.13810v1 )

ライセンス: Link先を確認

Xinwen Liu, Jing Wang, S. Kevin Zhou, Craig Engstrom, Shekhar S. Chandra

(参考訳) ディープニューラルネットワークは、人工膝関節全置換術(TKR)などの医療状態を予測するために広く研究されている。画像データ、臨床変数、人口統計情報といった異なるモダリティのデータは相補的な情報を提供し、組み合わせることで予測精度を向上できることが示されている。しかし、様々なモダリティのデータソースが常に高品質であるとは限らず、各モダリティは医療状態に関する部分的な情報しか持たない可能性がある。そのため、異なるモダリティからの予測は相反することがあり、そのような矛盾が存在する場合には最終的な予測が失敗する可能性がある。したがって、最終決定を行う際には、各ソースデータと予測出力の信頼性を考慮することが重要である。本稿では、Dempster-Shafer理論(DST)に基づくエビデンス対応マルチモーダルデータ融合フレームワークを提案する。バックボーンモデルは、画像ブランチ、非画像ブランチ、融合ブランチからなる。各ブランチには、抽出された特徴を入力としてエビデンススコアを出力するエビデンスネットワークがあり、このスコアは当該ブランチの出力の信頼性を表すように設計されている。複数のブランチからの出力確率とエビデンススコアをDempsterの結合則で結合し、最終的な予測を行う。 TKR予測タスクのための公開OA Initiative(OAI)データセットにおける実験結果は、様々なバックボーンモデル上で提案した融合戦略の優位性を示している。

Deep neural networks have been widely studied for predicting a medical condition, such as total knee replacement (TKR). It has been shown that data of different modalities, such as imaging data, clinical variables and demographic information, provide complementary information and thus can improve the prediction accuracy together. However, the data sources of various modalities may not always be of high quality, and each modality may have only partial information about the medical condition. Thus, predictions from different modalities can be opposite, and the final prediction may fail in the presence of such a conflict. Therefore, it is important to consider the reliability of each source data and the prediction output when making a final decision. In this paper, we propose an evidence-aware multi-modal data fusion framework based on the Dempster-Shafer theory (DST). The backbone models contain an image branch, a non-image branch and a fusion branch. For each branch, there is an evidence network that takes the extracted features as input and outputs an evidence score, which is designed to represent the reliability of the output from the current branch. The output probabilities along with the evidence scores from multiple branches are combined using Dempster's combination rule to make a final prediction. Experimental results on the public OA initiative (OAI) dataset for the TKR prediction task show the superiority of the proposed fusion strategy on various backbone models.

翻訳日:2023-03-27 15:35:52 公開日:2023-03-24
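
Dempster's combination rule, named in the abstract above, multiplies the masses that two sources assign to every pair of hypothesis sets, discards the mass that falls on contradictory pairs, and renormalises. The sketch below implements the general rule in plain Python. How the paper converts output probabilities and evidence scores into mass functions is not stated in the abstract, so the two example mass functions (`image_branch`, `clinical_branch`) and their numbers are purely hypothetical.

```python
from itertools import product

def dempster_combine(m1, m2):
    """Combine two mass functions with Dempster's rule.

    Each mass function maps a frozenset of hypotheses to its mass,
    and the masses of each function sum to 1.
    """
    combined = {}
    conflict = 0.0
    for (b, mass_b), (c, mass_c) in product(m1.items(), m2.items()):
        intersection = b & c
        if intersection:
            combined[intersection] = combined.get(intersection, 0.0) + mass_b * mass_c
        else:
            conflict += mass_b * mass_c  # mass assigned to contradictory pairs
    if conflict >= 1.0:
        raise ValueError("total conflict: the two sources cannot be combined")
    # Renormalise by the non-conflicting mass.
    return {a: mass / (1.0 - conflict) for a, mass in combined.items()}

TKR, NO_TKR = "tkr", "no_tkr"
EITHER = frozenset({TKR, NO_TKR})  # mass left uncommitted by a branch

# Hypothetical branch outputs: the uncommitted mass plays the role of low reliability.
image_branch = {frozenset({TKR}): 0.6, frozenset({NO_TKR}): 0.1, EITHER: 0.3}
clinical_branch = {frozenset({TKR}): 0.2, frozenset({NO_TKR}): 0.5, EITHER: 0.3}

fused = dempster_combine(image_branch, clinical_branch)
```

Here the two branches disagree, and the rule resolves the conflict in favour of the hypothesis with more combined support while keeping some uncommitted mass, which matches the abstract's motivation for weighing each source's reliability.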

# 大規模言語モデルにおけるヒューマンライクな翻訳評価を可能にする誤り解析プロンプト:ChatGPTを事例として

Error Analysis Prompting Enables Human-Like Translation Evaluation in Large Language Models: A Case Study on ChatGPT ( http://arxiv.org/abs/2303.13809v1 )

ライセンス: Link先を確認

Qingyu Lu, Baopu Qiu, Liang Ding, Liping Xie, Dacheng Tao

(参考訳) ChatGPTなどの生成型大規模言語モデル(LLM)は、機械翻訳、質問応答、テキスト要約、自然言語理解など、いくつかのNLPタスクにおいて顕著な能力を示している。近年の研究では、機械翻訳(MT)の品質評価にChatGPTを用いると、システムレベルでは最先端の性能を達成するが、セグメントレベルでは性能が低いことが示されている。 MT品質評価におけるLLMの性能をさらに向上させるため、いくつかのプロンプト手法を調査した。その結果、Chain-of-ThoughtsとError Analysisを組み合わせた新しいプロンプト手法であるError Analysis Promptingにより、ChatGPTのようなLLMがシステムレベルとセグメントレベルの両方で人間に近いMT評価を生成できることが示された。さらに、MT評価器としてのChatGPTの限界として、スコアリングが不安定であることや、1つのクエリで複数の翻訳が与えられた場合のバイアスなどを見出した。本研究の知見は、ChatGPTで翻訳品質を適切に評価するための予備的な経験を提供するとともに、文脈内学習のためのプロンプト設計における様々な工夫を提供することを目的としている。本報告が、メトリクスの精度と信頼性の両方を高めることで、LLMによる翻訳評価の分野の進展に新たな光を当てることを期待する。このプロジェクトは https://github.com/Coldmist-Lu/ErrorAnalysis_Prompt で公開されている。

Generative large language models (LLMs), e.g., ChatGPT, have demonstrated remarkable proficiency across several NLP tasks such as machine translation, question answering, text summarization, and natural language understanding. Recent research has shown that utilizing ChatGPT for assessing the quality of machine translation (MT) achieves state-of-the-art performance at the system level but performs poorly at the segment level. To further improve the performance of LLMs on MT quality assessment, we conducted an investigation into several prompting methods. Our results indicate that by combining Chain-of-Thoughts and Error Analysis, a new prompting method called Error Analysis Prompting, LLMs like ChatGPT can generate human-like MT evaluations at both the system and segment level. Additionally, we discovered some limitations of ChatGPT as an MT evaluator, such as unstable scoring and biases when provided with multiple translations in a single query. Our findings aim to provide a preliminary experience for appropriately evaluating translation quality on ChatGPT while offering a variety of tricks in designing prompts for in-context learning. We anticipate that this report will shed new light on advancing the field of translation evaluation with LLMs by enhancing both the accuracy and reliability of metrics. The project can be found at https://github.com/Coldmist-Lu/ErrorAnalysis_Prompt.

翻訳日:2023-03-27 15:35:29 公開日:2023-03-24

# marl-jax: 社会一般化のための多エージェント強化学習フレームワーク

marl-jax: Multi-agent Reinforcement Leaning framework for Social Generalization ( http://arxiv.org/abs/2303.13808v1 )

ライセンス: Link先を確認

Kinal Mehta, Anuj Mahajan, Pawan Kumar

(参考訳) 強化学習(RL)の最近の進歩は、多くのエキサイティングな応用につながっている。これらの進歩は、アルゴリズムとエンジニアリングの両方の改善によって推進され、RLエージェントの訓練が高速化された。本稿では、エージェントの社会的一般化を訓練・評価するためのマルチエージェント強化学習ソフトウェアパッケージであるmarl-jaxを提案する。このパッケージは、マルチエージェント環境でエージェントの集団を訓練し、多様なバックグラウンドエージェントに一般化する能力を評価するために設計されている。 DeepMindのJAXエコシステムの上に構築されており、DeepMindが開発したRLエコシステムを活用している。我々のフレームワークであるmarl-jaxは、複数のエージェントが同時に行動する協調的および競争的な環境で動作する。このパッケージは、エージェント集団を訓練し、その一般化能力を評価するための直感的でユーザフレンドリなコマンドラインインターフェースを提供する。結論として、marl-jaxは、MARLの文脈における社会的一般化の探求に関心を持つ研究者に貴重なリソースを提供する。 marl-jaxのオープンソースコードは https://github.com/kinalmehta/marl-jax で入手できる。

Recent advances in Reinforcement Learning (RL) have led to many exciting applications. These advancements have been driven by improvements in both algorithms and engineering, which have resulted in faster training of RL agents. We present marl-jax, a multi-agent reinforcement learning software package for training and evaluating social generalization of the agents. The package is designed for training a population of agents in multi-agent environments and evaluating their ability to generalize to diverse background agents. It is built on top of DeepMind's JAX ecosystem and leverages the RL ecosystem developed by DeepMind. Our framework marl-jax is capable of working in cooperative and competitive, simultaneous-acting environments with multiple agents. The package offers an intuitive and user-friendly command-line interface for training a population and evaluating its generalization capabilities. In conclusion, marl-jax provides a valuable resource for researchers interested in exploring social generalization in the context of MARL. The open-source code for marl-jax is available at: https://github.com/kinalmehta/marl-jax

翻訳日:2023-03-27 15:34:58 公開日:2023-03-24

# PFT-SSR:ステレオ画像超解像のための視差融合トランスフォーマー

PFT-SSR: Parallax Fusion Transformer for Stereo Image Super-Resolution ( http://arxiv.org/abs/2303.13807v1 )

ライセンス: Link先を確認

Hansheng Guo, Juncheng Li, Guangwei Gao, Zhi Li, Tieyong Zeng

(参考訳) ステレオ画像超解像は、両眼(ステレオ)システムが提供する補助情報を活用することで、画像超解像の性能を高めることを目的としている。従来の手法は有望な結果を得ているが、クロスビューおよびイントラビューの情報を十分に活用していなかった。両眼画像の可能性をさらに引き出すために、本稿では、Parallax Fusion Transformer(PFT)と呼ばれるTransformerベースの新しい視差融合モジュールを提案する。 PFTは、クロスビュー情報を利用するためのCross-view Fusion Transformer(CVFT)と、イントラビュー特徴の改善のためのIntra-view Refinement Transformer(IVRT)を用いる。また、特徴抽出とSR再構成のバックボーンとしてSwin Transformerを採用し、PFT-SSRと呼ばれる純粋なTransformerアーキテクチャを構築した。広範な実験とアブレーション研究により、PFT-SSRは競争力のある結果を達成し、ほとんどのSOTA手法を上回ることが示されている。ソースコードはhttps://github.com/MIVRC/PFT-PyTorchで入手できる。

Stereo image super-resolution aims to boost the performance of image super-resolution by exploiting the supplementary information provided by binocular systems. Although previous methods have achieved promising results, they did not fully utilize the information of cross-view and intra-view. To further unleash the potential of binocular images, in this letter, we propose a novel Transformer-based parallax fusion module called Parallax Fusion Transformer (PFT). PFT employs a Cross-view Fusion Transformer (CVFT) to utilize cross-view information and an Intra-view Refinement Transformer (IVRT) for intra-view feature refinement. Meanwhile, we adopted the Swin Transformer as the backbone for feature extraction and SR reconstruction to form a pure Transformer architecture called PFT-SSR. Extensive experiments and ablation studies show that PFT-SSR achieves competitive results and outperforms most SOTA methods. Source code is available at https://github.com/MIVRC/PFT-PyTorch.

翻訳日:2023-03-27 15:34:42 公開日:2023-03-24

# ガラスを通して見る:透明な容器の中の物体のニューラルな3D再構成

Seeing Through the Glass: Neural 3D Reconstruction of Object Inside a Transparent Container ( http://arxiv.org/abs/2303.13805v1 )

ライセンス: Link先を確認

Jinguang Tong, Sundaram Muthu, Fahira Afzal Maken, Chuong Nguyen, Hongdong Li

(参考訳) 本稿では、透明な容器の中に閉じ込められた物体の3次元形状を復元するという新たな問題を定義する。また、この困難な問題を解決するための新しい手法を提案する。透明な容器は、空気やガラスなど異なる伝搬媒体間の界面で光の反射と屈折が複数回生じるという課題をもたらす。これらの多重反射と屈折は深刻な画像歪みを引き起こし、単一視点の仮定を無効にする。したがって、このような物体の3次元形状は、従来のStructure from Motionや最新のニューラル再構成法といった既存の手法では確実に再構築できない。我々は、シーンを透明な容器の内側と外側という2つの異なる部分空間として明示的にモデル化することで、この問題を解決する。内側の部分空間の形状と外観を暗黙的に表現するために、既存のニューラル再構成法(NeuS)を用いる。複雑な光の相互作用を考慮するため、ボリュームレンダリングとレイトレーシングを組み合わせたハイブリッドレンダリング戦略を開発した。次に、実画像とハイブリッドレンダリング画像の差を最小化することで、モデルの基礎となる形状と外観を復元する。本手法を合成データと実データの両方で評価する。実験の結果、本手法は最先端(SOTA)手法よりも優れていた。コードとデータはhttps://github.com/hirotong/ReNeuSで公開される予定である。

In this paper, we define a new problem of recovering the 3D geometry of an object confined in a transparent enclosure. We also propose a novel method for solving this challenging problem. Transparent enclosures pose challenges of multiple light reflections and refractions at the interface between different propagation media e.g. air or glass. These multiple reflections and refractions cause serious image distortions which invalidate the single viewpoint assumption. Hence the 3D geometry of such objects cannot be reliably reconstructed using existing methods, such as traditional structure from motion or modern neural reconstruction methods. We solve this problem by explicitly modeling the scene as two distinct sub-spaces, inside and outside the transparent enclosure. We use an existing neural reconstruction method (NeuS) that implicitly represents the geometry and appearance of the inner subspace. In order to account for complex light interactions, we develop a hybrid rendering strategy that combines volume rendering with ray tracing. We then recover the underlying geometry and appearance of the model by minimizing the difference between the real and hybrid rendered images. We evaluate our method on both synthetic and real data. Experiment results show that our method outperforms the state-of-the-art (SOTA) methods. Codes and data will be available at https://github.com/hirotong/ReNeuS

翻訳日:2023-03-27 15:34:23 公開日:2023-03-24
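
Tracing a ray through a transparent enclosure requires bending it at each air-glass interface. The sketch below shows the vector form of Snell's law, a textbook ingredient of such ray tracing. It is offered only as background for the refraction effects described in the abstract, not as the ReNeuS implementation; the function name `refract` and the refractive indices are assumptions.

```python
import math

def refract(direction, normal, eta):
    """Refract a unit direction at an interface using the vector form of Snell's law.

    direction: unit vector of the incoming ray.
    normal: unit surface normal pointing toward the side the ray comes from.
    eta: ratio n_incident / n_transmitted of the refractive indices.
    Returns the refracted unit vector, or None on total internal reflection.
    """
    cos_i = -sum(d * n for d, n in zip(direction, normal))
    sin2_t = eta * eta * (1.0 - cos_i * cos_i)
    if sin2_t > 1.0:
        return None  # no transmitted ray exists
    cos_t = math.sqrt(1.0 - sin2_t)
    return tuple(eta * d + (eta * cos_i - cos_t) * n for d, n in zip(direction, normal))

AIR, GLASS = 1.0, 1.5  # assumed refractive indices

# A ray hitting the glass at 45 degrees bends toward the normal.
s = math.sqrt(0.5)
into_glass = refract((s, 0.0, -s), (0.0, 0.0, 1.0), AIR / GLASS)

# A ray leaving the glass at 60 degrees is totally internally reflected.
steep = (math.sin(math.radians(60)), 0.0, -math.cos(math.radians(60)))
trapped = refract(steep, (0.0, 0.0, 1.0), GLASS / AIR)
```

The `None` case matters in practice: rays beyond the critical angle never leave the glass, which is part of why a single-viewpoint camera model breaks down for objects inside a container.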

# 分布シフト下での異常検出

Anomaly Detection under Distribution Shift ( http://arxiv.org/abs/2303.13845v1 )

ライセンス: Link先を確認

Tri Cao, Jiawen Zhu, and Guansong Pang

(参考訳) 異常検出(AD)は、正常な訓練サンプルの集合からパターンを学習し、テストデータ中の異常サンプルを特定することを目的とした、重要な機械学習タスクである。既存のAD研究のほとんどは、訓練データとテストデータが同じデータ分布から得られると仮定している。しかし、多くの現実世界のアプリケーションでは、新しい照明条件、オブジェクトのポーズ、背景の外観といった様々な自然な変動により、テストデータに大きな分布シフトが生じることがあり、そのような場合には既存のAD手法は有効に機能しない。本稿では、分布シフト下での異常検出の問題を考察し、広く使用されている3つのADおよび分布外(OOD)一般化データセットで性能ベンチマークを確立する。ラベル付き異常データがないため、最先端のOOD一般化手法をAD設定に単純に適応させても効果的に機能しないことを示す。さらに、訓練段階と推論段階の両方において、分布内サンプルとOOD正常サンプルの分布ギャップを教師なしで最小化することにより、多様な分布シフトに対して頑健な新しいAD手法を導入する。 3つのデータセットにおける広範な実験結果から、本手法は様々な分布シフトを持つデータに対して最先端のAD手法とOOD一般化手法を大幅に上回りつつ、分布内データに対する検出精度を維持することが示された。

Anomaly detection (AD) is a crucial machine learning task that aims to learn patterns from a set of normal training samples to identify abnormal samples in test data. Most existing AD studies assume that the training and test data are drawn from the same data distribution, but the test data can have large distribution shifts arising in many real-world applications due to different natural variations such as new lighting conditions, object poses, or background appearances, rendering existing AD methods ineffective in such cases. In this paper, we consider the problem of anomaly detection under distribution shift and establish performance benchmarks on three widely-used AD and out-of-distribution (OOD) generalization datasets. We demonstrate that simple adaptation of state-of-the-art OOD generalization methods to AD settings fails to work effectively due to the lack of labeled anomaly data. We further introduce a novel robust AD approach to diverse distribution shifts by minimizing the distribution gap between in-distribution and OOD normal samples in both the training and inference stages in an unsupervised way. Our extensive empirical results on the three datasets show that our approach substantially outperforms state-of-the-art AD methods and OOD generalization methods on data with various distribution shifts, while maintaining the detection accuracy on in-distribution data.

翻訳日:2023-03-27 15:28:20 公開日:2023-03-24

# CompoNeRF:編集可能な3Dシーンレイアウトによるテキスト誘導多目的合成型NeRF

CompoNeRF: Text-guided Multi-object Compositional NeRF with Editable 3D Scene Layout ( http://arxiv.org/abs/2303.13843v1 )

ライセンス: Link先を確認

Yiqi Lin, Haotian Bai, Sijia Li, Haonan Lu, Xiaodong Lin, Hui Xiong, Lin Wang

(参考訳) 最近の研究により、Neural Radiance Fields(NeRF)と事前学習された拡散モデルの組み合わせは、テキストから3Dを生成する上で大きな可能性を秘めていることが示されている。しかし、複数オブジェクトを含むテキストから複雑なシーンをレンダリングする際に、ガイダンスの崩壊(guidance collapse)がしばしば生じるという課題がある。これは、テキストから画像への拡散モデルが本質的に制約を持たず、オブジェクトのセマンティクスを特定の3D構造に正確に関連付ける能力が低いためである。この問題に対処するため、我々はCompoNeRFと呼ぶ新しいフレームワークを提案する。このフレームワークは、編集可能な3Dシーンレイアウトを明示的に組み込み、単一オブジェクト(ローカル)とシーン全体(グローバル)の両レベルで効果的なガイダンスを提供する。まず、複数オブジェクトのテキストを、オブジェクト固有の3Dボックス座標とテキストプロンプトに関連付けられた複数のローカルNeRFを含む編集可能な3Dシーンレイアウトとして解釈する。これらの情報はユーザから容易に収集できる。次に、ローカルNeRFからの合成潜在特徴を較正するグローバルMLPを導入する。これにより、異なるローカルNeRF間のビュー一貫性が予想以上に向上する。最後に、対応するビューを通じてグローバルレベルとローカルレベルにテキストガイダンスを適用し、ガイダンスの曖昧さを回避する。このようにして、CompoNeRFは、3Dレイアウトやテキストプロンプトを操作することで、柔軟なシーン編集と、訓練済みローカルNeRFの新しいシーンへの再構成を可能にする。オープンソースのStable Diffusionモデルを利用することで、CompoNeRFは忠実かつ編集可能なテキストから3Dへの結果を生成でき、編集可能な3Dシーンレイアウトによるテキスト誘導型の複数オブジェクト合成という新たな方向性を切り開く。

Recent research endeavors have shown that combining neural radiance fields (NeRFs) with pre-trained diffusion models holds great potential for text-to-3D generation. However, a hurdle is that they often encounter guidance collapse when rendering complex scenes from multi-object texts. This is because the text-to-image diffusion models are inherently unconstrained, which makes them less competent to accurately associate object semantics with specific 3D structures. To address this issue, we propose a novel framework, dubbed CompoNeRF, that explicitly incorporates an editable 3D scene layout to provide effective guidance at the single object (i.e., local) and whole scene (i.e., global) levels. Firstly, we interpret the multi-object text as an editable 3D scene layout containing multiple local NeRFs associated with the object-specific 3D box coordinates and text prompt, which can be easily collected from users. Then, we introduce a global MLP to calibrate the compositional latent features from local NeRFs, which surprisingly improves the view consistency across different local NeRFs. Lastly, we apply the text guidance on global and local levels through their corresponding views to avoid guidance ambiguity. This way, our CompoNeRF allows for flexible scene editing and re-composition of trained local NeRFs into a new scene by manipulating the 3D layout or text prompt. Leveraging the open-source Stable Diffusion model, our CompoNeRF can generate faithful and editable text-to-3D results while opening a potential direction for text-guided multi-object composition via the editable 3D scene layout.

翻訳日:2023-03-27 15:27:55 公開日:2023-03-24

# FishDreamer: 画像のアウトペインティングとセグメンテーションの統合による魚眼セマンティック補完に向けて

FishDreamer: Towards Fisheye Semantic Completion via Unified Image Outpainting and Segmentation ( http://arxiv.org/abs/2303.13842v1 )

ライセンス: Link先を確認

Hao Shi, Yu Li, Kailun Yang, Jiaming Zhang, Kunyu Peng, Alina Roitberg, Yaozu Ye, Huajian Ni, Kaiwei Wang, Rainer Stiefelhagen

(参考訳) 本稿では、魚眼画像の密なテクスチャ、構造、セマンティクスを、センサの視野(FoV)の外側まで推定する、魚眼セマンティック補完(Fisheye Semantic Completion, FSC)という新たなタスクを提起する。魚眼カメラは通常のピンホールカメラよりもFoVが大きいが、その特有の撮像モデルにより、画像平面の端に必然的に死角領域が生じる。セマンティックセグメンテーションのような重要な知覚タスクが死角内では非常に困難になるため、これは安全性が重要なアプリケーションにとって望ましくない。従来の研究では、FoV外のアウトペインティングとFoV内のセグメンテーションは別々に検討されていた。しかし、我々はこれら2つのタスクが実際には密接に結合していることを見出した。密接に絡み合った完全な魚眼画像とシーンのセマンティクスを同時に推定するために、新しいFishDreamerを提案する。これは、新たなPolar-aware Cross Attentionモジュール(PCA)で拡張したViTに基づいており、異なる極分布を考慮しつつ、密なコンテキストを活用して意味的に一貫したコンテンツ生成を導く。新たなタスクとアーキテクチャの貢献に加えて、この新しいタスクの訓練と評価を容易にするために、Cityscapes-BFとKITTI360-BFデータセットを構築した。実験により、提案するFishDreamerは、各タスクを個別に解く手法を上回り、魚眼セマンティック補完における代替手法も上回ることを示す。コードとデータセットはhttps://github.com/MasterHow/FishDreamerで公開される予定である。

This paper raises the new task of Fisheye Semantic Completion (FSC), where dense texture, structure, and semantics of a fisheye image are inferred even beyond the sensor field-of-view (FoV). Fisheye cameras have a larger FoV than ordinary pinhole cameras, yet their unique imaging model naturally leads to a blind area at the edge of the image plane. This is suboptimal for safety-critical applications since important perception tasks, such as semantic segmentation, become very challenging within the blind zone. Previous works considered the out-FoV outpainting and in-FoV segmentation separately. However, we observe that these two tasks are actually closely coupled. To jointly estimate the tightly intertwined complete fisheye image and scene semantics, we introduce the new FishDreamer which relies on successful ViTs enhanced with a novel Polar-aware Cross Attention module (PCA) to leverage dense context and guide semantically-consistent content generation while considering different polar distributions. In addition to the contribution of the novel task and architecture, we also derive Cityscapes-BF and KITTI360-BF datasets to facilitate training and evaluation of this new track. Our experiments demonstrate that the proposed FishDreamer outperforms methods solving each task in isolation and surpasses alternative approaches on the Fisheye Semantic Completion. Code and datasets will be available at https://github.com/MasterHow/FishDreamer.

翻訳日:2023-03-27 15:27:25 公開日:2023-03-24

# HRDoc:文書構造の階層的再構築に向けたデータセットとベースライン手法

HRDoc: Dataset and Baseline Method Toward Hierarchical Reconstruction of Document Structures ( http://arxiv.org/abs/2303.13839v1 )

ライセンス: Link先を確認

Jiefeng Ma, Jun Du, Pengfei Hu, Zhenrong Zhang, Jianshu Zhang, Huihui Zhu, Cong Liu

(参考訳) 文書構造再構成の問題は、デジタルまたはスキャンされた文書を対応する意味構造に変換することである。既存の作品の多くは、各要素の境界を単一の文書ページに分割することに集中しており、複数ページの文書における意味構造の再構築を無視している。本稿では,NLPおよびCVフィールドに適した新しいタスクとして,文書構造の階層的再構築を提案する。新しいタスクでシステム性能をよりよく評価するために、2500のマルチページ文書と200万近いセマンティックユニットからなるHRDocという大規模なデータセットを構築した。 HRDocのすべてのドキュメントは、ルールベースの抽出器と人間のアノテーションから得られるカテゴリや関係を含む行レベルのアノテーションを持っている。さらに,この問題に対処するためのエンコーダデコーダに基づく階層型文書構造解析システム (DSPS) を提案する。マルチモーダル双方向エンコーダとソフトマスク操作を備えた構造対応GRUデコーダを採用することにより,DSPSモデルはベースライン法を大きなマージンで上回る。すべてのスクリプトとデータセットはhttps://github.com/jfma-USTC/HRDocで公開される。

The problem of document structure reconstruction refers to converting digital or scanned documents into corresponding semantic structures. Most existing works mainly focus on splitting the boundary of each element in a single document page, neglecting the reconstruction of semantic structure in multi-page documents. This paper introduces hierarchical reconstruction of document structures as a novel task suitable for NLP and CV fields. To better evaluate the system performance on the new task, we built a large-scale dataset named HRDoc, which consists of 2,500 multi-page documents with nearly 2 million semantic units. Every document in HRDoc has line-level annotations including categories and relations obtained from rule-based extractors and human annotators. Moreover, we proposed an encoder-decoder-based hierarchical document structure parsing system (DSPS) to tackle this problem. By adopting a multi-modal bidirectional encoder and a structure-aware GRU decoder with soft-mask operation, the DSPS model surpasses the baseline method by a large margin. All scripts and datasets will be made publicly available at https://github.com/jfma-USTC/HRDoc.

翻訳日:2023-03-27 15:26:48 公開日:2023-03-24

# ドライバキャラクタの編集:対話型交通シミュレーションのための社会的制御可能な行動生成

Editing Driver Character: Socially-Controllable Behavior Generation for Interactive Traffic Simulation ( http://arxiv.org/abs/2303.13830v1 )

ライセンス: Link先を確認

Wei-Jer Chang, Chen Tang, Chenran Li, Yeping Hu, Masayoshi Tomizuka, and Wei Zhan

(参考訳) 交通シミュレーションは、自動運転の計画システムの評価と改善において重要な役割を果たす。公道に配備された自動運転車は、異なる社会的選好を持つ人間の道路利用者(例えば、利己的なドライバーや礼儀正しいドライバー)と相互作用する必要がある。様々な対話的交通シナリオにおいて自動運転車が安全かつ効率的な操作を行うことを保証するには、シミュレーション環境において、異なる社会的特性を持つ反応的エージェントを相手に自動運転車を評価できる必要がある。この目的のために、我々は社会的に制御可能な行動生成(SCBG)モデルを提案する。このモデルでは、実世界の運転データから学習することで現実的かつ人間らしい軌道生成を保証しつつ、生成される軌道の礼儀正しさ(courtesy)のレベルをユーザが指定できる。具体的には、実世界の運転データで訓練した周辺行動予測モデルと条件付き行動予測モデルを活用し、運転行動の礼儀正しさのレベルを定量化するための新規かつ微分可能な尺度を定義する。提案する礼儀正しさの尺度により、実世界の運転データに含まれる軌道の礼儀正しさのレベルを自動的にラベル付けでき、入力された礼儀正しさの値に基づいて軌道を生成するSCBGモデルを容易に訓練できる。 Waymo Open Motion Dataset(WOMD)でSCBGモデルを検証した結果、SCBGモデルを制御して、所望の礼儀正しさのレベルを持つ現実的な運転行動を生成できることを示した。興味深いことに、SCBGモデルは、シナリオに応じて礼儀正しい行動の異なる運動パターンを識別できることが分かった。

Traffic simulation plays a crucial role in evaluating and improving autonomous driving planning systems. After being deployed on public roads, autonomous vehicles need to interact with human road participants with different social preferences (e.g., selfish or courteous human drivers). To ensure that autonomous vehicles take safe and efficient maneuvers in different interactive traffic scenarios, we should be able to evaluate autonomous vehicles against reactive agents with different social characteristics in the simulation environment. We propose a socially-controllable behavior generation (SCBG) model for this purpose, which allows the users to specify the level of courtesy of the generated trajectory while ensuring realistic and human-like trajectory generation through learning from real-world driving data. Specifically, we define a novel and differentiable measure to quantify the level of courtesy of driving behavior, leveraging marginal and conditional behavior prediction models trained from real-world driving data. The proposed courtesy measure allows us to auto-label the courtesy levels of trajectories from real-world driving data and conveniently train an SCBG model generating trajectories based on the input courtesy values. We examined the SCBG model on the Waymo Open Motion Dataset (WOMD) and showed that we were able to control the SCBG model to generate realistic driving behaviors with desired courtesy levels. Interestingly, we found that the SCBG model was able to identify different motion patterns of courteous behaviors according to the scenarios.

翻訳日:2023-03-27 15:26:31 公開日:2023-03-24

# コンパクトな変形可能畳み込みトランスフォーマーを用いた効率的な混合型ウェハ欠陥パターン認識

Efficient Mixed-Type Wafer Defect Pattern Recognition Using Compact Deformable Convolutional Transformers ( http://arxiv.org/abs/2303.13827v1 )

ライセンス: Link先を確認

Nitish Shukla

(参考訳) ウェハの製造は、何千もの工程を伴う複雑な作業である。ウェハマップの欠陥パターン認識(DPR)は、問題の根本原因を突き止め、ウェハファウンドリの歩留まりをさらに改善するために重要である。混合型DPRは、空間的特徴の多様さ、欠陥の不確実性、存在する欠陥の数のために、単一型DPRよりもはるかに複雑である。欠陥の数と種類を正確に予測するために、コンパクトな変形可能畳み込みトランスフォーマー(DC Transformer)を新たに提案する。具体的には、DC Transformerは、学習可能な変形可能カーネルとマルチヘッドアテンションにより、ウェハマップに存在するグローバルな特徴に焦点を当てる。提案手法は、ウェハマップと欠陥の内部関係を簡潔にモデル化する。 DC Transformerは、38の欠陥パターンを含む実データセットで評価される。実験の結果、DC Transformerは単一型と混合型の両方の欠陥の認識において極めて優れた性能を示した。提案手法は、既存の最先端モデルを大きな差で上回る。

Manufacturing wafers is an intricate task involving thousands of steps. Defect Pattern Recognition (DPR) of wafer maps is crucial to find the root cause of the issue and further improve the yield in the wafer foundry. Mixed-type DPR is much more complicated compared to single-type DPR due to varied spatial features, the uncertainty of defects, and the number of defects present. To accurately predict the number of defects as well as the types of defects, we propose a novel compact deformable convolutional transformer (DC Transformer). Specifically, DC Transformer focuses on the global features present in the wafer map by virtue of learnable deformable kernels and multi-head attention to the global features. The proposed method succinctly models the internal relationship between the wafer maps and the defects. DC Transformer is evaluated on a real dataset containing 38 defect patterns. Experimental results show that DC Transformer performs exceptionally well in recognizing both single and mixed-type defects. The proposed method outperforms the current state-of-the-art models by a considerable margin.

翻訳日:2023-03-27 15:26:05 公開日:2023-03-24

# ゼロショット量子化ではハードサンプルが非常に重要である

Hard Sample Matters a Lot in Zero-Shot Quantization ( http://arxiv.org/abs/2303.13826v1 )

ライセンス: Link先を確認

Huantong Li, Xiangmiao Wu, Fanbing Lv, Daihai Liao, Thomas H. Li, Yonggang Zhang, Bo Han, Mingkui Tan

(参考訳) ゼロショット量子化(ZSQ)は、完全精度モデルの訓練に用いたデータにアクセスできない場合に、ディープニューラルネットワークを圧縮・高速化する手法として有望である。 ZSQでは、合成サンプルを用いてネットワークの量子化を行うため、量子化モデルの性能は合成サンプルの品質に大きく依存する。しかし、既存のZSQ手法で構築された合成サンプルは、モデルが容易に適合できてしまうことが分かった。そのため、これらの手法で得られた量子化モデルは、ハードサンプルにおいて著しい性能低下を示す。この問題に対処するため、我々はHArd sample Synthesizing and Training(HAST)を提案する。具体的には、HASTはサンプルを合成する際にハードサンプルをより重視し、量子化モデルを訓練する際には合成サンプルを適合しにくくする。また、HASTは、完全精度モデルと量子化モデルが抽出する特徴を整列させ、両モデルの特徴の類似性を保証する。広範な実験により、HASTは既存のZSQ手法を大幅に上回り、実データで量子化したモデルに匹敵する性能を達成することが示された。

Zero-shot quantization (ZSQ) is promising for compressing and accelerating deep neural networks when the data for training full-precision models are inaccessible. In ZSQ, network quantization is performed using synthetic samples, thus, the performance of quantized models depends heavily on the quality of synthetic samples. Nonetheless, we find that the synthetic samples constructed in existing ZSQ methods can be easily fitted by models. Accordingly, quantized models obtained by these methods suffer from significant performance degradation on hard samples. To address this issue, we propose HArd sample Synthesizing and Training (HAST). Specifically, HAST pays more attention to hard samples when synthesizing samples and makes synthetic samples hard to fit when training quantized models. HAST aligns features extracted by full-precision and quantized models to ensure the similarity between features extracted by these two models. Extensive experiments show that HAST significantly outperforms existing ZSQ methods, achieving performance comparable to models that are quantized with real data.

翻訳日:2023-03-27 15:25:47 公開日:2023-03-24

# HandNeRF: アニメーション可能な相互作用する両手のためのNeural Radiance Fields

HandNeRF: Neural Radiance Fields for Animatable Interacting Hands ( http://arxiv.org/abs/2303.13825v1 )

ライセンス: Link先を確認

Zhiyang Guo, Wengang Zhou, Min Wang, Li Li, Houqiang Li

(参考訳) そこで本研究では,ニューラル・ラジアンス・フィールド(neural radiance fields, nerf)を用いて手の動きを再現し,任意の視点からのジェスチャ・アニメーションのためのフォトリアリスティックな画像や映像のレンダリングを可能にする新しい枠組みを提案する。シングルハンドのマルチビュー画像やインタラクションハンドが与えられた場合、オフザシェルフスケルトン推定器がまず手ポーズのパラメータ化に使用される。そこで我々は,ポーズ駆動変形場を設計し,それらの異なるポーズから共通な標準空間への対応性を確立する。このような統一されたモデリングは、両手でほとんど観測されない領域における幾何学とテクスチャの手がかりを効率的に補完する。また, 咬合認識密度学習のためのガイダンスとして, ポーズプリエントを活用し, 擬似深度マップを生成する。さらに,色最適化のためのクロスドメインアライメントを実現するために,神経特徴蒸留法を提案する。我々は,提案したHandNeRFのメリットを検証するための広範な実験を行い,大規模なInterHand2.6Mデータセット上で定性的かつ定量的に,一連の最先端の結果を報告する。

We propose a novel framework to reconstruct accurate appearance and geometry with neural radiance fields (NeRF) for interacting hands, enabling the rendering of photo-realistic images and videos for gesture animation from arbitrary views. Given multi-view images of a single hand or interacting hands, an off-the-shelf skeleton estimator is first employed to parameterize the hand poses. Then we design a pose-driven deformation field to establish correspondence from those different poses to a shared canonical space, where a pose-disentangled NeRF for one hand is optimized. Such unified modeling efficiently complements the geometry and texture cues in rarely-observed areas for both hands. Meanwhile, we further leverage the pose priors to generate pseudo depth maps as guidance for occlusion-aware density learning. Moreover, a neural feature distillation method is proposed to achieve cross-domain alignment for color optimization. We conduct extensive experiments to verify the merits of our proposed HandNeRF and report a series of state-of-the-art results both qualitatively and quantitatively on the large-scale InterHand2.6M dataset.

翻訳日:2023-03-27 15:25:29 公開日:2023-03-24

# $k$NN Prompting: キャリブレーションのない近接的推論によるコンテキスト学習

$k$NN Prompting: Beyond-Context Learning with Calibration-Free Nearest Neighbor Inference ( http://arxiv.org/abs/2303.13824v1 )

ライセンス: Link先を確認

Benfeng Xu, Quan Wang, Zhendong Mao, Yajuan Lyu, Qiaoqiao She, Yongdong Zhang

(参考訳) インコンテキスト・ラーニング (ICL) は、インコンテキスト・デモの即時完了条件として目標タスクを定式化し、LLMの利用が主流となっている。本稿では,コンテキスト長制限のためトレーニングデータではスケールアップできないという,この典型的な使用方法の前提を最初に明らかにする。また、既存の研究によれば、iclも様々なバイアスを負い、微妙な校正処理を必要とすることが示されている。両課題に対処するために,まず LLM を分散表現のトレーニングデータでクエリし,次に近くの隣人を参照してテストインスタンスを予測する,シンプルで効果的なソリューションである $k$NN Prompting を提唱する。我々は、その2倍の優位性を示す包括的な実験を行う。 1) Calibration-Free: $k$NN Promptingは、LSM出力分布とタスク固有のラベル空間を直接整列するのではなく、テストとトレーニングインスタンスを整列するためにそのような分布を利用する。数ショットのシナリオでは、最先端のキャリブレーションベースの手法よりも大幅に優れています。 2)Beyond-Context:$k$NN Promptingは、可能な限り多くのトレーニングデータを使って、より効果的にスケールアップでき、継続的な改善をもたらす。スケーリングの傾向は、2ショットから1024ショットまでの10桁、および0.8Bから30Bまでの様々なLLMスケールにまたがる。データスケーリングをモデルスケーリングにブリッジし、LLMデプロイメントの勾配のないパラダイムに新たな可能性をもたらす。コードは公開されている。

In-Context Learning (ICL), which formulates target tasks as prompt completion conditioned on in-context demonstrations, has become the prevailing utilization of LLMs. In this paper, we first disclose an actual predicament for this typical usage: it cannot scale up with training data due to the context length restriction. Besides, existing works have shown that ICL also suffers from various biases and requires delicate calibration treatment. To address both challenges, we advocate a simple and effective solution, $k$NN Prompting, which first queries the LLM with training data for distributed representations, then predicts test instances by simply referring to nearest neighbors. We conduct comprehensive experiments to demonstrate its two-fold superiority: 1) Calibration-Free: $k$NN Prompting does not directly align the LLM output distribution with the task-specific label space, but instead leverages such distributions to align test and training instances. It significantly outperforms state-of-the-art calibration-based methods under comparable few-shot scenarios. 2) Beyond-Context: $k$NN Prompting can further scale up effectively with as many training data as are available, continually bringing substantial improvements. The scaling trend holds across 10 orders of magnitude ranging from 2 shots to 1024 shots as well as different LLM scales ranging from 0.8B to 30B. It successfully bridges data scaling into model scaling, and brings new potentials for the gradient-free paradigm of LLM deployment. Code is publicly available.
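
The nearest-neighbor step described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: each instance is assumed to be already represented by an LLM output probability distribution, distributions are compared with KL divergence (the abstract does not name the distance), and the function names and toy numbers are hypothetical.

```python
import math
from collections import Counter


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as equal-length lists."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def knn_predict(test_dist, anchors, k=3):
    """Predict a label by majority vote over the k nearest anchors.

    anchors is a list of (distribution, label) pairs obtained by querying
    the model with training instances (assumed to be precomputed here).
    """
    ranked = sorted(anchors, key=lambda a: kl_divergence(test_dist, a[0]))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy anchor set: hypothetical distributions over a 3-token vocabulary.
anchors = [
    ([0.7, 0.2, 0.1], "positive"),
    ([0.6, 0.3, 0.1], "positive"),
    ([0.1, 0.2, 0.7], "negative"),
    ([0.2, 0.2, 0.6], "negative"),
]
prediction = knn_predict([0.65, 0.25, 0.10], anchors, k=3)
print(prediction)
```

Because the anchor set lives outside the prompt, it can grow with the training data without touching the context window, which is the "Beyond-Context" property the abstract refers to.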

翻訳日:2023-03-27 15:25:09 公開日:2023-03-24

# マグノン量子系におけるマグノン遮断

Magnon blockade in magnon-qubit systems ( http://arxiv.org/abs/2303.13823v1 )

ライセンス: Link先を確認

Zhu-yao Jin and Jun Jing

(参考訳) マグノンと超伝導トランスモン量子ビットの直接相互作用によって構築されるハイブリッドシステムを考える。マグノンを弱く駆動し、量子ビットをプローブすることで、単一マグノンレベルでの量子操作への道を開く高度なマグノン遮断を実現することができる。 magnon-blockadeの提案は、magnon-qubit のトランスバーサル結合強度が、qubit と probing field のデチューニング、あるいは magnon と driving field のデチューニングと等価である場合に最適化される。この条件下では、同時間二階相関関数 $g^{(2)}(0)$ の解析式は、プローブ強度が駆動強度の約3倍であるときに最小化することができる。さらに、適切な駆動強度とシステム崩壊速度を選択することにより、マグノン遮断の度合いをさらに高めることができる。実験的に妥当なパラメータでは、相関関数は、キャビティ光力学系における光子遮断に対して約2桁低い$g^{(2)}(0)\sim10^{-7}$となる。また、マグノンと量子ビットの熱雑音による$g^{(2)}(0)$への影響と、2つの成分間の余分な縦方向相互作用についても論じる。遮断に最適化された条件は、両方の非理想的な状況でも依然として成り立つ。

We consider a hybrid system that is established by the direct interaction between a magnon and a superconducting transmon qubit. Through weakly driving the magnon and probing the qubit, a high-degree magnon blockade can be realized, which paves an avenue toward quantum manipulation at the level of a single magnon. Our magnon-blockade proposal is optimized when the magnon-qubit transversal coupling strength is equivalent to the detuning of the qubit and the probing field or that of the magnon and the driving field. Under this condition, the analytical expression of the equal-time second-order correlation function $g^{(2)}(0)$ can be minimized when the probing intensity is about three times the driving intensity. Moreover, the degree of the magnon blockade could be further enhanced by choosing proper driving intensity and system decay rate. With experimentally relevant parameters, the correlation function attains $g^{(2)}(0)\sim10^{-7}$, about two orders lower than that for the photon blockade in cavity optomechanical systems. We also discuss the effects on $g^{(2)}(0)$ from the thermal noise on magnon and qubit and the extra longitudinal interaction between the two components. Our optimized conditions for blockade still hold in both nonideal situations.
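
For readers unfamiliar with the figure of merit, the equal-time second-order correlation function of a single mode is $g^{(2)}(0)=\langle a^\dagger a^\dagger a a\rangle/\langle a^\dagger a\rangle^2$, which for a state diagonal in the number basis reduces to $\sum_n n(n-1)p_n/(\sum_n n p_n)^2$. The sketch below evaluates that reduced formula; the populations are toy values chosen for illustration and are not taken from the paper.

```python
import math


def g2_zero(populations):
    """Equal-time second-order correlation g2(0) from number-state populations.

    populations[n] is the probability of finding n excitations.
    """
    mean_n = sum(n * p for n, p in enumerate(populations))
    mean_n_n_minus_1 = sum(n * (n - 1) * p for n, p in enumerate(populations))
    return mean_n_n_minus_1 / mean_n ** 2


def poisson(mean, cutoff=60):
    """Poisson populations truncated at `cutoff` excitations."""
    return [math.exp(-mean) * mean ** n / math.factorial(n) for n in range(cutoff)]


# Poissonian statistics (coherent state): g2(0) is close to 1.
coherent = g2_zero(poisson(1.0))

# Hypothetical blockaded state: two excitations almost never occur together.
blockaded = g2_zero([0.99, 0.01 - 1e-9, 1e-9])

print(coherent, blockaded)
```

A value far below 1 signals that a second excitation is strongly suppressed once the first is present, which is what "blockade" means here.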

翻訳日:2023-03-27 15:24:43 公開日:2023-03-24

# クロスアテンショントランスを用いた医用画像セグメンテーション

Few Shot Medical Image Segmentation with Cross Attention Transformer ( http://arxiv.org/abs/2303.13867v1 )

ライセンス: Link先を確認

Yi Lin, Yufan Chen, Kwang-Ting Cheng, Hao Chen

(参考訳) 近年,医用画像分割が大きな進歩を遂げている。ディープラーニングベースのメソッドは、手動アノテーションで大量のデータを必要とするデータ格納技術として認識される。しかし、手動アノテーションは、ドメイン固有の専門知識を必要とする医療画像解析の分野では高価である。この課題に対処するために、少数のショットラーニングでは、少数の例から新しいクラスを学ぶことができる。本研究では,クロスマスク型アテンショントランスフォーマーをベースとした,数発の医用画像セグメンテーションのための新しいフレームワークCAT-Netを提案する。提案するネットワークは,支援画像と問合せ画像との相関関係をマイニングし,有用なフォアグラウンド情報のみに限定し,サポートプロトタイプと問合せ機能の両方の表現能力を高める。さらに,クエリイメージのセグメンテーションを反復的に洗練する反復的精錬フレームワークを設計し,サポート機能を促進する。提案手法を,Abd-CT,Abd-MRI,Card-MRIの3つの公開データセットで検証した。実験の結果,最先端手法と比較して優れた性能を示し,各成分の有効性を示した。受け入れ次第、私たちのメソッドのソースコードをリリースします。

Medical image segmentation has made significant progress in recent years. Deep learning-based methods are recognized as data-hungry techniques, requiring large amounts of data with manual annotations. However, manual annotation is expensive in the field of medical image analysis, which requires domain-specific expertise. To address this challenge, few-shot learning has the potential to learn new classes from only a few examples. In this work, we propose a novel framework for few-shot medical image segmentation, termed CAT-Net, based on a cross masked attention Transformer. Our proposed network mines the correlations between the support image and query image, limiting them to focus only on useful foreground information and boosting the representation capacity of both the support prototype and query features. We further design an iterative refinement framework that refines the query image segmentation iteratively and promotes the support feature in turn. We validated the proposed method on three public datasets: Abd-CT, Abd-MRI, and Card-MRI. Experimental results demonstrate the superior performance of our method compared to state-of-the-art methods and the effectiveness of each component. We will release the source code of our method upon acceptance.

翻訳日:2023-03-27 15:18:45 公開日:2023-03-24

# ヘルツレート大都市圏量子テレポーテーション

Hertz-rate metropolitan quantum teleportation ( http://arxiv.org/abs/2303.13866v1 )

ライセンス: Link先を確認

Si Shen, Chenzhi Yuan, Zichang Zhang, Hao Yu, Ruiming Zhang, Chuanrong Yang, Hao Li, Zhen Wang, You Wang, Guangwei Deng, Haizhi Song, Lixing You, Yunru Fan, Guangcan Guo, Qiang Zhou

(参考訳) 量子テレポーテーションは、未知の量子状態を遠くの量子ノード間で転送することができる。量子テレポーテーションの完全なポテンシャルを推し進めるためには、量子状態は長距離の高速で忠実に転送されなければならない。最近の目覚ましい進歩にもかかわらず、大都市圏のファイバネットワークにまたがる高速量子テレポーテーションシステムは極めて望まれている。ここでは、64kmのファイバーチャネル上で、7.1$\pm$0.4Hzの速度で独立光子によって運ばれる量子状態を転送する量子テレポーテーションシステムを示す。平均的な単光子忠実度は 90.6$\pm$2.6% であり、古典体制では最大忠実度 2/3 を超える。我々の結果は量子ネットワークにとって重要なマイルストーンであり、将来の量子インターネットへの量子絡み合いベースの情報応用への扉を開く。

Quantum teleportation can transfer an unknown quantum state between distant quantum nodes, which holds great promise in enabling large-scale quantum networks. To advance the full potential of quantum teleportation, quantum states must be faithfully transferred at a high rate over long distances. Despite recent impressive advances, a high-rate quantum teleportation system across metropolitan fiber networks remains highly desirable. Here, we demonstrate a quantum teleportation system which transfers quantum states carried by independent photons at a rate of 7.1$\pm$0.4 Hz over a 64-km-long fiber channel. An average single-photon fidelity of $\geqslant$ 90.6$\pm$2.6% is achieved, which exceeds the maximum fidelity of 2/3 in the classical regime. Our result marks an important milestone towards quantum networks and opens the door to exploring quantum entanglement based informatic applications for the future quantum internet.
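
The classical bound of 2/3 quoted above can be checked numerically for one simple strategy. The sketch assumes a measure-and-prepare scheme: the sender measures the qubit in a fixed basis, the receiver re-prepares the observed basis state, and the fidelity is averaged over pure states drawn uniformly from the Bloch sphere. The function names are illustrative and the code is not taken from the paper.

```python
import math


def measure_and_prepare_fidelity(theta):
    """Fidelity for a qubit at polar angle theta that is measured in the
    Z basis and replaced by the measured basis state."""
    p0 = math.cos(theta / 2) ** 2  # probability of outcome |0>
    p1 = math.sin(theta / 2) ** 2  # probability of outcome |1>
    # Each outcome occurs with probability p and overlaps the input with p.
    return p0 * p0 + p1 * p1


def average_classical_fidelity(steps=100_000):
    """Midpoint-rule average over the Bloch sphere (weight sin(theta) / 2)."""
    d_theta = math.pi / steps
    total = 0.0
    for i in range(steps):
        theta = (i + 0.5) * d_theta
        total += measure_and_prepare_fidelity(theta) * math.sin(theta) / 2 * d_theta
    return total


classical_bound = average_classical_fidelity()

# Reported average fidelity minus its quoted uncertainty.
reported_lower_edge = 0.906 - 0.026

print(classical_bound, reported_lower_edge > classical_bound)
```

The numerical average agrees with 2/3 to within the integration error, and the reported fidelity stays above it even at the lower edge of its uncertainty.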

翻訳日:2023-03-27 15:18:25 公開日:2023-03-24

# MagicEye:視覚障害者の自立生活を目指す知的ウェアラブル

MagicEye: An Intelligent Wearable Towards Independent Living of Visually Impaired ( http://arxiv.org/abs/2303.13863v1 )

ライセンス: Link先を確認

Sibi C. Sethuraman, Gaurav R. Tadkapally, Saraju P. Mohanty, Gautam Galada and Anitha Subramanian

(参考訳) 視覚障害を持つ個人は、日常生活で多くの困難に直面している。視覚障害は、人の働き、ナビゲート、独立性を維持する能力を著しく損なうことがある。これは教育の限界、事故のリスクの増大、その他多くの問題を引き起こす可能性がある。そこで我々は,視覚障害者を支援する最先端のウェアラブルデバイスであるmagiceyeを提案する。 MagicEyeはカスタムトレーニングされたCNNベースのオブジェクト検出モデルを採用しており、日常生活で頻繁に遭遇する広範囲の屋内および屋外オブジェクトを認識することができる。合計35のクラスで、magiceyeが使用するニューラルネットワークは、オブジェクト検出において高いレベルの効率と精度を達成するために特別に設計されている。また、顔認識と通貨識別モジュールを備えており、視覚障害者に貴重な支援を提供する。さらにmagiceyeはナビゲーション用のgpsセンサーも備えており、ユーザーは簡単に動き回れるほか、物理的に接触せずに近くの物体を検知できる近接センサーも備えている。 MagicEyeは革新的で高度なウェアラブルデバイスで、視覚障害者が直面する多くの課題に対処するために設計されている。最先端の物体検出とナビゲーション機能を備えており、視覚障害者のニーズに合わせて調整されており、視覚障害者を支援する最も有望なソリューションの1つである。

Individuals with visual impairments often face a multitude of challenging obstacles in their daily lives. Vision impairment can severely impair a person's ability to work, navigate, and retain independence. This can result in educational limits, a higher risk of accidents, and a plethora of other issues. To address these challenges, we present MagicEye, a state-of-the-art intelligent wearable device designed to assist visually impaired individuals. MagicEye employs a custom-trained CNN-based object detection model, capable of recognizing a wide range of indoor and outdoor objects frequently encountered in daily life. With a total of 35 classes, the neural network employed by MagicEye has been specifically designed to achieve high levels of efficiency and precision in object detection. The device is also equipped with facial recognition and currency identification modules, providing invaluable assistance to the visually impaired. In addition, MagicEye features a GPS sensor for navigation, allowing users to move about with ease, as well as a proximity sensor for detecting nearby objects without physical contact. In summary, MagicEye is an innovative and highly advanced wearable device that has been designed to address the many challenges faced by individuals with visual impairments. It is equipped with state-of-the-art object detection and navigation capabilities that are tailored to the needs of the visually impaired, making it one of the most promising solutions to assist those who are struggling with visual impairments.

翻訳日:2023-03-27 15:18:08 公開日:2023-03-24

# クラスインクリメンタル学習のための2段階グラフネットワーク

Two-level Graph Network for Few-Shot Class-Incremental Learning ( http://arxiv.org/abs/2303.13862v1 )

ライセンス: Link先を確認

Hao Chen, Linyan Li, Fan Lyu, Fuyuan Hu, Zhenping Xia and Fenglei Xu

(参考訳) FSCIL(Few-shot class-incremental Learning)は、古いクラスの知識を忘れずに、いくつかのデータポイントから新しい概念を継続的に学習できる機械学習アルゴリズムを設計することを目的としている。難点は、新しいクラスからの限られたデータが、重大な過度な問題を引き起こすだけでなく、破滅的な忘れの問題も悪化させることにある。しかし、既存のFSCILメソッドはサンプルレベルとクラスレベルの意味関係を無視している。この論文では,サンプルレベルとクラスレベルのグラフニューラルネットワーク(SCGN, Sample-level and Class-level Graph Neural Network)という,FSCIL用の2レベルグラフネットワークを設計した。具体的には、SCGNモデルパラメータを事前に最適化するための新しいタスクとして、仮想小ショットタスクを合成する擬似漸進学習パラダイムをSCGNで設計する。サンプルレベルのグラフネットワークは、いくつかのサンプルの関係を利用して類似のサンプルを集約し、洗練されたクラスレベルの特徴を得る。クラスレベルのグラフネットワークは、新しいクラスのプロトタイプ機能と古いクラスのセマンティックコンフリクトを軽減することを目的としている。 SCGNは2レベルグラフネットワークを構築し、各数ショットクラスの潜在意味をFSCILで効果的に表現できるようにする。 3つの人気のあるベンチマークデータセットの実験により、我々の手法はベースラインを著しく上回り、新しい最先端の成果を顕著な優位性で設定することを示した。

Few-shot class-incremental learning (FSCIL) aims to design machine learning algorithms that can continually learn new concepts from a few data points, without forgetting knowledge of old classes. The difficulty lies in that limited data from new classes not only lead to significant overfitting issues but also exacerbate the notorious catastrophic forgetting problems. However, existing FSCIL methods ignore the semantic relationships at the sample level and the class level. Exploiting the ability of graph neural networks (GNNs) to mine rich information among few samples, we design a two-level graph network for FSCIL named Sample-level and Class-level Graph Neural Network (SCGN). Specifically, a pseudo incremental learning paradigm is designed in SCGN, which synthesizes virtual few-shot tasks as new tasks to optimize SCGN model parameters in advance. The sample-level graph network uses the relationship of a few samples to aggregate similar samples and obtains refined class-level features. The class-level graph network aims to mitigate the semantic conflict between prototype features of new classes and old classes. SCGN builds two-level graph networks to guarantee that the latent semantics of each few-shot class can be effectively represented in FSCIL. Experiments on three popular benchmark datasets show that our method significantly outperforms the baselines and sets new state-of-the-art results with remarkable advantages.

翻訳日:2023-03-27 15:17:48 公開日:2023-03-24

# ChatGPTをメタバースに解き放つ:救世主か破壊者か?

Unleasing ChatGPT on the Metaverse: Savior or Destroyer? ( http://arxiv.org/abs/2303.13856v1 )

ライセンス: Link先を確認

Pengyuan Zhou

(参考訳) 人工知能(AI)技術の組み込み、特に自然言語処理(NLP)は、没入的で対話的なメタバース体験の開発にますます不可欠になりつつある。メタバースで注目を集めている人工知能ツールのひとつに、OpenAIがトレーニングした大規模な言語モデルであるChatGPTがある。この記事は、メタバースベースの教育、エンターテイメント、パーソナライゼーション、サポートにChatGPTを活用することの長所と短所を掘り下げている。この技術では動的でパーソナライズされた体験が可能だが、正当なプライバシー、バイアス、倫理的な問題もある。本稿は,ChatGPTがメタバースに与える影響と,これらの機会と障害を評価することで,より没入的で魅力的な仮想環境を効果的に構築する方法について,読者の理解を支援することを目的とする。

The incorporation of artificial intelligence (AI) technology, and in particular natural language processing (NLP), is becoming increasingly vital for the development of immersive and interactive metaverse experiences. One such artificial intelligence tool that is gaining traction in the metaverse is ChatGPT, a large language model trained by OpenAI. The article delves into the pros and cons of utilizing ChatGPT for metaverse-based education, entertainment, personalization, and support. Dynamic and personalized experiences are possible with this technology, but there are also legitimate privacy, bias, and ethical issues to consider. This article aims to help readers understand the possible influence of ChatGPT on the metaverse and how it may be used to effectively create a more immersive and engaging virtual environment by evaluating these opportunities and obstacles.

翻訳日:2023-03-27 15:17:23 公開日:2023-03-24

# 変形性モデル駆動型ニューラルレンダリングによる低視野環境下での頭部の高忠実度3次元再構成

Deformable Model Driven Neural Rendering for High-fidelity 3D Reconstruction of Human Heads Under Low-View Settings ( http://arxiv.org/abs/2303.13855v1 )

ライセンス: Link先を確認

Baixin Xu, Jiarui Zhang, Kwan-Yee Lin, Chen Qian and Ying He

(参考訳) 低視点入力から高忠実度幾何で3次元頭部を再構成する神経暗黙関数の頑健な学習法を提案する。我々は3次元人間の頭部を、スムーズなテンプレート、非剛性変形、高周波変位場からなる符号付き距離場のゼロレベルセットとして表現する。テンプレートは、変形ネットワークとともに複数の個人でトレーニングされるアイデンティティ非依存および表現ニュートラルの特徴を表す。変位場は、個人ごとに訓練されたアイデンティティ依存の幾何学的詳細を符号化する。我々は3Dの監督なしに粗大な戦略を用いてネットワークを2段階に訓練する。実験により, 幾何分解と2段階の訓練により, 提案手法は頑健であり, 低視点環境下での再現精度と新規ビュー合成の点で, 既存手法よりも優れることが示された。さらに、事前学習されたテンプレートは、私たちのモデルが目に見えない個人に適応するための適切な初期化に役立ちます。

We propose a robust method for learning neural implicit functions that can reconstruct 3D human heads with high-fidelity geometry from low-view inputs. We represent 3D human heads as the zero level-set of a composed signed distance field that consists of a smooth template, a non-rigid deformation, and a high-frequency displacement field. The template, which represents identity-independent and expression-neutral features, is trained on multiple individuals along with the deformation network. The displacement field encodes identity-dependent geometric details and is trained for each specific individual. We train our network in two stages using a coarse-to-fine strategy without 3D supervision. Our experiments demonstrate that the geometry decomposition and two-stage training make our method robust and that our model outperforms existing methods in terms of reconstruction accuracy and novel view synthesis under low-view settings. Additionally, the pre-trained template serves as a good initialization for our model to adapt to unseen individuals.

翻訳日:2023-03-27 15:17:08 公開日:2023-03-24

# 2pcnet:昼夜無教師ドメイン適応オブジェクト検出のための2相一貫性トレーニング

2PCNet: Two-Phase Consistency Training for Day-to-Night Unsupervised Domain Adaptive Object Detection ( http://arxiv.org/abs/2303.13853v1 )

ライセンス: Link先を確認

Mikhail Kennerley, Jian-Gang Wang, Bharadwaj Veeravalli, Robby T. Tan

(参考訳) 夜のオブジェクト検出は、夜の画像アノテーションがないため、難しい問題である。いくつかのドメイン適応手法にもかかわらず、高精度な結果を達成することは依然として問題である。偽陽性の誤り伝播は、確立された学生-教師フレームワークを用いた方法でも、特に小規模で低照度なオブジェクトについて、まだ観察されている。本稿では,これらの問題に対処するため,二相一貫性に基づく教師なしドメイン適応ネットワークである2PCNetを提案する。このネットワークは、第1フェーズで教師による高信頼の境界ボックス予測を採用し、それらを学生の領域提案に追加して第2フェーズで教師に再評価させることで、高信頼と低信頼の擬似ラベルの組み合わせをもたらす。夜間画像と擬似ラベルは、学生への入力として使用される前にスケールダウンされ、より強力な小型の擬似ラベルを提供する。画像中の低照度領域や他の夜間関連属性から発生するエラーに対処するため,NightAugと呼ばれる夜間特化の拡張パイプラインを提案する。このパイプラインは、日中の画像にグレア、ぼかし、ノイズなどのランダムな拡張を適用する。公開データセットを用いた実験により,本手法は最先端の手法を20%上回り,対象データで直接トレーニングされた教師ありモデルをも上回る結果を達成することが示された。

Object detection at night is a challenging problem due to the absence of night image annotations. Despite several domain adaptation methods, achieving high-precision results remains an issue. False-positive error propagation is still observed in methods using the well-established student-teacher framework, particularly for small-scale and low-light objects. This paper proposes a two-phase consistency unsupervised domain adaptation network, 2PCNet, to address these issues. The network employs high-confidence bounding-box predictions from the teacher in the first phase and appends them to the student's region proposals for the teacher to re-evaluate in the second phase, resulting in a combination of high and low confidence pseudo-labels. The night images and pseudo-labels are scaled down before being used as input to the student, providing stronger small-scale pseudo-labels. To address errors that arise from low-light regions and other night-related attributes in images, we propose a night-specific augmentation pipeline called NightAug. This pipeline involves applying random augmentations, such as glare, blur, and noise, to daytime images. Experiments on publicly available datasets demonstrate that our method outperforms state-of-the-art methods by 20% and also outperforms supervised models trained directly on the target data.

翻訳日:2023-03-27 15:16:52 公開日:2023-03-24

# 弱教師付きシングルビュー画像のリライト

Weakly-supervised Single-view Image Relighting ( http://arxiv.org/abs/2303.13852v1 )

ライセンス: Link先を確認

Renjiao Yi, Chenyang Zhu, Kai Xu

(参考訳) 本稿では,ランベルトおよび低周波スペクトルの単一像をリライトする学習に基づくアプローチを提案する。本手法では,写真からオブジェクトを新しいシーンに挿入し,ar応用に不可欠な新しい環境照明下でリライトすることができる。オブジェクトをリライトするため、逆レンダリングと再レンダリングの両方を解決します。この逆レンダリングを解消するために,低ランク制約による弱教師付き手法を提案する。弱教師付きトレーニングを容易にするために,照度の変化を伴ってビデオの大規模(750Kイメージ)データセットであるRelitをコントリビュートする。再レンダリングのために、球面調和の様々な照明下で低周波非ランベルト材料をレンダリングする微分可能な特異なレンダリング層を提案する。パイプライン全体がエンドツーエンドで効率的で、ARオブジェクト挿入のモバイルアプリ実装を可能にする。大規模評価は,本手法が最先端性能を実現することを示す。プロジェクトページ: https://renjiaoyi.github.io/relighting/

We present a learning-based approach to relight a single image of Lambertian and low-frequency specular objects. Our method enables inserting objects from photographs into new scenes and relighting them under the new environment lighting, which is essential for AR applications. To relight the object, we solve both inverse rendering and re-rendering. To resolve the ill-posed inverse rendering, we propose a weakly-supervised method by a low-rank constraint. To facilitate the weakly-supervised training, we contribute Relit, a large-scale (750K images) dataset of videos with aligned objects under changing illuminations. For re-rendering, we propose a differentiable specular rendering layer to render low-frequency non-Lambertian materials under various illuminations of spherical harmonics. The whole pipeline is end-to-end and efficient, allowing for a mobile app implementation of AR object insertion. Extensive evaluations demonstrate that our method achieves state-of-the-art performance. Project page: https://renjiaoyi.github.io/relighting/.

翻訳日:2023-03-27 15:16:29 公開日:2023-03-24

# ニューラルネットワークにおける因果関係の学習--直接的な効果を超えて

Learning Causal Attributions in Neural Networks: Beyond Direct Effects ( http://arxiv.org/abs/2303.13850v1 )

ライセンス: Link先を確認

Abbaavaram Gowtham Reddy, Saketh Bachu, Harsharaj Pathak, Benin L Godfrey, Vineeth N. Balasubramanian, Varshaneya V, Satya Narayanan Kar

(参考訳) 近年,ニューラルネットワーク(NN)モデルにおける因果関係の捕捉と維持に対する関心が高まっている。本研究では,NNモデルの入出力属性を推定・維持するための因果的アプローチについて検討する。特に、この方向の既存の取り組みは(NNアーキテクチャにより)入力変数間の独立性を前提としており、直接的な因果効果のみを研究する。 NNを構造因果モデル(Structuor causal model, SCM)と見なして、直接効果を超えて入力特徴間のエッジを導入し、NNモデルをトレーニングしながら直接的かつ間接的因果効果を捕捉し、維持するためのシンプルで効果的な方法論を提供する。また,高次元データにおける因果帰属を定量化する効果的な近似戦略を提案する。合成および実世界のデータセットに関する幅広い実験により、提案手法は、基底真理効果に近い直接的および間接的因果効果の因果属性を学習することを示す。

There has been a growing interest in capturing and maintaining causal relationships in Neural Network (NN) models in recent years. We study causal approaches to estimate and maintain input-output attributions in NN models in this work. In particular, existing efforts in this direction assume independence among input variables (by virtue of the NN architecture), and hence study only direct causal effects. Viewing an NN as a structural causal model (SCM), we instead focus on going beyond direct effects, introduce edges among input features, and provide a simple yet effective methodology to capture and maintain direct and indirect causal effects while training an NN model. We also propose effective approximation strategies to quantify causal attributions in high dimensional data. Our wide range of experiments on synthetic and real-world datasets show that the proposed ante-hoc method learns causal attributions for both direct and indirect causal effects close to the ground truth effects.
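
The distinction between direct and indirect causal effects that this abstract relies on can be illustrated with a toy linear structural causal model. The sketch below is a generic illustration under stated assumptions, not the authors' method: the graph (x1 influences y both directly and through x2), the coefficients, and the function names are all hypothetical.

```python
def x2_given(x1, a=2.0):
    """Structural equation for the mediator: x2 = a * x1."""
    return a * x1


def y_given(x1, x2, b=0.5, c=3.0):
    """Structural equation for the outcome: y = b * x1 + c * x2."""
    return b * x1 + c * x2


def total_effect(x1_from=0.0, x1_to=1.0):
    """Change in y under do(x1), letting x2 respond to x1."""
    return y_given(x1_to, x2_given(x1_to)) - y_given(x1_from, x2_given(x1_from))


def direct_effect(x1_from=0.0, x1_to=1.0):
    """Change in y under do(x1) while x2 is held at its baseline value."""
    x2_baseline = x2_given(x1_from)
    return y_given(x1_to, x2_baseline) - y_given(x1_from, x2_baseline)


def indirect_effect(x1_from=0.0, x1_to=1.0):
    """Change in y transmitted only through x2, with the direct path at baseline."""
    return y_given(x1_from, x2_given(x1_to)) - y_given(x1_from, x2_given(x1_from))


print(total_effect(), direct_effect(), indirect_effect())
```

In this linear toy model the total effect is the sum of the direct and indirect parts. An attribution method that assumes independent inputs would report only the direct part and miss the contribution routed through x2.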

翻訳日:2023-03-27 15:16:11 公開日:2023-03-24

# 対人ロバスト性の特徴分離と再検討

Feature Separation and Recalibration for Adversarial Robustness ( http://arxiv.org/abs/2303.13846v1 )

ライセンス: Link先を確認

Woo Jae Kim, Yoonki Cho, Junsik Jung, Sung-Eui Yoon

(参考訳) 深いニューラルネットワークは、特徴レベルの摂動の蓄積による敵対的攻撃の影響を受けやすく、多くの研究がモデル誤予測を引き起こす非破壊的特徴アクティベーションを非活性化することによってモデルの堅牢性を高めている。しかし、これらの悪意あるアクティベーションは依然として識別的手がかりを含んでおり、再校正によってモデルの正しい予測のために追加の有用な情報を捉えることができると主張している。そこで本研究では,より堅牢な特徴マップに対して,悪意のある非ロバストアクティベーションを分離と再調整によって再結合する機能分離再調整(fsr)という新しい手法を提案する。分離部は、入力特徴マップを、モデルが正しい予測を行うのに役立つアクティベーション付きロバスト特徴と、敵の攻撃時にモデル予測の誤りの原因となるアクティベーションとで区別する。 Recalibration部は、モデル予測のための潜在的に有用なキューを復元するために、非ロバストなアクティベーションを調整する。大規模な実験は、従来の非活性化技術と比較してFSRの優位性を検証し、計算オーバーヘッドを小さくして8.57%まで向上することを示した。コードはhttps://github.com/wkim97/fsrで入手できる。

Deep neural networks are susceptible to adversarial attacks due to the accumulation of perturbations in the feature level, and numerous works have boosted model robustness by deactivating the non-robust feature activations that cause model mispredictions. However, we claim that these malicious activations still contain discriminative cues and that with recalibration, they can capture additional useful information for correct model predictions. To this end, we propose a novel, easy-to-plugin approach named Feature Separation and Recalibration (FSR) that recalibrates the malicious, non-robust activations for more robust feature maps through Separation and Recalibration. The Separation part disentangles the input feature map into the robust feature with activations that help the model make correct predictions and the non-robust feature with activations that are responsible for model mispredictions upon adversarial attack. The Recalibration part then adjusts the non-robust activations to restore the potentially useful cues for model predictions. Extensive experiments verify the superiority of FSR compared to traditional deactivation techniques and demonstrate that it improves the robustness of existing adversarial training methods by up to 8.57% with small computational overhead. Code is available at https://github.com/wkim97/FSR.

翻訳日:2023-03-27 15:15:53 公開日:2023-03-24

# 過去を思い出す:アナログプロンプトによるインクリメンタル学習

Remind of the Past: Incremental Learning with Analogical Prompts ( http://arxiv.org/abs/2303.13898v1 )

ライセンス: Link先を確認

Zhiheng Ma, Xiaopeng Hong, Beinan Liu, Yabin Wang, Pinyue Guo, Huiyun Li

(参考訳) データフリーの漸進的学習法はメモリフレンドリだが、過去のデータがない場合には、正確に推定と対応が難しい。本稿では,人間のアナロジー能力に触発された新しいインクリメンタル学習手法を提案する。具体的には、即時チューニングにより新しいデータを古いクラスに再マップするアナロジー作成機構を設計する。これは、新しいクラスのサンプルのみを使用して、古いモデルのターゲットの古いクラスのフィーチャ分布を模倣する。学習プロンプトは、歴史的プロトタイプの微調整による表現シフトを推定し、対処するためにさらに使用される。提案手法は,4つのインクリメンタルラーニングベンチマークに対して,クラスとドメインのインクリメンタルラーニング設定の下で,新しい最先端性能を設定する。クラスごとに機能プロトタイプを保存するだけで、データ再生メソッドを一貫して上回る。 Core50ベンチマークのジョイントトレーニングにより、実証上界にほぼ到達した。コードは https://github.com/ZhihengCV/A-Prompts でリリースされる。

Although data-free incremental learning methods are memory-friendly, accurately estimating and counteracting representation shifts is challenging in the absence of historical data. This paper addresses this thorny problem by proposing a novel incremental learning method inspired by human analogy capabilities. Specifically, we design an analogy-making mechanism to remap the new data into the old class by prompt tuning. It mimics the feature distribution of the target old class on the old model using only samples of new classes. The learnt prompts are further used to estimate and counteract the representation shift caused by fine-tuning for the historical prototypes. The proposed method sets up new state-of-the-art performance on four incremental learning benchmarks under both the class and domain incremental learning settings. It consistently outperforms data-replay methods by only saving feature prototypes for each class. It has almost hit the empirical upper bound by joint training on the Core50 benchmark. The code will be released at https://github.com/ZhihengCV/A-Prompts.

翻訳日:2023-03-27 15:09:01 公開日:2023-03-24

# 画像認識のための多項式ネットワークの規則化

Regularization of polynomial networks for image recognition ( http://arxiv.org/abs/2303.13896v1 )

ライセンス: Link先を確認

Grigorios G Chrysos, Bohan Wang, Jiankang Deng, Volkan Cevher

(参考訳) ディープニューラルネットワーク(Deep Neural Networks, DNN)は、タスク全体にわたって優れたパフォーマンスを得ているが、ブラックボックスとして残っている。同時に、PN(Polynomial Networks)は、有望な性能と解釈性を改善した代替手法として登場したが、強力なDNNベースラインのパフォーマンスには達していない。この作業では、パフォーマンスギャップを埋めることを目指しています。 6つのベンチマークでResNetのパフォーマンスに到達できるPNのクラスを紹介します。強正則化が重要であることを実証し,性能に適合するために必要な完全正則化スキームを広範囲に検討した。正規化スキームをさらに進めるために,従来提案されていた多項式ネットワークよりも高次展開を実現するD-PolyNetを導入する。 D-PolyNetはパラメータ効率が良く、他の多項式ネットワークと同じような性能を実現する。我々の新しいモデルは、要素的活性化関数(PNの訓練にはもはや必要ない)の役割の理解につながると期待している。ソースコードはhttps://github.com/grigorisg9gr/regularized_polynomialsで入手できる。

Deep Neural Networks (DNNs) have obtained impressive performance across tasks; however, they remain black boxes, i.e., hard to analyze theoretically. At the same time, Polynomial Networks (PNs) have emerged as an alternative method with promising performance and improved interpretability but have yet to reach the performance of the powerful DNN baselines. In this work, we aim to close this performance gap. We introduce a class of PNs, which are able to reach the performance of ResNet across a range of six benchmarks. We demonstrate that strong regularization is critical and conduct an extensive study of the exact regularization schemes required to match performance. To further motivate the regularization schemes, we introduce D-PolyNets that achieve a higher degree of expansion than previously proposed polynomial networks. D-PolyNets are more parameter-efficient while achieving a similar performance as other polynomial networks. We expect that our new models can lead to an understanding of the role of elementwise activation functions (which are no longer required for training PNs). The source code is available at https://github.com/grigorisg9gr/regularized_polynomials.
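
To make the idea of a network without elementwise activation functions concrete, the sketch below builds a polynomial of the input through a multiplicative recursion. It is a generic illustration under stated assumptions: the recursion form, the function names, and the toy weights are chosen for this example and are not claimed to be the D-PolyNet architecture or the authors' regularization scheme.

```python
def matvec(matrix, vector):
    """Plain matrix-vector product on nested lists."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def polynomial_net(z, weights, readout):
    """Degree-N polynomial of z, where N = len(weights).

    Assumed recursion (no activation function anywhere):
        x_1 = U_1 z
        x_n = (U_n z) * x_{n-1} + x_{n-1}   (elementwise product)
        output = readout . x_N
    """
    x = matvec(weights[0], z)
    for u in weights[1:]:
        gate = matvec(u, z)
        x = [g * xi + xi for g, xi in zip(gate, x)]
    return sum(c * xi for c, xi in zip(readout, x))


# With scalar unit weights and two layers the output is z**2 + z.
weights = [[[1.0]], [[1.0]]]
readout = [1.0]
print(polynomial_net([3.0], weights, readout))
print(polynomial_net([2.0], weights, readout))
```

Each extra layer multiplies by another linear function of the input and so raises the polynomial degree by one. The nonlinearity comes entirely from these products rather than from ReLU-style activations.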

翻訳日:2023-03-27 15:08:46 公開日:2023-03-24

# 結合スピン系におけるハイゼンベルク制限スピンスクイージング

Heisenberg-limited spin squeezing in coupled spin systems ( http://arxiv.org/abs/2303.13889v1 )

ライセンス: Link先を確認

Long-Gang Huang, Xuanchen Zhang, Yanzhen Wang, Zhenxing Hua, Yuanjiang Tang and Yong-Chun Liu

(参考訳) スピンスクイージングは量子力学と量子情報科学において重要な役割を果たす。その生成は、さらなる応用の前提条件であるが、既存の物理系が要求されるスクイーズ相互作用をほとんど含まないため、依然として大きな課題に直面している。本稿では, スピンスピン相互作用を持つ結合スピンモデルにおいてスピンスクイージングを生成するための普遍的なスキームを提案する。我々の手法は、結合したスピン相互作用をスクイーズ相互作用に変換し、ハイゼンベルクで制限された測定精度を1/N$$$N$の粒子で極端にスクイーズする。一定かつ連続的な駆動場のみが必要であり、これが現在の現実的な実験の連続に利用できる。この研究は、ハイゼンベルク制限されたスピンスクイージングを生成できる様々なシステムを強化し、量子精密測定に広く応用されている。

Spin squeezing plays a crucial role in quantum metrology and quantum information science. Its generation is the prerequisite for further applications but still faces an enormous challenge since the existing physical systems rarely contain the required squeezing interactions. Here we propose a universal scheme to generate spin squeezing in coupled spin models with collective spin-spin interactions, which commonly exist in various systems. Our scheme can transform the coupled spin interactions into squeezing interactions, and reach the extreme squeezing with Heisenberg-limited measurement precision scaling as $1/N$ for $N$ particles. Only constant and continuous driving fields are required, which is accessible to a series of current realistic experiments. This work greatly enriches the variety of systems that can generate the Heisenberg-limited spin squeezing, with broad applications in quantum precision measurement.
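
The $1/N$ scaling mentioned above is usually contrasted with the standard quantum limit of $1/\sqrt{N}$ for uncorrelated particles. The sketch below compares the two scalings with all prefactors set to 1; this is an illustrative simplification, and the function names are hypothetical.

```python
import math


def standard_quantum_limit(n_particles):
    """Phase uncertainty scaling for N uncorrelated particles."""
    return 1.0 / math.sqrt(n_particles)


def heisenberg_limit(n_particles):
    """Phase uncertainty scaling at the Heisenberg limit."""
    return 1.0 / n_particles


def metrological_gain(n_particles):
    """Ratio of the two limits, which equals sqrt(N)."""
    return standard_quantum_limit(n_particles) / heisenberg_limit(n_particles)


for n in (100, 10_000, 1_000_000):
    print(n, standard_quantum_limit(n), heisenberg_limit(n), metrological_gain(n))
```

The gain grows as the square root of the particle number, which is why reaching Heisenberg-limited squeezing matters most for large ensembles.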

翻訳日:2023-03-27 15:08:29 公開日:2023-03-24

# 手作りカーネルによる効果的なブラックボックス対向攻撃

Effective black box adversarial attack with handcrafted kernels ( http://arxiv.org/abs/2303.13887v1 )

ライセンス: Link先を確認

Petr Dvo\v{r}\'a\v{c}ek, Petr Hurtik, Petra \v{S}tevuli\'akov\'a

(参考訳) ブラックボックス攻撃の敵例を作成するための,新しいシンプルなフレームワークを提案する。そのアイデアは、手作りの畳み込みカーネルのただ1つの層からなる訓練不能なモデルで置換モデルをシミュレートし、生成ニューラルネットワークを訓練して、原画像と生成した逆画像の出力距離を最大化する。本研究では,第1層の予測を騙すことで,ネットワーク全体が騙され,逆入力の精度が低下することを示す。さらに、第1の畳み込み層カーネルを得るためのニューラルネットワークのトレーニングは行わないが、f変換技術を用いてニューラルネットワークを作成する。したがって,本手法は非常に時間と資源効率が高い。

We propose a new, simple framework for crafting adversarial examples for black box attacks. The idea is to simulate the substitution model with a non-trainable model composed of just one layer of handcrafted convolutional kernels and then train the generator neural network to maximize the distance of the outputs for the original and generated adversarial image. We show that fooling the prediction of the first layer causes the whole network to be fooled and decreases its accuracy on adversarial inputs. Moreover, we do not train the neural network to obtain the first convolutional layer kernels, but we create them using the technique of F-transform. Therefore, our method is very time- and resource-efficient.

翻訳日:2023-03-27 15:08:15 公開日:2023-03-24

# ARKitTrack: モバイルRGB-Dデータによるトラッキングのための新しい横データセット

ARKitTrack: A New Diverse Dataset for Tracking Using Mobile RGB-D Data ( http://arxiv.org/abs/2303.13885v1 )

ライセンス: Link先を確認

Haojie Zhao and Junsong Chen and Lijun Wang and Huchuan Lu

(参考訳) 従来のRGBのみのビジュアルトラッキングと比較して、RGB-Dトラッキング用に構築されたデータセットはほとんどない。本稿では、appleのiphoneおよびipadに搭載された消費者級lidarスキャナーを用いて、静的および動的シーンをキャプチャする新しいrgb-dトラッキングデータセットであるarkittrackを提案する。 ARKitTrackには300のRGB-Dシーケンス、455のターゲット、229.7Kのビデオフレームが含まれている。境界ボックスアノテーションとフレームレベルの属性とともに、このデータセットに123.9Kピクセルレベルのターゲットマスクをアノテートする。また、将来の展開のために、各フレームのカメラ固有およびカメラポーズが設けられる。このデータセットの有用性を示すため,RGB機能と鳥眼視表示を統合したボックスレベルとピクセルレベルのトラッキングの統一ベースラインを新たに提案し,モジュラリティ3次元形状の探索を行う。詳細な実験分析により,ARKitTrackデータセットがRGB-D追跡を著しく促進し,提案手法が芸術的状況と良好に比較できることが確認された。コードとデータセットはhttps://arkittrack.github.ioで入手できる。

Compared with traditional RGB-only visual tracking, few datasets have been constructed for RGB-D tracking. In this paper, we propose ARKitTrack, a new RGB-D tracking dataset for both static and dynamic scenes captured by the consumer-grade LiDAR scanners built into Apple's iPhone and iPad. ARKitTrack contains 300 RGB-D sequences, 455 targets, and 229.7K video frames in total. Along with the bounding box annotations and frame-level attributes, we also annotate this dataset with 123.9K pixel-level target masks. In addition, the camera intrinsics and camera pose of each frame are provided for future developments. To demonstrate the potential usefulness of this dataset, we further present a unified baseline for both box-level and pixel-level tracking, which integrates RGB features with bird's-eye-view representations to better explore cross-modality 3D geometry. In-depth empirical analysis has verified that the ARKitTrack dataset can significantly facilitate RGB-D tracking and that the proposed baseline method compares favorably against the state of the art. The code and dataset are available at https://arkittrack.github.io.

翻訳日:2023-03-27 15:08:04 公開日:2023-03-24

# グラフ表現と変化点検出を用いたシンボリック音楽構造解析

Symbolic Music Structure Analysis with Graph Representations and Changepoint Detection Methods ( http://arxiv.org/abs/2303.13881v1 )

ライセンス: Link先を確認

Carlos Hernandez-Olivan, Sonia Rubio Llamas, Jose R. Beltran

(参考訳) 音楽構造分析は音楽情報検索(MIR)におけるオープンな研究課題である。過去には、楽曲を音響領域と記号領域に分割しようとする作品がいくつかあるが、音楽構造の異なるレベルでの識別と分節化は、まだこの分野では未解決の課題である。本研究は,3つの手法を提案する。そのうちの2つは,その形式や構造によって記号的音楽を分割することを目的とした,新しいグラフベースのアルゴリズムである。本研究では,異なる形態や構造を持つ2つの公開データセットを用いたアブレーション実験を行い,それらのパラメータ値の異なる手法を比較し,異なる音楽スタイルと比較した。グラフ表現によるシンボリック音楽を符号化し,グラフから得られる隣接行列の新規性を計算することで,特徴を抽出せずにシンボリック楽曲の構造を表現できることがわかった。オンラインの教師なしのchangepoint検出メソッドで境界を検出でき、この方法のテストに使用した公開データセットの1つで、1バーの許容性に対して、f_1が 0.5640 である。また,提案手法のパラメータが,そのレベルに応じてどのように調整されるかを示すため,構造,高,中,低の異なるレベルでのアルゴリズムの性能評価結果も提供する。本研究の再現性とユーザビリティを高めるため,各構造レベルのパラメータを,オープンソースのpythonパッケージである musicaiz に追加した。この手法が、構造、音楽分類、キーチェンジ検出などの他のMIRタスクを改善するために使用できることを願っている。

Music Structure Analysis is an open research task in Music Information Retrieval (MIR). Several past works have attempted to segment music in the audio and symbolic domains; however, the identification and segmentation of music structure at different levels is still an open research problem in this area. In this work we propose three methods, two of which are novel graph-based algorithms that aim to segment symbolic music by its form or structure: Norm, G-PELT and G-Window. We performed an ablation study with two public datasets that have different forms or structures in order to compare these methods while varying their parameter values and to compare their performance across different music styles. We have found that encoding symbolic music with graph representations and computing the novelty of adjacency matrices obtained from the graphs represents the structure of symbolic music pieces well without the need to extract features from them. We are able to detect the boundaries with an online unsupervised changepoint detection method with an $F_1$ score of 0.5640 for a 1-bar tolerance in one of the public datasets that we used for testing our methods. We also provide the performance results of the algorithms at different levels of structure (high, medium and low) to show how the parameters of the proposed methods have to be adjusted depending on the level. We added the best performing method with its parameters for each structure level to musicaiz, an open source Python package, to facilitate the reproducibility and usability of this work. We hope that these methods can be used to improve other MIR tasks such as music generation with structure, music classification or key change detection.
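The idea of computing a novelty curve from an adjacency matrix can be sketched as follows. This is a generic checkerboard-kernel novelty along the main diagonal, assumed here for illustration; the abstract does not state the exact novelty function used by the proposed methods, and the function names are hypothetical.

```python
import numpy as np


def checkerboard_kernel(half: int) -> np.ndarray:
    """Return a (2*half x 2*half) kernel: +1 on the diagonal quadrants, -1 off them."""
    sign = np.concatenate([-np.ones(half), np.ones(half)])
    return np.outer(sign, sign)


def novelty_curve(matrix: np.ndarray, half: int = 2) -> np.ndarray:
    """Slide the checkerboard kernel along the main diagonal of a square matrix.

    Entry i is large when the `half` nodes before index i and the `half` nodes
    from index i onward are each densely connected but barely connected to
    one another, i.e. when a section boundary lies just before index i.
    """
    n = matrix.shape[0]
    kernel = checkerboard_kernel(half)
    padded = np.pad(matrix.astype(np.float64), half)  # zero padding at the edges
    return np.array([
        np.sum(padded[i:i + 2 * half, i:i + 2 * half] * kernel)
        for i in range(n)
    ])
```

Peaks of such a curve are candidate section boundaries; a changepoint detector would then decide which peaks to keep.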

翻訳日:2023-03-27 15:07:43 公開日:2023-03-24

# モーメント検索とハイライト検出のためのクエリ依存ビデオ表現

Query-Dependent Video Representation for Moment Retrieval and Highlight Detection ( http://arxiv.org/abs/2303.13874v1 )

ライセンス: Link先を確認

WonJun Moon, Sangeek Hyun, SangUk Park, Dongchan Park, Jae-Pil Heo

(参考訳) 近年,映像理解の需要が大幅に増加し,映像モーメント検索とハイライト検出(MR/HD)が注目されている。 MR/HDの主な目的は、与えられたテキストクエリに対して、モーメントをローカライズし、クリップワイドのレベルを推定することである。最近のトランスフォーマーベースモデルにはいくつかの進歩があったが、これらの手法が与えられたクエリの情報を完全に活用していないことがわかった。例えば、テキストクエリとビデオコンテンツの関連性は、モーメントとそのサルジェンシーを予測する際に無視されることがある。本稿では,MR/HDに適した検出変換器であるQuery-Dependent DETR(QD-DETR)を紹介する。トランスフォーマーアーキテクチャにおいて、与えられたクエリの重要でない役割を観察するため、エンコーディングモジュールは、テキストクエリのコンテキストをビデオ表現に明示的に注入するために、クロスアテンション層から始まります。そして,クエリ情報を活用するモデルの性能を高めるために,ビデオクエリペアを操作して無関係なペアを生成する。このような負の(無関係な)ビデオクエリペアは、低いサリエンシースコアを得るために訓練され、その結果、クエリとビデオのペア間の正確な一致をモデルが推定することを奨励する。最後に,与えられたビデオクエリ対に対するサリエンシースコアの基準を適応的に定義する入力適応サリエンシー予測器を提案する。本研究は,mr/hdにおけるクエリ依存表現の構築の重要性を検証する。具体的には、QD-DETRはQVHighlights、TVSum、Charades-STAデータセットで最先端の手法より優れている。コードはgithub.com/wjun0830/QD-DETRで入手できる。

Recently, video moment retrieval and highlight detection (MR/HD) have been in the spotlight as the demand for video understanding has drastically increased. The key objective of MR/HD is to localize the moment and estimate the clip-wise accordance level, i.e., the saliency score, with respect to the given text query. Although recent transformer-based models have brought some advances, we found that these methods do not fully exploit the information of a given query. For example, the relevance between the text query and the video contents is sometimes neglected when predicting the moment and its saliency. To tackle this issue, we introduce Query-Dependent DETR (QD-DETR), a detection transformer tailored for MR/HD. As we observe the insignificant role of a given query in transformer architectures, our encoding module starts with cross-attention layers to explicitly inject the context of the text query into the video representation. Then, to enhance the model's capability of exploiting the query information, we manipulate the video-query pairs to produce irrelevant pairs. Such negative (irrelevant) video-query pairs are trained to yield low saliency scores, which, in turn, encourages the model to estimate the precise accordance between query-video pairs. Lastly, we present an input-adaptive saliency predictor which adaptively defines the criterion of saliency scores for the given video-query pairs. Our extensive studies verify the importance of building the query-dependent representation for MR/HD. Specifically, QD-DETR outperforms state-of-the-art methods on the QVHighlights, TVSum, and Charades-STA datasets. Code is available at github.com/wjun0830/QD-DETR.
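One simple way to produce irrelevant video-query pairs is to re-pair each video with the query of another sample in the same batch. The sketch below assumes that queries within a batch describe different content and that a target saliency of 0.0 stands for "low"; both are assumptions, and `make_negative_pairs` is a hypothetical helper rather than the authors' implementation.

```python
from typing import List, Sequence, Tuple


def make_negative_pairs(
    videos: Sequence[str], queries: Sequence[str]
) -> List[Tuple[str, str, float]]:
    """Pair every video with the next sample's query and a low saliency target."""
    if len(videos) != len(queries):
        raise ValueError("videos and queries must have the same length")
    if len(videos) < 2:
        return []  # no other sample to borrow a query from
    # Shift queries by one position so that no video keeps its own query.
    shifted = list(queries[1:]) + [queries[0]]
    return [(video, query, 0.0) for video, query in zip(videos, shifted)]
```

For a batch of three samples, each video ends up with a query written for a different video, which gives the saliency head examples where the score should stay low.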

翻訳日:2023-03-27 15:07:18 公開日:2023-03-24

# Fantasia3D:高品質なテキストから3Dコンテンツ作成のための幾何学と外観

Fantasia3D: Disentangling Geometry and Appearance for High-quality Text-to-3D Content Creation ( http://arxiv.org/abs/2303.13873v1 )

ライセンス: Link先を確認

Rui Chen, Yongwei Chen, Ningxin Jiao, Kui Jia

(参考訳) 3Dコンテンツの自動作成は、事前訓練された大規模言語モデルと画像拡散モデルが利用可能であることから、近年急速に進歩している。既存のtext-to-3dメソッドでは、ボリュームレンダリングによる幾何学と外観を結合した暗黙的なシーン表現が一般的であり、より細かいジオメトリの復元とフォトリアリスティックなレンダリングの面では最適ではない。本稿では,高品質テキストから3dコンテンツ作成のためのfantasia3dの新しい手法を提案する。 fantasia3dの鍵は、幾何学と外観の疎結合なモデリングと学習である。幾何学学習では,ハイブリッドなシーン表現に依拠し,画像拡散モデルの入力として表現から抽出した面正規化を符号化する。本研究では,空間的に変化する双方向反射率分布関数 (brdf) をtext-to-3dタスクに導入し, 生成面の光リアリスティックレンダリングのための表面材料を学習する。当社のdisentangledフレームワークは、一般的なグラフィックエンジンと互換性があり、生成された3dアセットのリライト、編集、物理シミュレーションをサポートしています。異なるテキストから3dのタスク設定下で既存の方法よりも優れた方法を示す徹底的な実験を行う。プロジェクトページとソースコード: https://fantasia3d.github.io/

Automatic 3D content creation has achieved rapid progress recently due to the availability of pre-trained large language models and image diffusion models, forming the emerging topic of text-to-3D content creation. Existing text-to-3D methods commonly use implicit scene representations, which couple the geometry and appearance via volume rendering and are suboptimal in terms of recovering finer geometries and achieving photorealistic rendering; consequently, they are less effective for generating high-quality 3D assets. In this work, we propose a new method, Fantasia3D, for high-quality text-to-3D content creation. Key to Fantasia3D is the disentangled modeling and learning of geometry and appearance. For geometry learning, we rely on a hybrid scene representation, and propose to encode the surface normals extracted from the representation as the input of the image diffusion model. For appearance modeling, we introduce the spatially varying bidirectional reflectance distribution function (BRDF) into the text-to-3D task, and learn the surface material for photorealistic rendering of the generated surface. Our disentangled framework is more compatible with popular graphics engines, supporting relighting, editing, and physical simulation of the generated 3D assets. We conduct thorough experiments that show the advantages of our method over existing ones under different text-to-3D task settings. Project page and source code: https://fantasia3d.github.io/.

翻訳日:2023-03-27 15:06:51 公開日:2023-03-24

# キャビティ設計を用いた通信周波数におけるオンデマンド非識別・絡み合い光子

On-demand indistinguishable and entangled photons at telecom frequencies using tailored cavity designs ( http://arxiv.org/abs/2303.13871v1 )

ライセンス: Link先を確認

David Bauch, Dustin Siebert, Klaus D. Jöns, Jens Förstner and Stefan Schumacher

(参考訳) 量子ドット系でよく用いられるバイエキシトン・エキシトン放出カスケードは、偏光エンタングルメントを生成するために、本質的に区別不可能な光子を生成する。本研究は, 偏光絡み合いの度合いが高く, 同時に不明瞭度が高い光子の対を生成することに焦点を当てる。光共振器を用いたバイエクシトン寿命を選択的に低減することで、この目標を達成する。広帯域光子抽出と2重縮退光モードを併用したバイエクシトンエミッションの十分なパーセル向上の要求を満たすように調整した円形ブラッグ反射器を試作した。我々の詳細な理論研究が組み合わさる (i)モデルパラメータを入力として抽出したマクスウェル方程式を解いた現実的なフォトニック構造の最適化 (ii)光子特性に完全にアクセスできる量子ドットキャビティ励起ダイナミクスの微視的シミュレーション我々は,システムパラメータに対する非自明な依存性を報告し,$1550\,\mathrm{nm}$ における通信用cバンドの非識別性と近接ユニティ値への絡み合いを最大化するパーセル強化の最適範囲を決定するために,複合理論手法の予測力を利用する。

The biexciton-exciton emission cascade commonly used in quantum-dot systems to generate polarization entanglement yields photons with intrinsically limited indistinguishability. In the present work we focus on the generation of pairs of photons with high degrees of polarization entanglement and simultaneously high indistinguishability. We achieve this goal by selectively reducing the biexciton lifetime with an optical resonator. We demonstrate that a suitably tailored circular Bragg reflector fulfills the requirements of sufficient selective Purcell enhancement of biexciton emission paired with spectrally broad photon extraction and two-fold degenerate optical modes. Our in-depth theoretical study combines (i) the optimization of realistic photonic structures by solving Maxwell's equations, from which model parameters are extracted as input for (ii) microscopic simulations of quantum-dot cavity excitation dynamics with full access to photon properties. We report non-trivial dependencies on system parameters and use the predictive power of our combined theoretical approach to determine the optimal range of Purcell enhancement that maximizes indistinguishability and entanglement to near-unity values in the telecom C-band at $1550\,\mathrm{nm}$.

翻訳日:2023-03-27 15:06:25 公開日:2023-03-24

# 学習可能な形状と位置を持つ物理的に対立する赤外線パッチ

Physically Adversarial Infrared Patches with Learnable Shapes and Locations ( http://arxiv.org/abs/2303.13868v1 )

ライセンス: Link先を確認

Wei Xingxing and Yu Jie and Huang Yao

(参考訳) 安全クリティカルなタスクにおける赤外線物体検出器の広範な応用のために,実世界の敵対的事例に対するロバスト性を評価する必要がある。しかし、デジタル世界から物理世界への複雑な変換のため、現実的な応用では、現在の数少ない物理的赤外線攻撃は複雑である。この問題に対処するため,本論文では"adversarial infrared patch"と呼ばれる物理的に実現可能な赤外線攻撃手法を提案する。対象物の熱放射を捉えた赤外線カメラの撮像機構を考慮すると、対向的赤外線パッチはその熱分布を操作するために対象物に熱絶縁材料のパッチを取り付けて攻撃を行う。敵の攻撃を強化するため,対象物体上のパッチの形状と位置の同時学習を誘導する新たなアグリゲーション正規化を提案する。したがって、簡単な勾配に基づく最適化を適用することができる。様々な物体検出装置を用いて、異なる物体検出タスクにおける逆赤外線パッチを検証する。実験の結果, 物体を異なる角度, 距離, 姿勢, シーンで捉えた物理環境において, 歩行者検知器と車両検知器に対して, 攻撃成功率 (asr) が90 %以上に達することがわかった。より重要なことに、敵対的な赤外線パッチは実装が容易であり、物理的な世界で構築するのに0.5時間しかかからず、その効果と効率を検証する。

Owing to the extensive application of infrared object detectors in safety-critical tasks, it is necessary to evaluate their robustness against adversarial examples in the real world. However, the few existing physical infrared attacks are complicated to implement in practical applications because of their complex transformation from the digital world to the physical world. To address this issue, we propose a physically feasible infrared attack method called "adversarial infrared patches". Because infrared cameras image objects by capturing their thermal radiation, adversarial infrared patches conduct attacks by attaching a patch of thermal insulation material to the target object to manipulate its thermal distribution. To enhance adversarial attacks, we present a novel aggregation regularization to guide the simultaneous learning of the patch's shape and location on the target object. Thus, a simple gradient-based optimization can be adapted to solve for them. We verify adversarial infrared patches in different object detection tasks with various object detectors. Experimental results show that our method achieves an Attack Success Rate (ASR) of more than 90% against the pedestrian detector and the vehicle detector in the physical environment, where the objects are captured at different angles, distances, postures, and scenes. More importantly, an adversarial infrared patch is easy to implement, and it needs only 0.5 hours to be constructed in the physical world, which verifies its effectiveness and efficiency.

翻訳日:2023-03-27 15:06:06 公開日:2023-03-24

# C2Cサービスにおける信頼管理アルゴリズムの適用性

Applicability of Trust Management Algorithm in C2C services ( http://arxiv.org/abs/2303.13919v1 )

ライセンス: Link先を確認

Ryohei Suzuki, Iifan Tyou, Shigenori Ohashi, Kazutoshi Sasahara

(参考訳) C2C(Consumer-to-Consumer)プラットフォームの出現により、消費者は商品を直接購入および販売できるようになったが、商品詐欺や偽レビューといった問題も生じている。信頼管理アルゴリズム(TMA)は不正ユーザを検出する対策として期待されている。しかし、ネットワーク上のデバイス間のピアツーピア(p2p)通信用に設計されたtmasが報告されるほど効果的かどうかは不明である。本稿では,エージェントベースモデルを用いたC2Cサービスにおける代表的TMAであるEigenTrustの適用性を検討する。まず、C2Cサービスでトランザクションプロセスを定義し、6種類の不正取引を仮定し、シミュレーションによりC2CシステムにおけるEigenTrustのダイナミクスを分析した。 EigenTrustは2種類の単純な詐欺の信頼度を正確に推定できることがわかった。さらに,2種類の先進的詐欺に対する信頼スコアの揺らぎが,これまでの研究では問題にならなかった。これは、そのような振動を検出することで、EigenTrustがいくつかの(すべてではないが)高度な詐欺を検出できることを示している。本研究は,C2Cサービスにおける取引の信頼性向上に寄与し,消費者サービスのさらなる技術開発に関する洞察を提供する。

The emergence of Consumer-to-Consumer (C2C) platforms has allowed consumers to buy and sell goods directly, but it has also created problems, such as commodity fraud and fake reviews. Trust Management Algorithms (TMAs) are expected to be a countermeasure to detect fraudulent users. However, it is unknown whether TMAs are as effective as reported, because they were designed for Peer-to-Peer (P2P) communications between devices on a network. Here we examine the applicability of EigenTrust, a representative TMA, to the use case of C2C services using an agent-based model. First, we defined the transaction process in C2C services, assumed six types of fraudulent transactions, and then analysed the dynamics of EigenTrust in C2C systems through simulations. We found that EigenTrust could correctly estimate low trust scores for two types of simple fraud. Furthermore, we found oscillations of trust scores for two types of advanced fraud, which previous research did not address. This suggests that by detecting such oscillations, EigenTrust may be able to detect some (but not all) advanced frauds. Our study helps increase the trustworthiness of transactions in C2C services and provides insights into further technological development for consumer services.
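For readers unfamiliar with EigenTrust, its core is a power iteration over normalized local trust values. The sketch below follows the commonly described basic form with a pre-trusted distribution; the damping value `alpha`, the iteration count, and the example ratings are assumptions and are not taken from the paper's simulation setup.

```python
import numpy as np


def eigentrust(local_trust, pretrusted, alpha: float = 0.1, iterations: int = 100):
    """Compute global trust scores by damped power iteration.

    local_trust[i][j] is peer i's accumulated rating of peer j.
    pretrusted is a non-negative weight vector over peers (normalized below).
    """
    s = np.maximum(np.asarray(local_trust, dtype=np.float64), 0.0)
    p = np.asarray(pretrusted, dtype=np.float64)
    p = p / p.sum()
    row_sums = s.sum(axis=1, keepdims=True)
    safe_sums = np.where(row_sums > 0, row_sums, 1.0)
    # Peers with no positive ratings fall back to the pre-trusted distribution.
    c = np.where(row_sums > 0, s / safe_sums, p)
    t = p.copy()
    for _ in range(iterations):
        t = (1.0 - alpha) * (c.T @ t) + alpha * p
    return t
```

In a toy case where two honest peers rate each other and nobody rates a third peer, the third peer's score stays at the floor `alpha * p`, which is the kind of low trust score the abstract reports for simple frauds.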

翻訳日:2023-03-27 15:00:19 公開日:2023-03-24

# 胎児超音波画像からの複合情報除去

Removing confounding information from fetal ultrasound images ( http://arxiv.org/abs/2303.13918v1 )

ライセンス: Link先を確認

Kamil Mikolaj, Manxi Lin, Zahra Bashir, Morten Bo Søndergaard Svendsen, Martin Tolsgaard, Anders Nymark and Aasa Feragen

(参考訳) 医療画像に埋め込まれたテキストやマーキングの形で情報を結合することは、診断深層学習アルゴリズムのトレーニングに深刻な影響を与える可能性がある。しかし、臨床目的で収集されたデータは、しばしばそのようなマークが埋め込まれている。皮膚科では、悪性病変の画像で過剰に表現される図面や定規が知られている。本稿では,胎児検診用超音波スキャンを含む国立データベースに掲載されている画像にテキストと校正器を配置し,標準平面と相関して予測を行う。これらのデータベースで利用可能な膨大なデータを活用するために,標準平面分類をテストケースとして,超音波による深層学習アルゴリズムにおける埋め込みテキストと校正アルゴリズムの結合効果を最小化する手法を開発・検証した。

Confounding information in the form of text or markings embedded in medical images can severely affect the training of diagnostic deep learning algorithms. However, data collected for clinical purposes often have such markings embedded in them. In dermatology, known examples include drawings or rulers that are overrepresented in images of malignant lesions. In this paper, we encounter text and calipers placed on the images found in national databases containing fetal screening ultrasound scans, which correlate with standard planes to be predicted. In order to utilize the vast amounts of data available in these databases, we develop and validate a series of methods for minimizing the confounding effects of embedded text and calipers on deep learning algorithms designed for ultrasound, using standard plane classification as a test case.

翻訳日:2023-03-27 14:59:59 公開日:2023-03-24

# 重力波データストリームにおけるグリッチの分類のための畳み込みニューラルネットワーク

Convolutional Neural Networks for the classification of glitches in gravitational-wave data streams ( http://arxiv.org/abs/2303.13917v1 )

ライセンス: Link先を確認

Tiago S. Fernandes and Samuel J. Vieira and Antonio Onofre and Juan Calderón Bustillo and Alejandro Torres-Forné and José A. Font

(参考訳) 本稿では,最新のConvNeXtネットワークファミリを含む畳み込みニューラルネットワークを用いて,高度LIGO検出器のデータ中の過渡的雑音信号(グリッチ)と重力波を分類する。まず、Gravity Spyデータセットを使用してゼロからトレーニングされたモデルと、このデータセットでトレーニング済みモデルを微調整して移行学習するモデルを使用する。第2に、自動生成擬似ラベルを用いた事前学習モデルの自己教師型アプローチについても検討する。我々の結果は、同じデータセットの既存の結果に非常に近いものであり、最高の教師付き(自己監督)モデルに対して、F1スコアの97.18%(94.15%)に達した。さらに、LIGO-VirgoのO3ランの実際の重力波信号を用いてモデルをテストする。以前の実行(o1とo2)のデータを使用してトレーニングされるが、特に転送学習を使用する場合、モデルのパフォーマンスは良好である。 Gravity Spyデータセットにあるハードウェアインジェクションの50チャープ未満の例とは別に、実際の信号のトレーニングを必要とせずに、転送学習がスコアを改善することがわかった。これにより、エラー分類だけでなく、信号分類にもトランスファー学習が用いられるようになった。

We investigate the use of Convolutional Neural Networks (including the modern ConvNeXt network family) to classify transient noise signals (i.e., glitches) and gravitational waves in data from the Advanced LIGO detectors. First, we use models with a supervised learning approach, both trained from scratch using the Gravity Spy dataset and employing transfer learning by fine-tuning pre-trained models on this dataset. Second, we also explore a self-supervised approach, pre-training models with automatically generated pseudo-labels. Our findings are very close to existing results for the same dataset, reaching values for the F1 score of 97.18% (94.15%) for the best supervised (self-supervised) model. We further test the models using actual gravitational-wave signals from LIGO-Virgo's O3 run. Although trained using data from previous runs (O1 and O2), the models show good performance, in particular when using transfer learning. We find that transfer learning improves the scores without the need for any training on real signals apart from the fewer than 50 chirp examples from hardware injections present in the Gravity Spy dataset. This motivates the use of transfer learning not only for glitch classification but also for signal classification.

翻訳日:2023-03-27 14:59:47 公開日:2023-03-24

# 参照誘導動的パラメータ選択による自己監督逆画像信号処理

Self-Supervised Reversed Image Signal Processing via Reference-Guided Dynamic Parameter Selection ( http://arxiv.org/abs/2303.13916v1 )

ライセンス: Link先を確認

Junji Otsuka, Masakazu Yoshimura, Takeshi Ohashi

(参考訳) 非処理センサ出力(RAW画像)は、低レベルと高レベルの両方のコンピュータビジョンアルゴリズムを改善する可能性があるが、大規模RAW画像データセットの欠如は研究の障壁である。そこで,既存のRGB画像をRAWに変換する逆画像信号処理(ISP)について検討した。しかし、既存のほとんどの方法は変換をモデル化するためにカメラ固有のメタデータやRGBとRAWのペア画像を必要とする。さらに、多様なISPの扱いや、世界的な照明の回復に問題がある。これらの制約に対処するために,メタデータとペア画像を必要としない自己教師付き逆ISP方式を提案する。提案手法は,RGB画像を参照RAW画像に基づいて逆ISPパイプラインのパラメータを動的に選択することにより,参照RAW画像と同じ環境下で撮像されたRAWライクな画像に変換する。パラメータ選択は、未ペアRGBおよびRAW画像から生成された擬似ペアデータを介して訓練される。提案手法は,他の最先端教師付き手法と同等の精度で様々な逆ISPを学習し,未知のRGB画像をCOCOやFlickr1Mから変換し,RAWライクな画像を画素分布でより正確にターゲットできることを示す。また、生成したRAW画像が実際のRAW画像オブジェクト検出タスクの性能を向上させることを示す。

Unprocessed sensor outputs (RAW images) potentially improve both low-level and high-level computer vision algorithms, but the lack of large-scale RAW image datasets is a barrier to research. Thus, reversed Image Signal Processing (ISP), which converts existing RGB images into RAW images, has been studied. However, most existing methods require camera-specific metadata or paired RGB and RAW images to model the conversion, and these are not always available. In addition, there are issues in handling diverse ISPs and recovering global illumination. To tackle these limitations, we propose a self-supervised reversed ISP method that requires neither metadata nor paired images. The proposed method converts an RGB image into a RAW-like image taken in the same environment with the same sensor as a reference RAW image by dynamically selecting parameters of the reversed ISP pipeline based on the reference RAW image. The parameter selection is trained via pseudo paired data created from unpaired RGB and RAW images. We show that the proposed method is able to learn various reversed ISPs with comparable accuracy to other state-of-the-art supervised methods and convert unknown RGB images from COCO and Flickr1M to target RAW-like images more accurately in terms of pixel distribution. We also demonstrate that our generated RAW images improve performance on a real RAW image object detection task.

翻訳日:2023-03-27 14:59:26 公開日:2023-03-24

# 12レベル心電図の深層学習に基づく心房細動分類におけるノイズの影響のベンチマーク

Benchmarking the Impact of Noise on Deep Learning-based Classification of Atrial Fibrillation in 12-Lead ECG ( http://arxiv.org/abs/2303.13915v1 )

ライセンス: Link先を確認

Theresa Bender, Philip Gemke, Ennio Idrobo-Avila, Henning Dathe, Dagmar Krefting, Nicolai Spicher

(参考訳) 心電図解析は様々な臨床応用で広く使われており、分類タスクのディープラーニングモデルが現在研究の焦点となっている。データ駆動特性のため、信号ノイズを効率的に処理する可能性を秘めているが、これらの手法の精度への影響はいまだ不明である。そこで本研究では,12誘導心電図における心房細動検出のためのDeep Learning-based methodの精度に対する4種類のノイズの影響をベンチマークした。我々は、公開データセット(PTBXL)のサブセットを使用し、ノイズに関する人間の専門家が提供するメタデータを使用して、各心電図に信号品質を割り当てる。さらに,心電図毎に定量的信号対雑音比を算出する。両指標について深層学習モデルの精度を解析し,ヒトの専門家が複数の手がかりにうるさい信号とラベル付けした場合でも,心房細動を確実に識別できることを観察する。偽陽性率と偽陰性率は、データにノイズとラベル付けされる場合、やや悪化する。興味深いことに、ベースラインドリフトノイズを示すようにアノテートされたデータは、不要なデータと非常によく似た精度をもたらす。ノイズの多い心電図データの処理は,従来の方法のように事前処理を必要としない深層学習法で実現可能であると結論付けた。

Electrocardiography analysis is widely used in various clinical applications, and Deep Learning models for classification tasks are currently a focus of research. Due to their data-driven character, they bear the potential to handle signal noise efficiently, but the influence of noise on the accuracy of these methods is still unclear. Therefore, we benchmark the influence of four types of noise on the accuracy of a Deep Learning-based method for atrial fibrillation detection in 12-lead electrocardiograms. We use a subset of a publicly available dataset (PTBXL) and use the metadata provided by human experts regarding noise for assigning a signal quality to each electrocardiogram. Furthermore, we compute a quantitative signal-to-noise ratio for each electrocardiogram. We analyze the accuracy of the Deep Learning model with respect to both metrics and observe that the method can robustly identify atrial fibrillation, even in cases where signals are labelled by human experts as being noisy on multiple leads. False positive and false negative rates are slightly worse for data labelled as noisy. Interestingly, data annotated as showing baseline drift noise results in an accuracy very similar to that of data without it. We conclude that the issue of processing noisy electrocardiography data can be addressed successfully by Deep Learning methods that might not need preprocessing as many conventional methods do.
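A signal-to-noise ratio in decibels can be computed as below when a clean reference signal is available. The abstract does not say how the authors estimate the noise component of a recorded electrocardiogram, so the clean-reference setup and the function name `snr_db` are assumptions made for illustration.

```python
import numpy as np


def snr_db(clean, noisy) -> float:
    """Return 10*log10(signal power / noise power) for one lead."""
    clean = np.asarray(clean, dtype=np.float64)
    noisy = np.asarray(noisy, dtype=np.float64)
    noise = noisy - clean            # noise is whatever deviates from the reference
    noise_power = np.mean(noise ** 2)
    if noise_power == 0.0:
        return float("inf")          # identical signals: no measurable noise
    signal_power = np.mean(clean ** 2)
    return float(10.0 * np.log10(signal_power / noise_power))
```

A noise amplitude one tenth of the signal amplitude corresponds to a power ratio of 100, i.e. 20 dB.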

翻訳日:2023-03-27 14:59:02 公開日:2023-03-24

# GarmentTracking:カテゴリーレベルのガーメントポッドトラッキング

GarmentTracking: Category-Level Garment Pose Tracking ( http://arxiv.org/abs/2303.13913v1 )

ライセンス: Link先を確認

Han Xue, Wenqiang Xu, Jieyi Zhang, Tutian Tang, Yutong Li, Wenxin Du, Ruolin Ye, Cewu Lu

(参考訳) 衣服は人間にとって重要である。完全な衣服のポーズを推定し追跡できる視覚システムは、多くの下流タスクや現実世界のアプリケーションに有用である。本研究は,(1)VRインタフェースを通じて仮想衣料品モデルを操作することができるVR-Garmentを収録した,カテゴリーレベルの衣料品ポーズ追跡タスクに対処するための完全なパッケージを提案する。 2) フラット化や折りたたみなどの操作において, 複雑な衣料を施した大規模データセットVRフォールディング。 (3) エンド・ツー・エンドのオンライントラッキングフレームワークであるGarmentTrackingは、ポイントクラウドシーケンスを与えられた標準空間とタスク空間の両方で、完全な衣服のポーズを予測する。広汎な実験により, 衣服の非剛性変形が大きい場合でも, 提案したGarmentTrackingは優れた性能を発揮することが示された。速度と精度の両方でベースラインアプローチより優れている。提案されたソリューションが将来の研究のプラットフォームになることを期待しています。コードとデータセットはhttps://garment-tracking.robotflow.aiで利用可能である。

Garments are important to humans. A visual system that can estimate and track the complete garment pose can be useful for many downstream tasks and real-world applications. In this work, we present a complete package to address the category-level garment pose tracking task: (1) A recording system, VR-Garment, with which users can manipulate virtual garment models in simulation through a VR interface. (2) A large-scale dataset, VR-Folding, with complex garment pose configurations in manipulation tasks such as flattening and folding. (3) An end-to-end online tracking framework, GarmentTracking, which predicts the complete garment pose in both canonical space and task space given a point cloud sequence. Extensive experiments demonstrate that the proposed GarmentTracking achieves great performance even when the garment has large non-rigid deformation. It outperforms the baseline approach in both speed and accuracy. We hope our proposed solution can serve as a platform for future research. Code and datasets are available at https://garment-tracking.robotflow.ai.

翻訳日:2023-03-27 14:58:43 公開日:2023-03-24

# 任意量子プロセスのクロスプラットフォーム比較

Cross-Platform Comparison of Arbitrary Quantum Processes ( http://arxiv.org/abs/2303.13911v1 )

ライセンス: Link先を確認

Congcong Zheng, Xutao Yu, Kun Wang

(参考訳) 本研究では,局所演算と古典通信(LOCC)を用いて,空間的にあるいは時間的に異なる量子プラットフォーム上で実行される任意の量子プロセスの性能を比較するプロトコルを提案する。このプロトコルは局所ユニタリ演算子をサンプリングし、古典的通信を介して各プラットフォームと通信し、量子状態の準備と測定回路を構築する。その後、各プラットフォームに局所ユニタリ演算子を実装し、測定結果の確率分布を生成する。最大過程の忠実度は確率分布から推定され、量子プロセスの相対的性能を最終的に定量化する。さらに,このプロトコルが量子プロセストモグラフィーに適用可能であることを示す。我々は,IBMの5つの量子デバイスとBaiduのQianshi量子コンピュータの性能をクラウド経由で比較するためにプロトコルを適用した。驚くべきことに、このプロトコルは異なる量子コンピュータに実装された量子プロセスの性能を正確に比較することができ、完全な量子プロセストモグラフィに必要な測定値よりもはるかに少ない測定量を必要とする。我々の研究は、量子コンピュータのクロスプラットフォーム比較における協力的取り組みの触媒であると考えています。

In this work, we present a protocol for comparing the performance of arbitrary quantum processes executed on spatially or temporally disparate quantum platforms using Local Operations and Classical Communication (LOCC). The protocol involves sampling local unitary operators, which are then communicated to each platform via classical communication to construct quantum state preparation and measurement circuits. Subsequently, the local unitary operators are implemented on each platform, resulting in the generation of probability distributions of measurement outcomes. The max process fidelity is estimated from the probability distributions, which ultimately quantifies the relative performance of the quantum processes. Furthermore, we demonstrate that this protocol can be adapted for quantum process tomography. We apply the protocol to compare the performance of five quantum devices from IBM and the "Qianshi" quantum computer from Baidu via the cloud. Remarkably, the experimental results reveal that the protocol can accurately compare the performance of the quantum processes implemented on different quantum computers, requiring significantly fewer measurements than those needed for full quantum process tomography. We view our work as a catalyst for collaborative efforts in cross-platform comparison of quantum computers.

翻訳日:2023-03-27 14:58:26 公開日:2023-03-24

# Wave-U-Net Discriminator: 音声合成のための高速かつ軽量な識別器

Wave-U-Net Discriminator: Fast and Lightweight Discriminator for Generative Adversarial Network-Based Speech Synthesis ( http://arxiv.org/abs/2303.13909v1 )

ライセンス: Link先を確認

Takuhiro Kaneko, Hirokazu Kameoka, Kou Tanaka, Shogo Seki

(参考訳) 音声合成では、ジェネレータ(音声合成器)と識別器をmin-maxゲームで訓練するGAN(Generative Adversarial Network)が音声品質向上に広く利用されている。識別器のアンサンブルは、近年のニューラルボコーダ(HiFi-GANなど)や、複数の視点から波形を精査するためにTTSシステム(VITSなど)で一般的に用いられている。このような判別器は、合成された音声が実際の音声に適切に近づくことができるが、識別器の数の増加に応じて、モデルサイズと計算時間を増加させる必要がある。あるいは、Wave-U-Netアーキテクチャを持つ単一だが表現力のある識別器であるWave-U-Net判別器を提案する。この判別器は一意で、入力信号と同じ解像度でサンプル的に波形を評価でき、同時にスキップ接続のあるエンコーダとデコーダを介して多レベル特徴を抽出することができる。このアーキテクチャは、合成された音声が実際の音声と密にマッチするのに十分な情報を持つジェネレータを提供する。実験中,提案したアイデアを代表型ニューラルボコーダ (HiFi-GAN) とエンドツーエンドTSシステム (VITS) に適用した。その結果,提案手法は,hifi-ganでは2.31倍高速で14.5倍,vitsでは1.90倍高速で9.62倍軽量な判別器を用いて,同等の音声品質を達成できることがわかった。オーディオサンプルはhttps://www.kecl.ntt.co.jp/people/kaneko.takuhiro/projects/waveunetd/で入手できる。

In speech synthesis, a generative adversarial network (GAN), training a generator (speech synthesizer) and a discriminator in a min-max game, is widely used to improve speech quality. An ensemble of discriminators is commonly used in recent neural vocoders (e.g., HiFi-GAN) and end-to-end text-to-speech (TTS) systems (e.g., VITS) to scrutinize waveforms from multiple perspectives. Such discriminators allow synthesized speech to adequately approach real speech; however, they require an increase in the model size and computation time according to the increase in the number of discriminators. Alternatively, this study proposes a Wave-U-Net discriminator, which is a single but expressive discriminator with Wave-U-Net architecture. This discriminator is unique; it can assess a waveform in a sample-wise manner with the same resolution as the input signal, while extracting multilevel features via an encoder and decoder with skip connections. This architecture provides a generator with sufficiently rich information for the synthesized speech to be closely matched to the real speech. During the experiments, the proposed ideas were applied to a representative neural vocoder (HiFi-GAN) and an end-to-end TTS system (VITS). The results demonstrate that the proposed models can achieve comparable speech quality with a 2.31 times faster and 14.5 times more lightweight discriminator when used in HiFi-GAN and a 1.90 times faster and 9.62 times more lightweight discriminator when used in VITS. Audio samples are available at https://www.kecl.ntt.co.jp/people/kaneko.takuhiro/projects/waveunetd/.

翻訳日:2023-03-27 14:58:08 公開日:2023-03-24

# 磁気共鳴画像の高分解能化のための3プレーヤGAN

A Three-Player GAN for Super-Resolution in Magnetic Resonance Imaging ( http://arxiv.org/abs/2303.13900v1 )

ライセンス: Link先を確認

Qi Wang, Lucas Mahler, Julius Steiglechner, Florian Birk, Klaus Scheffler, Gabriele Lohmann

(参考訳) 学習ベース単一画像スーパーレゾリューション(SISR)タスクは2次元画像でよく研究されている。しかし、3D磁気共鳴画像(MRI)のSISRは、主にニューラルネットワークパラメータの増大、メモリ要求の増大、利用可能なトレーニングデータの制限により、2Dと比較して困難である。現在の3次元ボリューム画像のSISR法は、GAN(Generative Adversarial Networks)、特にWasserstein GANsのトレーニング安定性に基づくものである。 2Dドメインの他の一般的なアーキテクチャ、例えばトランスフォーマーモデルでは、大量のトレーニングデータを必要とするため、限られた3Dデータには適さない。しかしながら、wasserstein gansは、グローバル最適に収束せず、ぼやけた結果をもたらす可能性があるため、問題となることがある。本稿では,GANフレームワークに基づく3次元SRの新しい手法を提案する。具体的には、GANトレーニングのバランスをとるためにインスタンスノイズを使用します。さらに,学習過程において,相対論的GAN損失関数と更新特徴抽出器を用いる。本手法は高精度な結果が得られることを示す。また、トレーニングサンプルが極めて少ないことも示しています。特に、以前の研究で通常必要とされる何千ものトレーニングサンプルではなく、30未満のサンプルが必要です。最後に,本モデルによるサンプル外結果の改善を示す。

The learning-based single image super-resolution (SISR) task is well investigated for 2D images. However, SISR for 3D Magnetic Resonance Images (MRI) is more challenging than for 2D images, mainly due to the increased number of neural network parameters, the larger memory requirement and the limited amount of available training data. Current SISR methods for 3D volumetric images are based on Generative Adversarial Networks (GANs), especially Wasserstein GANs due to their training stability. Other common architectures in the 2D domain, e.g. transformer models, require large amounts of training data and are therefore not suitable for the limited 3D data. However, Wasserstein GANs can be problematic because they may not converge to a global optimum and thus produce blurry results. Here, we propose a new method for 3D SR based on the GAN framework. Specifically, we use instance noise to balance the GAN training. Furthermore, we use a relativistic GAN loss function and an updating feature extractor during the training process. We show that our method produces highly accurate results. We also show that we need very few training samples. In particular, we need fewer than 30 samples instead of the thousands of training samples that are typically required in previous studies. Finally, we show improved out-of-sample results produced by our model.
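Two of the ingredients named above, a relativistic GAN loss and instance noise, can be sketched in a few lines. The sketch assumes the standard (non-averaged) relativistic formulation on raw critic scores and Gaussian instance noise; the abstract does not specify which variant or noise schedule the authors use, and the function names are hypothetical.

```python
import numpy as np


def log_sigmoid(x: np.ndarray) -> np.ndarray:
    """Numerically stable log(sigmoid(x))."""
    return -np.logaddexp(0.0, -x)


def relativistic_d_loss(real_scores: np.ndarray, fake_scores: np.ndarray) -> float:
    """Discriminator loss: real samples should score higher than paired fakes."""
    return float(-np.mean(log_sigmoid(real_scores - fake_scores)))


def relativistic_g_loss(real_scores: np.ndarray, fake_scores: np.ndarray) -> float:
    """Generator loss: fakes should score higher than paired real samples."""
    return float(-np.mean(log_sigmoid(fake_scores - real_scores)))


def add_instance_noise(batch: np.ndarray, sigma: float, rng: np.random.Generator) -> np.ndarray:
    """Add zero-mean Gaussian noise to discriminator inputs (real and fake alike)."""
    return batch + rng.normal(0.0, sigma, size=batch.shape)
```

When real and fake scores are equal, both losses equal log 2, which is the point where the discriminator cannot tell the two apart.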

翻訳日:2023-03-27 14:57:35 公開日:2023-03-24

# 動的シナリオにおけるロバストテスト時間適応

Robust Test-Time Adaptation in Dynamic Scenarios ( http://arxiv.org/abs/2303.13899v1 )

ライセンス: Link先を確認

Longhui Yuan, Binhui Xie, Shuang Li

(参考訳) テスト時適応(TTA)は、未ラベルのテストデータストリームのみを用いて、事前訓練されたモデルを分散をテストすることを目的としている。従来のTTA手法のほとんどは、単一あるいは複数のディストリビューションから独立したサンプルデータなど、単純なテストデータストリームで大きな成功を収めている。しかしながら、これらの試みは、環境が徐々に変化し、テストデータが時間とともに相関してサンプリングされるような、自律運転のような現実のアプリケーションの動的シナリオでは失敗する可能性がある。そこで本研究では,実運用テスト時適応 (PTTA) と呼ばれる,実運用テストデータストリームをオンザフライで展開する手法について検討する。そこで我々は,PTTAの複雑なデータストリームに対してロバストテスト時間適応法(RoTTA)を詳述する。より具体的には、正規化統計を推定する頑健なバッチ正規化スキームを提案する。一方、メモリバンクは、時系列や不確実性を考慮したカテゴリバランスデータのサンプリングに利用される。さらに,学習手順を安定させるために,教師・生徒モデルを用いた時間対応型重み付け戦略を考案する。大規模な実験により、RoTTAは相関サンプルデータストリーム上で連続的なテストタイム適応を可能にすることが証明された。私たちのメソッドの実装は簡単で、迅速なデプロイメントに適しています。コードはhttps://github.com/BIT-DA/RoTTAで公開されている。

Test-time adaptation (TTA) intends to adapt the pretrained model to test distributions with only unlabeled test data streams. Most of the previous TTA methods have achieved great success on simple test data streams such as independently sampled data from single or multiple distributions. However, these attempts may fail in dynamic scenarios of real-world applications like autonomous driving, where the environments gradually change and the test data is sampled correlatively over time. In this work, we explore such practical test data streams to deploy the model on the fly, namely practical test-time adaptation (PTTA). To do so, we elaborate a Robust Test-Time Adaptation (RoTTA) method against the complex data stream in PTTA. More specifically, we present a robust batch normalization scheme to estimate the normalization statistics. Meanwhile, a memory bank is utilized to sample category-balanced data with consideration of timeliness and uncertainty. Further, to stabilize the training procedure, we develop a time-aware reweighting strategy with a teacher-student model. Extensive experiments prove that RoTTA enables continual test-time adaptation on the correlatively sampled data streams. Our method is easy to implement, making it a good choice for rapid deployment. The code is publicly available at https://github.com/BIT-DA/RoTTA
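
The robust batch normalization idea in the abstract above can be illustrated with a minimal Python sketch. It assumes (the abstract does not say so) that the normalization statistics are kept as exponential moving averages over incoming test batches instead of being recomputed from each correlated batch. The function names and the `momentum` value are hypothetical and are not taken from the RoTTA repository.

```python
def batch_stats(batch):
    """Mean and (biased) variance of a list of scalar activations."""
    n = len(batch)
    mean = sum(batch) / n
    var = sum((x - mean) ** 2 for x in batch) / n
    return mean, var


def update_running(running_mean, running_var, batch, momentum=0.05):
    """Blend the stored statistics with those of the current test batch.

    Assumption: an exponential moving average; a small momentum keeps the
    estimate stable when consecutive test batches are strongly correlated.
    """
    mean, var = batch_stats(batch)
    new_mean = (1.0 - momentum) * running_mean + momentum * mean
    new_var = (1.0 - momentum) * running_var + momentum * var
    return new_mean, new_var


def normalize(x, running_mean, running_var, eps=1e-5):
    """Normalize one activation with the running (not per-batch) statistics."""
    return (x - running_mean) / (running_var + eps) ** 0.5
```

With this kind of update, a batch that contains only one category shifts the statistics by at most a `momentum` fraction, which is the intuition behind using running estimates on correlated streams.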

翻訳日:2023-03-27 14:57:17 公開日:2023-03-24

# 英国における政府サイバーセキュリティイニシアチブの効果評価

Evaluating the impact of government Cyber Security initiatives in the UK ( http://arxiv.org/abs/2303.13943v1 )

ライセンス: Link先を確認

Adejoke T. Odebade, Elhadj Benkhelifa

(参考訳) サイバーセキュリティイニシアチブは、政府にとって、企業や一般大衆のサイバー衛生を教育し、訓練し、啓発し、促進する大きな機会を提供する。これらのイニシアチブの作成と推進は、政府が国家のサイバー健康を確保するための必要な手段である。ユーザが安全で自信のあるオンラインであることを保証するために、英国政府は、慈善団体のための小さなチャリティーガイド、小規模ビジネスのための小さなビジネスガイド、一般向けに安全なオンライン化、組織のためのサイバーイニシアチブなど、さまざまなユーザのニーズを満たすように設計されたイニシアチブを作成した。しかし、これらのイニシアチブが目的を達成することを保証することは、特に人口に手を差し伸べる場合には、厄介なことだ。したがって、政府は、サイバーセキュリティに対する義務を認識していることを確実にするために、ユーザに連絡する実践的な方法を強化することが不可欠である。この研究は、英国政府のサイバーセキュリティイニシアティブのうち16つを評価し、これらのイニシアチブが失敗した4つの顕著な理由を発見した。これらの理由は、意識と訓練の不足、影響を測定するためのイニシアチブの非評価、行動の変化の不十分、意図された目標に到達するための限られた範囲である。これらの知見に基づく勧告は、これらのイニシアチブを全国および地域レベルで推進することである。

Cyber security initiatives provide immense opportunities for governments to educate, train, create awareness, and promote cyber hygiene among businesses and the general public. Creating and promoting these initiatives are necessary steps governments take to ensure the cyber health of a nation. To ensure users are safe and confident, especially online, the UK government has created initiatives designed to meet the needs of various users, such as the Small Charity Guide for charity organisations, the Small Business Guide for small businesses, Get Safe Online for the general public, and Cyber Essentials for organisations, among many others. However, ensuring that these initiatives deliver on their objectives can be daunting, especially when reaching out to the whole population. It is, therefore, vital for the government to intensify practical ways of reaching out to users to make sure that they are aware of their obligation to cyber security. This study evaluates sixteen of the UK government's cyber security initiatives and discovers four notable reasons why these initiatives are failing. These reasons are insufficient awareness and training, non-evaluation of initiatives to measure impact, insufficient behavioural change, and limited coverage to reach intended targets. The recommendation based on these findings is to promote these initiatives both nationally and at community levels.

翻訳日:2023-03-27 14:50:09 公開日:2023-03-24

# 機械学習による光電子モーメントからのフェムト秒パルスパラメータ推定

Femtosecond pulse parameter estimation from photoelectron momenta using machine learning ( http://arxiv.org/abs/2303.13940v1 )

ライセンス: Link先を確認

Tomasz Szołdra, Marcelo F. Ciappina, Nicholas Werby, Philip H. Bucksbaum, Maciej Lewenstein, Jakub Zakrzewski, and Andrew S. Maxwell

(参考訳) ディープラーニングモデルは、画像のようなデータに膨大な解釈能力を提供している。特に、畳み込みニューラルネットワーク(CNN)は、特徴抽出やパラメータ推定といったタスクに対して驚くほどの明度を示した。ここでは強電離光電子スペクトルのcnnをテストし、理論データ集合をインバート実験データにトレーニングする。パルスキャラクタリゼーションは「テストグラウンド」として使われ、具体的には「伝統的な」測定値が通常20\%の不確かさをもたらすレーザー強度を取得する。本稿では,理論データのトレーニングに成功し,検出器飽和度計算を含む実験から一貫した結果を返すために必要な重要なデータ拡張手法について報告する。同じ手順を繰り返すことで、強電界イオン化の様々なシナリオにcnnを適用することができる。予測の不確実性推定を用いて、信頼性のある数パーセントのレーザー強度の不確実性を抽出することができる。解釈可能性法を用いることで、ホログラフィック干渉に直接関連しうるレーザー強度に最も敏感な分布の一部を明らかにすることができる。 CNNは、パラメータを抽出する正確で便利な方法を提供し、強磁場電離スペクトルの新しい解釈ツールを表現している。

Deep learning models have provided huge interpretation power for image-like data. Specifically, convolutional neural networks (CNNs) have demonstrated incredible acuity for tasks such as feature extraction or parameter estimation. Here we test CNNs on strong-field ionization photoelectron spectra, training on theoretical data sets to 'invert' experimental data. Pulse characterization is used as a 'testing ground'; specifically, we retrieve the laser intensity, where 'traditional' measurements typically lead to 20% uncertainty. We report on crucial data augmentation techniques required to successfully train on theoretical data and return consistent results from experiments, including accounting for detector saturation. The same procedure can be repeated to apply CNNs in a range of scenarios for strong-field ionization. Using a predictive uncertainty estimation, reliable laser intensity uncertainties of a few percent can be extracted, which are consistently lower than those given by traditional techniques. Using interpretability methods can reveal parts of the distribution that are most sensitive to laser intensity, which can be directly associated with holographic interferences. The CNNs employed provide an accurate and convenient way to extract parameters, and represent a novel interpretational tool for strong-field ionization spectra.

翻訳日:2023-03-27 14:49:43 公開日:2023-03-24

# MUG: 理解と生成のベンチマーク

MUG: A General Meeting Understanding and Generation Benchmark ( http://arxiv.org/abs/2303.13939v1 )

ライセンス: Link先を確認

Qinglin Zhang, Chong Deng, Jiaqing Liu, Hai Yu, Qian Chen, Wen Wang, Zhijie Yan, Jinglin Liu, Yi Ren, Zhou Zhao

(参考訳) ビデオ会議やオンラインコースから長いビデオ/オーディオ録音を聴くことは極めて非効率である。 ASRシステムは、記録を長文の音声文書に書き起こした後でも、ASRの書き起こしを読むことは、情報の検索を高速化するだけである。キーフレーズ抽出やトピックセグメンテーション,要約など,さまざまなNLPアプリケーションが重要情報の収集において,ユーザの効率を著しく向上させることがわかった。ミーティングシナリオは,これらの言語処理(SLP)機能をデプロイする上で,最も価値のあるシナリオのひとつだ。しかし、これらのSLPタスクに注釈を付けた大規模な公開ミーティングデータセットの欠如は、彼らの進歩を著しく妨げている。 slpの進歩を促進するために,トピックセグメンテーション,トピックレベルおよびセッションレベルの抽出要約,トピックタイトル生成,キーフレーズ抽出,アクションアイテム検出など,幅広いslpタスクのパフォーマンスをベンチマークするために,mug(general meeting understanding and generation benchmark)を確立した。 mugベンチマークを容易にするために,大規模会議データセットであるalimeeting4mugコーパスを構築して公開する。このコーパスは654回録音されたマンダリン会議セッションで,トピックカバレッジが多様であり,会議記録のマニュアル書き起こしにslpタスクのマニュアルアノテーションが組み込まれている。私たちの知る限りでは、AliMeeting4MUG Corpusは規模で最大のミーティングコーパスであり、ほとんどのSLPタスクを促進する。本稿では,本コーパスの詳細な紹介,slpタスクと評価方法,ベースラインシステムとその性能について述べる。

Listening to long video/audio recordings from video conferencing and online courses for acquiring information is extremely inefficient. Even after ASR systems transcribe recordings into long-form spoken language documents, reading ASR transcripts only partly speeds up seeking information. It has been observed that a range of NLP applications, such as keyphrase extraction, topic segmentation, and summarization, significantly improve users' efficiency in grasping important information. The meeting scenario is among the most valuable scenarios for deploying these spoken language processing (SLP) capabilities. However, the lack of large-scale public meeting datasets annotated for these SLP tasks severely hinders their advancement. To prompt SLP advancement, we establish a large-scale general Meeting Understanding and Generation Benchmark (MUG) to benchmark the performance of a wide range of SLP tasks, including topic segmentation, topic-level and session-level extractive summarization and topic title generation, keyphrase extraction, and action item detection. To facilitate the MUG benchmark, we construct and release a large-scale meeting dataset for comprehensive long-form SLP development, the AliMeeting4MUG Corpus, which consists of 654 recorded Mandarin meeting sessions with diverse topic coverage, with manual annotations for SLP tasks on manual transcripts of meeting recordings. To the best of our knowledge, the AliMeeting4MUG Corpus is so far the largest meeting corpus in scale and facilitates most SLP tasks. In this paper, we provide a detailed introduction of this corpus, SLP tasks and evaluation methods, baseline systems and their performance.

翻訳日:2023-03-27 14:49:24 公開日:2023-03-24

# 10カ国における国家サイバーセキュリティ戦略の比較研究

A Comparative Study of National Cyber Security Strategies of ten nations ( http://arxiv.org/abs/2303.13938v1 )

ライセンス: Link先を確認

Adejoke T. Odebade, Elhadj Benkhelifa

(参考訳) この研究は、ヨーロッパ(イギリス、フランス、リトアニア、エストニア、スペイン、ノルウェー)、アジア太平洋(シンガポールとオーストラリア)、アメリカ地域(米国とカナダ)の10カ国で公開されている文書のNCSS(National Cybersecurity Strategies)を比較した。この研究は「サイバーセキュリティ」という用語の統一的な理解は存在しないが、NCSSの共通の軌道は、サイバー犯罪との戦いは様々な利害関係者の協力によるものであり、国際協力の必要性が強いことを示している。比較構造とnscフレームワークを用いて、重要な資産の保護、研究開発へのコミットメント、国内外のコラボレーションの改善に類似性を見出した。この研究は、基盤となるサイバーセキュリティフレームワークが統一されていないことが、戦略の構造と内容に相違をもたらすことを示唆している。この研究のNCSSの強みと弱点は、サイバーセキュリティ戦略の開発や更新を計画している国に恩恵をもたらす。この研究は、NCSSを開発する際に戦略開発者が考慮できる推奨事項を提供する。

This study compares the National Cybersecurity Strategies (NCSSs) of publicly available documents of ten nations across Europe (United Kingdom, France, Lithuania, Estonia, Spain, and Norway), Asia-Pacific (Singapore and Australia), and the American region (the United States of America and Canada). The study observed that there is not a unified understanding of the term "Cybersecurity"; however, a common trajectory of the NCSSs shows that the fight against cybercrime is a joint effort among various stakeholders, hence the need for strong international cooperation. Using a comparative structure and an NCSS framework, the research finds similarities in protecting critical assets, commitment to research and development, and improved national and international collaboration. The study finds that the lack of a unified underlying cybersecurity framework leads to a disparity in the structure and contents of the strategies. The strengths and weaknesses of the NCSSs from the research can benefit countries planning to develop or update their cybersecurity strategies. The study gives recommendations that strategy developers can consider when developing an NCSS.

翻訳日:2023-03-27 14:48:55 公開日:2023-03-24

# グラフニューラルネットワークによる粒子物理過程の位相再構成

Topological Reconstruction of Particle Physics Processes using Graph Neural Networks ( http://arxiv.org/abs/2303.13937v1 )

ライセンス: Link先を確認

Lukas Ehrke, John Andrew Raine, Knut Zoch, Manuel Guth, Tobias Golling

(参考訳) 本稿では,粒子の減衰とメッセージパッシンググラフニューラルネットワークの柔軟性を基礎として,中間粒子を含む基礎となる物理過程を再構築する新しい手法であるtopographを提案する。トポグラフは観測された最終状態天体の組合せ的な割り当てを解き、元の母粒子と関連付けるだけでなく、ハード散乱過程における中間粒子の性質とそれに続く崩壊を直接予測する。グラフニューラルネットワークを用いた標準的なコンビネータアプローチや現代的なアプローチと比較すると、グラフの複雑さは再構成されたオブジェクトの数と線形にスケールする。我々は、全ハドロン減衰チャネルにおけるトップクォーク対生成にトポグラフを適用し、標準手法より優れ、最先端の機械学習技術の性能に適合する。

We present a new approach, the Topograph, which reconstructs underlying physics processes, including the intermediary particles, by leveraging underlying priors from the nature of particle physics decays and the flexibility of message passing graph neural networks. The Topograph not only solves the combinatoric assignment of observed final state objects, associating them to their original mother particles, but directly predicts the properties of intermediate particles in hard scatter processes and their subsequent decays. In comparison to standard combinatoric approaches or modern approaches using graph neural networks, which scale exponentially or quadratically, the complexity of Topographs scales linearly with the number of reconstructed objects. We apply Topographs to top quark pair production in the all hadronic decay channel, where we outperform the standard approach and match the performance of the state-of-the-art machine learning technique.

翻訳日:2023-03-27 14:48:39 公開日:2023-03-24

# ソフトウェア開発教育におけるジェネレーティブAIアシスタント

Generative AI Assistants in Software Development Education ( http://arxiv.org/abs/2303.13936v1 )

ライセンス: Link先を確認

Christopher Bull, Ahmed Kharrufa

(参考訳) ソフトウェア開発業界は、ソフトウェア開発にジェネレーティブAI(GAI)アシスタントを使用するという、潜在的に破壊的なパラダイムの変化に直面している。 AIはすでにソフトウェアエンジニアリングのさまざまな分野で使用されているが、GitHub CopilotやChatGPTといったGAIテクノロジは、多くの人々の想像力(と恐怖)に火をつけている。業界がこれらのテクノロジをどのように採用し、適応するのかは不明だが、microsoft(github、bing)やgoogle(bard)といった大企業による、より広範な業界への統合の動きは、意図と方向性を明確に示している。私たちは、現在の実践と課題を理解するために、業界専門家と探索的なインタビューを行い、ソフトウェア開発教育の将来というビジョンに組み込んで、教育的なレコメンデーションを実施しました。

The software development industry is amid another potentially disruptive paradigm change--adopting the use of generative AI (GAI) assistants for software development. Whilst AI is already used in various areas of software engineering, GAI technologies, such as GitHub Copilot and ChatGPT, have ignited the imaginations (and fears) of many people. Whilst it is unclear how the industry will adopt and adapt to these technologies, the move to integrate these technologies into the wider industry by large software companies, such as Microsoft (GitHub, Bing) and Google (Bard), is a clear indication of intent and direction. We performed exploratory interviews with industry professionals to understand current practices and challenges, which we incorporate into our vision of a future of software development education and make some pedagogical recommendations.

翻訳日:2023-03-27 14:48:23 公開日:2023-03-24

# DisC-Diff:マルチコントラストMRI超解像のための遠方拡散モデル

DisC-Diff: Disentangled Conditional Diffusion Model for Multi-Contrast MRI Super-Resolution ( http://arxiv.org/abs/2303.13933v1 )

ライセンス: Link先を確認

Ye Mao, Lan Jiang, Xi Chen, and Chao Li

(参考訳) マルチコントラストMRI(Multi-Contrast MRI)は、脳組織のコントラストに基づいて神経疾患を特徴づける最も一般的な管理ツールである。しかし、高分解能MRIスキャンの取得には時間がかかり、特定の条件下では不可能である。そこで, マルチコントラスト超解像法は, マルチコントラストMRIの相補的情報を活用することで, 低コントラストの品質を向上させるために開発された。現在のディープラーニングに基づく超解法は、復元の不確実性の推定とモード崩壊の回避に限界がある。拡散モデルは画像強調のための有望なアプローチとして現れてきたが、マルチコントラストMRIによる複数の条件間の複雑な相互作用を捉えることは、臨床応用の課題である。本稿では,マルチコントラスト脳MRI超解像のための不整合拡散モデルDisC-Diffを提案する。拡散モデルのサンプリングベース生成と単純な目的関数を利用して、修復における不確実性を効果的に推定し、安定した最適化プロセスを保証する。さらに,DEC-Diffは,マルチコントラストMRIからの補完的情報をフル活用し,マルチコントラスト入力の複数の条件下でのモデル解釈を改善する。 578個の正常脳を含むIXIデータセットと316個の病理脳を含む臨床データセットの2つのデータセットに対するDisC-Diffの有効性を検証した。実験の結果,DisC-Diffは,他の最先端手法よりも定量的にも視覚的にも優れていた。

Multi-contrast magnetic resonance imaging (MRI) is the most common management tool used to characterize neurological disorders based on brain tissue contrasts. However, acquiring high-resolution MRI scans is time-consuming and infeasible under specific conditions. Hence, multi-contrast super-resolution methods have been developed to improve the quality of low-resolution contrasts by leveraging complementary information from multi-contrast MRI. Current deep learning-based super-resolution methods have limitations in estimating restoration uncertainty and avoiding mode collapse. Although the diffusion model has emerged as a promising approach for image enhancement, capturing complex interactions between multiple conditions introduced by multi-contrast MRI super-resolution remains a challenge for clinical applications. In this paper, we propose a disentangled conditional diffusion model, DisC-Diff, for multi-contrast brain MRI super-resolution. It utilizes the sampling-based generation and simple objective function of diffusion models to estimate uncertainty in restorations effectively and ensure a stable optimization process. Moreover, DisC-Diff leverages a disentangled multi-stream network to fully exploit complementary information from multi-contrast MRI, improving model interpretation under multiple conditions of multi-contrast inputs. We validated the effectiveness of DisC-Diff on two datasets: the IXI dataset, which contains 578 normal brains, and a clinical dataset with 316 pathological brains. Our experimental results demonstrate that DisC-Diff outperforms other state-of-the-art methods both quantitatively and visually.

翻訳日:2023-03-27 14:48:09 公開日:2023-03-24

# ICASSP 2023総合会議理解・生成チャレンジ(MUG)の概要

Overview of the ICASSP 2023 General Meeting Understanding and Generation Challenge (MUG) ( http://arxiv.org/abs/2303.13932v1 )

ライセンス: Link先を確認

Qinglin Zhang, Chong Deng, Jiaqing Liu, Hai Yu, Qian Chen, Wen Wang, Zhijie Yan, Jinglin Liu, Yi Ren, Zhou Zhao

(参考訳) ICASSP2023 General Meeting Understanding and Generation Challenge (MUG) は、会議において重要な情報を把握する上で、SLPアプリケーションがユーザの効率を向上させるために重要であるため、会議記述書に対する幅広い言語処理(SLP)研究を促進することに焦点を当てている。 MUGにはトピックセグメンテーション、トピックレベルおよびセッションレベルの抽出要約、トピックタイトル生成、キーフレーズ抽出、アクションアイテム検出の5つのトラックが含まれている。 MUGを容易にするために,大規模なミーティングデータセットであるAliMeeting4MUG Corpusを構築し,リリースする。

ICASSP2023 General Meeting Understanding and Generation Challenge (MUG) focuses on prompting a wide range of spoken language processing (SLP) research on meeting transcripts, as SLP applications are critical to improve users' efficiency in grasping important information in meetings. MUG includes five tracks, including topic segmentation, topic-level and session-level extractive summarization, topic title generation, keyphrase extraction, and action item detection. To facilitate MUG, we construct and release a large-scale meeting dataset, the AliMeeting4MUG Corpus.

翻訳日:2023-03-27 14:47:42 公開日:2023-03-24

# MSdocTr-Lite: フルページマルチスクリプト手書き文字認識のためのリテラル変換器

MSdocTr-Lite: A Lite Transformer for Full Page Multi-script Handwriting Recognition ( http://arxiv.org/abs/2303.13931v1 )

ライセンス: Link先を確認

Marwa Dhiaf, Ahmed Cheikh Rouhou, Yousri Kessentini, Sinda Ben Salem

(参考訳) トランスフォーマーは、長距離表現能力のため、様々なパターン認識タスクにおいて急速に支配的なアーキテクチャとなっている。しかし、トランスフォーマーはデータハングリーモデルであり、トレーニングには大きなデータセットが必要です。手書き文字認識(HTR)では、大量のラベル付きデータを収集することは複雑で高価な作業である。本稿では,フルページマルチスクリプト手書き文字認識のためのライトトランスアーキテクチャを提案する。提案モデルには3つの利点がある: まず、データ不足の一般的な問題を解決するために、ほとんどのHTRパブリックデータセットにおいて、外部データを必要とせずに、適切な量のデータに基づいてトレーニングできるライトトランスフォーマーモデルを提案する。第二に、カリキュラムの学習戦略のおかげでページレベルでの読み込み順序を学習でき、行分割エラーを避け、より大きなコンテキストを活用し、コストのかかるセグメンテーションアノテーションの必要性を減らすことができる。第3に、ページレベルのラベル付き画像のみを使用して、簡単なトランスファー学習プロセスを適用することで、他のスクリプトに容易に適応できる。異なるスクリプト(フランス語、英語、スペイン語、アラビア語)の異なるデータセットに関する広範な実験は、提案モデルの有効性を示している。

The Transformer has quickly become the dominant architecture for various pattern recognition tasks due to its capacity for long-range representation. However, transformers are data-hungry models and need large datasets for training. In Handwritten Text Recognition (HTR), collecting a massive amount of labeled data is a complicated and expensive task. In this paper, we propose a lite transformer architecture for full-page multi-script handwriting recognition. The proposed model comes with three advantages: First, to solve the common problem of data scarcity, we propose a lite transformer model that can be trained on a reasonable amount of data, which is the case of most HTR public datasets, without the need for external data. Second, it can learn the reading order at page-level thanks to a curriculum learning strategy, allowing it to avoid line segmentation errors, exploit a larger context and reduce the need for costly segmentation annotations. Third, it can be easily adapted to other scripts by applying a simple transfer-learning process using only page-level labeled images. Extensive experiments on different datasets with different scripts (French, English, Spanish, and Arabic) show the effectiveness of the proposed model.

翻訳日:2023-03-27 14:47:31 公開日:2023-03-24

# 粒子平均場変動ベイズ

Particle Mean Field Variational Bayes ( http://arxiv.org/abs/2303.13930v1 )

ライセンス: Link先を確認

Minh-Ngoc Tran, Paco Tseng, Robert Kohn

(参考訳) 平均場変分ベイズ法 (MFVB) はベイズ推論において最も計算効率のよい手法の1つである。しかし、その用途は共役前のモデルや解析計算を必要とするモデルに限られている。本稿では,MFVB法の適用性を大幅に拡大する粒子ベースMFVB法を提案する。本研究では,wasserstein勾配流とlangevin拡散ダイナミクスの結合を利用して,新しい手法の理論的基礎を確立し,ベイズロジスティック回帰,確率的ボラティリティ,ディープニューラルネットワークを用いた手法の有効性を示す。

The Mean Field Variational Bayes (MFVB) method is one of the most computationally efficient techniques for Bayesian inference. However, its use has been restricted to models with conjugate priors or those that require analytical calculations. This paper proposes a novel particle-based MFVB approach that greatly expands the applicability of the MFVB method. We establish the theoretical basis of the new method by leveraging the connection between Wasserstein gradient flows and Langevin diffusion dynamics, and demonstrate the effectiveness of this approach using Bayesian logistic regression, stochastic volatility, and deep neural networks.

翻訳日:2023-03-27 14:47:14 公開日:2023-03-24

# オフライン模倣学習のための最適輸送

Optimal Transport for Offline Imitation Learning ( http://arxiv.org/abs/2303.13971v1 )

ライセンス: Link先を確認

Yicheng Luo, Zhengyao Jiang, Samuel Cohen, Edward Grefenstette, Marc Peter Deisenroth

(参考訳) 大規模データセットの出現に伴い、オフライン強化学習(rl)は、実環境と対話することなく、優れた意思決定ポリシーを学ぶための有望なフレームワークである。しかし、オフラインのRLでは、報酬アノテートが必要なため、報酬エンジニアリングが難しい場合や、報酬アノテートを取得する場合など、現実的な課題が生じる。本稿では,オフライン軌道に報酬を割り当てるアルゴリズムであるOptimal Transport Reward labeling (OTR)を紹介する。 OTRの鍵となる考え方は、データセット内のラベルなし軌跡と専門家のデモンストレーションとの間の最適なアライメントを計算するために最適なトランスポートを使用することで、報酬として解釈可能な類似度測定値を取得し、オフラインのRLアルゴリズムでポリシーを学ぶことができることである。 OTRは実装が簡単で、計算効率が良い。 D4RL ベンチマークでは,単一実演を用いた OTR がオフライン RL の性能に一定の精度で一致することを示す。

With the advent of large datasets, offline reinforcement learning (RL) is a promising framework for learning good decision-making policies without the need to interact with the real environment. However, offline RL requires the dataset to be reward-annotated, which presents practical challenges when reward engineering is difficult or when obtaining reward annotations is labor-intensive. In this paper, we introduce Optimal Transport Reward labeling (OTR), an algorithm that assigns rewards to offline trajectories, with a few high-quality demonstrations. OTR's key idea is to use optimal transport to compute an optimal alignment between an unlabeled trajectory in the dataset and an expert demonstration to obtain a similarity measure that can be interpreted as a reward, which can then be used by an offline RL algorithm to learn the policy. OTR is easy to implement and computationally efficient. On D4RL benchmarks, we show that OTR with a single demonstration can consistently match the performance of offline RL with ground-truth rewards.
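
The reward-labeling idea described above can be sketched in a few lines of Python. This is a hedged illustration and not the authors' implementation: it assumes scalar states, an absolute-difference cost, uniform weights on the steps of both trajectories, and an entropy-regularized (Sinkhorn) solver. The names `sinkhorn_plan` and `ot_rewards` and the `reg` value are hypothetical.

```python
import math


def sinkhorn_plan(cost, reg=0.1, n_iters=200):
    """Entropy-regularized transport plan between two uniform distributions."""
    n, m = len(cost), len(cost[0])
    a = [1.0 / n] * n  # uniform mass on the unlabeled trajectory's steps
    b = [1.0 / m] * m  # uniform mass on the expert demonstration's steps
    kernel = [[math.exp(-c / reg) for c in row] for row in cost]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iters):
        # Alternate the row and column scalings so the plan matches a and b.
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]


def ot_rewards(trajectory, expert, reg=0.1):
    """Per-step reward: minus the transport cost carried by that step."""
    cost = [[abs(s - e) for e in expert] for s in trajectory]
    plan = sinkhorn_plan(cost, reg)
    return [
        -sum(p * c for p, c in zip(plan_row, cost_row))
        for plan_row, cost_row in zip(plan, cost)
    ]
```

A trajectory that stays close to the demonstration receives rewards near zero, while one that drifts away receives clearly negative rewards; these labels could then be handed to any offline RL algorithm.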

翻訳日:2023-03-27 14:41:50 公開日:2023-03-24

# Si/SiGe構造における量子ドットに影響を及ぼす1/f電荷雑音のシミュレーション

Simulation of 1/f charge noise affecting a quantum dot in a Si/SiGe structure ( http://arxiv.org/abs/2303.13968v1 )

ライセンス: Link先を確認

Marcin Kępa, Niels Focke, Łukasz Cywiński, Jan A. Krzywda

(参考訳) コヒーレントスピン制御に必要な磁場勾配が存在するため、シリコン量子ドット内の単一電子スピン量子ビットの位相緩和はしばしば$1/f$の電荷ノイズによって支配される。現実的なSi/SiGe構造におけるゲート量子ドット中の電子の基底状態エネルギーの揺らぎを理論的に検討する。電荷ノイズは半導体-酸化物界面で捕捉された電荷の運動に起因すると仮定する。捕獲された電荷密度の現実的な範囲 $\rho \sim 10^{10}$ cm$^{-2}$、およびそれらの電荷の等方的に分布した変位の典型的な長さスケール $\delta r \leq 1$ nm を考え、ノイズスペクトルの振幅と形状が類似した構造に関する最近の実験で再構成されたスペクトルとよく一致するペア $(\rho,\delta r)$ を同定する。

Due to the presence of the magnetic field gradient needed for coherent spin control, dephasing of single-electron spin qubits in silicon quantum dots is often dominated by $1/f$ charge noise. We theoretically investigate fluctuations of the ground state energy of an electron in a gated quantum dot in a realistic Si/SiGe structure. We assume that the charge noise is caused by the motion of charges trapped at the semiconductor-oxide interface. We consider a realistic range of trapped charge densities, $\rho \sim 10^{10}$ cm$^{-2}$, and typical length scales of isotropically distributed displacements of these charges, $\delta r \leq 1$ nm, and identify pairs $(\rho,\delta r)$ for which the amplitude and shape of the noise spectrum are in good agreement with spectra reconstructed in recent experiments on similar structures.

翻訳日:2023-03-27 14:41:31 公開日:2023-03-24

# グラフ学習のための2レベル最適化による勾配不足

Gradient scarcity with Bilevel Optimization for Graph Learning ( http://arxiv.org/abs/2303.13964v1 )

ライセンス: Link先を確認

Hashem Ghanem (IMB), Samuel Vaiter (CNRS, JAD), Nicolas Keriven (CNRS, IRISA)

(参考訳) 半教師付き環境下でのグラフ学習の一般的な問題は勾配不足と呼ばれる。すなわち、ノードのサブセットの損失を最小限にすることでグラフを学習すると、ラベル付けされていないノード間のエッジがゼロ勾配を受ける。この現象は、グラフとグラフニューラルネットワーク(GCN)の重みを共同最適化アルゴリズムで最適化する際に初めて説明された。本研究では,この現象を正確に数学的に解析し,二段階最適化においても問題パラメータ間の追加依存性が存在することを証明した。 GCNの勾配の不足は、その有限受容場によって生じるが、ラプラシア正規化モデルにおいても、勾配の振幅がラベル付きノードとの距離によって指数関数的に減少することを示す。この問題を緩和するために,グラフ・ツー・グラフモデル(G2G)を用いた潜時グラフ学習,グラフに先行構造を課すグラフ正規化,あるいは直径を縮小した元のグラフよりも大きなグラフを最適化することを提案する。合成および実データを用いた実験により,提案手法の有効性が検証された。

A common issue in graph learning under the semi-supervised setting is referred to as gradient scarcity. That is, learning graphs by minimizing a loss on a subset of nodes causes edges between unlabelled nodes that are far from labelled ones to receive zero gradients. The phenomenon was first described when optimizing the graph and the weights of a Graph Neural Network (GCN) with a joint optimization algorithm. In this work, we give a precise mathematical characterization of this phenomenon, and prove that it also emerges in bilevel optimization, where additional dependency exists between the parameters of the problem. While for GCNs gradient scarcity occurs due to their finite receptive field, we show that it also occurs with the Laplacian regularization model, in the sense that the gradient amplitude decreases exponentially with distance to labelled nodes. To alleviate this issue, we study several solutions: we propose to resort to latent graph learning using a Graph-to-Graph model (G2G), graph regularization to impose a prior structure on the graph, or optimizing on a larger graph than the original one with a reduced diameter. Our experiments on synthetic and real datasets validate our analysis and prove the efficiency of the proposed solutions.
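
The finite-receptive-field argument above can be made concrete with a small Python sketch. It rests on a stated assumption that goes beyond the abstract: for a message-passing network with `num_layers` layers, the loss at a labelled node can only depend on an edge if one of its endpoints lies fewer than `num_layers` hops away, with hop distances measured on the current graph. The function names are hypothetical.

```python
from collections import deque


def hop_distances(num_nodes, edges, labelled):
    """Breadth-first hop distance from each node to its nearest labelled node."""
    adjacency = [[] for _ in range(num_nodes)]
    for u, v in edges:
        adjacency[u].append(v)
        adjacency[v].append(u)
    dist = [None] * num_nodes  # None marks nodes not reached from any label
    queue = deque()
    for node in labelled:
        dist[node] = 0
        queue.append(node)
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if dist[neighbour] is None:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def starved_edges(num_nodes, edges, labelled, num_layers):
    """Edges whose endpoints are both at least num_layers hops from any label."""
    dist = hop_distances(num_nodes, edges, labelled)
    starved = []
    for u, v in edges:
        reachable = [d for d in (dist[u], dist[v]) if d is not None]
        if not reachable or min(reachable) >= num_layers:
            starved.append((u, v))
    return starved
```

On a path graph with a single labelled endpoint and two layers, only the first two edges can influence the loss; every edge further along the path is reported as starved.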

翻訳日:2023-03-27 14:41:17 公開日:2023-03-24

# ステレオシーン:BEV支援のステレオマッチングパワーで3Dセマンティックシーンが完成

StereoScene: BEV-Assisted Stereo Matching Empowers 3D Semantic Scene Completion ( http://arxiv.org/abs/2303.13959v1 )

ライセンス: Link先を確認

Bohan Li, Yasheng Sun, Xin Jin, Wenjun Zeng, Zheng Zhu, Xiaoefeng Wang, Yunpeng Zhang, James Okae, Hang Xiao, Dalong Du

(参考訳) 3Dセマンティックシーン補完(SSC)は、不完全な観察から密集した3Dシーンを推測する必要がある不適切な課題である。従来の手法では、3Dの幾何学的入力を明示的に取り入れるか、単眼のRGB画像の後方で学習した3Dに頼っていた。しかし、LiDARのような3Dセンサーは高価で侵入性があり、モノクラーカメラは固有の曖昧さのために正確な幾何学をモデル化する上で困難に直面している。本研究では,外部の3dセンサを使わずに,軽量カメラ入力を最大限に活用する3dセマンティックシーン補完(ssc)のためのステレオセンシングを提案する。私たちの重要な洞察は、ステレオマッチングを利用して幾何学的曖昧さを解決することです。未マッチング領域におけるロバスト性を改善するため,リッチな文脈情報による幻覚能力を高めるために,鳥眼ビュー(BEV)表現を導入する。ステレオおよびBEV表現の上に、相互インタラクティブアグリゲーション(MIA)モジュールを慎重に設計し、そのパワーを完全に解放する。具体的には、信頼度再重み付けを付加した双方向相互作用変換器(BIT)を用いて相互誘導による信頼性予測を行い、二重体積集約(DVA)モジュールは相補的な集約を容易にするように設計されている。 semantickittiの実験結果は、提案されたステレオシーンが最先端のカメラベース手法を上回り、相対的に26.9%、セマンティクスが38.6%改善していることを示している。

3D semantic scene completion (SSC) is an ill-posed task that requires inferring a dense 3D scene from incomplete observations. Previous methods either explicitly incorporate 3D geometric input or rely on learnt 3D prior behind monocular RGB images. However, 3D sensors such as LiDAR are expensive and intrusive while monocular cameras face challenges in modeling precise geometry due to the inherent ambiguity. In this work, we propose StereoScene for 3D Semantic Scene Completion (SSC), which explores taking full advantage of light-weight camera inputs without resorting to any external 3D sensors. Our key insight is to leverage stereo matching to resolve geometric ambiguity. To improve its robustness in unmatched areas, we introduce bird's-eye-view (BEV) representation to inspire hallucination ability with rich context information. On top of the stereo and BEV representations, a mutual interactive aggregation (MIA) module is carefully devised to fully unleash their power. Specifically, a Bi-directional Interaction Transformer (BIT) augmented with confidence re-weighting is used to encourage reliable prediction through mutual guidance while a Dual Volume Aggregation (DVA) module is designed to facilitate complementary aggregation. Experimental results on SemanticKITTI demonstrate that the proposed StereoScene outperforms the state-of-the-art camera-based methods by a large margin with a relative improvement of 26.9% in geometry and 38.6% in semantic.

翻訳日:2023-03-27 14:40:56 公開日:2023-03-24

# 量子および半量子通信プロトコルの強化

Boosted quantum and semi-quantum communication protocols ( http://arxiv.org/abs/2303.13958v1 )

ライセンス: Link先を確認

Rajni Bala, Sooryansh Asthana, V. Ravishankar

(参考訳) 準備・測定方式に基づくセキュアな量子通信プロトコルは、相互に偏りのないベースを用いる。これらのプロトコルでは、さまざまな参加者が異なるベースで測定する多数の実行が、単に無駄になってしまう。本稿では,鍵生成規則の適切な設計により,そのような実行回数を減らすことができることを示す。これにより、キー生成速度(KGR)が大幅に増加する。本稿では,高次元量子システムで符号化された効果的な量子ビットを用いて,量子鍵分散プロトコルと半量子鍵分散プロトコルを提案する。いずれも資源として絡み合った状態の準備を要求せず、比較的大量の情報を転送できるため、我々の提案は実験的に追求する価値があると信じている。

Secure quantum communication protocols based on the prepare-and-measure scheme employ mutually unbiased bases. In these protocols, a large number of runs, in which different participants measure in different bases, simply go to waste. In this paper, we show that it is possible to reduce the number of such runs by a suitable design of the key generation rule. This results in a significant increase in the key generation rate (KGR). We illustrate this advantage by proposing quantum key distribution protocols and semi-quantum key distribution protocols by employing effective qubits encoded in higher dimensional quantum systems. Since none of them demands the preparation of entangled states as resources and a relatively large amount of information can be transferred, we believe that our proposals are worth pursuing experimentally.

翻訳日:2023-03-27 14:40:13 公開日:2023-03-24

# 分散スパースブロック符号のための因子

Factorizers for Distributed Sparse Block Codes ( http://arxiv.org/abs/2303.13957v1 )

ライセンス: Link先を確認

Michael Hersche, Aleksandar Terzic, Geethan Karunaratne, Jovin Langenegger, Angéline Pouget, Giovanni Cherubini, Luca Benini, Abu Sebastian, Abbas Rahimi

(参考訳) 分散スパースブロック符号(SBC)は固定ベクトルを用いてシンボルデータ構造を符号化し操作するためのコンパクトな表現を示す。しかし、大きな課題の1つは、可能なすべての組み合わせを探索することなく、そのようなデータ構造を構成要素に切り離し、あるいは分解することである。この因子化は、現代のニューラルネットワークを用いてクエリベクトルを生成するときの知覚的不確実性や近似によってシンボル表現が緩和されるノイズの多いSBCによってクエリされるとより困難になる。これらの課題に対処するために,我々はまず,GSBCと呼ばれるより柔軟で一般化されたSBCを分解する高速かつ高精度な手法を提案する。反復分解器はしきい値に基づく非線形活性化,条件付きランダムサンプリング,$\ell_\infty$-based similarityメトリックを導入する。そのランダムサンプリング機構と重ね合わせの探索の組み合わせは、gsbcのバンドル能力まで経験的な観察と一致するデコードイテレーションの期待数を解析的に決定することができる。第二に,深層畳み込みニューラルネットワーク (cnns) を用いて生成したノイズ製品ベクトルを問合せした場合,提案手法は高い精度を維持する。これは、cnnの大規模完全連結層(fcl)を置き換えることで、cの訓練可能なクラスベクターまたは属性の組み合わせは、それぞれ$\sqrt[\leftroot{-2}\uproot{2}f]{c}$固定コードベクタを持つf-factor codebookを持つファクタライザによって暗黙的に表現できる。我々は,新しい損失関数を持つcnnの分類層に,因子化器を柔軟に統合する手法を提案する。 CIFAR-100, ImageNet-1K, RAVENデータセット上での4つの深層CNNアーキテクチャの実現可能性を示す。すべてのユースケースにおいて、パラメータと操作の数はFCLに比べて大幅に削減される。

Distributed sparse block codes (SBCs) exhibit compact representations for encoding and manipulating symbolic data structures using fixed-width vectors. One major challenge, however, is to disentangle, or factorize, such data structures into their constituent elements without having to search through all possible combinations. This factorization becomes more challenging when queried by noisy SBCs wherein symbol representations are relaxed due to perceptual uncertainty and approximations made when modern neural networks are used to generate the query vectors. To address these challenges, we first propose a fast and highly accurate method for factorizing a more flexible and hence generalized form of SBCs, dubbed GSBCs. Our iterative factorizer introduces a threshold-based nonlinear activation, a conditional random sampling, and an $\ell_\infty$-based similarity metric. Its random sampling mechanism in combination with the search in superposition allows the expected number of decoding iterations to be determined analytically, which matches the empirical observations up to the GSBC's bundling capacity. Secondly, the proposed factorizer maintains its high accuracy when queried by noisy product vectors generated using deep convolutional neural networks (CNNs). This facilitates its application in replacing the large fully connected layer (FCL) in CNNs, whereby C trainable class vectors, or attribute combinations, can be implicitly represented by our factorizer having F-factor codebooks, each with $\sqrt[F]{C}$ fixed codevectors. We provide a methodology to flexibly integrate our factorizer in the classification layer of CNNs with a novel loss function. We demonstrate the feasibility of our method on four deep CNN architectures over CIFAR-100, ImageNet-1K, and RAVEN datasets. In all use cases, the number of parameters and operations are significantly reduced compared to the FCL.
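
The codebook-size claim above (C classes represented by F codebooks of roughly the F-th root of C codevectors each) can be checked with a short Python sketch. The rounding-up rule for values of C that are not perfect F-th powers, and the mixed-radix mapping between a class index and its tuple of codebook indices, are assumptions made for illustration; the abstract does not specify either. All function names are hypothetical.

```python
def codebook_size(num_classes, num_factors):
    """Smallest integer M such that M ** num_factors >= num_classes."""
    size = 1
    while size ** num_factors < num_classes:
        size += 1
    return size


def class_to_factors(index, size, num_factors):
    """Write a class index as num_factors digits in base `size` (least significant first)."""
    digits = []
    for _ in range(num_factors):
        digits.append(index % size)
        index //= size
    return tuple(digits)


def factors_to_class(digits, size):
    """Inverse of class_to_factors."""
    index = 0
    for digit in reversed(digits):
        index = index * size + digit
    return index
```

For example, with C = 100 and F = 2 each codebook holds 10 codevectors, so 20 fixed codevectors stand in for 100 trainable class vectors.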

翻訳日:2023-03-27 14:39:54 公開日:2023-03-24

# PIAT:パラメータ補間に基づく画像分類のための逆学習

PIAT: Parameter Interpolation based Adversarial Training for Image Classification ( http://arxiv.org/abs/2303.13955v1 )

ライセンス: Link先を確認

Kun He, Xin Liu, Yichen Yang, Zhou Qin, Weigao Wen, Hui Xue, John E. Hopcroft

(参考訳) 敵の攻撃に対して最も効果的なアプローチは、敵の訓練である。しかし, 既存の対人訓練法は, 防御効果を低下させ, トレーニング過程において明らかに振動や過度に適合する問題を示す。本研究では,パラメータ補間に基づく適応学習(PIAT)と呼ばれる新しいフレームワークを提案する。具体的には、各エポックの終わりにpiatはモデルパラメータを前と現在のエポックのパラメータの補間としてチューニングする。さらに、正規化平均正方形誤差(NMSE)を用いて、クリーンかつ対角的な例を整列することにより、ロバスト性をさらに向上することを提案する。他の正規化法と比較して、NMSEは絶対等級よりもロジットの相対等級に重点を置いている。いくつかのベンチマークデータセットと各種ネットワークに対する大規模な実験により,本手法はモデルの堅牢性を顕著に改善し,一般化誤差を低減できることが示された。さらに,我々のフレームワークは汎用的で,他の対向訓練手法と組み合わせることで,ロバストな精度をさらに高めることができる。

Adversarial training has been demonstrated to be the most effective approach to defend against adversarial attacks. However, existing adversarial training methods show apparent oscillations and overfitting issues in the training process, degrading the defense efficacy. In this work, we propose a novel framework, termed Parameter Interpolation based Adversarial Training (PIAT), that makes full use of the historical information during training. Specifically, at the end of each epoch, PIAT tunes the model parameters as the interpolation of the parameters of the previous and current epochs. Besides, we suggest using the Normalized Mean Square Error (NMSE) to further improve the robustness by aligning the clean and adversarial examples. Compared with other regularization methods, NMSE focuses more on the relative magnitude of the logits rather than the absolute magnitude. Extensive experiments on several benchmark datasets and various networks show that our method could prominently improve the model robustness and reduce the generalization error. Moreover, our framework is general and could further boost the robust accuracy when combined with other adversarial training methods.
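
The two ideas in the abstract above can be sketched in plain Python: interpolating parameters between the previous and current epoch, and a logit-alignment loss that depends on relative rather than absolute magnitude. The interpolation weight, the unit-norm normalization inside `normalized_mse`, and all function names are assumptions for illustration; the abstract does not give the exact formulas the authors use.

```python
import math


def interpolate_parameters(previous: dict, current: dict, weight: float) -> dict:
    """Return weight * previous + (1 - weight) * current for every named parameter.

    `weight` is a hypothetical interpolation coefficient in [0, 1].
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    if previous.keys() != current.keys():
        raise ValueError("parameter names must match")
    return {
        name: weight * previous[name] + (1.0 - weight) * current[name]
        for name in current
    }


def normalized_mse(clean_logits: list, adv_logits: list) -> float:
    """Mean squared error between L2-normalized logit vectors (assumed form)."""

    def unit(vector):
        norm = math.sqrt(sum(x * x for x in vector))
        return [x / norm for x in vector] if norm > 0 else list(vector)

    clean, adv = unit(clean_logits), unit(adv_logits)
    return sum((c - a) ** 2 for c, a in zip(clean, adv)) / len(clean)
```

With this normalization, rescaling all logits by a positive constant leaves the loss unchanged, which is one way to read "relative magnitude" in the abstract.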

翻訳日:2023-03-27 14:39:20 公開日:2023-03-24

# AssetField: 地平面表現におけるアセットマイニングと再構成

AssetField: Assets Mining and Reconfiguration in Ground Feature Plane Representation ( http://arxiv.org/abs/2303.13953v1 )

ライセンス: Link先を確認

Yuanbo Xiangli, Linning Xu, Xingang Pan, Nanxuan Zhao, Bo Dai, Dahua Lin

(参考訳) 屋内環境も屋外環境も本質的に構造的で反復的である。従来のモデリングパイプラインでは、ユニークなオブジェクトテンプレートを格納するアセットライブラリが維持されている。そこで本研究では,テンプレート特徴パッチを格納したアセットライブラリを教師なしで構築できる,シーンを表現するオブジェクト認識基底特徴平面のセットを学習するニューラルシーン表現であるアセットフィールドを提案する。オブジェクトの編集に空間点を問うためにオブジェクトマスクを必要とする既存の方法とは異なり、地上特徴平面表現は鳥眼ビューのシーンを自然に視覚化し、オブジェクト上の様々な操作(例えば、翻訳、複製、変形)で新しいシーンを構成することができる。テンプレート機能パッチにより、多数の繰り返しアイテムを持つシーンでグループ編集が有効になり、オブジェクト個人に対する反復的な作業が回避される。 AssetFieldは新規ビュー合成のための競争性能を達成するだけでなく、新しいシーン構成のためのリアルレンダリングを生成する。

Both indoor and outdoor environments are inherently structured and repetitive. Traditional modeling pipelines keep an asset library storing unique object templates, which is both versatile and memory efficient in practice. Inspired by this observation, we propose AssetField, a novel neural scene representation that learns a set of object-aware ground feature planes to represent the scene, where an asset library storing template feature patches can be constructed in an unsupervised manner. Unlike existing methods which require object masks to query spatial points for object editing, our ground feature plane representation offers a natural visualization of the scene in the bird-eye view, allowing a variety of operations (e.g. translation, duplication, deformation) on objects to configure a new scene. With the template feature patches, group editing is enabled for scenes with many recurring items to avoid repetitive work on object individuals. We show that AssetField not only achieves competitive performance for novel-view synthesis but also generates realistic renderings for new scene configurations.

翻訳日:2023-03-27 14:39:03 公開日:2023-03-24

# CCL:LiDAR位置認識のための連続的コントラスト学習

CCL: Continual Contrastive Learning for LiDAR Place Recognition ( http://arxiv.org/abs/2303.13952v1 )

ライセンス: Link先を確認

Jiafeng Cui, Xieyuanli Chen

(参考訳) 位置認識は、ロボットや自動運転アプリケーションのためのループクローズとグローバルローカライズにおいて、必須かつ困難なタスクである。近年のディープラーニング技術の発展により,LiDAR位置認識(LPR)の性能は大幅に向上した。しかし、現在のディープラーニングベースの手法は、一般化能力の低さと破滅的な忘れることの2つの大きな問題に悩まされている。本稿では,大惨な忘れの問題に対処し,LPRアプローチの堅牢性を改善するために,CCLという連続的なコントラスト学習手法を提案する。我々のCCLは、コントラスト的特徴プールを構築し、コントラスト的損失を利用して、より移動可能な場所表現を訓練する。新たな環境に移行すると、CCLはコントラストメモリバンクを継続的にレビューし、新しいデータから新しい場所を認識することを継続的に学習しながら、過去のデータの検索能力を維持するために分布ベースの知識蒸留を適用します。我々は3つの異なるLPR手法を用いてオックスフォード、MulRan、PNVデータセットに対するアプローチを徹底的に評価した。実験の結果,我々のCCLは,異なる環境における異なる手法の性能を常に改善し,最先端の継続的学習法よりも優れていた。このメソッドの実装はhttps://github.com/cloudcjf/cclでリリースされた。

Place recognition is an essential and challenging task in loop closing and global localization for robotics and autonomous driving applications. Benefiting from the recent advances in deep learning techniques, the performance of LiDAR place recognition (LPR) has been greatly improved. However, current deep learning-based methods suffer from two major problems: poor generalization ability and catastrophic forgetting. In this paper, we propose a continual contrastive learning method, named CCL, to tackle the catastrophic forgetting problem and generally improve the robustness of LPR approaches. Our CCL constructs a contrastive feature pool and utilizes contrastive loss to train more transferable representations of places. When transferred into new environments, our CCL continuously reviews the contrastive memory bank and applies a distribution-based knowledge distillation to maintain the retrieval ability of the past data while continually learning to recognize new places from the new data. We thoroughly evaluate our approach on Oxford, MulRan, and PNV datasets using three different LPR methods. The experimental results show that our CCL consistently improves the performance of different methods in different environments outperforming the state-of-the-art continual learning method. The implementation of our method has been released at https://github.com/cloudcjf/CCL.

翻訳日:2023-03-27 14:38:43 公開日:2023-03-24

# ナレッジグラフ: 機会と課題

Knowledge Graphs: Opportunities and Challenges ( http://arxiv.org/abs/2303.13948v1 )

ライセンス: Link先を確認

Ciyuan Peng, Feng Xia, Mehdi Naseriparsa, Francesco Osborne

(参考訳) 人工知能(AI)とビッグデータの爆発的成長により、膨大な量の知識を適切に整理し、表現することが極めて重要である。グラフデータとして、知識グラフは現実世界の知識を蓄積し伝達する。知識グラフは複雑な情報を効果的に表現していることがよく認識されており、近年は学術や産業の注目を集めている。そこで本稿では,知識グラフの理解を深めるために,この分野を体系的に概観する。具体的には、知識グラフの機会と課題に焦点を当てる。まず,(1)知識グラフに基づくaiシステム,(2)知識グラフの応用分野の可能性,という2つの側面から知識グラフの機会を考察する。そこで我々は,知識グラフの埋め込み,知識獲得,知識グラフの完成,知識融合,知識推論など,この分野の深刻な技術的課題を徹底的に議論する。この調査は、今後の研究と知識グラフの開発に新たな光を当てることを期待している。

With the explosive growth of artificial intelligence (AI) and big data, it has become vitally important to organize and represent the enormous volume of knowledge appropriately. As graph data, knowledge graphs accumulate and convey knowledge of the real world. It has been well recognized that knowledge graphs effectively represent complex information; hence, they have rapidly gained the attention of academia and industry in recent years. Thus, to develop a deeper understanding of knowledge graphs, this paper presents a systematic overview of this field. Specifically, we focus on the opportunities and challenges of knowledge graphs. We first review the opportunities of knowledge graphs in terms of two aspects: (1) AI systems built upon knowledge graphs; (2) potential application fields of knowledge graphs. Then, we thoroughly discuss severe technical challenges in this field, such as knowledge graph embeddings, knowledge acquisition, knowledge graph completion, knowledge fusion, and knowledge reasoning. We expect that this survey will shed new light on future research and the development of knowledge graphs.

翻訳日:2023-03-27 14:38:24 公開日:2023-03-24

# 運用中の量子参照フレーム変換

Operational Quantum Reference Frame Transformations ( http://arxiv.org/abs/2303.14002v1 )

ライセンス: Link先を確認

Titouan Carette, Jan Głowacki and Leon Loveridge

(参考訳) 量子参照フレームとその変換の汎用的、操作的、厳密な基礎を提供し、共変正演算子値測度を用いてフレームの可観測性を表現する。このフレームワークは局所コンパクトなグループを対象とし、フレーム変更に関する以前の提案と異なり、物理的に区別できない状態が識別される操作同値の概念を中心に構築されている。これにより、(不変)相対可観測体の空間と相対状態の凸集合を双対対象として構成することができる。フレームの性質を考慮に入れた相対状態について、より等価な関係を求めることにより、量子参照フレーム変更マップを提供する。この写像は、初期フレームと最終フレームが任意に局所化された状態を認めるとき、正確には可逆であることを示す。提案するフレーム変更を文献で利用可能な他の構成と比較し,共通適用領域における運用上の合意を見いだした。

We provide a general, operational, and rigorous basis for quantum reference frames and their transformations using covariant positive operator valued measures to represent frame observables. The framework holds for locally compact groups and differs from all prior proposals for frame changes, being built around the notion of operational equivalence, in which states that cannot be distinguished physically are identified. This allows for the construction of the space of (invariant) relative observables and the convex set of relative states as dual objects. By demanding a further equivalence relation on the relative states which takes into account the nature of the frames, we provide a quantum reference frame change map. We show that this map is invertible exactly when the initial and final frames admit states which are arbitrarily well localized. We compare the presented frame change with other constructions available in the literature, finding operational agreement on the domain of common applicability.

翻訳日:2023-03-27 14:31:20 公開日:2023-03-24

# 大規模都市景観のためのグリッド誘導型ニューラルラジアンスフィールド

Grid-guided Neural Radiance Fields for Large Urban Scenes ( http://arxiv.org/abs/2303.14001v1 )

ライセンス: Link先を確認

Linning Xu, Yuanbo Xiangli, Sida Peng, Xingang Pan, Nanxuan Zhao, Christian Theobalt, Bo Dai, Dahua Lin

(参考訳) 純粋なMLPベースのニューラルラジアンスフィールド(NeRF法)は、モデル容量の制限により、大規模なシーンでぼやけたレンダリングで不適合に陥ることが多い。近年のアプローチでは、シーンを地理的に分割し、各領域を個別にモデル化するために複数のサブNeRFを採用することが提案されている。別の解決策は、計算効率が高く、グリッド解像度が向上した大きなシーンに自然にスケールできる機能グリッド表現を使用することである。しかし、機能グリッドは制約が少なく、しばしば最適以下のソリューションに到達し、特に複雑な幾何学とテクスチャの領域において、レンダリングにおいてノイズの多いアーティファクトを生成する。本研究では,大規模都市における高忠実度レンダリングを実現するための新しいフレームワークを提案する。我々は,コンパクトな多分解能地上特徴面表現を用いてシーンを粗くキャプチャし,別のNeRFブランチを介して位置符号化入力を補完し,共同学習方式でレンダリングすることを提案する。このような統合は、2つの代替ソリューションの利点を生かしうることを示す: 軽量のNeRFは、特徴格子表現の指導の下で、細部でフォトリアリスティックなノベルビューをレンダリングするのに十分であり、また、共同最適化された地上特徴面は、さらに洗練され、より正確でコンパクトな特徴空間を形成し、より自然なレンダリング結果を生成することができる。

Purely MLP-based neural radiance fields (NeRF-based methods) often suffer from underfitting with blurred renderings on large-scale scenes due to limited model capacity. Recent approaches propose to geographically divide the scene and adopt multiple sub-NeRFs to model each region individually, leading to linear scale-up in training costs and the number of sub-NeRFs as the scene expands. An alternative solution is to use a feature grid representation, which is computationally efficient and can naturally scale to a large scene with increased grid resolutions. However, the feature grid tends to be less constrained and often reaches suboptimal solutions, producing noisy artifacts in renderings, especially in regions with complex geometry and texture. In this work, we present a new framework that realizes high-fidelity rendering on large urban scenes while being computationally efficient. We propose to use a compact multiresolution ground feature plane representation to coarsely capture the scene, and complement it with positional encoding inputs through another NeRF branch for rendering in a joint learning fashion. We show that such an integration can utilize the advantages of two alternative solutions: a light-weighted NeRF is sufficient, under the guidance of the feature grid representation, to render photorealistic novel views with fine details; and the jointly optimized ground feature planes, can meanwhile gain further refinements, forming a more accurate and compact feature space and output much more natural rendering results.

翻訳日:2023-03-27 14:31:06 公開日:2023-03-24

# powerpruning: ニューラルネットワーク高速化のための重みとアクティベーションの選択

PowerPruning: Selecting Weights and Activations for Power-Efficient Neural Network Acceleration ( http://arxiv.org/abs/2303.13997v1 )

ライセンス: Link先を確認

Richard Petri, Grace Li Zhang, Yiran Chen, Ulf Schlichtmann, Bing Li

(参考訳) ディープニューラルネットワーク(DNN)は様々な分野に適用されている。 DNNを特にエッジデバイスにデプロイする際の大きな課題は、多数の乗算および累積(MAC)操作のために消費電力である。この課題に対処するため,我々は,mac 操作の消費電力を減少させる重みを選択することで,デジタルニューラルネットワーク加速器の消費電力を削減する新しい手法であるpowerpruningを提案する。また、選択された重みと全ての活性化遷移のタイミング特性を評価する。より小さな遅延につながる重みと活性化がさらに選択される。これにより、MACユニットを変更することなくMACユニットの感度回路パスの最大遅延を低減し、サプライ電圧の柔軟なスケーリングを可能にし、電力消費をさらに削減できる。リトレーニングとともに、提案手法はハードウェア上でのdnnの消費電力を最大78.3%削減できるが、精度の低下は少ない。

Deep neural networks (DNNs) have been successfully applied in various fields. A major challenge of deploying DNNs, especially on edge devices, is power consumption, due to the large number of multiply-and-accumulate (MAC) operations. To address this challenge, we propose PowerPruning, a novel method to reduce power consumption in digital neural network accelerators by selecting weights that lead to less power consumption in MAC operations. In addition, the timing characteristics of the selected weights together with all activation transitions are evaluated. The weights and activations that lead to small delays are further selected. Consequently, the maximum delay of the sensitized circuit paths in the MAC units is reduced even without modifying MAC units, which thus allows a flexible scaling of supply voltage to reduce power consumption further. Together with retraining, the proposed method can reduce power consumption of DNNs on hardware by up to 78.3% with only a slight accuracy loss.

翻訳日:2023-03-27 14:30:38 公開日:2023-03-24

# line: 重要なニューロンを利用した分布外検出

LINe: Out-of-Distribution Detection by Leveraging Important Neurons ( http://arxiv.org/abs/2303.13995v1 )

ライセンス: Link先を確認

Yong Hyun Ahn, Gyeong-Moon Park, Seong Tae Kim

(参考訳) 特に自律運転や医療といったミッションクリティカルな分野において、アウト・オブ・ディストリビューション(OOD)データの障害予測が大きな問題を引き起こす可能性がある場合において、入力サンプルの不確実性を定量化することが重要である。 OOD検出問題は、モデルが認識していないことを表現できないという点で、基本的に始まります。ポストホックなood検出アプローチは、モデルのパフォーマンスを低下させ、トレーニングコストを増加させる追加の再トレーニングプロセスを必要としないため、広く検討されている。本研究では,高次特徴を表すモデルの深層におけるニューロンの観点から,分布内データとOODデータ間のモデル出力の差を解析するための新しい側面を紹介する。本稿では,分布検出のポストホックアウトのための新しい手法であるLINe( Leveraging Important Neurons)を提案する。 shapley値に基づくプルーニングは、入力データの特定のクラスを予測し、残りをマスキングするために高分配ニューロンのみを選択することで、ノイズ出力の効果を減少させる。アクティベーション・クリッピングは、あるしきい値以上のすべての値を同じ値に固定し、lineがクラス固有のすべての特徴を等しく扱うことを可能にし、インディストリビューションとoodデータの間の活性化された特徴の差の数を単に考慮するだけでよい。 CIFAR-10, CIFAR-100, ImageNetデータセット上で, 最先端のOOD検出手法よりも高い性能で提案手法の有効性を検証する。

It is important to quantify the uncertainty of input samples, especially in mission-critical domains such as autonomous driving and healthcare, where failure predictions on out-of-distribution (OOD) data are likely to cause big problems. The OOD detection problem fundamentally arises because the model cannot express what it is not aware of. Post-hoc OOD detection approaches are widely explored because they do not require an additional re-training process which might degrade the model's performance and increase the training cost. In this study, from the perspective of neurons in the deep layer of the model representing high-level features, we introduce a new aspect for analyzing the difference in model outputs between in-distribution data and OOD data. We propose a novel method, Leveraging Important Neurons (LINe), for post-hoc out-of-distribution detection. Shapley value-based pruning reduces the effects of noisy outputs by selecting only high-contribution neurons for predicting specific classes of input data and masking the rest. Activation clipping fixes all values above a certain threshold to the same value, allowing LINe to treat all the class-specific features equally and consider only the difference in the number of activated features between in-distribution and OOD data. Comprehensive experiments verify the effectiveness of the proposed method by outperforming state-of-the-art post-hoc OOD detection methods on CIFAR-10, CIFAR-100, and ImageNet datasets.
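
The pruning and clipping steps described in the abstract above can be sketched as follows. The per-neuron contribution scores are taken as given (computing Shapley values is out of scope here), and the function names, the value of `k`, and the threshold are hypothetical choices for illustration rather than the authors' settings.

```python
def top_k_indices(contributions: list, k: int) -> list:
    """Indices of the k neurons with the highest contribution scores."""
    order = sorted(range(len(contributions)), key=lambda i: contributions[i], reverse=True)
    return order[:k]


def mask_neurons(activations: list, keep_indices: list) -> list:
    """Keep the selected neurons and zero out (mask) the rest."""
    keep = set(keep_indices)
    return [a if i in keep else 0.0 for i, a in enumerate(activations)]


def clip_activations(activations: list, threshold: float) -> list:
    """Set every activation above the threshold to the threshold itself."""
    return [min(a, threshold) for a in activations]


def prune_and_clip(activations: list, contributions: list, k: int, threshold: float) -> list:
    """Mask low-contribution neurons, then clip what remains."""
    kept = top_k_indices(contributions, k)
    return clip_activations(mask_neurons(activations, kept), threshold)
```

After clipping, strongly activated neurons all carry the same value, so a downstream score reflects how many important neurons fired rather than how strongly any single one did.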

翻訳日:2023-03-27 14:30:23 公開日:2023-03-24

# 到達性解析を用いた自律走行車の物理的バックドアトリガー起動

Physical Backdoor Trigger Activation of Autonomous Vehicle using Reachability Analysis ( http://arxiv.org/abs/2303.13992v1 )

ライセンス: Link先を確認

Wending Li, Yum Wang, Muhammad Shafique, Saif Eddin Jabari

(参考訳) 近年の研究では、自律走行車(AV)は隠れたバックドアで操作でき、物理的トリガーによって起動されると有害な行動を起こすことが示されている。しかし、これらのトリガーが交通原則に固執しながらどのように活性化されるのかはまだ不明である。動的なトラフィック環境でこの脆弱性を理解することは重要です。この研究は、制御された動的システムの到達可能性問題として物理的トリガの活性化を提示することで、このギャップに対処する。本手法は,事故の引き金条件に到達可能な交通システムにおけるセキュリティクリティカル領域を特定し,その状況に到達するための軌道を提供する。典型的なトラフィックシナリオをテストすると、システムは100%に近いアクティベーション率の条件をトリガーすることに成功した。本手法は,av脆弱性を識別し,効果的な安全性戦略を実現することに有用である。

Recent studies reveal that Autonomous Vehicles (AVs) can be manipulated by hidden backdoors, causing them to perform harmful actions when activated by physical triggers. However, it is still unclear how these triggers can be activated while adhering to traffic principles. Understanding this vulnerability in a dynamic traffic environment is crucial. This work addresses this gap by presenting physical trigger activation as a reachability problem of a controlled dynamic system. Our technique identifies security-critical areas in traffic systems where trigger conditions for accidents can be reached, and provides intended trajectories for how those conditions can be reached. Testing on typical traffic scenarios showed the system can be successfully driven to trigger conditions with a near 100% activation rate. Our method is beneficial for identifying AV vulnerabilities and enabling effective safety strategies.

翻訳日:2023-03-27 14:29:58 公開日:2023-03-24

# パラフレーズ検出:人間と機械のコンテンツ

Paraphrase Detection: Human vs. Machine Content ( http://arxiv.org/abs/2303.13989v1 )

ライセンス: Link先を確認

Jonas Becker and Jan Philip Wahle and Terry Ruas and Bela Gipp

(参考訳) GPT-4やChatGPTといった大規模言語モデルの普及は、機械生成コンテンツやパラフレーズ化の可能性により、学術的整合性に対する懸念が高まっている。人間と機械によるパラフロードコンテンツの検出についての研究は行われてきたが、これらのタイプのコンテンツの比較は未調査のままである。本稿では,パラファーゼ検出タスクに一般的に使用される各種データセットの包括的解析を行い,検出手法の配列を評価する。この結果から,個々のデータセットのパフォーマンスの観点から異なる検出手法の長所と短所を強調し,人間の期待に合致する適切なマシン生成データセットの欠如を明らかにした。我々の主な発見は、人間が書いたパラフレーズが機械で作成したものを超え、難易度、多様性、類似性は、自動生成されたテキストが人間レベルのパフォーマンスとまだ一致していないことを示唆している。トランスフォーマーは、意味的に多様なコーパスに優れたTF-IDFを持つデータセット間で最も効果的な方法として登場した。さらに、4つのデータセットをパラフレーズ検出の最も多様で困難なものとして特定した。

The growing prominence of large language models, such as GPT-4 and ChatGPT, has led to increased concerns over academic integrity due to the potential for machine-generated content and paraphrasing. Although studies have explored the detection of human- and machine-paraphrased content, the comparison between these types of content remains underexplored. In this paper, we conduct a comprehensive analysis of various datasets commonly employed for paraphrase detection tasks and evaluate an array of detection methods. Our findings highlight the strengths and limitations of different detection methods in terms of performance on individual datasets, revealing a lack of suitable machine-generated datasets that can be aligned with human expectations. Our main finding is that human-authored paraphrases exceed machine-generated ones in terms of difficulty, diversity, and similarity implying that automatically generated texts are not yet on par with human-level performance. Transformers emerged as the most effective method across datasets with TF-IDF excelling on semantically diverse corpora. Additionally, we identify four datasets as the most diverse and challenging for paraphrase detection.

翻訳日:2023-03-27 14:29:43 公開日:2023-03-24

# 機械心理学:心理学的手法を用いた大規模言語モデルにおける創発的能力と行動の調査

Machine Psychology: Investigating Emergent Capabilities and Behavior in Large Language Models Using Psychological Methods ( http://arxiv.org/abs/2303.13988v1 )

ライセンス: Link先を確認

Thilo Hagendorff

(参考訳) 大規模言語モデル(LLM)は、現在、人間のコミュニケーションと日常の生活を結び付けるAIシステムの最前線にある。急速な技術進歩と極端な汎用性により、LLMは今や数百万人のユーザを抱えており、情報検索、コンテンツ生成、問題解決などの主要なゴート技術になりつつある。そのため、その能力を徹底的に評価し、精査することが重要である。現在のllmでは、ますます複雑で新しい行動パターンがみられるため、もともと人間をテストするために設計された心理学実験の参加者として扱うことができる。そこで本研究では,「機械心理学」と呼ばれる新しい研究分野を紹介する。この論文は、心理学の異なるサブフィールドがLLMの行動テストにどのように影響するかを概説する。機械心理学研究の方法論的基準を定義しており、特にプロンプトデザインのポリシーに焦点を当てている。さらに、LLMで発見された行動パターンがどのように解釈されるかを記述する。要約すると、機械心理学は従来の自然言語処理ベンチマークでは検出できないLLMの創発的能力を発見することを目的としている。

Large language models (LLMs) are currently at the forefront of intertwining AI systems with human communication and everyday life. Due to rapid technological advances and their extreme versatility, LLMs nowadays have millions of users and are at the cusp of being the main go-to technology for information retrieval, content generation, problem-solving, etc. Therefore, it is of great importance to thoroughly assess and scrutinize their capabilities. Due to increasingly complex and novel behavioral patterns in current LLMs, this can be done by treating them as participants in psychology experiments that were originally designed to test humans. For this purpose, the paper introduces a new field of research called "machine psychology". The paper outlines how different subfields of psychology can inform behavioral tests for LLMs. It defines methodological standards for machine psychology research, especially by focusing on policies for prompt designs. Additionally, it describes how behavioral patterns discovered in LLMs are to be interpreted. In sum, machine psychology aims to discover emergent abilities in LLMs that cannot be detected by most traditional natural language processing benchmarks.

翻訳日:2023-03-27 14:29:24 公開日:2023-03-24

# 階層的模倣学習による都市走行の解釈可能な運動プランナ

Interpretable Motion Planner for Urban Driving via Hierarchical Imitation Learning ( http://arxiv.org/abs/2303.13986v1 )

ライセンス: Link先を確認

Bikun Wang, Zhipeng Wang, Chenhao Zhu, Zhiqiang Zhang, Zhichen Wang, Penghong Lin, Jingchu Liu and Qian Zhang

(参考訳) 学習ベースのアプローチは、自律運転において目覚ましいパフォーマンスを達成し、意思決定と計画モジュールでデータ駆動の作業が研究されている。しかし、ニューラルネットワークの信頼性と安定性は依然として課題に満ちている。本稿では,個々のデータ駆動型運転方針だけでなく,ルールベースのアーキテクチャにも容易に組み込むことが可能な,高レベルグリッドベースの行動プランナーと低レベル軌道プランナーを含む階層的模倣手法を提案する。本手法をクローズドループシミュレーションと実世界走行の両方で評価し,複雑な都市自律運転シナリオにおいて,ニューラルネットワークプランナが優れた性能を示した。

Learning-based approaches have achieved impressive performance for autonomous driving and an increasing number of data-driven works are being studied in the decision-making and planning module. However, the reliability and the stability of the neural network are still full of challenges. In this paper, we introduce a hierarchical imitation method including a high-level grid-based behavior planner and a low-level trajectory planner, which is not only an individual data-driven driving policy but can also be easily embedded into a rule-based architecture. We evaluate our method both in closed-loop simulation and real-world driving, and demonstrate that the neural network planner has outstanding performance in complex urban autonomous driving scenarios.

翻訳日:2023-03-27 14:29:08 公開日:2023-03-24

# 知識蒸留を用いた低メモリデバイス用混合型ウェハ分類

Mixed-Type Wafer Classification For Low Memory Devices Using Knowledge Distillation ( http://arxiv.org/abs/2303.13974v1 )

ライセンス: Link先を確認

Nitish Shukla, Anurima Dey, Srivatsan K

(参考訳) ウェハーの製造は何千ものステップを伴う複雑な作業です。ウェハマップの欠陥パターン認識(DPR)は生産欠陥の根本原因決定に不可欠であり、ウェハファウントリーの収量改善の洞察を与える可能性がある。製造中、様々な欠陥がウエハに単独で現れるか、異なる組み合わせとして現れる。ウエハ内の複数の欠陥を特定することは、単一の欠陥を特定するよりも一般的に難しい。近年,混合型DPRの深層学習手法が注目されている。しかし、欠陥の複雑さは複雑で大きなモデルを必要とするため、製造ラボで一般的に使用される低メモリの組み込みデバイスで運用するのが非常に困難である。もうひとつの一般的な問題は、複雑なネットワークをトレーニングするためのラベル付きデータの可用性の欠如である。本研究では,複雑な事前学習モデルの知識を軽量なデプロイメント対応モデルに割くための教師なしトレーニングルーチンを提案する。教師モデルよりも最大10倍小さくても, 精度を犠牲にすることなく, モデルを圧縮できることを実証的に示す。圧縮されたモデルは、現代の最先端モデルよりも優れている。

Manufacturing wafers is an intricate task involving thousands of steps. Defect Pattern Recognition (DPR) of wafer maps is crucial for determining the root cause of production defects, which may further provide insight for yield improvement in wafer foundry. During manufacturing, various defects may appear standalone in the wafer or may appear as different combinations. Identifying multiple defects in a wafer is generally harder compared to identifying a single defect. Recently, deep learning methods have gained significant traction in mixed-type DPR. However, the complexity of defects requires complex and large models making them very difficult to operate on low-memory embedded devices typically used in fabrication labs. Another common issue is the unavailability of labeled data to train complex networks. In this work, we propose an unsupervised training routine to distill the knowledge of complex pre-trained models to lightweight deployment-ready models. We empirically show that this type of training compresses the model without sacrificing accuracy despite being up to 10 times smaller than the teacher model. The compressed model also manages to outperform contemporary state-of-the-art models.
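
The unsupervised distillation idea above can be sketched in plain Python: a student is trained to match the teacher's softened output distribution on unlabeled wafer maps, so no defect labels are needed. The temperature value, the softmax formulation, and the function names are illustrative assumptions; the abstract does not specify the loss the authors use.

```python
import math


def softmax(logits: list, temperature: float = 1.0) -> list:
    """Numerically stable softmax with a temperature that softens the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits: list, student_logits: list, temperature: float = 2.0) -> float:
    """KL divergence from the teacher's soft targets to the student's predictions.

    Only model outputs are compared, so the loss can be evaluated on unlabeled inputs.
    """
    teacher = softmax(teacher_logits, temperature)
    student = softmax(student_logits, temperature)
    return sum(p * math.log(p / q) for p, q in zip(teacher, student) if p > 0)
```

The loss is zero when the student reproduces the teacher's distribution and grows as the two diverge, which gives the smaller model a training signal without ground-truth labels.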

翻訳日:2023-03-27 14:28:57 公開日:2023-03-24

# 深層学習におけるエネルギー効率の高い実践の解明--グリーンaiに向けた予備的ステップ

Uncovering Energy-Efficient Practices in Deep Learning Training: Preliminary Steps Towards Green AI ( http://arxiv.org/abs/2303.13972v1 )

ライセンス: Link先を確認

Tim Yarally, Luís Cruz, Daniel Feitosa, June Sallou, Arie van Deursen

(参考訳) 現代のAIプラクティスは、すべて同じ目標(より良い結果)に向かっています。ディープラーニングの文脈では、"results"という用語は、しばしば競合問題集合における達成された正確さを指す。本稿では,グリーンAIの新興分野からのアイデアを,エネルギー消費を精度に等しい重要性の指標として捉え,無関係なタスクやエネルギー使用量を減らすために採用する。本研究では,パイプライン全体のエネルギー消費に大きな影響を与える2つの要因であるハイパーパラメータチューニング戦略とモデル複雑性の研究を通じて,持続可能性の観点からディープラーニングパイプラインのトレーニング段階について検討する。まず,ハイパーパラメータチューニングにおけるグリッド探索,ランダム探索,ベイズ最適化の有効性について検討し,ベイズ最適化が他の戦略を大きく支配することを示す。さらに,畳み込みニューラルネットワークのアーキテクチャを,畳み込み層,線形層,relu層という3つの著名な層をエネルギー消費として分析した。その結果,畳み込み層は高いマージンで計算コストが最も高いことがわかった。さらに,エネルギー空調モデルに対する精度の低下を観察する。トレーニングの全体的なエネルギー消費量は、ネットワークの複雑さを減らすことで半減できる。結論として,ディープラーニングモデルのトレーニングにおける革新的かつ有望なエネルギ効率のプラクティスを強調する。グリーンAIの適用を拡大するために,エネルギー効率と精度のトレードオフを考慮し,ディープラーニングモデルの設計の転換を提唱する。

Modern AI practices all strive towards the same goal: better results. In the context of deep learning, the term "results" often refers to the achieved accuracy on a competitive problem set. In this paper, we adopt an idea from the emerging field of Green AI to consider energy consumption as a metric of equal importance to accuracy and to reduce any irrelevant tasks or energy usage. We examine the training stage of the deep learning pipeline from a sustainability perspective, through the study of hyperparameter tuning strategies and the model complexity, two factors vastly impacting the overall pipeline's energy consumption. First, we investigate the effectiveness of grid search, random search and Bayesian optimisation during hyperparameter tuning, and we find that Bayesian optimisation significantly dominates the other strategies. Furthermore, we analyse the architecture of convolutional neural networks with the energy consumption of three prominent layer types: convolutional, linear and ReLU layers. The results show that convolutional layers are the most computationally expensive by a strong margin. Additionally, we observe diminishing returns in accuracy for more energy-hungry models. The overall energy consumption of training can be halved by reducing the network complexity. In conclusion, we highlight innovative and promising energy-efficient practices for training deep learning models. To expand the application of Green AI, we advocate for a shift in the design of deep learning models, by considering the trade-off between energy efficiency and accuracy.

翻訳日:2023-03-27 14:28:39 公開日:2023-03-24

# スターク効果下における量子相関の保存と強化

Preservation and enhancement of quantum correlations under Stark effect ( http://arxiv.org/abs/2303.14030v1 )

ライセンス: Link先を確認

Nitish Kumar Chandra, Rajiuddin Sk and Prasanta K. Panigrahi

(参考訳) 本研究では,バーズ距離エンタングルメント,トレース距離不協和,および2つの二層原子の局所量子不確かさを正確に表現することにより,量子相関のダイナミクスを解析する。ここでは、2光子遷移が中間的な仮想状態を通して媒介され、各原子はスタークシフト効果の存在下で0温度で散逸性貯水池に分離結合される。我々は,この原子系の動力学を環境の2つの異なる初期条件について検討した。第一のケースでは、環境の状態が基底状態であると仮定し、他方のケースでは、状態が第一励起状態であると仮定した。第2の初期条件は、第1の初期条件におけるスタークシフトパラメータの1つとは対照的に、両方のスタークシフトパラメータが果たす役割を示すため重要である。その結果, マルコフ貯水池と非マルコフ貯水池のいずれにおいても, スタークシフト効果の存在下では, 量子相関は長期間持続できることがわかった。非マルコフ貯水池における効果は、スタークシフトパラメータの非常に小さな値であっても、マルコフ貯水池よりも顕著である。相関測度のうち、局所的な量子不確実性のみが突然の変化現象、すなわち相関測度の崩壊速度の急激な変化を伴うことが観察された。量子相関の保存は,量子情報処理における最適性能を達成する上で不可欠である。

We analyze the dynamics of quantum correlations by obtaining the exact expression of Bures distance entanglement, trace distance discord, and local quantum uncertainty of two two-level atoms. Here, the atoms undergo two-photon transitions mediated through an intermediate virtual state where each atom is separately coupled to a dissipative reservoir at zero temperature in the presence of the Stark shift effect. We have investigated the dynamics of this atomic system for two different initial conditions of the environment. In the first case, we have assumed the environment's state to be in ground state and in the other case, we have assumed the state to be in first excited state. The second initial condition is significant as it shows the role played by both the Stark shift parameters in contrast to only one of the Stark shift parameters for the first initial condition. Our results demonstrate that quantum correlations can be sustained for an extended period in the presence of Stark shift effect in the case of both Markovian and non-Markovian reservoirs. The effect in the non-Markovian reservoir is more prominent than the Markovian reservoir, even for a very small value of the Stark shift parameter. We observe that among the correlation measures considered, only local quantum uncertainty is accompanied by a sudden change phenomenon, i.e., an abrupt change in the decay rate of a correlation measure. Our findings are significant as preserving quantum correlations is one of the essential aspects in attaining optimum performance in quantum information tasks.

翻訳日:2023-03-27 14:22:22 公開日:2023-03-24

# PENTACET データ -- 2300万のコンテキストコードコメントと50万のSATDコメント

PENTACET data -- 23 Million Contextual Code Comments and 500,000 SATD comments ( http://arxiv.org/abs/2303.14029v1 )

ライセンス: Link先を確認

Murali Sridharan, Leevi Rantala, Mika Mäntylä

(参考訳) ほとんどの自己承認技術的負債(SATD)研究は、SATD検出に「TODO」や「FIXME」のような明示的なSATD特徴を利用している。より詳しく見てみると、いくつかのSATD研究は、文脈データ(先行および後続のソースコードコンテキスト)なしで、単純なSATD('Easy to Find')コードコメントを使用している。この作業はPENTACET(または5Cデータセット)データを通じてこのギャップに対処する。 PENTACETは、コントリビュータ毎のCurated Contextual Code Commentsと、最も広範なSATDデータである。 9,096のオープンソースソフトウェアJavaプロジェクトと合計4億3500万LOCをマイニングした。結果は、2300万のコードコメント、各コメントの先行および後続のソースコードコンテキスト、そして"Easy to Find"と"Hard to Find"のSATDを含む500,000以上のSATDラベル付きコメントからなるデータセットである。我々は、PENTACETデータが人工知能技術を用いたSATDの研究をさらに進めると考えている。

Most Self-Admitted Technical Debt (SATD) research utilizes explicit SATD features such as 'TODO' and 'FIXME' for SATD detection. A closer look reveals that several SATD studies use simple SATD ('Easy to Find') code comments without the contextual data (preceding and succeeding source code context). This work addresses this gap through PENTACET (or 5C dataset) data. PENTACET is a large Curated Contextual Code Comments per Contributor and the most extensive SATD data. We mine 9,096 Open Source Software Java projects with a total of 435 million LOC. The outcome is a dataset with 23 million code comments, preceding and succeeding source code context for each comment, and more than 500,000 comments labeled as SATD, including both 'Easy to Find' and 'Hard to Find' SATD. We believe PENTACET data will further SATD research using Artificial Intelligence techniques.
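
To make the distinction in the abstract above concrete, the sketch below flags 'Easy to Find' SATD comments by the explicit markers the abstract names ('TODO' and 'FIXME') and collects the source lines before and after a comment as context. The two-line context window and the function names are assumptions for illustration; this is not the PENTACET mining pipeline.

```python
import re

# Explicit markers named in the abstract; a real detector would likely use more.
EXPLICIT_SATD = re.compile(r"\b(TODO|FIXME)\b")


def is_easy_to_find_satd(comment: str) -> bool:
    """True when the comment carries an explicit SATD marker."""
    return bool(EXPLICIT_SATD.search(comment))


def comment_with_context(lines: list, index: int, window: int = 2) -> dict:
    """Return the comment at `index` with up to `window` lines on either side."""
    return {
        "preceding": lines[max(0, index - window):index],
        "comment": lines[index],
        "succeeding": lines[index + 1:index + 1 + window],
    }
```

A comment such as `// this is a temporary workaround` carries no explicit marker, so a keyword check misses it; that is the 'Hard to Find' case where the surrounding code context becomes useful.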

翻訳日:2023-03-27 14:22:00 公開日:2023-03-24

# Poincaré ResNet

Poincaré ResNet ( http://arxiv.org/abs/2303.14027v1 )

ライセンス: Link先を確認

Max van Spengler, Erwin Berkhout, Pascal Mettes

(参考訳) 本稿では,双曲空間のPoincaré球モデルで完全に動作するエンドツーエンド残差ネットワークを提案する。双曲学習は近年、視覚的理解に大きな可能性を示しているが、現在はディープネットワークの最後から2番目の層でのみ実施されている。すべての視覚的表現は、依然として標準ユークリッドネットワークを通じて学習される。本稿では,視覚データの双曲表現をピクセルレベルから直接学習する方法を検討する。我々は,Poincaré 2D畳み込みから,Poincaré残差接続まで,有名な残差ネットワークの双曲版であるPoincaré ResNetを提案する。畳み込みネットワークを完全に双曲空間で訓練するための3つの障害を特定し,それぞれに解を提案する。 (i)現在の双曲的ネットワークの初期化は原点に崩壊し、より深いネットワークでの適用性が制限される。多くの層にまたがってノルムを保存するアイデンティティベースの初期化を提供する。 (ii)残差ネットワークはバッチ正規化に大きく依存するが、双曲空間ではこれに高価なFréchet平均計算が伴う。 Poincaré中間点バッチ正規化を高速かつ同等に有効な代替として導入する。 (iii) Poincaré層における多くの中間処理により,ディープラーニングライブラリの計算グラフが肥大化し,深層双曲ネットワークのトレーニング能力が制限されることがわかった。我々は、管理可能な計算グラフを維持するために、コア双曲演算の逆伝播を手動で導出する。

This paper introduces an end-to-end residual network that operates entirely on the Poincaré ball model of hyperbolic space. Hyperbolic learning has recently shown great potential for visual understanding, but is currently only performed in the penultimate layer(s) of deep networks. All visual representations are still learned through standard Euclidean networks. In this paper we investigate how to learn hyperbolic representations of visual data directly from the pixel-level. We propose Poincaré ResNet, a hyperbolic counterpart of the celebrated residual network, starting from Poincaré 2D convolutions up to Poincaré residual connections. We identify three roadblocks for training convolutional networks entirely in hyperbolic space and propose a solution for each: (i) Current hyperbolic network initializations collapse to the origin, limiting their applicability in deeper networks. We provide an identity-based initialization that preserves norms over many layers. (ii) Residual networks rely heavily on batch normalization, which comes with expensive Fréchet mean calculations in hyperbolic space. We introduce Poincaré midpoint batch normalization as a faster and equally effective alternative. (iii) Due to the many intermediate operations in Poincaré layers, we lastly find that the computation graphs of deep learning libraries blow up, limiting our ability to train on deep hyperbolic networks. We provide manual backward derivations of core hyperbolic operations to maintain manageable computation graphs.
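
For readers unfamiliar with the Poincaré ball, the sketch below shows two standard operations of that model with curvature -1: Möbius addition and the geodesic distance built from it. It only illustrates the geometry the abstract refers to. The helper names are chosen for this example, and the abstract does not state which exact formulas the paper's layers use, so this should not be read as the authors' implementation.

```python
import math

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def mobius_add(x, y):
    """Moebius addition on the Poincare ball (curvature -1).

    x (+) y = ((1 + 2<x,y> + |y|^2) x + (1 - |x|^2) y) / (1 + 2<x,y> + |x|^2 |y|^2)
    """
    xy, xx, yy = dot(x, y), dot(x, x), dot(y, y)
    denom = 1 + 2 * xy + xx * yy
    a = (1 + 2 * xy + yy) / denom
    b = (1 - xx) / denom
    return [a * xi + b * yi for xi, yi in zip(x, y)]

def poincare_dist(x, y):
    """Geodesic distance d(x, y) = 2 * artanh(|(-x) (+) y|)."""
    diff = mobius_add([-xi for xi in x], y)
    return 2 * math.atanh(math.sqrt(dot(diff, diff)))

x = [0.3, 0.1]
y = [-0.2, 0.5]
z = mobius_add(x, y)  # stays inside the unit ball
```

Each call chains several elementary operations (dot products, divisions, `atanh`), which gives a feel for why many stacked hyperbolic layers can produce the large computation graphs mentioned in point (iii).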

翻訳日:2023-03-27 14:21:44 公開日:2023-03-24

# CF-Font:Few-shot Font生成のためのコンテンツ融合

CF-Font: Content Fusion for Few-shot Font Generation ( http://arxiv.org/abs/2303.14017v1 )

ライセンス: Link先を確認

Chi Wang, Min Zhou, Tiezheng Ge, Yuning Jiang, Hujun Bao, Weiwei Xu

(参考訳) コンテンツとスタイルの分離は、few-shotフォント生成を実現する効果的な方法である。これにより、ソースドメインのフォント画像のスタイルを、ターゲットドメインの少数の参照画像で定義されたスタイルに変換できる。しかし、代表フォントを用いて抽出されたコンテンツ特徴は最適ではない可能性がある。そこで本研究では、基底フォントのコンテンツ特徴によって定義される線形空間にコンテンツ特徴を射影するコンテンツ融合モジュール(CFM)を提案する。これにより、フォントの違いに起因するコンテンツ特徴の変動を考慮できる。また、軽量な反復的スタイルベクトル改良(ISR)戦略により、参照画像のスタイル表現ベクトルを最適化できる。さらに、文字画像の1次元射影を確率分布として扱い、2つの分布間の距離を再構成損失(すなわち射影文字損失、PCL)として利用する。L2またはL1再構成損失と比較して、分布間距離は文字の全体的な形状により多くの注意を払う。我々は、各6.5k文字からなる300フォントのデータセットで本手法を評価した。実験結果から、本手法が既存の最先端のfew-shotフォント生成手法を大差で上回ることが確認された。ソースコードは https://github.com/wangchi95/CF-Font にある。

Content and style disentanglement is an effective way to achieve few-shot font generation. It allows to transfer the style of the font image in a source domain to the style defined with a few reference images in a target domain. However, the content feature extracted using a representative font might not be optimal. In light of this, we propose a content fusion module (CFM) to project the content feature into a linear space defined by the content features of basis fonts, which can take the variation of content features caused by different fonts into consideration. Our method also allows to optimize the style representation vector of reference images through a lightweight iterative style-vector refinement (ISR) strategy. Moreover, we treat the 1D projection of a character image as a probability distribution and leverage the distance between two distributions as the reconstruction loss (namely projected character loss, PCL). Compared to L2 or L1 reconstruction loss, the distribution distance pays more attention to the global shape of characters. We have evaluated our method on a dataset of 300 fonts with 6.5k characters each. Experimental results verify that our method outperforms existing state-of-the-art few-shot font generation methods by a large margin. The source code can be found at https://github.com/wangchi95/CF-Font.
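
The idea behind the projected character loss can be sketched in a few lines of Python: project a character image onto each axis, normalise the projection into a probability distribution, and compare two such distributions with a distribution distance. The abstract does not say which distance is used, so the 1D Wasserstein-1 distance below is an assumption, as are the function names and the toy 3x3 images (which must be non-blank so the normalisation is defined).

```python
def project(image, axis):
    """Sum pixel intensities along one axis and normalise to a distribution.

    axis=0 collapses rows (one value per column); axis=1 collapses columns.
    """
    if axis == 0:
        sums = [sum(col) for col in zip(*image)]
    else:
        sums = [sum(row) for row in image]
    total = sum(sums)
    return [s / total for s in sums]

def wasserstein_1d(p, q):
    """W1 distance between two histograms on the same unit-spaced grid,
    computed as the accumulated absolute difference of their CDFs."""
    dist, cdf_gap = 0.0, 0.0
    for pi, qi in zip(p, q):
        cdf_gap += pi - qi
        dist += abs(cdf_gap)
    return dist

def projected_loss(img_a, img_b):
    """Sum of the distribution distances over both projection axes."""
    return sum(
        wasserstein_1d(project(img_a, ax), project(img_b, ax)) for ax in (0, 1)
    )

# A vertical stroke on the left edge versus one on the right edge.
bar_left = [[1, 0, 0], [1, 0, 0], [1, 0, 0]]
bar_right = [[0, 0, 1], [0, 0, 1], [0, 0, 1]]
loss = projected_loss(bar_left, bar_right)  # 2.0: the stroke moved two columns
```

A pixel-wise L1 loss would give these two images the same penalty no matter how far the stroke moved, whereas the distribution distance grows with the displacement, which matches the abstract's remark about attending to global shape.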

翻訳日:2023-03-27 14:21:19 公開日:2023-03-24

# SPEC:低リソース抽象型要約のための要約選好分解

SPEC: Summary Preference Decomposition for Low-Resource Abstractive Summarization ( http://arxiv.org/abs/2303.14011v1 )

ライセンス: Link先を確認

Yi-Syuan Chen, Yun-Zhu Song, Hong-Han Shuai

(参考訳) ニューラル抽象型要約は広く研究され、大規模コーパスで大きな成功を収めてきた。しかし、データアノテーションのコストが大きいことから、低リソース環境下での学習戦略が必要とされている。本稿では、ごく少数の例だけで要約器を学習する際の問題点を調査し、それぞれに対応する改善手法を提案する。第一に、典型的な転移学習手法は、プリテキストタスクにおけるデータ特性や学習目標の影響を受けやすい。そこで、事前学習済み言語モデルに基づき、few-shot学習のプロセスをソースコーパスからターゲットコーパスへ転移するメタ学習フレームワークを提案する。第二に、従来の手法は、内容と選好を分解せずに訓練例から学習する。そのため、生成される要約は、特に低リソース設定において、訓練セットの選好バイアスに制約される可能性がある。そこで本研究では、パラメータ変調を通じて学習中に内容と選好を分解し、推論時に選好を制御できるようにする。第三に、対象のアプリケーションが与えられても、観察から選好を導き出すことが難しい場合があるため、必要な選好を指定することは自明ではない。そこで本研究では、適切な選好を自動的に推定し、少数の訓練例から対応する要約候補を生成する新しい復号法を提案する。広範な実験により、提案手法は6種類の多様なコーパスで最先端の性能を達成し、10例および100例の設定におけるROUGE-1/2/Lの平均改善率はそれぞれ30.11%/33.95%/27.51%、26.74%/31.14%/24.48%であった。

Neural abstractive summarization has been widely studied and achieved great success with large-scale corpora. However, the considerable cost of annotating data motivates the need for learning strategies under low-resource settings. In this paper, we investigate the problems of learning summarizers with only few examples and propose corresponding methods for improvements. First, typical transfer learning methods are prone to be affected by data properties and learning objectives in the pretext tasks. Therefore, based on pretrained language models, we further present a meta learning framework to transfer few-shot learning processes from source corpora to the target corpus. Second, previous methods learn from training examples without decomposing the content and preference. The generated summaries could therefore be constrained by the preference bias in the training set, especially under low-resource settings. As such, we propose decomposing the contents and preferences during learning through the parameter modulation, which enables control over preferences during inference. Third, given a target application, specifying required preferences could be non-trivial because the preferences may be difficult to derive through observations. Therefore, we propose a novel decoding method to automatically estimate suitable preferences and generate corresponding summary candidates from the few training examples. Extensive experiments demonstrate that our methods achieve state-of-the-art performance on six diverse corpora with 30.11%/33.95%/27.51% and 26.74%/31.14%/24.48% average improvements on ROUGE-1/2/L under 10- and 100-example settings.

翻訳日:2023-03-27 14:21:04 公開日:2023-03-24

# 低温ハイブリッド無線/量子コヒーレントネットワーク・イン・パッケージによるスケーラブルマルチチップ量子アーキテクチャ

Scalable multi-chip quantum architectures enabled by cryogenic hybrid wireless/quantum-coherent network-in-package ( http://arxiv.org/abs/2303.14008v1 )

ライセンス: Link先を確認

Eduard Alarcón, Sergi Abadal, Fabio Sebastiano, Massoud Babaie, Eduardo Charbon, Peter Haring Bolívar, Maurizio Palesi, Elena Blokhina, Dirk Leipold, Bogdan Staszewski, Artur Garcia-Sáez, Carmen G. Almudever

(参考訳) 量子コンピュータのスケールアップという大きな課題は、フルスタックアーキテクチャの観点を必要とする。本稿では,分散量子コア(Qcore)を量子コヒーレントな量子ビット状態伝達リンクで相互接続し,統合された無線接続でオーケストレーションする,次世代のスケーラブル量子コンピューティングアーキテクチャの展望を示す。

The grand challenge of scaling up quantum computers requires a full-stack architectural standpoint. In this position paper, we will present the vision of a new generation of scalable quantum computing architectures featuring distributed quantum cores (Qcores) interconnected via quantum-coherent qubit state transfer links and orchestrated via an integrated wireless interconnect.

翻訳日:2023-03-27 14:20:42 公開日:2023-03-24

# 「チーム・イン・ザ・ループ」によるハイリスクAIの組織的監視

'Team-in-the-loop' organisational oversight of high-stakes AI ( http://arxiv.org/abs/2303.14007v1 )

ライセンス: Link先を確認

Deborah Morgan, Youmna Hashem, Vincent J. Straub, Jonathan Bright

(参考訳) 監視は、意思決定が個人および集団に重大な影響を及ぼしうるハイリスクな公共部門のAIアプリケーションにおいて不可欠であると、正しく認識されている。公共部門におけるAIの監視メカニズムの形態に関する現在の考え方の多くは、人間の意思決定者が「ループ内」にいて、エラーや潜在的な危害を防ぐために介入できるという考えを中心としている。しかし、ハイリスクな公共部門の多くの文脈では、意思決定の運用上の監視は個人ではなく専門家チームによって行われる。デプロイされたAIシステムを、こうした既存の運用チームの監視プロセスにどのように統合できるかは、まだあまり注目されていない。我々は、制度分析を通じて、臨床意思決定に対する既存の監視にAIが与える影響を探ることで、このギャップに対処する。既存の監視は専門職のトレーニング要件の中に組み込まれており、重要な情報を引き出すための説明と質問に大きく依存していることがわかった。専門職団体と法的責任のメカニズムも、監視のさらなる手段として機能する。これらの監視の側面は、AIシステムによって影響を受け、再構成される可能性がある。そこで我々は、ハイリスクな公共部門でのAI導入に必要なシステムレベルの分析を概念化するために、より広い「チーム・イン・ザ・ループ」という視点を提案する。

Oversight is rightly recognised as vital within high-stakes public sector AI applications, where decisions can have profound individual and collective impacts. Much current thinking regarding forms of oversight mechanisms for AI within the public sector revolves around the idea of human decision makers being 'in-the-loop' and thus being able to intervene to prevent errors and potential harm. However, in a number of high-stakes public sector contexts, operational oversight of decisions is made by expert teams rather than individuals. The ways in which deployed AI systems can be integrated into these existing operational team oversight processes has yet to attract much attention. We address this gap by exploring the impacts of AI upon pre-existing oversight of clinical decision-making through institutional analysis. We find that existing oversight is nested within professional training requirements and relies heavily upon explanation and questioning to elicit vital information. Professional bodies and liability mechanisms also act as additional levers of oversight. These dimensions of oversight are impacted, and potentially reconfigured, by AI systems. We therefore suggest a broader lens of 'team-in-the-loop' to conceptualise the system-level analysis required for adoption of AI within high-stakes public sector deployment.

翻訳日:2023-03-27 14:20:34 公開日:2023-03-24

# ASTRA-sim2.0:大規模モデル学習のための階層型ネットワークと分離型(disaggregated)システムのモデリング

ASTRA-sim2.0: Modeling Hierarchical Networks and Disaggregated Systems for Large-model Training at Scale ( http://arxiv.org/abs/2303.14006v1 )

ライセンス: Link先を確認

William Won, Taekyung Heo, Saeed Rashidi, Srinivas Sridharan, Sudarshan Srinivasan, Tushar Krishna

(参考訳) ディープラーニングモデルと入力データが前例のない速度でスケールしているため、モデルを適合させ、トレーニングスループットを向上させるために、分散トレーニングプラットフォームに移行することは避けられない。ウエハスケールノード、多次元ネットワークトポロジ、分散メモリシステム、並列化戦略といった最先端のアプローチと技術は、新興の分散トレーニングシステムに積極的に採用されている。これにより、分散トレーニングの複雑なSW/HW共同設計スタックが実現され、設計空間探索のためのモデリング/シミュレーションインフラストラクチャが必要とされる。本稿では,オープンソースのASTRA-simインフラストラクチャを拡張し,最先端の分散トレーニングモデルとプラットフォームをモデル化する機能を備える。具体的には i)ASTRA-simがグラフベースのトレーニングループ実装を通じて任意のモデル並列化戦略をサポートできるようにする。 (ii)対象システムを大規模にシミュレーション可能な解析性能推定を伴うパラメータ可能な多次元不均質トポロジー生成基盤を実装した。 (iii)ネットワーク内集団通信と分散メモリシステムの正確なモデリングを支援するために,メモリシステムのモデリングを強化する。このような機能により、新興の分散モデルとプラットフォームをターゲットにした包括的なケーススタディを実行します。このインフラストラクチャは、システム設計者が複雑な共同設計スタックを素早く横断し、分散トレーニングプラットフォームを大規模に設計およびデプロイする際に意味のある洞察を与える。

As deep learning models and input data are scaling at an unprecedented rate, it is inevitable to move towards distributed training platforms to fit the model and increase training throughput. State-of-the-art approaches and techniques, such as wafer-scale nodes, multi-dimensional network topologies, disaggregated memory systems, and parallelization strategies, have been actively adopted by emerging distributed training systems. This results in a complex SW/HW co-design stack of distributed training, necessitating a modeling/simulation infrastructure for design-space exploration. In this paper, we extend the open-source ASTRA-sim infrastructure and endow it with the capabilities to model state-of-the-art and emerging distributed training models and platforms. More specifically, (i) we enable ASTRA-sim to support arbitrary model parallelization strategies via a graph-based training-loop implementation, (ii) we implement a parameterizable multi-dimensional heterogeneous topology generation infrastructure with analytical performance estimates enabling simulating target systems at scale, and (iii) we enhance the memory system modeling to support accurate modeling of in-network collective communication and disaggregated memory systems. With such capabilities, we run comprehensive case studies targeting emerging distributed models and platforms. This infrastructure lets system designers swiftly traverse the complex co-design stack and give meaningful insights when designing and deploying distributed training platforms at scale.

翻訳日:2023-03-27 14:20:13 公開日:2023-03-24

# 人間とオブジェクトのインタラクション分類のためのカテゴリクエリ学習

Category Query Learning for Human-Object Interaction Classification ( http://arxiv.org/abs/2303.14005v1 )

ライセンス: Link先を確認

Chi Xie, Fangao Zeng, Yue Hu, Shuang Liang and Yichen Wei

(参考訳) より良い人間-物体特徴の学習に焦点を当てる従来のほとんどのHOI手法とは異なり、本稿ではカテゴリクエリ学習と呼ばれる新しい補完的なアプローチを提案する。このクエリはインタラクションカテゴリに明示的に関連付けられ、Transformerデコーダを介して画像固有のカテゴリ表現に変換され、補助的な画像レベルの分類タスクを通じて学習される。このアイデアは、以前のマルチラベル画像分類手法に着想を得ているが、困難な人間-物体インタラクション分類タスクに適用されるのは初めてである。本手法は単純かつ汎用的で効果的である。3つの代表的なHOIベースラインで検証され、2つのベンチマークで新たな最先端の結果を達成した。

Unlike most previous HOI methods that focus on learning better human-object features, we propose a novel and complementary approach called category query learning. Such queries are explicitly associated to interaction categories, converted to image specific category representation via a transformer decoder, and learnt via an auxiliary image-level classification task. This idea is motivated by an earlier multi-label image classification method, but is for the first time applied for the challenging human-object interaction classification task. Our method is simple, general and effective. It is validated on three representative HOI baselines and achieves new state-of-the-art results on two benchmarks.

翻訳日:2023-03-27 14:19:49 公開日:2023-03-24

# 顔モーフィング攻撃に対する脆弱性:そっくりな人物と一卵性双生児を事例として

Vulnerability of Face Morphing Attacks: A Case Study on Lookalike and Identical Twins ( http://arxiv.org/abs/2303.14004v1 )

ライセンス: Link先を確認

Raghavendra Ramachandra, Sushma Venkatesh, Gaurav Jaswal, Guoqiang Li

(参考訳) 顔モーフィング攻撃は、特に自動国境管理のシナリオにおいて潜在的な脅威として浮上している。モーフィング攻撃により、自動国境管理ゲートを通って国境を越えるための渡航文書を、複数の人物が使用できてしまう。モーフィング攻撃の成功可能性は、データ主体(共犯者と悪意ある行為者)の選択に依存する。本研究では、顔モーフィング画像生成の素材として、そっくりな人物(lookalike)と一卵性双生児を調査する。そっくりな人物および一卵性双生児のモーフィング画像に対する顔認識システム(FRS)の脆弱性をベンチマークする体系的な研究を示す。そのために、16組の一卵性双生児およびそっくりな人物のデータ主体を用いて、新しい顔モーフィングデータセットを構築した。そっくりな人物と一卵性双生児からのモーフィング画像は、ランドマークベースの手法で生成する。そっくりな人物と一卵性双生児の攻撃ポテンシャルをベンチマークするために、広範な実験を実施した。さらに、通常の顔モーフィングの場合と、そっくりな人物および一卵性双生児の顔モーフィングの場合とを比較し、脆弱性への影響に関する洞察が得られるように実験を設計した。

Face morphing attacks have emerged as a potential threat, particularly in automatic border control scenarios. Morphing attacks permit more than one individual to use travel documents that can be used to cross borders using automatic border control gates. The potential for morphing attacks depends on the selection of data subjects (accomplice and malicious actors). This work investigates lookalike and identical twins as the source of face morphing generation. We present a systematic study on benchmarking the vulnerability of Face Recognition Systems (FRS) to lookalike and identical twin morphing images. Therefore, we constructed new face morphing datasets using 16 pairs of identical twin and lookalike data subjects. Morphing images from lookalike and identical twins are generated using a landmark-based method. Extensive experiments are carried out to benchmark the attack potential of lookalike and identical twins. Furthermore, experiments are designed to provide insights into the impact of vulnerability with normal face morphing compared with lookalike and identical twin face morphing.

翻訳日:2023-03-27 14:19:39 公開日:2023-03-24

# Dancing the Quantum Waltz: 3量子ビットゲートを4準位アーキテクチャにコンパイルする

Dancing the Quantum Waltz: Compiling Three-Qubit Gates on Four Level Architectures ( http://arxiv.org/abs/2303.14069v1 )

ライセンス: Link先を確認

Andrew Litteken (1), Lennart Maximilian Siefert (1), Jason D. Chadwick (1), Natalia Nottingham (1), Tanay Roy (1 and 2), Ziqian Li (1 and 3), David Schuster (1 and 3), Frederic T. Chong (1), Jonathan M. Baker (4) ((1) University of Chicago, (2) Fermilab, (3) Stanford University, (4) Duke University)

(参考訳) 超伝導量子デバイスは量子計算の主要な技術であるが、いくつかの課題を抱えている。ゲートエラー、コヒーレンスエラー、接続性の不足はいずれも、結果の忠実度を低下させる。特に接続性の制約により、3量子ビットゲートを1量子ビットまたは2量子ビットのゲートに分解する必要があるゲートセットが強いられる。これにより、実行すべき2量子ビットゲートの数が大幅に増加する。しかし、多くの量子デバイスはより高いエネルギー準位にアクセスできる。$|0\rangle$と$|1\rangle$からなる量子ビットの抽象化を、コヒーレンス時間は短くなるものの$|2\rangle$と$|3\rangle$の状態にもアクセスできるququartへと拡張できる。これにより、2つの量子ビットを1つのququartにエンコードでき、物理ユニット間の仮想的な接続性が、2つの隣接する量子ビットから4つの完全に接続された量子ビットへと増加する。この接続方式により、2つの物理デバイス間で3量子ビットゲートをネイティブに、より効率的に実行できる。我々は、4準位デバイスにアクセスできる超伝導ベースのアーキテクチャへ3量子ビットゲートをコンパイルするために、最適制御によって合成したいくつかの3量子ビットゲートのdirect-to-pulse実装を示し、あわせて最適制御で設計した4準位ququartゲートの初めての実験的実証を行う。Toffoliゲートの実行に一時的に高準位状態を用いる戦略と、常に高準位状態を用いる戦略を示し、量子回路の忠実度を改善する。これらの手法により、期待される忠実度は、中間エンコーディングを用いた場合にさまざまな回路サイズにわたって2倍、完全にエンコードされたququartコンパイルでは3倍に向上する。

Superconducting quantum devices are a leading technology for quantum computation, but they suffer from several challenges. Gate errors, coherence errors and a lack of connectivity all contribute to low fidelity results. In particular, connectivity restrictions enforce a gate set that requires three-qubit gates to be decomposed into one- or two-qubit gates. This substantially increases the number of two-qubit gates that need to be executed. However, many quantum devices have access to higher energy levels. We can expand the qubit abstraction of $|0\rangle$ and $|1\rangle$ to a ququart which has access to the $|2\rangle$ and $|3\rangle$ state, but with shorter coherence times. This allows for two qubits to be encoded in one ququart, enabling increased virtual connectivity between physical units from two adjacent qubits to four fully connected qubits. This connectivity scheme allows us to more efficiently execute three-qubit gates natively between two physical devices. We present direct-to-pulse implementations of several three-qubit gates, synthesized via optimal control, for compilation of three-qubit gates onto a superconducting-based architecture with access to four-level devices with the first experimental demonstration of four-level ququart gates designed through optimal control. We demonstrate strategies that temporarily use higher level states to perform Toffoli gates and always use higher level states to improve fidelities for quantum circuits. We find that these methods improve expected fidelities with increases of 2x across circuit sizes using intermediate encoding, and increases of 3x for fully-encoded ququart compilation.
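
To make the "two qubits in one ququart" idea concrete, here is a small Python sketch at the level of basis-state labels. It assumes the natural encoding in which the pair of qubit values (q1, q0) is stored in ququart level 2*q1 + q0; the abstract does not specify the ordering, so that choice and the function names are illustrative only.

```python
def encode(q1, q0):
    """Map two qubit values to a ququart level: |q1 q0> -> |2*q1 + q0>."""
    return 2 * q1 + q0

def decode(level):
    """Inverse of encode: ququart level -> (q1, q0)."""
    return divmod(level, 2)

def internal_cnot(level):
    """CNOT with control q1 and target q0, acting inside a single ququart."""
    q1, q0 = decode(level)
    return encode(q1, q0 ^ q1)

# Action of the encoded CNOT on the four ququart levels |0>, |1>, |2>, |3>.
permutation = [internal_cnot(level) for level in range(4)]  # [0, 1, 3, 2]
```

Under this encoding, a CNOT between the two encoded qubits is just an exchange of levels |2> and |3> on one physical device, so it needs no two-device interaction. This is the sense in which encoding raises the virtual connectivity between physical units.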

翻訳日:2023-03-27 14:13:10 公開日:2023-03-24

# 自動識別システム(AIS)データを用いた船舶の航跡関連付けのためのCNN-LSTMアーキテクチャ

A CNN-LSTM Architecture for Marine Vessel Track Association Using Automatic Identification System (AIS) Data ( http://arxiv.org/abs/2303.14068v1 )

ライセンス: Link先を確認

Md Asif Bin Syed and Imtiaz Ahmed

(参考訳) 海上監視では、通常の船舶の動きと異常な動きのパターンを区別することが、潜在的な脅威をタイムリーに特定するために重要である。異常が検出された後は、必要な介入が行われるまでこれらの船舶を監視し追跡することが重要となる。これを実現するために、船舶の地質パラメータと運動パラメータからなる逐次観測を受け取り、それぞれの船舶に関連付ける航跡関連付け(track association)アルゴリズムが用いられる。これらの逐次観測に内在する空間的・時間的な変動は、従来のマルチオブジェクト追跡アルゴリズムにとって関連付けタスクを困難にする。さらに、重なり合う航跡や欠損データの存在が、軌跡追跡プロセスをいっそう複雑にする可能性がある。これらの課題に対処するため、本研究ではこの追跡タスクを多変量時系列問題として扱い、航跡関連付けのための1D CNN-LSTMアーキテクチャに基づくフレームワークを導入する。この特殊なニューラルネットワークアーキテクチャは、逐次観測の間に存在する空間パターンと長期的な時間的関係を捉えることができる。訓練の過程で、対象となる各船舶の軌跡を学習し、構築する。訓練後、提案フレームワークは、自動識別システム(AIS)によって収集された船舶の位置と運動のデータを入力として受け取り、最も可能性の高い船舶の航跡をリアルタイムで出力として返す。提案手法の性能を評価するため、特定の地理的領域を航行する327隻の船舶の観測を含むAISデータセットを用いる。提案フレームワークの性能は、正解率(accuracy)、適合率(precision)、再現率(recall)、F1スコアなどの標準的な性能指標を用いて測定する。他の競合するニューラルネットワークアーキテクチャと比較して、本アプローチは優れた追跡性能を示す。

In marine surveillance, distinguishing between normal and anomalous vessel movement patterns is critical for identifying potential threats in a timely manner. Once detected, it is important to monitor and track these vessels until a necessary intervention occurs. To achieve this, track association algorithms are used, which take sequential observations comprising geological and motion parameters of the vessels and associate them with respective vessels. The spatial and temporal variations inherent in these sequential observations make the association task challenging for traditional multi-object tracking algorithms. Additionally, the presence of overlapping tracks and missing data can further complicate the trajectory tracking process. To address these challenges, in this study, we approach this tracking task as a multivariate time series problem and introduce a 1D CNN-LSTM architecture-based framework for track association. This special neural network architecture can capture the spatial patterns as well as the long-term temporal relations that exist among the sequential observations. During the training process, it learns and builds the trajectory for each of these underlying vessels. Once trained, the proposed framework takes the marine vessel's location and motion data collected through the Automatic Identification System (AIS) as input and returns the most likely vessel track as output in real-time. To evaluate the performance of our approach, we utilize an AIS dataset containing observations from 327 vessels traveling in a specific geographic region. We measure the performance of our proposed framework using standard performance metrics such as accuracy, precision, recall, and F1 score. When compared with other competitive neural network architectures our approach demonstrates a superior tracking performance.

翻訳日:2023-03-27 14:12:38 公開日:2023-03-24

# SEAL: ロボット行動認識のための意味的フレーム実行と局所化

SEAL: Semantic Frame Execution And Localization for Perceiving Afforded Robot Actions ( http://arxiv.org/abs/2303.14067v1 )

ライセンス: Link先を確認

Cameron Kisailus, Daksh Narang, Matthew Shannon, Odest Chadwicke Jenkins

(参考訳) ロボットによる移動マニピュレーションの最近の進歩により、ロボットの動作環境は、制約された作業空間から大規模な人間の生活環境へと拡大しつつある。こうした空間でタスクを効果的に完了するには、ロボットは単純なピック・アンド・プレースをはるかに超えた多様なアフォーダンスを知覚し、推論し、実行できなければならない。我々は、セマンティックフレームの概念が、行動に焦点を当てた知覚、タスクレベルの推論、行動レベルの実行、および言語との統合に適した、説得力のあるロボット行動の表現を提供すると考える。セマンティックフレームは言語学コミュニティの成果であり、動詞句によって喚起される行動を成功裏に実行するために必要な要素、事前条件と事後条件、および一連のロボット動作を定義する。本研究では、ロボットのマニピュレーション動作に向けてセマンティックフレーム表現を拡張し、ロボットに与えられた(アフォードされた)行動を知覚するためのセマンティックフレームの実行と位置推定(SEAL)の問題をグラフィカルモデルとして導入する。SEAL問題に対して、ロボットにアフォードされる行動の位置として、有限個のセマンティックフレーム集合に対する信念を維持する、ノンパラメトリックなセマンティックフレームマッピング(SeFM)アルゴリズムを説明する。GPT-3のような言語モデルでは、SEALの定式化が対象とする一般化されたタスク実行に対処するには不十分であること、そしてSeFMが建物規模の環境で動作する際に必要となる効率的な探索戦略と長期記憶をロボットに提供することを示す。

Recent advances in robotic mobile manipulation have spurred the expansion of the operating environment for robots from constrained workspaces to large-scale, human environments. In order to effectively complete tasks in these spaces, robots must be able to perceive, reason, and execute over a diversity of affordances, well beyond simple pick-and-place. We posit the notion of semantic frames provides a compelling representation for robot actions that is amenable to action-focused perception, task-level reasoning, action-level execution, and integration with language. Semantic frames, a product of the linguistics community, define the necessary elements, pre- and post- conditions, and a set of sequential robot actions necessary to successfully execute an action evoked by a verb phrase. In this work, we extend the semantic frame representation for robot manipulation actions and introduce the problem of Semantic Frame Execution And Localization for Perceiving Afforded Robot Actions (SEAL) as a graphical model. For the SEAL problem, we describe our nonparametric Semantic Frame Mapping (SeFM) algorithm for maintaining belief over a finite set of semantic frames as the locations of actions afforded to the robot. We show that language models such as GPT-3 are insufficient to address generalized task execution covered by the SEAL formulation and SeFM provides robots with efficient search strategies and long term memory needed when operating in building-scale environments.

翻訳日:2023-03-27 14:12:11 公開日:2023-03-24

# 協調型マルチエージェントタスクにおける報酬マシンの学習

Learning Reward Machines in Cooperative Multi-Agent Tasks ( http://arxiv.org/abs/2303.14061v1 )

ライセンス: Link先を確認

Leo Ardon, Daniel Furelos-Blanco, Alessandra Russo

(参考訳) 本稿では,協調的なタスク分解と,サブタスクの構造を符号化した報酬機械(rms)の学習を組み合わせたマルチエージェント強化学習(marl)への新しいアプローチを提案する。提案手法は, 部分的に観察可能な環境における報酬の非マルコフ的性質に対処し, 協調作業の完了に必要な学習方針の解釈性を向上させる。各サブタスクに関連付けられたrmは分散的に学習され、各エージェントの振る舞いを導くのに使用される。これにより、協調的マルチエージェント問題の複雑さが減少し、より効果的な学習が可能となる。以上の結果から,本手法はMARL,特に大規模状態空間と複数エージェントを持つ複雑な環境での今後の研究の方向性として期待できると考えられる。

This paper presents a novel approach to Multi-Agent Reinforcement Learning (MARL) that combines cooperative task decomposition with the learning of reward machines (RMs) encoding the structure of the sub-tasks. The proposed method helps deal with the non-Markovian nature of the rewards in partially observable environments and improves the interpretability of the learnt policies required to complete the cooperative task. The RMs associated with each sub-task are learnt in a decentralised manner and then used to guide the behaviour of each agent. By doing so, the complexity of a cooperative multi-agent problem is reduced, allowing for more effective learning. The results suggest that our approach is a promising direction for future research in MARL, especially in complex environments with large state spaces and multiple agents.
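
A reward machine can be pictured as a small finite-state machine that consumes high-level events and emits rewards. The Python sketch below is a hand-written example of such a machine for one hypothetical sub-task (press a button, then open a door). The class, state names, and events are assumptions for illustration; in the paper the machines are learnt rather than specified by hand.

```python
class RewardMachine:
    def __init__(self, transitions, initial, terminal):
        # transitions: {(state, event): (next_state, reward)}
        self.transitions = transitions
        self.state = initial
        self.terminal = terminal

    def step(self, event):
        """Advance on an observed event; unknown events keep the state, reward 0."""
        next_state, reward = self.transitions.get(
            (self.state, event), (self.state, 0.0)
        )
        self.state = next_state
        return reward

    def done(self):
        return self.state in self.terminal

# Hypothetical sub-task: the door only pays off after the button was pressed.
rm = RewardMachine(
    transitions={
        ("u0", "button"): ("u1", 0.0),
        ("u1", "door"): ("u_acc", 1.0),
    },
    initial="u0",
    terminal={"u_acc"},
)
rewards = [rm.step(e) for e in ["door", "button", "door"]]  # [0.0, 0.0, 1.0]
```

The same event ("door") yields different rewards depending on what happened before. That history dependence is the non-Markovian reward structure the abstract mentions, and the machine's state is what makes it explicit and inspectable.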

翻訳日:2023-03-27 14:11:46 公開日:2023-03-24

# 量子ビットのキラル基底と可積分スピン鎖への応用

Chiral bases for qubits and their applications to integrable spin chains ( http://arxiv.org/abs/2303.14056v1 )

ライセンス: Link先を確認

Vladislav Popkov, Xin Zhang and Andreas Klümper

(参考訳) 我々は、キンクを持つ横スピンヘリックスからなる新しい量子ビット基底を提案する。このキラル基底は、通常の計算基底とは対照的に、異なるトポロジカルな性質を持ち、非自明なトポロジーを持つ量子状態の記述に便利である。適切なパラメータを選ぶことで、$\sigma_n^x$や$\sigma_n^y$のような横スピン成分を含む演算子がキラル基底で対角になり、横スピン成分に着目した問題の研究が容易になる。応用として、近年の冷却原子実験で研究されている$XX$モデルにおけるスピンヘリックスの横スピンダイナミクスを記述する。横磁化の時間依存性について、行列式型の公式を導出する。

We propose a novel qubit basis composed of transverse spin helices with kinks. This chiral basis, in contrast to the usual computational basis, possesses distinct topological properties and is convenient for describing quantum states with nontrivial topology. By choosing appropriate parameters, operators containing transverse spin components, such as $\sigma_n^x$ or $\sigma_n^y$, become diagonal in the chiral basis, facilitating the study of problems focused on transverse spin components. As an application, we describe the transverse spin dynamics of a spin helix in the $XX$ model, which has been studied in recent cold atom experiments. We derive a determinantal formula for the temporal dependence of the transverse magnetization.

翻訳日:2023-03-27 14:11:35 公開日:2023-03-24

# ロボット支援療法における複雑な意思決定の伝達

Communicating Complex Decisions in Robot-Assisted Therapy ( http://arxiv.org/abs/2303.14054v1 )

ライセンス: Link先を確認

Carl Bettosi, Kefan Chen, Ryan Shah, Lynne Baillie

(参考訳) 社会支援ロボット(SAR)は、意思決定インストラクターやモチベーション・コンパニオンとして治療シナリオにおいて有望な可能性を示してきた。人間と人間のセラピーでは、専門家は透明性を促進し信頼を構築するために意思決定の背後にある思考プロセスを伝える。研究は、より複雑な意思決定モデルをこれらのロボットに組み込むことを目指しており、sarがその決定を説明する能力はますます困難になっている。複雑なSAR意思決定者の最新の例を示す。人間の治療における透過的なコミュニケーションの重要性から、SARはそのようなコンポーネントを設計に組み込むべきであると論じる。この話題に関する議論を刺激するために,研究者に一連の設計考察を提案する。

Socially Assistive Robots (SARs) have shown promising potential in therapeutic scenarios as decision-making instructors or motivational companions. In human-human therapy, experts often communicate the thought process behind the decisions they make to promote transparency and build trust. As research aims to incorporate more complex decision-making models into these robots to drive better interaction, the ability for the SAR to explain its decisions becomes an increasing challenge. We present the latest examples of complex SAR decision-makers. We argue that, based on the importance of transparent communication in human-human therapy, SARs should incorporate such components into their design. To stimulate discussion around this topic, we present a set of design considerations for researchers.

翻訳日:2023-03-27 14:11:20 公開日:2023-03-24

# MusicFace:音楽駆動型の表現豊かな歌唱顔合成

MusicFace: Music-driven Expressive Singing Face Synthesis ( http://arxiv.org/abs/2303.14044v1 )

ライセンス: Link先を確認

Pengfei Liu, Wenjin Deng, Hengda Li, Jintai Wang, Yinglin Zheng, Yiwei Ding, Xiaohu Guo, and Ming Zeng

(参考訳) 音楽信号によって駆動される、生き生きとしてリアルな歌唱顔を合成することは、いまだに興味深く難しい問題である。本稿では、唇、表情、頭部姿勢、眼の状態の自然な動きを伴う、このタスクのための手法を提案する。一般的な音楽音声信号では人間の声と背景音楽の情報が混ざり合って結合しているため、この課題に取り組むための分離・融合(decouple-and-fuse)戦略を設計する。まず、入力された音楽音声を人間の声のストリームと背景音楽のストリームに分解する。2つのストリームの入力信号と、表情・頭部の動き・眼の状態のダイナミクスとの間には暗黙的で複雑な相関があるため、それらの関係をアテンション機構でモデル化し、2つのストリームの効果をシームレスに融合する。さらに、生成結果の表現力を高めるために、頭部運動の生成を速度の生成と方向の生成に分解し、眼の状態の生成を短時間のまばたきの生成と長時間の閉眼の生成に分解して、それぞれ別々にモデル化することを提案する。また、このタスクの訓練と評価を支援し、今後の研究を促進するために、新たにSingingFace Datasetを構築する。広範な実験とユーザスタディにより、提案手法は生き生きとした歌唱顔を合成でき、定性的にも定量的にも最先端の手法より優れていることが示された。

It is still an interesting and challenging problem to synthesize a vivid and realistic singing face driven by music signal. In this paper, we present a method for this task with natural motions of the lip, facial expression, head pose, and eye states. Due to the coupling of the mixed information of human voice and background music in common signals of music audio, we design a decouple-and-fuse strategy to tackle the challenge. We first decompose the input music audio into human voice stream and background music stream. Due to the implicit and complicated correlation between the two-stream input signals and the dynamics of the facial expressions, head motions and eye states, we model their relationship with an attention scheme, where the effects of the two streams are fused seamlessly. Furthermore, to improve the expressiveness of the generated results, we propose to decompose head movements generation into speed generation and direction generation, and decompose eye states generation into the short-time eye blinking generation and the long-time eye closing generation to model them separately. We also build a novel SingingFace Dataset to support the training and evaluation of this task, and to facilitate future works on this topic. Extensive experiments and user study show that our proposed method is capable of synthesizing vivid singing face, which is better than state-of-the-art methods qualitatively and quantitatively.

翻訳日:2023-03-27 14:11:09 公開日:2023-03-24

# クラスインクリメンタル学習のためのクラスインクリメンタルなエグゼンプラー圧縮

Class-Incremental Exemplar Compression for Class-Incremental Learning ( http://arxiv.org/abs/2303.14042v1 )

ライセンス: Link先を確認

Zilin Luo, Yaoyao Liu, Bernt Schiele, Qianru Sun

(参考訳) エグゼンプラーベースのクラスインクリメンタル学習(CIL)では、各インクリメンタルフェーズにおいて、新しいクラスのすべてのサンプルと、古いクラスのfew-shotのエグゼンプラーを用いてモデルをファインチューニングする。ここで「few-shot」は、限られたメモリ予算に従うことを意味する。本稿では、非識別的な画素をダウンサンプリングしてエグゼンプラーを圧縮し、「many-shot」の圧縮エグゼンプラーをメモリに保存するという、単純だが驚くほど効果的なアイデアに基づいて、この「few-shot」の制限を打破する。手動のアノテーションを必要とせず、クラスアクティベーションマップ(CAM)から識別的な画素に対する0-1マスクを生成することで、この圧縮を実現する。CAMを用いる際の2つの難点を明示的に解決するために、クラスインクリメンタルマスキング(CIM)と呼ぶ適応的なマスク生成モデルを提案する。 1) CAMのヒートマップを任意の閾値で0-1マスクに変換すると、総メモリが固定されているため、識別的な画素のカバレッジとエグゼンプラーの数との間にトレードオフが生じる。 2) 最適な閾値はオブジェクトクラスごとに異なり、これはCILの動的な環境において特に顕著である。我々は、バイレベル最適化問題を通じて、CIMモデルと従来のCILモデルを交互に最適化する。Food-101、ImageNet-100、ImageNet-1000などの高解像度CILベンチマークで広範な実験を行い、CIMによる圧縮エグゼンプラーを用いることで、新たな最先端のCIL精度、例えば10フェーズのImageNet-1000でFOSTERより4.8パーセントポイント高い精度を達成できることを示す。コードは https://github.com/xfflzl/CIM-CIL で利用可能である。

Exemplar-based class-incremental learning (CIL) finetunes the model with all samples of new classes but few-shot exemplars of old classes in each incremental phase, where the "few-shot" abides by the limited memory budget. In this paper, we break this "few-shot" limit based on a simple yet surprisingly effective idea: compressing exemplars by downsampling non-discriminative pixels and saving "many-shot" compressed exemplars in the memory. Without needing any manual annotation, we achieve this compression by generating 0-1 masks on discriminative pixels from class activation maps (CAM). We propose an adaptive mask generation model called class-incremental masking (CIM) to explicitly resolve two difficulties of using CAM: 1) transforming the heatmaps of CAM to 0-1 masks with an arbitrary threshold leads to a trade-off between the coverage on discriminative pixels and the quantity of exemplars, as the total memory is fixed; and 2) optimal thresholds vary for different object classes, which is particularly obvious in the dynamic environment of CIL. We optimize the CIM model alternatively with the conventional CIL model through a bilevel optimization problem. We conduct extensive experiments on high-resolution CIL benchmarks including Food-101, ImageNet-100, and ImageNet-1000, and show that using the compressed exemplars by CIM can achieve a new state-of-the-art CIL accuracy, e.g., 4.8 percentage points higher than FOSTER on 10-Phase ImageNet-1000. Our code is available at https://github.com/xfflzl/CIM-CIL.
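
The trade-off in difficulty 1) can be seen with a toy calculation. The Python sketch below thresholds a small activation map into a 0-1 mask and then estimates how many compressed exemplars fit into a fixed memory budget. The cost model (masked pixels stored at full resolution, all others downsampled by a factor of 4 per axis), the 4x4 heatmap, and the budget value are assumptions made for this example, not figures from the paper.

```python
def cam_to_mask(heatmap, threshold):
    """Binarise a class activation map: 1 marks a discriminative pixel."""
    return [[1 if v >= threshold else 0 for v in row] for row in heatmap]

def exemplar_cost(mask, downsample=4):
    """Hypothetical storage cost in pixel units: masked pixels are kept at
    full resolution, the rest are downsampled by `downsample` per axis."""
    kept = sum(sum(row) for row in mask)
    rest = sum(len(row) for row in mask) - kept
    return kept + rest / (downsample ** 2)

def exemplars_in_budget(heatmap, threshold, budget):
    """Number of exemplars with this mask that fit into the memory budget."""
    return int(budget // exemplar_cost(cam_to_mask(heatmap, threshold)))

heatmap = [
    [0.1, 0.2, 0.1, 0.0],
    [0.2, 0.9, 0.8, 0.1],
    [0.1, 0.7, 0.6, 0.1],
    [0.0, 0.1, 0.1, 0.0],
]
budget = 64  # memory budget in the same hypothetical pixel units
loose = exemplars_in_budget(heatmap, 0.5, budget)   # 13: more coverage, fewer exemplars
tight = exemplars_in_budget(heatmap, 0.85, budget)  # 33: less coverage, more exemplars
```

A lower threshold keeps more discriminative pixels per image but leaves room for fewer exemplars, and the best operating point may differ per class. This is the motivation the abstract gives for learning the mask generation adaptively instead of fixing one threshold.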

翻訳日:2023-03-27 14:10:46 公開日:2023-03-24

# トポロジカルデータ解析のためのオイラー特性ツール

Euler Characteristic Tools For Topological Data Analysis ( http://arxiv.org/abs/2303.14040v1 )

ライセンス: Link先を確認

Olympio Hacquard, Vadim Lebovici

(参考訳) 本稿では,トポロジカルデータ解析におけるオイラー特性技術について述べる。データから構築された単純複体族のオイラー特性をポイントワイドに計算すると、いわゆるオイラー特性プロファイルが生まれる。この単純なディスクリプタは、非常に低い計算コストで教師付きタスクの最先端のパフォーマンスを実現する。信号解析に着想を得て,オイラー特性プロファイルのハイブリッド変換を計算する。これらの積分変換はオイラー特性とルベーグ積分を混合し、トポロジカル信号の高効率な圧縮機を提供する。その結果、教師なしの設定で顕著なパフォーマンスを示した。定性面では、オイラープロファイルとそれらのハイブリッド変換によって捉えられた位相的および幾何学的情報に関する多くのヒューリスティックスを提供する。最後に,これらの記述子に対する安定性とランダム設定における漸近的保証を証明した。

In this article, we study Euler characteristic techniques in topological data analysis. Pointwise computing the Euler characteristic of a family of simplicial complexes built from data gives rise to the so-called Euler characteristic profile. We show that this simple descriptor achieves state-of-the-art performance in supervised tasks at a very low computational cost. Inspired by signal analysis, we compute hybrid transforms of Euler characteristic profiles. These integral transforms mix Euler characteristic techniques with Lebesgue integration to provide highly efficient compressors of topological signals. As a consequence, they show remarkable performance in unsupervised settings. On the qualitative side, we provide numerous heuristics on the topological and geometric information captured by Euler profiles and their hybrid transforms. Finally, we prove stability results for these descriptors as well as asymptotic guarantees in random settings.
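As a rough illustration of the pointwise computation mentioned above, the sketch below evaluates the Euler characteristic of a Vietoris-Rips complex at several scales. The choice of a one-parameter Rips filtration, the brute-force clique enumeration, and the function names are assumptions made for this example; the paper's profiles may be built from other (possibly multi-parameter) families of complexes.

```python
from itertools import combinations
from math import dist


def euler_characteristic(points, scale, max_dim=3):
    """Alternating count of simplices in the Rips complex at `scale`.

    A set of k vertices spans a (k-1)-simplex when all pairwise
    distances are at most `scale`. Simplices above `max_dim` are ignored.
    """
    chi = 0
    for k in range(1, max_dim + 2):
        for simplex in combinations(range(len(points)), k):
            if all(dist(points[i], points[j]) <= scale
                   for i, j in combinations(simplex, 2)):
                chi += (-1) ** (k - 1)
    return chi


def euler_profile(points, scales, max_dim=3):
    """Euler characteristic evaluated pointwise over a list of scales."""
    return [euler_characteristic(points, s, max_dim) for s in scales]


square = [(0, 0), (1, 0), (1, 1), (0, 1)]
print(euler_profile(square, [0.0, 1.0, 1.5]))  # [4, 0, 1]
```

For the unit square, the profile reads as four isolated points, then a cycle (4 vertices minus 4 edges), then a filled tetrahedron. The brute-force enumeration is exponential in `max_dim` and is only meant for tiny inputs.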

翻訳日:2023-03-27 14:10:15 公開日:2023-03-24

# 自由言語モデルによる視覚言語事前学習の高速化

Accelerating Vision-Language Pretraining with Free Language Modeling ( http://arxiv.org/abs/2303.14038v1 )

ライセンス: Link先を確認

Teng Wang, Yixiao Ge, Feng Zheng, Ran Cheng, Ying Shan, Xiaohu Qie, Ping Luo

(参考訳) state of the arts in vision-language pretraining (VLP)は模範的なパフォーマンスを達成しているが、特に大規模webデータセットでは、収束が遅く、トレーニング時間が長いことによる高いトレーニングコストに苦しむ。トレーニング効率にとって重要な障害は、マスク言語モデリング(MLM)における絡み合った予測率(復元トークンの割合)と腐敗率(劣化トークンの割合)であり、予測損失から除外された出力トークンの大部分のコストで適切な腐敗率を達成することである。本稿では,VLPの収束を早めるために,自由言語モデリング(FLM)という新たな事前学習タスクを提案する。 FLMは、腐敗率との結び付きから予測レートを解放し、各トークンを予測できるように腐敗スパンをカスタマイズすることに成功した。 FLMでトレーニングされたモデルは、双方向のコンテキストをより柔軟に活用することで、同じGPU時間からより良く、より速く学習することができる。広汎な実験により、FLMはMLMベースの手法と比較して2.5倍の事前学習時間短縮を実現し、視覚言語理解と生成の両タスクにおける競合性能を維持した。コードは https://github.com/TencentARC/FLM で公開される。

The state of the art in vision-language pretraining (VLP) achieves exemplary performance but suffers from high training costs resulting from slow convergence and long training time, especially on large-scale web datasets. An essential obstacle to training efficiency lies in the entangled prediction rate (percentage of tokens for reconstruction) and corruption rate (percentage of corrupted tokens) in masked language modeling (MLM), that is, a proper corruption rate is achieved at the cost of a large portion of output tokens being excluded from prediction loss. To accelerate the convergence of VLP, we propose a new pretraining task, namely, free language modeling (FLM), that enables a 100% prediction rate with arbitrary corruption rates. FLM successfully frees the prediction rate from the tie-up with the corruption rate while allowing the corruption spans to be customized for each token to be predicted. FLM-trained models are encouraged to learn better and faster given the same GPU time by exploiting bidirectional contexts more flexibly. Extensive experiments show FLM could achieve an impressive 2.5x pretraining time reduction in comparison to the MLM-based methods, while keeping competitive performance on both vision-language understanding and generation tasks. Code will be made public at https://github.com/TencentARC/FLM.

翻訳日:2023-03-27 14:10:02 公開日:2023-03-24

# CTセグメンテーションラベリングの最適化

Optimizing the Procedure of CT Segmentation Labeling ( http://arxiv.org/abs/2303.14089v1 )

ライセンス: Link先を確認

Yaroslav Zharov, Tilo Baumbach, Vincent Heuveline

(参考訳) Computed Tomographyでは、機械学習は自動データ処理によく使用される。しかし、モデル複雑性の増大には、巨大なボリュームデータセットが伴うため、モデルトレーニングのコストが増大する。モデルアーキテクチャとトレーニングアルゴリズムの進歩によってこれを緩和するほとんどの作業とは異なり、アノテーションの手順とそのモデル性能への影響について検討する。モデルトレーニングのために収集された優れたデータセットの主な利点は、ラベルの品質、多様性、完全性である。これらのメリットがopen medical ctデータセットを用いたモデルパフォーマンスに与える影響を比較し,ラベリングの初期段階における多様性よりも品質が重要であり,その多様性は完全性よりも重要である,と結論づけた。この結論と追加実験に基づき, モデル性能を最大化しながらラベリングに費やす労力を最小限に抑えるために, 断層画像のセグメンテーションのためのラベリング手順を提案する。

In Computed Tomography, machine learning is often used for automated data processing. However, increasing model complexity is accompanied by increasingly large volume datasets, which in turn increases the cost of model training. Unlike most work that mitigates this by advancing model architectures and training algorithms, we consider the annotation procedure and its effect on the model performance. We take the three main virtues of a good dataset collected for model training to be label quality, diversity, and completeness. We compare the effects of those virtues on the model performance using open medical CT datasets and conclude that quality is more important than diversity early during labeling; the diversity, in turn, is more important than completeness. Based on this conclusion and additional experiments, we propose a labeling procedure for the segmentation of tomographic images to minimize the effort spent on labeling while maximizing the model performance.

翻訳日:2023-03-27 14:04:03 公開日:2023-03-24

# OPDMulti: 複数のオブジェクトに対するオープンな部分検出

OPDMulti: Openable Part Detection for Multiple Objects ( http://arxiv.org/abs/2303.14087v1 )

ライセンス: Link先を確認

Xiaohao Sun, Hanxiao Jiang, Manolis Savva, Angel Xuan Chang

(参考訳) 開部検出は、単視点画像中の物体の開部を検出し、対応する運動パラメータを予測するタスクである。以前の研究は、全ての入力画像が単一のオープンなオブジェクトのみを含む非現実的な設定を調査した。我々は,このタスクを複数のオブジェクトを持つシーンに一般化し,実世界のシーンに基づいて対応するデータセットを作成する。次に、このより困難なシナリオに、OPDFormer:part-aware transformerアーキテクチャを使って対処します。私たちの実験では、opdformerアーキテクチャが以前の作業を大幅に上回っています。私たちが調査したより現実的なマルチオブジェクトシナリオは、将来的な仕事の機会を示しながら、すべてのメソッドで難しいままです。

Openable part detection is the task of detecting the openable parts of an object in a single-view image, and predicting corresponding motion parameters. Prior work investigated the unrealistic setting where all input images only contain a single openable object. We generalize this task to scenes with multiple objects each potentially possessing openable parts, and create a corresponding dataset based on real-world scenes. We then address this more challenging scenario with OPDFormer: a part-aware transformer architecture. Our experiments show that the OPDFormer architecture significantly outperforms prior work. The more realistic multiple-object scenarios we investigated remain challenging for all methods, indicating opportunities for future work.

翻訳日:2023-03-27 14:03:50 公開日:2023-03-24

# 微分プライベート合成制御

Differentially Private Synthetic Control ( http://arxiv.org/abs/2303.14084v1 )

ライセンス: Link先を確認

Saeyoung Rho, Rachel Cummings, Vishal Misra

(参考訳) 合成制御(synthetic control)は、合成反事実データを作成することにより介入の治療効果を推定するために用いられる因果推論ツールである。このアプローチは、他の類似した観測(ドナープール)からの測定を組み合わせて、介入前のターゲットとドナープールの関係を分析することによって、対実的時系列(ターゲットユニット)を予測する。機密データやプロプライエタリデータに合成制御ツールがますます適用されるにつれて、正式なプライバシ保護が求められることが多い。本研究では,明示的な誤差境界を持つ微分プライベート合成制御のための最初のアルゴリズムを提案する。我々のアプローチは、非私的合成制御と微分プライベートな経験的リスク最小化のツールに基づいている。我々は、合成制御クエリの感度に関する上限と下限を提供し、プライベート合成制御アルゴリズムの精度に関する明示的な誤差境界を提供する。我々は,アルゴリズムがターゲットユニットの正確な予測を行い,プライバシのコストが小さいことを示す。最後に,提案アルゴリズムの性能を実証的に評価し,パラメータの多様さに好適な性能を示すとともに,ハイパーパラメータチューニングの実践者へのガイダンスを提供する。

Synthetic control is a causal inference tool used to estimate the treatment effects of an intervention by creating synthetic counterfactual data. This approach combines measurements from other similar observations (i.e., the donor pool) to predict a counterfactual time series of interest (i.e., the target unit) by analyzing the relationship between the target and the donor pool before the intervention. As synthetic control tools are increasingly applied to sensitive or proprietary data, formal privacy protections are often required. In this work, we provide the first algorithms for differentially private synthetic control with explicit error bounds. Our approach builds upon tools from non-private synthetic control and differentially private empirical risk minimization. We provide upper and lower bounds on the sensitivity of the synthetic control query and provide explicit error bounds on the accuracy of our private synthetic control algorithms. We show that our algorithms produce accurate predictions for the target unit, and that the cost of privacy is small. Finally, we empirically evaluate the performance of our algorithm, and show favorable performance in a variety of parameter regimes, as well as providing guidance to practitioners for hyperparameter tuning.
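The non-private building block described above (fit donor weights on pre-intervention data, then predict the target's counterfactual afterwards) might look like the following minimal sketch. It deliberately omits any privacy mechanism and is not the paper's algorithm. The restriction to two donors, the plain least-squares fit, and the toy series are all assumptions made to keep the example short and dependency-free.

```python
def fit_two_donor_weights(donor_a, donor_b, target):
    """Least-squares weights (w_a, w_b) such that
    w_a * donor_a + w_b * donor_b approximates target,
    solved via the 2x2 normal equations."""
    saa = sum(a * a for a in donor_a)
    sbb = sum(b * b for b in donor_b)
    sab = sum(a * b for a, b in zip(donor_a, donor_b))
    sat = sum(a * t for a, t in zip(donor_a, target))
    sbt = sum(b * t for b, t in zip(donor_b, target))
    det = saa * sbb - sab * sab
    if det == 0:
        raise ValueError("donor series are collinear")
    return (sat * sbb - sbt * sab) / det, (sbt * saa - sat * sab) / det


def counterfactual(donor_a, donor_b, weights):
    """Predicted target series in the absence of the intervention."""
    w_a, w_b = weights
    return [w_a * a + w_b * b for a, b in zip(donor_a, donor_b)]


# Toy data: before the intervention the target is the average of the donors.
pre_a, pre_b = [1.0, 2.0, 3.0, 4.0], [2.0, 1.0, 2.0, 1.0]
pre_target = [1.5, 1.5, 2.5, 2.5]
post_a, post_b = [5.0, 6.0], [2.0, 1.0]
post_observed = [4.5, 5.0]

weights = fit_two_donor_weights(pre_a, pre_b, pre_target)
predicted = counterfactual(post_a, post_b, weights)
effect = [obs - pred for obs, pred in zip(post_observed, predicted)]
print(weights, predicted, effect)
```

The estimated treatment effect is the gap between the observed post-intervention series and the synthetic prediction. A private variant would have to protect the fitted weights and the released prediction, which this sketch does not attempt.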

翻訳日:2023-03-27 14:03:39 公開日:2023-03-24

# 学生教師フレームワークにおけるランダム特徴モデルのオンライン学習

Online Learning for the Random Feature Model in the Student-Teacher Framework ( http://arxiv.org/abs/2303.14083v1 )

ライセンス: Link先を確認

Roman Worschech and Bernd Rosenow

(参考訳) ディープニューラルネットワークは、重みが増加するにつれて性能が向上し、過度にパラメータ化されるような予測アルゴリズムとして広く使われている。我々は,第1層が凍結され,第2層がトレーニング可能である2層ニューラルネットワークをランダム特徴モデルと呼ぶ。学習力学のための微分方程式の集合を導出することにより、学生-教師フレームワークの文脈における過度なパラメトリゼーションを考察する。隠れた層の大きさと入力次元の任意の有限比について、学生は完全一般化できず、非零漸近一般化誤差を計算する。学生の隠れた層の大きさが入力次元よりも指数関数的に大きいときのみ、完全一般化へのアプローチが可能となる。

Deep neural networks are widely used prediction algorithms whose performance often improves as the number of weights increases, leading to over-parametrization. We consider a two-layered neural network whose first layer is frozen while the last layer is trainable, known as the random feature model. We study over-parametrization in the context of a student-teacher framework by deriving a set of differential equations for the learning dynamics. For any finite ratio of hidden layer size and input dimension, the student cannot generalize perfectly, and we compute the non-zero asymptotic generalization error. Only when the student's hidden layer size is exponentially larger than the input dimension is an approach to perfect generalization possible.

翻訳日:2023-03-27 14:03:19 公開日:2023-03-24

# CoLa-Diff:マルチモードMRI合成のための条件付き潜時拡散モデル

CoLa-Diff: Conditional Latent Diffusion Model for Multi-Modal MRI Synthesis ( http://arxiv.org/abs/2303.14081v1 )

ライセンス: Link先を確認

Lan Jiang, Ye Mao, Xi Chen, Xiangfeng Wang, Chao Li

(参考訳) MRI合成は、臨床実践におけるMRIモダリティの欠如の課題を軽減することを約束する。拡散モデルは複雑なデータ分布と可変データ分布をモデル化して画像合成に有効な手法として登場した。しかし、ほとんどの拡散ベースのMRI合成モデルは単一のモードを使用している。元の画像領域で動作するため、メモリ集約性が高く、マルチモーダル合成では実現不可能である。さらに、MRIでは解剖学的構造を保たないことが多い。さらに、マルチモーダルMRI入力からの複数の条件のバランスは、マルチモーダル合成に不可欠である。本稿では,最初の拡散に基づく多モードMRI合成モデル,すなわち条件付き潜在拡散モデル(CoLa-Diff)を提案する。メモリ消費を低減するため,我々はCoLa-Diffを潜在空間で動作させるために設計する。本稿では,遅延空間における圧縮とノイズを解決するために,協調フィルタリングなどの新しいネットワークアーキテクチャを提案する。解剖学的構造をより良く維持するために、拡散過程を導くために密度分布の優先として脳領域マスクが導入された。さらに、マルチモーダル情報を有効に活用するためのオートウェイト適応を提案する。実験の結果、CoLa-Diffは他の最先端MRI合成法よりも優れており、マルチモーダルMRI合成の有効なツールとして機能することを約束している。

MRI synthesis promises to mitigate the challenge of missing MRI modalities in clinical practice. Diffusion models have emerged as an effective technique for image synthesis by modelling complex and variable data distributions. However, most diffusion-based MRI synthesis models use a single modality. As they operate in the original image domain, they are memory-intensive and less feasible for multi-modal synthesis. Moreover, they often fail to preserve the anatomical structure in MRI. Further, balancing the multiple conditions from multi-modal MRI inputs is crucial for multi-modal synthesis. Here, we propose the first diffusion-based multi-modality MRI synthesis model, namely Conditioned Latent Diffusion Model (CoLa-Diff). To reduce memory consumption, we design CoLa-Diff to operate in the latent space. We propose a novel network architecture, e.g., similar cooperative filtering, to solve the possible compression and noise in latent space. To better maintain the anatomical structure, brain region masks are introduced as the priors of density distributions to guide the diffusion process. We further present auto-weight adaptation to employ multi-modal information effectively. Our experiments demonstrate that CoLa-Diff outperforms other state-of-the-art MRI synthesis methods, promising to serve as an effective tool for multi-modal MRI synthesis.

翻訳日:2023-03-27 14:03:07 公開日:2023-03-24

# 両世界のベスト:表データと画像データを用いたマルチモーダルコントラスト学習

Best of Both Worlds: Multimodal Contrastive Learning with Tabular and Imaging Data ( http://arxiv.org/abs/2303.14080v1 )

ライセンス: Link先を確認

Paul Hager, Martin J. Menten, Daniel Rueckert

(参考訳) 医用データセット、特にバイオバンクは、画像に加えて豊富な臨床情報を含む広範な表型データを含むことが多い。実際には、臨床医は多様性とスケールの両面でデータが少ないが、いまだにディープラーニングソリューションの展開を望んでいる。医療データセットのサイズの増加と高価なアノテーションコストに加えて、マルチモーダルで事前訓練し、一様予測できる教師なしの方法の必要性が高まっている。これらのニーズに対処するために,画像と表データを利用して非モーダルエンコーダを訓練する,自己指導型コントラスト学習フレームワークを提案する。我々のソリューションはSimCLRとSCARFという2つの主要なコントラスト学習戦略を組み合わせており、シンプルで効果的です。実験では,心mri画像と4万人の英国バイオバンク患者から120の臨床的特徴を用いて,心筋梗塞および冠動脈疾患(cad)のリスクを予測することにより,枠組みの強度を実証する。さらに,DVMカー広告データセットを用いて,自然画像へのアプローチの一般化可能性を示す。表データの高い解釈可能性を利用し,帰属実験およびアブレーション実験により,形態計測表の特徴は,大きさと形状を記述し,比較学習過程において重要度を大きくし,学習埋め込みの質を向上させることを見出した。最後に,教師付きコントラスト学習の新たな形式であるlaaf( label as a feature)を導入し,マルチモーダル事前学習中に基底真理ラベルを表型特徴として付加し,教師付きコントラストベースラインを上回った。

Medical datasets, and especially biobanks, often contain extensive tabular data with rich clinical information in addition to images. In practice, clinicians typically have less data, both in terms of diversity and scale, but still wish to deploy deep learning solutions. Combined with increasing medical dataset sizes and expensive annotation costs, the necessity for unsupervised methods that can pretrain multimodally and predict unimodally has risen. To address these needs, we propose the first self-supervised contrastive learning framework that takes advantage of images and tabular data to train unimodal encoders. Our solution combines SimCLR and SCARF, two leading contrastive learning strategies, and is simple and effective. In our experiments, we demonstrate the strength of our framework by predicting risks of myocardial infarction and coronary artery disease (CAD) using cardiac MR images and 120 clinical features from 40,000 UK Biobank subjects. Furthermore, we show the generalizability of our approach to natural images using the DVM car advertisement dataset. We take advantage of the high interpretability of tabular data and, through attribution and ablation experiments, find that morphometric tabular features, describing size and shape, have outsized importance during the contrastive learning process and improve the quality of the learned embeddings. Finally, we introduce a novel form of supervised contrastive learning, label as a feature (LaaF), by appending the ground truth label as a tabular feature during multimodal pretraining, outperforming all supervised contrastive baselines.

翻訳日:2023-03-27 14:02:50 公開日:2023-03-24

# DistractFlow: リアルディトラクションと擬似ラベルによる光学的フロー推定の改善

DistractFlow: Improving Optical Flow Estimation via Realistic Distractions and Pseudo-Labeling ( http://arxiv.org/abs/2303.14078v1 )

ライセンス: Link先を確認

Jisoo Jeong, Hong Cai, Risheek Garrepalli, Fatih Porikli

(参考訳) 入力フレームに現実的な注意をそらすことにより,光フロー推定モデルのトレーニングを行うための新しいデータ拡張手法である distractflow を提案する。混合比に基づいて, 対のフレームの1つを類似領域を描写した気晴らし画像と組み合わせることにより, 自然物やシーンと相反する視覚摂動を誘発する。このようなペアを気晴らしペアと呼ぶ。直感的には、意味的に意味のある注意をそらすことによって、モデルが関連するバリエーションを学習し、挑戦的な偏差に対して堅牢性を達成することができるということです。具体的には, 初期対流と接地対流との間に計算された教師付き損失に加えて, 気をそらした対流と原対の接地対流との間に定義された第2の教師付き損失を同じ混合比で重み付けした。さらに、ラベルなしデータが利用可能であれば、擬似ラベルと相互一貫性の正規化により、自己管理設定への拡張アプローチを拡張します。元のペアとその気晴らしバージョンが与えられた場合、気晴らしペア上の推定フローを、元のペアの流れと一致させるために強制します。当社のアプローチでは,追加のアノテーションを必要とせずに,利用可能なトレーニングペア数を大幅に増やすことが可能です。モデルアーキテクチャに非依存であり、任意の光フロー推定モデルのトレーニングに適用することができる。 Sintel、KITTI、SlowFlowなど、複数のベンチマークに対する広範な評価は、DistractFlowが既存のモデルを一貫して改善し、最新技術よりも優れていることを示している。

We propose a novel data augmentation approach, DistractFlow, for training optical flow estimation models by introducing realistic distractions to the input frames. Based on a mixing ratio, we combine one of the frames in the pair with a distractor image depicting a similar domain, which allows for inducing visual perturbations congruent with natural objects and scenes. We refer to such pairs as distracted pairs. Our intuition is that using semantically meaningful distractors enables the model to learn related variations and attain robustness against challenging deviations, compared to conventional augmentation schemes focusing only on low-level aspects and modifications. More specifically, in addition to the supervised loss computed between the estimated flow for the original pair and its ground-truth flow, we include a second supervised loss defined between the distracted pair's flow and the original pair's ground-truth flow, weighted with the same mixing ratio. Furthermore, when unlabeled data is available, we extend our augmentation approach to self-supervised settings through pseudo-labeling and cross-consistency regularization. Given an original pair and its distracted version, we enforce the estimated flow on the distracted pair to agree with the flow of the original pair. Our approach allows increasing the number of available training pairs significantly without requiring additional annotations. It is agnostic to the model architecture and can be applied to training any optical flow estimation model. Our extensive evaluations on multiple benchmarks, including Sintel, KITTI, and SlowFlow, show that DistractFlow improves existing models consistently, outperforming the latest state of the art.
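The supervised part of the scheme above (mix one frame with a distractor, then add a second loss weighted by the mixing ratio) can be sketched in a few lines. This is an illustration under assumptions, not the released implementation: the linear pixel-wise blend, the use of mean endpoint error as the loss, the reading of "weighted with the same mixing ratio" as a plain multiplicative weight, and all names are choices made for this example.

```python
from math import hypot


def mix_frames(frame, distractor, ratio):
    """Blend a frame with a distractor image; `ratio` is the share of the
    original frame that is kept (assumed linear, pixel-wise mixing)."""
    return [[ratio * f + (1 - ratio) * d for f, d in zip(f_row, d_row)]
            for f_row, d_row in zip(frame, distractor)]


def endpoint_error(flow, flow_gt):
    """Mean Euclidean distance between predicted and ground-truth flow vectors."""
    errors = [hypot(u - ug, v - vg) for (u, v), (ug, vg) in zip(flow, flow_gt)]
    return sum(errors) / len(errors)


def distracted_supervised_loss(flow_orig, flow_distracted, flow_gt, ratio):
    """Loss on the original pair plus a ratio-weighted loss on the distracted
    pair, both measured against the original pair's ground truth."""
    return (endpoint_error(flow_orig, flow_gt)
            + ratio * endpoint_error(flow_distracted, flow_gt))


frame = [[0.0, 1.0]]
distractor = [[1.0, 0.0]]
mixed = mix_frames(frame, distractor, ratio=0.75)

flow_gt = [(1.0, 0.0), (0.0, 0.0)]
flow_orig = [(1.0, 0.0), (0.0, 0.0)]        # hypothetical model output, exact
flow_distracted = [(1.0, 0.0), (0.0, 1.0)]  # hypothetical output, one vector off
loss = distracted_supervised_loss(flow_orig, flow_distracted, flow_gt, 0.75)
print(mixed, loss)
```

In a real training loop the two flow predictions would come from the same network applied to the original and the distracted pair; here they are hard-coded so the arithmetic is easy to follow.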

翻訳日:2023-03-27 14:02:22 公開日:2023-03-24

# 適応型インスタンスワイズ・スムースティングによる対人訓練の改善

Improved Adversarial Training Through Adaptive Instance-wise Loss Smoothing ( http://arxiv.org/abs/2303.14077v1 )

ライセンス: Link先を確認

Lin Li, Michael Spratling

(参考訳) 深いニューラルネットワークは、逆の摂動によって入力が破壊され、人間の知覚できない人工ノイズによって誤った予測をすることができる。これまでのところ、敵の訓練はこのような敵の攻撃に対する最も成功した防御であった。この研究は、敵の堅牢性を高めるために敵の訓練を改善することに焦点を当てている。まず、インスタンスの観点から、敵のトレーニング中に敵の脆弱性がどのように進化するかを分析します。学習中,攻撃に対して脆弱なトレーニングサンプルのかなりの割合を犠牲にすることで,攻撃的損失の全体的な低減が達成され,その結果,データ間の攻撃的脆弱性が均一に分布することを見出した。このような「不均一な脆弱性」は、いくつかの一般的なロバストなトレーニング方法に広まり、さらに重要なことは、敵のトレーニングにおける過剰フィッティングに関連している。本研究の目的は,新たな対人訓練手法であるインスタンス適応型平滑化強化対人訓練(ISEAT)を提案することである。入力と減量の両方のランドスケープを、適応的でインスタンス固有の方法で円滑にし、高い逆の脆弱性を持つサンプルに対してより堅牢性を高める。本手法が既存の防御法よりも優れていることを示す。特に,最新のデータ拡張と半教師付き学習技術を組み合わせることで,Wide ResNet34-10では59.32%,Wide ResNet28-10では61.55%,CIFAR10では$\ell_{\infty}$-normによる攻撃に対して,最先端の堅牢性を達成している。コードはhttps://github.com/TreeLLi/Instance-adaptive-Smoothness-Enhanced-ATで公開されている。

Deep neural networks can be easily fooled into making incorrect predictions through corruption of the input by adversarial perturbations: human-imperceptible artificial noise. So far adversarial training has been the most successful defense against such adversarial attacks. This work focuses on improving adversarial training to boost adversarial robustness. We first analyze, from an instance-wise perspective, how adversarial vulnerability evolves during adversarial training. We find that during training an overall reduction of adversarial loss is achieved by sacrificing a considerable proportion of training samples to be more vulnerable to adversarial attack, which results in an uneven distribution of adversarial vulnerability among data. Such "uneven vulnerability" is prevalent across several popular robust training methods and, more importantly, relates to overfitting in adversarial training. Motivated by this observation, we propose a new adversarial training method: Instance-adaptive Smoothness Enhanced Adversarial Training (ISEAT). It jointly smooths both input and weight loss landscapes in an adaptive, instance-specific way to enhance robustness more for those samples with higher adversarial vulnerability. Extensive experiments demonstrate the superiority of our method over existing defense methods. Notably, our method, when combined with the latest data augmentation and semi-supervised learning techniques, achieves state-of-the-art robustness against $\ell_{\infty}$-norm constrained attacks on CIFAR10 of 59.32% for Wide ResNet34-10 without extra data, and 61.55% for Wide ResNet28-10 with extra data. Code is available at https://github.com/TreeLLi/Instance-adaptive-Smoothness-Enhanced-AT.

翻訳日:2023-03-27 14:01:55 公開日:2023-03-24

# 画像による検索:美容品検索に便利な機能を探る

Search By Image: Deeply Exploring Beneficial Features for Beauty Product Retrieval ( http://arxiv.org/abs/2303.14075v1 )

ライセンス: Link先を確認

Mingqiang Wei, Qian Sun, Haoran Xie, Dong Liang, Fu Lee Wang

(参考訳) 画像による検索は人気があるが、広範にわたる干渉により依然として難しい。一実世界の撮影画像のデータ変動(背景、ポーズ、視角、明るさ等) ii) クエリデータセットに類似した画像。本稿では,ニューラルネットワークによる美容積検索(BPR)の実用的意義について検討する。異なるタイプの画像特徴を幅広く抽出し、これらの特徴が有用かどうかという興味深い疑問を提起する。一実世界撮影画像のデータ変動を抑制すること、及び二非常によく似ているが、本質的に異なる美容製品である他の画像とを区別することにより、BPRの能力が向上する。そこで本研究では,美しい製品画像の複数の特徴(VM-Net)の組み合わせを理解するために,新しい可変アテンションニューラルネットワークを提案する。 BPRのトレーニングデータセットが公開されていないことを考えると、100万以上の画像を20K以上のカテゴリに分類した新しいデータセットを構築し、VM-Netや他の手法の一般化と干渉防止の両方を改善する。我々はvm-netとその競合製品のパフォーマンスをベンチマークデータセットperfect-500kで検証する。ソースコードとデータセットは公開時にリリースされる。

Searching by image is popular yet still challenging due to the extensive interference arising from i) data variations (e.g., background, pose, visual angle, brightness) of real-world captured images and ii) similar images in the query dataset. This paper studies a practically meaningful problem of beauty product retrieval (BPR) by neural networks. We broadly extract different types of image features, and raise the intriguing question of whether these features are beneficial to i) suppress data variations of real-world captured images, and ii) distinguish one image from others which look very similar but are intrinsically different beauty products in the dataset, therefore leading to an enhanced capability of BPR. To answer it, we present a novel variable-attention neural network to understand the combination of multiple features (termed VM-Net) of beauty product images. Considering that there are few publicly released training datasets for BPR, we establish a new dataset with more than one million images classified into more than 20K categories to improve both the generalization and anti-interference abilities of VM-Net and other methods. We verify the performance of VM-Net and its competitors on the benchmark dataset Perfect-500K, where VM-Net shows clear improvements over the competitors in terms of MAP@7. The source code and dataset will be released upon publication.

翻訳日:2023-03-27 14:01:23 公開日:2023-03-24

# ChatDoctor:医学領域知識を用いたLLaMAモデルに基づく医用チャットモデル

ChatDoctor: A Medical Chat Model Fine-tuned on LLaMA Model using Medical Domain Knowledge ( http://arxiv.org/abs/2303.14070v1 )

ライセンス: Link先を確認

Li Yunxiang, Li Zihan, Zhang Kai, Dan Ruilong, Zhang You

(参考訳) ChatGPTのような一般領域における最近の大規模言語モデル(LLM)は、指示に従うことや、人間のような反応を生み出すことに顕著な成功を収めている。しかし、これらの言語モデルは医療領域で個別に注意深く学習されておらず、診断の正確性が低く、医療診断や医薬品などの適切な推奨ができない。この問題に対処するために,700以上の疾患とその症状,推奨薬,必要な医療検査を収集し,医師と患者の会話を5K件生成した。医師と患者の会話の微調整モデルにより、これらのモデルは患者のニーズを理解し、アドバイスを提供し、様々な医療関連分野に有用な支援を提供する大きな可能性を持つ。これらの先進的な言語モデルのヘルスケアへの統合は、医療専門家と患者がコミュニケーションする方法に革命をもたらし、最終的にケアの全体的な品質と患者の結果を改善する。さらに、医療分野における対話モデルのさらなる発展を進めるために、すべてのソースコード、データセット、モデルの重み付けを開放する。さらに、このプロジェクトのトレーニングデータ、コード、重み付けは、https://github.com/Kent0n-Li/ChatDoctor で入手できる。

Recent large language models (LLMs) in the general domain, such as ChatGPT, have shown remarkable success in following instructions and producing human-like responses. However, such language models have not been trained specifically and carefully for the medical domain, resulting in poor diagnostic accuracy and inability to give correct recommendations for medical diagnosis, medications, etc. To address this issue, we collected more than 700 diseases and their corresponding symptoms, recommended medications, and required medical tests, and then generated 5K doctor-patient conversations. By fine-tuning models on doctor-patient conversations, these models emerge with great potential to understand patients' needs, provide informed advice, and offer valuable assistance in a variety of medical-related fields. The integration of these advanced language models into healthcare can revolutionize the way healthcare professionals and patients communicate, ultimately improving the overall quality of care and patient outcomes. In addition, we will open all source code, datasets and model weights to advance the further development of dialogue models in the medical field. The training data, code, and weights of this project are available at: https://github.com/Kent0n-Li/ChatDoctor.

翻訳日:2023-03-27 14:01:04 公開日:2023-03-24

# 気象条件下におけるドメイン・インクリメンタルセマンティクスセグメンテーションにおける忘れ方原理

Principles of Forgetting in Domain-Incremental Semantic Segmentation in Adverse Weather Conditions ( http://arxiv.org/abs/2303.14115v1 )

ライセンス: Link先を確認

Tobias Kalb, J\"urgen Beyerer

(参考訳) 自動運転車のシーン認識のためのディープニューラルネットワークは、訓練されたドメインに対して優れた結果をもたらす。しかし,実世界の状況では,操作領域とその基礎となるデータ分布は変化する。特に悪天候条件は、トレーニング中にデータが得られない場合、モデル性能を著しく低下させ、さらに、モデルが新しいドメインに段階的に適合すると、壊滅的な忘れがちとなり、以前観測された領域でパフォーマンスが大幅に低下する。破滅的な忘れを減らそうとする最近の進歩にもかかわらず、その原因と効果はいまだに不明である。そこで本研究では, 気象条件下でのドメインインクリメンタル学習において, 意味セグメンテーションモデルの表現がどう影響するかについて検討する。実験と表現分析の結果,大惨な忘れはドメイン・インクリメンタル・ラーニングにおける低レベルな特徴の変化によって主に引き起こされ,事前学習と画像拡張によるソース・ドメイン上のより一般的な特徴の学習が,その後のタスクにおける効率的な機能の再利用につながることが示唆された。これらの知見は,効果的な連続学習アルゴリズムのための一般化機能を促進する手法の重要性を強調した。

Deep neural networks for scene perception in automated vehicles achieve excellent results for the domains they were trained on. However, in real-world conditions, the domain of operation and its underlying data distribution are subject to change. Adverse weather conditions, in particular, can significantly decrease model performance when such data are not available during training. Additionally, when a model is incrementally adapted to a new domain, it suffers from catastrophic forgetting, causing a significant drop in performance on previously observed domains. Despite recent progress in reducing catastrophic forgetting, its causes and effects remain obscure. Therefore, we study how the representations of semantic segmentation models are affected during domain-incremental learning in adverse weather conditions. Our experiments and representational analyses indicate that catastrophic forgetting is primarily caused by changes to low-level features in domain-incremental learning and that learning more general features on the source domain using pre-training and image augmentations leads to efficient feature reuse in subsequent tasks, which drastically reduces catastrophic forgetting. These findings highlight the importance of methods that facilitate generalized features for effective continual learning algorithms.

翻訳日:2023-03-27 13:54:38 公開日:2023-03-24

# 物体の動き感度:イベントベースカメラのエゴモーション問題に対するバイオインスパイアソリューション

Object Motion Sensitivity: A Bio-inspired Solution to the Ego-motion Problem for Event-based Cameras ( http://arxiv.org/abs/2303.14114v1 )

ライセンス: Link先を確認

Shay Snyder (1), Hunter Thompson (2), Md Abdullah-Al Kaiser (3), Gregory Schwartz (4), Akhilesh Jaiswal (3), and Maryam Parsa (4) ((1) George Mason University, (2) Georgia Institute of Technology, (3) University of Southern California, (4) Northwestern University)

(参考訳) ニューロモルフィック(イベントベースの)イメージセンサーは、人間の網膜からインスピレーションを得て、生体によく似た方法で視覚刺激を処理できる電子機器を作る。これらのセンサーは従来のRGBセンサーとは大きく異なる情報を処理する。具体的には、イベントベースイメージセンサが生成する知覚情報は、RGBセンサと比べて桁違いのスペーサーである。第1世代のニューロモルフィック画像センサであるDynamic Vision Sensor (DVS)は、光受容体と最初の網膜シナプスに制限された計算にインスパイアされている。本研究は,ニューロモルフィック画像センサの第2世代,CMOSイメージセンサ(IRIS)における統合網膜機能(Integrated Retinal Functionality in CMOS Image Sensors)の能力を強調するものである。この研究で選択される特徴は、IRISセンサーで局所的に処理されるオブジェクト運動感度(OMS)である。イベントベースカメラのエゴモーション問題を解決するためのOMSの能力について検討する。 OMS は従来の RGB や DVS と同様の効率で標準的なコンピュータビジョンタスクを実現できるが,帯域幅の大幅な削減が可能である。これにより、ワイヤレスおよびコンピューティングの電力予算が削減され、高速、堅牢、エネルギー効率、低帯域幅のリアルタイム意思決定において大きな機会が開ける。

Neuromorphic (event-based) image sensors draw inspiration from the human retina to create an electronic device that can process visual stimuli in a way that closely resembles its biological counterpart. These sensors process information significantly differently from traditional RGB sensors. Specifically, the sensory information generated by event-based image sensors is orders of magnitude sparser compared to that of RGB sensors. The first generation of neuromorphic image sensors, the Dynamic Vision Sensor (DVS), is inspired by the computations confined to the photoreceptors and the first retinal synapse. In this work, we highlight the capability of the second generation of neuromorphic image sensors, Integrated Retinal Functionality in CMOS Image Sensors (IRIS), which aims to mimic full retinal computations from photoreceptors to output of the retina (retinal ganglion cells) for targeted feature-extraction. The feature of choice in this work is Object Motion Sensitivity (OMS) that is processed locally in the IRIS sensor. We study the capability of OMS in solving the ego-motion problem of the event-based cameras. Our results show that OMS can accomplish standard computer vision tasks with similar efficiency to conventional RGB and DVS solutions but offers drastic bandwidth reduction. This cuts the wireless and computing power budgets and opens up vast opportunities in high-speed, robust, energy-efficient, and low-bandwidth real-time decision making.

翻訳日:2023-03-27 13:54:16 公開日:2023-03-24

# 離散最適化による解釈可能な異常検出

Interpretable Anomaly Detection via Discrete Optimization ( http://arxiv.org/abs/2303.14111v1 )

ライセンス: Link先を確認

Simon Lutz, Florian Wittbold, Simon Dierl, Benedikt B\"oing, Falk Howar, Barbara K\"onig, Emmanuel M\"uller, Daniel Neider

(参考訳) 異常検出は、サイバーセキュリティ、法執行、医療、詐欺保護など、多くのアプリケーションドメインにおいて不可欠である。しかし、現在のディープラーニングアプローチの意思決定は理解が難しいことで知られており、多くの場合、実践的な適用性を制限している。この制限を克服するために、シーケンシャルデータから本質的に解釈可能な異常検出器を学習するためのフレームワークを提案する。具体的には、与えられたラベルなしシーケンスの多重集合から決定論的有限オートマトン(DFA)を学ぶことを考える。この問題は計算量的に難しいことを示し,制約最適化に基づく2つの学習アルゴリズムを開発した。さらに, DFAの全体的な解釈性を改善するために, 最適化問題に対する新たな正規化手法を導入する。プロトタイプ実装を用いて,提案手法は精度とF1スコアの点で有望な結果を示す。

Anomaly detection is essential in many application domains, such as cyber security, law enforcement, medicine, and fraud protection. However, the decision-making of current deep learning approaches is notoriously hard to understand, which often limits their practical applicability. To overcome this limitation, we propose a framework for learning inherently interpretable anomaly detectors from sequential data. More specifically, we consider the task of learning a deterministic finite automaton (DFA) from a given multi-set of unlabeled sequences. We show that this problem is computationally hard and develop two learning algorithms based on constraint optimization. Moreover, we introduce novel regularization schemes for our optimization problems that improve the overall interpretability of our DFAs. Using a prototype implementation, we demonstrate that our approach shows promising results in terms of accuracy and F1 score.
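To make concrete why a DFA counts as an inherently interpretable detector, here is a small hand-written automaton used to flag sequences. The sketch does not learn anything: the paper's constraint-optimization learners are not reproduced, and the states, the alphabet, and the convention that rejected sequences are reported as anomalies are assumptions for illustration only.

```python
class DFA:
    """Deterministic finite automaton with a partial transition table.

    A missing transition is treated as moving to an implicit rejecting sink.
    """

    def __init__(self, start, accepting, transitions):
        self.start = start
        self.accepting = set(accepting)
        self.transitions = dict(transitions)  # (state, symbol) -> next state

    def accepts(self, sequence):
        state = self.start
        for symbol in sequence:
            state = self.transitions.get((state, symbol))
            if state is None:
                return False
        return state in self.accepting


def is_anomalous(dfa, sequence):
    """Assumed convention: sequences the automaton rejects are anomalies."""
    return not dfa.accepts(sequence)


# Hypothetical model of a normal session: login, any reads/writes, logout.
session_model = DFA(
    start="idle",
    accepting={"done"},
    transitions={
        ("idle", "login"): "active",
        ("active", "read"): "active",
        ("active", "write"): "active",
        ("active", "logout"): "done",
    },
)

print(is_anomalous(session_model, ["login", "read", "write", "logout"]))
print(is_anomalous(session_model, ["read", "logout"]))
```

Every decision can be traced to a specific missing or non-accepting transition, which is the kind of explanation a deep model does not provide directly.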

翻訳日:2023-03-27 13:53:49 公開日:2023-03-24

# エンコーダ・デコーダを用いた散水滴の形態変化の予測

Prediction of the morphological evolution of a splashing drop using an encoder-decoder ( http://arxiv.org/abs/2303.14109v1 )

ライセンス: Link先を確認

Jingzu Yee, Daichi Igarashi, Shun Miyatake, Yoshiyuki Tagawa

(参考訳) 固体表面への落下の影響は、様々な影響と応用を持つ重要な現象である。しかし、この現象の多相性は、特に落下が跳ね上がると、その形態的進化の予測に複雑を引き起こす。多くの機械学習に基づくドロップインパクト研究は物理パラメータを中心にしているが、この研究ではエンコーダデコーダを訓練し、画像データを用いてドロップ形態を予測するコンピュータビジョン戦略を用いた。ここでは、この訓練されたエンコーダデコーダが、スプラッシュや非スラッシュドロップの形態を示すビデオを生成することができることを示す。興味深いことに、これらの生成されたビデオのフレームごとに、落下の直径が実際のビデオとよく一致していることが判明した。また,スプラッシュ/ノンスプラッシュ予測の精度も高かった。これらの結果は、トレーニングされたエンコーダデコーダが、ドロップ形態を正確に表現できるビデオを生成する能力を示している。このアプローチは、実験および数値研究の高速で安価な代替手段を提供する。

The impact of a drop on a solid surface is an important phenomenon that has various implications and applications. However, the multiphase nature of this phenomenon causes complications in the prediction of its morphological evolution, especially when the drop splashes. While most machine-learning-based drop-impact studies have centred around physical parameters, this study used a computer-vision strategy by training an encoder-decoder to predict the drop morphologies using image data. Herein, we show that this trained encoder-decoder is able to successfully generate videos that show the morphologies of splashing and non-splashing drops. Remarkably, in each frame of these generated videos, the spreading diameter of the drop was found to be in good agreement with that of the actual videos. Moreover, there was also a high accuracy in splashing/non-splashing prediction. These findings demonstrate the ability of the trained encoder-decoder to generate videos that can accurately represent the drop morphologies. This approach provides a faster and cheaper alternative to experimental and numerical studies.

翻訳日:2023-03-27 13:53:36 公開日:2023-03-24

# 超伝導トランスモンプロセッサのクロストーク特性

Characterizing crosstalk of superconducting transmon processors ( http://arxiv.org/abs/2303.14103v1 )

ライセンス: Link先を確認

Andreas Ketterer, Thomas Wellens

(参考訳) 現在利用可能な量子コンピューティングハードウェアは、超伝導トランスモンアーキテクチャに基づくもので、数百キュービットのネットワークを実現する。しかし、そのような量子チップの固有のノイズとデコヒーレンス効果は、基本的なゲート演算をかなり変化させ、ターゲットの量子計算の不完全な出力をもたらす。本研究では,隣接量子ビット上で同時に実行される量子ゲート間の相関関係に現れるクロストーク効果の特性について考察する。このような相関関係の物理的起源を簡潔に説明した後、ランダム化ベンチマークプロトコルを用いて量子チップ全体のクロストーク効果の大きさを効率よく体系的に特徴付ける方法を示す。我々は,IBMが提供する実際の量子ハードウェア上で,クロストークによるゲート忠実度の変化を観測することで,導入プロトコルを実証する。最後に、得られた情報を用いて、適切なクロストーク対応ノイズモデルを考案し、ノイズ量子ハードウェアをシミュレートするより正確な手法を提案する。

Currently available quantum computing hardware based on superconducting transmon architectures realizes networks of hundreds of qubits with the possibility of controlled nearest-neighbor interactions. However, the inherent noise and decoherence effects of such quantum chips considerably alter basic gate operations and lead to imperfect outputs of the targeted quantum computations. In this work, we focus on the characterization of crosstalk effects which manifest themselves in correlations between simultaneously executed quantum gates on neighboring qubits. After a short explanation of the physical origin of such correlations, we show how to efficiently and systematically characterize the magnitude of such crosstalk effects on an entire quantum chip using the randomized benchmarking protocol. We demonstrate the introduced protocol by running it on real quantum hardware provided by IBM observing significant alterations in gate fidelities due to crosstalk. Lastly, we use the gained information in order to propose more accurate means to simulate noisy quantum hardware by devising an appropriate crosstalk-aware noise model.

翻訳日:2023-03-27 13:53:20 公開日:2023-03-24

# 分散シルエットアルゴリズム:ビッグデータによるクラスタリングの評価

Distributed Silhouette Algorithm: Evaluating Clustering on Big Data ( http://arxiv.org/abs/2303.14102v1 )

ライセンス: Link先を確認

Marco Gaido

(参考訳) ビッグデータの時代において、各アルゴリズムが持つ必要のある重要な特徴は、分散環境で効率的に並列に実行する可能性である。クラスタリングの品質を評価するための一般的なシルエット計量は、残念ながらこの性質を持たず、入力データセットのサイズに関して二次計算の複雑さを持っている。このため、クラスタリングを別途評価する必要のあるビッグデータシナリオでは、その実行が妨げられている。本稿では,このギャップを埋めるため,線形複雑性を持つシルエット計量を計算し,分散環境で並列に実行可能な最初のアルゴリズムを提案する。その実装はApache Spark MLライブラリで無料で利用できる。

In the big data era, the key feature that each algorithm needs to have is the possibility of efficiently running in parallel in a distributed environment. The popular Silhouette metric to evaluate the quality of a clustering, unfortunately, does not have this property and has a quadratic computational complexity with respect to the size of the input dataset. For this reason, its execution has been hindered in big data scenarios, where clustering had to be evaluated otherwise. To fill this gap, in this paper we introduce the first algorithm that computes the Silhouette metric with linear complexity and can easily execute in parallel in a distributed environment. Its implementation is freely available in the Apache Spark ML library.
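
The linear-cost idea in the abstract above can be illustrated with a minimal sketch. It assumes the squared Euclidean distance, for which the sum of distances from a point to a whole cluster can be recovered from three per-cluster statistics (point count, vector sum, sum of squared norms). The function names below are hypothetical and chosen for illustration only; this is not the Apache Spark API and not necessarily the paper's exact formulation.

```python
def cluster_stats(points, labels):
    """One pass over the data: per-cluster count, vector sum, and sum of squared norms."""
    stats = {}
    for x, c in zip(points, labels):
        if c not in stats:
            stats[c] = {"n": 0, "sum": [0.0] * len(x), "sq": 0.0}
        s = stats[c]
        s["n"] += 1
        s["sum"] = [a + b for a, b in zip(s["sum"], x)]
        s["sq"] += sum(v * v for v in x)
    return stats


def total_sq_dist(x, s):
    """Sum of squared Euclidean distances from x to all points of a cluster.

    Uses sum_y ||x - y||^2 = n * ||x||^2 + sum_y ||y||^2 - 2 * x . sum_y y,
    so no pairwise distances are needed.
    """
    x_sq = sum(v * v for v in x)
    dot = sum(a * b for a, b in zip(x, s["sum"]))
    return s["n"] * x_sq + s["sq"] - 2.0 * dot


def silhouette_sq_euclidean(points, labels):
    """Mean silhouette under squared Euclidean distance, linear in the number of points."""
    stats = cluster_stats(points, labels)
    if len(stats) < 2:
        raise ValueError("silhouette needs at least two clusters")
    total = 0.0
    for x, c in zip(points, labels):
        own = stats[c]
        if own["n"] == 1:
            continue  # a singleton cluster contributes a silhouette of 0
        # Mean distance to the other members of the point's own cluster
        # (the point's distance to itself is 0, hence n - 1).
        a = total_sq_dist(x, own) / (own["n"] - 1)
        # Smallest mean distance to any other cluster.
        b = min(total_sq_dist(x, s) / s["n"] for k, s in stats.items() if k != c)
        denom = max(a, b)
        total += (b - a) / denom if denom > 0 else 0.0
    return total / len(points)
```

Each point is compared against k cluster summaries rather than against every other point, so the cost grows as O(N · k · d) for N points, k clusters and d dimensions. The per-cluster statistics are plain sums, which is what makes a distributed, map-reduce style computation plausible under this assumption.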

翻訳日:2023-03-27 13:53:04 公開日:2023-03-24

# Nuisance-extended Information Bottleneckによる複数信頼性対策の強化

Enhancing Multiple Reliability Measures via Nuisance-extended Information Bottleneck ( http://arxiv.org/abs/2303.14096v1 )

ライセンス: Link先を確認

Jongheon Jeong, Sihyun Yu, Hankook Lee, Jinwoo Shin

(参考訳) トレーニングデータが限られる現実のシナリオでは、データ内の多くの予測信号がデータ取得時のバイアスに由来する(つまり一般化しにくい)ことがあり、モデルがそのような(いわゆる)「ショートカット」信号に共適応することを防げない。その結果、モデルは様々な分布シフトに対して脆弱になる。このような障害モードを回避すべく、トレーニングにおけるより広い種類の摂動をカバーするために、相互情報量の制約の下で敵対的な脅威モデルを考える。これにより、標準的な情報ボトルネックを拡張して、ニュイサンス情報を追加でモデル化する動機が生まれる。この目的を実現するためのオートエンコーダベースのトレーニングと、畳み込み型とトランスフォーマー型の両方のアーキテクチャについて、提案するハイブリッド識別生成型トレーニングを容易にする実用的なエンコーダ設計を提案する。実験結果から、提案手法は複数の挑戦的な信頼性指標に関して、学習した表現の堅牢性を(注目すべきことにドメイン固有の知識を使わずに)向上させることが示された。例えば、我々のモデルは、最近の挑戦的なOBJECTSベンチマークの新規性検出において、AUROCで最先端を$78.4\% \rightarrow 87.2\%$に前進させ、同時に腐敗、背景、(証明された)敵対的ロバスト性の向上も享受できる。コードは https://github.com/jh-jeong/nuisance_ib で入手できる。

In practical scenarios where training data is limited, many predictive signals in the data can be rather from some biases in data acquisition (i.e., less generalizable), so that one cannot prevent a model from co-adapting on such (so-called) "shortcut" signals: this makes the model fragile in various distribution shifts. To bypass such failure modes, we consider an adversarial threat model under a mutual information constraint to cover a wider class of perturbations in training. This motivates us to extend the standard information bottleneck to additionally model the nuisance information. We propose an autoencoder-based training to implement the objective, as well as practical encoder designs to facilitate the proposed hybrid discriminative-generative training concerning both convolutional- and Transformer-based architectures. Our experimental results show that the proposed scheme improves robustness of learned representations (remarkably without using any domain-specific knowledge), with respect to multiple challenging reliability measures. For example, our model could advance the state-of-the-art on a recent challenging OBJECTS benchmark in novelty detection by $78.4\% \rightarrow 87.2\%$ in AUROC, while simultaneously enjoying improved corruption, background and (certified) adversarial robustness. Code is available at https://github.com/jh-jeong/nuisance_ib.

翻訳日:2023-03-27 13:52:53 公開日:2023-03-24

# パノVPR:パノラマを横切るスライディングウインドウによる一様視界から等角視界認識を目指して

PanoVPR: Towards Unified Perspective-to-Equirectangular Visual Place Recognition via Sliding Windows across the Panoramic View ( http://arxiv.org/abs/2303.14095v1 )

ライセンス: Link先を確認

Ze Shi, Hao Shi, Kailun Yang, Zhe Yin, Yining Lin, Kaiwei Wang

(参考訳) 近年、視覚位置認識は自動運転とロボット工学の重要な技術として注目を集めている。現在の主流のアプローチは、透視画像で透視画像を検索する(P2P)パラダイム、または等角形画像で等角形画像を検索する(E2E)パラダイムを使用する。しかし、自然で実用的な発想は、ユーザーは消費者向けのピンホールカメラしか持たず、それで取得した透視クエリ画像を、地図プロバイダのパノラマデータベース画像から検索するというものである。そこで我々はPanoVPRを提案する。PanoVPRは、スライディングウィンドウに基づく透視-等角形(P2E)視覚位置認識フレームワークであり、等角形画像全体にわたってウィンドウをスライドさせ、ウィンドウ間で特徴記述子を計算・比較することで、ハードクロップによる特徴の切り捨てを解消する。さらに、この統一フレームワークでは、透視-透視(P2P)手法で使用されるネットワーク構造を変更せずに直接転用できる。トレーニングと評価を容易にするため、pitts250kからpitts250k-P2Eデータセットを導出して有望な結果を得るとともに、モバイルロボットプラットフォームによる実世界シナリオのP2Eデータセットも構築し、これをYQ360と呼ぶ。コードとデータセットは https://github.com/zafirshi/PanoVPR で公開される予定である。

Visual place recognition has received increasing attention in recent years as a key technology in autonomous driving and robotics. The current mainstream approaches use either the perspective view retrieval perspective view (P2P) paradigm or the equirectangular image retrieval equirectangular image (E2E) paradigm. However, a natural and practical idea is that users only have consumer-grade pinhole cameras to obtain query perspective images and retrieve them in panoramic database images from map providers. To this end, we propose PanoVPR, a sliding-window-based perspective-to-equirectangular (P2E) visual place recognition framework, which eliminates feature truncation caused by hard cropping by sliding windows over the whole equirectangular image and computing and comparing feature descriptors between windows. In addition, this unified framework allows for directly transferring the network structure used in perspective-to-perspective (P2P) methods without modification. To facilitate training and evaluation, we derive the pitts250k-P2E dataset from the pitts250k and achieve promising results, and we also establish a P2E dataset in a real-world scenario by a mobile robot platform, which we refer to as YQ360. Code and datasets will be made available at https://github.com/zafirshi/PanoVPR.

翻訳日:2023-03-27 13:52:25 公開日:2023-03-24

# NeuFace:マルチビュー画像からのリアルな3Dニューラルフェイスレンダリング

NeuFace: Realistic 3D Neural Face Rendering from Multi-view Images ( http://arxiv.org/abs/2303.14092v1 )

ライセンス: Link先を確認

Mingwu Zheng, Haiyu Zhang, Hongyu Yang, Di Huang

(参考訳) マルチビュー画像からのリアルな顔レンダリングは、様々なコンピュータビジョンやグラフィックスのアプリケーションに有用である。しかし、顔の複雑な空間的に変化する反射特性と幾何学的特徴のため、顔の3次元表現を忠実かつ効率的に復元することは依然として困難である。本稿では、ニューラルレンダリング技術を用いて、正確で物理的に意味のある3次元表現を学習する、新しい3次元顔レンダリングモデルNeuFaceを提案する。NeuFaceはニューラルBRDFを物理ベースレンダリングに自然に組み込み、高度な顔形状と外観の手がかりを協調的に捉える。具体的には、近似BRDF積分と、単純ながら新しい低ランク事前分布を導入し、曖昧さを効果的に低減して顔BRDFの性能を高める。大規模な実験により、人間の顔レンダリングにおけるNeuFaceの優位性と、一般的な物体への良好な汎化能力が示された。

Realistic face rendering from multi-view images is beneficial to various computer vision and graphics applications. Due to the complex spatially-varying reflectance properties and geometry characteristics of faces, however, it remains challenging to recover 3D facial representations both faithfully and efficiently in the current studies. This paper presents a novel 3D face rendering model, namely NeuFace, to learn accurate and physically-meaningful underlying 3D representations by neural rendering techniques. It naturally incorporates the neural BRDFs into physically based rendering, capturing sophisticated facial geometry and appearance clues in a collaborative manner. Specifically, we introduce an approximated BRDF integration and a simple yet new low-rank prior, which effectively lower the ambiguities and boost the performance of the facial BRDFs. Extensive experiments demonstrate the superiority of NeuFace in human face rendering, along with a decent generalization ability to common objects.

翻訳日:2023-03-27 13:51:55 公開日:2023-03-24

# 暗黒物質からの流体力学シミュレーションのレクリエーションにおける物理インフォームニューラルネットワーク

Physics-informed neural networks in the recreation of hydrodynamic simulations from dark matter ( http://arxiv.org/abs/2303.14090v1 )

ライセンス: Link先を確認

Zhenyu Dai, Ben Moews, Ricardo Vilalta, Romeel Dave

(参考訳) 物理インフォームドニューラルネットワークは、統計的パターンとドメイン知識を組み合わせた予測モデルを構築するための一貫したフレームワークとして登場した。基本的な考え方は、既知の関係式で最適化の損失関数を強化し、可能な解の空間を制約することである。流体力学シミュレーションは現代の宇宙論の中核をなすが、必要な計算は高価で時間もかかる。一方、ダークマターのシミュレーションは比較的高速で必要なリソースも少ないため、バリオンインペインティングのための機械学習アルゴリズムが活発な研究領域として登場している。ここでは、流体力学シミュレーションで見られる散布(スキャッター)を再現することが現在も課題となっている。本稿では、バリオン変換効率に関する理論をモデルの損失関数に組み込み、ニューラルネットワークアーキテクチャの進歩と物理的制約を組み合わせることで、バリオンインペインティングへの物理インフォームドニューラルネットワークの最初の応用を示す。また、散布の再現を強制する、Kullback-Leiblerダイバージェンスに基づく懲罰的な予測比較も導入する。宇宙論的シミュレーションのSimbaスイートに対してバリオン特性の完全な集合を同時に抽出することにより、ダークマターハロー特性に基づくバリオン予測の精度向上、基本金属量関係の回復、およびターゲットシミュレーションの分布をたどる散布の再現を実証した。

Physics-informed neural networks have emerged as a coherent framework for building predictive models that combine statistical patterns with domain knowledge. The underlying notion is to enrich the optimization loss function with known relationships to constrain the space of possible solutions. Hydrodynamic simulations are a core constituent of modern cosmology, while the required computations are both expensive and time-consuming. At the same time, the comparatively fast simulation of dark matter requires fewer resources, which has led to the emergence of machine learning algorithms for baryon inpainting as an active area of research; here, recreating the scatter found in hydrodynamic simulations is an ongoing challenge. This paper presents the first application of physics-informed neural networks to baryon inpainting by combining advances in neural network architectures with physical constraints, injecting theory on baryon conversion efficiency into the model loss function. We also introduce a punitive prediction comparison based on the Kullback-Leibler divergence, which enforces scatter reproduction. By simultaneously extracting the complete set of baryonic properties for the Simba suite of cosmological simulations, our results demonstrate improved accuracy of baryonic predictions based on dark matter halo properties, successful recovery of the fundamental metallicity relation, and retrieve scatter that traces the target simulation's distribution.

翻訳日:2023-03-27 13:51:28 公開日:2023-03-24

# minddiffuser: 意味的および構造的拡散を伴うヒト脳活動からの画像再構成制御

MindDiffuser: Controlled Image Reconstruction from Human Brain Activity with Semantic and Structural Diffusion ( http://arxiv.org/abs/2303.14139v1 )

ライセンス: Link先を確認

Yizhuo Lu, Changde Du, Dianpeng Wang and Huiguang He

(参考訳) 機能的磁気共鳴イメージング(fmri)による視覚刺激の再構成は有意義かつ困難な課題である。従来の研究は、いくつかの自然画像の輪郭や大きさなど、原像に似た構造で復元に成功した。しかし、これらの再構成には明確な意味情報がなく、識別が難しい。近年、多くの研究は、より強力な生成能力を持つマルチモーダル事前学習モデルを用いて、本来のものと意味的に類似した画像を再構成している。しかし、これらの画像は位置や方向などの制御不能な構造情報を持っている。両課題を同時に解決するために,安定拡散を利用した2段階画像再構成モデルMindDiffuserを提案する。ステージ1では、VQ-VAE潜在表現とfMRIからデコードされたCLIPテキスト埋め込みを安定拡散のイメージ・ツー・イメージプロセスに配置し、セマンティックおよび構造情報を含む予備画像を生成する。ステージ2では、fMRIからデコードされた低レベルCLIP視覚特徴を監視情報として利用し、バックプロパゲーションによりステージ1の2つの特徴を継続的に調整し、構造情報を整列させる。定性的および定量的解析の結果から,提案モデルが自然景観データセット(NSD)の再構成結果において,現在の最先端モデルを上回っていることが示唆された。さらに, アブレーション実験の結果から, モデルの各成分が画像再構成に有効であることが示唆された。

Reconstructing visual stimuli from measured functional magnetic resonance imaging (fMRI) has been a meaningful and challenging task. Previous studies have successfully achieved reconstructions with structures similar to the original images, such as the outlines and size of some natural images. However, these reconstructions lack explicit semantic information and are difficult to discern. In recent years, many studies have utilized multi-modal pre-trained models with stronger generative capabilities to reconstruct images that are semantically similar to the original ones. However, these images have uncontrollable structural information such as position and orientation. To address both of the aforementioned issues simultaneously, we propose a two-stage image reconstruction model called MindDiffuser, utilizing Stable Diffusion. In Stage 1, the VQ-VAE latent representations and the CLIP text embeddings decoded from fMRI are put into the image-to-image process of Stable Diffusion, which yields a preliminary image that contains semantic and structural information. In Stage 2, we utilize the low-level CLIP visual features decoded from fMRI as supervisory information, and continually adjust the two features in Stage 1 through backpropagation to align the structural information. The results of both qualitative and quantitative analyses demonstrate that our proposed model has surpassed the current state-of-the-art models in terms of reconstruction results on Natural Scenes Dataset (NSD). Furthermore, the results of ablation experiments indicate that each component of our model is effective for image reconstruction.

翻訳日:2023-03-27 13:45:31 公開日:2023-03-24

# 医用画像解析における敵攻撃と防御:方法と応用

Adversarial Attack and Defense for Medical Image Analysis: Methods and Applications ( http://arxiv.org/abs/2303.14133v1 )

ライセンス: Link先を確認

Junhao Dong, Junxi Chen, Xiaohua Xie, Jianhuang Lai, and Hao Chen

(参考訳) 深層学習技術は, コンピュータ支援画像解析において優れた性能を示しているが, 相変わらず脆弱であり, 臨床における誤診の可能性を秘めている。近年では, 深部医療診断システムにおいて, これらの逆境に対する防御が顕著に進歩している。本論では, 新たな分類法を応用シナリオとして, 対人攻撃の進展と医療画像解析の防御に関する総合的な調査を行う。また,医療画像解析のための異なる種類の敵攻撃と防御方法のための統一的理論的枠組みも提供する。公平な比較のために,様々なシナリオ下での対人訓練により得られた対人的堅牢な医療診断モデルのための新しいベンチマークを構築した。我々の知る限りでは、逆向きに堅牢な医療診断モデルの徹底的な評価を提供する最初の調査論文である。質的,定量的な結果を分析することで,医用画像解析システムにおける敵攻撃と防御の課題を解明し,今後の研究の方向性を明らかにした。

Deep learning techniques have achieved superior performance in computer-aided medical image analysis, yet they are still vulnerable to imperceptible adversarial attacks, resulting in potential misdiagnosis in clinical practice. Oppositely, recent years have also witnessed remarkable progress in defense against these tailored adversarial examples in deep medical diagnosis systems. In this exposition, we present a comprehensive survey on recent advances in adversarial attack and defense for medical image analysis with a novel taxonomy in terms of the application scenario. We also provide a unified theoretical framework for different types of adversarial attack and defense methods for medical image analysis. For a fair comparison, we establish a new benchmark for adversarially robust medical diagnosis models obtained by adversarial training under various scenarios. To the best of our knowledge, this is the first survey paper that provides a thorough evaluation of adversarially robust medical diagnosis models. By analyzing qualitative and quantitative results, we conclude this survey with a detailed discussion of current challenges for adversarial attack and defense in medical image analysis systems to shed light on future research directions.

翻訳日:2023-03-27 13:45:07 公開日:2023-03-24

# 量子鎖の準粒子状態におけるシャノンエントロピー

Shannon entropy in quasiparticle states of quantum chains ( http://arxiv.org/abs/2303.14132v1 )

ライセンス: Link先を確認

Wentao Ye and Jiaju Zhang

(参考訳) 本稿では、自由ボソン鎖と自由フェルミオン鎖の準粒子励起状態、およびスピン1/2 XXX鎖の強磁性相において、全系とその部分系のシャノンエントロピーと、部分系間のシャノン相互情報量について検討する。我々は単粒子状態および二粒子状態に着目し、スケーリング極限における自由ボソン鎖とフェルミオン鎖の様々な解析式を導出する。これらの公式は、ある条件下でXXX鎖のマグノン励起状態にも適用できる。エンタングルメントエントロピーとは異なり、シャノンエントロピーは2つの準粒子の運動量差が大きい場合にも分離しないことがわかった。さらに、運動量差が大きい極限では、準粒子の半古典的描像では説明できない量子スピン鎖の普遍的な結果が得られる。

In this paper, we investigate the Shannon entropy of the total system and its subsystems, as well as the subsystem Shannon mutual information, in quasiparticle excited states of free bosonic and fermionic chains and the ferromagnetic phase of the spin-1/2 XXX chain. Our focus is on single-particle and double-particle states, and we derive various analytical formulas for free bosonic and fermionic chains in the scaling limit. These formulas are also applicable to magnon excited states in the XXX chain under certain conditions. We discover that, unlike entanglement entropy, Shannon entropy does not separate when two quasiparticles have a large momentum difference. Moreover, in the large momentum difference limit, we obtain universal results for quantum spin chains that cannot be explained by a semiclassical picture of quasiparticles.

翻訳日:2023-03-27 13:44:51 公開日:2023-03-24

# 貧乏の罪

The crime of being poor ( http://arxiv.org/abs/2303.14128v1 )

ライセンス: Link先を確認

Georgina Curto, Svetlana Kiritchenko, Isar Nejadgholi and Kathleen C. Fraser

(参考訳) 貧困の犯罪は、最も脆弱な人々に対する集団的偏見として広く非難されている。 ngoや国際機関は、貧困者が自らの状況で非難され、社会の富裕層よりも犯罪に関連し、貧乏であるために単に犯罪を犯すことが多いと主張している。貧困と全体犯罪率に相関する証拠は文献に見出されていないが、本稿は両概念を関連づけた集団的信念の証拠を提供する。この報告は、Twitterの自然言語処理(NLP)技術を用いて、富裕層と比較して犯罪と貧困層を関連付ける社会的バイアスを測定する。この論文は、8つの異なる英語圏のパネルで犯罪-貧困バイアスのレベルを定量化している。犯罪と貧困の関連性における地域差は、文学が財産犯罪と相関する不平等や失業のレベルによって正当化できない。地理的に異なる地域における犯罪・貧困バイアスの観測率の変動は、文化的要因や、特定の国における機会と社会的移動の平等を過大評価する傾向に影響される可能性がある。これらの結果は政策形成に影響を及ぼし、貧困軽減のための新たな研究の道を開き、貧困だけでなく社会全体にも焦点をあてる。貧困者に対する集団的偏見に基づいて行動することで、貧困削減政策の承認や、影響を受けた人々の尊厳の回復が促進される。

The criminalization of poverty has been widely denounced as a collective bias against the most vulnerable. NGOs and international organizations claim that the poor are blamed for their situation, are more often associated with criminal offenses than the wealthy strata of society and even incur criminal offenses simply as a result of being poor. While no evidence has been found in the literature that correlates poverty and overall criminality rates, this paper offers evidence of a collective belief that associates both concepts. This brief report measures the societal bias that correlates criminality with the poor, as compared to the rich, by using Natural Language Processing (NLP) techniques in Twitter. The paper quantifies the level of crime-poverty bias in a panel of eight different English-speaking countries. The regional differences in the association between crime and poverty cannot be justified based on different levels of inequality or unemployment, which the literature correlates to property crimes. The variation in the observed rates of crime-poverty bias for different geographic locations could be influenced by cultural factors and the tendency to overestimate the equality of opportunities and social mobility in specific countries. These results have consequences for policy-making and open a new path of research for poverty mitigation with the focus not only on the poor but on society as a whole. Acting on the collective bias against the poor would facilitate the approval of poverty reduction policies, as well as the restoration of the dignity of the persons affected.

翻訳日:2023-03-27 13:44:36 公開日:2023-03-24

# CIFAKE:AI生成合成画像の分類と説明可能な識別

CIFAKE: Image Classification and Explainable Identification of AI-Generated Synthetic Images ( http://arxiv.org/abs/2303.14126v1 )

ライセンス: Link先を確認

Jordan J. Bird, Ahmad Lotfi

(参考訳) 近年の合成データの技術進歩により、人間が実際の写真とAI(Artificial Intelligence)生成画像の違いを見分けられないほど高品質な画像が生成されるようになった。データの信頼性と認証が極めて重要であることを踏まえ、本論文はコンピュータビジョンによってAI生成画像を認識する能力を高めることを提案する。まず、既存のCIFAR-10データセットの10クラスを反映した合成データセットを潜在拡散によって生成し、実際の写真と比較するための対照的な画像セットとする。このモデルは、水面のフォトリアリスティックな反射のような複雑な視覚的属性を生成できる。これら2つのデータセットは、写真が本物かAIによって生成されたものかという二値分類問題をなす。そこで本研究では、畳み込みニューラルネットワーク(CNN)を用いて、画像をRealとFakeの2つのカテゴリに分類することを提案する。ハイパーパラメータチューニングと36種類のネットワークトポロジのトレーニングの後、最適なアプローチは92.98%の精度で画像を正しく分類できた。最後に、Gradient Class Activation Mappingによる説明可能なAIを実装し、画像内のどの特徴が分類に有用かを調べる。解釈の結果、特に、被写体そのものは分類に有用な情報を持たず、モデルは画像の背景にある小さな視覚的欠陥に注目していることがわかった。本研究のために構築された完全なデータセットはCIFAKEデータセットと呼ばれ、将来の研究のために研究コミュニティに公開されている。

Recent technological advances in synthetic data have enabled the generation of images with such high quality that human beings cannot tell the difference between real-life photographs and Artificial Intelligence (AI) generated images. Given the critical necessity of data reliability and authentication, this article proposes to enhance our ability to recognise AI-generated images through computer vision. Initially, a synthetic dataset is generated that mirrors the ten classes of the already available CIFAR-10 dataset with latent diffusion which provides a contrasting set of images for comparison to real photographs. The model is capable of generating complex visual attributes, such as photorealistic reflections in water. The two sets of data present as a binary classification problem with regard to whether the photograph is real or generated by AI. This study then proposes the use of a Convolutional Neural Network (CNN) to classify the images into two categories; Real or Fake. Following hyperparameter tuning and the training of 36 individual network topologies, the optimal approach could correctly classify the images with 92.98% accuracy. Finally, this study implements explainable AI via Gradient Class Activation Mapping to explore which features within the images are useful for classification. Interpretation reveals interesting concepts within the image, in particular, noting that the actual entity itself does not hold useful information for classification; instead, the model focuses on small visual imperfections in the background of the images. The complete dataset engineered for this study, referred to as the CIFAKE dataset, is made publicly available to the research community for future work.

翻訳日:2023-03-27 13:44:12 公開日:2023-03-24

# 多様なビデオのためのスケーラブルなニューラル表現に向けて

Towards Scalable Neural Representation for Diverse Videos ( http://arxiv.org/abs/2303.14124v1 )

ライセンス: Link先を確認

Bo He, Xitong Yang, Hanyu Wang, Zuxuan Wu, Hao Chen, Shuaiyi Huang, Yixuan Ren, Ser-Nam Lim, Abhinav Shrivastava

(参考訳) Implicit Neural Representations (INR)は、3Dシーンや画像の表現で注目を集めており、最近ではビデオの符号化(例えばNeRV、E-NeRV)にも応用されている。有望な結果を達成する一方で、既存のINRベースの手法は、冗長な視覚コンテンツを持つ少数の短いビデオ(UVGデータセットの7本の5秒ビデオなど)の符号化に限定されており、その結果、個々のビデオフレームを独立にフィットさせ、多数の多様なビデオへ効率よく拡張できないモデル設計になっている。本稿では、より実用的な設定、すなわち多様な視覚コンテンツを含む長いビデオや多数のビデオを符号化するためのニューラル表現の開発に着目する。まず、ビデオを小さなサブセットに分割して別々のモデルで符号化する代わりに、長く多様なビデオを統一されたモデルでまとめて符号化することで、より良い圧縮結果が得られることを示す。この観察に基づき、多様なビデオを符号化するよう設計された新しいニューラル表現フレームワークD-NeRVを提案する。D-NeRVは、(i)クリップ固有の視覚コンテンツを動き情報から分離し、(ii)暗黙的ニューラルネットワークに時間的推論を導入し、(iii)タスク指向のフローを中間出力として用いて空間的冗長性を低減する。我々の新しいモデルは、ビデオ圧縮タスクにおいて、UCF101およびUVGデータセット上でNeRVおよび従来のビデオ圧縮技術を大きく上回る。さらに、効率的なデータローダとして使用する場合、同じ圧縮比のもとで、UCF101データセット上の行動認識タスクにおいてD-NeRVはNeRVよりも3%-10%高い精度を達成する。

Implicit neural representations (INR) have gained increasing attention in representing 3D scenes and images, and have been recently applied to encode videos (e.g., NeRV, E-NeRV). While achieving promising results, existing INR-based methods are limited to encoding a handful of short videos (e.g., seven 5-second videos in the UVG dataset) with redundant visual content, leading to a model design that fits individual video frames independently and is not efficiently scalable to a large number of diverse videos. This paper focuses on developing neural representations for a more practical setup -- encoding long and/or a large number of videos with diverse visual content. We first show that instead of dividing videos into small subsets and encoding them with separate models, encoding long and diverse videos jointly with a unified model achieves better compression results. Based on this observation, we propose D-NeRV, a novel neural representation framework designed to encode diverse videos by (i) decoupling clip-specific visual content from motion information, (ii) introducing temporal reasoning into the implicit neural network, and (iii) employing the task-oriented flow as intermediate output to reduce spatial redundancies. Our new model largely surpasses NeRV and traditional video compression techniques on UCF101 and UVG datasets on the video compression task. Moreover, when used as an efficient data-loader, D-NeRV achieves 3%-10% higher accuracy than NeRV on action recognition tasks on the UCF101 dataset under the same compression ratios.

翻訳日:2023-03-27 13:43:48 公開日:2023-03-24

# Few-Shot画像認識のための意味プロンプト

Semantic Prompt for Few-Shot Image Recognition ( http://arxiv.org/abs/2303.14123v1 )

ライセンス: Link先を確認

Wentao Chen, Chenyang Si, Zhang Zhang, Liang Wang, Zilei Wang, Tieniu Tan

(参考訳) Few-shot学習は、新しいクラスを認識するために少数の例しか与えられないため、難しい問題である。いくつかの最近の研究は、クラス名のテキスト埋め込みのような追加のセマンティック情報を利用し、セマンティックプロトタイプとビジュアルプロトタイプを組み合わせることで、サンプルが少ないという問題に対処している。しかし、これらの手法は、少数のサポートサンプルから学習された見かけ上の(スプリアスな)視覚的特徴に依然として悩まされ、得られる利益は限られている。本稿では、Few-shot学習のための新しいSemantic Prompt(SP)手法を提案する。セマンティック情報を分類器の補正に単純に利用する代わりに、セマンティック情報をプロンプトとして活用し、視覚特徴抽出ネットワークを適応的にチューニングすることを検討する。具体的には、特徴抽出器にセマンティックプロンプトを挿入する2つの相補的な機構を設計する。1つは、自己注意を介して空間次元に沿ったセマンティックプロンプトとパッチ埋め込みの相互作用を可能にするものであり、もう1つは、変換されたセマンティックプロンプトでチャネル次元に沿って視覚的特徴を補うものである。これら2つの機構を組み合わせることで、特徴抽出器はクラス固有の特徴により良く注意を向けられるようになり、少数のサポートサンプルだけでより汎化された画像表現を得る。4つのデータセットに関する広範な実験を通じて、提案手法は有望な結果を達成し、1-shot学習の精度を平均3.67%向上させる。

Few-shot learning is a challenging problem since only a few examples are provided to recognize a new class. Several recent studies exploit additional semantic information, e.g. text embeddings of class names, to address the issue of rare samples through combining semantic prototypes with visual prototypes. However, these methods still suffer from the spurious visual features learned from the rare support samples, resulting in limited benefits. In this paper, we propose a novel Semantic Prompt (SP) approach for few-shot learning. Instead of the naive exploitation of semantic information for remedying classifiers, we explore leveraging semantic information as prompts to tune the visual feature extraction network adaptively. Specifically, we design two complementary mechanisms to insert semantic prompts into the feature extractor: one is to enable the interaction between semantic prompts and patch embeddings along the spatial dimension via self-attention, another is to supplement visual features with the transformed semantic prompts along the channel dimension. By combining these two mechanisms, the feature extractor presents a better ability to attend to the class-specific features and obtains more generalized image representations with merely a few support samples. Through extensive experiments on four datasets, the proposed approach achieves promising results, improving the 1-shot learning accuracy by 3.67% on average.

翻訳日:2023-03-27 13:43:24 公開日:2023-03-24

# 分子オブザーバのフォールトトレラント量子計算

Fault-tolerant quantum computation of molecular observables ( http://arxiv.org/abs/2303.14118v1 )

ライセンス: Link先を確認

Mark Steudtner, Sam Morley-Short, William Pol, Sukin Sim, Cristian L. Cortes, Matthias Loipersberger, Robert M. Parrish, Matthias Degroote, Nikolaj Moll, Raffaele Santagati, Michael Streif

(参考訳) 過去30年間で、量子コンピュータを用いて分子ハミルトニアンの基底状態エネルギーを推定するコストは大幅に削減された。しかし、多くの産業用途において重要である、基底状態に関する他の可観測量の期待値の推定には、比較的注意が払われてこなかった。本研究では、系の任意の固有状態に関する任意の可観測量の期待値の推定に適用可能な、新しい期待値推定(EVE)量子アルゴリズムを提案する。特に、標準的な量子位相推定に基づくstd-EVEと、量子信号処理(QSP)技術を用いるQSP-EVEの2つの変種を考える。両変種について厳密な誤差解析を行い、QSP-EVEについては個々の位相因子の数を最小化する。これらの誤差解析により、様々な分子系と可観測量にわたって、std-EVEとQSP-EVEの両方について定数係数まで含めた量子リソース推定を行うことができる。検討した系では、QSP-EVEはstd-EVEに比べて(Toffoli)ゲート数を最大3桁、量子ビット幅を最大25%削減することを示す。推定されたリソース数は第1世代のフォールトトレラント量子コンピュータにとっては依然として高すぎるが、我々の推定は、期待値推定の応用と最新のQSPベースの技術の両方について、この種のものとして初めてのものである。

Over the past three decades significant reductions have been made to the cost of estimating ground-state energies of molecular Hamiltonians with quantum computers. However, comparatively little attention has been paid to estimating the expectation values of other observables with respect to said ground states, which is important for many industrial applications. In this work we present a novel expectation value estimation (EVE) quantum algorithm which can be applied to estimate the expectation values of arbitrary observables with respect to any of the system's eigenstates. In particular, we consider two variants of EVE: std-EVE, based on standard quantum phase estimation, and QSP-EVE, which utilizes quantum signal processing (QSP) techniques. We provide rigorous error analysis for both variants and minimize the number of individual phase factors for QSP-EVE. These error analyses enable us to produce constant-factor quantum resource estimates for both std-EVE and QSP-EVE across a variety of molecular systems and observables. For the systems considered, we show that QSP-EVE reduces (Toffoli) gate counts by up to three orders of magnitude and reduces qubit width by up to 25% compared to std-EVE. While estimated resource counts remain far too high for the first generations of fault-tolerant quantum computers, our estimates mark a first of their kind for both the application of expectation value estimation and modern QSP-based techniques.

翻訳日:2023-03-27 13:42:46 公開日:2023-03-24

# 基礎・応用研究から見た注意機構による予測性能とモデル解釈可能性の向上

Improving Prediction Performance and Model Interpretability through Attention Mechanisms from Basic and Applied Research Perspectives ( http://arxiv.org/abs/2303.14116v1 )

ライセンス: Link先を確認

Shunsuke Kitada

(参考訳) ディープラーニング技術の劇的な進歩により、機械学習の研究は、モデル予測の解釈可能性の向上と、基礎研究と応用研究の両方における予測性能の向上に注力している。ディープラーニングモデルは従来の機械学習モデルよりもはるかに高い予測性能を持つが、特定の予測プロセスは解釈や説明が難しい。これは機械学習モデルのブラックボックス化として知られており、製造業、商業、ロボット工学などの幅広い研究分野において、そのような技術の使用が一般的になっている産業や、ミスを許容しない医療分野などにおいて、特に重要な問題として認識されている。この論文は著者の論文の要約に基づいている。論文の中で要約された研究は、近年注目されている注意機構に焦点をあて、予測性能と解釈可能性の向上の観点から基礎研究の可能性について論じ、実験室環境を超えて大規模なデータセットを用いて実世界の応用に応用した研究を行った。この論文はまた、これらの発見がその後の研究や今後の分野の展望にもたらす意味をまとめて締めくくっている。

With the dramatic advances in deep learning technology, machine learning research is focusing on improving the interpretability of model predictions as well as prediction performance in both basic and applied research. While deep learning models have much higher prediction performance than traditional machine learning models, the specific prediction process is still difficult to interpret and/or explain. This is known as the black-boxing of machine learning models and is recognized as a particularly important problem in a wide range of research fields, including manufacturing, commerce, robotics, and other industries where the use of such technology has become commonplace, as well as the medical field, where mistakes are not tolerated. This bulletin is based on the summary of the author's dissertation. The research summarized in the dissertation focuses on the attention mechanism, which has been the focus of much attention in recent years, and discusses its potential for both basic research in terms of improving prediction performance and interpretability, and applied research in terms of evaluating it for real-world applications using large data sets beyond the laboratory environment. The dissertation also concludes with a summary of the implications of these findings for subsequent research and future prospects in the field.

翻訳日:2023-03-27 13:42:23 公開日:2023-03-24

# 逆の例を見つけるのに何次元が必要か?

How many dimensions are required to find an adversarial example? ( http://arxiv.org/abs/2303.14173v1 )

ライセンス: Link先を確認

Charles Godfrey, Henry Kvinge, Elise Bishoff, Myles Mckay, Davis Brown, Tim Doster, and Eleanor Byler

(参考訳) 敵対的脆弱性を探究する過去の研究は、敵対者がモデル入力のすべての次元を摂動できる状況に焦点を当ててきた。一方、近年のいくつかの研究は、敵対者が (i) 限られた数の入力パラメータ、または (ii) マルチモーダル問題におけるモダリティの部分集合のみを摂動できる場合を考えている。どちらの場合も、敵対的サンプルは、周囲の入力空間$\mathcal{X}$内の部分空間$V$に実質的に制約される。これに動機づけられ、本研究では敵対的脆弱性が$\dim(V)$にどのように依存するかを調べる。特に、$\ell^p$ノルム制約を持つ標準的なPGD攻撃の成功率は、$\epsilon (\frac{\dim(V)}{\dim \mathcal{X}})^{\frac{1}{q}}$の単調増加関数のように振る舞うことを示す。ここで$\epsilon$は摂動の予算であり、$\frac{1}{p} + \frac{1}{q} =1$かつ$p > 1$とする($p=1$の場合には追加の微妙な点があり、これについても詳しく分析する)。この関数形は単純なトイ線形モデルから容易に導出でき、したがって我々の結果は、敵対的サンプルは高次元空間上の局所線形モデルに固有のものであるという議論をさらに裏付ける。

Past work exploring adversarial vulnerability has focused on situations where an adversary can perturb all dimensions of model input. On the other hand, a range of recent works consider the case where an adversary can perturb either (i) a limited number of input parameters or (ii) a subset of modalities in a multimodal problem. In both of these cases, adversarial examples are effectively constrained to a subspace $V$ in the ambient input space $\mathcal{X}$. Motivated by this, in this work we investigate how adversarial vulnerability depends on $\dim(V)$. In particular, we show that the adversarial success of standard PGD attacks with $\ell^p$ norm constraints behaves like a monotonically increasing function of $\epsilon (\frac{\dim(V)}{\dim \mathcal{X}})^{\frac{1}{q}}$ where $\epsilon$ is the perturbation budget and $\frac{1}{p} + \frac{1}{q} =1$, provided $p > 1$ (the case $p=1$ presents additional subtleties which we analyze in some detail). This functional form can be easily derived from a simple toy linear model, and as such our results lend further credence to arguments that adversarial examples are endemic to locally linear models on high dimensional spaces.
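
The functional form above can be checked numerically on a toy linear model. The sketch below is an illustration under stated assumptions, not the authors' code: it assumes a linear score $w \cdot x$ with equal-magnitude weights, takes $V$ to be the span of the first `k` coordinates, and uses Hölder's inequality, which gives the largest change in score under an $\ell^p$ budget $\epsilon$ as $\epsilon \|w\|_q$. The function names and the sizes `d`, `k` are hypothetical.

```python
import numpy as np

def max_linear_gain(w, eps, p):
    """Largest value of w . delta over ||delta||_p <= eps.

    By Hoelder's inequality this equals eps * ||w||_q with 1/p + 1/q = 1
    (valid for p > 1).
    """
    q = p / (p - 1.0)
    return eps * np.linalg.norm(w, ord=q)

def restricted_gain(w, k, eps, p):
    """Same maximum when delta may only change the first k coordinates."""
    return max_linear_gain(w[:k], eps, p)

# Toy setting (all values are assumptions for illustration).
d, k, eps, p = 1024, 256, 0.1, 2.0
w = np.ones(d)                      # equal-magnitude weights
q = p / (p - 1.0)

ratio = restricted_gain(w, k, eps, p) / max_linear_gain(w, eps, p)
predicted = (k / d) ** (1.0 / q)    # the (dim V / dim X)^(1/q) factor
print(ratio, predicted)
```

In this toy case the attack strength on the subspace relative to the full space matches the $(\dim(V)/\dim \mathcal{X})^{1/q}$ factor; for weights of unequal magnitude the match is only approximate.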

翻訳日:2023-03-27 13:36:28 公開日:2023-03-24

# 局在軌道間の物理的絡み合い

Physical Entanglement Between Localized Orbitals ( http://arxiv.org/abs/2303.14170v1 )

ライセンス: Link先を確認

Lexin Ding, Gesa Dünnweber, Christian Schilling

(参考訳) [arXiv:2207.03377]では、現実的な電子系に適用可能な忠実絡み合い尺度の最初の閉じた公式が導出された。本研究は,量子技術開発を導くという究極の目標をもって,この重要な成果を生かしたものである。そのため、まず原子、分子、固体などの電子系における絡み合い交換の過程を明らかにする。このことは、局所化された小軌道サブシステムへの参照と、数値パリティ選択規則の実装の両方の必要性を明確に示している。したがって、ウィックの定理により、自由電子鎖の部位間の真の物理的絡み合いの完全な解析的研究を行う。その意味では、そのような分析分析を単位不変な設定、すなわち鎖をより非現実的でマクロ的に大きなサブシステムに分割することを制限する共通のパラダイムを破る。次に、このモデルを相互作用する電子の水素環にアップグレードし、探索された局在軌道を構築する。両システムとも,充填率が十分に低い、または高い場合,長距離絡み合いの存在が確認される。

In [arXiv:2207.03377] the first closed formula of a faithful entanglement measure applicable to realistic electron systems has been derived. In the present work, we build on this key achievement with the ultimate goal of guiding the development of quantum technologies. For this, we first elucidate the process of entanglement swapping in electron systems such as atoms, molecules or solid bodies. This clearly demonstrates the necessity of both the reference to localized few-orbital subsystems and the implementation of the number-parity superselection rule. Accordingly, by virtue of Wick's theorem, we then provide a fully analytical study of the true physical entanglement between sites in free electron chains. In that sense, we break the common paradigm of restricting such analytical analyses to unitarily invariant settings, i.e. bipartitions of the chain into rather impractical, macroscopically large subsystems. We then upgrade this model to a hydrogen ring of interacting electrons and construct the sought-after localized orbitals. For both systems, we confirm the presence of long-distance entanglement, provided the filling fractions are sufficiently low/high.

翻訳日:2023-03-27 13:35:56 公開日:2023-03-24

# 都市GIRAFFE:構成生成型ニューラル特徴場としての都市景観の表現

UrbanGIRAFFE: Representing Urban Scenes as Compositional Generative Neural Feature Fields ( http://arxiv.org/abs/2303.14167v1 )

ライセンス: Link先を確認

Yuanbo Yang, Yifei Yang, Hanlei Guo, Rong Xiong, Yue Wang, Yiyi Liao

(参考訳) AR/VRやシミュレーションを含む多くのアプリケーションにおいて、カメラポーズやシーン内容の制御が可能なフォトリアリスティック画像の生成が不可欠である。 3D認識生成モデルで急速に進歩しているにもかかわらず、既存の手法のほとんどはオブジェクト中心の画像に焦点を当てており、自由カメラ視点制御やシーン編集のための都市シーンの生成には適用できない。そこで本稿では,難易度の高い3dパンオプティクスを用いた3d認識生成モデルを導出するために,可算物と可算物体のレイアウト分布を含む粗い3dパンオプティクスを用いた都市giraffeを提案する。私たちのモデルは、シーンを物、物、空に分解するので、構成と制御が可能です。セマンティクスボクセルグリッド(semantic voxel grids)の形式に先立って、粗いセマンティクスと幾何情報を効果的に組み込んだ条件付き生成器を構築します。事前のオブジェクトレイアウトにより、散らかったシーンからオブジェクトジェネレータを学ぶことができます。適切な損失関数により,大規模なカメラの動き,物体の編集,物体の操作など,様々な制御性を持つ光リアルな3D認識画像合成が容易となる。 kitti-360データセットを含む合成データと実世界のデータセットの両方において,モデルの有効性を検証する。

Generating photorealistic images with controllable camera pose and scene contents is essential for many applications including AR/VR and simulation. Despite the fact that rapid progress has been made in 3D-aware generative models, most existing methods focus on object-centric images and are not applicable to generating urban scenes for free camera viewpoint control and scene editing. To address this challenging task, we propose UrbanGIRAFFE, which uses a coarse 3D panoptic prior, including the layout distribution of uncountable stuff and countable objects, to guide a 3D-aware generative model. Our model is compositional and controllable as it breaks down the scene into stuff, objects, and sky. Using stuff prior in the form of semantic voxel grids, we build a conditioned stuff generator that effectively incorporates the coarse semantic and geometry information. The object layout prior further allows us to learn an object generator from cluttered scenes. With proper loss functions, our approach facilitates photorealistic 3D-aware image synthesis with diverse controllability, including large camera movement, stuff editing, and object manipulation. We validate the effectiveness of our model on both synthetic and real-world datasets, including the challenging KITTI-360 dataset.

翻訳日:2023-03-27 13:35:40 公開日:2023-03-24

# BundleSDF:ニューラル6-DoF追跡と未知物体の3次元再構成

BundleSDF: Neural 6-DoF Tracking and 3D Reconstruction of Unknown Objects ( http://arxiv.org/abs/2303.14158v1 )

ライセンス: Link先を確認

Bowen Wen, Jonathan Tremblay, Valts Blukis, Stephen Tyree, Thomas Muller, Alex Evans, Dieter Fox, Jan Kautz, Stan Birchfield

(参考訳) 本稿では,単眼RGBDビデオシーケンスから未知物体の6-DoF追跡をほぼリアルタイムに行うとともに,物体のニューラル3D再構成を行う。視覚的テクスチャがほとんど欠如している場合でも,任意の剛体オブジェクトに対して有効である。オブジェクトは第1フレームのみにセグメント化されていると仮定される。追加情報は不要で、相互作用エージェントに関する仮定は不要である。提案手法の鍵となるのは,形状と外観の両方を捉える一貫した3次元表現にロバストに情報を蓄積するために,ポーズグラフ最適化プロセスと並行して学習するニューラルオブジェクトフィールドである。これらのスレッド間の通信を容易にするために、ポーズ付きメモリフレームの動的プールが自動的に維持される。提案手法では,大きなポーズ変化,部分的および完全閉塞,無テクスチャ面,鏡面ハイライトなどの課題に対処する。 HO3D、YCBInEOAT、BEHAVEデータセットの結果を示し、この手法が既存のアプローチを大きく上回ることを示した。プロジェクトページ: https://bundlesdf.github.io

We present a near real-time method for 6-DoF tracking of an unknown object from a monocular RGBD video sequence, while simultaneously performing neural 3D reconstruction of the object. Our method works for arbitrary rigid objects, even when visual texture is largely absent. The object is assumed to be segmented in the first frame only. No additional information is required, and no assumption is made about the interaction agent. Key to our method is a Neural Object Field that is learned concurrently with a pose graph optimization process in order to robustly accumulate information into a consistent 3D representation capturing both geometry and appearance. A dynamic pool of posed memory frames is automatically maintained to facilitate communication between these threads. Our approach handles challenging sequences with large pose changes, partial and full occlusion, untextured surfaces, and specular highlights. We show results on HO3D, YCBInEOAT, and BEHAVE datasets, demonstrating that our method significantly outperforms existing approaches. Project page: https://bundlesdf.github.io

翻訳日:2023-03-27 13:35:16 公開日:2023-03-24

# カラムローアンタングル型画素合成による高効率スケール不変発電機

Efficient Scale-Invariant Generator with Column-Row Entangled Pixel Synthesis ( http://arxiv.org/abs/2303.14157v1 )

ライセンス: Link先を確認

Thuan Hoang Nguyen, Thanh Van Le, Anh Tran

(参考訳) 任意のスケールの画像合成は、任意のスケールで写真リアルな画像を合成する、効率的でスケーラブルなソリューションを提供する。しかし、既存のGANベースのソリューションは畳み込みと階層アーキテクチャに過度に依存するため、出力解像度をスケールする際、不整合と$``$texture sticking$"$問題が発生する。別の観点では、INRベースのジェネレータは設計によってスケール等価であるが、その巨大なメモリフットプリントと遅い推論は、大規模またはリアルタイムシステムでこれらのネットワークを採用することを妨げている。本研究では,空間的畳み込みやcoarse-to-fine設計を使わずに,効率的かつスケール等価な新しい生成モデルである$\textbf{C}$olumn-$\textbf{R}$ow $\textbf{E}$ntangled $\textbf{P}$ixel $\textbf{S}$ynthesis ($\textbf{CREPS}$)を提案する。メモリフットプリントを節約し、システムをスケーラブルにするために、レイヤ毎の機能マップを$``$thick$"$カラムと行エンコーディングに分割する、新しい双方向表現を採用しました。 FFHQ、LSUN-Church、MetFaces、Flickr-Sceneryといったさまざまなデータセットの実験では、CREPSが適切なトレーニングと推論速度で任意の解像度でスケール一貫性とエイリアスのない画像を合成する能力を確認している。コードは https://github.com/VinAIResearch/CREPS から入手できる。

Any-scale image synthesis offers an efficient and scalable solution to synthesize photo-realistic images at any scale, even going beyond 2K resolution. However, existing GAN-based solutions depend excessively on convolutions and a hierarchical architecture, which introduce inconsistency and the $``$texture sticking$"$ issue when scaling the output resolution. From another perspective, INR-based generators are scale-equivariant by design, but their huge memory footprint and slow inference hinder these networks from being adopted in large-scale or real-time systems. In this work, we propose $\textbf{C}$olumn-$\textbf{R}$ow $\textbf{E}$ntangled $\textbf{P}$ixel $\textbf{S}$ynthesis ($\textbf{CREPS}$), a new generative model that is both efficient and scale-equivariant without using any spatial convolutions or coarse-to-fine design. To save memory footprint and make the system scalable, we employ a novel bi-line representation that decomposes layer-wise feature maps into separate $``$thick$"$ column and row encodings. Experiments on various datasets, including FFHQ, LSUN-Church, MetFaces, and Flickr-Scenery, confirm CREPS' ability to synthesize scale-consistent and alias-free images at any arbitrary resolution with proper training and inference speed. Code is available at https://github.com/VinAIResearch/CREPS.

翻訳日:2023-03-27 13:34:57 公開日:2023-03-24

# 医用画像認識のための局所コントラスト学習

Local Contrastive Learning for Medical Image Recognition ( http://arxiv.org/abs/2303.14153v1 )

ライセンス: Link先を確認

S. A. Rizvi, R. Tang, X. Jiang, X. Ma, X. Hu

(参考訳) 放射線画像解析におけるDeep Learning (DL) を用いた手法の普及は,専門家による放射線学データに対する大きな需要を生み出している。最近の自己監督型フレームワークは、関連する放射線学レポートから専門家のラベル付けの必要性を軽減している。しかし、これらのフレームワークは、医学画像の異なる病理の微妙な違いを区別するのに苦労している。さらに、それらの多くは画像領域とテキストの解釈を提供しておらず、放射線科医がモデル予測を評価するのが困難である。本研究では,画像領域選択のためのレイヤの追加と相互モダリティの相互作用を目的とした,フレキシブルな微調整フレームワークであるLRCLRを提案する。胸部x線検査の結果から,lrclrは重要な局所画像領域を同定し,胸部x線医学的所見のゼロショット性能を改善しつつ,放射線学的テキストに対して有意義な解釈を行っていることが示唆された。

The proliferation of Deep Learning (DL)-based methods for radiographic image analysis has created a great demand for expert-labeled radiology data. Recent self-supervised frameworks have alleviated the need for expert labeling by obtaining supervision from associated radiology reports. These frameworks, however, struggle to distinguish the subtle differences between different pathologies in medical images. Additionally, many of them do not provide interpretation between image regions and text, making it difficult for radiologists to assess model predictions. In this work, we propose Local Region Contrastive Learning (LRCLR), a flexible fine-tuning framework that adds layers for significant image region selection as well as cross-modality interaction. Our results on an external validation set of chest x-rays suggest that LRCLR identifies significant local image regions and provides meaningful interpretation against radiology text while improving zero-shot performance on several chest x-ray medical findings.

翻訳日:2023-03-27 13:34:20 公開日:2023-03-24

# 幻想的な破片:現実世界の壊れた物体とその完全なカウンターの3Dスキャンデータ

Fantastic Breaks: A Dataset of Paired 3D Scans of Real-World Broken Objects and Their Complete Counterparts ( http://arxiv.org/abs/2303.14152v1 )

ライセンス: Link先を確認

Nikolas Lamb, Cameron Palmer, Benjamin Molloy, Sean Banerjee, Natasha Kholgade Banerjee

(参考訳) 自動形状修正アプローチは現在、現実世界の損傷形状を記述するデータセットへのアクセスを欠いている。 https://terascale-all-sensing-research-studio.github.io/fantasticbreaks)は、78個の壊れたオブジェクトのスキャン、防水、クリーンな3dメッシュを含むデータセット。 Fantastic Breaksには、クラスとマテリアルラベル、壊れたメッシュに結合して完全なメッシュを生成する修復部品の合成プロキシ、手動で注釈付き骨折境界が含まれている。フラクチャー幾何学の詳細な解析を通して, ファンタスティック・ブレークと幾何学的および物理的手法を用いて生成した合成破砕物のデータセットの違いを明らかにする。本稿では,Fantastic Breaks を用いた形状修復実験の結果を,合成データセットを用いて事前学習し,Fantastic Breaks のサブセットを用いて再訓練した。

Automated shape repair approaches currently lack access to datasets that describe real-world damage geometry. We present Fantastic Breaks (and Where to Find Them: https://terascale-all-sensing-research-studio.github.io/FantasticBreaks), a dataset containing scanned, waterproofed, and cleaned 3D meshes for 78 broken objects, paired and geometrically aligned with complete counterparts. Fantastic Breaks contains class and material labels, synthetic proxies of repair parts that join to broken meshes to generate complete meshes, and manually annotated fracture boundaries. Through a detailed analysis of fracture geometry, we reveal differences between Fantastic Breaks and datasets of synthetically fractured objects generated using geometric and physics-based methods. We show experimental results of shape repair with Fantastic Breaks using multiple learning-based approaches pre-trained using a synthetic dataset and re-trained using a subset of Fantastic Breaks.

翻訳日:2023-03-27 13:34:04 公開日:2023-03-24

# double descent demystified: 深層学習パズルの源を同定、解釈、補間する

Double Descent Demystified: Identifying, Interpreting & Ablating the Sources of a Deep Learning Puzzle ( http://arxiv.org/abs/2303.14151v1 )

ライセンス: Link先を確認

Rylan Schaeffer, Mikail Khona, Zachary Robertson, Akhilan Boopathy, Kateryna Pistunova, Jason W. Rocks, Ila Rani Fiete, Oluwasanmi Koyejo

(参考訳) ダブル降下は機械学習において驚くべき現象であり、モデルパラメータ数がデータ数に対して増加するにつれて、モデルが大きくなり、テストエラーが減少し、高度に過大にパラメータ化(データサンプル化)される。このテストエラーの減少は、オーバーフィッティングに関する古典的な学習理論に反し、機械学習における大きなモデルの成功を暗示している。このテスト損失の非単調な振る舞いは、データの数、データの次元性、モデルパラメータの数に依存する。ここでは、二重降下を簡潔に記述し、なぜ二重降下が非公式で接近可能な方法で起こるのかを説明し、線型代数と導入確率にのみ親しむ必要がある。多項式回帰を用いた視覚的直観を提供し、次に通常の線形回帰を用いて2重降下を数学的に解析し、同時に3つの解釈可能な因子を同定する。通常の線形回帰を用いた場合, 2重降下は実データ上で起こることを実証し, いずれかの因子が崩壊しても2重降下は起こらないことを示した。重ね合わせと二重降下に関する非線形モデルにおける最近の観測に光を当てるために、この理解を用いる。コードは公開されている。

Double descent is a surprising phenomenon in machine learning, in which as the number of model parameters grows relative to the number of data, test error drops as models grow ever larger into the highly overparameterized (data undersampled) regime. This drop in test error flies against classical learning theory on overfitting and has arguably underpinned the success of large models in machine learning. This non-monotonic behavior of test loss depends on the number of data, the dimensionality of the data and the number of model parameters. Here, we briefly describe double descent, then provide an explanation of why double descent occurs in an informal and approachable manner, requiring only familiarity with linear algebra and introductory probability. We provide visual intuition using polynomial regression, then mathematically analyze double descent with ordinary linear regression and identify three interpretable factors that, when simultaneously all present, together create double descent. We demonstrate that double descent occurs on real data when using ordinary linear regression, then demonstrate that double descent does not occur when any of the three factors are ablated. We use this understanding to shed light on recent observations in nonlinear models concerning superposition and double descent. Code is publicly available.
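
The ordinary-linear-regression setting described above can be reproduced in a few lines. The following is a minimal sketch, not the authors' released code: it fits the minimum-norm least-squares solution with `numpy.linalg.pinv` and compares test error when the number of training points is below, equal to, and well above the number of features. The feature count, noise level, trial count, and the helper names are assumptions chosen for the demonstration.

```python
import numpy as np

def min_norm_fit(X, y):
    """Minimum-norm least-squares solution (ordinary linear regression)."""
    return np.linalg.pinv(X) @ y

def median_test_mse(n_train, d=20, noise=0.5, trials=30, seed=0):
    """Median test error over several random draws of the training set."""
    rng = np.random.default_rng(seed)
    w_true = rng.normal(size=d)
    w_true /= np.linalg.norm(w_true)          # unit-norm ground truth
    errors = []
    for _ in range(trials):
        X = rng.normal(size=(n_train, d))
        y = X @ w_true + noise * rng.normal(size=n_train)
        w_hat = min_norm_fit(X, y)
        X_test = rng.normal(size=(2000, d))
        y_test = X_test @ w_true + noise * rng.normal(size=2000)
        errors.append(np.mean((X_test @ w_hat - y_test) ** 2))
    return float(np.median(errors))

# Fewer samples than features, exactly as many, and many more.
curve = {n: median_test_mse(n) for n in (5, 20, 200)}
for n, err in curve.items():
    print(f"n_train={n:4d}  median test MSE={err:.3f}")
```

The error is expected to spike when the number of training points equals the number of features, because the design matrix then tends to have very small singular values whose inverses amplify the label noise.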

翻訳日:2023-03-27 13:33:42 公開日:2023-03-24

# 離散対称性の進化

Evolution of Discrete Symmetries ( http://arxiv.org/abs/2303.14150v1 )

ライセンス: Link先を確認

P. Schmelcher

(参考訳) 対称性は重要な物理特性を規定することが知られており、特に波動構造や結果として生じる伝播力学を含む波動物理学における設計原理として用いられる。局所対称性は、空間の有限領域にしか持たない対称性という意味では、自己組織化過程の結果か、合成合成された物理系への構造的成分の結果である。与えられた有限鎖を拡張するために局所対称性演算を適用することで、結果の1次元格子は過渡的およびその後の周期的挙動からなることを示す。建設により、埋め込みされた局所対称性が強に重なり、結果として生じる格子はそのような対称性の密度の高い骨格を持つ。この挙動を局所対称性演算のクラスに基づいて証明し、最終周期や単位セルの分解、過渡的な長さと分解といった「漸近的」な性質を結論付けることができる。例として、対応するタイト結合ハミルトニアンを考察する。それらのエネルギー固有値スペクトルと固有状態は、いくつかの詳細で分析され、特に局所対称性の多元性の存在による固有状態の局在特性の強い変動を示す。

Symmetries are known to dictate important physical properties and can be used as a design principle in particular in wave physics, including wave structures and the resulting propagation dynamics. Local symmetries, in the sense of a symmetry that holds only in a finite domain of space, can be either the result of a self-organization process or a structural ingredient of a synthetically prepared physical system. Applying local symmetry operations to extend a given finite chain, we show that the resulting one-dimensional lattice consists of a transient followed by a subsequent periodic behaviour. Because, by construction, the implanted local symmetries strongly overlap, the resulting lattice possesses a dense skeleton of such symmetries. We prove this behaviour on the basis of a class of local symmetry operations, allowing us to conclude upon the 'asymptotic' properties such as the final period, decomposition of the unit-cell and the length and decomposition of the transient. As an example case, we explore the corresponding tight-binding Hamiltonians. Their energy eigenvalue spectra and eigenstates are analyzed in some detail, showing in particular the strong variability of the localization properties of the eigenstates due to the presence of a plethora of local symmetries.

翻訳日:2023-03-27 13:33:17 公開日:2023-03-24

# 「パーティーの準備」:大規模言語モデルの助けを借りてスマートなスマートスペースを探る

"Get ready for a party": Exploring smarter smart spaces with help from large language models ( http://arxiv.org/abs/2303.14143v1 )

ライセンス: Link先を確認

Evan King, Haoxiang Yu, Sangsu Lee, and Christine Julien

(参考訳) パーティーの準備ができている」と言う人に対する正しい反応は、意味と文脈に深く影響されている。スマートホームアシスタント(例えばgoogle home)にとって、理想的な反応は、家庭で利用可能なデバイスを調査し、その状態を変えてお祝いの雰囲気を作り出すことだ。現在の実用的なシステムでは,(1)抽象文の背後にある意味を推測する機能,(2)その推論をコンテキスト(例えば,特定のデバイスの設定を変更する)に適した具体的な行動コースにマップする機能が必要となるため,そのような要求を処理できない。本稿では、GPT-3のような最近のタスク非依存の大規模言語モデル(LLM)が、既存のルールベースのホームアシスタントシステムに欠けている、膨大な量のクロスドメイン、時には予測不可能な文脈的知識を具現化しているという観察を活用する。まず、LLMをコマンド推論とアクション計画の中心に配置するシステムの実現可能性について検討し、LCMが「パーティーの準備が整う」といったあいまいでコンテキスト依存的なコマンドの背後にある意図を推論し、スマートデバイスを制御するために使用できる具体的な機械パース可能な命令に応答する能力を示す。さらに、LLMが実際のデバイスを制御するための概念実証を行い、微調整やタスク固有の訓練を伴わずに、意図を推論し、デバイス状態を適切に変更する能力を示す。我々の研究は、スマート環境における文脈認識のためのLLM駆動システムの実現を示唆し、この分野における今後の研究を動機付けている。

The right response to someone who says "get ready for a party" is deeply influenced by meaning and context. For a smart home assistant (e.g., Google Home), the ideal response might be to survey the available devices in the home and change their state to create a festive atmosphere. Current practical systems cannot service such requests since they require the ability to (1) infer meaning behind an abstract statement and (2) map that inference to a concrete course of action appropriate for the context (e.g., changing the settings of specific devices). In this paper, we leverage the observation that recent task-agnostic large language models (LLMs) like GPT-3 embody a vast amount of cross-domain, sometimes unpredictable contextual knowledge that existing rule-based home assistant systems lack, which can make them powerful tools for inferring user intent and generating appropriate context-dependent responses during smart home interactions. We first explore the feasibility of a system that places an LLM at the center of command inference and action planning, showing that LLMs have the capacity to infer intent behind vague, context-dependent commands like "get ready for a party" and respond with concrete, machine-parseable instructions that can be used to control smart devices. We furthermore demonstrate a proof-of-concept implementation that puts an LLM in control of real devices, showing its ability to infer intent and change device state appropriately with no fine-tuning or task-specific training. Our work hints at the promise of LLM-driven systems for context-awareness in smart environments, motivating future research in this area.
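
To make "concrete, machine-parseable instructions" tangible, here is a small sketch of how a reply from a language model might be checked before any device is touched. Everything in it is hypothetical: the JSON schema, the device names, and the `parse_actions` helper are assumptions for illustration and are not taken from the paper, and no real smart-home or LLM API is called.

```python
import json

# Hypothetical inventory: device name -> settings that device supports.
AVAILABLE_DEVICES = {
    "living_room_light": {"power", "brightness", "color"},
    "speaker": {"power", "volume", "playlist"},
}

def parse_actions(llm_reply, devices=AVAILABLE_DEVICES):
    """Parse a JSON list of actions and keep only those that target a
    known device and use only settings that device supports."""
    actions = json.loads(llm_reply)
    valid = []
    for action in actions:
        name = action.get("device")
        settings = action.get("settings", {})
        if name in devices and set(settings) <= devices[name]:
            valid.append(action)
    return valid

# A made-up reply to the command "get ready for a party".
reply = """[
  {"device": "living_room_light", "settings": {"power": "on", "color": "purple"}},
  {"device": "speaker", "settings": {"power": "on", "playlist": "party"}},
  {"device": "disco_ball", "settings": {"power": "on"}}
]"""

actions = parse_actions(reply)
for action in actions:
    print(action["device"], action["settings"])
```

Filtering against the device inventory is one simple way to guard against a model proposing devices that do not exist in the home, such as the `disco_ball` entry here.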

翻訳日:2023-03-27 13:32:58 公開日:2023-03-24

# Masked Scene Contrast: 教師なし3D表現学習のためのスケーラブルなフレームワーク

Masked Scene Contrast: A Scalable Framework for Unsupervised 3D Representation Learning ( http://arxiv.org/abs/2303.14191v1 )

ライセンス: Link先を確認

Xiaoyang Wu, Xin Wen, Xihui Liu, Hengshuang Zhao

(参考訳) 先駆的な研究として、PointContrastは生のRGB-Dフレーム上のコントラスト学習を活用して教師なしの3D表現学習を行い、様々な下流タスクにおいてその効果を証明する。しかし、rgb-dフレームをコントラストビューとしてマッチングする非効率性と、前述したような煩わしいモード崩壊現象という2つの障害により、3dでの大規模非教師なし学習の傾向はまだ現れていない。筆者らはまず,2つのスタブルブロックを経験的ステップストーンに変換し,よく計算されたデータ拡張パイプラインと実用的なビューミキシング戦略により,シーンレベルの点雲に直接コントラストビューを生成する,効率的かつ効果的なコントラスト学習フレームワークを提案する。次に,ポイントカラーとサーフェルノーマルの再構築を目標としたコントラストクロスマスクをデザインしたコントラスト学習フレームワークの再構築学習について紹介する。マスキングシーンコントラスト(msc)フレームワークは,包括的3次元表現をより効率的かつ効果的に抽出することができる。トレーニング前の手順を少なくとも3倍に加速し、以前の作業と比べて未妥協のパフォーマンスを実現している。さらに、MSCは複数のデータセットにわたる大規模な3D事前トレーニングを可能にし、パフォーマンスをさらに向上し、ScanNetセマンティックセグメンテーション検証セットの75.5% mIoUなど、いくつかの下流タスクで最先端の微調整結果を達成する。

As a pioneering work, PointContrast conducts unsupervised 3D representation learning via leveraging contrastive learning over raw RGB-D frames and proves its effectiveness on various downstream tasks. However, the trend of large-scale unsupervised learning in 3D has yet to emerge due to two stumbling blocks: the inefficiency of matching RGB-D frames as contrastive views and the annoying mode collapse phenomenon mentioned in previous works. Turning the two stumbling blocks into empirical stepping stones, we first propose an efficient and effective contrastive learning framework, which generates contrastive views directly on scene-level point clouds by a well-curated data augmentation pipeline and a practical view mixing strategy. Second, we introduce reconstructive learning on the contrastive learning framework with an exquisite design of contrastive cross masks, which targets the reconstruction of point color and surfel normal. Our Masked Scene Contrast (MSC) framework is capable of extracting comprehensive 3D representations more efficiently and effectively. It accelerates the pre-training procedure by at least 3x and still achieves an uncompromised performance compared with previous work. Besides, MSC also enables large-scale 3D pre-training across multiple datasets, which further boosts the performance and achieves state-of-the-art fine-tuning results on several downstream tasks, e.g., 75.5% mIoU on ScanNet semantic segmentation validation set.

翻訳日:2023-03-27 13:27:26 公開日:2023-03-24

# WildLight:フラッシュライトを使った逆レンダリング

WildLight: In-the-wild Inverse Rendering with a Flashlight ( http://arxiv.org/abs/2303.14190v1 )

ライセンス: Link先を確認

Ziang Cheng, Junxuan Li, Hongdong Li

(参考訳) 本稿では,未知の環境光の下での逆レンダリングの課題に対する実用的な測光手法を提案する。本システムは,スマートフォンで撮影した多視点画像のみを用いて,シーン形状と反射率を復元する。重要なアイデアは、スマートフォンの内蔵フラッシュライトを最小制御光源として活用し、画像強度を2つのフォトメトリックコンポーネントに分解することだ。我々の方法では、フラッシュ/非フラッシュ画像はペアでキャプチャする必要がない。ニューラル・ライト・フィールドの成功に基づき、オフ・ザ・シェルフ法を用いて周囲の反射を捉え、フラッシュライト・コンポーネントは物理的に正確な光度制約により反射率と照明を分離する。既存の逆レンダリング手法と比較して,非暗室環境に適用できるが,環境反射を明示的に解くことの難しさは回避できる。提案手法は実装が容易で,セットアップも容易で,既存の逆レンダリング技術よりも一貫して優れていることを示す。最後に,産業用レンダラ用に用意されたpbrテクスチャトライアングルメッシュに,神経再構成を容易にエクスポートできる。

This paper proposes a practical photometric solution for the challenging problem of in-the-wild inverse rendering under unknown ambient lighting. Our system recovers scene geometry and reflectance using only multi-view images captured by a smartphone. The key idea is to exploit the smartphone's built-in flashlight as a minimally controlled light source, and decompose image intensities into two photometric components -- a static appearance corresponding to ambient flux, plus a dynamic reflection induced by the moving flashlight. Our method does not require flash/non-flash images to be captured in pairs. Building on the success of neural light fields, we use an off-the-shelf method to capture the ambient reflections, while the flashlight component enables physically accurate photometric constraints to decouple reflectance and illumination. Compared to existing inverse rendering methods, our setup is applicable to non-darkroom environments yet sidesteps the inherent difficulties of explicitly solving for ambient reflections. We demonstrate by extensive experiments that our method is easy to implement, casual to set up, and consistently outperforms existing in-the-wild inverse rendering techniques. Finally, our neural reconstruction can be easily exported to a PBR textured triangle mesh ready for industrial renderers.

翻訳日:2023-03-27 13:26:53 公開日:2023-03-24

# FastViT:構造リパラメータを用いた高速ハイブリッドビジョントランス

FastViT: A Fast Hybrid Vision Transformer using Structural Reparameterization ( http://arxiv.org/abs/2303.14189v1 )

ライセンス: Link先を確認

Pavan Kumar Anasosalu Vasu, James Gabriel, Jeff Zhu, Oncel Tuzel, Anurag Ranjan

(参考訳) 近年の変圧器と畳み込み設計の融合により、モデルの精度と効率が着実に改善されている。本稿では,最先端のレイテンシ-精度トレードオフを得るハイブリッドビジョントランスフォーマーアーキテクチャであるFastViTを紹介する。この目的のために,FastViTのビルディングブロックである新しいトークンミキシング演算子RepMixerを導入する。さらに、列車時間オーバーパラメータ化と大きなカーネル畳み込みを適用して精度を高め、これらの選択が遅延に与える影響を実証的に示します。我々のモデルは、最近の最先端ハイブリッドトランスフォーマーアーキテクチャであるCMTよりも3.5倍速く、EfficientNetより4.9倍速く、ImageNetデータセットと同じ精度でモバイルデバイス上のConvNeXtより1.9倍速い。同様のレイテンシでは、MobileOneよりもImageNetのTop-1精度が4.2%向上しています。私たちのモデルは、画像分類、検出、セグメンテーション、および3Dメッシュレグレッションといった、いくつかのタスクで競合するアーキテクチャを一貫して上回ります。さらに,本モデルは分布外サンプルや腐敗に対して非常に堅牢であり,競合するロバストモデルよりも優れている。

The recent amalgamation of transformer and convolutional designs has led to steady improvements in accuracy and efficiency of the models. In this work, we introduce FastViT, a hybrid vision transformer architecture that obtains the state-of-the-art latency-accuracy trade-off. To this end, we introduce a novel token mixing operator, RepMixer, a building block of FastViT, that uses structural reparameterization to lower the memory access cost by removing skip-connections in the network. We further apply train-time overparametrization and large kernel convolutions to boost accuracy and empirically show that these choices have minimal effect on latency. We show that our model is 3.5x faster than CMT, a recent state-of-the-art hybrid transformer architecture, 4.9x faster than EfficientNet, and 1.9x faster than ConvNeXt on a mobile device for the same accuracy on the ImageNet dataset. At similar latency, our model obtains 4.2% better Top-1 accuracy on ImageNet than MobileOne. Our model consistently outperforms competing architectures across several tasks -- image classification, detection, segmentation and 3D mesh regression with significant improvement in latency on both a mobile device and a desktop GPU. Furthermore, our model is highly robust to out-of-distribution samples and corruptions, improving over competing robust models.
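
The idea of removing a skip connection by structural reparameterization can be illustrated with a one-dimensional toy. This is a simplified sketch under stated assumptions, not the RepMixer implementation: it assumes a single channel, an odd-length kernel, and no normalization layer, and the helper names are hypothetical. Because convolution is linear, a branch `x + conv(x, k)` equals one convolution whose kernel has 1 added at its center tap.

```python
import numpy as np

def depthwise_conv1d(x, kernel):
    """Single-channel 'same' convolution (odd kernel length assumed)."""
    return np.convolve(x, kernel, mode="same")

def merge_skip_into_kernel(kernel):
    """Fold the identity (skip) branch into the kernel's center tap."""
    merged = kernel.copy()
    merged[len(kernel) // 2] += 1.0
    return merged

rng = np.random.default_rng(0)
x = rng.normal(size=32)
kernel = rng.normal(size=3)

# Train-time form: explicit skip connection plus convolution.
train_out = x + depthwise_conv1d(x, kernel)
# Inference-time form: one convolution with the merged kernel.
infer_out = depthwise_conv1d(x, merge_skip_into_kernel(kernel))

print(np.allclose(train_out, infer_out))
```

The merged form reads the input once instead of twice, which is the kind of memory-access saving the abstract attributes to removing skip connections.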

翻訳日:2023-03-27 13:26:35 公開日:2023-03-24

# TRAK: スケールでのモデル行動への貢献

TRAK: Attributing Model Behavior at Scale ( http://arxiv.org/abs/2303.14186v1 )

ライセンス: Link先を確認

Sung Min Park, Kristian Georgiev, Andrew Ilyas, Guillaume Leclerc, Aleksander Madry

(参考訳) データ帰属の目的は、モデルの予測をトレーニングデータに遡ることである。この目標への長い努力にもかかわらず、データ帰属に対する既存のアプローチは、ユーザに計算の扱いやすさと有効性を選択させる傾向がある。すなわち、計算可能な手法は、非凸設定(ディープニューラルネットワークの文脈など)におけるモデル予測の正確な帰属に苦労するが、そのような手法では、数千のモデルを訓練する必要があるため、大規模モデルやデータセットでは実用的でない。本稿では,大規模で微分可能なモデルに対して,有効かつ計算的に抽出可能なデータ帰属法であるTRAK(Tracing with the Randomly-projected After Kernel)を紹介する。特に、わずかに訓練されたモデルを活用することで、TRAKは何千ものモデルのトレーニングを必要とする属性メソッドのパフォーマンスにマッチすることができる。我々は、イメージネットで訓練された画像分類器、視覚言語モデル(CLIP)、言語モデル(BERT、mT5)のTRAKの有用性を実証する。私たちは https://github.com/MadryLab/trak で TRAK を使用するためのコードを提供しています。

The goal of data attribution is to trace model predictions back to training data. Despite a long line of work towards this goal, existing approaches to data attribution tend to force users to choose between computational tractability and efficacy. That is, computationally tractable methods can struggle with accurately attributing model predictions in non-convex settings (e.g., in the context of deep neural networks), while methods that are effective in such regimes require training thousands of models, which makes them impractical for large models or datasets. In this work, we introduce TRAK (Tracing with the Randomly-projected After Kernel), a data attribution method that is both effective and computationally tractable for large-scale, differentiable models. In particular, by leveraging only a handful of trained models, TRAK can match the performance of attribution methods that require training thousands of models. We demonstrate the utility of TRAK across various modalities and scales: image classifiers trained on ImageNet, vision-language models (CLIP), and language models (BERT and mT5). We provide code for using TRAK (and reproducing our work) at https://github.com/MadryLab/trak .
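
As a rough illustration of what "randomly projected" gradient features can look like, the sketch below computes kernel-style attribution scores from per-example gradients. It is a simplified sketch under stated assumptions and may differ from the estimator defined in the paper: the gradients are random stand-ins, the weighting term `one_minus_p` and the function name `trak_like_scores` are hypothetical, ensembling over several trained models is omitted, and none of this is the API of the `trak` library.

```python
import numpy as np

def trak_like_scores(grads_train, grads_test, one_minus_p, proj_dim=16, seed=0):
    """Attribution scores from randomly projected gradient features.

    grads_train: (n_train, n_params) per-example gradients
    grads_test:  (n_test, n_params) per-example gradients
    one_minus_p: (n_train,) per-training-example weights (assumed)
    Returns an (n_test, n_train) score matrix.
    """
    rng = np.random.default_rng(seed)
    n_params = grads_train.shape[1]
    # Random projection to a low-dimensional feature space.
    P = rng.normal(size=(n_params, proj_dim)) / np.sqrt(proj_dim)
    phi_train = grads_train @ P
    phi_test = grads_test @ P
    # Kernel of projected training features, then a linear solve.
    kernel = phi_train.T @ phi_train
    scores = phi_test @ np.linalg.solve(kernel, phi_train.T)
    return scores * one_minus_p

# Stand-in data (random, for shape checking only).
rng = np.random.default_rng(1)
grads_train = rng.normal(size=(100, 50))
grads_test = rng.normal(size=(3, 50))
one_minus_p = rng.uniform(0.1, 0.9, size=100)

scores = trak_like_scores(grads_train, grads_test, one_minus_p)
print(scores.shape)
```

Projecting first keeps the linear solve at size `proj_dim` rather than the full parameter count, which is what makes this style of method tractable for large models.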

翻訳日:2023-03-27 13:26:11 公開日:2023-03-24

# Make-It-3D:拡散前の単一画像からの高忠実度3D創出

Make-It-3D: High-Fidelity 3D Creation from A Single Image with Diffusion Prior ( http://arxiv.org/abs/2303.14184v1 )

ライセンス: Link先を確認

Junshu Tang, Tengfei Wang, Bo Zhang, Ting Zhang, Ran Yi, Lizhuang Ma, Dong Chen

(参考訳) 本研究では,1枚の画像のみから高忠実度3Dコンテンツを作成する問題について検討する。基本的には下層の3d幾何学を推定し、目に見えないテクスチャを同時に幻覚させる。この課題に対処するために,訓練された2次元拡散モデルからの事前知識を活用し,3次元生成のための3次元認識監督を行う。提案手法であるMake-It-3Dは,2段階の最適化パイプラインを用いており,第1段階は前部からの基準画像からの制約を取り入れ,第2段階は粗いモデルをテクスチャ化された点雲に変換し,第2段階は参照画像から高品質なテクスチャを活用しながら,拡散により現実性を高める。広汎な実験により,本手法は先行研究よりも大きなマージンを達成し,忠実な再建と印象的な視覚的品質を実現した。本手法は,汎用オブジェクトの単一画像から高品質な3D作成を実現するための最初の試みであり,テキスト・ツー・3D作成やテクスチャ編集などの様々な応用を可能にする。

In this work, we investigate the problem of creating high-fidelity 3D content from only a single image. This is inherently challenging: it essentially involves estimating the underlying 3D geometry while simultaneously hallucinating unseen textures. To address this challenge, we leverage prior knowledge from a well-trained 2D diffusion model to act as 3D-aware supervision for 3D creation. Our approach, Make-It-3D, employs a two-stage optimization pipeline: the first stage optimizes a neural radiance field by incorporating constraints from the reference image at the frontal view and diffusion prior at novel views; the second stage transforms the coarse model into textured point clouds and further elevates the realism with diffusion prior while leveraging the high-quality textures from the reference image. Extensive experiments demonstrate that our method outperforms prior works by a large margin, resulting in faithful reconstructions and impressive visual quality. Our method presents the first attempt to achieve high-quality 3D creation from a single image for general objects and enables various applications such as text-to-3D creation and texture editing.

翻訳日:2023-03-27 13:25:48 公開日:2023-03-24

# すべてを支配する1つのプロトコル? 相互運用可能なメッセージングのセキュリティについて

One Protocol to Rule Them All? On Securing Interoperable Messaging ( http://arxiv.org/abs/2303.14178v1 )

ライセンス: Link先を確認

Jenny Blessing and Ross Anderson

(参考訳) 欧州の議員は、異なるプラットフォーム上のユーザーが互いにメッセージを交換できるべきだと裁定した。しかし、メッセージングの相互運用性は、Pandoraのセキュリティとプライバシの課題の箱を開く。反トラスト対策としてだけでなく、エンドユーザにより良いエクスペリエンスを提供する手段としても支持されているが、相互運用性は、貧弱な実行時にユーザエクスペリエンスを悪化させるリスクを負う。実際のメッセージ交換を有効にする方法と、あるサービスプロバイダから別のサービスプロバイダに渡される暗号化メッセージから生じる多数の残余の課題にどのように対処するか – コンテンツモデレーション、ユーザ認証、キー管理、プロバイダ間のメタデータ共有など – という2つの基本的な疑問がある。本研究では、エンドツーエンドの暗号化メッセージにおける相互運用可能な通信に関する特定のオープンな質問と課題を特定し、これらの課題に取り組むためのハイレベルな提案を示す。

European lawmakers have ruled that users on different platforms should be able to exchange messages with each other. Yet messaging interoperability opens up a Pandora's box of security and privacy challenges. While championed not just as an anti-trust measure but as a means of providing a better experience for the end user, interoperability runs the risk of making the user experience worse if poorly executed. There are two fundamental questions: how to enable the actual message exchange, and how to handle the numerous residual challenges arising from encrypted messages passing from one service provider to another -- including but certainly not limited to content moderation, user authentication, key management, and metadata sharing between providers. In this work, we identify specific open questions and challenges around interoperable communication in end-to-end encrypted messaging, and present high-level suggestions for tackling these challenges.

翻訳日:2023-03-27 13:25:04 公開日:2023-03-24

# 教師なしドメインディスカバリによるエキスパート言語モデルのスケーリング

Scaling Expert Language Models with Unsupervised Domain Discovery ( http://arxiv.org/abs/2303.14177v1 )

ライセンス: Link先を確認

Suchin Gururangan, Margaret Li, Mike Lewis, Weijia Shi, Tim Althoff, Noah A. Smith, Luke Zettlemoyer

(参考訳) 大規模言語モデルは一般的に密に訓練され、全てのパラメータは全ての入力に対して更新される。これは数千のGPU間で数十億のパラメータの同期を必要とする。任意のテキストコーパス上で,大小の言語モデルを非同期に訓練する,シンプルだが効果的な手法を提案する。提案手法では,コーパスを関連文書の集合にクラスタリングし,各クラスタ上で個別の専門家言語モデルを学習し,それらを疎結合に組み合わせて推論を行う。このアプローチは、各エキスパートのドメインを自動的に発見することで、恥ずかしい並列トレーニングを一般化し、既存のスパース言語モデルのほとんどすべての通信オーバーヘッドを取り除く。分析の結果,有意義なクラスタに専門家を特化することが,これらの向上の鍵であることがわかった。また、専門家の数やトレーニングデータのサイズによってパフォーマンスが向上し、これは大規模な言語モデルをトレーニングするための非常に効率的でアクセスしやすいアプローチであることを示唆している。

Large language models are typically trained densely: all parameters are updated with respect to all inputs. This requires synchronization of billions of parameters across thousands of GPUs. We introduce a simple but effective method to asynchronously train large, sparse language models on arbitrary text corpora. Our method clusters a corpus into sets of related documents, trains a separate expert language model on each cluster, and combines them in a sparse ensemble for inference. This approach generalizes embarrassingly parallel training by automatically discovering the domains for each expert, and eliminates nearly all the communication overhead of existing sparse language models. Our technique outperforms dense baselines on multiple corpora and few-shot tasks, and our analysis shows that specializing experts to meaningful clusters is key to these gains. Performance also improves with the number of experts and size of training data, suggesting this is a highly efficient and accessible approach to training large language models.
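The inference step described above (combining cluster experts in a sparse ensemble) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the distance-based softmax weighting, the `top_k` and `temperature` parameters, and the function names are all assumptions made for the example.

```python
import math

def ensemble_weights(doc_vec, centers, top_k=2, temperature=1.0):
    """Weight each expert by how close the input embedding is to its
    cluster center, keeping only the top_k nearest experts (assumed rule)."""
    # Squared Euclidean distance from the input to every cluster center.
    dists = [sum((x - c) ** 2 for x, c in zip(doc_vec, center)) for center in centers]
    # Indices of the top_k closest clusters; all other experts get weight 0.
    nearest = sorted(range(len(centers)), key=lambda i: dists[i])[:top_k]
    # Softmax over negative distances; subtract the minimum for numerical stability.
    d_min = min(dists[i] for i in nearest)
    scores = {i: math.exp(-(dists[i] - d_min) / temperature) for i in nearest}
    total = sum(scores.values())
    return [scores.get(i, 0.0) / total for i in range(len(centers))]

def ensemble_next_token_probs(weights, expert_probs):
    """Mix per-expert next-token distributions using the sparse weights."""
    vocab = len(expert_probs[0])
    return [sum(w * p[t] for w, p in zip(weights, expert_probs)) for t in range(vocab)]
```

Because only `top_k` experts receive non-zero weight, only those experts would need to be run at inference time, which is where the sparsity pays off.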

翻訳日:2023-03-27 13:24:47 公開日:2023-03-24

# 低消費電力・低レイテンシ視覚知覚のためのハイブリッドANN-SNNアーキテクチャ

A Hybrid ANN-SNN Architecture for Low-Power and Low-Latency Visual Perception ( http://arxiv.org/abs/2303.14176v1 )

ライセンス: Link先を確認

Asude Aydin, Mathias Gehrig, Daniel Gehrig, and Davide Scaramuzza

(参考訳) Spiking Neural Networks(SNN)は、バイオインスパイアされたニューラルネットワークのクラスで、非同期およびスパース処理を通じてエッジデバイスに低電力および低レイテンシ推論をもたらすことを約束する。しかしながら、時相モデルであるSNNは、古典的人工ニューラルネットワーク(ANN)と同等の予測を生成するために、表現力のある状態に大きく依存している。これらの状態は、長い過渡期の後にのみ収束し、入力データがないと急速に減衰するため、レイテンシと消費電力の増加、および精度の低下を招く。本研究は、低レートで動作する補助的なANNで状態を初期化することによってこの問題に対処する。その後、SNNはその状態を使用して、次の初期化フェーズまで高時間分解能の予測を生成する。我々のハイブリッドANN-SNNモデルは、両者の長所を兼ね備える。ANNのおかげで長い状態過渡と状態減衰に悩まされず、SNNのおかげで高時間分解能、低レイテンシ、低電力で予測を生成することができる。イベントベースの2Dおよび3Dヒューマンポーズ推定のタスクにおいて、提案手法は、同じ推論レートで実行した完全なANNと比べ、性能低下をわずか4%に抑えつつ消費電力を88%削減することを示した。さらに、SNNと比較した場合、誤差は74%低減した。この研究は、それぞれの利点を最大化するために、ANNとSNNをどのように使用できるかについて新たな理解を提供する。

Spiking Neural Networks (SNNs) are a class of bio-inspired neural networks that promise to bring low-power and low-latency inference to edge devices through asynchronous and sparse processing. However, being temporal models, SNNs depend heavily on expressive states to generate predictions on par with classical artificial neural networks (ANNs). These states converge only after long transient periods and quickly decay without input data, leading to higher latency and power consumption and lower accuracy. This work addresses this issue by initializing the state with an auxiliary ANN running at a low rate. The SNN then uses the state to generate predictions with high temporal resolution until the next initialization phase. Our hybrid ANN-SNN model thus combines the best of both worlds: it does not suffer from long state transients and state decay thanks to the ANN, and can generate predictions with high temporal resolution, low latency, and low power thanks to the SNN. We show for the task of event-based 2D and 3D human pose estimation that our method consumes 88% less power with only a 4% decrease in performance compared to its fully ANN counterpart when run at the same inference rate. Moreover, when compared to SNNs, our method achieves a 74% lower error. This research thus provides a new understanding of how ANNs and SNNs can be used to maximize their respective benefits.
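The scheduling idea in this abstract (a low-rate model re-initializes the state, a high-rate model updates it in between) can be illustrated with a toy loop. This is only a sketch: the leaky integrator standing in for the SNN, the trivial `toy_ann_init`, and the `init_every` parameter are hypothetical and are not taken from the paper.

```python
def run_hybrid(inputs, init_every, ann_init, snn_step):
    """Run a high-rate stateful model whose state is re-initialized
    every `init_every` steps by a low-rate auxiliary model."""
    state, outputs = None, []
    for t, x in enumerate(inputs):
        if t % init_every == 0:
            state = ann_init(x)        # low-rate, expensive state initialization
        state, y = snn_step(state, x)  # high-rate, cheap state update
        outputs.append(y)
    return outputs

# Hypothetical stand-ins: the "ANN" sets the state directly from the input,
# and the "SNN" is a leaky integrator whose state decays without input.
def toy_ann_init(x):
    return float(x)

def toy_snn_step(state, x, leak=0.5):
    new_state = leak * state + (1 - leak) * x
    return new_state, new_state
```

With an input that is non-zero only at the initialization steps, the toy state decays between resets and is restored at each reset, which mirrors the state-decay problem the abstract describes.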

翻訳日:2023-03-27 13:24:29 公開日:2023-03-24

# 半教師付き医用画像セグメンテーションにおける固有一貫性学習

Inherent Consistent Learning for Accurate Semi-supervised Medical Image Segmentation ( http://arxiv.org/abs/2303.14175v1 )

ライセンス: Link先を確認

Ye Zhu, Jie Yang, Si-Qi Liu and Ruimao Zhang

(参考訳) 近年、医用画像アノテーションのコストが高いことから、半教師付き医用画像セグメンテーションが注目されている。本稿では、ラベル付きデータとラベルなしデータの意味的一貫性ガイダンスを通じてロバストな意味カテゴリ表現を学習し、セグメンテーションを支援する新しいInherent Consistent Learning(ICL)手法を提案する。具体的には、アテンション機構に基づいてラベル付きデータとラベルなしデータの意味カテゴリ表現を整列させ、トレーニングセット全体にわたるグローバルな意味表現を更新する2つの外部モジュール、Supervised Semantic Proxy Adaptor(SSPA)とUnsupervised Semantic Consistent Learner(USCL)を導入する。ICLは様々なネットワークアーキテクチャに適用できるプラグアンドプレイ方式であり、2つのモジュールはテスト段階には関与しない。3つの公開ベンチマークによる実験結果から、提案手法は特に注釈付きデータの数が極めて限られている場合に、最先端の手法を上回ることが示された。コードは https://github.com/zhuye98/ICL.git で公開されている。

Semi-supervised medical image segmentation has attracted much attention in recent years because of the high cost of medical image annotations. In this paper, we propose a novel Inherent Consistent Learning (ICL) method, which aims to learn robust semantic category representations through the semantic consistency guidance of labeled and unlabeled data to help segmentation. In practice, we introduce two external modules, namely the Supervised Semantic Proxy Adaptor (SSPA) and the Unsupervised Semantic Consistent Learner (USCL), which are based on the attention mechanism and serve to align the semantic category representations of labeled and unlabeled data, as well as to update the global semantic representations over the entire training set. The proposed ICL is a plug-and-play scheme for various network architectures, and the two modules are not involved in the testing stage. Experimental results on three public benchmarks show that the proposed method can outperform the state of the art, especially when the amount of annotated data is extremely limited. Code is available at: https://github.com/zhuye98/ICL.git.

翻訳日:2023-03-27 13:24:04 公開日:2023-03-24

# オンライン連続学習におけるリアルタイム評価:新しい希望

Real-Time Evaluation in Online Continual Learning: A New Hope ( http://arxiv.org/abs/2302.01047v3 )

ライセンス: Link先を確認

Yasir Ghunaim, Adel Bibi, Kumail Alhamoud, Motasem Alfarra, Hasan Abed Al Kader Hammoud, Ameya Prabhu, Philip H. S. Torr, Bernard Ghanem

(参考訳) 現在の継続学習(Continual Learning, CL)手法の評価では、トレーニング時間や計算量に制約がないと仮定することが多い。これはいかなる実世界の設定においても非現実的な仮定であり、そこで我々は、ストリームがモデルのトレーニング完了を待たずに次の予測用データを提示する、継続学習の実用的なリアルタイム評価を提案する。そのために、現在のCL手法を計算コストの観点から評価する。位置情報ラベル付きの3900万枚のタイムスタンプ付き画像を含む大規模データセットであるCLOCで広範な実験を行った。この評価の下では、単純なベースラインが最先端のCL手法を上回ることを示し、現実的な設定における既存手法の適用性に疑問を呈する。さらに、メモリサンプリング戦略や正則化アプローチなど、文献で一般的に使用される様々なCLコンポーネントについて検討する。検討したすべての手法が、我々の単純なベースラインに太刀打ちできないことがわかった。これは驚くべきことに、既存のCL文献の大部分が、実用的でない特定の種類のストリームに合わせて作られていることを示唆している。我々が提供する評価が、オンライン継続学習手法の開発において計算コストを考慮するというパラダイムシフトに向けた第一歩となることを期待する。

Current evaluations of Continual Learning (CL) methods typically assume that there is no constraint on training time and computation. This is an unrealistic assumption for any real-world setting, which motivates us to propose a practical real-time evaluation of continual learning, in which the stream does not wait for the model to complete training before revealing the next data for predictions. To do this, we evaluate current CL methods with respect to their computational costs. We conduct extensive experiments on CLOC, a large-scale dataset containing 39 million time-stamped images with geolocation labels. We show that a simple baseline outperforms state-of-the-art CL methods under this evaluation, questioning the applicability of existing methods in realistic settings. In addition, we explore various CL components commonly used in the literature, including memory sampling strategies and regularization approaches. We find that all considered methods fail to be competitive against our simple baseline. This surprisingly suggests that the majority of existing CL literature is tailored to a specific class of streams that is not practical. We hope that the evaluation we provide will be the first step towards a paradigm shift to consider the computational cost in the development of online continual learning methods.
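The evaluation protocol described here, where the stream does not wait for training to finish, might be simulated along the following lines. This is a simplified sketch under stated assumptions: `compute_cost` is a hypothetical unit (stream steps per training update), and the update is applied immediately before the learner becomes "busy", which is a simplification rather than the paper's exact delay model.

```python
def realtime_eval(stream, predict, train, compute_cost=1):
    """Online evaluation in which the stream never waits for training.

    stream: list of (x, y) pairs revealed one step at a time.
    compute_cost: number of stream steps one training update occupies
    (assumed unit). A learner with cost C can train only on every C-th
    sample; samples in between are still predicted on but never trained on.
    """
    correct, busy_until = 0, 0
    for t, (x, y) in enumerate(stream):
        correct += int(predict(x) == y)  # always predict with the current model
        if t >= busy_until:              # learner is free: train on this sample
            train(x, y)
            busy_until = t + compute_cost
    return correct / len(stream)
```

Under this accounting, a method that is twice as expensive per update sees only half of the stream for training, which is the kind of handicap that lets a cheap baseline win.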

翻訳日:2023-03-27 11:32:16 公開日:2023-03-24

# Kupczynski の文脈局所因果確率モデルはベルの定理によって制約される

Kupczynski's Contextual Locally Causal Probabilistic Models are constrained by Bell's theorem ( http://arxiv.org/abs/2208.09930v5 )

ライセンス: Link先を確認

Richard D. Gill and Justo Pastor Lambare

(参考訳) マリアン・クプチンスキーは一連の論文で、測定器を記述する文脈的な設定依存パラメータを正しく考慮すればベルの定理は回避できると主張してきた。我々は、これが正しくないことを示す。一見したところとは異なり、クプチンスキーの文脈的局所因果確率モデルの概念は、数学的にはベルの局所隠れ変数モデルの特別な場合である。したがって、彼が提案する方法で文脈性を考慮したとしても、Bell-CHSH不等式は依然として導出できる。量子力学によるその破れを簡単に説明し去ることはできない。量子力学と局所実在論(クプチンスキーが主張する概念の拡張を含む)は互いに両立しない。さらに検討すると、クプチンスキーは実際には検出の抜け穴に頼っていることがわかる。2015年以降、Bell-CHSH不等式が破れる抜け穴のない実験が数多く行われており、そのような実験に他の不完全さがありうるとしても、クプチンスキーが局所実在論のために用意した逃げ道は使えない。

In a sequence of papers, Marian Kupczynski has argued that Bell's theorem can be circumvented if one takes correct account of contextual setting-dependent parameters describing measuring instruments. We show that this is not true. Despite first appearances, Kupczynski's concept of a contextual locally causal probabilistic model is mathematically a special case of a Bell local hidden variables model. Thus, even if one takes account of contextuality in the way he suggests, the Bell-CHSH inequality can still be derived. Violation thereof by quantum mechanics cannot be easily explained away: quantum mechanics and local realism (including Kupczynski's claimed enlargement of the concept) are not compatible with one another. Further inspection shows that Kupczynski is actually falling back on the detection loophole. Since 2015, numerous loophole-free experiments have been performed, in which the Bell-CHSH inequality is violated, so despite any other possible imperfections of such experiments, Kupczynski's escape route for local realism is not available.
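As a small numerical illustration of the Bell-CHSH inequality mentioned above, the following sketch compares the best value reachable by local deterministic strategies with the value given by the standard singlet-state correlation E(a, b) = -cos(a - b). The function names and the choice of measurement angles are illustrative and are not taken from the paper.

```python
import math
from itertools import product

def chsh(E, a, a2, b, b2):
    """CHSH combination S = E(a,b) - E(a,b') + E(a',b) + E(a',b')."""
    return E(a, b) - E(a, b2) + E(a2, b) + E(a2, b2)

def max_local_chsh():
    """Largest |S| over all local deterministic strategies, where each
    party outputs a fixed +1 or -1 for each of its two settings."""
    best = 0
    for A1, A2, B1, B2 in product([-1, 1], repeat=4):
        s = A1 * B1 - A1 * B2 + A2 * B1 + A2 * B2
        best = max(best, abs(s))
    return best

def singlet(a, b):
    """Quantum correlation for spin measurements on a singlet state."""
    return -math.cos(a - b)

# Settings that maximize the quantum violation.
S_quantum = chsh(singlet, 0.0, math.pi / 2, math.pi / 4, 3 * math.pi / 4)
```

Here `max_local_chsh()` gives the local bound of 2, while `abs(S_quantum)` equals 2√2, the Tsirelson bound.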

翻訳日:2023-03-27 11:31:59 公開日:2023-03-24

# 消散状態準備のための初期状態依存量子速度制限:枠組みと最適化

Initial-state-dependent quantum speed limit for dissipative state preparation: Framework and optimization ( http://arxiv.org/abs/2303.12967v2 )

ライセンス: Link先を確認

Junjie Liu and Hanlin Nie

(参考訳) 散逸は伝統的に量子情報処理の障害と考えられてきたが、近年の研究により、所望の量子状態を生成するために利用できることが示されている。実用的な用途に有用であるためには、散逸的な進化をスピードアップする能力が不可欠である。本研究では, 生成状態がエネルギー固有状態の1つであるマルコフ散逸状態生成スキームに着目した。我々は、一般的に用いられる初期状態非依存緩和時間と比較して、実際の進化時間のより洗練された測定値を提供する初期状態依存量子速度制限(QSL)を導出する。これにより、異なる初期状態にわたる散逸的進化のパッシブ最適化が可能になる。 qslを用いた進化時間の最小化を条件とした調製過程における散逸熱の最小化により、望ましい初期状態は固有値の増加の順序エネルギー固有値に対して対角要素の特定の置換を持つことがわかった。この構成では、準備された状態の個体数は最大であり、残りの対角要素は、同じ順序のエネルギー固有基底における受動的状態の順にソートされる。ベル状態を作成するための散逸ライドバーグ原子系における戦略の有効性を実証する。我々の研究は、散逸状態準備プロセスの最適化に関する新たな洞察を提供し、実用的な量子技術に重大な影響を与える可能性がある。

Dissipation has traditionally been considered a hindrance to quantum information processing, but recent studies have shown that it can be harnessed to generate desired quantum states. To be useful for practical applications, the ability to speed up the dissipative evolution is crucial. In this study, we focus on a Markovian dissipative state preparation scheme where the prepared state is one of the energy eigenstates. We derive an initial-state-dependent quantum speed limit (QSL) that offers a more refined measure of the actual evolution time compared to the commonly used initial-state-independent relaxation time. This allows for a passive optimization of dissipative evolution across different initial states. By minimizing the dissipated heat during the preparation process, conditioned on the minimization of evolution time using the QSL, we find that the preferred initial state has a specific permutation of diagonal elements with respect to an ordered energy eigenbasis of increasing eigenvalues. In this configuration, the population on the prepared state is the largest, and the remaining diagonal elements are sorted in an order resembling that of a passive state in the same ordered energy eigenbasis. We demonstrate the effectiveness of our strategy in a dissipative Rydberg atom system for preparing the Bell state. Our work provides new insights into the optimization of dissipative state preparation processes and could have significant implications for practical quantum technologies.

翻訳日:2023-03-27 11:24:13 公開日:2023-03-24

# 非トリミングビデオにおけるDense-Localizing Audio-Visual Events:大規模ベンチマークとベースライン

Dense-Localizing Audio-Visual Events in Untrimmed Videos: A Large-Scale Benchmark and Baseline ( http://arxiv.org/abs/2303.12930v2 )

ライセンス: Link先を確認

Tiantian Geng, Teng Wang, Jinming Duan, Runmin Cong, Feng Zheng

(参考訳) 既存の音声視覚イベントローカライゼーション(AVE)は、それぞれ単一のインスタンスしか含まない、手動でトリミングされたビデオを扱う。しかし、自然なビデオは異なるカテゴリの多数の音声視覚イベントを含むことが多いため、この設定は非現実的である。本稿では、実生活の応用によりよく適応するために、トリミングされていないビデオ内で発生するすべての音声視覚イベントを同時にローカライズし認識することを目的とした、音声視覚イベントの密なローカライゼーションというタスクに焦点をあてる。この問題は、きめ細かい音声視覚シーンとコンテキストの理解を必要とするため、難しい。この問題に対処するために、30K以上の音声視覚イベントを含む10Kのトリミングされていないビデオからなる、最初のUntrimmed Audio-Visual(UnAV-100)データセットを導入する。各ビデオには平均して2.8個の音声視覚イベントがあり、イベントは通常互いに関連しており、現実のシーンのように共起することがある。次に、音声と視覚のモダリティを完全に統合して様々な長さの音声視覚イベントをローカライズし、それらの間の依存関係を単一のパスで捉えることができる、学習ベースの新しいフレームワークを用いてタスクを定式化する。広範な実験により、提案手法の有効性と、このタスクにおけるマルチスケールのクロスモーダル知覚と依存関係モデリングの重要性が実証された。

Existing audio-visual event localization (AVE) handles manually trimmed videos with only a single instance in each of them. However, this setting is unrealistic as natural videos often contain numerous audio-visual events with different categories. To better adapt to real-life applications, in this paper we focus on the task of dense-localizing audio-visual events, which aims to jointly localize and recognize all audio-visual events occurring in an untrimmed video. The problem is challenging as it requires fine-grained audio-visual scene and context understanding. To tackle this problem, we introduce the first Untrimmed Audio-Visual (UnAV-100) dataset, which contains 10K untrimmed videos with over 30K audio-visual events. Each video has 2.8 audio-visual events on average, and the events are usually related to each other and might co-occur as in real-life scenes. Next, we formulate the task using a new learning-based framework, which is capable of fully integrating audio and visual modalities to localize audio-visual events with various lengths and capture dependencies between them in a single pass. Extensive experiments demonstrate the effectiveness of our method as well as the significance of multi-scale cross-modal perception and dependency modeling for this task.

翻訳日:2023-03-27 11:23:49 公開日:2023-03-24

# フェルミオン系およびボソニックガウス系におけるエントロピー生成に対する量子的および古典的貢献

Quantum and classical contributions to entropy production in fermionic and bosonic Gaussian systems ( http://arxiv.org/abs/2303.12749v2 )

ライセンス: Link先を確認

Krzysztof Ptaszynski, Massimiliano Esposito

(参考訳) 前述したように、熱力学過程の不可逆性を特徴づける重要な量であるエントロピー生成は、系の自由度と熱環境の間の相関関係の生成に関係している。これは、そのような相関が古典的か量子的か、すなわち測定によってアクセス可能であるかという疑問を提起する。フェルミオン系とボソニックガウス系を考えることでこの問題に対処する。フェルミオンの場合、エントロピー生成は、物理的に許容される測定のセットをフォック状態の射影に制限し、古典的にアクセス可能な相関の量を大幅に制限するパリティ超選択規則により、ほとんど量子的であることを示す。対照的に、ボソニック系では、ガウス測度によってはるかに多くの相関がアクセス可能である。具体的には、量子寄与は低温では重要であるが、高温ではエントロピー生成は純粋に古典的な位置-運動量相関に対応する。本研究は, エントロピー生成の微視的定式化における量子-古典遷移の存在に関して, フェルミオン系とボソニック系の重要な違いを示した。また、エントロピー生成は、弱いカップリング限界においても主に量子相関によって引き起こされる可能性があり、これは状態人口の古典的な速度方程式や、ボソンとフェルミオンの輸送特性が古典的な粒子のそれと収束する低粒子密度限界において記述される。

As previously demonstrated, the entropy production - a key quantity characterizing the irreversibility of thermodynamic processes - is related to generation of correlations between degrees of freedom of the system and its thermal environment. This raises the question of whether such correlations are of a classical or quantum nature, namely, whether they are accessible through measurements. We address this problem by considering fermionic and bosonic Gaussian systems. We show that for fermions the entropy production is mostly quantum due to the parity superselection rule which restricts the set of physically allowed measurements to projections on the Fock states, thus significantly limiting the amount of classically accessible correlations. In contrast, in bosonic systems a much larger amount of correlations can be accessed through Gaussian measurements. Specifically, while the quantum contribution may be important at low temperatures, in the high temperature limit the entropy production corresponds to purely classical position-momentum correlations. Our results demonstrate an important difference between fermionic and bosonic systems regarding the existence of a quantum-to-classical transition in the microscopic formulation of the entropy production. They also show that entropy production can be mainly caused by quantum correlations even in the weak coupling limit, which admits a description in terms of classical rate equations for state populations, as well as in the low particle density limit, where the transport properties of both bosons and fermions converge to those of classical particles.

翻訳日:2023-03-27 11:23:25 公開日:2023-03-24

# 汎用人工知能の火花:GPT-4による初期の実験

Sparks of Artificial General Intelligence: Early experiments with GPT-4 ( http://arxiv.org/abs/2303.12712v2 )

ライセンス: Link先を確認

Sébastien Bubeck, Varun Chandrasekaran, Ronen Eldan, Johannes Gehrke, Eric Horvitz, Ece Kamar, Peter Lee, Yin Tat Lee, Yuanzhi Li, Scott Lundberg, Harsha Nori, Hamid Palangi, Marco Tulio Ribeiro, Yi Zhang

(参考訳) 人工知能(AI)の研究者たちは、さまざまなドメインやタスクにわたって優れた能力を示す大規模言語モデル(LLM)を開発・改良しており、学習と認知に関する我々の理解に挑戦している。OpenAIが開発した最新のモデルであるGPT-4は、前例のない規模の計算とデータを使って訓練された。本稿では、OpenAIがまだ開発を進めていた時期のGPT-4の初期バージョンに関する調査について報告する。(この初期バージョンの)GPT-4は、(例えばChatGPTやGoogleのPaLMとともに)従来のAIモデルよりも汎用的な知能を示すLLMの新たなコホートの一部である、と我々は主張する。我々は、これらのモデルの高まりつつある能力と影響について論じる。GPT-4は、言語の習熟に加えて、数学、コーディング、ビジョン、医学、法律、心理学などにまたがる新規で困難なタスクを、特別なプロンプトを必要とせずに解けることを示す。さらに、これらすべてのタスクにおいて、GPT-4の性能は人間レベルの性能に著しく近く、しばしばChatGPTのような以前のモデルを大きく上回る。GPT-4の能力の広さと深さを考えると、汎用人工知能(AGI)システムの初期の(ただしまだ不完全な)バージョンと見なすのが妥当であると我々は考える。GPT-4の探索においては限界の発見に特に重点を置き、次単語予測を超える新たなパラダイムを追求する必要性の可能性を含め、より深く包括的なAGIに向けて進むうえでの課題について論じる。最後に、最近の技術的飛躍の社会的影響と今後の研究方向についての考察で締めくくる。

Artificial intelligence (AI) researchers have been developing and refining large language models (LLMs) that exhibit remarkable capabilities across a variety of domains and tasks, challenging our understanding of learning and cognition. The latest model developed by OpenAI, GPT-4, was trained using an unprecedented scale of compute and data. In this paper, we report on our investigation of an early version of GPT-4, when it was still in active development by OpenAI. We contend that (this early version of) GPT-4 is part of a new cohort of LLMs (along with ChatGPT and Google's PaLM for example) that exhibit more general intelligence than previous AI models. We discuss the rising capabilities and implications of these models. We demonstrate that, beyond its mastery of language, GPT-4 can solve novel and difficult tasks that span mathematics, coding, vision, medicine, law, psychology and more, without needing any special prompting. Moreover, in all of these tasks, GPT-4's performance is strikingly close to human-level performance, and often vastly surpasses prior models such as ChatGPT. Given the breadth and depth of GPT-4's capabilities, we believe that it could reasonably be viewed as an early (yet still incomplete) version of an artificial general intelligence (AGI) system. In our exploration of GPT-4, we put special emphasis on discovering its limitations, and we discuss the challenges ahead for advancing towards deeper and more comprehensive versions of AGI, including the possible need for pursuing a new paradigm that moves beyond next-word prediction. We conclude with reflections on societal influences of the recent technological leap and future research directions.

翻訳日:2023-03-27 11:22:58 公開日:2023-03-24

# RaBit: トポロジ一貫性データセットを用いた3次元二足歩行カートゥーンキャラクタのパラメトリックモデリング

RaBit: Parametric Modeling of 3D Biped Cartoon Characters with a Topological-consistent Dataset ( http://arxiv.org/abs/2303.12564v2 )

ライセンス: Link先を確認

Zhongjin Luo, Shengcai Cai, Jinguo Dong, Ruibo Ming, Liangdong Qiu, Xiaohang Zhan, Xiaoguang Han

(参考訳) 視覚的にもっともらしい3Dキャラクタの効率的な制作を支援することは、コンピュータビジョンとコンピュータグラフィックスにおける基本的な研究課題であり続けてきた。最近の学習ベースのアプローチは、実在の人間の3Dデジタル化の領域で前例のない精度と効率を達成している。しかし、ゲームや映像制作でも大きな需要がある3D二足歩行カートゥーンキャラクタのモデリングに焦点を当てた先行研究はない。本稿では、3D二足歩行カートゥーンキャラクタの最初の大規模データセットである3DBiCarと、対応するパラメトリックモデルであるRaBitを紹介する。データセットには、プロのアーティストが手作業で制作した、トポロジ的に一貫性のある高品質な3Dテクスチャ付きモデルが1,500体含まれている。このデータに基づいて、RaBitはSMPLライクな線形ブレンドシェイプモデルとStyleGANベースのニューラルUVテクスチャ生成器で設計されており、形状、ポーズ、テクスチャを同時に表現する。3DBiCarとRaBitの実用性を実証するため、単一視点再構成、スケッチベースモデリング、3Dカートゥーンアニメーションなど、様々な応用を行った。単一視点再構成の設定では、入力画像から出力のUVベースのテクスチャマップへの単純なグローバルマッピングは、一部の局所的な部位(例えば鼻、耳)の詳細な外観を失いやすいことがわかった。そこで、重要な局所領域をすべて知覚できるように、部位に敏感なテクスチャ推論器を採用する。実験により、本手法の有効性が定性的にも定量的にも示された。3DBiCarとRaBitは gaplab.cuhk.edu.cn/projects/RaBit で利用可能である。

Assisting people in efficiently producing visually plausible 3D characters has always been a fundamental research topic in computer vision and computer graphics. Recent learning-based approaches have achieved unprecedented accuracy and efficiency in the area of 3D real human digitization. However, none of the prior works focus on modeling 3D biped cartoon characters, which are also in great demand in gaming and filming. In this paper, we introduce 3DBiCar, the first large-scale dataset of 3D biped cartoon characters, and RaBit, the corresponding parametric model. Our dataset contains 1,500 topologically consistent high-quality 3D textured models which are manually crafted by professional artists. Built upon the data, RaBit is thus designed with a SMPL-like linear blend shape model and a StyleGAN-based neural UV-texture generator, simultaneously expressing the shape, pose, and texture. To demonstrate the practicality of 3DBiCar and RaBit, various applications are conducted, including single-view reconstruction, sketch-based modeling, and 3D cartoon animation. For the single-view reconstruction setting, we find a straightforward global mapping from input images to the output UV-based texture maps tends to lose detailed appearances of some local parts (e.g., nose, ears). Thus, a part-sensitive texture reasoner is adopted to make all important local areas perceived. Experiments further demonstrate the effectiveness of our method both qualitatively and quantitatively. 3DBiCar and RaBit are available at gaplab.cuhk.edu.cn/projects/RaBit.

翻訳日:2023-03-27 11:22:32 公開日:2023-03-24

# 情報量尺度によるベイズリスクの下界

Lower Bounds on the Bayesian Risk via Information Measures ( http://arxiv.org/abs/2303.12497v3 )

ライセンス: Link先を確認

Amedeo Roberto Esposito, Adrien Vandenbroucque, Michael Gastpar

(参考訳) 本稿ではパラメータ推定に着目し、ベイズリスクの下界を導出する新しい手法を提案する。この手法では、Rényiの$\alpha$-、$\varphi$-ダイバージェンスやSibsonの$\alpha$-相互情報量を含む、事実上あらゆる情報量尺度を使用できる。このアプローチは、ダイバージェンスを測度の汎関数と見なし、測度の空間と関数の空間の間の双対性を利用する。特に、マルコフの不等式を用いて双対を上から抑えることで、任意の情報量尺度によってリスクを下から抑えられることを示す。これにより、ダイバージェンスが満たすデータ処理不等式のおかげで、推定量に依存しない不可能性結果を与えることができる。得られた結果を、「Hide-and-Seek」問題を含む、離散パラメータと連続パラメータの両方を伴う設定に適用し、最先端の手法と比較する。重要な観察は、サンプル数に対する下界の挙動が情報量尺度の選択に影響されることである。我々はこれを活用し、「Hockey-Stick」ダイバージェンスに着想を得た新しいダイバージェンスを導入する。このダイバージェンスは、検討したすべての設定で最大の下界を与えることが経験的に示されている。観測がプライバシ保護処理(privatisation)を受ける場合には、強データ処理不等式によってより強い不可能性結果が得られる。本論文ではまた、いくつかの一般化と代替的な方向性についても論じる。

This paper focuses on parameter estimation and introduces a new method for lower bounding the Bayesian risk. The method allows for the use of virtually any information measure, including Rényi's $\alpha$, $\varphi$-Divergences, and Sibson's $\alpha$-Mutual Information. The approach considers divergences as functionals of measures and exploits the duality between spaces of measures and spaces of functions. In particular, we show that one can lower bound the risk with any information measure by upper bounding its dual via Markov's inequality. We are thus able to provide estimator-independent impossibility results thanks to the Data-Processing Inequalities that divergences satisfy. The results are then applied to settings of interest involving both discrete and continuous parameters, including the "Hide-and-Seek" problem, and compared to the state-of-the-art techniques. An important observation is that the behaviour of the lower bound in the number of samples is influenced by the choice of the information measure. We leverage this by introducing a new divergence inspired by the "Hockey-Stick" Divergence, which is demonstrated empirically to provide the largest lower-bound across all considered settings. If the observations are subject to privatisation, stronger impossibility results can be obtained via Strong Data-Processing Inequalities. The paper also discusses some generalisations and alternative directions.
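For readers unfamiliar with the Hockey-Stick divergence mentioned above, the sketch below computes its standard form for discrete distributions, E_γ(P‖Q) = Σ_x max(P(x) − γ·Q(x), 0). Note that this is the classical divergence only; the new divergence the paper introduces is merely inspired by it and is not reproduced here.

```python
def hockey_stick(p, q, gamma=1.0):
    """Classical Hockey-Stick divergence between two discrete distributions
    given as equal-length lists of probabilities (gamma >= 1).
    For gamma = 1 it coincides with the total variation distance."""
    return sum(max(pi - gamma * qi, 0.0) for pi, qi in zip(p, q))
```

For example, with P = (0.5, 0.5) and Q = (0.9, 0.1), the value is 0.4 at γ = 1 and 0.3 at γ = 2.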

翻訳日:2023-03-27 11:22:06 公開日:2023-03-24

# エゴセントリックビュー合成のための平衡球面格子

Balanced Spherical Grid for Egocentric View Synthesis ( http://arxiv.org/abs/2303.12408v2 )

ライセンス: Link先を確認

Changwoon Choi, Sang Min Kim, Young Min Kim

(参考訳) EgoNeRFは、VRアセット向けに大規模な実世界環境を再構成するための実用的なソリューションである。カジュアルに撮影された数秒の360度ビデオが与えられると、EgoNeRFはニューラル放射場を効率的に構築し、新しい視点からの高品質なレンダリングを可能にする。特徴グリッドを用いた最近のNeRFの高速化に着想を得て、従来のデカルト座標の代わりに球面座標を採用する。デカルト特徴グリッドは、視点からの距離に関係なく空間的に一様な解像度を持つため、大規模な非有界シーンの表現には非効率である。球面パラメータ化は、エゴセントリック画像の光線とよりよく整合し、なおかつ性能向上のための分解も可能にする。しかし、素朴な球面グリッドは2つの極で不規則性が生じ、非有界シーンも表現できない。極付近の特異点を避けるため、2つのバランスのとれたグリッドを組み合わせ、準一様な角度グリッドを得る。また、動径方向のグリッドを指数的に分割し、無限遠に環境マップを配置することで非有界シーンを表現する。さらに、グリッドベース手法向けのリサンプリング手法により、NeRFボリュームの学習に有効なサンプル数を増やすことができる。新たに導入した合成および実世界のエゴセントリック360度ビデオデータセットで本手法を広範に評価し、一貫して最先端の性能を達成した。

We present EgoNeRF, a practical solution to reconstruct large-scale real-world environments for VR assets. Given a few seconds of casually captured 360 video, EgoNeRF can efficiently build neural radiance fields which enable high-quality rendering from novel viewpoints. Motivated by the recent acceleration of NeRF using feature grids, we adopt spherical coordinates instead of conventional Cartesian coordinates. A Cartesian feature grid is inefficient for representing large-scale unbounded scenes because it has a spatially uniform resolution, regardless of distance from viewers. The spherical parameterization better aligns with the rays of egocentric images, and yet enables factorization for performance enhancement. However, the naïve spherical grid suffers from irregularities at the two poles and cannot represent unbounded scenes. To avoid singularities near the poles, we combine two balanced grids, which results in a quasi-uniform angular grid. We also partition the radial grid exponentially and place an environment map at infinity to represent unbounded scenes. Furthermore, with our resampling technique for grid-based methods, we can increase the number of valid samples to train the NeRF volume. We extensively evaluate our method on our newly introduced synthetic and real-world egocentric 360 video datasets, and it consistently achieves state-of-the-art performance.
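The exponential partition of the radial grid can be sketched as below. This is a minimal illustration, assuming geometrically spaced shell boundaries between hypothetical `r_min` and `r_max` values; the paper's exact parameterization may differ, and returning the extra index `n_bins` for far radii is only a stand-in for the environment map at infinity.

```python
import math

def radial_edges(r_min, r_max, n_bins):
    """Shell boundaries spaced geometrically, so shells get thicker
    with distance from the viewer."""
    ratio = r_max / r_min
    return [r_min * ratio ** (i / n_bins) for i in range(n_bins + 1)]

def radial_bin(r, r_min, r_max, n_bins):
    """Index of the shell containing radius r. Radii at or beyond r_max
    return n_bins, standing in for the environment map at infinity."""
    if r >= r_max:
        return n_bins
    if r <= r_min:
        return 0
    return int(n_bins * math.log(r / r_min) / math.log(r_max / r_min))
```

With `r_min=1`, `r_max=16`, and four bins, the boundaries fall at roughly 1, 2, 4, 8, and 16, so nearby geometry gets finer radial resolution than distant geometry.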

翻訳日:2023-03-27 11:21:39 公開日:2023-03-24

# マルチエージェント軌道予測のための階層型ハイブリッド学習フレームワーク

A Hierarchical Hybrid Learning Framework for Multi-agent Trajectory Prediction ( http://arxiv.org/abs/2303.12274v3 )

ライセンス: Link先を確認

Yujun Jiao, Mingze Miao, Zhishuai Yin, Chunyuan Lei, Xu Zhu, Linzhen Nie and Bo Tao

(参考訳) 周辺エージェントの正確でロバストな軌道予測は、複雑なシーンを走行する自動運転車にとって重要である。近年提案されている手法の多くは、複雑な相互作用を符号化する能力の高さから、深層学習に基づいている。しかし、これらは過去の観測に大きく依存し、スパースなサンプルから過渡的・偶発的な相互作用を効果的に捉えられないため、妥当でない予測がしばしば生成される。本稿では、マルチスケールの相互作用によって形作られる動きを予測するという課題に対処するため、マルチエージェント軌道予測のための、深層学習(DL)と強化学習(RL)の階層型ハイブリッドフレームワークを提案する。DLステージでは、交通シーンを複数の中間スケールの異種グラフに分割し、それに基づいてTransformerスタイルのGNNを用いて中間レベルおよびグローバルレベルの異種相互作用を符号化する。RLステージでは、DLステージで予測された重要な将来点を利用して、交通シーンを局所的なサブシーンに分割する。運動計画の手順を模倣して軌道予測を生成するために、車両運動学モデルを組み込んだTransformerベースのProximal Policy Optimization(PPO)を考案し、微視的相互作用が支配的な影響を及ぼす状況下で動作を計画する。多目的報酬は、エージェント中心の精度とシーン全体の整合性のバランスをとるように設計されている。実験の結果、提案手法はArgoverse forecastingベンチマークで最先端の手法に匹敵することがわかった。また、可視化結果から、階層的な学習フレームワークがマルチスケールの相互作用を捉え、予測軌道の実現可能性と規則適合性を改善することも明らかになった。

Accurate and robust trajectory prediction of neighboring agents is critical for autonomous vehicles traversing complex scenes. Most methods proposed in recent years are deep learning-based due to their strength in encoding complex interactions. However, implausible predictions are often generated since they rely heavily on past observations and cannot effectively capture transient and contingent interactions from sparse samples. In this paper, we propose a hierarchical hybrid framework of deep learning (DL) and reinforcement learning (RL) for multi-agent trajectory prediction, to cope with the challenge of predicting motions shaped by multi-scale interactions. In the DL stage, the traffic scene is divided into multiple intermediate-scale heterogeneous graphs, based on which Transformer-style GNNs are adopted to encode heterogeneous interactions at intermediate and global levels. In the RL stage, we divide the traffic scene into local sub-scenes utilizing the key future points predicted in the DL stage. To emulate the motion planning procedure so as to produce trajectory predictions, a Transformer-based Proximal Policy Optimization (PPO) incorporated with a vehicle kinematics model is devised to plan motions under the dominant influence of microscopic interactions. A multi-objective reward is designed to balance between agent-centric accuracy and scene-wise compatibility. Experimental results show that our proposal matches the state of the art on the Argoverse forecasting benchmark. The visualized results also reveal that the hierarchical learning framework captures the multi-scale interactions and improves the feasibility and compliance of the predicted trajectories.

翻訳日:2023-03-27 11:21:14 公開日:2023-03-24

# StyleRF: ニューラル放射場のゼロショット3Dスタイル転送

StyleRF: Zero-shot 3D Style Transfer of Neural Radiance Fields ( http://arxiv.org/abs/2303.10598v3 )

ライセンス: Link先を確認

Kunhao Liu, Fangneng Zhan, Yiwen Chen, Jiahui Zhang, Yingchen Yu, Abdulmotaleb El Saddik, Shijian Lu, Eric Xing

(参考訳) 3dスタイル転送は、3dシーンのスタイル化されたノベルビューをマルチビュー一貫性で描画することを目的としている。しかし、既存の作品の多くは正確な幾何学的再構成、高品質なスタイライゼーション、任意の新しいスタイルに一般化された3方向のジレンマに苦しめられている。放射場の特徴空間内でスタイル変換を行うことで3方向ジレンマを解消する3次元スタイル転送技術であるStyleRF(Style Radiance Fields)を提案する。 StyleRFは3Dシーンを表現するために高精細な特徴の明示的なグリッドを採用しており、ボリュームレンダリングによって高精細な形状を確実に復元することができる。さらに、グリッド機能は参照スタイルに従って変換され、高品質なゼロショットスタイル転送に直接繋がる。 StyleRFは2つの革新的な設計で構成されている。 1つ目はサンプリング不変なコンテンツ変換であり、この変換はサンプル化された3D点の全体統計に不変であり、したがってマルチビュー整合性を保証する。 2つ目は、3Dポイントの変換と同等の2D特徴写像の遅延型変換であるが、マルチビューの一貫性を損なうことなくメモリフットプリントを大幅に削減する。広範な実験により、stylerfは正確な形状再構成により優れた3dスタイライゼーション品質を達成し、ゼロショット方式で様々な新しいスタイルに一般化できることを示した。

3D style transfer aims to render stylized novel views of a 3D scene with multi-view consistency. However, most existing work suffers from a three-way dilemma over accurate geometry reconstruction, high-quality stylization, and being generalizable to arbitrary new styles. We propose StyleRF (Style Radiance Fields), an innovative 3D style transfer technique that resolves the three-way dilemma by performing style transformation within the feature space of a radiance field. StyleRF employs an explicit grid of high-level features to represent 3D scenes, with which high-fidelity geometry can be reliably restored via volume rendering. In addition, it transforms the grid features according to the reference style which directly leads to high-quality zero-shot style transfer. StyleRF consists of two innovative designs. The first is sampling-invariant content transformation that makes the transformation invariant to the holistic statistics of the sampled 3D points and accordingly ensures multi-view consistency. The second is deferred style transformation of 2D feature maps which is equivalent to the transformation of 3D points but greatly reduces memory footprint without degrading multi-view consistency. Extensive experiments show that StyleRF achieves superior 3D stylization quality with precise geometry reconstruction and it can generalize to various new styles in a zero-shot manner.

翻訳日:2023-03-27 11:20:49 公開日:2023-03-24

# カラースタイル伝達のためのニューラルプリセット

Neural Preset for Color Style Transfer ( http://arxiv.org/abs/2303.13511v2 )

ライセンス: Link先を確認

Zhanghan Ke, Yuhao Liu, Lei Zhu, Nanxuan Zhao, Rynson W.H. Lau

(参考訳) 本稿では、視覚的アーティファクト、膨大なメモリ要求、スタイル切り替え速度の遅さといった、既存のカラースタイル転送手法の制限に対処するNeural Preset手法を提案する。本手法は2つの中核的な設計に基づいている。第一に、画像適応的な色マッピング行列を介して各画素に一貫した処理を行い、アーティファクトを回避しつつ、小さなメモリフットプリントで高解像度入力をサポートするDeterministic Neural Color Mapping(DNCM)を提案する。第二に、タスクを色正規化とスタイライゼーションに分割する2段階パイプラインを開発し、カラースタイルをプリセットとして抽出して正規化済みの入力画像で再利用することで、効率的なスタイル切り替えを可能にする。ペアのデータセットが利用できないため、自己教師あり戦略によってNeural Presetを学習する方法を述べる。既存手法に対するNeural Presetの様々な利点を、包括的な評価によって示す。特に、Neural Presetはアーティファクトなしで安定した4Kカラースタイル転送をリアルタイムに実現する。さらに、学習済みモデルは、低照度画像強調、水中画像補正、画像デヘイジング、画像調和など、複数のアプリケーションをファインチューニングなしで自然にサポートできることを示す。デモのあるプロジェクトページ: https://zhkkke.github.io/NeuralPreset 。

In this paper, we present a Neural Preset technique to address the limitations of existing color style transfer methods, including visual artifacts, vast memory requirement, and slow style switching speed. Our method is based on two core designs. First, we propose Deterministic Neural Color Mapping (DNCM) to consistently operate on each pixel via an image-adaptive color mapping matrix, avoiding artifacts and supporting high-resolution inputs with a small memory footprint. Second, we develop a two-stage pipeline by dividing the task into color normalization and stylization, which allows efficient style switching by extracting color styles as presets and reusing them on normalized input images. Due to the unavailability of pairwise datasets, we describe how to train Neural Preset via a self-supervised strategy. Various advantages of Neural Preset over existing methods are demonstrated through comprehensive evaluations. Notably, Neural Preset enables stable 4K color style transfer in real-time without artifacts. Besides, we show that our trained model can naturally support multiple applications without fine-tuning, including low-light image enhancement, underwater image correction, image dehazing, and image harmonization. Project page with demos: https://zhkkke.github.io/NeuralPreset .
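The per-pixel color mapping idea can be sketched as follows. This is an illustrative reading of the abstract, not the released implementation: the factorization into a projection `P` (3 x k), an image-adaptive matrix `T` (k x k), and a back-projection `Q` (k x 3) is an assumption, and in the real method `T` would be predicted by a network rather than supplied by hand.

```python
def vec_mat(v, m):
    """Multiply a row vector by a matrix given as a list of rows."""
    return [sum(v[i] * m[i][j] for i in range(len(v))) for j in range(len(m[0]))]

def color_map_image(image, P, T, Q):
    """Apply the same color mapping to every pixel independently.

    image: list of rows, each a list of [r, g, b] pixels.
    P: 3 x k projection, T: k x k image-adaptive matrix, Q: k x 3 back-projection
    (all three are hypothetical placeholders in this sketch).
    """
    return [[vec_mat(vec_mat(vec_mat(px, P), T), Q) for px in row] for row in image]
```

Because each pixel goes through the same small matrix products, identical input colors always map to identical output colors, and memory use does not grow with feature maps, which is consistent with the artifact-free, low-memory behavior the abstract claims.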

翻訳日:2023-03-27 11:14:45 公開日:2023-03-24

# CLIP for All Things Zero-Shot Sketch-based Image Retrieval, Fine-Grained or not

CLIP for All Things Zero-Shot Sketch-Based Image Retrieval, Fine-Grained or Not ( http://arxiv.org/abs/2303.13440v2 )

ライセンス: Link先を確認

Aneeshan Sain, Ayan Kumar Bhunia, Pinaki Nath Chowdhury, Subhadeep Koley, Tao Xiang, Yi-Zhe Song

(参考訳) 本稿では,ゼロショットスケッチに基づく画像検索(ZS-SBIR)にCLIPを利用する。私たちは、ファンデーションモデルにおける最近の進歩と、彼らが提供していると思われる非並列の一般化能力に大きく影響を受けています。我々は、このシナジーをいかに最適に達成するかという新しいデザインを、カテゴリー設定ときめ細かい設定("all")の両方のために提案した。私たちのソリューションの核心は、迅速な学習セットアップです。まず、スケッチ固有のプロンプトをファクタリングすることで、すでにカテゴリレベルのZS-SBIRシステムがあり、すべての先行芸術をオーバーシュートし(24.8%)、CLIPとZS-SBIRのシナジーを研究する上で大きな証拠となります。しかし、細かな設定に移行するのは難しく、このシナジーを深く掘り下げる必要がある。そのため、この問題のきめ細かいマッチング性に取り組むために、2つの具体的な設計を考え出した。 (i)スケッチと写真の相対的な分離がカテゴリ間で均一であることを保証するための追加の正規化損失。金本位制の三重項損失はそうではない。 (ii)スケッチとフォトのペア間のインスタンスレベルの構造的対応を確立するための巧妙なパッチシャッフル技術。これらの設計により、我々は以前の最先端よりも26.9%の領域での大幅な性能向上を再び観察する。提案されているクリップとプロンプト学習のパラダイムは、データ不足が大きな課題である他のスケッチ関連のタスク(zs-sbirに限らず)に取り組む上で、大きな可能性を秘めています。プロジェクトページ: https://aneeshan95.github.io/Sketch_LVM/

In this paper, we leverage CLIP for zero-shot sketch based image retrieval (ZS-SBIR). We are largely inspired by recent advances on foundation models and the unparalleled generalisation ability they seem to offer, but for the first time tailor it to benefit the sketch community. We put forward novel designs on how best to achieve this synergy, for both the category setting and the fine-grained setting ("all"). At the very core of our solution is a prompt learning setup. First we show just via factoring in sketch-specific prompts, we already have a category-level ZS-SBIR system that overshoots all prior arts, by a large margin (24.8%) - a great testimony on studying the CLIP and ZS-SBIR synergy. Moving onto the fine-grained setup is however trickier, and requires a deeper dive into this synergy. For that, we come up with two specific designs to tackle the fine-grained matching nature of the problem: (i) an additional regularisation loss to ensure the relative separation between sketches and photos is uniform across categories, which is not the case for the gold standard standalone triplet loss, and (ii) a clever patch shuffling technique to help establishing instance-level structural correspondences between sketch-photo pairs. With these designs, we again observe significant performance gains in the region of 26.9% over previous state-of-the-art. The take-home message, if any, is the proposed CLIP and prompt learning paradigm carries great promise in tackling other sketch-related tasks (not limited to ZS-SBIR) where data scarcity remains a great challenge. Project page: https://aneeshan95.github.io/Sketch_LVM/

翻訳日:2023-03-27 11:14:23 公開日:2023-03-24

# ゼロセグメントラベルを用いたゼロ誘導セグメンテーション

Zero-guidance Segmentation Using Zero Segment Labels ( http://arxiv.org/abs/2303.13396v2 )

ライセンス: Link先を確認

Pitchaporn Rewatbowornwong, Nattanat Chatthee, Ekapol Chuangsuwanich, Supasorn Suwajanakorn

(参考訳) CLIPは新しくてエキサイティングな共同ビジョン言語アプリケーションを実現した。ひとつはオープン語彙セグメンテーションで、任意のテキストクエリの任意のセグメントを特定できる。本研究では,テキストクエリや事前定義されたクラスでユーザ誘導なしに意味セグメントを見つけ出し,自然言語で自動的にラベル付けすることができるか質問する。そこで本研究では,DINOとCLIPという2つの事前学習されたジェネラリストモデルを利用したゼロガイダンスセグメンテーションと第1ベースラインを提案する。一般的なアイデアは、まず画像を小さなオーバーセグメントに分割し、クリップのビジュアル言語空間にエンコードし、テキストラベルに変換し、意味的に類似したセグメントをマージすることだ。しかし、重要な課題は、視覚セグメントを、グローバルなコンテキスト情報とローカルなコンテキスト情報のバランスをとるセグメント固有の埋め込みにエンコードする方法だ。私たちの主な貢献は、CLIP内のアテンション層を分析することによって、2つのコンテキストのバランスをとる新しいアテンションマスキング技術です。この新しいタスクの評価のための指標もいくつか紹介する。 CLIPの生来の知識により、美術館の観衆の間でモナ・リザの絵を正確に見つけることができる。プロジェクトページ: https://zero-guide-seg.github.io/

CLIP has enabled new and exciting joint vision-language applications, one of which is open-vocabulary segmentation, which can locate any segment given an arbitrary text query. In our research, we ask whether it is possible to discover semantic segments without any user guidance in the form of text queries or predefined classes, and label them using natural language automatically? We propose a novel problem zero-guidance segmentation and the first baseline that leverages two pre-trained generalist models, DINO and CLIP, to solve this problem without any fine-tuning or segmentation dataset. The general idea is to first segment an image into small over-segments, encode them into CLIP's visual-language space, translate them into text labels, and merge semantically similar segments together. The key challenge, however, is how to encode a visual segment into a segment-specific embedding that balances global and local context information, both useful for recognition. Our main contribution is a novel attention-masking technique that balances the two contexts by analyzing the attention layers inside CLIP. We also introduce several metrics for the evaluation of this new task. With CLIP's innate knowledge, our method can precisely locate the Mona Lisa painting among a museum crowd. Project page: https://zero-guide-seg.github.io/.

翻訳日:2023-03-27 11:13:53 公開日:2023-03-24

# GETT-QA:知識グラフ質問応答のためのグラフ埋め込みベースのT2T変換器

GETT-QA: Graph Embedding based T2T Transformer for Knowledge Graph Question Answering ( http://arxiv.org/abs/2303.13284v2 )

ライセンス: Link先を確認

Debayan Banerjee, Pranav Ajit Nair, Ricardo Usbeck, Chris Biemann

(参考訳) 本稿では, GETT-QA というエンドツーエンドの知識グラフ質問応答(KGQA)システムを提案する。 GETT-QAは、人気のあるテキストからテキストへの事前訓練言語モデルであるT5を使用している。このモデルは自然言語の質問を入力とし、意図したSPARQLクエリのよりシンプルな形式を生成する。単純な形式では、モデルは直接エンティティと関係IDを生成しない。代わりに、対応するエンティティと関係ラベルを生成する。ラベルは、その後のステップでKGのエンティティIDと関係IDに接地される。結果をさらに改善するため、各エンティティに対してKG埋め込みの切り詰めたバージョンを生成するようモデルに指示する。切り詰めたKG埋め込みは、曖昧性解消のためのより細かい探索を可能にする。その結果,T5 は損失関数を変更することなく切り詰めた KG 埋め込みを学習でき,KGQA 性能が向上することがわかった。これにより, Wikidata 上のエンドツーエンド KGQA において, LC-QuAD 2.0 と SimpleQuestions-Wikidata のデータセットで高い性能を報告する。

In this work, we present an end-to-end Knowledge Graph Question Answering (KGQA) system named GETT-QA. GETT-QA uses T5, a popular text-to-text pre-trained language model. The model takes a question in natural language as input and produces a simpler form of the intended SPARQL query. In the simpler form, the model does not directly produce entity and relation IDs. Instead, it produces corresponding entity and relation labels. The labels are grounded to KG entity and relation IDs in a subsequent step. To further improve the results, we instruct the model to produce a truncated version of the KG embedding for each entity. The truncated KG embedding enables a finer search for disambiguation purposes. We find that T5 is able to learn the truncated KG embeddings without any change of loss function, improving KGQA performance. As a result, we report strong results for LC-QuAD 2.0 and SimpleQuestions-Wikidata datasets on end-to-end KGQA over Wikidata.

翻訳日:2023-03-27 11:13:30 公開日:2023-03-24

# 不完全ラベルを用いた複数ラベル認識のための構造化セマンティック先行探索

Exploring Structured Semantic Prior for Multi Label Recognition with Incomplete Labels ( http://arxiv.org/abs/2303.13223v2 )

ライセンス: Link先を確認

Zixuan Ding, Ao Wang, Hui Chen, Qiang Zhang, Pengzhang Liu, Yongjun Bao, Weipeng Yan, Jungong Han

(参考訳) 不完全なラベルを持つマルチラベル認識(MLR)は非常に難しい。近年、視覚言語モデル(すなわちCLIP)で画像とラベルの対応を探求し、アノテーションの不足を補う研究が進められている。有望なパフォーマンスにもかかわらず、それらは一般にラベルとラベルの対応に関する価値ある事前知識を見落としている。本稿では,semantic prior prompter によってラベル間対応に関する構造化された意味的事前知識を導出することにより,不完全なラベルを持つMLRにおけるラベル教師信号の不足を補うことを提唱する。次に、構造化された意味的事前知識を徹底的に探索できる新しいSemantic Correspondence Prompt Network (SCPNet)を提案する。さらに,事前知識の活用を強化するために,Prior-Enhanced Self-Supervised Learning法を導入する。広く使われている複数のベンチマークデータセットでの総合的な実験と解析により,提案手法が既存の手法を全データセットで大幅に上回っており,提案手法の有効性と優越性が実証されている。私たちのコードは https://github.com/jameslahm/SCPNet で公開予定です。

Multi-label recognition (MLR) with incomplete labels is very challenging. Recent works strive to explore the image-to-label correspondence in the vision-language model, i.e., CLIP, to compensate for insufficient annotations. In spite of promising performance, they generally overlook the valuable prior about the label-to-label correspondence. In this paper, we advocate remedying the deficiency of label supervision for the MLR with incomplete labels by deriving a structured semantic prior about the label-to-label correspondence via a semantic prior prompter. We then present a novel Semantic Correspondence Prompt Network (SCPNet), which can thoroughly explore the structured semantic prior. A Prior-Enhanced Self-Supervised Learning method is further introduced to enhance the use of the prior. Comprehensive experiments and analyses on several widely used benchmark datasets show that our method significantly outperforms existing methods on all datasets, well demonstrating the effectiveness and the superiority of our method. Our code will be available at https://github.com/jameslahm/SCPNet.

翻訳日:2023-03-27 11:13:16 公開日:2023-03-24

# チャネルワイズ変換による特徴蒸留のためのシンプルで汎用的なフレームワーク

A Simple and Generic Framework for Feature Distillation via Channel-wise Transformation ( http://arxiv.org/abs/2303.13212v2 )

ライセンス: Link先を確認

Ziwei Liu, Yongtao Wang, Xiaojie Chu

(参考訳) 知識蒸留は、大きな教師モデルから小さな生徒モデルへ模倣によって知識を伝達する一般的な手法である。しかし,教師と生徒間で特徴マップを直接整列させる蒸留は,生徒に過度に厳格な制約を課す可能性があり,生徒モデルの性能を低下させうる。上記の特徴の不一致問題を軽減するため,既存の研究は主に教師と生徒の特徴マップをピクセル単位の変換で空間的に整列させることに重点を置いている。本稿では,教師と生徒の特徴マップをチャネル次元に沿って整列させることも,特徴の不一致問題への対処に有効であることを新たに発見する。具体的には,生徒モデルと教師モデルの特徴を整合させるために,学習可能な非線形のチャネル方向変換を提案する。これに基づき,我々はさらに,蒸留損失とタスク固有損失のバランスをとるためのハイパーパラメータを1つだけ備えた,シンプルで汎用的な特徴蒸留フレームワークを提案する。広範な実験の結果,本手法は画像分類(ImageNet-1KにおけるMobileNetV1のtop-1精度 +3.28%),物体検出(MS COCOにおけるResNet50ベースのFaster-RCNNのbbox mAP +3.9%),インスタンスセグメンテーション(ResNet50ベースのMask-RCNNのMask mAP +2.8%),セマンティックセグメンテーション(CityscapesにおけるResNet18ベースのPSPNetのmIoU +4.66%)を含む様々なコンピュータビジョンタスクで大幅な性能向上を達成し,提案手法の有効性と汎用性が示された。コードは公開される予定だ。

Knowledge distillation is a popular technique for transferring the knowledge from a large teacher model to a smaller student model by mimicking. However, distillation by directly aligning the feature maps between teacher and student may enforce overly strict constraints on the student thus degrade the performance of the student model. To alleviate the above feature misalignment issue, existing works mainly focus on spatially aligning the feature maps of the teacher and the student, with pixel-wise transformation. In this paper, we newly find that aligning the feature maps between teacher and student along the channel-wise dimension is also effective for addressing the feature misalignment issue. Specifically, we propose a learnable nonlinear channel-wise transformation to align the features of the student and the teacher model. Based on it, we further propose a simple and generic framework for feature distillation, with only one hyper-parameter to balance the distillation loss and the task specific loss. Extensive experimental results show that our method achieves significant performance improvements in various computer vision tasks including image classification (+3.28% top-1 accuracy for MobileNetV1 on ImageNet-1K), object detection (+3.9% bbox mAP for ResNet50-based Faster-RCNN on MS COCO), instance segmentation (+2.8% Mask mAP for ResNet50-based Mask-RCNN), and semantic segmentation (+4.66% mIoU for ResNet18-based PSPNet in semantic segmentation on Cityscapes), which demonstrates the effectiveness and the versatility of the proposed method. The code will be made publicly available.
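
The abstract above describes a learnable nonlinear channel-wise transformation plus a single balancing hyper-parameter. A minimal Python sketch of how such a loss could be assembled follows. The specifics are assumptions rather than the authors' implementation: the transformation is taken to be a two-layer MLP with ReLU applied to the channel vector at each spatial position, the distillation loss is taken to be mean squared error, and the names `channel_transform`, `distill_loss`, `total_loss`, and `alpha` are hypothetical.

```python
from typing import List, Sequence

Vector = List[float]
Weights = Sequence[Sequence[float]]


def matvec(w: Weights, x: Sequence[float]) -> Vector:
    """Multiply a weight matrix by a channel vector."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def channel_transform(x: Sequence[float], w1: Weights, w2: Weights) -> Vector:
    """Assumed nonlinear channel-wise map: linear -> ReLU -> linear."""
    hidden = [max(0.0, h) for h in matvec(w1, x)]
    return matvec(w2, hidden)


def distill_loss(
    student: List[Vector], teacher: List[Vector], w1: Weights, w2: Weights
) -> float:
    """Mean squared error between transformed student and teacher features.

    `student` and `teacher` hold one channel vector per spatial position;
    the transform mixes channels only and never mixes positions.
    """
    total, count = 0.0, 0
    for s, t in zip(student, teacher):
        y = channel_transform(s, w1, w2)
        total += sum((a - b) ** 2 for a, b in zip(y, t))
        count += len(t)
    return total / count


def total_loss(task_loss: float, d_loss: float, alpha: float) -> float:
    """Combine the two losses with one balancing hyper-parameter."""
    return task_loss + alpha * d_loss


eye = [[1.0, 0.0], [0.0, 1.0]]  # identity weights, for a sanity check
d = distill_loss([[1.0, 2.0]], [[2.0, 2.0]], eye, eye)
loss = total_loss(task_loss=1.0, d_loss=d, alpha=2.0)
```

In training, `w1` and `w2` would be learned jointly with the student while the teacher stays fixed; here they are set by hand only to keep the sketch self-contained.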

翻訳日:2023-03-27 11:12:55 公開日:2023-03-24

# クラス増分学習のための適応正規化

Adaptive Regularization for Class-Incremental Learning ( http://arxiv.org/abs/2303.13113v2 )

ライセンス: Link先を確認

Elif Ceren Gok Yildirim, Murat Onur Yildirim, Mert Kilickaya, Joaquin Vanschoren

(参考訳) クラスインクリメンタルラーニングは、以前に観測されたクラスの精度を維持しながら、新しいカテゴリで深い分類器を更新する。ニューラルネットワークの重み付けを正則化することは、新しいクラスを学習しながら学習したクラスを忘れることを防ぐ一般的な方法である。しかし、既存の正則化器は学習セッションを通して一定等級を使い、漸進的な学習で遭遇するタスクの難しさのレベルを反映していない可能性がある。本研究は,課題の複雑さに応じて動的に正則化強度を調節する授業インクリメンタルラーニングにおける適応正則化の必要性について検討する。ベイズ最適化に基づく学習タスクごとに最適な正則化量を自動的に決定する手法を提案する。 2つの正規化器による2つのデータセットの実験は、正確で忘れられない視覚的漸進学習を実現するための適応正規化の重要性を示している。

Class-Incremental Learning updates a deep classifier with new categories while maintaining the previously observed class accuracy. Regularizing the neural network weights is a common method to prevent forgetting previously learned classes while learning novel ones. However, existing regularizers use a constant magnitude throughout the learning sessions, which may not reflect the varying levels of difficulty of the tasks encountered during incremental learning. This study investigates the necessity of adaptive regularization in Class-Incremental Learning, which dynamically adjusts the regularization strength according to the complexity of the task at hand. We propose a Bayesian Optimization-based approach to automatically determine the optimal regularization magnitude for each learning task. Our experiments on two datasets via two regularizers demonstrate the importance of adaptive regularization for achieving accurate and less forgetful visual incremental learning.

翻訳日:2023-03-27 11:12:29 公開日:2023-03-24

# 容積型医用画像分割のための可変ハイブリッドネットワーク

A Permutable Hybrid Network for Volumetric Medical Image Segmentation ( http://arxiv.org/abs/2303.13111v2 )

ライセンス: Link先を確認

Yi Lin, Xiao Fang, Dong Zhang, Kwang-Ting Cheng, Hao Chen

(参考訳) 視覚トランスフォーマー(vit)の出現は、3dボリュームベンチマーク、特に3d医療画像セグメンテーションの大幅な進歩をもたらした。同時に、Multi-Layer Perceptron(MLP)ネットワークは、重い自己保持モジュールを除外したにもかかわらず、ViTに匹敵する結果により、研究者の間で人気を取り戻している。本稿では,畳み込みニューラルネットワーク (CNN) と MLP の利点を利用する,PHNet という医用画像分割のための可変ハイブリッドネットワークを提案する。 PHNetは2次元CNNと3次元CNNの両方を用いて3次元ボリュームデータの固有等方性問題に対処する。また, 位置情報を保持しながら長距離依存を得ることにより, 元のmlpを増大させるmlppという, 効率的な多層透過型パーセプトロンモジュールを提案する。大規模な実験結果によると、PHNetは2つのパブリックデータセット、すなわちCOVID-19-20とSynapseで最先端の手法より優れている。さらに, PHNet が CNN および MLP の強度に有効であることを示す。コードは受理後、一般に公開されます。

The advent of Vision Transformer (ViT) has brought substantial advancements in 3D volumetric benchmarks, particularly in 3D medical image segmentation. Concurrently, Multi-Layer Perceptron (MLP) networks have regained popularity among researchers due to their comparable results to ViT, albeit with the exclusion of the heavy self-attention module. This paper introduces a permutable hybrid network for volumetric medical image segmentation, named PHNet, which exploits the advantages of convolution neural network (CNN) and MLP. PHNet addresses the intrinsic isotropy problem of 3D volumetric data by utilizing both 2D and 3D CNN to extract local information. Besides, we propose an efficient Multi-Layer Permute Perceptron module, named MLPP, which enhances the original MLP by obtaining long-range dependence while retaining positional information. Extensive experimental results validate that PHNet outperforms the state-of-the-art methods on two public datasets, namely, COVID-19-20 and Synapse. Moreover, the ablation study demonstrates the effectiveness of PHNet in harnessing the strengths of both CNN and MLP. The code will be accessible to the public upon acceptance.

翻訳日:2023-03-27 11:12:17 公開日:2023-03-24

# OCELOT:病理組織学のための組織データセット上のオーバーラップ細胞

OCELOT: Overlapped Cell on Tissue Dataset for Histopathology ( http://arxiv.org/abs/2303.13110v2 )

ライセンス: Link先を確認

Jeongun Ryu, Aaron Valero Puche, JaeWoong Shin, Seonwook Park, Biagio Brattoli, Jinhee Lee, Wonkyung Jung, Soo Ick Cho, Kyunghyun Paeng, Chan-Young Ock, Donggeun Yoo, Sérgio Pereira

(参考訳) 細胞検出は計算病理学の基本的な課題であり、全スライディング画像から高レベルの医療情報を抽出するのに使用できる。正確な細胞検出のために、病理学者は組織レベルの構造を理解するためにズームアウトし、その形態と周囲の状況に基づいて細胞を分類する。しかしながら、細胞検出モデルにおける病理学者のこのような行動を反映しようとする努力の欠如は、主に重複した領域を持つ細胞と組織の両方を含むデータセットの欠如によるものである。この制限を克服するために,組織学における細胞検出のための細胞間関係の研究を目的としたデータセットOCELOTを提案する。 OCELOTは複数の臓器から取得した画像に重複する細胞および組織アノテーションを提供する。この設定内では,細胞と組織の両方のタスクを同時に学習できるマルチタスク学習手法も提案する。細胞検出タスクのみで訓練されたモデルと比較すると,提案手法はOCELOT,パブリックTIGER,内部CARPデータセットの3つのデータセット上での細胞検出性能を向上させる。特にOCELOTテストセットでは、F1スコアが最大6.79改善されている。我々は,OCELOTデータセットをhttps://lunit-io.github.io/research/publications/ocelotでリリースすることを含め,本論文のコントリビューションは,計算病理学に細胞-組織関係を組み込む上で重要な研究方向への重要な出発点であると考えている。

Cell detection is a fundamental task in computational pathology that can be used for extracting high-level medical information from whole-slide images. For accurate cell detection, pathologists often zoom out to understand the tissue-level structures and zoom in to classify cells based on their morphology and the surrounding context. However, there is a lack of efforts to reflect such behaviors by pathologists in the cell detection models, mainly due to the lack of datasets containing both cell and tissue annotations with overlapping regions. To overcome this limitation, we propose and publicly release OCELOT, a dataset purposely dedicated to the study of cell-tissue relationships for cell detection in histopathology. OCELOT provides overlapping cell and tissue annotations on images acquired from multiple organs. Within this setting, we also propose multi-task learning approaches that benefit from learning both cell and tissue tasks simultaneously. When compared against a model trained only for the cell detection task, our proposed approaches improve cell detection performance on 3 datasets: proposed OCELOT, public TIGER, and internal CARP datasets. On the OCELOT test set in particular, we show up to 6.79 improvement in F1-score. We believe the contributions of this paper, including the release of the OCELOT dataset at https://lunit-io.github.io/research/publications/ocelot , are a crucial starting point toward the important research direction of incorporating cell-tissue relationships in computational pathology.

翻訳日:2023-03-27 11:11:54 公開日:2023-03-24

# 合成分析によるトップダウン視覚注意

Top-Down Visual Attention from Analysis by Synthesis ( http://arxiv.org/abs/2303.13043v2 )

ライセンス: Link先を確認

Baifeng Shi, Trevor Darrell, Xin Wang

(参考訳) 現在の注意アルゴリズム(例えば、自己注意)は刺激駆動であり、画像内のすべての有能な物体をハイライトする。しかしながら、人間のような知的エージェントは、手前の高レベルなタスクに基づいて注意を誘導し、タスク関連のオブジェクトのみに焦点を当てることが多い。このタスク誘導トップダウンアテンションの能力は、タスク適応表現を提供し、モデルが様々なタスクに一般化するのに役立つ。本稿では,古典的分析合成(AbS)による視覚の視点からトップダウンの注意を考察する。先行研究は,視覚注意とスパース再構成との間の機能的等価性を示し,目標指向トップダウン信号によって変調される類似スパース再構築目標を最適化するabs視覚システムは,自然にトップダウン注意をシミュレートすることを示す。さらに、AbSを変動的に近似するトップダウン変調ViTモデルであるAbSViT(Analytic-by-Synthesis Vision Transformer)を提案する。現実世界のアプリケーションでは、AbSViTは、VQAやゼロショット検索などのビジョン言語タスクのベースラインを一貫して改善し、言語がトップダウンの注意を導く。 AbSViTは一般的なバックボーンとしても機能し、分類、セマンティックセグメンテーション、モデルロバスト性が改善される。

Current attention algorithms (e.g., self-attention) are stimulus-driven and highlight all the salient objects in an image. However, intelligent agents like humans often guide their attention based on the high-level task at hand, focusing only on task-related objects. This ability of task-guided top-down attention provides task-adaptive representation and helps the model generalize to various tasks. In this paper, we consider top-down attention from a classic Analysis-by-Synthesis (AbS) perspective of vision. Prior work indicates a functional equivalence between visual attention and sparse reconstruction; we show that an AbS visual system that optimizes a similar sparse reconstruction objective modulated by a goal-directed top-down signal naturally simulates top-down attention. We further propose Analysis-by-Synthesis Vision Transformer (AbSViT), which is a top-down modulated ViT model that variationally approximates AbS, and achieves controllable top-down attention. For real-world applications, AbSViT consistently improves over baselines on Vision-Language tasks such as VQA and zero-shot retrieval where language guides the top-down attention. AbSViT can also serve as a general backbone, improving performance on classification, semantic segmentation, and model robustness.

翻訳日:2023-03-27 11:11:31 公開日:2023-03-24

PDF登録状況（公開日: 20230324）