Fugu-MT: arXiv Paper Translations

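This site provides Japanese translations of arXiv papers that are 30 pages or fewer and released under a Creative Commons license (CC 0, CC BY, CC BY-SA). For papers whose body is not CC-licensed, and for papers that are too long, only the metadata is translated. (arXiv metadata is CC 0.) The translations are licensed under CC BY-SA 4.0. Translation is performed with Fugu-Machine Translator.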

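The operator of this site does not guarantee the quality of this site (including all information and translations) and accepts no responsibility for any consequences arising from the use of this site (including all information and translations).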

Papers whose publication date is 20230204.

Title	Authors	Abstract	Publication date / translation date
# Cross-Layer Federated Learning Optimization in MIMO Networks ( http://arxiv.org/abs/2302.14648v1 ) License: see link	Sihua Wang and Mingzhe Chen and Cong Shen and Changchuan Yin and Christopher G. Brinton	In this paper, the performance optimization of federated learning (FL), when deployed over a realistic wireless multiple-input multiple-output (MIMO) communication system with digital modulation and over-the-air computation (AirComp), is studied. In particular, a MIMO system is considered in which edge devices transmit their local FL models (trained using their locally collected data) to a parameter server (PS) using beamforming to maximize the number of devices scheduled for transmission. The PS, acting as a central controller, generates a global FL model using the received local FL models and broadcasts it back to all devices. Due to the limited bandwidth in a wireless network, AirComp is adopted to enable efficient wireless data aggregation. However, fading of wireless channels can produce aggregate distortions in an AirComp-based FL scheme. To tackle this challenge, we propose a modified federated averaging (FedAvg) algorithm that combines digital modulation with AirComp to mitigate wireless fading while ensuring communication efficiency. This is achieved by a joint transmit and receive beamforming design, which is formulated as an optimization problem to dynamically adjust the beamforming matrices based on current FL model parameters so as to minimize the transmission error and ensure the FL performance. To achieve this goal, we first analytically characterize how the beamforming matrices affect the performance of FedAvg in different iterations. Based on this relationship, an artificial neural network (ANN) is used to estimate the local FL models of all devices and adjust the beamforming matrices at the PS for future model transmission. The algorithmic advantages and improved performance of the proposed methodologies are demonstrated through extensive numerical experiments.	Translated: 2023-03-05 05:44:47 Published: 2023-02-04
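The global-model step that the abstract above attributes to the PS can be sketched as plain federated averaging. This is only a minimal illustration: it does not model AirComp, digital modulation, fading, or beamforming, and the function name `fedavg` and the sample-count weighting are assumptions made for the example rather than details taken from the paper.

```python
from typing import List, Sequence


def fedavg(local_models: Sequence[Sequence[float]],
           num_samples: Sequence[int]) -> List[float]:
    """Average local parameter vectors, weighting each by its local sample count.

    Hypothetical helper: an error-free uplink is assumed, which is exactly
    what the wireless setting in the abstract does not guarantee.
    """
    if not local_models or len(local_models) != len(num_samples):
        raise ValueError("need one sample count per local model")
    total = sum(num_samples)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    dim = len(local_models[0])
    global_model = [0.0] * dim
    for params, n in zip(local_models, num_samples):
        if len(params) != dim:
            raise ValueError("all local models must have the same length")
        weight = n / total
        for i, p in enumerate(params):
            global_model[i] += weight * p
    return global_model


if __name__ == "__main__":
    # Two devices holding 1 and 3 samples respectively.
    print(fedavg([[1.0, 2.0], [3.0, 4.0]], [1, 3]))
```

In an AirComp setting the weighted sum would be formed by the superposition of signals on the channel rather than by an explicit loop, so channel distortion would enter as an error term on `global_model`.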
# Advances in Automatically Rating the Trustworthiness of Text Processing Services ( http://arxiv.org/abs/2302.09079v1 ) License: see link	Biplav Srivastava, Kausik Lakkaraju, Mariana Bernagozzi, Marco Valtorta	AI services are known to have unstable behavior when subjected to changes in data, models or users. Such behaviors, whether triggered by omission or commission, lead to trust issues when AI works with humans. The current approach of assessing AI services in a black box setting, where the consumer does not have access to the AI's source code or training data, is limited. The consumer has to rely on the AI developer's documentation and trust that the system has been built as stated. Further, if the AI consumer reuses the service to build other services which they sell to their customers, the consumer is exposed to the risks posed by the service providers (both data and model providers). Our approach, in this context, is inspired by the success of nutritional labeling in the food industry to promote health and seeks to assess and rate AI services for trust from the perspective of an independent stakeholder. The ratings become a means to communicate the behavior of AI systems so that the consumer is informed about the risks and can make an informed decision. In this paper, we will first describe recent progress in developing rating methods for text-based machine translator AI services that have been found promising with user studies. Then, we will outline challenges and a vision for principled, multi-modal, causality-based rating methodologies and their implications for decision-support in real-world scenarios like health and food recommendation.	Translated: 2023-02-26 14:55:17 Published: 2023-02-04
# Thermal Analysis of Malignant Brain Tumors by Employing a Morphological Differentiation-Based Method in Conjunction with Artificial Neural Network ( http://arxiv.org/abs/2302.10271v1 ) License: see link	Hamed Hani, Afsaneh Mojra	In this study, a morphological differentiation-based method is introduced which employs the temperature distribution on the tissue surface to detect the malignancy of brain tumors. According to common tumor CT scans, two different scenarios have been implemented to describe the irregular shape of a malignant tumor. In the first scenario, the tumor is considered as a prism with a polygonal base, and in the second, as a prism with a star-shaped base. Increasing the number of sides of the polygon or wings of the star represents an increasing degree of malignancy. Constant heat generation is assumed for the tumor, and finite element analysis has been conducted with the ABAQUS software linked with a Python script on both tumor models to study temperature variations on the top tissue surface. This temperature distribution is characterized by 10 parameters. In each scenario, 98 sets of these parameters have been used as inputs of a radial basis function neural network (RBFNN), and the number of sides or wings has been selected as the output. The RBFNN has been trained to identify the malignancy of a tumor based on its morphology. According to the RBFNN results, the proposed method is capable of differentiating between benign and malignant tumors and estimating the degree of malignancy with high accuracy.	Translated: 2023-02-26 14:36:42 Published: 2023-02-04
# Quantum Relativity ( http://arxiv.org/abs/2302.10216v1 ) License: see link	Michael Spanner	Starting with a consideration of the implication of Bell inequalities in quantum mechanics, a new quantum postulate is suggested in order to restore classical locality and causality to quantum physics: only the relative coordinates between detected quantum events are valid observables. This postulate supports the EPR view that quantum mechanics is incomplete, while also staying compatible with the Bohr view that nothing exists beyond the quantum. The new postulate follows from a more general principle of quantum relativity, which states that only correlations between experimental detections of quantum events have a real classical existence. Quantum relativity provides a framework to differentiate the quantum and classical world.	Translated: 2023-02-26 14:35:56 Published: 2023-02-04
# Benchmarking sparse system identification with low-dimensional chaos ( http://arxiv.org/abs/2302.10787v1 ) License: see link	Alan A. Kaptanoglu and Lanyue Zhang and Zachary G. Nicolaou and Urban Fasel and Steven L. Brunton	Sparse system identification is the data-driven process of obtaining parsimonious differential equations that describe the evolution of a dynamical system, balancing model complexity and accuracy. There has been rapid innovation in system identification across scientific domains, but there remains a gap in the literature for large-scale methodological comparisons that are evaluated on a variety of dynamical systems. In this work, we systematically benchmark sparse regression variants by utilizing the dysts standardized database of chaotic systems. In particular, we demonstrate how this open-source tool can be used to quantitatively compare different methods of system identification. To illustrate how this benchmark can be utilized, we perform a large comparison of four algorithms for solving the sparse identification of nonlinear dynamics (SINDy) optimization problem, finding strong performance of the original algorithm and a recent mixed-integer discrete algorithm. In all cases, we used ensembling to improve the noise robustness of SINDy and provide statistical comparisons. In addition, we show very compelling evidence that the weak SINDy formulation provides significant improvements over the traditional method, even on clean data. Lastly, we investigate how Pareto-optimal models generated from SINDy algorithms depend on the properties of the equations, finding that the performance shows no significant dependence on a set of dynamical properties that quantify the amount of chaos, scale separation, degree of nonlinearity, and syntactic complexity.	Translated: 2023-02-26 13:58:33 Published: 2023-02-04
# Predicting Socio-Economic Well-being Using Mobile Apps Data: A Case Study of France ( http://arxiv.org/abs/2301.09986v2 ) License: see link	Rahul Goel, Angelo Furno, Rajesh Sharma	Socio-economic indicators provide context for assessing a country's overall condition. These indicators contain information about education, gender, poverty, employment, and other factors. Therefore, reliable and accurate information is critical for social research and government policing. Most data sources available today, such as censuses, have sparse population coverage or are updated infrequently. Nonetheless, alternative data sources, such as call data records (CDR) and mobile app usage, can serve as cost-effective and up-to-date sources for identifying socio-economic indicators. This work investigates mobile app data to predict socio-economic features. We present a large-scale study using data that captures the traffic of thousands of mobile applications by approximately 30 million users distributed over 550,000 square km and served by over 25,000 base stations. The dataset covers the whole of France and spans more than 2.5 months, from 16th March 2019 to 6th June 2019. Using the app usage patterns, our best model can estimate socio-economic indicators (attaining an R-squared score of up to 0.66). Furthermore, using model explainability, we discover that mobile app usage patterns have the potential to reveal socio-economic disparities in IRIS. Insights from this study provide several avenues for future interventions, including user temporal network analysis to understand evolving network patterns and exploration of alternative data sources.	Translated: 2023-02-19 13:45:41 Published: 2023-02-04
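For readers unfamiliar with the R-squared score quoted in the abstract above, the following sketch shows the standard coefficient-of-determination formula. The function name `r_squared` and the sample values are illustrative assumptions; the abstract does not say how the authors computed or cross-validated their score.

```python
from typing import Sequence


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("need two equally long sequences with at least 2 values")
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R-squared is undefined when all true values are equal")
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Made-up indicator values and predictions, for illustration only.
    print(r_squared([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # perfect fit
    print(r_squared([1.0, 2.0, 3.0], [2.0, 2.0, 2.0]))  # always predicts the mean
```

A score of 1 means the predictions match the indicator exactly, while a model that always predicts the mean scores 0.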
# Assisted Living in the United States: an Open Dataset ( http://arxiv.org/abs/2212.14092v2 ) License: see link	Anton Stengel, Jaan Altosaar, Rebecca Dittrich, Noemie Elhadad	An assisted living facility (ALF) is a place where someone can live, have access to social supports such as transportation, and receive assistance with the activities of daily living such as toileting and dressing. Despite the important role of ALFs, they are not required to be certified with Medicare and there is no public national database of these facilities. We present the first public dataset of ALFs in the United States, covering all 50 states and DC with 44,638 facilities and over 1.2 million beds. This dataset can help provide answers to existing public health questions as well as help those in need find a facility. The dataset was validated by replicating the results of a nationwide study of ALFs that uses closed data [4], where the prevalence of ALFs is assessed with respect to county-level socioeconomic variables related to health disparity such as race, disability, and income. To showcase the value of this dataset, we also propose a novel metric to assess access to community-based care. We calculate the average distance an individual in need must travel in order to reach an ALF. The dataset and all relevant code are available at github.com/antonstengel/assisted-living-data.	Translated: 2023-02-19 13:22:21 Published: 2023-02-04
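The access metric mentioned in the abstract above (average distance an individual must travel to reach an ALF) could be approximated as below. This is a sketch under stated assumptions: it uses straight-line great-circle distance and an unweighted mean, with hypothetical function names and coordinates. The authors' actual definition (for example road distance or population weighting) is in their repository and may well differ.

```python
import math
from typing import Sequence, Tuple

LatLon = Tuple[float, float]  # (latitude, longitude) in degrees
EARTH_RADIUS_KM = 6371.0      # mean Earth radius, an approximation


def haversine_km(a: LatLon, b: LatLon) -> float:
    """Great-circle distance between two points, in kilometres."""
    lat1, lon1 = math.radians(a[0]), math.radians(a[1])
    lat2, lon2 = math.radians(b[0]), math.radians(b[1])
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def mean_distance_to_nearest(people: Sequence[LatLon],
                             facilities: Sequence[LatLon]) -> float:
    """Mean over individuals of the distance to their closest facility."""
    if not people or not facilities:
        raise ValueError("need at least one person and one facility")
    nearest = [min(haversine_km(p, f) for f in facilities) for p in people]
    return sum(nearest) / len(nearest)


if __name__ == "__main__":
    # Hypothetical coordinates, not taken from the dataset.
    residents = [(40.0, -75.0), (40.5, -75.2)]
    alfs = [(40.1, -75.1), (41.0, -74.0)]
    print(round(mean_distance_to_nearest(residents, alfs), 1), "km")
```

The brute-force nearest-facility search is quadratic; for tens of thousands of facilities a spatial index would be the more practical choice.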
# Gender Bias in Fake News: An Analysis ( http://arxiv.org/abs/2209.11984v3 ) License: see link	Navya Sahadevan, Deepak P	Data science research into fake news has gathered much momentum in recent years, arguably facilitated by the emergence of large public benchmark datasets. While it has been well-established within media studies that gender bias is an issue that pervades news media, there has been very little exploration into the relationship between gender bias and fake news. In this work, we provide the first empirical analysis of gender bias vis-a-vis fake news, leveraging simple and transparent lexicon-based methods over public benchmark datasets. Our analysis establishes the increased prevalence of gender bias in fake news across three facets, viz. abundance, affect, and proximal words. The insights from our analysis provide a strong argument that gender bias needs to be an important consideration in research into fake news.	Translated: 2023-02-19 11:21:47 Published: 2023-02-04
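A lexicon-based measurement of the kind the abstract above calls "simple and transparent" might look like the following sketch of the abundance facet. The word lists, the function name `gender_abundance`, and the normalisation by token count are all assumptions for illustration; the paper's actual lexicons and definitions are not given in the abstract.

```python
import re
from typing import Dict

# Tiny hypothetical lexicons; a real study would use much larger curated lists.
MALE_TERMS = frozenset({"he", "him", "his", "man", "men"})
FEMALE_TERMS = frozenset({"she", "her", "hers", "woman", "women"})


def gender_abundance(text: str) -> Dict[str, float]:
    """Fraction of tokens that fall in each gendered lexicon."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return {"male": 0.0, "female": 0.0}
    male = sum(1 for t in tokens if t in MALE_TERMS)
    female = sum(1 for t in tokens if t in FEMALE_TERMS)
    return {"male": male / len(tokens), "female": female / len(tokens)}


if __name__ == "__main__":
    print(gender_abundance("He said she took his car"))
```

Comparing these fractions between the fake and real partitions of a benchmark dataset would give one crude view of abundance. The affect and proximal-word facets would need sentiment and co-occurrence statistics on top of this.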
# Sentiment Analysis on YouTube Smart Phone Unboxing Video Reviews in Sri Lanka ( http://arxiv.org/abs/2302.03496v1 ) License: see link	Sherina Sally	Product-related reviews are based on users' experiences, which are mostly shared in videos on YouTube. YouTube was the second most popular website globally in 2021. People prefer to watch videos on recently released products prior to purchasing, in order to gather overall feedback and make worthwhile decisions. These videos are created by vloggers who are enthusiastic about technical materials, and feedback is usually left by experienced users of the product or its brand. Analyzing the sentiment of the user reviews gives useful insights into the product in general. This study is focused on reviews of three smartphones released in 2021, namely the Apple iPhone 13, Google Pixel 6, and Samsung Galaxy S21. VADER, which is a lexicon- and rule-based sentiment analysis tool, was used to classify each comment as positive or negative. All three smartphones show a positive sentiment from the users' perspective, and the iPhone 13 has the highest number of positive reviews. The resulting models have been tested using Naïve Bayes, Decision Tree, and Support Vector Machine. Among these three classifiers, Support Vector Machine shows higher accuracies and F1-scores.	Translated: 2023-02-08 16:16:27 Published: 2023-02-04
# Automated Huntington's Disease Prognosis via Biomedical Signals and Shallow Machine Learning ( http://arxiv.org/abs/2302.03605v1 ) License: see link	Sucheer Maddury	Huntington's disease (HD) is a rare, genetically-determined brain disorder that limits the life of the patient, although early prognosis of HD can substantially improve the patient's quality of life. Current HD prognosis methods include using a variety of complex biomarkers such as clinical and imaging factors; however, these methods have many shortfalls, such as their resource demand and failure to distinguish symptomatic and asymptomatic patients. Quantitative biomedical signaling has been used for diagnosis of other neurological disorders such as schizophrenia, and has potential for exposing abnormalities in HD patients. In this project, we used a premade, certified dataset collected at a clinic with 27 HD positive patients, 36 controls, and 6 unknowns with electroencephalography, electrocardiography, and functional near-infrared spectroscopy data. We first preprocessed the data and extracted a variety of features from both the transformed and raw signals, after which we applied a plethora of shallow machine learning techniques. We found the highest accuracy was achieved by a scaled-out Extremely Randomized Trees algorithm, with an area under the receiver operating characteristic curve of 0.963 and an accuracy of 91.353%. The subsequent feature analysis showed that 60.865% of the features had p<0.05, with the features from the raw signal being most significant. The results indicate the promise of neural and cardiac signals for marking abnormalities in HD, as well as evaluating the progression of the disease in	Translated: 2023-02-08 15:40:10 Published: 2023-02-04
# PartitionVAE -- a human-interpretable VAE ( http://arxiv.org/abs/2302.03689v1 ) License: see link	Fareed Sheriff, Sameer Pai	VAEs, or variational autoencoders, are autoencoders that explicitly learn the distribution of the input image space rather than assuming no prior information about the distribution. This allows them to classify similar samples close to each other in the latent space's distribution. VAEs classically assume the latent space is normally distributed, though many distribution priors work, and they encode this assumption through a K-L divergence term in the loss function. While VAEs learn the distribution of the latent space and naturally make each dimension in the latent space as disjoint from the others as possible, they do not group together similar features -- the image space feature represented by one unit of the representation layer does not necessarily have high correlation with the feature represented by a neighboring unit of the representation layer. This makes it difficult to interpret VAEs since the representation layer is not structured in a way that is easy for humans to parse. We aim to make a more interpretable VAE by partitioning the representation layer into disjoint sets of units. Partitioning the representation layer into disjoint sets of interconnected units yields a prior that features of the input space to this new VAE, which we call a partition VAE or PVAE, are grouped together by correlation -- for example, if our image space were the space of all ping pong game images (a somewhat complex image space we use to test our architecture) then we would hope the partitions in the representation layer each learned some large feature of the image like the characteristics of the ping pong table or the characteristics and position of the players or the ball. We also add to the PVAE a cost-saving measure: subresolution. Because we do not have access to GPU training environments for long periods of time and Google Colab Pro costs money, we attempt to decrease the complexity of the PVAE by outputting an image with dimensions scaled down from the input image by a constant factor, thus forcing the model to output a smaller version of the image. We then increase the resolution to calculate loss and train by interpolating through neighboring pixels. We train a tuned PVAE on MNIST and Sports10 to test its effectiveness.	Translated: 2023-02-08 15:13:35 Published: 2023-02-04
# Deep Reinforcement Learning for Traffic Light Control in Intelligent Transportation Systems ( http://arxiv.org/abs/2302.03669v1 ) License: see link	Xiao-Yang Liu, Ming Zhu, Sem Borst, and Anwar Walid	Smart traffic lights in intelligent transportation systems (ITSs) are envisioned to greatly increase traffic efficiency and reduce congestion. Deep reinforcement learning (DRL) is a promising approach to adaptively control traffic lights based on the real-time traffic situation in a road network. However, conventional methods may suffer from poor scalability. In this paper, we investigate deep reinforcement learning to control traffic lights, and both theoretical analysis and numerical experiments show that the intelligent behavior "greenwave" (i.e., a vehicle will see a progressive cascade of green lights, and not have to brake at any intersection) emerges naturally in a grid road network, which is proved to be the optimal policy in an avenue with multiple cross streets. As a first step, we use two DRL algorithms for the traffic light control problems in two scenarios. In a single road intersection, we verify that the deep Q-network (DQN) algorithm delivers a thresholding policy; and in a grid road network, we adopt the deep deterministic policy gradient (DDPG) algorithm. Secondly, numerical experiments show that the DQN algorithm delivers the optimal control, and the DDPG algorithm with passive observations has the capability to produce on its own a high-level intelligent behavior in a grid road network, namely, the "greenwave" policy emerges. We also verify the "greenwave" patterns in a $5 \times 10$ grid road network. Thirdly, the "greenwave" patterns demonstrate that DRL algorithms produce favorable solutions since the "greenwave" policy shown in experiment results is proved to be optimal in a specified traffic model (an avenue with multiple cross streets). The delivered policies both in a single road intersection and a grid road network demonstrate the scalability of DRL algorithms.	Translated: 2023-02-08 15:10:33 Published: 2023-02-04
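The "greenwave" behavior described in the abstract above has a simple fixed-time analogue that helps make the idea concrete: each signal along an avenue turns green later than the first one by the travel time from it. The sketch below computes such offsets. It is not the DRL controller from the paper, and the function name, intersection spacing, speed, and cycle length are hypothetical values chosen for illustration.

```python
from typing import List, Sequence


def greenwave_offsets(positions_m: Sequence[float],
                      speed_mps: float,
                      cycle_s: float) -> List[float]:
    """Green-start offset (seconds into the cycle) for each intersection.

    A vehicle leaving the first intersection at the start of green and
    driving at speed_mps reaches every later intersection as its green starts.
    """
    if not positions_m:
        raise ValueError("need at least one intersection position")
    if speed_mps <= 0 or cycle_s <= 0:
        raise ValueError("speed and cycle length must be positive")
    origin = positions_m[0]
    return [((x - origin) / speed_mps) % cycle_s for x in positions_m]


if __name__ == "__main__":
    # Four intersections 300 m apart, 15 m/s design speed, 50 s signal cycle.
    print(greenwave_offsets([0.0, 300.0, 600.0, 900.0], 15.0, 50.0))
```

The interest of the paper's result is that a learned policy arrives at this kind of coordinated pattern without the offsets being specified by hand.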
# A Complete Equational Theory for Quantum Circuits ( http://arxiv.org/abs/2206.10577v2 ) License: see link	Alexandre Clément, Nicolas Heurtel, Shane Mansfield, Simon Perdrix, Benoît Valiron	We introduce the first complete equational theory for quantum circuits. More precisely, we introduce a set of circuit equations that we prove to be sound and complete: two circuits represent the same unitary map if and only if they can be transformed one into the other using the equations. The proof is based on the properties of multi-controlled gates -- that are defined using elementary gates -- together with an encoding of quantum circuits into linear optical circuits, which have been proved to have a complete axiomatisation.	Translated: 2023-02-08 12:44:23 Published: 2023-02-04
# A neural operator-based surrogate solver for free-form electromagnetic inverse design ( http://arxiv.org/abs/2302.01934v1 ) License: see link	Yannick Augenstein, Taavi Repän, Carsten Rockstuhl	Neural operators have emerged as a powerful tool for solving partial differential equations in the context of scientific machine learning. Here, we implement and train a modified Fourier neural operator as a surrogate solver for electromagnetic scattering problems and compare its data efficiency to existing methods. We further demonstrate its application to the gradient-based nanophotonic inverse design of free-form, fully three-dimensional electromagnetic scatterers, an area that has so far eluded the application of deep learning techniques.	Translated: 2023-02-07 21:08:35 Published: 2023-02-04
# Rating Sentiment Analysis Systems for Bias through a Causal Lens ( http://arxiv.org/abs/2302.02038v1 ) License: see link	Kausik Lakkaraju, Biplav Srivastava, Marco Valtorta	Sentiment Analysis Systems (SASs) are data-driven Artificial Intelligence (AI) systems that, given a piece of text, assign one or more numbers conveying the polarity and emotional intensity expressed in the input. Like other automatic machine learning systems, they have also been known to exhibit model uncertainty where a (small) change in the input leads to drastic swings in the output. This can be especially problematic when inputs are related to protected features like gender or race since such behavior can be perceived as a lack of fairness, i.e., bias. We introduce a novel method to assess and rate SASs where inputs are perturbed in a controlled causal setting to test if the output sentiment is sensitive to protected variables even when other components of the textual input, e.g., chosen emotion words, are fixed. We then use the result to assign labels (ratings) at fine-grained and overall levels to convey the robustness of the SAS to input changes. The ratings serve as a principled basis to compare SASs and choose among them based on behavior. It benefits all users, especially developers who reuse off-the-shelf SASs to build larger AI systems but do not have access to their code or training data to compare.	Translated: 2023-02-07 20:43:25 Published: 2023-02-04
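The perturbation idea in the abstract above (vary a protected variable while holding the emotion word fixed, then check whether the score moves) can be sketched against any black-box scorer. Everything here is a toy assumption: `toy_sentiment` and `biased_toy_sentiment` are hypothetical stand-ins for a real SAS, and the spread statistic is a simplification that does not reproduce the paper's causal analysis or its rating labels.

```python
from typing import Callable, Dict, Sequence, Tuple

# Hypothetical mini-lexicon for the toy scorer.
LEXICON = {"happy": 1.0, "sad": -1.0, "angry": -0.8}


def toy_sentiment(text: str) -> float:
    """Sum of lexicon scores; stands in for a black-box SAS."""
    words = text.lower().replace(".", " ").split()
    return sum(LEXICON.get(w, 0.0) for w in words)


def biased_toy_sentiment(text: str) -> float:
    """Deliberately biased variant: lowers the score whenever 'she' appears."""
    words = text.lower().replace(".", " ").split()
    penalty = -0.5 if "she" in words else 0.0
    return toy_sentiment(text) + penalty


def sensitivity_to_protected(score: Callable[[str], float],
                             template: str,
                             protected_values: Sequence[str]
                             ) -> Tuple[Dict[str, float], float]:
    """Score the same template for each protected value; return scores and spread."""
    scores = {v: score(template.format(person=v)) for v in protected_values}
    spread = max(scores.values()) - min(scores.values())
    return scores, spread


if __name__ == "__main__":
    template = "{person} feels happy."
    for scorer in (toy_sentiment, biased_toy_sentiment):
        print(scorer.__name__, sensitivity_to_protected(scorer, template, ["he", "she"]))
```

A spread of zero across many templates is consistent with insensitivity to the protected variable, while a non-zero spread with the emotion word held fixed is the kind of behavior a rating would need to flag.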
# GDB: Gated Convolutionsベースのドキュメントバイナリ化 GDB: Gated convolutions-based Document Binarization ( http://arxiv.org/abs/2302.02073v1 ) ライセンス: Link先を確認	Zongyuan Yang, Yongping Xiong, Guibin Wu	(参考訳) ドキュメントビナライゼーションは多くの文書分析タスクにおいて重要な前処理ステップである。しかし,既存の方法では,バニラ畳み込みの公平な処理や境界情報による適切な監視を伴わないストロークエッジの抽出などにより,ストロークエッジを微細に抽出することはできない。本稿では、ゲーティング値の学習としてテキスト抽出を定式化し、不正確なストロークエッジ抽出の問題を解決するために、エンドツーエンドのゲート畳み込みネットワーク(GDB)を提案する。ゲート畳み込みを適用して、異なる注意でストロークの特徴を選択的に抽出する。提案する枠組みは2段階からなる。まず、余分なエッジブランチを持つ粗いサブネットワークをトレーニングし、プリオリマスクとエッジを入力してより正確な特徴マップを得る。次に、シャープエッジに基づくゲート畳み込みにより第1段の出力を洗練するために、改良サブネットワークをカスケードする。グローバル情報に関しては、GDBにはローカル機能とグローバル機能を組み合わせたマルチスケール操作も含まれている。 2009年から2019年にかけて,dibco(document image binarization contest)データセットの総合実験を行った。実験の結果,提案手法は平均値で最先端手法を上回り,6つのベンチマークデータセットで上位ランキングを得た。 Document binarization is a key pre-processing step for many document analysis tasks. However, existing methods can not extract stroke edges finely, mainly due to the fair-treatment nature of vanilla convolutions and the extraction of stroke edges without adequate supervision by boundary-related information. In this paper, we formulate text extraction as the learning of gating values and propose an end-to-end gated convolutions-based network (GDB) to solve the problem of imprecise stroke edge extraction. The gated convolutions are applied to selectively extract the features of strokes with different attention. Our proposed framework consists of two stages. Firstly, a coarse sub-network with an extra edge branch is trained to get more precise feature maps by feeding a priori mask and edge. Secondly, a refinement sub-network is cascaded to refine the output of the first stage by gated convolutions based on the sharp edge. For global information, GDB also contains a multi-scale operation to combine local and global features. We conduct comprehensive experiments on ten Document Image Binarization Contest (DIBCO) datasets from 2009 to 2019. 
Experimental results show that our proposed methods outperform the state-of-the-art methods in terms of all metrics on average and achieve top ranking on six benchmark datasets.	翻訳日:2023-02-07 20:35:18 公開日:2023-02-04
# 事前学習モデルによる意味誘導画像の拡張 Semantic-Guided Image Augmentation with Pre-trained Models ( http://arxiv.org/abs/2302.02070v1 ) ライセンス: Link先を確認	Bohan Li, Xinghao Wang, Xiao Xu, Yutai Hou, Yunlong Feng, Feng Wang, Wanxiang Che	(参考訳) 画像拡張は、コンピュータビジョンにおけるデータの不足を軽減する共通のメカニズムである。既存の画像増倍法は、しばしば元の画像の増倍に事前定義された変換や混合を適用するが、局所的にしか変化しない。これにより、意味情報の維持と画像の多様性の向上のバランスを見つけるのに苦労する。本稿では,事前学習モデル(SIP)を用いたセマンティック誘導画像拡張手法を提案する。具体的には、SIPは画像ラベルとキャプションでプロンプトを構築し、事前訓練された安定拡散モデルのイメージ・ツー・イメージ生成プロセスをより良くガイドする。元の画像に含まれる意味情報はよく保存でき、拡張された画像は依然として多様性を維持している。実験の結果、SIPは一般的に使用されている2つのバックボーン、すなわちResNet-50とViTを平均して7つのデータセットで12.60%、2.07%改善できることがわかった。さらに、SIPは最高の画像拡張ベースラインRandAugmentを2つのバックボーンで4.46%、1.23%上回るだけでなく、ベースラインと自然に統合することでパフォーマンスも向上する。拡張画像の多様性,テキストプロンプトのアブレーション研究,生成画像の事例研究など,sipの詳細な解析を行った。 Image augmentation is a common mechanism to alleviate data scarcity in computer vision. Existing image augmentation methods often apply pre-defined transformations or mixup to augment the original image, but only locally vary the image. This makes them struggle to find a balance between maintaining semantic information and improving the diversity of augmented images. In this paper, we propose a Semantic-guided Image augmentation method with Pre-trained models (SIP). Specifically, SIP constructs prompts with image labels and captions to better guide the image-to-image generation process of the pre-trained Stable Diffusion model. The semantic information contained in the original images can be well preserved, and the augmented images still maintain diversity. Experimental results show that SIP can improve two commonly used backbones, i.e., ResNet-50 and ViT, by 12.60% and 2.07% on average over seven datasets, respectively. Moreover, SIP not only outperforms the best image augmentation baseline RandAugment by 4.46% and 1.23% on two backbones, but also further improves the performance by integrating naturally with the baseline. 
A detailed analysis of SIP is presented, including the diversity of augmented images, an ablation study on textual prompts, and a case study on the generated images.	翻訳日:2023-02-07 20:34:53 公開日:2023-02-04
# 学習とアンラーニングを組み込んだヘテロジニアス連合知識グラフ Heterogeneous Federated Knowledge Graph Embedding Learning and Unlearning ( http://arxiv.org/abs/2302.02069v1 ) ライセンス: Link先を確認	Xiangrong Zhu and Guangyao Li and Wei Hu	(参考訳) Federated Learning(FL)は最近、生データを共有せずに分散クライアント間でグローバル機械学習モデルをトレーニングするパラダイムとして登場した。知識グラフ(KG)埋め込みは、多くの知識駆動アプリケーションのバックボーンとして機能する連続ベクトル空間におけるKGを表す。有望な組み合わせとして、フェデレーションkg埋め込みは、ローカルデータのプライバシーを保ちながら、異なるクライアントから学んだ知識を十分に活用することができる。しかし、データの異質性や知識の忘れといった現実的な問題はいまだに残っている。本稿では,不均一なKG埋め込み学習とアンラーニングのための新しいFLフレームワークであるFedLUを提案する。データの不均一性による局所最適化とグローバル収束のドリフトに対処するため,局所的な知識をグローバルに伝達し,グローバルな知識を吸収する相互知識蒸留を提案する。さらに, 遡及的干渉と受動的減衰を組み合わせた認知神経科学に基づく未学習手法を提案し, 知識蒸留を再利用して, 地域顧客からの特定の知識を消去し, グローバルモデルに伝播させる手法を提案する。我々は最新技術の現実的な性能を評価するための新しいデータセットを構築する。大規模な実験により、FedLUはリンク予測と知識忘れの両方において優れた結果が得られることが示された。 Federated Learning (FL) recently emerges as a paradigm to train a global machine learning model across distributed clients without sharing raw data. Knowledge Graph (KG) embedding represents KGs in a continuous vector space, serving as the backbone of many knowledge-driven applications. As a promising combination, federated KG embedding can fully take advantage of knowledge learned from different clients while preserving the privacy of local data. However, realistic problems such as data heterogeneity and knowledge forgetting still remain to be concerned. In this paper, we propose FedLU, a novel FL framework for heterogeneous KG embedding learning and unlearning. To cope with the drift between local optimization and global convergence caused by data heterogeneity, we propose mutual knowledge distillation to transfer local knowledge to global, and absorb global knowledge back. Moreover, we present an unlearning method based on cognitive neuroscience, which combines retroactive interference and passive decay to erase specific knowledge from local clients and propagate to the global model by reusing knowledge distillation. 
We construct new datasets for assessing the realistic performance of state-of-the-art methods. Extensive experiments show that FedLU achieves superior results in both link prediction and knowledge forgetting.	翻訳日:2023-02-07 20:34:28 公開日:2023-02-04
# live experience matters: ソーシャルメディア上で物質を使用する人に対するスティグマの自動検出 Lived Experience Matters: Automatic Detection of Stigma toward People Who Use Substances on Social Media ( http://arxiv.org/abs/2302.02064v1 ) ライセンス: Link先を確認	Salvatore Giorgi, Douglas Bellew, Daniel Roy Sadek Habib, Joao Sedoc, Chase Smitterberg, Amanda Devoto, McKenzie Himelein-Wachowiak, and Brenda Curtis	(参考訳) 物質(PWUS)を使用する人々に対するスティグマは、治療を求める主要な障壁である。さらに、治療中の患者は、より高いスティグマティゼーションを経験すれば脱落する傾向が強い。ヘイトスピーチと毒性の関連概念は、脆弱な人口を対象としたものを含むが、自動コンテンツモデレーション研究、スティグマ(stigma)、特に物質を使用する人はそうではない。本稿では、約5000の公開Reddit投稿のデータセットを用いて、PWUSに対するスティグマについて検討する。我々は,PWUSに対するスティグマの存在について,各投稿に注釈を付けるように依頼し,物質使用経験に関する一連の質問に回答するクラウドソースアノテーションタスクを実施した。結果、物質を使ったり、薬物使用障害の人を知っている労働者は、投稿を汚職として評価する傾向が強いことがわかった。これに基づいて、redditの投稿にスティグマタイジング(stigmatizing)とラベル付けする、生きた物質使用経験のある労働者を集中させる、教師付き機械学習フレームワークを使用します。コメントレベルの言語に加えて、個人レベルの人口層をモデル化すると、分類精度は0.69で、モデリング言語だけで17%向上している。最後に、pwusの物質と、他の言語(「人々」や「彼ら」)を取り巻く言語に同意しない人々、そして「アドディクト」のような用語がスティグマタイジングであるのに対し、pwusは特定の物質に関する議論をよりスティグマタイジングするのと対照的に)を区別する言語学者の手がかりを探究する。本研究は, 物質使用におけるスティグマの知覚特性について考察した。さらに、これらの結果は、これらの機械学習タスクの主観的な性質をさらに確立し、彼らの社会的コンテキストを理解する必要性を強調している。 Stigma toward people who use substances (PWUS) is a leading barrier to seeking treatment. Further, those in treatment are more likely to drop out if they experience higher levels of stigmatization. While related concepts of hate speech and toxicity, including those targeted toward vulnerable populations, have been the focus of automatic content moderation research, stigma and, in particular, people who use substances have not. This paper explores stigma toward PWUS using a data set of roughly 5,000 public Reddit posts. We performed a crowd-sourced annotation task where workers are asked to annotate each post for the presence of stigma toward PWUS and answer a series of questions related to their experiences with substance use. 
Results show that workers who use substances or know someone with a substance use disorder are more likely to rate a post as stigmatizing. Building on this, we use a supervised machine learning framework that centers workers with lived substance use experience to label each Reddit post as stigmatizing. Modeling person-level demographics in addition to comment-level language results in a classification accuracy (as measured by AUC) of 0.69 -- a 17% increase over modeling language alone. Finally, we explore the linguistic cues that distinguish stigmatizing content: PWUS and those who do not use substances agree that language around othering ("people", "they") and terms like "addict" are stigmatizing, while PWUS (as opposed to those who do not) find discussions around specific substances more stigmatizing. Our findings offer insights into the nature of perceived stigma in substance use. Additionally, these results further establish the subjective nature of such machine learning tasks, highlighting the need for understanding their social contexts.	翻訳日:2023-02-07 20:34:10 公開日:2023-02-04
# 履歴依存型動的文脈を用いた強化学習 Reinforcement Learning with History-Dependent Dynamic Contexts ( http://arxiv.org/abs/2302.02061v1 ) ライセンス: Link先を確認	Guy Tennenholtz, Nadav Merlis, Lior Shani, Martin Mladenov, Craig Boutilier	(参考訳) 動的文脈マルコフ決定プロセス(dcmdps)は、文脈が時間とともに変化する非マルコフ環境を扱うためにコンテキスト境界mdpフレームワークを一般化した、歴史依存環境のための新しい強化学習フレームワークである。本モデルでは,文脈遷移を決定するためにアグリゲーション関数を活用し,履歴長に対する指数関数依存を破るロジスティックdcmdpsに着目した特別ケースを検討する。この特別な構造により、後悔の限界を定めている上位信頼境界型アルゴリズムを導出することができる。この理論結果に動機づけられ,潜在空間に計画し,歴史依存的特徴よりも楽観的手法を用いたロジスティックdcmdpsのための実用的なモデルベースアルゴリズムを提案する。提案手法の有効性を,レコメンデーションに応じてユーザ動作のダイナミクスが進化するレコメンデーションタスク(MovieLensデータを用いた)に示す。 We introduce Dynamic Contextual Markov Decision Processes (DCMDPs), a novel reinforcement learning framework for history-dependent environments that generalizes the contextual MDP framework to handle non-Markov environments, where contexts change over time. We consider special cases of the model, with a focus on logistic DCMDPs, which break the exponential dependence on history length by leveraging aggregation functions to determine context transitions. This special structure allows us to derive an upper-confidence-bound style algorithm for which we establish regret bounds. Motivated by our theoretical results, we introduce a practical model-based algorithm for logistic DCMDPs that plans in a latent space and uses optimism over history-dependent features. We demonstrate the efficacy of our approach on a recommendation task (using MovieLens data) where user behavior dynamics evolve in response to recommendations.	翻訳日:2023-02-07 20:33:34 公開日:2023-02-04
# マスキング言語モデルにおける表現不足 Representation Deficiency in Masked Language Modeling ( http://arxiv.org/abs/2302.02060v1 ) ライセンス: Link先を確認	Yu Meng, Jitin Krishnan, Sinong Wang, Qifan Wang, Yuning Mao, Han Fang, Marjan Ghazvininejad, Jiawei Han, Luke Zettlemoyer	(参考訳) Masked Language Modeling (MLM) は、その単純さと有効性から、双方向テキストエンコーダを事前学習するための最も顕著なアプローチの1つである。 MLMに関する注目すべき懸念は、特別な$\texttt{[MASK]}$シンボルが事前トレーニングデータと下流データの間に相違を引き起こすことである。我々は、MLM事前学習が、$\texttt{[MASK]}$トークンのみを表すために、いくつかのモデル次元を割り当て、結果として、実際のトークンに対する表現不足が生じ、$\texttt{[MASK]}$トークンを使わずに下流データに適用された場合、事前訓練されたモデルの表現が制限されることを経験的および理論的に示す。そこで本研究では,Masked Autoencoder アーキテクチャを MLM で事前トレーニングする MAE-LM を提案し,$\texttt{[MASK]}$ のトークンをエンコーダから除外する。実験により,MAE-LMは実トークン表現におけるモデル次元の利用を改良し,GLUEおよびSQuADベンチマークで微調整した場合,MAE-LMは異なる事前学習設定とモデルサイズでMLM事前学習モデルより一貫して優れることを示した。 Masked Language Modeling (MLM) has been one of the most prominent approaches for pretraining bidirectional text encoders due to its simplicity and effectiveness. One notable concern about MLM is that the special $\texttt{[MASK]}$ symbol causes a discrepancy between pretraining data and downstream data as it is present only in pretraining but not in fine-tuning. In this work, we offer a new perspective on the consequence of such a discrepancy: We demonstrate empirically and theoretically that MLM pretraining allocates some model dimensions exclusively for representing $\texttt{[MASK]}$ tokens, resulting in a representation deficiency for real tokens and limiting the pretrained model's expressiveness when it is adapted to downstream data without $\texttt{[MASK]}$ tokens. Motivated by the identified issue, we propose MAE-LM, which pretrains the Masked Autoencoder architecture with MLM where $\texttt{[MASK]}$ tokens are excluded from the encoder. 
Empirically, we show that MAE-LM improves the utilization of model dimensions for real token representations, and MAE-LM consistently outperforms MLM-pretrained models across different pretraining settings and model sizes when fine-tuned on the GLUE and SQuAD benchmarks.	翻訳日:2023-02-07 20:33:17 公開日:2023-02-04
# 意味セグメンテーションのための意味拡散ネットワーク Semantic Diffusion Network for Semantic Segmentation ( http://arxiv.org/abs/2302.02057v1 ) ライセンス: Link先を確認	Haoru Tan, Sitong Wu, Jimin Pi	(参考訳) 境界領域の正確かつ正確な予測はセマンティックセグメンテーションに不可欠である。しかし、一般に使用される畳み込み演算子は、局所的な詳細情報を滑らかにぼかす傾向があるため、深いモデルが正確な境界予測を生成するのが困難である。本稿では,意味的境界意識を高めるためのオペレータレベルのアプローチを提案し,深い意味的セグメンテーションモデルの予測を改善する。具体的には,まず境界特徴強調を異方性拡散過程として定式化する。次に、パラメータ化意味差畳み込み演算子と特徴融合モジュールとを含む拡散過程を近似する、意味拡散ネットワーク(SDN)と呼ばれる新しい学習可能なアプローチを提案する。我々のSDNは、元の機能からクラス間境界強化機能への微分可能なマッピングを構築することを目的としています。提案するsdnは、既存のエンコーダ/デコーダセグメンテーションモデルに簡単に接続可能な、効率的で柔軟なモジュールである。広範な実験により,提案手法は,公開ベンチマークに挑戦する上で,いくつかの典型的および最先端のセグメンテーションベースラインモデルに対して一貫した改善を達成可能であることが示された。コードはまもなくリリースされる。 Precise and accurate predictions over boundary areas are essential for semantic segmentation. However, the commonly-used convolutional operators tend to smooth and blur local detail cues, making it difficult for deep models to generate accurate boundary predictions. In this paper, we introduce an operator-level approach to enhance semantic boundary awareness, so as to improve the prediction of the deep semantic segmentation model. Specifically, we first formulate the boundary feature enhancement as an anisotropic diffusion process. We then propose a novel learnable approach called semantic diffusion network (SDN) to approximate the diffusion process, which contains a parameterized semantic difference convolution operator followed by a feature fusion module. Our SDN aims to construct a differentiable mapping from the original feature to the inter-class boundary-enhanced feature. The proposed SDN is an efficient and flexible module that can be easily plugged into existing encoder-decoder segmentation models. Extensive experiments show that our approach can achieve consistent improvements over several typical and state-of-the-art segmentation baseline models on challenging public benchmarks. The code will be released soon.	翻訳日:2023-02-07 20:32:48 公開日:2023-02-04
# 分子エンベディングのハーネス化シミュレーション Harnessing Simulation for Molecular Embeddings ( http://arxiv.org/abs/2302.02055v1 ) ライセンス: Link先を確認	Christopher Fifty, Joseph M. Paggi, Ehsan Amid, Jure Leskovec, Ron Dror	(参考訳) 深層学習は、何十年にもわたって計算生物学の進歩を解き放ったが、ラベル付きデータが乏しく、自己教師付き学習の利点が無視できるため、深層学習の技法を分子領域に拡張することは困難であることが証明されている。この研究では、異なるアプローチを探求します。深層強化学習とロボット工学の手法に着想を得て,分子組込みの開発に物理に基づく分子シミュレーションを応用した。グラフニューラルネットワークをシミュレーションデータに適合させることで、シミュレーション中の生物学的ターゲットと同じような相互作用を示す分子が埋め込み空間で同様の表現を発達させる。これらの埋め込みは、現実世界のデータに基づいて訓練された下流モデルの特徴空間を初期化して、シミュレーション中に学んだ情報を分子予測タスクにエンコードする。実験結果から,本手法は実世界の分子予測タスクにおける既存のディープラーニングモデルの性能を,下流モデルに最小限の修正を加えて38%向上させ,ハイパーパラメータチューニングを不要とした。 While deep learning has unlocked advances in computational biology once thought to be decades away, extending deep learning techniques to the molecular domain has proven challenging, as labeled data is scarce and the benefit from self-supervised learning can be negligible in many cases. In this work, we explore a different approach. Inspired by methods in deep reinforcement learning and robotics, we explore harnessing physics-based molecular simulation to develop molecular embeddings. By fitting a Graph Neural Network to simulation data, molecules that display similar interactions with biological targets under simulation develop similar representations in the embedding space. These embeddings can then be used to initialize the feature space of down-stream models trained on real-world data to encode information learned during simulation into a molecular prediction task. Our experimental findings indicate this approach improves the performance of existing deep learning models on real-world molecular prediction tasks by as much as 38% with minimal modification to the downstream model and no hyperparameter tuning.	翻訳日:2023-02-07 20:32:31 公開日:2023-02-04
# 動的グラフ予測による多変量時系列異常検出 Multivariate Time Series Anomaly Detection via Dynamic Graph Forecasting ( http://arxiv.org/abs/2302.02051v1 ) ライセンス: Link先を確認	Katrina Chen, Mingbin Feng, Tony S. Wirjanto	(参考訳) 単変量時系列の異常はしばしば、歴史的観測の大多数からの時間的パターンからの異常値と逸脱を指す。多変量時系列では、異常は時間とともに相関のような系列間の関係の異常な変化を指す。既存の研究は、グラフニューラルネットワークを通してそのような系列間関係をモデル化することができる。しかし、ほとんどの作業は、異常な関係を明示的に検出するために調整されていない時系列予測タスクや再構築タスクを支援するために、グローバルまたはコンテキストウィンドウ内で静的グラフを学習することに落ち着く。他の作品では、時系列グラフのリストの再構築や予測に基づいて異常を検出し、グラフの離散的な性質によってデータ内の時間的パターンをとらえる能力を不注意に弱めている。本研究では,動的時系列グラフのリストに基づく多変量時系列異常検出フレームワークDyGraphADを提案する。その中核となる考え方は、グラフ予測タスクと時系列予測タスクを同時に支援するために、グラフの進化する性質を活用することにより、シリーズ間関係とシリーズ内時間パターンの正常状態から異常状態への偏差に基づいて異常を検出することである。実世界のデータセットに関する数値実験により,DyGraphADはベースライン異常検出手法よりも優れた性能を示した。 Anomalies in univariate time series often refer to abnormal values and deviations from the temporal patterns from majority of historical observations. In multivariate time series, anomalies also refer to abnormal changes in the inter-series relationship, such as correlation, over time. Existing studies have been able to model such inter-series relationships through graph neural networks. However, most works settle on learning a static graph globally or within a context window to assist a time series forecasting task or a reconstruction task, whose objective is not tailored to explicitly detect the abnormal relationship. Some other works detect anomalies based on reconstructing or forecasting a list of inter-series graphs, which inadvertently weakens their power to capture temporal patterns within the data due to the discrete nature of graphs. In this study, we propose DyGraphAD, a multivariate time series anomaly detection framework based upon a list of dynamic inter-series graphs. 
The core idea is to detect anomalies based on the deviation of inter-series relationships and intra-series temporal patterns from normal to anomalous states, by leveraging the evolving nature of the graphs in order to assist a graph forecasting task and a time series forecasting task simultaneously. Our numerical experiments on real-world datasets demonstrate that DyGraphAD outperforms baseline anomaly detection approaches.	翻訳日:2023-02-07 20:32:12 公開日:2023-02-04
# REaLTabFormer: トランスフォーマーを用いたリアルリレーショナルデータとタブラルデータの生成 REaLTabFormer: Generating Realistic Relational and Tabular Data using Transformers ( http://arxiv.org/abs/2302.02041v1 ) ライセンス: Link先を確認	Aivin V. Solatorio and Olivier Dupriez	(参考訳) タブラルデータ(tabular data)は、データ編成の一般的な形式である。複数のモデルは、観察が独立した合成表型データセットを生成することができるが、リレーショナルデータセットを生成する能力を持つものは少ない。テーブルとテーブル間の関係の両方をモデル化する必要があるため、関係データのモデリングは難しい。 realtabformer (realistic relational and tabular transformer), 表型および関係型データ生成モデルであるrealtabformer (realistic relational and tabular transformer) を導入する。まず、自己回帰型gpt-2モデルを使用して親テーブルを作成し、次にsequence-to-sequence(seq2seq)モデルを使用して親テーブル上のリレーショナルデータセットを生成する。我々は,データのコピーを防止するためにターゲットマスキングを実装し,オーバーフィッティングを検出するために$q_{\delta}$ statistic and statistical bootstrappingを提案する。実世界のデータセットを用いた実験では、REaLTabFormerはベースラインモデルよりもリレーショナル構造をよりよくキャプチャする。 REaLTabFormerは、微調整を必要とせずに大規模な非リレーショナルデータセットに対して、予測タスク"out-of-box"の最先端結果も達成している。 Tabular data is a common form of organizing data. Multiple models are available to generate synthetic tabular datasets where observations are independent, but few have the ability to produce relational datasets. Modeling relational data is challenging as it requires modeling both a "parent" table and its relationships across tables. We introduce REaLTabFormer (Realistic Relational and Tabular Transformer), a tabular and relational synthetic data generation model. It first creates a parent table using an autoregressive GPT-2 model, then generates the relational dataset conditioned on the parent table using a sequence-to-sequence (Seq2Seq) model. We implement target masking to prevent data copying and propose the $Q_{\delta}$ statistic and statistical bootstrapping to detect overfitting. Experiments using real-world datasets show that REaLTabFormer captures the relational structure better than a baseline model. REaLTabFormer also achieves state-of-the-art results on prediction tasks, "out-of-the-box", for large non-relational datasets without needing fine-tuning.	
翻訳日:2023-02-07 20:31:51 公開日:2023-02-04
# 残留膜電位によるANN-SNN変換誤差の低減 Reducing ANN-SNN Conversion Error through Residual Membrane Potential ( http://arxiv.org/abs/2302.02091v1 ) ライセンス: Link先を確認	Zecheng Hao, Tong Bu, Jianhao Ding, Tiejun Huang, Zhaofei Yu	(参考訳) スパイキングニューラルネットワーク(SNN)は、低消費電力のユニークな特性とニューロモルフィックチップ上の高速コンピューティングにより、広く学術的な注目を集めている。 SNNの様々なトレーニング手法の中で、ANN-SNN変換は大規模データセット上でのANNと同等の性能を示す。しかし,活性化層へのスパイク到来の時間的変化による偏差を示すむら誤差は効果的に解決されておらず,短時間のステップ条件下ではsnsの性能に深刻な打撃を与えている。本稿では,凹凸誤差の詳細な解析を行い,これらを4つのカテゴリに分類する。 ANNの出力がゼロであるのに対し、SNNの出力は最大パーセントのゼロよりも大きいことを指摘している。そこで本稿では,本事例の十分な条件と必要条件を理論的に証明し,残留膜電位に基づく最適化手法を提案する。実験の結果,提案手法はCIFAR-10, CIFAR-100, ImageNetデータセット上での最先端性能を実現することがわかった。例えば、ImageNetでトップ1の精度は10ステップで64.32\%に達する。我々の知る限り、ANN-SNN変換は複雑なデータセット上で高い精度と超低レイテンシを同時に達成できるのはこれが初めてである。コードはhttps://github.com/hzc1208/ANN2SNN\_SRPで入手できる。 Spiking Neural Networks (SNNs) have received extensive academic attention due to the unique properties of low power consumption and high-speed computing on neuromorphic chips. Among various training methods of SNNs, ANN-SNN conversion has shown the equivalent level of performance as ANNs on large-scale datasets. However, unevenness error, which refers to the deviation caused by different temporal sequences of spike arrival on activation layers, has not been effectively resolved and seriously suffers the performance of SNNs under the condition of short time-steps. In this paper, we make a detailed analysis of unevenness error and divide it into four categories. We point out that the case of the ANN output being zero while the SNN output being larger than zero accounts for the largest percentage. Based on this, we theoretically prove the sufficient and necessary conditions of this case and propose an optimization strategy based on residual membrane potential to reduce unevenness error. The experimental results show that the proposed method achieves state-of-the-art performance on CIFAR-10, CIFAR-100, and ImageNet datasets. 
For example, we reach a top-1 accuracy of 64.32% on ImageNet with 10 time-steps. To the best of our knowledge, this is the first time ANN-SNN conversion simultaneously achieves high accuracy and ultra-low latency on a complex dataset. Code is available at https://github.com/hzc1208/ANN2SNN_SRP.	翻訳日:2023-02-07 20:24:57 公開日:2023-02-04
# MOMA:自己監督型教員から学ぶ MOMA:Distill from Self-Supervised Teachers ( http://arxiv.org/abs/2302.02089v1 ) ライセンス: Link先を確認	Yuchong Yao, Nandakishor Desai, Marimuthu Palaniswami	(参考訳) コントラスト学習とマスク画像モデリングは、それぞれモーメントコントラスト(moco)とマスクオートエンコーダ(mae)が最先端である自己教師あり表現学習において、例外的な性能を示している。本研究では,MoCoとMAEを自己指導的に蒸留し,両方のパラダイムから知識を抽出する手法を提案する。提案するMOMAフレームワークに3つの異なる知識伝達機構を導入する。 (1) 事前学習済みのMoCoをMAEに蒸留する。 (2) 事前学習済みのMAEをMoCoに蒸留する。 (3) 事前学習済みのMoCoとMAEをランダムに初期化した生徒に蒸留する。蒸留中、教師と生徒は、それぞれオリジナルの入力とマスクされた入力を供給される。教師の正規化表現と生徒の投影表現とを整合させることにより学習を可能にする。この単純な設計は、非常に高いマスク比と劇的に訓練エポックスを低減した効率的な計算をもたらし、蒸留ターゲットに余分な配慮は必要としない。この実験は、MOMAが既存の最先端の手法に匹敵する性能を持つコンパクトな学生モデルを提供し、双方の自己教師付き学習パラダイムのパワーを組み合わせていることを示している。コンピュータビジョンにおける様々なベンチマークに対する競合結果を示す。本手法は,大規模事前学習モデルからの知識の伝達と適応に関する知見を,計算的に効率的な方法で提供することを願っている。 Contrastive Learning and Masked Image Modelling have demonstrated exceptional performance on self-supervised representation learning, where Momentum Contrast (i.e., MoCo) and Masked AutoEncoder (i.e., MAE) are the state-of-the-art, respectively. In this work, we propose MOMA to distill from pre-trained MoCo and MAE in a self-supervised manner to combine the knowledge from both paradigms. We introduce three different mechanisms of knowledge transfer in the proposed MOMA framework: (1) Distill pre-trained MoCo to MAE. (2) Distill pre-trained MAE to MoCo. (3) Distill pre-trained MoCo and MAE to a randomly initialized student. During the distillation, the teacher and the student are fed with original inputs and masked inputs, respectively. The learning is enabled by aligning the normalized representations from the teacher and the projected representations from the student. This simple design leads to efficient computation with extremely high mask ratio and dramatically reduced training epochs, and does not require extra considerations on the distillation target. 
The experiments show MOMA delivers compact student models with comparable performance to existing state-of-the-art methods, combining the power of both self-supervised learning paradigms. It presents competitive results against different benchmarks in computer vision. We hope our method provides an insight on transferring and adapting the knowledge from large-scale pre-trained models in a computationally efficient way.	翻訳日:2023-02-07 20:24:35 公開日:2023-02-04
# AV-NeRF:リアルワールドオーディオ映像合成のためのニューラルネットワーク学習 AV-NeRF: Learning Neural Fields for Real-World Audio-Visual Scene Synthesis ( http://arxiv.org/abs/2302.02088v1 ) ライセンス: Link先を確認	Susan Liang, Chao Huang, Yapeng Tian, Anurag Kumar, Chenliang Xu	(参考訳) 複雑な世界に対する人間の認識は、マルチモーダル信号の包括的な分析に依存しており、オーディオとビデオ信号の共起は、人間に豊かな手がかりを与える。本稿では,実世界における新しい映像シーン合成について述べる。オーディオ映像シーンの映像録画を前提として,その映像シーン内の任意のカメラ軌跡に沿って,空間的音声で新しい映像を合成する。音声合成にNeRFモデルを直接用いることは、事前知識の欠如と音響監督のために不十分である。この課題に対処するために,我々はまず,従来の音声伝搬の知識をNeRFに統合した音響認識型音声生成モジュールを提案し,そこで音声生成と視覚環境の3次元幾何を関連づける。また,音源に対する視聴方向を表す座標変換モジュールを提案する。このような方向変換は、モデルが音源中心の音響場を学ぶのに役立つ。さらに,頭部関連インパルス応答関数を用いて擬似バイノーラル音声を合成し,トレーニングを強化するデータ拡張を行う。実世界の映像シーンにおけるモデルの有用性を質的かつ定量的に実証する。我々は興味のある読者に、説得力のある比較のためにビデオ結果を見るよう勧める。 Human perception of the complex world relies on a comprehensive analysis of multi-modal signals, and the co-occurrences of audio and video signals provide humans with rich cues. This paper focuses on novel audio-visual scene synthesis in the real world. Given a video recording of an audio-visual scene, the task is to synthesize new videos with spatial audios along arbitrary novel camera trajectories in that audio-visual scene. Directly using a NeRF-based model for audio synthesis is insufficient due to its lack of prior knowledge and acoustic supervision. To tackle the challenges, we first propose an acoustic-aware audio generation module that integrates our prior knowledge of audio propagation into NeRF, in which we associate audio generation with the 3D geometry of the visual environment. In addition, we propose a coordinate transformation module that expresses a viewing direction relative to the sound source. Such a direction transformation helps the model learn sound source-centric acoustic fields. Moreover, we utilize a head-related impulse response function to synthesize pseudo binaural audio for data augmentation that strengthens training. 
We qualitatively and quantitatively demonstrate the advantage of our model on real-world audio-visual scenes. We refer interested readers to view our video results for convincing comparisons.	翻訳日:2023-02-07 20:24:06 公開日:2023-02-04
# 生まれた規則 -- 公理か結果か? The Born Rule -- Axiom or Result? ( http://arxiv.org/abs/2302.02086v1 ) ライセンス: Link先を確認	Jay Lawrence and Philip Goyal	(参考訳) ボルン則(英: born rule)は、量子論の標準版における崩壊公理の一部である。ここでは、その符号の二次的依存は、他の公理を超える1つの物理的仮定、すなわち、ある特定の測定結果(例えば、$\phi_k$)の確率が、その固有状態の1つがその結果に対応する限り、測定可能な選択から独立であることを示す。私たちはこの仮定を「観測可能な独立」と呼んでいる。結果として生まれた規則は公理のリストから完全に排除することはできないが、原則として、より物理的なステートメントに還元できる。我々のプレゼンテーションは、量子理論の標準講座を受講した上級の学部生や大学院生に適している。理論の特定の解釈には依存しない。 The Born rule is part of the collapse axiom in the standard version of quantum theory, as presented by standard textbooks on the subject. We show here that its signature quadratic dependence follows from a single additional physical assumption beyond the other axioms - namely, that the probability of a particular measurement outcome (the state $\phi_k$, say) is independent of the choice of observable to be measured, so long as one of its eigenstates corresponds to that outcome. We call this assumption ``observable independence.'' As a consequence, the Born rule cannot be completely eliminated from the list of axioms, but it can, in principle, be reduced to a more physical statement. Our presentation is suitable for advanced undergraduates or graduate students who have taken a standard course in quantum theory. It does not depend on any particular interpretation of the theory.	翻訳日:2023-02-07 20:23:48 公開日:2023-02-04
# 心の理論は、大きな言語モデルで自然発生的に現れたかもしれない Theory of Mind May Have Spontaneously Emerged in Large Language Models ( http://arxiv.org/abs/2302.02083v1 ) ライセンス: Link先を確認	Michal Kosinski	(参考訳) 心の理論、または他人に観察不能な精神状態をもたらす能力は、人間の社会的相互作用、コミュニケーション、共感、自己意識、道徳の中心である。人間のToMテストに広く用いられている古典的偽理解タスクを,事例や事前学習を伴わずに,いくつかの言語モデルに管理する。その結果,2022年以前のモデルでは,ToMタスクを解く能力がほとんどないことがわかった。しかし、2022年1月のGPT-3(davinci-002)では、ToMタスクの70%が解決された。さらに、2022年11月版(davinci-003)では、ToMタスクの93%が解決された。これらの結果から,ToM様の能力は言語モデルの言語能力向上の副産物として自然に出現した可能性が示唆された。 Theory of mind (ToM), or the ability to impute unobservable mental states to others, is central to human social interactions, communication, empathy, self-consciousness, and morality. We administer classic false-belief tasks, widely used to test ToM in humans, to several language models, without any examples or pre-training. Our results show that models published before 2022 show virtually no ability to solve ToM tasks. Yet, the January 2022 version of GPT-3 (davinci-002) solved 70% of ToM tasks, a performance comparable with that of seven-year-old children. Moreover, its November 2022 version (davinci-003), solved 93% of ToM tasks, a performance comparable with that of nine-year-old children. These findings suggest that ToM-like ability (thus far considered to be uniquely human) may have spontaneously emerged as a byproduct of language models' improving language skills.	翻訳日:2023-02-07 20:23:33 公開日:2023-02-04
# Gated FusionによるNLPモデルの後方互換性向上 Improving Prediction Backward-Compatiblility in NLP Model Upgrade with Gated Fusion ( http://arxiv.org/abs/2302.02080v1 ) ライセンス: Link先を確認	Yi-An Lai, Elman Mansimov, Yuqing Xie, Yi Zhang	(参考訳) ニューラルモデルを新しいバージョンにアップグレードする場合、レガシバージョンで遭遇しなかった新しいエラーを、レグレッションエラー(regress error)として導入することができる。モデルアップグレード中のこの一貫性のない振る舞いは、しばしば精度向上の利点を上回り、新しいモデルの採用を妨げる。モデルアップグレードからの回帰誤差を軽減するため、蒸留とアンサンブルは性能に大きな妥協なしに実現可能であることが証明された。進歩にもかかわらず、これらのアプローチは回帰の漸進的な削減を達成し、後方互換性のあるモデルアップグレードには程遠い。本研究では,古いモデルと新しいモデルの間で予測を混合する学習を通じて,後方互換性を促進する新しい手法gated fusionを提案する。 2つの異なるモデルアップグレードシナリオにおける実験結果から,提案手法は回帰誤差を平均62%削減し,最強のベースラインを平均25%上回る結果となった。 When upgrading neural models to a newer version, new errors that were not encountered in the legacy version can be introduced, known as regression errors. This inconsistent behavior during model upgrade often outweighs the benefits of accuracy gain and hinders the adoption of new models. To mitigate regression errors from model upgrade, distillation and ensemble have proven to be viable solutions without significant compromise in performance. Despite the progress, these approaches attained an incremental reduction in regression which is still far from achieving backward-compatible model upgrade. In this work, we propose a novel method, Gated Fusion, that promotes backward compatibility via learning to mix predictions between old and new models. Empirical results on two distinct model upgrade scenarios show that our method reduces the number of regression errors by 62% on average, outperforming the strongest baseline by an average of 25%.	翻訳日:2023-02-07 20:23:15 公開日:2023-02-04
# FGSI:細粒度意味情報に基づく関係抽出のための距離スーパービジョン FGSI: Distant Supervision for Relation Extraction method based on Fine-Grained Semantic Information ( http://arxiv.org/abs/2302.02078v1 ) ライセンス: Link先を確認	Chenghong Sun, Weidong Ji, Guohui Zhou, Hui Guo, Zengxiang Yin and Yuqi Yue	(参考訳) 関係抽出の主な目的は、文内のタグ付けされたエンティティのペア間の意味関係を抽出することであり、これは文の意味理解と知識グラフの構築において重要な役割を担っている。本稿では,文内の重要な意味情報がエンティティ間の関係抽出において重要な役割を果たすという仮説を提案する。この仮説に基づき,文中のエンティティの位置に応じて文を3つのセグメントに分割し,文内注意機構を通じて文内部の細粒度の意味的特徴を発見し,無関係な雑音情報の干渉を低減する。提案する関係抽出モデルは、利用可能なポジティブな意味情報を十分に活用することができる。実験の結果,提案手法は既存手法と比較して適合率-再現率曲線とP@N値が向上し,本モデルの有効性が示された。 The main purpose of relation extraction is to extract the semantic relationships between tagged pairs of entities in a sentence, which plays an important role in the semantic understanding of sentences and the construction of knowledge graphs. In this paper, we hypothesize that the key semantic information within a sentence plays a key role in entity relation extraction. Based on this hypothesis, we split the sentence into three segments according to the positions of the entities within it, and find the fine-grained semantic features inside the sentence through an intra-sentence attention mechanism to reduce the interference of irrelevant noise. The proposed relation extraction model can make full use of the available positive semantic information. The experimental results show that the proposed relation extraction model improves the precision-recall curves and P@N values compared with existing methods, which demonstrates the effectiveness of this model.	翻訳日:2023-02-07 20:23:00 公開日:2023-02-04
# クロス周波数時系列メタフォアキャスティング Cross-Frequency Time Series Meta-Forecasting ( http://arxiv.org/abs/2302.02077v1 ) ライセンス: Link先を確認	Mike Van Ness, Huibin Shen, Hao Wang, Xiaoyong Jin, Danielle C. Maddix, Karthick Gopalswamy	(参考訳) meta-forecastingは、メタラーニングと時系列予測を組み合わせた新しい分野だ。 meta-forecastingの目標は、ソース時系列のコレクションをトレーニングし、新しい時系列に1回ずつ一般化することだ。メタ予測における従来のアプローチは競合性能を実現するが、サンプリング周波数ごとに個別のモデルを訓練する制限がある。本研究では,様々なサンプリング周波数のメタフォアキャスティングを調査し,新しいモデルである連続周波数アダプタ(cfa)を導入し,周波数不変表現を学習する。我々は、CFAが周波数を一般化する際の性能を大幅に改善し、より大規模なマルチ周波数データセットを予測するための第一歩となることを発見した。 Meta-forecasting is a newly emerging field which combines meta-learning and time series forecasting. The goal of meta-forecasting is to train over a collection of source time series and generalize to new time series one-at-a-time. Previous approaches in meta-forecasting achieve competitive performance, but with the restriction of training a separate model for each sampling frequency. In this work, we investigate meta-forecasting over different sampling frequencies, and introduce a new model, the Continuous Frequency Adapter (CFA), specifically designed to learn frequency-invariant representations. We find that CFA greatly improves performance when generalizing to unseen frequencies, providing a first step towards forecasting over larger multi-frequency datasets.	翻訳日:2023-02-07 20:22:42 公開日:2023-02-04
# X-ReID: アイデンティティレベル人物再識別のためのクロスインスタンス変換器 X-ReID: Cross-Instance Transformer for Identity-Level Person Re-Identification ( http://arxiv.org/abs/2302.02075v1 ) ライセンス: Link先を確認	Leqi Shen, Tao He, Yuchen Guo, Guiguang Ding	(参考訳) 現在、ほとんどの既存の人物再識別方法は、単一の画像からのみ抽出されるインスタンスレベル機能を使用している。しかし、これらのインスタンスレベルの特徴は、各アイデンティティの外観が異なる画像で大きく異なるため、容易に識別情報を無視することができる。したがって、各アイデンティティの異なるイメージ間で共有できるidレベルの機能を利用する必要がある。本稿では,同一人物画像から同一人物画像への情報をクロスアテンションで取り込み,より統一的で識別可能な歩行者情報を得ることにより,アイデンティティ・レベル特徴に対するインスタンス・レベル特徴の促進を提案する。 x-reid という新しいトレーニングフレームワークを提案する。具体的には、cross intra-identity instances module(intrax)は異なるidentityインスタンスを融合してidレベルの知識を転送し、インスタンスレベルの機能をよりコンパクトにする。 InterX(Cross Inter-Identity Instances Module)は、アイデンティティ内の変動を最小限に抑え、アイデンティティ間の変動を最大化する、異なるアイデンティティではなく、同じアイデンティティに対する注意応答を改善するために、ハードポジティとハードポジティのインスタンスを含む。ベンチマークデータセットに関する広範な実験は、既存の作業よりも優れた方法を示している。特にMSMT17では,2位に比べて1.1%のmAP改善が得られた。 Currently, most existing person re-identification methods use Instance-Level features, which are extracted only from a single image. However, these Instance-Level features can easily ignore the discriminative information due to the appearance of each identity varies greatly in different images. Thus, it is necessary to exploit Identity-Level features, which can be shared across different images of each identity. In this paper, we propose to promote Instance-Level features to Identity-Level features by employing cross-attention to incorporate information from one image to another of the same identity, thus more unified and discriminative pedestrian information can be obtained. We propose a novel training framework named X-ReID. Specifically, a Cross Intra-Identity Instances module (IntraX) fuses different intra-identity instances to transfer Identity-Level knowledge and make Instance-Level features more compact. 
A Cross Inter-Identity Instances module (InterX) involves hard positive and hard negative instances to improve the attention response to the same identity rather than to different identities, which minimizes intra-identity variation and maximizes inter-identity variation. Extensive experiments on benchmark datasets show the superiority of our method over existing works. In particular, on the challenging MSMT17, our proposed method gains a 1.1% mAP improvement over the second-best method.	翻訳日:2023-02-07 20:22:31 公開日:2023-02-04
# 量子計算:大規模クリティカルインフラストラクチャのための効率的なネットワークパーティショニング Quantum computation: Efficient network partitioning for large scale critical infrastructures ( http://arxiv.org/abs/2302.02074v1 ) ライセンス: Link先を確認	Saikat Ray Majumder, Annarita Giani, Weiwei Shen, Bogdan Neculaes, Daiwei Zhu, and Sonika Johri	(参考訳) 量子コンピュータは、古典的なコンピュータにとって困難な特定の計算問題に取り組むための、有効な代替手段として現れつつある。閉じ込められたイオンに基づく量子ハードウェアの急速な発展により、これらのシステムで効率的に解くことができるリスク管理問題を特定する実践的な動機がある。本稿では,重要なインフラにおけるリスクを分析する手段としてネットワーク分割に着目し,その実装に量子的アプローチを提案する。これは潜在的なスピードアップ量子コンピュータがスパースグラフラプラシアンの固有値と固有ベクトルを識別できる可能性に基づいており、これは古典的コンピュータ上の時間とメモリによって制約される手順である。 Quantum computers are emerging as a viable alternative to tackle certain computational problems that are challenging for classical computers. With the rapid development of quantum hardware such as those based on trapped ions, there is practical motivation for identifying risk management problems that are efficiently solvable with these systems. Here we focus on network partitioning as a means for analyzing risk in critical infrastructures and present a quantum approach for its implementation. It is based on the potential speedup quantum computers can provide in the identification of eigenvalues and eigenvectors of sparse graph Laplacians, a procedure which is constrained by time and memory on classical computers.	翻訳日:2023-02-07 20:22:07 公開日:2023-02-04
# 適応的拡張意味情報を組み合わせた知識グラフ補完法 Knowledge Graph Completion Method Combined With Adaptive Enhanced Semantic Information ( http://arxiv.org/abs/2302.02116v1 ) ライセンス: Link先を確認	Weidong Ji, Zengxiang Yin, Guohui Zhou, Yuqi Yue, Xinru Zhang, Chenghong Sun	(参考訳) 翻訳モデルは知識グラフ補完の過程において、トリプルの豊富な意味情報を無視する傾向にある。この欠点を補うため,本稿では適応的に強化された意味情報を取り入れた知識グラフ補完手法を構築する。 BERTモデルを微調整してトリプルに内在する隠れた意味情報を取得し、注意特徴埋め込み法を用いて正例および負例のトリプルにおける関係と実体の間の意味的注意スコアを算出し、それらを構造情報に組み込んで意味情報に対するソフト制約ルールを形成する。このルールは、意味情報の適応的な強化を実現するために、元の翻訳モデルに追加される。さらに、高次元ベクトルが性能に与える影響を考慮し、BERT-whitening法を用いて次元を削減し、より効率的な意味ベクトル表現を生成する。実験による比較の結果,提案手法はFB15KおよびWN18の両データセットにおいて,元の翻訳モデルと比較して約2.6%の数値改善を示し,本手法の妥当性と有効性が検証された。 Translation models tend to ignore the rich semantic information in triples in the process of knowledge graph completion. To remedy this shortcoming, this paper constructs a knowledge graph completion method that incorporates adaptively enhanced semantic information. The hidden semantic information inherent in each triple is obtained by fine-tuning the BERT model, and the attention feature embedding method is used to calculate the semantic attention scores between relations and entities in positive and negative triples and incorporate them into the structural information to form a soft constraint rule for semantic information. The rule is added to the original translation model to realize the adaptive enhancement of semantic information. In addition, the method takes into account the effect of high-dimensional vectors on performance, and uses the BERT-whitening method to reduce the dimensionality and generate a more efficient semantic vector representation. In experimental comparisons, the proposed method performs better on both the FB15K and WN18 datasets, with a numerical improvement of about 2.6% compared with the original translation model, which verifies the reasonableness and effectiveness of the method.	翻訳日:2023-02-07 20:16:24 公開日:2023-02-04
# 複素ベリー相と不完全非エルミート相転移 Complex Berry phase and imperfect non-Hermitian phase transitions ( http://arxiv.org/abs/2302.02114v1 ) ライセンス: Link先を確認	Stefano Longhi and Liang Feng	(参考訳) 有効非エルミート・ハミルトニアンによって記述される多くの古典系・量子系において、系の非エルミートパラメータが臨界値を超えて増加すると、完全に実のエネルギースペクトルから複素スペクトルへのスペクトル相転移が観測される。典型的な例はパリティ・時間(PT)対称性を持つ系であり、そこではエネルギースペクトルは破れていないPT相において完全に実のままであり、一方で複素エネルギーへの遷移は破れたPT相において観測される。このようなスペクトル相転移は普遍的に鋭い。しかし、系をゆっくりと周期的に循環させると、系の周期的な断熱発展に付随する複素ベリー相のため、相転移は滑らか、すなわち不完全になりうる。この注目すべき現象を、外部直流場を受けるPT対称な2バンド非エルミート格子のクラスにおけるワニエ・シュタルク・ラダーのスペクトル相転移を考えることで例示し、ザック相 -- ブリルアンゾーン全体にわたって発展するブロッホ固有状態が獲得するベリー相 -- のゼロでない虚部が不完全なスペクトル相転移の原因であることを明らかにする。 In many classical and quantum systems described by an effective non-Hermitian Hamiltonian, spectral phase transitions, from an entirely real energy spectrum to a complex spectrum, can be observed as a non-Hermitian parameter in the system is increased above a critical value. A paradigmatic example is provided by systems possessing parity-time (PT) symmetry, where the energy spectrum remains entirely real in the unbroken PT phase while a transition to complex energies is observed in the broken PT phase. Such spectral phase transitions are universally sharp. However, when the system is slowly and periodically cycled, the phase transition can become smooth, i.e. imperfect, owing to the complex Berry phase associated with the cyclic adiabatic evolution of the system. This remarkable phenomenon is illustrated by considering the spectral phase transition of the Wannier-Stark ladders in a PT-symmetric class of two-band non-Hermitian lattices subjected to an external dc field, revealing that a non-vanishing imaginary part of the Zak phase -- the Berry phase picked up by a Bloch eigenstate evolving across the entire Brillouin zone -- is responsible for imperfect spectral phase transitions.	翻訳日:2023-02-07 20:16:05 公開日:2023-02-04
# コードリポジトリの振る舞いデータによるセキュリティパッチの検出 Detecting Security Patches via Behavioral Data in Code Repositories ( http://arxiv.org/abs/2302.02112v1 ) ライセンス: Link先を確認	Nitzan Farhi, Noam Koenigstein, Yuval Shavitt	(参考訳) 今日のソフトウェアの大部分は、gitのような協調バージョン管理ツールを使って共同開発されている。脆弱性が検出され、修正されると、ソフトウェアを開発する開発者は、セキュリティの危険性をユーザコミュニティに警告し、セキュリティパッチを統合するよう促すために、Common Vulnerabilities and Exposures(CVEレコード)を発行する。しかし、一部の企業は脆弱性を公表せず、リポジトリを更新している。その結果、ユーザは脆弱性に気づいておらず、露出し続ける可能性がある。本稿では,Gitリポジトリの開発者動作のみを使用して,修正に伴うコード自体やコメント(コミットメッセージ)を分析することなく,セキュリティパッチを自動的に識別するシステムを提案する。秘密のセキュリティパッチを88.3%、F1スコア89.8%で公開できることを示した。この問題に対する言語的な解決法が提示されたのは今回が初めてである。 The absolute majority of software today is developed collaboratively using collaborative version control tools such as Git. It is a common practice that once a vulnerability is detected and fixed, the developers behind the software issue a Common Vulnerabilities and Exposures or CVE record to alert the user community of the security hazard and urge them to integrate the security patch. However, some companies might not disclose their vulnerabilities and just update their repository. As a result, users are unaware of the vulnerability and may remain exposed. In this paper, we present a system to automatically identify security patches using only the developer behavior in the Git repository without analyzing the code itself or the remarks that accompanied the fix (commit message). We showed we can reveal concealed security patches with an accuracy of 88.3% and F1 Score of 89.8%. This is the first time that a language-oblivious solution for this problem is presented.	翻訳日:2023-02-07 20:15:40 公開日:2023-02-04
# 視覚トランスフォーマーにおける知識蒸留 : 批判的レビュー Knowledge Distillation in Vision Transformers: A Critical Review ( http://arxiv.org/abs/2302.02108v1 ) ライセンス: Link先を確認	Gousia Habib, Tausifa Jan Saleem, Brejesh Lall	(参考訳) 自然言語処理(nlp)では、トランスフォーマーはすでに注意に基づくエンコーダ・デコーダモデルを利用してこの分野に革命をもたらしている。近年,コンピュータビジョン(CV)にトランスフォーマーのようなアーキテクチャを採用し,画像分類やオブジェクト検出,セマンティックセグメンテーションといったタスクにおいて,これらのアーキテクチャの優れた性能を報告している。ビジョントランスフォーマー(ViT)は、競合するモデリング能力のために、畳み込みニューラルネットワーク(CNN)よりも優れたパフォーマンスを誇示している。しかし、これらのアーキテクチャは膨大な計算資源を必要とするため、リソース制約されたアプリケーションにこれらのモデルをデプロイすることは困難である。圧縮変圧器や拡張畳み込み、min-maxプール、1D畳み込みなどの圧縮関数など、この問題に対処する多くのソリューションが開発されている。モデル圧縮は最近、潜在的な治療としてかなりの研究の注目を集めている。重み量子化,重み多重化,プルーニング,知識蒸留 (kd) などの文献において,モデル圧縮法が提案されている。しかしながら、重み量子化、プルーニング、重み多重化といったテクニックは、圧縮を実行するための複雑なパイプラインを必要とする。 KDは、比較的単純なモデルが複雑なモデルと同じくらい正確にタスクを実行できる、シンプルで効果的なモデル圧縮技術であることが分かってきた。本稿では,vitモデルの効果的圧縮のためのkdに基づく様々な手法について述べる。この論文は、kdがこれらのモデルの計算とメモリ要求を減らす上で果たす役割を解明している。本稿は、まだ解決されていないViTが直面する様々な課題についても述べる。 In Natural Language Processing (NLP), Transformers have already revolutionized the field by utilizing an attention-based encoder-decoder model. Recently, some pioneering works have employed Transformer-like architectures in Computer Vision (CV) and they have reported outstanding performance of these architectures in tasks such as image classification, object detection, and semantic segmentation. Vision Transformers (ViTs) have demonstrated impressive performance improvements over Convolutional Neural Networks (CNNs) due to their competitive modelling capabilities. However, these architectures demand massive computational resources which makes these models difficult to be deployed in the resource-constrained applications. Many solutions have been developed to combat this issue, such as compressive transformers and compression functions such as dilated convolution, min-max pooling, 1D convolution, etc. Model compression has recently attracted considerable research attention as a potential remedy. 
A number of model compression methods have been proposed in the literature, such as weight quantization, weight multiplexing, pruning and Knowledge Distillation (KD). However, techniques like weight quantization, pruning and weight multiplexing typically involve complex pipelines for performing the compression. KD has been found to be a simple and highly effective model compression technique that allows a relatively simple model to perform tasks almost as accurately as a complex model. This paper discusses various KD-based approaches for effective compression of ViT models. The paper elucidates the role played by KD in reducing the computational and memory requirements of these models. The paper also presents the various challenges faced by ViTs that are yet to be resolved.	翻訳日:2023-02-07 20:15:26 公開日:2023-02-04
# ハードサトゲン:ハードSATフォーミュラの難易度と強構造に配慮したベースラインの理解 HardSATGEN: Understanding the Difficulty of Hard SAT Formula Generation and A Strong Structure-Hardness-Aware Baseline ( http://arxiv.org/abs/2302.02104v1 ) ライセンス: Link先を確認	Yang Li, Xinyan Chen, Wenxuan Guo, Xijun Li, Wanqian Luo, Junhua Huang, Hui-Ling Zhen, Mingxuan Yuan, Junchi Yan	(参考訳) 産業SAT公式生成は、実用SATアプリケーションにおけるヒューリスティックな開発と学習に基づく手法の急増にとって、重要かつ困難な課題である。既存のSAT生成アプローチでは、グローバルな構造特性を同時に捉えることはほとんどできず、様々な下流のエンゲージメントにとって有害な計算硬度を維持することができる。この目的のために,本研究では,従来の学習方法の制限について,従来の計算困難度を再現するための詳細な解析を行った。産業用公式が明らかなコミュニティ構造と過分な部分構造を示すことから,論理構造のセマンティックな形成が困難であることを示す上で,SAT式生成のためのニューラルスプリット・マージ・パラダイムにきめ細かな制御機構を導入し,産業用ベンチマークの構造的・計算的特性をよりよく回復させるHardSATGENを提案する。計算硬度とグローバルな構造特性の同時把握を両立させる手法として, 民間企業データの評価や, 実用化におけるハイパーパラメータチューニングなどの実験結果から, ハードサトゲンの有意な優位性が確認された。我々の最高の知識に対する最高の手法と比較すると、平均的な性能向上は構造統計で38.5%、計算メトリクスで88.4%、生成したインスタンスでチューニングされたソルバ開発を導く効果で140.7%以上を達成している。 Industrial SAT formula generation is a critical yet challenging task for heuristic development and the surging learning-based methods in practical SAT applications. Existing SAT generation approaches can hardly simultaneously capture the global structural properties and maintain plausible computational hardness, which can be hazardous for the various downstream engagements. To this end, we first present an in-depth analysis for the limitation of previous learning methods in reproducing the computational hardness of original instances, which may stem from the inherent homogeneity in their adopted split-merge procedure. On top of the observations that industrial formulae exhibit clear community structure and oversplit substructures lead to the difficulty in semantic formation of logical structures, we propose HardSATGEN, which introduces a fine-grained control mechanism to the neural split-merge paradigm for SAT formula generation to better recover the structural and computational properties of the industrial benchmarks. 
Experimental results, including evaluations on private corporate data and hyperparameter tuning over solvers in practical use, show the significant superiority of HardSATGEN, which is the only method to successfully augment formulae while maintaining similar computational hardness and capturing the global structural properties simultaneously. Compared to the best previous methods to our knowledge, the average performance gains reach 38.5% in structural statistics, 88.4% in computational metrics, and over 140.7% in the effectiveness of guiding solver development tuned by our generated instances.	翻訳日:2023-02-07 20:15:01 公開日:2023-02-04
# grande: 指向型マルチグラフによるニューラルモデルとアンチマネーロンダリングへの応用 GRANDE: a neural model over directed multigraphs with application to anti-money laundering ( http://arxiv.org/abs/2302.02101v1 ) ライセンス: Link先を確認	Ruofan Wu, Boqun Ma, Hong Jin, Wenlong Zhao, Weiqiang Wang, Tianyi Zhang	(参考訳) 近年,金融リスクマネジメント(FRM)分野へのグラフ表現学習技術の応用が注目されている。第一に、トランザクションネットワークは本質的にはマルチグラフを指向しており、現行のグラフニューラルネットワーク(gnn)のほとんどでは適切に処理できない。第二に、アンチマネーロンダリング(AML)のようなFRMシナリオにおける重要な問題は、リスクのあるトランザクションを識別することであり、ノード中心のメッセージパッシングプロトコルに従う一般的なGNN設計によって完全に活用されていない、リッチエッジレベルの機能を備えたエッジ分類問題に最も自然に陥る。本稿では,指向型マルチグラフ上でのニューラルモデルの設計側面を体系的に検討し,方向情報を効率的に取り入れることで,上記の課題を克服する新しいgnnプロトコルを開発し,エッジ・ツー・ノード二重グラフの拡張を用いた新しいメッセージパッシング方式を用いてエッジ関連タスクを対象とする拡張を提案する。 GRANDEと呼ばれる具体的なGNNアーキテクチャは提案プロトコルを用いて導出され、時間動的グラフのさらなる改良と一般化がなされている。 GRANDEモデルを現実世界のマネーロンダリングタスクと公開データセットの両方に適用する。実験により, 動的グラフモデリングと有向グラフモデリングにおいて, 最近の最先端モデルよりもgrandeアーキテクチャが優れていることが示された。 The application of graph representation learning techniques to the area of financial risk management (FRM) has attracted significant attention recently. However, directly modeling transaction networks using graph neural models remains challenging: Firstly, transaction networks are directed multigraphs by nature, which could not be properly handled with most of the current off-the-shelf graph neural networks (GNN). Secondly, a crucial problem in FRM scenarios like anti-money laundering (AML) is to identify risky transactions and is most naturally cast into an edge classification problem with rich edge-level features, which are not fully exploited by the prevailing GNN design that follows node-centric message passing protocols. 
In this paper, we present a systematic investigation of design aspects of neural models over directed multigraphs and develop a novel GNN protocol that overcomes the above challenges via efficiently incorporating directional information, as well as proposing an enhancement that targets edge-related tasks using a novel message passing scheme over an extension of edge-to-node dual graph. A concrete GNN architecture called GRANDE is derived using the proposed protocol, with several further improvements and generalizations to temporal dynamic graphs. We apply the GRANDE model to both a real-world anti-money laundering task and public datasets. Experimental evaluations show the superiority of the proposed GRANDE architecture over recent state-of-the-art models on dynamic graph modeling and directed graph modeling.	翻訳日:2023-02-07 20:14:33 公開日:2023-02-04
# PLCプロセス制御における異常検出のための教師なしアンサンブル法 Unsupervised Ensemble Methods for Anomaly Detection in PLC-based Process Control ( http://arxiv.org/abs/2302.02097v1 ) ライセンス: Link先を確認	Emmanuel Aboah Boateng, and Bruce J. W	(参考訳) プログラム可能なロジックコントローラ(PLC)ベースの産業制御システム(ICS)は、重要なインフラを監視し制御するために使用される。 ICSにおける通信ネットワークの統合とIoTアプローチは、サイバー攻撃に対するICSの脆弱性を増大させた。本研究はPLCベースのICSにおける異常検出のための新しい教師なし機械学習アンサンブル手法を提案する。本研究は, 決定係数に基づく学習アルゴリズムを用いた重み付き投票アンサンブルアプローチと, 孤立林メタ検出器を用いた積み重ね型アンサンブルアプローチの2つの手法を提案する。 2つのアンサンブル法を,複数の攻撃シナリオを想定したオープンソースのplcベースのicを用いて解析した。この研究は、重み付けされた投票アンサンブル法のための4つの異なる学習モデルを考える。 5つのアンサンブル法を駆動する多種多様なベース検出器の比較性能解析を行った。その結果,分離林メタ検出器を用いた積み重ね型アンサンブル法は,過去のすべての性能指標よりも優れた性能を示した。また,分離森林メタ検出器を持つ積み重ね型アンサンブルのような効果的なアンサンブル手法は,任意のICSデータセットの異常を確実に検出できることが示唆された。最後に, 統計的仮説を用いて実験結果を検証した。 Programmable logic controller (PLC) based industrial control systems (ICS) are used to monitor and control critical infrastructure. Integration of communication networks and an Internet of Things approach in ICS has increased ICS vulnerability to cyber-attacks. This work proposes novel unsupervised machine learning ensemble methods for anomaly detection in PLC-based ICS. The work presents two broad approaches to anomaly detection: a weighted voting ensemble approach with a learning algorithm based on coefficient of determination and a stacking-based ensemble approach using isolation forest meta-detector. The two ensemble methods were analyzed via an open-source PLC-based ICS subjected to multiple attack scenarios as a case study. The work considers four different learning models for the weighted voting ensemble method. Comparative performance analyses of five ensemble methods driven diverse base detectors are presented. Results show that stacking-based ensemble method using isolation forest meta-detector achieves superior performance to previous work on all performance metrics. 
Results also suggest that effective unsupervised ensemble methods, such as a stacking-based ensemble with an isolation forest meta-detector, can robustly detect anomalies in arbitrary ICS datasets. Finally, the presented results were validated using statistical hypothesis tests.	翻訳日:2023-02-07 20:14:08 公開日:2023-02-04
# 個人フェアネスの行列推定 Matrix Estimation for Individual Fairness ( http://arxiv.org/abs/2302.02096v1 ) ライセンス: Link先を確認	Cindy Y. Zhang, Sarah H. Cen, Devavrat Shah	(参考訳) 近年、アルゴリズム的公正性の複数の概念が生まれている。そのような概念の1つは個人公正(IF)であり、類似した個人が同様の治療を受ける必要がある。並行して、行列推定(me)は、値が欠けているノイズデータを扱うための自然なパラダイムとして現れた。この作品では、2つの概念をつなぐ。 meを用いた前処理は性能を犠牲にすることなくアルゴリズムのifを改善できることを示す。具体的には,データ前処理に特異値しきい値(SVT)と呼ばれる一般的なME手法を用いることで,適切な条件下での強力なIF保証が得られることを示す。次に、類似した条件下では、SVT前処理が一貫したほぼ最小値の推定値も得られることを示す。したがって、ME前処理ステップは、前述の条件の下では、ベースアルゴリズムの予測誤差、すなわち、フェアネスとパフォーマンスのトレードオフを課さない。これらの結果を合成データと実データで検証する。 In recent years, multiple notions of algorithmic fairness have arisen. One such notion is individual fairness (IF), which requires that individuals who are similar receive similar treatment. In parallel, matrix estimation (ME) has emerged as a natural paradigm for handling noisy data with missing values. In this work, we connect the two concepts. We show that pre-processing data using ME can improve an algorithm's IF without sacrificing performance. Specifically, we show that using a popular ME method known as singular value thresholding (SVT) to pre-process the data provides a strong IF guarantee under appropriate conditions. We then show that, under analogous conditions, SVT pre-processing also yields estimates that are consistent and approximately minimax optimal. As such, the ME pre-processing step does not, under the stated conditions, increase the prediction error of the base algorithm, i.e., does not impose a fairness-performance trade-off. We verify these results on synthetic and real data.	翻訳日:2023-02-07 20:13:49 公開日:2023-02-04
# ナレッジエンハンスドニューラルマシン推論:レビュー Knowledge-enhanced Neural Machine Reasoning: A Review ( http://arxiv.org/abs/2302.02093v1 ) ライセンス: Link先を確認	Tanmoy Chowdhury, Chen Ling, Xuchao Zhang, Xujiang Zhao, Guangji Bai, Jian Pei, Haifeng Chen, Liang Zhao	(参考訳) 知識に富んだニューラルマシン推論は、最先端でありながら多くの実用的応用に挑戦する研究分野として大きな注目を集めている。過去数年間、深層モデルの推論能力向上、効果的な知識統合、暗黙の知識マイニング、トラクタビリティと最適化の問題といった課題に取り組むために、さまざまな外部知識を活用してきた研究が数多くある。しかし、様々なアプリケーションドメインにまたがる既存の知識に富んだ推論技術に関する包括的な技術的レビューがある。本調査は, 既存の知識向上手法を2つの主要なカテゴリと4つのサブカテゴリに分類する新しい分類法を導入し, この分野の最近の進歩を詳細に検討する。我々は,これらの手法を体系的に議論し,その相関性,強み,限界を強調する。最後に、現在のアプリケーションドメインを解明し、将来の研究の展望に関する洞察を提供する。 Knowledge-enhanced neural machine reasoning has garnered significant attention as a cutting-edge yet challenging research area with numerous practical applications. Over the past few years, plenty of studies have leveraged various forms of external knowledge to augment the reasoning capabilities of deep models, tackling challenges such as effective knowledge integration, implicit knowledge mining, and problems of tractability and optimization. However, there is a dearth of a comprehensive technical review of the existing knowledge-enhanced reasoning techniques across the diverse range of application domains. This survey provides an in-depth examination of recent advancements in the field, introducing a novel taxonomy that categorizes existing knowledge-enhanced methods into two primary categories and four subcategories. We systematically discuss these methods and highlight their correlations, strengths, and limitations. Finally, we elucidate the current application domains and provide insight into promising prospects for future research.	翻訳日:2023-02-07 20:13:34 公開日:2023-02-04
# ロバスト学習のための補間:測地線データ拡張 Interpolation for Robust Learning: Data Augmentation on Geodesics ( http://arxiv.org/abs/2302.02092v1 ) ライセンス: Link先を確認	Jiacheng Zhu, Jielin Qiu, Aritra Guha, Zhuolin Yang, Xuanlong Nguyen, Bo Li, Ding Zhao	(参考訳) 本稿では,トレーニングデータ分布の補間を通じて,性能の観点からモデルのロバスト性を研究・向上することを提案する。具体的には,(1)異なるカテゴリーの部分集団分布を結ぶ測地線上で最悪ケースのWasserstein barycenterを求めることで,データを拡張する。(2)部分集団分布を結ぶ連続的な測地線上で性能が滑らかになるようにモデルを正則化する。(3)さらに,ロバスト性向上の理論的保証を与え,測地線上の位置とサンプルサイズがそれぞれどのように寄与するかを検討する。 CIFAR-100とImageNetを含む4つのデータセットに対する実験的検証により,提案手法の有効性が確立された。例えば,提案手法は,CIFAR10におけるベースラインの証明可能なロバスト性を最大$7.7\%$,CIFAR-100における経験的ロバスト性を$16.8\%$改善する。我々の研究は、Wasserstein測地線に基づく補間というレンズを通したモデルロバスト性の新しい視点と、既存のロバストトレーニング手法と組み合わせることができる実用的な既製の戦略を提供する。 We propose to study and promote the robustness of a model, in terms of its performance, through the interpolation of training data distributions. Specifically, (1) we augment the data by finding the worst-case Wasserstein barycenter on the geodesic connecting subpopulation distributions of different categories. (2) We regularize the model for smoother performance on the continuous geodesic path connecting subpopulation distributions. (3) Additionally, we provide a theoretical guarantee of robustness improvement and investigate how the geodesic location and the sample size contribute, respectively. Experimental validations of the proposed strategy on four datasets, including CIFAR-100 and ImageNet, establish the efficacy of our method, e.g., our method improves the baselines' certifiable robustness on CIFAR10 by up to $7.7\%$, and their empirical robustness on CIFAR-100 by $16.8\%$. Our work provides a new perspective of model robustness through the lens of Wasserstein geodesic-based interpolation with a practical off-the-shelf strategy that can be combined with existing robust training methods.	翻訳日:2023-02-07 20:13:19 公開日:2023-02-04
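To make the notion of a Wasserstein geodesic between two distributions concrete, here is a one-dimensional sketch. It relies on a standard fact rather than on anything stated in the abstract: for two equal-size empirical samples on the real line, the optimal transport plan matches sorted samples, so the geodesic at time t interpolates them pairwise. The function name is hypothetical, and this does not reproduce the paper's worst-case barycenter search or its regularizer.

```python
def wasserstein_geodesic_1d(xs, ys, t):
    """Point at time t in [0, 1] on the 1-D Wasserstein geodesic between two samples.

    Assumes equal sample sizes and uniform weights; sorting both samples
    yields the monotone (optimal) matching in one dimension.
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    return [(1.0 - t) * x + t * y for x, y in zip(sorted(xs), sorted(ys))]


if __name__ == "__main__":
    class_a = [0.0, 2.0, 1.0]     # hypothetical feature values for one category
    class_b = [10.0, 12.0, 11.0]  # hypothetical feature values for another
    for t in (0.0, 0.5, 1.0):
        print(t, wasserstein_geodesic_1d(class_a, class_b, t))
```

At t = 0 and t = 1 the sketch returns the two sorted input samples; intermediate values of t give the interpolated distributions on which an augmentation strategy of this kind would draw.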
# ピラミッド型マルチモーダル変圧器を用いた効率的なエンドツーエンドビデオ質問応答 Efficient End-to-End Video Question Answering with Pyramidal Multimodal Transformer ( http://arxiv.org/abs/2302.02136v1 ) ライセンス: Link先を確認	Min Peng, Chongyang Wang, Yu Shi, Xiang-Dong Zhou	(参考訳) 本稿では,大量の特徴抽出器を用いた大規模事前学習が現在普及しているビデオQA(End-to-end Video Question Answering)を提案する。ピラミッド型マルチモーダルトランス (PMT) モデルでこれを実現し、学習可能な単語埋め込み層といくつかの畳み込み層とトランスフォーマー層を組み込む。異方性ピラミッドを用いて、異なる時空間スケールにわたるビデオ言語相互作用を実現する。左右の接続を持つボトムアップ経路とトップダウン経路を含む標準ピラミッドに加えて、異なるスケールで視覚特徴ストリームを空間的・時間的サブストリームに分解し、局所的・グローバル的意味論の整合性を保ちながら言語意味論との相互作用を実装する新しい戦略が提案されている。我々は,5つのビデオQAベンチマークにおいて,最先端手法に対して高い計算効率で高い性能を示す。本研究は,再利用可能な事前学習重み付き特徴抽出器とピラミッドの有効性を活かし,テキスト対ビデオ検索の競争結果を得るモデルのスケーラビリティを示す。 This paper presents a new method for end-to-end Video Question Answering (VideoQA), aside from the current popularity of using large-scale pre-training with huge feature extractors. We achieve this with a pyramidal multimodal transformer (PMT) model, which simply incorporates a learnable word embedding layer, a few convolutional and transformer layers. We use the anisotropic pyramid to fulfill video-language interactions across different spatio-temporal scales. In addition to the canonical pyramid, which includes both bottom-up and top-down pathways with lateral connections, novel strategies are proposed to decompose the visual feature stream into spatial and temporal sub-streams at different scales and implement their interactions with the linguistic semantics while preserving the integrity of local and global semantics. We demonstrate better or on-par performances with high computational efficiency against state-of-the-art methods on five VideoQA benchmarks. Our ablation study shows the scalability of our model that achieves competitive results for text-to-video retrieval by leveraging feature extractors with reusable pre-trained weights, and also the effectiveness of the pyramid.	翻訳日:2023-02-07 20:07:20 公開日:2023-02-04
# 最近傍法の訓練集合を最適かつ正確に削減する Reducing Nearest Neighbor Training Sets Optimally and Exactly ( http://arxiv.org/abs/2302.02132v1 ) ライセンス: Link先を確認	Josiah Rohrer and Simon Weber	(参考訳) 最近傍分類では、分類が与えられた$\mathbb{R}^d$内の点からなる訓練集合$P$を用いて$\mathbb{R}^d$のすべての点を分類する。すなわち、各点は$P$内の最近傍点と同じ分類を受ける。最近、Eppstein [SOSA'22] は、関連する訓練点、すなわち$P$と$P\setminus\{p\}$が異なる分類を誘導するような点$p\in P$を検出するアルゴリズムを開発した。本研究では、$P$と$P'$が同じ分類を誘導するような最小濃度の縮小訓練集合$P'\subseteq P$を見つける問題を検討する。$P$が一般の位置にある場合、関連する点の集合がそのような最小濃度の縮小訓練集合であることを示す。さらに、縮退している可能性のある$P$に対して最小濃度の縮小訓練集合を見つける問題は、$d=1$ではPに属し、$d\geq 2$ではNP完全であることを示す。 In nearest-neighbor classification, a training set $P$ of points in $\mathbb{R}^d$ with given classification is used to classify every point in $\mathbb{R}^d$: Every point gets the same classification as its nearest neighbor in $P$. Recently, Eppstein [SOSA'22] developed an algorithm to detect the relevant training points, those points $p\in P$, such that $P$ and $P\setminus\{p\}$ induce different classifications. We investigate the problem of finding the minimum cardinality reduced training set $P'\subseteq P$ such that $P$ and $P'$ induce the same classification. We show that the set of relevant points is such a minimum cardinality reduced training set if $P$ is in general position. Furthermore, we show that finding a minimum cardinality reduced training set for possibly degenerate $P$ is in P for $d=1$, and NP-complete for $d\geq 2$.	翻訳日:2023-02-07 20:06:59 公開日:2023-02-04
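The definition of a relevant point can be illustrated in the simplest setting, $d=1$ with pairwise distinct positions. In that case, removing a point changes the induced classification exactly when one of its adjacent neighbours on the line carries a different label. The sketch below uses that observation; it is an illustration under those assumptions (and at least two classes present), with hypothetical function names, and it is neither Eppstein's algorithm nor the paper's construction.

```python
def nearest_label(training, x):
    """Label of the training point closest to x; training is a list of (position, label)."""
    return min(training, key=lambda p: abs(p[0] - x))[1]


def relevant_points_1d(training):
    """Points on the line that have an adjacent neighbour with a different label."""
    pts = sorted(training)
    relevant = []
    for i, (pos, label) in enumerate(pts):
        left_differs = i > 0 and pts[i - 1][1] != label
        right_differs = i + 1 < len(pts) and pts[i + 1][1] != label
        if left_differs or right_differs:
            relevant.append((pos, label))
    return relevant


if __name__ == "__main__":
    training = [(0.0, "a"), (1.0, "a"), (2.0, "b"), (3.0, "b"), (4.0, "b")]
    reduced = relevant_points_1d(training)
    print(reduced)
    # Spot-check that the reduced set classifies sample queries like the full set.
    queries = [i * 0.25 + 0.1 for i in range(-4, 20)]
    print(all(nearest_label(training, q) == nearest_label(reduced, q) for q in queries))
```

In this toy input only the two points flanking the class boundary survive, and the decision boundary (the midpoint between them) is unchanged, which matches the abstract's claim for point sets in general position.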
# オープンピット採掘における地球移動機器と環境相互作用の考察 Inferencing the earth moving equipment-environment interaction in open pit mining ( http://arxiv.org/abs/2302.02130v1 ) ライセンス: Link先を確認	M. Balamurali	(参考訳) 鉱業において、グレードコントロールは一般的に、地層からの材料発掘方法にほとんど、あるいは全く注意を払わずに、爆風孔サンプリングと鉱石制御ブロックモデルの推定に重点を置いている。トラックを積み込む過程において、個々のバケットの負荷の変動性は、トラックのペイロードの変動性を決定する。したがって、正確な物質移動には、掘削過程とバケットと環境との相互作用に関する十分な知識が必要である。しかし、機器は予期せぬ遅延、障害、欠陥のためにしばしば名目上状態に陥る。このような乱れの大量発生は、統計力とバイアス推定を減少させる情報損失を引き起こし、生産の不確実性が増大する。利用可能なデータソースからマシンと環境の相互作用に関する知識の欠如を推測する信頼性の高い手法は、物質の動きを正確にモデル化するのに不可欠である。本研究では,教師なしクラスタリングを行い,欠落情報を予測する2段階の手法を実装した。第1の方法はDBSCANベースの空間クラスタリングであり、デガーとバケットの位置データを接続されたロードセグメントに分割する。セグメント化バケット掘削位置の明瞭なパターンが観察された。第2のモデルは、クラスタ化されたデータでトレーニングされたガウス過程の回帰を利用して、テストクラスタの平均位置を推測する。その後、バケット掘削場所は推定平均位置で異なる期間シミュレーションされ、既知のバケット掘削場所と比較された。この方法は西オーストラリアのピルバラにある露天掘り鉱山で試験された。その結果,バケット環境相互作用の欠落情報を参照することで,坑夫が連続的に物質移動を追跡できるという利点が得られた。 In mining, grade control generally focuses on blast hole sampling and the estimation of ore control block models with little or no attention given to how the materials are being excavated from the ground. In the process of loading trucks, the underlying variability of the individual bucket load will determine the variability of truck payload. Hence, accurate material movement demands a good knowledge of the excavation process and the buckets interaction with the environment. However, equipment frequently goes into off nominal states due to unexpected delays, disturbances or faults. The large amount of such disturbances causes information loss that reduces the statistical power and biases estimates, leading to increased uncertainty in the production. A reliable method that inferences the missing knowledge about the interaction between the machine and the environment from the available data sources, is vital to accurately model the material movement. In this study, a twostep method was implemented that performed unsupervised clustering and then predicted the missing information. 
The first method is DBSCAN-based spatial clustering, which divides the digger and bucket positional data into connected loading segments. Clear patterns of segmented bucket dig positions were observed. The second method uses Gaussian process regression, trained on the clustered data, to infer the mean locations of the test clusters. Bucket dig locations were then simulated at the inferred mean locations for different durations and compared against the known bucket dig locations. This method was tested at an open pit mine in the Pilbara of Western Australia. The results demonstrate the advantage of the proposed method in inferring the missing information about bucket-environment interactions, which enables miners to continuously track the material movement.	翻訳日:2023-02-07 20:06:39 公開日:2023-02-04
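The clustering step of this entry can be illustrated with a small, dependency-free DBSCAN. This is a sketch of the general technique only: the abstract does not give the paper's parameters or data, so `eps`, `min_pts`, and the sample coordinates below are hypothetical, and the Gaussian process regression step is not shown.

```python
from math import dist

NOISE = -1


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or NOISE."""
    labels = [None] * len(points)

    def neighbors(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins, but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:  # j is a core point: keep expanding
                queue.extend(reachable)
    return labels


# Hypothetical dig positions: two dense groups and one isolated reading.
positions = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
print(dbscan(positions, eps=1.5, min_pts=3))
# -> [0, 0, 0, 1, 1, 1, -1]
```

Each cluster id would correspond to one connected loading segment, and the isolated reading is treated as noise.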
# 時間グラフの相互作用順序予測 Interaction Order Prediction for Temporal Graphs ( http://arxiv.org/abs/2302.02128v1 ) ライセンス: Link先を確認	Nayana Bannur and Mashrin Srivastava and Harsha Vardhan	(参考訳) グラフにおけるリンク予測は、広く研究されているタスクである。ナレッジグラフの補完、コンテンツ/コンテンツの推薦、ソーシャルネットワークの推薦など、さまざまな分野に適用されている。ほとんどの研究の最初の焦点は静的グラフにおけるリンク予測であった。しかし、最近は時間グラフのモデリングに関する多くの研究が行われており、その結果、時間グラフのリンク予測が研究されている。しかし、既存の研究のほとんどはリンク形成の順序に焦点を合わせておらず、リンクの存在を予測しているに過ぎない。本研究では,ノード間相互作用の順序を予測することを目的とする。 Link prediction in graphs is a task that has been widely investigated. It has been applied in various domains such as knowledge graph completion, content/item recommendation, social network recommendations and so on. The initial focus of most research was on link prediction in static graphs. However, there has recently been abundant work on modeling temporal graphs, and consequently one of the tasks that has been researched is link prediction in temporal graphs. However, most of the existing work does not focus on the order of link formation, and only predicts the existence of links. In this study, we aim to predict the order of node interactions.	翻訳日:2023-02-07 20:06:12 公開日:2023-02-04
# Geometric Prior and Contrastive similarity を用いた3次元医用画像分割法 Weakly-Supervised 3D Medical Image Segmentation using Geometric Prior and Contrastive Similarity ( http://arxiv.org/abs/2302.02125v1 ) ライセンス: Link先を確認	Hao Du, Qihua Dong, Yan Xu, Jing Liao	(参考訳) 医用画像分割は、コンピュータ支援診断において最も重要な前処理手順であるが、医療画像(低コントラスト組織、非ホモジェネティックテクスチャ)によって生じるセグメントや様々なアーティファクトの複雑な形状のために非常に難しい課題である。本稿では,幾何学的先行的および対比的類似性を,損失ベースで弱教師付きセグメンテーションフレームワークに組み込む,単純かつ効果的なセグメンテーションフレームワークを提案する。ポイントクラウド上に構築された幾何学的事前構築は、境界ボックスアノテーション(高さと幅)の固有の性質よりも優れた監督を行う弱教師付きセグメンテーション提案に細かな幾何学を提供する。さらに, コントラスト組織を識別するために, 臓器の画素がコントラスト埋め込み空間に集まることを促すために, コントラスト類似性を提案する。提案するコントラスト埋め込み空間は、従来のグレー空間の貧弱な表現を補うことができる。弱教師付きセグメンテーションフレームワークの有効性とロバスト性を検証するため,大規模な実験を行った。提案されたフレームワークは、LiTS 2017 Challenge、KiTS 2021 Challenge、LPBA40といった、現在最先端の弱い教師付き手法よりも優れている。また,提案手法を解析し,各コンポーネントの性能評価を行った。 Medical image segmentation is almost the most important pre-processing procedure in computer-aided diagnosis but is also a very challenging task due to the complex shapes of segments and various artifacts caused by medical imaging, (i.e., low-contrast tissues, and non-homogenous textures). In this paper, we propose a simple yet effective segmentation framework that incorporates the geometric prior and contrastive similarity into the weakly-supervised segmentation framework in a loss-based fashion. The proposed geometric prior built on point cloud provides meticulous geometry to the weakly-supervised segmentation proposal, which serves as better supervision than the inherent property of the bounding-box annotation (i.e., height and width). Furthermore, we propose contrastive similarity to encourage organ pixels to gather around in the contrastive embedding space, which helps better distinguish low-contrast tissues. The proposed contrastive embedding space can make up for the poor representation of the conventionally-used gray space. 
Extensive experiments are conducted to verify the effectiveness and the robustness of the proposed weakly-supervised segmentation framework. The proposed framework is superior to state-of-the-art weakly-supervised methods on the following publicly accessible datasets: LiTS 2017 Challenge, KiTS 2021 Challenge, and LPBA40. We also dissect our method and evaluate the performance of each component.	翻訳日:2023-02-07 20:06:03 公開日:2023-02-04
# Transform, Contrast and Tell: Coherent Entity-Aware Multi-Image Captioning Transform, Contrast and Tell: Coherent Entity-Aware Multi-Image Captioning ( http://arxiv.org/abs/2302.02124v1 ) ライセンス: Link先を確認	Jingqiang Chen	(参考訳) コヒーレントエンティティ対応マルチイメージキャプションは,複数の隣接画像に対するコヒーレントキャプションをニュースドキュメントに生成することを目的としている。隣接する画像の間には、同じ実体や出来事をしばしば記述するため、コヒーレンスな関係がある。これらの関係は、エンティティ対応のマルチイメージキャプションにおいて重要であるが、エンティティ対応のシングルイメージキャプションでは無視される。既存の作品の多くは単一画像キャプションに焦点を当てているが、複数画像キャプションはこれまでに研究されていない。そこで本稿では,コヒーレンス関係を利用したコヒーレントなエンティティ対応多画像キャプションモデルを提案する。このモデルはトランスフォーマーベースのキャプション生成モデルと2種類のコントラスト学習ベースのコヒーレンス機構から構成される。生成モデルは、画像及び付随するテキストに注意を払ってキャプションを生成する。水平コヒーレンス機構は、キャプションを隣接画像のキャプションとコヒーレントにすることを目的としている。垂直コヒーレンス機構は、キャプションを画像と付随するテキストと一貫性を持たせることを目的としている。キャプション間のコヒーレンスを評価するために,2つのコヒーレンス評価指標を提案する。新しいデータセットDM800Kは、既存の2つのデータセットであるGoodNewsとNYT800Kよりもドキュメント当たりの画像が多く、マルチイメージキャプションに適している。 3つのデータセットで実験したところ,提案したキャプションモデルは,単画像キャプション評価により6つのベースラインを上回り,生成したキャプションはコヒーレンス評価や人間評価によりベースラインよりもコヒーレントであることがわかった。 Coherent entity-aware multi-image captioning aims to generate coherent captions for multiple adjacent images in a news document. There are coherence relationships among adjacent images because they often describe the same entities or events. These relationships are important for entity-aware multi-image captioning, but are neglected in entity-aware single-image captioning. Most existing work focuses on single-image captioning, while multi-image captioning has not been explored before. Hence, this paper proposes a coherent entity-aware multi-image captioning model by making use of coherence relationships. The model consists of a Transformer-based caption generation model and two types of contrastive learning-based coherence mechanisms. The generation model generates the caption by paying attention to the image and the accompanying text. The horizontal coherence mechanism aims to make the caption coherent with captions of adjacent images. 
The vertical coherence mechanism aims to make the caption coherent with the image and the accompanying text. To evaluate coherence between captions, two coherence evaluation metrics are proposed. A new dataset, DM800K, is constructed; it has more images per document than the two existing datasets GoodNews and NYT800K and is more suitable for multi-image captioning. Experiments on the three datasets show that the proposed captioning model outperforms 6 baselines according to single-image captioning evaluations, and that the generated captions are more coherent than those of the baselines according to coherence evaluations and human evaluations.	翻訳日:2023-02-07 20:05:39 公開日:2023-02-04
# 体重、注意は必要か? AEIUOrder:トランスフォーマーにおける層重行列のグリーディ順序付けによる翻訳の改善 Weight, Is Attention All We Need? AEIUOrder: Greedy Ordering of Layer Weight Matrices in Transformer Improves Translation ( http://arxiv.org/abs/2302.02123v1 ) ライセンス: Link先を確認	Elicia Ye	(参考訳) 先行研究では、トランスフォーマベースのエンコーダ・デコーダアーキテクチャの内部構造と機能について、マルチヘッドアテンションとフィードフォワードサブレイヤーのレベルで理解しようと試みている。解釈は、エンコーダとデコーダに焦点を合わせ、セルフアテンション、クロスアテンション、フィードフォワードサブレイヤーの組合せ可能性に焦点を当てている。トランスフォーマーのサブ層抽象に飛び込み、その層重行列を置換することで翻訳の質を向上させることができるか? 本稿では,ランダム行列理論 (rmt) の指標を用いて,エンコーダ内の層重み行列を規則的に順序付けし,エンコーダの順序付けを逆転させる手法を提案する。目的は、デコーダ構造がエンコーダの逆過程を表現するのに役立ちながら、エンコーダの完全訓練性を最大化することである。標準トランスフォーマー(6層、モデル次元512)では、IWSLT 2016ドイツ語翻訳タスクで34.62点(ベースライン34.31点)、WMT 2014英語翻訳タスクで27.95点(ベースライン27.91点)を達成している。 AEIUOrderは、様々な深さと埋め込み次元を持つトランスフォーマーでも実現されており、浅いスリムなモデルよりもより深く、より広いモデルで大幅に改善されている。例えば、8層、768次元、4層、1024次元変換器は、IWSLT 2016の英独翻訳タスク(28.53と28.97)でそれぞれ29.1と29.31のBLEUスコアを達成している。以上の結果から, RMTをモチベーションとした手法は, 層重行列を優雅に並べ替えることで, 表現を学習し, 翻訳をより効果的に生成する。 Prior work has attempted to understand the internal structures and functionalities of Transformer-based encoder-decoder architectures on the level of multi-head attention and feed-forward sublayers. Interpretations have focused on the encoder and decoder, along with the combinatorial possibilities of the self-attention, cross-attention, and feed-forward sublayers. Could we improve the quality of translation by diving into the Transformer sublayer abstractions and permuting its layer weight matrices? We propose AEIUOrder to greedily reorder layer weight matrices in the encoder by their well-trainedness, as measured by Random Matrix Theory (RMT) metrics, and reverse the ordering scheme for the encoder. The objective is to maximize Total well-trainedness in the encoder while the decoder structure serves to represent the reverse process of encoding. 
On the standard Transformer (6 layers, model dimension 512), AEIUOrder achieves a BLEU score of 34.62 (baseline 34.31) on the IWSLT 2016 German-to-English translation task, and 27.95 BLEU on the WMT 2014 English-to-German translation task (baseline 27.91). AEIUOrder is also applied to Transformers with various depths and embedding dimensions, showing more significant improvements on deeper, wider models than on their shallower, slimmer counterparts. For instance, the 8-layer, 768-dimension and the 4-layer, 1024-dimension Transformers achieve BLEU scores of 29.1 and 29.31, respectively, on the IWSLT 2016 English-to-German translation task (baselines 28.53 and 28.97, respectively). Our results suggest that the RMT-motivated approach of maximizing Total well-trainedness, by greedily reordering layer weight matrices, helps the model learn representations and generate translations more effectively.	翻訳日:2023-02-07 20:05:11 公開日:2023-02-04
# クロスドメイン戦略に基づく偽ニュース検出のためのXAIモデル A New cross-domain strategy based XAI models for fake news detection ( http://arxiv.org/abs/2302.02122v1 ) ライセンス: Link先を確認	Deepak Kanneganti	(参考訳) 本研究では,事前学習モデルにおける偽ニュース検出のための4レベルクロスドメイン戦略を提案する。クロスドメインテキスト分類は、ソースドメインの知識を使用してモデルをターゲットドメインに適応させるタスクである。これらの複雑なモデルの振る舞いを理解するには説明可能性が不可欠である。微調整したBERTモデルを用い、異なるドメインのデータセットを使用して、いくつかの実験でクロスドメイン分類を実行する。 Anchor、ELI5、LIME、SHAPなどの説明モデルは、クロスドメインレベルに対する新しい説明可能なアプローチを設計するために使用される。実験分析により、異なるレベルのクロスドメイン上の理想的なXAIモデルの組が得られた。 In this study, we present a four-level cross-domain strategy for fake news detection on pre-trained models. Cross-domain text classification is the task of adapting a model to a target domain by using the knowledge of the source domain. Explainability is crucial in understanding the behaviour of these complex models. A fine-tuned BERT model is used to perform cross-domain classification in several experiments using datasets from different domains. Explanatory models like Anchor, ELI5, LIME and SHAP are used to design a novel explainable approach to cross-domain levels. The experimental analysis yields an ideal pair of XAI models at different cross-domain levels.	翻訳日:2023-02-07 20:04:34 公開日:2023-02-04
# 自己再生による多様性誘導型環境設計 Diversity Induced Environment Design via Self-Play ( http://arxiv.org/abs/2302.02119v1 ) ライセンス: Link先を確認	Dexun Li, Wenjun Li, Pradeep Varakantham	(参考訳) 環境の適切な分布を設計する最近の研究は、効果的な汎用エージェントの訓練を約束していることを示している。その成功の一部は、エージェントの能力の最前線で環境インスタンス(またはレベル)を生成する適応的なカリキュラム学習の形式が原因である。しかし、このような環境設計フレームワークは、しばしば挑戦的な設計空間において効果的なレベルを見つけるのに苦労し、環境とのコストのかかる相互作用を必要とする。本稿では,Unsupervised Environment Design (UED) フレームワークに多様性を導入することを目的とする。具体的には,与えられたレベルを表す観測/隠蔽状態を特定するタスク非依存の手法を提案する。この手法の結果は, 2つのレベル間の多様性を特徴付けるために利用され, 有効性能に欠かせないことが示されている。さらに, サンプリング効率を向上させるため, 環境生成装置が学習エージェントにとって非常に有益な環境を自動的に生成できるセルフプレイ技術も取り入れた。提案手法は,DivSP(DivSP)による環境設計であり,既存の手法よりも優れた性能を示す。 Recent work on designing an appropriate distribution of environments has shown promise for training effective generally capable agents. Its success is partly because of a form of adaptive curriculum learning that generates environment instances (or levels) at the frontier of the agent's capabilities. However, such an environment design framework often struggles to find effective levels in challenging design spaces and requires costly interactions with the environment. In this paper, we aim to introduce diversity in the Unsupervised Environment Design (UED) framework. Specifically, we propose a task-agnostic method to identify observed/hidden states that are representative of a given level. The outcome of this method is then utilized to characterize the diversity between two levels, which as we show can be crucial to effective performance. In addition, to improve sampling efficiency, we incorporate the self-play technique that allows the environment generator to automatically generate environments that are of great benefit to the training agent. Quantitatively, our approach, Diversity-induced Environment Design via Self-Play (DivSP), shows compelling performance over existing methods.	翻訳日:2023-02-07 20:04:26 公開日:2023-02-04
# Visual Commonsense Reasoningのためのビジョンアテンションの学習 Learning to Agree on Vision Attention for Visual Commonsense Reasoning ( http://arxiv.org/abs/2302.02117v1 ) ライセンス: Link先を確認	Zhenyang Li, Yangyang Guo, Yangyang Guo, Fan Liu, Liqiang Nie, Mohan Kankanhalli	(参考訳) visual commonsense reasoning (vcr) は、視覚推論の分野では重要なが困難な研究課題である。 vcrモデルは一般的に、画像に関するテキスト質問に応答することを目的としており、その後、前回の応答プロセスの合理化予測を行う。これら2つのプロセスは逐次的かつ相互に絡み合っているが、既存のメソッドは常にこれらを2つの独立したマッチングベースのインスタンスと見なしている。したがって、2つのプロセス間の重要な関係を無視し、最適化されたモデル性能に繋がる。本稿では,これら2つのプロセスを統一的な枠組みで効果的に処理する新しい視覚的アライメント手法を提案する。そこで我々はまず,各プロセスで生成した視覚注意マップを集約する再認識モジュールを設計する。その後、2つの注意マップのセットを注意深く並べて、同じ画像領域に基づいて2つのプロセスを導く。本稿では,本手法を従来の注意と最近のTransformerモデルの両方に適用し,VCRベンチマークデータセット上で広範な実験を行う。その結果,アテンションアライメントモジュールにより,本手法は基本手法よりも大幅に改善され,両手法の結合性および提案手法の有効性が明らかとなった。 Visual Commonsense Reasoning (VCR) remains a significant yet challenging research problem in the realm of visual reasoning. A VCR model generally aims at answering a textual question regarding an image, followed by the rationale prediction for the preceding answering process. Though these two processes are sequential and intertwined, existing methods always consider them as two independent matching-based instances. They, therefore, ignore the pivotal relationship between the two processes, leading to sub-optimal model performance. This paper presents a novel visual attention alignment method to efficaciously handle these two processes in a unified framework. To achieve this, we first design a re-attention module for aggregating the vision attention map produced in each process. Thereafter, the resultant two sets of attention maps are carefully aligned to guide the two processes to make decisions based on the same image regions. We apply this method to both conventional attention and the recent Transformer models and carry out extensive experiments on the VCR benchmark dataset. 
The results show that, with the attention alignment module, our method achieves a considerable improvement over the baseline methods, clearly demonstrating the feasibility of coupling the two processes as well as the effectiveness of the proposed method.	翻訳日:2023-02-07 20:04:09 公開日:2023-02-04
# テンソル回復が保証された低ランク性と滑らかさ Guaranteed Tensor Recovery Fused Low-rankness and Smoothness ( http://arxiv.org/abs/2302.02155v1 ) ライセンス: Link先を確認	Hailin Wang, Jiangjun Peng, Wenjin Qin, Jianjun Wang and Deyu Meng	(参考訳) したがって、テンソルデータ回復タスクは近年多くの研究の注目を集めている。このような不正な問題を解くには、一般に、テンソルデータに基づく固有の事前構造を探索し、復元テンソルの音響推定を導くためのある種の正規化項として定式化する必要がある。近年の研究では、異なるテンソルモードにまたがる2つの洞察力に富んだテンソル前置法、すなわち大域的低ランク性 (l) と局所的滑らか性 (s) が適用され、これは常に2つの別々の正規化項の和としてリカバリモデルに符号化されている。しかし、低ランクテンソルの回復に関する主要な理論的な発展とは異なり、これらのl+sモデルは理論上の正確な再現性保証を持っておらず、実際の手法では信頼性に欠ける。この重要な問題に対して、本研究では、テンソルのlとsのプリエントを同時にエンコードする一意な正規化項を構築する。特に、この単一正則化器をリカバリモデルに組み込むことで、テンソル完備化(TC)とテンソル頑健成分分析(TRPCA)という2つの典型的なテンソルリカバリタスクの正確なリカバリ保証を厳格に証明することができる。我々の知る限りでは、これはテンソルリカバリのためのすべての関連するL+S法の中では初めての正確な復元結果である。様々な視覚的テンソルデータを持つ複数のTCおよびTRPCAタスクにおいて、他の多くのSOTA法よりも重要な回復精度の改善が広範な実験で観測された。典型的には、カラー画像の塗装作業において、欠落率が非常に大きい場合(例えば99.5%)に動作可能性能が得られるが、この課題では全ピアが完全に失敗する。 The tensor data recovery task has thus attracted much research attention in recent years. Solving such an ill-posed problem generally requires to explore intrinsic prior structures underlying tensor data, and formulate them as certain forms of regularization terms for guiding a sound estimate of the restored tensor. Recent research have made significant progress by adopting two insightful tensor priors, i.e., global low-rankness (L) and local smoothness (S) across different tensor modes, which are always encoded as a sum of two separate regularization terms into the recovery models. However, unlike the primary theoretical developments on low-rank tensor recovery, these joint L+S models have no theoretical exact-recovery guarantees yet, making the methods lack reliability in real practice. To this crucial issue, in this work, we build a unique regularization term, which essentially encodes both L and S priors of a tensor simultaneously. 
In particular, by incorporating this single regularizer into the recovery models, we can rigorously prove exact-recovery guarantees for two typical tensor recovery tasks, i.e., tensor completion (TC) and tensor robust principal component analysis (TRPCA). To the best of our knowledge, these should be the first exact-recovery results among all related L+S methods for tensor recovery. Significant recovery accuracy improvements over many other SOTA methods are observed in extensive experiments on several TC and TRPCA tasks with various kinds of visual tensor data. Notably, our method achieves workable performance when the missing rate is extremely large, e.g., 99.5%, for the color image inpainting task, while all its peers fail completely in such a challenging case.	翻訳日:2023-02-07 19:58:52 公開日:2023-02-04
# この腸は存在しない:リアルな無線カプセル内視鏡画像生成のためのマルチスケール残差オートエンコーダ This Intestine Does Not Exist: Multiscale Residual Variational Autoencoder for Realistic Wireless Capsule Endoscopy Image Generation ( http://arxiv.org/abs/2302.02150v1 ) ライセンス: Link先を確認	Dimitrios E. Diamantis, Panagiota Gatoula, Anastasios Koulaouzidis, and Dimitris K. Iakovidis	(参考訳) 医用画像合成は、画像ベースの臨床決定支援(CDS)システムにおいて、機械学習アルゴリズムのトレーニングに必要な注釈付き医療データの限られた可用性に対応するための、有望なソリューションとして登場した。この目的のために、GAN(Generative Adversarial Networks)は、データ拡張のための合成画像を生成するアルゴリズムトレーニングプロセスを支援するために主に適用されてきた。しかし、Wireless Capsule Endoscopy (WCE)の分野では、既存の公開アノテーションデータセットの限られた内容の多様性とサイズは、GANのトレーニング安定性と合成性能の両方に悪影響を及ぼす。 WCE画像合成のための実行可能なソリューションとして,新しい変分オートエンコーダアーキテクチャ,すなわち "This Intestine Does Not Exist" (TIDE)を提案する。提案するアーキテクチャは,多スケールな特徴抽出畳み込みブロックと残差接続を含み,限られた数のトレーニング画像でも高品質で多様なデータセットを生成できる。利用可能なデータセットの増大を指向した現在のアプローチとは対照的に,本研究では,TIDEを用いて実WCEデータセットを人工的に生成したデータセットに置き換えることが,分類性能を損なうことなく可能であることを示す。さらに、経験豊富なWCEスペシャリストによる質的およびユーザ評価研究は、TIDEによって合成された正常なWCE画像と異常なWCE画像の両方が十分に現実的であるという医学的観点から検証する。 Medical image synthesis has emerged as a promising solution to address the limited availability of annotated medical data needed for training machine learning algorithms in the context of image-based Clinical Decision Support (CDS) systems. To this end, Generative Adversarial Networks (GANs) have been mainly applied to support the algorithm training process by generating synthetic images for data augmentation. However, in the field of Wireless Capsule Endoscopy (WCE), the limited content diversity and size of existing publicly available annotated datasets, adversely affect both the training stability and synthesis performance of GANs. Aiming to a viable solution for WCE image synthesis, a novel Variational Autoencoder architecture is proposed, namely "This Intestine Does not Exist" (TIDE). 
The proposed architecture comprises multiscale feature extraction convolutional blocks and residual connections, which enable the generation of high-quality and diverse datasets even with a limited number of training images. Contrary to current approaches, which are oriented towards the augmentation of the available datasets, this study demonstrates that, using TIDE, real WCE datasets can be fully substituted by artificially generated ones without compromising classification performance. Furthermore, qualitative and user evaluation studies by experienced WCE specialists validate, from a medical viewpoint, that both the normal and abnormal WCE images synthesized by TIDE are sufficiently realistic.	翻訳日:2023-02-07 19:58:22 公開日:2023-02-04
# 神経オートマトンに対する不変量 Invariants for neural automata ( http://arxiv.org/abs/2302.02149v1 ) ライセンス: Link先を確認	Jone Uria-Albizuri, Giovanni Sirio Carmantini, Peter beim Graben, Serafim Rodrigues	(参考訳) 神経力学系の計算モデリングは、しばしばニューラルネットワークとシンボリックダイナミクスを展開する。ベクトル記号アーキテクチャと呼ばれるフレームワーク内でこれらのアプローチを組み合わせる特別な方法は、神経オートマトンにつながる。私たちがこの枠組みで追求した興味深い研究の方向性は、ニューラルオートマトンとして表現される神経力学へのシンボリックダイナミクスのマッピングを検討することです。この表現論は、脳がチューリング計算をどのように実装するのかといった質問を可能にする。具体的には、この表現理論において、ニューラルオートマトンは、記号とシンボル文字列を数値に割り当てるゲーデル符号化によって生じる。この割り当てのもとで、記号計算は実位相空間における状態ベクトルの軌跡によって表現され、実世界の測定や実験データとの統計的相関解析を可能にする。しかし、これらの割り当ては通常、完全に任意である。したがって、そのような表現の下で観察されるダイナミクスのどの側面がダイナミクスに固有のものであり、どれがそうではないのかという問題に対処するのは理にかなっている。本研究では,異なる符号化条件下での神経オートマトンの対称性と不変量を調べるための形式的厳密な数学的枠組みを考案する。中心的な概念として、そのようなシステムに対する平等のパターンを定義する。我々は、ニューラルネットワークの平均活性化レベルなど、異なるマクロ可観測性を検討し、その不変性を求める。この結果から, 同一性のパターン上で定義されるステップ関数のみが再符号化の下で不変であり, 平均アクティベーションは不変ではないことが示された。我々の研究は、特定のエンコーディングに依存し、ダイナミクスに固有のものではないコンバウンディング結果を避けるために、ニューロシンボリックプロセッサを用いた実世界の計測の回帰研究において極めて重要である可能性がある。 Computational modeling of neurodynamical systems often deploys neural networks and symbolic dynamics. A particular way of combining these approaches within a framework called vector symbolic architectures leads to neural automata. An interesting research direction we have pursued under this framework has been to consider mapping symbolic dynamics onto neurodynamics, represented as neural automata. This representation theory enables us to ask questions such as how the brain implements Turing computations. Specifically, in this representation theory, neural automata result from the assignment of symbols and symbol strings to numbers, known as Gödel encoding. Under this assignment, symbolic computation is represented by trajectories of state vectors in a real phase space, which allows for statistical correlation analyses with real-world measurements and experimental data. However, these assignments are usually completely arbitrary. 
Hence, it makes sense to ask which aspects of the dynamics observed under such a representation are intrinsic to the dynamics and which are not. In this study, we develop a formally rigorous mathematical framework for the investigation of symmetries and invariants of neural automata under different encodings. As a central concept, we define patterns of equality for such systems. We consider different macroscopic observables, such as the mean activation level of the neural network, and ask for their invariance properties. Our main result shows that only step functions that are defined over those patterns of equality are invariant under recodings, while the mean activation is not. Our work could be of substantial importance for related regression studies of real-world measurements with neurosymbolic processors, helping to avoid confounding results that depend on a particular encoding and are not intrinsic to the dynamics.	翻訳日:2023-02-07 19:57:56 公開日:2023-02-04
# トラップイオン量子計算と量子シミュレーションのためのエンタングルゲート Entangling gates for trapped-ion quantum computation and quantum simulation ( http://arxiv.org/abs/2302.02148v1 ) ライセンス: Link先を確認	Zhengyang Cai, Chunyang Luan, Lingfeng Ou, Hengchao Tu, Zihan Yin, Jing-Ning Zhang, and Kihwan Kim	(参考訳) トラップイオン系は1995年にciracとzollerによって量子ゲートの最初のスキームが提唱されて以来、実用的な量子計算と量子シミュレーションのための主要なプラットフォームとなっている。閉じ込められたイオンを持つ量子ゲートは、全ての物理プラットフォームの中で最も高い忠実度を示している。近年, 振幅, 位相, 周波数変調, 多周波印加などの量子ゲートの高度なスキームが開発され, ゲートの高速化, 多数の不完全性に対する堅牢化, および複数の量子ビットに適用されている。ここでは、イオンを閉じ込めた量子ゲートの基本原理と最近の発展について述べる。 The trapped-ion system has been a leading platform for practical quantum computation and quantum simulation since the first scheme of a quantum gate was proposed by Cirac and Zoller in 1995. Quantum gates with trapped ions have shown the highest fidelity among all physical platforms. Recently, sophisticated schemes of quantum gates such as amplitude, phase, frequency modulation, or multi-frequency application, have been developed to make the gates fast, robust to many types of imperfections, and applicable to multiple qubits. Here, we review the basic principle and recent development of quantum gates with trapped ions.	翻訳日:2023-02-07 19:57:28 公開日:2023-02-04
# スピン軌道結合ボース・アインシュタイン凝縮体を用いたキャビティqedの多重安定性 Multi-Stability in Cavity QED with Spin-Orbit Coupled Bose-Einstein Condensate ( http://arxiv.org/abs/2302.02147v1 ) ライセンス: Link先を確認	Kashif Ammar Yasir and Gao Xianlong	(参考訳) スピン軌道結合ボース・アインシュタイン凝縮体を含むキャビティ系において,強いポンプレーザーにより駆動される定常多重安定性の発生について検討した。印加された磁場はボース・アインシュタイン凝縮体を擬スピン状態へ分割し、超低温原子と直接相互作用する2つの対向伝播ラマンレーザーの運動量に敏感となる。全てのサブシステムに対して定常状態のダイナミクスを制御した後、キャビティ・原子系の以前の研究と異なり、キャビティ・フォトン数の多安定挙動の出現を示す。しかし、このマルチスタビリティは関連するシステムパラメータで調整できる。さらに, 擬スピン-$\uparrow$ およびスピン-$\downarrow$状態の原子集団に対する混合安定挙動の出現を, いわゆる双不安定な形で示す。これらの原子数状態の集合的挙動は、スピン軌道カップリングとゼーマン場効果によって強化および制御できる、両方のスピン状態の集団間の遷移界面を持つ。さらに, 擬似スピン状態の機械的散逸速度を増加させることにより, 二次界面が出現することを示す。これらの界面は、空洞によって媒介される合成スピン状態の非自明な挙動によって引き起こされる可能性がある。我々の発見は光スイッチングの課題に欠かせないだけでなく、空洞量子電磁力学による合成原子状態の力学的側面の研究の基礎となるかもしれない。 We investigate the occurrence of steady-state multi-stability in a cavity system containing a spin-orbit coupled Bose-Einstein condensate and driven by a strong pump laser. The applied magnetic field splits the Bose-Einstein condensate into pseudo-spin states, which then become momentum sensitive through two counter-propagating Raman lasers directly interacting with the ultra-cold atoms. After obtaining the steady-state dynamics for all associated subsystems, we show the emergence of multi-stable behavior of the cavity photon number, unlike previous investigations of cavity-atom systems. However, this multi-stability can be tuned with the associated system parameters. Further, we illustrate the occurrence of mixed-stability behavior for the atomic population of the pseudo spin-$\uparrow$ and spin-$\downarrow$ states, which appear in a so-called bi-unstable form. The collective behavior of these atomic number states interestingly possesses a transitional interface between the populations of both spin states, which can be enhanced and controlled by spin-orbit coupling and Zeeman field effects. 
Furthermore, we illustrate the emergence of a secondary interface mediated by increasing the mechanical dissipation rate of the pseudo-spin states. These interfaces could be caused by the non-trivial behavior of the synthetic spin states mediated by the cavity. Our findings are not only crucial for the subject of optical switching, but could also provide a foundation for future studies on the mechanical aspects of synthetic atomic states with cavity quantum electrodynamics.	翻訳日:2023-02-07 19:57:17 公開日:2023-02-04
# 能力属性と注意機構による解釈可能な知識追跡の強化 Augmenting Interpretable Knowledge Tracing by Ability Attribute and Attention Mechanism ( http://arxiv.org/abs/2302.02146v1 ) ライセンス: Link先を確認	Yuqi Yue, Xiaoqing Sun, Weidong Ji, Zengxiang Yin, Chenghong Sun	(参考訳) 知識追跡は、学生の過去の回答シーケンスをモデル化し、演習活動中の知識獲得の変化を追跡し、将来の学習性能を予測することを目的としている。既存のアプローチのほとんどは、生徒の能力が常に個人によって変化または変化しているという事実を無視し、モデル予測の解釈可能性に欠けている。そこで本稿では,能力特性と注意機構に基づく新しいモデルを提案する。まず, 学生の能力特性を把握し, 生徒を類似能力を持つグループに動的に割り当て, 演習の注意重みを計算し, モデルの解釈可能性を高めることで, 演習のスキルとの関連性を定量化する。大規模実験を行い,実オンライン教育データセットの評価を行った。その結果,提案モデルが5つの代表的な知識トレースモデルよりも性能予測に優れていることが判明し,モデル予測結果が推論経路を通じて説明される。 Knowledge tracing aims to model students' past answer sequences to track the change in their knowledge acquisition during exercise activities and to predict their future learning performance. Most existing approaches ignore the fact that students' abilities are constantly changing or vary between individuals, and their model predictions lack interpretability. To this end, in this paper, we propose a novel model based on ability attributes and an attention mechanism. We first segment the interaction sequences and capture students' ability attributes, then dynamically assign students to groups with similar abilities, and quantify the relevance of the exercises to the skill by calculating the attention weights between the exercises and the skill to enhance the interpretability of the model. We conduct extensive experiments and evaluate the model on real online education datasets. The results confirm that the proposed model is better at predicting performance than five well-known representative knowledge tracing models, and the model prediction results are explained through an inference path.	翻訳日:2023-02-07 19:56:51 公開日:2023-02-04
# LipFormer:ビジュアルランドマーク変換器による未確認話者のリフレッド学習 LipFormer: Learning to Lipread Unseen Speakers based on Visual-Landmark Transformers ( http://arxiv.org/abs/2302.02141v1 ) ライセンス: Link先を確認	Feng Xue, Yu Li, Deyin Liu, Yincen Xie, Lin Wu, Richang Hong	(参考訳) lipreadingは、ビデオ中の話者の音声を自然言語に理解し、さらに翻訳することを指す。 state-of-the-art lipreading methodはオーバーラップスピーカーの解釈に優れており、トレーニングセットと推論セットの両方に話者が現れている。しかし,これらの手法の一般化は,訓練銀行における話者数の制限や,異なる話者に対する唇の形状・色の違いによる視覚的変化により,破滅的な性能劣化を引き起こす。したがって、唇の目に見える変化によってのみ、モデルオーバーフィットを引き起こす傾向がある。この問題に対処するために、話者の身元に関係なく唇の動きを記述できる視覚的・ランドマーク横断のマルチモーダル機能を提案する。次に,視覚ランドマークトランスフォーマー,すなわちリップフォーマーに基づく文レベルのリップリードフレームワークを開発した。特に、リップフォーマーは、唇の動きの流れ、顔のランドマークの流れ、および交叉モーダル融合からなる。 2つのストリームからの埋め込みは、視覚とランドマークの調整を達成するためにクロスアテンションモジュールに供給される自己アテンションによって生成される。最後に、得られた融合機能は、カスケードSeq2seqモデルで出力テキストにデコードできる。実験により,本手法は未知話者へのモデル一般化を効果的に促進できることが示された。 Lipreading refers to understanding and further translating the speech of a speaker in the video into natural language. State-of-the-art lipreading methods excel in interpreting overlap speakers, i.e., speakers appear in both training and inference sets. However, generalizing these methods to unseen speakers incurs catastrophic performance degradation due to the limited number of speakers in training bank and the evident visual variations caused by the shape/color of lips for different speakers. Therefore, merely depending on the visible changes of lips tends to cause model overfitting. To address this problem, we propose to use multi-modal features across visual and landmarks, which can describe the lip motion irrespective to the speaker identities. Then, we develop a sentence-level lipreading framework based on visual-landmark transformers, namely LipFormer. Specifically, LipFormer consists of a lip motion stream, a facial landmark stream, and a cross-modal fusion. The embeddings from the two streams are produced by self-attention, which are fed to the cross-attention module to achieve the alignment between visuals and landmarks. 
Finally, the resulting fused features can be decoded to output texts by a cascade seq2seq model. Experiments demonstrate that our method can effectively enhance the model generalization to unseen speakers.	翻訳日:2023-02-07 19:56:34 公開日:2023-02-04
# ボトムアップ自己組織特性をもつ動的方程式は、損失関数のない正確な動的階層を学習する Dynamical Equations With Bottom-up Self-Organizing Properties Learn Accurate Dynamical Hierarchies Without Any Loss Function ( http://arxiv.org/abs/2302.02140v1 ) ライセンス: Link先を確認	Danilo Vasconcellos Vargas, Tham Yik Foong, Heng Zhang	(参考訳) 自己組織化は自然と心に普遍的である。しかし、機械学習と認知理論は依然として主題にほとんど触れていない。ハードルは、一般的なパターンを動的方程式の観点で定義することは困難であり、再順序付けによって学習できるシステムを設計することは、まだ見ることができないことである。本稿では,非線形力学の領域において正のフィードバックループと負のフィードバックループでパターンを定義できる学習システムを提案する。実験により、このようなシステムは時間と空間の相関関係をマッピングでき、階層構造を逐次データから学べることが明らかとなった。その結果は、最先端の教師なし学習アルゴリズムを8つの実験のうち7つと現実世界の2つの問題で上回るほど正確だ。興味深いことに、システムの動的性質は本質的に適応し、入力構造が変化すると化学・熱力学の相転移に似た現象を引き起こす。この研究は、自己組織化によってパターン認識が実現し、目的や余分な機能を持たずに単純な動的方程式から知的な振る舞いが生まれることを示唆している。 Self-organization is ubiquitous in nature and mind. However, machine learning and theories of cognition still barely touch the subject. The hurdle is that general patterns are difficult to define in terms of dynamical equations and designing a system that could learn by reordering itself is still to be seen. Here, we propose a learning system, where patterns are defined within the realm of nonlinear dynamics with positive and negative feedback loops, allowing attractor-repeller pairs to emerge for each pattern observed. Experiments reveal that such a system can map temporal to spatial correlation, enabling hierarchical structures to be learned from sequential data. The results are accurate enough to surpass state-of-the-art unsupervised learning algorithms in seven out of eight experiments as well as two real-world problems. Interestingly, the dynamic nature of the system makes it inherently adaptive, giving rise to phenomena similar to phase transitions in chemistry/thermodynamics when the input structure changes. Thus, the work here sheds light on how self-organization can allow for pattern recognition and hints at how intelligent behavior might emerge from simple dynamic equations without any objective/loss function.	
翻訳日:2023-02-07 19:56:13 公開日:2023-02-04
# HSICを用いたグラフニューラルネットワークの構造記述 Structural Explanations for Graph Neural Networks using HSIC ( http://arxiv.org/abs/2302.02139v1 ) ライセンス: Link先を確認	Ayato Toyokuni, Makoto Yamada	(参考訳) グラフニューラルネットワーク(GNN)は、グラフィカルなタスクをエンドツーエンドで処理するニューラルネットワークの一種である。近年,グラフ分類やリンク予測,レコメンデーションなど,さまざまなタスクで高いパフォーマンスを達成しているため,機械学習やデータマイニングコミュニティでは,gnnが注目を集めている。しかしながら、gnnの複雑なダイナミクスは、グラフの機能のどの部分が予測により強く寄与するかを理解するのを難しくする。解釈可能性問題に対処するため,近年,様々なGNN説明法が提案されている。本研究では,Hilbert-Schmidt independent criterion (HSIC) を用いて,2変数間の非線型依存性をカーネルを通して捉えることにより,グラフの有意な構造を検出するフレキシブルモデル非依存的説明法を提案する。具体的には、グループラッソと融合ラッソに基づくノード説明法を用いて、ノード説明のためのGraphLIME法を拡張する。 GraphLIMEによるグループと融合正規化は、サブ構造単位におけるGNNの解釈を可能にする。次に,提案手法を逐次グラフ分類タスクの説明に利用できることを示す。実験により,対象とするグラフの重要な構造を様々な設定で識別できることを実証した。 Graph neural networks (GNNs) are a type of neural model that tackle graphical tasks in an end-to-end manner. Recently, GNNs have been receiving increased attention in machine learning and data mining communities because of the higher performance they achieve in various tasks, including graph classification, link prediction, and recommendation. However, the complicated dynamics of GNNs make it difficult to understand which parts of the graph features contribute more strongly to the predictions. To handle the interpretability issues, recently, various GNN explanation methods have been proposed. In this study, a flexible model agnostic explanation method is proposed to detect significant structures in graphs using the Hilbert-Schmidt independence criterion (HSIC), which captures the nonlinear dependency between two variables through kernels. More specifically, we extend the GraphLIME method for node explanation with a group lasso and a fused lasso-based node explanation method. The group and fused regularization with GraphLIME enables the interpretation of GNNs in substructure units. Then, we show that the proposed approach can be used for the explanation of sequential graph classification tasks. 
Through experiments, it is demonstrated that our method can identify crucial structures in a target graph in various settings.	翻訳日:2023-02-07 19:55:54 公開日:2023-02-04
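As a rough illustration of the dependence measure named in the entry above, the following sketch computes the standard biased empirical HSIC estimate, trace(K H L H) / (n - 1)^2, for scalar samples. The Gaussian kernel, the bandwidth `sigma`, and all function names are assumptions made for this example; the sketch does not reproduce the paper's GraphLIME extension or its group and fused lasso terms.

```python
import math


def gaussian_gram(samples, sigma=1.0):
    """Gram matrix with entries exp(-(a - b)^2 / (2 * sigma^2)) for scalar samples."""
    return [
        [math.exp(-((a - b) ** 2) / (2.0 * sigma ** 2)) for b in samples]
        for a in samples
    ]


def double_center(gram):
    """Return H K H, where H = I - (1/n) * ones is the centering matrix."""
    n = len(gram)
    row_mean = [sum(row) / n for row in gram]
    col_mean = [sum(gram[i][j] for i in range(n)) / n for j in range(n)]
    total_mean = sum(row_mean) / n
    return [
        [gram[i][j] - row_mean[i] - col_mean[j] + total_mean for j in range(n)]
        for i in range(n)
    ]


def hsic(xs, ys, sigma=1.0):
    """Biased empirical HSIC between two equally long lists of scalars."""
    n = len(xs)
    centered_k = double_center(gaussian_gram(xs, sigma))
    gram_l = gaussian_gram(ys, sigma)
    # trace(K H L H) equals trace((H K H) L) by cyclicity of the trace.
    trace = sum(centered_k[i][j] * gram_l[j][i] for i in range(n) for j in range(n))
    return trace / (n - 1) ** 2
```

With `xs` on a symmetric grid and `ys = [x * x for x in xs]`, the linear correlation is zero but the estimate is strictly positive, which is the kind of nonlinear dependency the abstract refers to. A constant `ys` gives an estimate of zero up to rounding.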
# FedSpectral+:Federated Learningを用いたスペクトルクラスタリング FedSpectral+: Spectral Clustering using Federated Learning ( http://arxiv.org/abs/2302.02137v1 ) ライセンス: Link先を確認	Janvi Thakkar, Devvrat Joshi	(参考訳) グラフのクラスタリングはよく知られた研究問題であり、特にインターネットやソーシャルネットワークのデータのほとんどはグラフの形式である。組織は、グラフデータセットのクラスタリングを見つけるために、スペクトルクラスタリングアルゴリズムを広く使っている。しかし、スペクトルクラスタリングを大規模データセットに適用することは、計算オーバーヘッドのため困難である。分散スペクトルクラスタリングアルゴリズムは存在するが、データプライバシとクライアント間の通信コストの増大という問題に直面している。そこで本稿では,これらの問題を克服するために,フェデレートラーニング(FL)を用いたスペクトルクラスタリングアルゴリズムを提案する。 FLは、ユーザの生データを収集するのではなく、各学習者のモデルパラメータを蓄積し、スケーラビリティとデータのプライバシを提供するプライバシー保護アルゴリズムである。我々はFedSpectralとFedSpectral+の2つのアプローチを開発した。 FedSpectralは、局所スペクトルクラスタリングラベルを使用して、類似性グラフを作成することで、グローバルスペクトルクラスタリングを集約するベースラインアプローチである。最先端のアプローチであるfeedspectral+は、power iterationメソッドを使用して、クライアント間で分散された生情報にアクセスせずにグラフデータ全体を組み込むことで、グローバルスペクトル埋め込みを学ぶ。さらに,分散アプローチのクラスタリング品質をオリジナル/非flクラスタリングと比較するために,独自の類似度指標を設計した。提案手法であるfeedspectral+は98.85%と99.8%の類似性を持ち、ego-facebookとeメール-eu-coreデータセットのグローバルクラスタリングに匹敵する。 Clustering in graphs has been a well-known research problem, particularly because most Internet and social network data is in the form of graphs. Organizations widely use spectral clustering algorithms to find clustering in graph datasets. However, applying spectral clustering to a large dataset is challenging due to computational overhead. While the distributed spectral clustering algorithm exists, they face the problem of data privacy and increased communication costs between the clients. Thus, in this paper, we propose a spectral clustering algorithm using federated learning (FL) to overcome these issues. FL is a privacy-protecting algorithm that accumulates model parameters from each local learner rather than collecting users' raw data, thus providing both scalability and data privacy. We developed two approaches: FedSpectral and FedSpectral+. FedSpectral is a baseline approach that uses local spectral clustering labels to aggregate the global spectral clustering by creating a similarity graph. 
FedSpectral+, a state-of-the-art approach, uses the power iteration method to learn the global spectral embedding by incorporating the entire graph data without access to the raw information distributed among the clients. We further designed our own similarity metric to compare the clustering quality of the distributed approach with that of the original/non-FL clustering. The proposed approach FedSpectral+ obtained similarities of 98.85% and 99.8%, comparable to that of global clustering on the ego-Facebook and email-Eu-core datasets.	翻訳日:2023-02-07 19:55:33 公開日:2023-02-04
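The power iteration method mentioned in the entry above can be sketched generically as follows. This is a minimal single-machine sketch that finds the dominant eigenpair of a small symmetric matrix. It makes no attempt to model the federated protocol of FedSpectral+, and the function name, the uniform starting vector, and the fixed iteration count are assumptions for illustration.

```python
import math


def power_iteration(matrix, num_iters=200):
    """Approximate the dominant eigenvalue and eigenvector of a square matrix."""
    n = len(matrix)
    vec = [1.0 / math.sqrt(n)] * n  # uniform unit-norm starting vector
    for _ in range(num_iters):
        # Multiply by the matrix, then renormalize to unit length.
        nxt = [sum(matrix[i][j] * vec[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(v * v for v in nxt))
        vec = [v / norm for v in nxt]
    # Rayleigh quotient v^T A v for the unit vector v.
    eigenvalue = sum(
        vec[i] * sum(matrix[i][j] * vec[j] for j in range(n)) for i in range(n)
    )
    return eigenvalue, vec
```

For the matrix `[[2.0, 1.0], [1.0, 3.0]]` the iteration converges to the larger eigenvalue, (5 + sqrt(5)) / 2. In spectral clustering the same idea would be applied to a graph-derived matrix, such as a normalized adjacency matrix, rather than to this toy input.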
# 協調型マルチエージェント強化学習のための個別グローバルマックスを伴わない二重自己認識値分解フレームワーク Dual Self-Awareness Value Decomposition Framework without Individual Global Max for Cooperative Multi-Agent Reinforcement Learning ( http://arxiv.org/abs/2302.02180v1 ) ライセンス: Link先を確認	Zhiwei Xu, Bin Zhang, Dapeng Li, Guangchong Zhou, Zeren Zhang, Guoliang Fan	(参考訳) 協調型マルチエージェント強化学習分野では, 値分解法が徐々に普及している。しかしながら、ほとんど全ての値分解法は、値分解法が解決できる問題の範囲を制限する、個人的グローバルマックス(IGM)原理またはその変種に従う。心理学における二重自己認識の概念に着想を得て, IGMの前提を完全に否定する二重自己認識価値分解フレームワークを提案する。各エージェントは、アクションを実行するegoポリシと、クレジット割り当てに参加する変更ego値関数で構成される。値関数の分解は明示的な探索手順を用いてIMGの仮定を無視することができる。また,アルゴリズムが局所的に最適になるのを避けるために,新たなエゴ探索機構を提案する。 IGMを含まない最初の完全値分解法として,提案手法は様々な協調作業において望ましい性能を実現する。 Value decomposition methods have gradually become popular in the cooperative multi-agent reinforcement learning field. However, almost all value decomposition methods follow the Individual Global Max (IGM) principle or its variants, which restricts the range of issues that value decomposition methods can resolve. Inspired by the notion of dual self-awareness in psychology, we propose a dual self-awareness value decomposition framework that entirely rejects the IGM premise. Each agent consists of an ego policy that carries out actions and an alter ego value function that takes part in credit assignment. The value function factorization can ignore the IGM assumption by using an explicit search procedure. We also suggest a novel anti-ego exploration mechanism to avoid the algorithm becoming stuck in a local optimum. As the first fully IGM-free value decomposition method, our proposed framework achieves desirable performance in various cooperative tasks.	翻訳日:2023-02-07 19:48:38 公開日:2023-02-04
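For readers unfamiliar with the Individual Global Max (IGM) principle that the entry above rejects, it is commonly stated as the requirement that greedy joint action selection agrees with greedy per-agent selection. The notation below ($Q_{\mathrm{tot}}$ for the joint value, $Q_i$ for agent $i$'s utility, $\tau_i$ and $u_i$ for its action-observation history and action) follows common usage in the value decomposition literature and is an assumption here, since the abstract itself gives no formula.

```latex
\arg\max_{\mathbf{u}} Q_{\mathrm{tot}}(\boldsymbol{\tau}, \mathbf{u})
  = \Bigl( \arg\max_{u_1} Q_1(\tau_1, u_1), \; \dots, \; \arg\max_{u_n} Q_n(\tau_n, u_n) \Bigr)
```

A method that drops this condition can no longer read off the joint greedy action from the individual utilities, which is consistent with the abstract's mention of an explicit search procedure.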
# ハイウェイマージのための教師なしスキル発見による階層学習 Hierarchical Learning with Unsupervised Skill Discovery for Highway Merging Applications ( http://arxiv.org/abs/2302.02179v1 ) ライセンス: Link先を確認	Yigit Gurses, Kaan Buyukdemirci, and Yildiray Yildiz	(参考訳) 人間や自律的なドライバーとの密集したトラフィックの運転は、ダイナミックな環境の変化に素早く反応する能力とともに、高いレベルの計画と推論を必要とする課題である。本研究では,学習動作プリミティブを動作として利用する階層的学習手法を提案する。モーションプリミティブは、所定の報酬関数なしで教師なしスキル発見を使用して取得され、異なるシナリオで再利用することができる。これにより、さまざまな振る舞いを持つ複数のモデルを取得する必要のあるアプリケーション全体のトレーニング時間を短縮できる。シミュレーションの結果,提案手法は,ベースライン強化学習法と比較して,トレーニングの少ないドライバモデルで高い性能が得られることが示された。 Driving in dense traffic with human and autonomous drivers is a challenging task that requires high level planning and reasoning along with the ability to react quickly to changes in a dynamic environment. In this study, we propose a hierarchical learning approach that uses learned motion primitives as actions. Motion primitives are obtained using unsupervised skill discovery without a predetermined reward function, allowing them to be reused in different scenarios. This can reduce the total training time for applications that need to obtain multiple models with varying behavior. Simulation results demonstrate that the proposed approach yields driver models that achieve higher performance with less training compared to baseline reinforcement learning methods.	翻訳日:2023-02-07 19:48:24 公開日:2023-02-04
# 構築文法は、ニューラルネットワークモデルにユニークな洞察を与える Construction Grammar Provides Unique Insight into Neural Language Models ( http://arxiv.org/abs/2302.02178v1 ) ライセンス: Link先を確認	Leonie Weissweiler, Taiqi He, Naoki Otani, David R. Mortensen, Lori Levin, Hinrich Schütze	(参考訳) 建設文法 (CxG) は, 大規模事前学習言語モデル (PLM) の性能を, 構造と意味に関して調査する研究の基盤として最近利用されている。本稿では,本研究の継続と拡張について提案する。我々は、CxGを念頭に置いて設計されていない探索手法と、特定の構成のために設計された探索手法を考察する。我々は,過去の研究を詳細に分析し,この新たな分野が直面する最も重要な課題と研究課題について考察する。 Construction Grammar (CxG) has recently been used as the basis for probing studies that have investigated the performance of large pretrained language models (PLMs) with respect to the structure and meaning of constructions. In this position paper, we make suggestions for the continuation and augmentation of this line of research. We look at probing methodology that was not designed with CxG in mind, as well as probing methodology that was designed for specific constructions. We analyse selected previous work in detail, and provide our view of the most important challenges and research questions that this promising new field faces.	翻訳日:2023-02-07 19:48:13 公開日:2023-02-04
# フーリエ変換を用いたニューラル時系列解析:サーベイ Neural Time Series Analysis with Fourier Transform: A Survey ( http://arxiv.org/abs/2302.02173v1 ) ライセンス: Link先を確認	Kun Yi and Qi Zhang and Shoujin Wang and Hui He and Guodong Long and Zhendong Niu	(参考訳) 近年、フーリエ変換が深層ニューラルネットワークに広く導入され、時系列解析の精度と効率の両面で最先端技術が進歩している。効率性やグローバルビューなどの時系列解析におけるフーリエ変換の利点は急速に研究され、時系列解析のための有望なディープラーニングパラダイムが提示されている。しかし、この新興地域では注目が高まり、研究が盛んになっているが、既存の研究の体系的な見直しが欠如している。そこで本稿では,フーリエ変換を用いた時系列解析の研究の包括的レビューを行う。我々は,最新の研究成果を体系的に調査し,要約することを目的とする。そこで我々は,既存のニューラルネットワーク時系列解析手法を特徴,利用パラダイム,ネットワーク設計,応用の4つの観点から分類する新しい分類法を提案する。我々はまた、この活気ある地域で新しい研究の方向性を共有している。 Recently, Fourier transform has been widely introduced into deep neural networks to further advance the state-of-the-art regarding both accuracy and efficiency of time series analysis. The advantages of the Fourier transform for time series analysis, such as efficiency and global view, have been rapidly explored and exploited, exhibiting a promising deep learning paradigm for time series analysis. However, although increasing attention has been attracted and research is flourishing in this emerging area, there lacks a systematic review of the variety of existing studies in the area. To this end, in this paper, we provide a comprehensive review of studies on neural time series analysis with Fourier transform. We aim to systematically investigate and summarize the latest research progress. Accordingly, we propose a novel taxonomy to categorize existing neural time series analysis methods from four perspectives, including characteristics, usage paradigms, network design, and applications. We also share some new research directions in this vibrant area.	翻訳日:2023-02-07 19:48:05 公開日:2023-02-04
# 位置依存質量を持つ非対称発振器の厳密解とコヒーレント状態 Exact solution and coherent states of an asymmetric oscillator with position-dependent mass ( http://arxiv.org/abs/2302.02172v1 ) ライセンス: Link先を確認	Bruno G. da Costa, Ignacio S. Gomez, and Biswanath Rath	(参考訳) 位置依存質量をもつ変形振動子 [da Costa et al., J. Math. Phys. 62, 092101 (2021)] の問題を、運動エネルギーとポテンシャルエネルギーの両方に質量関数の効果を導入することで、古典的および量子形式論において再検討する。得られたハミルトニアンは、通常の位相空間 $(x, p)$ から変形した位相空間 $(x_\gamma, \Pi_\gamma)$ への点正準変換によってモース振動子に写像される。モースポテンシャルと同様に、変形振動子は古典形式論における無調和振動運動に対応する位相空間における束縛軌道を示し、従って量子形式論における離散スペクトルを持つ束縛状態を示す。一方、位相空間における開軌道は散乱状態と連続エネルギースペクトルと関連している。因子化法を用いて、時間進化とその不確実性などのコヒーレントな状態の特性について検討する。高速な局所化(古典的および量子的)は、非対称な位置依存質量のためにコヒーレントな状態に対して報告される。不確実性関係の時間進化の振動も観察され、変形が増加するにつれて振幅が増加する。 We revisit the problem of the deformed oscillator with position-dependent mass [da Costa et al., J. Math. Phys. 62, 092101 (2021)] in the classical and quantum formalisms, by introducing the effect of the mass function in both kinetic and potential energies. The resulting Hamiltonian is mapped into a Morse oscillator by means of a point canonical transformation from the usual phase space $(x, p)$ to a deformed one $(x_\gamma, \Pi_\gamma)$. Similar to the Morse potential, the deformed oscillator presents bound trajectories in phase space corresponding to an anharmonic oscillatory motion in classical formalism and, therefore, bound states with a discrete spectrum in quantum formalism. On the other hand, open trajectories in phase space are associated with scattering states and continuous energy spectrum. Employing the factorization method, we investigate the properties of the coherent states, such as the time evolution and their uncertainties. A fast localization, classical and quantum, is reported for the coherent states due to the asymmetrical position-dependent mass. An oscillation of the time evolution of the uncertainty relationship is also observed, whose amplitude increases as the deformation increases.	翻訳日:2023-02-07 19:47:52 公開日:2023-02-04
# 制約付き連続多目的最適化問題のキャラクタリゼーション:性能空間の観点から Characterization of Constrained Continuous Multiobjective Optimization Problems: A Performance Space Perspective ( http://arxiv.org/abs/2302.02170v1 ) ライセンス: Link先を確認	Aljoša Vodopija, Tea Tušar, Bogdan Filipič	(参考訳) 制約付き多目的最適化はここ数年で大きな関心を集めている。しかし、制約付き多目的最適化問題(CMOP)はまだ満足できない。したがって、ベンチマークに十分なCMOPの選択は困難であり、形式的な背景が欠けている。本稿では,パフォーマンス空間の観点からCMOPを探索し,この問題に対処する。まず,制約付き多目的最適化のための新しい性能評価手法を提案する。この方法論は、パレートフロントと制約満足度を近似する性能を同時に測定する最初の試みを提供する。第二に、アルゴリズムの性能を区別する最適化問題の能力を測定する手法を提案する。最後に、このアプローチはCMOPの8つの頻繁に使用される人工テストスイートと対比するために使用される。実験結果から,3つのよく知られた多目的最適化アルゴリズムの判別において,どのスイートの方が効率的かが明らかとなった。ベンチマーク設計者は、これらの結果を使用して、必要に応じて最も適切なcmopsを選択することができる。 Constrained multiobjective optimization has gained much interest in the past few years. However, constrained multiobjective optimization problems (CMOPs) are still unsatisfactorily understood. Consequently, the choice of adequate CMOPs for benchmarking is difficult and lacks a formal background. This paper addresses this issue by exploring CMOPs from a performance space perspective. First, it presents a novel performance assessment approach designed explicitly for constrained multiobjective optimization. This methodology offers a first attempt to simultaneously measure the performance in approximating the Pareto front and constraint satisfaction. Secondly, it proposes an approach to measure the capability of the given optimization problem to differentiate among algorithm performances. Finally, this approach is used to contrast eight frequently used artificial test suites of CMOPs. The experimental results reveal which suites are more efficient in discerning between three well-known multiobjective optimization algorithms. Benchmark designers can use these results to select the most appropriate CMOPs for their needs.	翻訳日:2023-02-07 19:47:29 公開日:2023-02-04
# この予測を解くには、どのトレーニングポイントを廃止する必要があるか? How Many and Which Training Points Would Need to be Removed to Flip this Prediction? ( http://arxiv.org/abs/2302.02169v1 ) ライセンス: Link先を確認	Jinghan Yang, Sarthak Jain, Byron C. Wallace	(参考訳) トレーニングデータの最小部分集合である $\mathcal{S}_t$ を識別する問題は、もし $\mathcal{S}_t$ を構成するインスタンスがトレーニング前に削除された場合、与えられたテストポイント $x_t$ の分類が異なるであろう。このような集合の同定にはいくつかの理由がある。まず、$\mathcal{s}_t$ の濃度はロバスト性の尺度を提供する($\|\mathcal{s}_t\|$ が $x_t$ で小さい場合は、対応する予測に対する自信が低くなるかもしれない)。第二に、$\mathcal{s}_t$ の尋問は、特定のモデル予測に異議を唱えるための新しいメカニズムを提供するかもしれない:$\mathcal{s}_t$ の点が誤ってラベル付けされたり無関係であったりした場合、これは関連する予測を覆すために議論するかもしれない。 brute-force による $\mathcal{S}_t$ の識別は難解である。我々は、影響関数に基づいて$\mathcal{s}_t$を求めるための比較的高速な近似法を提案し、単純な凸テキスト分類モデルにおいて、これらのアプローチは、予測をひっくり返すような、比較的小さなトレーニング例のセットをうまく識別できることを発見した。我々の知る限り、これは機械学習の文脈で与えられた予測を反転させるのに必要な最小限のトレーニングセットを特定することの問題を調査する最初の試みである。 We consider the problem of identifying a minimal subset of training data $\mathcal{S}_t$ such that if the instances comprising $\mathcal{S}_t$ had been removed prior to training, the categorization of a given test point $x_t$ would have been different. Identifying such a set may be of interest for a few reasons. First, the cardinality of $\mathcal{S}_t$ provides a measure of robustness (if $\|\mathcal{S}_t\|$ is small for $x_t$, we might be less confident in the corresponding prediction), which we show is correlated with but complementary to predicted probabilities. Second, interrogation of $\mathcal{S}_t$ may provide a novel mechanism for contesting a particular model prediction: If one can make the case that the points in $\mathcal{S}_t$ are wrongly labeled or irrelevant, this may argue for overturning the associated prediction. Identifying $\mathcal{S}_t$ via brute-force is intractable. 
We propose comparatively fast approximation methods to find $\mathcal{S}_t$ based on influence functions, and find that -- for simple convex text classification models -- these approaches can often successfully identify relatively small sets of training examples which, if removed, would flip the prediction. To our knowledge, this is the first work to investigate the problem of identifying a minimal training set necessary to flip a given prediction in the context of machine learning.	翻訳日:2023-02-07 19:47:17 公開日:2023-02-04
# AUTOLYCUS: 決定木モデルに対するモデル抽出攻撃のための説明可能なAI(XAI)の爆発 AUTOLYCUS: Exploiting Explainable AI (XAI) for Model Extraction Attacks against Decision Tree Models ( http://arxiv.org/abs/2302.02162v1 ) ライセンス: Link先を確認	Abdullah Caglar Oksuz, Anisa Halimi, Erman Ayday	(参考訳) モデル抽出攻撃は、メンバシップ推論攻撃とモデル反転攻撃とともに、機械学習モデルをターゲットにする最も顕著な敵手法の1つである。一方、説明可能な人工知能(XAI)は、AIの背後にある意思決定プロセスを説明するためのテクニックと手順のセットである。 XAIはAIモデルの背後にある理由を理解するための優れたツールですが、そのような啓示のために提供されるデータは、セキュリティとプライバシの脆弱性を生み出します。本稿では,LIMEによるモデル抽出攻撃であるAUTOLYCUSを提案する。この攻撃は,決定木モデルの決定境界を推測し,対象モデルと同じような振る舞いをする抽出サロゲートモデルを作成する。 The model extraction attack is one of the most prominent adversarial techniques for targeting machine learning models, along with the membership inference attack and the model inversion attack. On the other hand, Explainable Artificial Intelligence (XAI) is a set of techniques and procedures to explain the decision-making process behind AI. XAI is a great tool to understand the reasoning behind AI models, but the data provided for such revelation creates security and privacy vulnerabilities. In this poster, we propose AUTOLYCUS, a model extraction attack that exploits the explanations provided by LIME to infer the decision boundaries of decision tree models and create extracted surrogate models that behave similarly to a target model.	翻訳日:2023-02-07 19:46:48 公開日:2023-02-04
# 涙を伴う有向非循環グラフ Directed Acyclic Graphs With Tears ( http://arxiv.org/abs/2302.02160v1 ) ライセンス: Link先を確認	Zhichao Chen, Zhiqiang Ge	(参考訳) ベイズネットワークは、産業プロセスにおける障害の検出と診断によく用いられる手法である。ベイズネットワークの基礎はデータから有向非巡回グラフ(DAG)を学習する構造学習である。しかし、探索空間はプロセス変数の増加とともに超指数的にスケールするので、データ駆動型構造学習は難しい問題となる。この目的のために、NOTEARs法によるDAGは、離散最適化を連続最適化問題に変換するだけでなく、ディープラーニングフレームワークとの互換性もよく研究されている。それでも、NOTEARベースの手法には依然として課題がある: 1) 勾配降下に基づく最適化パラダイムから実現不可能な解が得られること、2) 学習したグラフの非巡回性を保証するための切断操作である。本研究では、課題 1) の原因を理論的に解析し、課題 2) を緩和するために混合整数計画に基づくDAGs with Tears法という新しい手法を提案する。さらに, 先行知識を新たな手法に取り入れることで, 産業プロセスにおいて構造学習をより実用的で有用なものにすることができる。最後に, ケーススタディとして, 数値例と工業例を用いて, 開発手法の優位性を実証する。 Bayesian networks are frequently used for fault detection and diagnosis in industrial processes. The basis of a Bayesian network is structure learning, which learns a directed acyclic graph (DAG) from data. However, the search space scales super-exponentially with the number of process variables, which makes data-driven structure learning a challenging problem. To this end, DAGs with NOTEARs methods have been well studied, not only because they convert the discrete optimization into a continuous optimization problem but also because they are compatible with deep learning frameworks. Nevertheless, challenges remain for NOTEAR-based methods: 1) infeasible solutions that result from the gradient descent-based optimization paradigm; 2) the truncation operation needed to make the learned graph acyclic. In this work, the reason for challenge 1) is analyzed theoretically, and a novel method named DAGs with Tears is proposed based on mixed-integer programming to alleviate challenge 2). In addition, prior knowledge can be incorporated into the newly proposed method, making structure learning more practical and useful in industrial processes. Finally, a numerical example and an industrial example are adopted as case studies to demonstrate the superiority of the developed method.	翻訳日:2023-02-07 19:46:35 公開日:2023-02-04
# TrajMatch: 軌道マッチングによる道路側LiDARの自動時空間校正に向けて TrajMatch: Towards Automatic Spatio-temporal Calibration for Roadside LiDARs through Trajectory Matching ( http://arxiv.org/abs/2302.02157v1 ) ライセンス: Link先を確認	Haojie Ren, Sha Zhang, Sugang Li, Yao Li, Xinchen Li, Jianmin Ji, Yu Zhang, Yanyong Zhang	(参考訳) 近年,道路脇にLiDARなどのセンサを配置して交通状況を監視し,自動運転車の認識を支援することが普及している。自動運転車とは異なり、路面センサーは通常異なるサブシステムに関連付けられ、時間と空間の同期が欠如している。キャリブレーションは、中央サーバが異なるロケーションインフラストラクチャによって生成されたデータを融合し、センシング範囲と検出ロバスト性を改善するための重要な技術である。残念ながら、既存のキャリブレーションアルゴリズムは、LiDARが著しく重複しているか、時間キャリブレーションが既に達成されていると仮定することが多い。これらの仮定が常に現実世界に当てはまるわけではないため、既存のアルゴリズムによる校正結果はしばしば不十分であり、人間の関与が常に必要であり、高い労働コストをもたらす。本稿では,道路沿いのLiDARを時間と空間の両方で自動校正できる最初のシステムであるTrajMatchを提案する。主なアイデアは、特別な特徴を抽出するのではなく、検出/追跡タスクの結果に基づいて自動的にセンサーを調整することだ。さらに,本手法の有効性を実験的に検証し,複数のキャリブレーションにおけるパラメータ反復の指導にも利用できることを示す。最後に,TrajMatchの性能を評価するために,シミュレーションデータセットLiDARnet-sim 1.0と実世界のデータセットの2つのデータセットを収集した。実験の結果,trajmatchは10cm未満の空間キャリブレーション誤差と1.5ms未満の時間キャリブレーション誤差を達成できた。 Recently, it has become popular to deploy sensors such as LiDARs on the roadside to monitor the passing traffic and assist autonomous vehicle perception. Unlike autonomous vehicle systems, roadside sensors are usually affiliated with different subsystems and lack synchronization both in time and space. Calibration is a key technology which allows the central server to fuse the data generated by different location infrastructures, which can improve the sensing range and detection robustness. Unfortunately, existing calibration algorithms often assume that the LiDARs are significantly overlapped or that the temporal calibration is already achieved. Since these assumptions do not always hold in the real world, the calibration results from the existing algorithms are often unsatisfactory and always need human involvement, which brings high labor costs. In this paper, we propose TrajMatch -- the first system that can automatically calibrate for roadside LiDARs in both time and space. 
The main idea is to automatically calibrate the sensors based on the result of the detection/tracking task instead of extracting special features. Furthermore, we propose a mechanism for evaluating calibration parameters that is consistent with our algorithm, and we demonstrate the effectiveness of this scheme experimentally; it can also be used to guide parameter iterations across multiple calibrations. Finally, to evaluate the performance of TrajMatch, we collect two datasets: a simulated dataset, LiDARnet-sim 1.0, and a real-world dataset. Experimental results show that TrajMatch can achieve a spatial calibration error of less than 10 cm and a temporal calibration error of less than 1.5 ms.	翻訳日:2023-02-07 19:46:18 公開日:2023-02-04
# 低ビットビジョン変換器の無振動量子化 Oscillation-free Quantization for Low-bit Vision Transformers ( http://arxiv.org/abs/2302.02210v1 ) ライセンス: Link先を確認	Shih-Yang Liu, Zechun Liu, Kwang-Ting Cheng	(参考訳) 重み振動は量子化対応トレーニングの望ましくない副作用であり、量子化された重みは2つの量子化レベルの間で頻繁にジャンプし、トレーニングの不安定性と準最適最終モデルをもたらす。学習可能なスケーリング係数である$\textit{de facto}$の量子化設定は、重みの振動を増大させる。本研究では,学習可能なスケーリング因子と量的重み振動との関係について検討し,vitをケースドライバとして活用し,その発見と改善について検討した。さらに、量子化重みの相互依存性が$\textit{query}$と$\textit{key}$の自己アテンション層であることから、ViTは振動に弱いことが判明した。そこで,本研究では, 統計的量量化($\rm StatsQ$)による量子化ロバスト性の向上と, 一般的な学習可能スケール法と比較しての信頼性向上($\rm CGA$)による重み付けを凍結し, 発振重みを緩和する($\textit{high confidence}$, $\textit{query}$-$\textit{key}$再パラメータ化($\rm QKR$)によるクエリキーの相互交叉振動の解消と, 結果の勾配推定の緩和を行う($\rm QKR$)3つの手法を提案する。広汎な実験により、これらの手法は重量振動を緩和し、一貫して画像ネットの精度を向上することを示した。具体的には、我々の2ビットのDeiT-T/DeiT-Sアルゴリズムは、それぞれ9.8%と7.7%で先行技術を上回っている。コードは補足資料に含まれており、リリースされます。 Weight oscillation is an undesirable side effect of quantization-aware training, in which quantized weights frequently jump between two quantized levels, resulting in training instability and a sub-optimal final model. We discover that the learnable scaling factor, a widely-used $\textit{de facto}$ setting in quantization aggravates weight oscillation. In this study, we investigate the connection between the learnable scaling factor and quantized weight oscillation and use ViT as a case driver to illustrate the findings and remedies. In addition, we also found that the interdependence between quantized weights in $\textit{query}$ and $\textit{key}$ of a self-attention layer makes ViT vulnerable to oscillation. 
We, therefore, propose three techniques accordingly: statistical weight quantization ($\rm StatsQ$) to improve quantization robustness compared to the prevalent learnable-scale-based method; confidence-guided annealing ($\rm CGA$) that freezes the weights with $\textit{high confidence}$ and calms the oscillating weights; and $\textit{query}$-$\textit{key}$ reparameterization ($\rm QKR$) to resolve the query-key intertwined oscillation and mitigate the resulting gradient misestimation. Extensive experiments demonstrate that these proposed techniques successfully abate weight oscillation and consistently achieve substantial accuracy improvement on ImageNet. Specifically, our 2-bit DeiT-T/DeiT-S algorithms outperform the previous state-of-the-art by 9.8% and 7.7%, respectively. The code is included in the supplementary material and will be released.	翻訳日:2023-02-07 19:39:54 公開日:2023-02-04
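The weight oscillation described in the entry above can be illustrated with a toy uniform quantizer. In this sketch a latent weight that hovers near a rounding boundary flips between two quantized levels from step to step, while a weight far from any boundary does not. The quantizer, the fixed `scale`, and the helper names are assumptions for this example; none of the paper's techniques (StatsQ, CGA, QKR) is implemented here.

```python
import math


def quantize(weight, scale, bits=2):
    """Round a latent weight to the nearest signed integer level, then rescale."""
    qmin = -(2 ** (bits - 1))
    qmax = 2 ** (bits - 1) - 1
    level = math.floor(weight / scale + 0.5)  # round half up, not banker's rounding
    level = max(qmin, min(qmax, level))  # clip to the representable range
    return level * scale


def count_level_flips(latent_weights, scale, bits=2):
    """Count how often the quantized value changes along a weight trajectory."""
    levels = [quantize(w, scale, bits) for w in latent_weights]
    return sum(1 for a, b in zip(levels, levels[1:]) if a != b)
```

With `scale = 0.1`, the trajectory `[0.049, 0.051, 0.049, 0.051]` moves by only 0.002 per step yet changes quantized level at every step, whereas `[0.020, 0.021, 0.019, 0.020]` never changes level. Changing `scale` during training moves the rounding boundaries themselves, which gives some intuition for why a learnable scaling factor could interact with this effect.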
# 関係型Weisfeiler-Lemanによるリンク予測の一理論 A Theory of Link Prediction via Relational Weisfeiler-Leman ( http://arxiv.org/abs/2302.02209v1 ) ライセンス: Link先を確認	Xingyue Huang, Miguel Romero Orth, İsmail İlkan Ceylan, Pablo Barceló	(参考訳) グラフニューラルネットワークは、グラフ構造化データ上での表現学習のための顕著なモデルである。これらのモデルの能力と制限は単純なグラフではよく理解されているが、知識グラフの文脈では、我々の理解は極めて不完全である。この研究の目的は、リンク予測の顕著なタスクに関連する知識グラフに対するグラフニューラルネットワークの展望を体系的に理解することである。我々の分析は、一見無関係なモデルに対する統一的な視点を必要とし、他のモデルもアンロックする。様々なモデルの表現力は、異なる初期化規則を持つ対応する関係性Weisfeiler-Lemanアルゴリズムによって特徴づけられる。この分析は、グラフニューラルネットワークのクラスによってキャプチャされる関数のクラスを正確に論理的に特徴づけるために拡張される。提案手法を実証的に検証した実践的設計選択の利点を理論的に説明する。 Graph neural networks are prominent models for representation learning over graph-structured data. While the capabilities and limitations of these models are well-understood for simple graphs, our understanding remains highly incomplete in the context of knowledge graphs. The goal of this work is to provide a systematic understanding of the landscape of graph neural networks for knowledge graphs pertaining to the prominent task of link prediction. Our analysis entails a unifying perspective on seemingly unrelated models, and unlocks a series of other models. The expressive power of various models is characterized via a corresponding relational Weisfeiler-Leman algorithm with different initialization regimes. This analysis is extended to provide a precise logical characterization of the class of functions captured by a class of graph neural networks. Our theoretical findings explain the benefits of some widely employed practical design choices, which are validated empirically.	翻訳日:2023-02-07 19:39:17 publicationDate:2023-02-04
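As a rough picture of what a relational Weisfeiler-Leman procedure does, the sketch below runs color refinement on a small knowledge graph, where each node's new color depends on its old color and on the multiset of (relation, neighbor color) pairs arriving over incoming edges. This is a simplified textbook-style variant with a uniform initial coloring, both of which are assumptions. The specific algorithms and initialization regimes analyzed in the paper are not reproduced, and the function name is hypothetical.

```python
def relational_color_refinement(num_nodes, triples, rounds=2):
    """Refine node colors over (head, relation, tail) triples.

    Nodes are integers in range(num_nodes); relations are strings.
    """
    colors = [0] * num_nodes  # uniform initial coloring
    for _ in range(rounds):
        signatures = []
        for node in range(num_nodes):
            # Sorted list stands in for the multiset of incoming messages.
            incoming = sorted(
                (rel, colors[head]) for head, rel, tail in triples if tail == node
            )
            signatures.append((colors[node], tuple(incoming)))
        # Relabel distinct signatures with compact integer colors.
        palette = {sig: idx for idx, sig in enumerate(sorted(set(signatures)))}
        colors = [palette[sig] for sig in signatures]
    return colors
```

On the graph with triples `(0, "r", 2)` and `(1, "s", 3)`, nodes 0 and 1 keep the same color because neither has incoming edges, while nodes 2 and 3 are separated purely by the relation type of their incoming edge. A refinement that ignored relation labels would not distinguish them.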
# 対向摂動下におけるロバスト認証制御 Certified Robust Control under Adversarial Perturbations ( http://arxiv.org/abs/2302.02208v1 ) ライセンス: Link先を確認	Jinghan Yang, Hunmin Kim, Wenbin Wan, Naira Hovakimyan, Yevgeniy Vorobeychik	(参考訳) 自律システムは、高次元の生の入力を、意思決定と制御に使用される予測に変換する機械学習技術にますます依存している。しかし、これらの入力を悪意を持って操作し、その結果、予測することが容易であることが多い。逆入力摂動に対する予測のロバスト性を検証する効果的な手法が提案されているが、予測を下流で利用するための制御システムから切り離されている。本稿では, 逆入力摂動に対する制御の正当性を得るために, 原入力摂動に対する予測の頑健性検証を構成するための最初の手法を提案する。我々は、適応車両制御のケーススタディを用いて、我々のアプローチを説明し、広範囲な実験を通して得られたエンドツーエンド証明書の価値を示す。 Autonomous systems increasingly rely on machine learning techniques to transform high-dimensional raw inputs into predictions that are then used for decision-making and control. However, it is often easy to maliciously manipulate such inputs and, as a result, predictions. While effective techniques have been proposed to certify the robustness of predictions to adversarial input perturbations, such techniques have been disembodied from control systems that make downstream use of the predictions. We propose the first approach for composing robustness certification of predictions with respect to raw input perturbations with robust control to obtain certified robustness of control to adversarial input perturbations. We use a case study of adaptive vehicle control to illustrate our approach and show the value of the resulting end-to-end certificates through extensive experiments.	翻訳日:2023-02-07 19:39:05 公開日:2023-02-04
# ラプラシアンicpによる3次元頭部メッシュのプログレッシブ登録 Laplacian ICP for Progressive Registration of 3D Human Head Meshes ( http://arxiv.org/abs/2302.02194v1 ) ライセンス: Link先を確認	Nick Pears, Hang Dai, Will Smith and Hao Sun	(参考訳) 古典的非剛性イテレーティブ・クローズト・ポイント(N-ICP)の高効率な変種であるプログレッシブ3次元登録フレームワークを提案する。変形正則化にLaplace-Beltrami演算子を用いるので、全体のプロセスはLaplacian ICP (L-ICP) とみなす。これは「イテレーション毎の小さな変形」という仮定を生かし、徐々に粗くなり、フレキシブルな変形モデル、対応集合の数の増加、より洗練された対応推定を利用する。対応マッチングは、ドメイン固有の特徴抽出器から派生した予め定義された頂点サブセット内でのみ許可される。さらに,アノテーション転送に基づく3次元非剛性登録のための新しいベンチマークと2つの評価指標を提案する。これを、3d human head scans(headspace)の公開データセット上で評価するために使用します。この手法は頑丈であり、最も一般的な古典的手法と比較して計算時間のごく一部しか必要としないが、登録性能は同等である。 We present a progressive 3D registration framework that is a highly-efficient variant of classical non-rigid Iterative Closest Points (N-ICP). Since it uses the Laplace-Beltrami operator for deformation regularisation, we view the overall process as Laplacian ICP (L-ICP). This exploits a `small deformation per iteration' assumption and is progressively coarse-to-fine, employing an increasingly flexible deformation model, an increasing number of correspondence sets, and increasingly sophisticated correspondence estimation. Correspondence matching is only permitted within predefined vertex subsets derived from domain-specific feature extractors. Additionally, we present a new benchmark and a pair of evaluation metrics for 3D non-rigid registration, based on annotation transfer. We use this to evaluate our framework on a publicly-available dataset of 3D human head scans (Headspace). The method is robust and only requires a small fraction of the computation time compared to the most popular classical approach, yet has comparable registration performance.	翻訳日:2023-02-07 19:38:51 公開日:2023-02-04
# 時間力学を用いた表面符号回路のハードウェア要件の緩和 Relaxing Hardware Requirements for Surface Code Circuits using Time-dynamics ( http://arxiv.org/abs/2302.02192v1 ) ライセンス: Link先を確認	Matt McEwen, Dave Bacon, Craig Gidney	(参考訳) 量子誤り訂正(QEC)符号の典型的な時間依存ビューは、ハードウェア上で実行可能な回路への分解においてかなりの自由を隠蔽する。領域検出の概念を用いて、静的QEC符号を回路に分解する代わりに、時間動的QEC回路を直接設計する。特に、曲面符号の標準的な回路構成を改善し、正方形格子の代わりに六角形格子に埋め込み、CNOTやCZゲートの代わりにISWAPゲートを使用し、量子ビットデータを交換して役割を計測し、実行中に物理量子ビットグリッドの周りに論理的パッチを移動させる新しい回路を提示する。これらの構造は全て追加のエンタングルゲート層を使用しず、基本的に同じ論理的性能を示し、標準的なサーフェスコード回路の25%以内のテラクオプフットプリントを有する。これらの回路は、ハードウェアの需要を緩和しながら、標準的なサーフェスコード回路と本質的に同じ論理性能を達成するため、量子ハードウェアエンジニアにとって大きな関心を持つだろう。 The typical time-independent view of quantum error correction (QEC) codes hides significant freedom in the decomposition into circuits that are executable on hardware. Using the concept of detecting regions, we design time-dynamic QEC circuits directly instead of designing static QEC codes to decompose into circuits. In particular, we improve on the standard circuit constructions for the surface code, presenting new circuits that can embed on a hexagonal grid instead of a square grid, that can use ISWAP gates instead of CNOT or CZ gates, that can exchange qubit data and measure roles, and that move logical patches around the physical qubit grid while executing. All these constructions use no additional entangling gate layers and display essentially the same logical performance, having teraquop footprints within 25% of the standard surface code circuit. We expect these circuits to be of great interest to quantum hardware engineers, because they achieve essentially the same logical performance as standard surface code circuits while relaxing demands on hardware.	翻訳日:2023-02-07 19:38:34 公開日:2023-02-04
# 3GPPMIMOシステムにおけるパイロットフリー伝送の教師なし学習 Unsupervised Learning for Pilot-free Transmission in 3GPP MIMO Systems ( http://arxiv.org/abs/2302.02191v1 ) ライセンス: Link先を確認	Omar M. Sleem, Mohamed Salah Ibrahim, Akshay Malhotra, Mihaela Beluri, Philip Pietraski	(参考訳) 参照信号のオーバーヘッド低減は、近年、システムのスペクトル効率を改善する効果的なソリューションとして進化している。本稿では,復調基準信号(DM-RS)が不要な新しいダウンリンクデータ構造を提案する。提案したデータ転送構造は,ユーザデータの一部を複数のサブバンドにまたがる簡単な繰り返しステップを含む。ユーザ側で繰り返し構造を利用すると,正準相関分析により信頼性の高いリカバリが可能となる。また、OFDMシステムにおけるCCA性能を高めるための2つの効果的なメカニズムを提案し、その1つは繰り返しパターンの選択であり、もう1つは重度周波数選択性の問題に対処するものである。提案手法は複雑さとパフォーマンスのトレードオフが良好であり,実用的な実装が期待できる。 3gppリンクレベルテストベンチを用いた数値実験により,提案手法が最先端手法よりも優れていることを示す。 Reference signals overhead reduction has recently evolved as an effective solution for improving the system spectral efficiency. This paper introduces a new downlink data structure that is free from demodulation reference signals (DM-RS), and hence does not require any channel estimation at the receiver. The new proposed data transmission structure involves a simple repetition step of part of the user data across the different sub-bands. Exploiting the repetition structure at the user side, it is shown that reliable recovery is possible via canonical correlation analysis. This paper also proposes two effective mechanisms for boosting the CCA performance in OFDM systems; one for repetition pattern selection and another to deal with the severe frequency selectivity issues. The proposed approach exhibits favorable complexity-performance tradeoff, rendering it appealing for practical implementation. Numerical results, using a 3GPP link-level testbench, demonstrate the superiority of the proposed approach relative to the state-of-the-art methods.	翻訳日:2023-02-07 19:38:15 公開日:2023-02-04
# 適切な依存、説明可能なAI、人間とAIのコラボレーション、人間とAIの相補性 Appropriate Reliance, Explainable AI, Human-AI Collaboration, Human-AI Complementarity ( http://arxiv.org/abs/2302.02187v1 ) ライセンス: Link先を確認	Max Schemmer, Niklas Kühl, Carina Benz, Andrea Bartos, Gerhard Satzger	(参考訳) AIによるアドバイスは、例えば投資や治療の意思決定において、ますます普及している。このアドバイスは一般に不完全であるため、意思決定者は実際にそのアドバイスに従うかどうかを自ら判断しなければならない。すなわち、正しいアドバイスには「適切に」依存し、誤ったアドバイスは退ける必要がある。しかし、適切な依存に関する現在の研究には、共通の定義も運用可能な測定概念もまだ存在しない。さらに、この行動に影響を及ぼす要因の理解に役立つ詳細な行動実験も行われていない。本稿では、基礎となる定量化可能な2次元の測定概念として、依存の適切性(Appropriateness of Reliance, AoR)を提案する。また、AIアドバイスに説明を付与することの効果を分析する研究モデルを構築する。200人の参加者による実験で、これらの説明がAoRに、ひいてはAIアドバイスの有効性にどのように影響するかを示す。本研究は、依存行動の分析とAIアドバイザの目的に沿った設計のための基本概念に貢献する。 AI advice is becoming increasingly popular, e.g., in investment and medical treatment decisions. As this advice is typically imperfect, decision-makers have to exert discretion as to whether to actually follow that advice: they have to "appropriately" rely on correct and turn down incorrect advice. However, current research on appropriate reliance still lacks a common definition as well as an operational measurement concept. Additionally, no in-depth behavioral experiments have been conducted that help understand the factors influencing this behavior. In this paper, we propose Appropriateness of Reliance (AoR) as an underlying, quantifiable two-dimensional measurement concept. We develop a research model that analyzes the effect of providing explanations for AI advice. In an experiment with 200 participants, we demonstrate how these explanations influence the AoR, and, thus, the effectiveness of AI advice. Our work contributes fundamental concepts for the analysis of reliance behavior and the purposeful design of AI advisors.	翻訳日:2023-02-07 19:38:02 公開日:2023-02-04
# モバイルデバイス上でのリアルタイム画像デモアレ Real-Time Image Demoireing on Mobile Devices ( http://arxiv.org/abs/2302.02184v1 ) ライセンス: Link先を確認	Yuxin Zhang, Mingbao Lin, Xunchao Li, Han Liu, Guozhi Wang, Fei Chao, Shuai Ren, Yafei Wen, Xiaoxin Chen, Rongrong Ji	(参考訳) モアレパターンはデジタルスクリーンを撮影する際に頻繁に現れ、画質を大幅に劣化させる。画像デモアレ(モアレ除去)におけるCNNの進歩にもかかわらず、既存のネットワークは設計が重く、モバイルデバイスに冗長な計算負荷をもたらす。本稿では、デモアレネットワークの高速化に関する最初の研究に着手し、モバイルデバイス上でのリアルタイム展開に向けた動的デモアレ高速化手法(DDA)を提案する。本手法の動機は、モアレパターンは画像全体に不均一に分布することが多いという、単純だが普遍的な事実にある。その結果、モアレのない領域で過剰な計算が浪費される。そこで、画像パッチの複雑さに比例して計算コストを再配分する。この目的のために、モアレパターンの色彩の豊かさ(colorfulness)と周波数情報の両方を考慮した新しいモアレ事前分布(moire prior)を設計し、画像パッチの複雑さを測定する。そして、複雑さの高い画像パッチはより大きなネットワークで復元し、複雑さの低い画像パッチにはより小さなネットワークを割り当てることで、計算負荷を軽減する。最後に、パラメータの追加負担を避けるため、パラメータを共有するスーパーネットの枠組みですべてのネットワークを訓練する。複数のベンチマークでの広範な実験により、提案するDDAの有効性を示す。さらに、Snapdragon 8 Gen 1チップを搭載したVIVO X80 Proスマートフォンで評価した高速化の結果は、本手法が推論時間を劇的に短縮し、モバイルデバイス上でのリアルタイム画像デモアレを実現することを示している。ソースコードとモデルは https://github.com/zyxxmu/DDA で公開されている。 Moire patterns appear frequently when taking photos of digital screens, drastically degrading the image quality. Despite the advance of CNNs in image demoireing, existing networks are with heavy design, causing redundant computation burden for mobile devices. In this paper, we launch the first study on accelerating demoireing networks and propose a dynamic demoireing acceleration method (DDA) towards a real-time deployment on mobile devices. Our stimulus stems from a simple-yet-universal fact that moire patterns often unbalancedly distribute across an image. Consequently, excessive computation is wasted upon non-moire areas. Therefore, we reallocate computation costs in proportion to the complexity of image patches. In order to achieve this aim, we measure the complexity of an image patch by designing a novel moire prior that considers both colorfulness and frequency information of moire patterns. 
Then, we restore image patches with higher-complexity using larger networks and the ones with lower-complexity are assigned with smaller networks to relieve the computation burden. At last, we train all networks in a parameter-shared supernet paradigm to avoid additional parameter burden. Extensive experiments on several benchmarks demonstrate the efficacy of our proposed DDA. In addition, the acceleration evaluated on the VIVO X80 Pro smartphone equipped with a chip of Snapdragon 8 Gen 1 shows that our method can drastically reduce the inference time, leading to a real-time image demoireing on mobile devices. Source codes and models are released at https://github.com/zyxxmu/DDA	翻訳日:2023-02-07 19:37:49 公開日:2023-02-04
# 非定常入力駆動環境におけるオンライン強化学習のための局所制約付きポリシー最適化 Locally Constrained Policy Optimization for Online Reinforcement Learning in Non-Stationary Input-Driven Environments ( http://arxiv.org/abs/2302.02182v1 ) ライセンス: Link先を確認	Pouya Hamadanian, Arash Nasr-Esfahany, Siddartha Sen, Malte Schwarzkopf, Mohammad Alizadeh	(参考訳) 非定常な入力駆動環境、すなわち時間変化する外生的な入力過程が環境のダイナミクスに影響を与える環境におけるオンライン強化学習(RL)を研究する。このような環境では、破滅的忘却(CF)のためにオンラインRLは困難である。エージェントは新しい経験で訓練されるにつれて、以前の知識を忘れがちである。この問題を軽減する従来のアプローチは、タスクラベル(実際には利用できないことが多い)を仮定するか、不安定で性能が低くなりうるオフポリシー手法を用いる。本稿では、現在の経験に対するリターンを最適化しつつ、古い経験に対するポリシー出力を固定(アンカー)することでCFに対抗するオンポリシーRL手法、局所制約付きポリシー最適化(LCPO)を提案する。このアンカリングのために、LCPOは現在の入力分布の外にある経験からのサンプルを用いてポリシー最適化を局所的に制約する。2つのGymおよびコンピュータシステム環境において、さまざまな合成および実際の入力トレースでLCPOを評価した結果、オンライン設定では最先端のオンポリシーおよびオフポリシーRL手法を上回り、入力トレース全体で事前訓練したオフラインエージェントと同等の結果を達成した。 We study online Reinforcement Learning (RL) in non-stationary input-driven environments, where a time-varying exogenous input process affects the environment dynamics. Online RL is challenging in such environments due to catastrophic forgetting (CF). The agent tends to forget prior knowledge as it trains on new experiences. Prior approaches to mitigate this issue assume task labels (which are often not available in practice) or use off-policy methods that can suffer from instability and poor performance. We present Locally Constrained Policy Optimization (LCPO), an on-policy RL approach that combats CF by anchoring policy outputs on old experiences while optimizing the return on current experiences. To perform this anchoring, LCPO locally constrains policy optimization using samples from experiences that lie outside of the current input distribution. We evaluate LCPO in two gym and computer systems environments with a variety of synthetic and real input traces, and find that it outperforms state-of-the-art on-policy and off-policy RL methods in the online setting, while achieving results on-par with an offline agent pre-trained on the whole input trace.	翻訳日:2023-02-07 19:37:25 公開日:2023-02-04
# gan発生器がリアルタイムにネットワークを反転させるモデルステッチングと可視化 Model Stitching and Visualization How GAN Generators can Invert Networks in Real-Time ( http://arxiv.org/abs/2302.02181v1 ) ライセンス: Link先を確認	Rudolf Herdt (1 and 2), Maximilian Schmidt (1 and 2), Daniel Otero Baguer (1 and 2), Jean Le'Clerc Arrastia (1 and 2), Peter Maass (1 and 2) ((1) University of Bremen, (2) aisencia)	(参考訳) 医療分野における批判的応用は、深層学習手法による決定を解釈するために、追加情報を迅速に提供する必要がある。本研究では,畳み込みを利用したGANジェネレータを用いて,分類とセマンティックセグメンテーションネットワークの活性化を高速かつ正確に可視化する手法を提案する。 afhq野生動物データセットの動物画像とステンド組織標本の現実世界のデジタル病理スキャンを用いて実験を行った。提案手法は,これらのデータセット上で,約2桁高速に動作しながら,確立された勾配降下法に匹敵する結果を与える。 Critical applications, such as in the medical field, require the rapid provision of additional information to interpret decisions made by deep learning methods. In this work, we propose a fast and accurate method to visualize activations of classification and semantic segmentation networks by stitching them with a GAN generator utilizing convolutions. We test our approach on images of animals from the AFHQ wild dataset and real-world digital pathology scans of stained tissue samples. Our method provides comparable results to established gradient descent methods on these datasets while running about two orders of magnitude faster.	翻訳日:2023-02-07 19:37:02 公開日:2023-02-04
# 効率的なConvNetによる画像デブラーリングの再考 Revisiting Image Deblurring with an Efficient ConvNet ( http://arxiv.org/abs/2302.02234v1 ) ライセンス: Link先を確認	Lingyan Ruan, Mojtaba Bemana, Hans-Peter Seidel, Karol Myszkowski, Bin Chen	(参考訳) 画像デブラーリングは、ぼやけた画像から潜在的なシャープ画像を復元することを目的としており、コンピュータビジョンに幅広い応用がある。畳み込みニューラルネットワーク(CNN)は長年この領域で良好な性能を示してきたが、最近になってTransformerという別のネットワークアーキテクチャがさらに強力な性能を示している。その優位性は、CNNよりも大きな受容野と優れた入力内容適応性を提供するマルチヘッド自己注意(MHSA)機構に帰することができる。しかし、MHSAは入力解像度に対して二次的に増加する高い計算コストを要するため、高解像度の画像デブラーリングタスクでは実用的でない。本研究では、大きな有効受容野(ERF)を備え、より少ない計算コストでTransformerと同等以上の性能を示す、統一的な軽量CNNネットワークを提案する。鍵となる設計はLaKDと呼ぶ効率的なCNNブロックであり、大カーネルのdepth-wise畳み込みと空間・チャネル混合構造を備え、Transformerと同等以上のERFをより小さいパラメータ規模で実現する。具体的には、デフォーカス/モーションデブラーリングのベンチマークデータセットにおいて、パラメータ数を32%、MACを39%削減しつつ、最先端のRestormerに対して+0.17dB / +0.43dBのPSNR向上を達成する。広範な実験により、ネットワークの優れた性能と各モジュールの有効性を示す。さらに、ERFを定量的に特徴付け、ネットワーク性能と高い相関を示す、コンパクトで直感的な指標ERFMeterを提案する。本研究が、画像デブラーリングタスクにとどまらず、CNNとTransformerアーキテクチャの長所と短所をさらに探求するきっかけとなることを期待する。 Image deblurring aims to recover the latent sharp image from its blurry counterpart and has a wide range of applications in computer vision. The Convolution Neural Networks (CNNs) have performed well in this domain for many years, and until recently an alternative network architecture, namely Transformer, has demonstrated even stronger performance. One can attribute its superiority to the multi-head self-attention (MHSA) mechanism, which offers a larger receptive field and better input content adaptability than CNNs. However, as MHSA demands high computational costs that grow quadratically with respect to the input resolution, it becomes impractical for high-resolution image deblurring tasks. In this work, we propose a unified lightweight CNN network that features a large effective receptive field (ERF) and demonstrates comparable or even better performance than Transformers while bearing less computational costs. 
Our key design is an efficient CNN block dubbed LaKD, equipped with a large kernel depth-wise convolution and spatial-channel mixing structure, attaining comparable or larger ERF than Transformers but with a smaller parameter scale. Specifically, we achieve +0.17dB / +0.43dB PSNR over the state-of-the-art Restormer on defocus / motion deblurring benchmark datasets with 32% fewer parameters and 39% fewer MACs. Extensive experiments demonstrate the superior performance of our network and the effectiveness of each module. Furthermore, we propose a compact and intuitive ERFMeter metric that quantitatively characterizes ERF, and shows a high correlation to the network performance. We hope this work can inspire the research community to further explore the pros and cons of CNN and Transformer architectures beyond image deblurring tasks.	翻訳日:2023-02-07 19:31:11 公開日:2023-02-04
# アラビア語の同義語拡充のためのベンチマークとスコアリングアルゴリズム A Benchmark and Scoring Algorithm for Enriching Arabic Synonyms ( http://arxiv.org/abs/2302.02232v1 ) ライセンス: Link先を確認	Sana Ghanem, Mustafa Jarrar, Radi Jarrar, Ibrahim Bounhas	(参考訳) 本稿では、同義性の強さをファジィ値として考慮しつつ、与えられたsynsetを追加の同義語で拡張するタスクを扱う。単言語/多言語のsynsetとしきい値(ファジィ値 [0-1])が与えられたとき、目標は、既存のレキシコンからこのしきい値を超える新しい同義語を抽出することである。本稿の貢献は、アルゴリズムとベンチマークデータセットの2つである。データセットは、500個のsynsetに対する3K個の候補同義語で構成される。各候補同義語には、4人の言語学者がファジィ値を付与している。このデータセットは、(i) 言語学者が同義性についてどの程度一致(または不一致)するかを理解するため、および (ii) 本アルゴリズムを評価するベースラインとして用いるために重要である。提案アルゴリズムは、既存のレキシコンから同義語を抽出し、各候補に対するファジィ値を計算する。評価の結果、アルゴリズムは言語学者のように振る舞い、そのファジィ値は言語学者が付与した値に近いことが示された(RMSEとMAEを使用)。データセットとデモページは https://portal.sina.birzeit.edu/synonyms で公開されている。 This paper addresses the task of extending a given synset with additional synonyms taking into account synonymy strength as a fuzzy value. Given a mono/multilingual synset and a threshold (a fuzzy value [0-1]), our goal is to extract new synonyms above this threshold from existing lexicons. We present twofold contributions: an algorithm and a benchmark dataset. The dataset consists of 3K candidate synonyms for 500 synsets. Each candidate synonym is annotated with a fuzzy value by four linguists. The dataset is important for (i) understanding how much linguists (dis/)agree on synonymy, in addition to (ii) using the dataset as a baseline to evaluate our algorithm. Our proposed algorithm extracts synonyms from existing lexicons and computes a fuzzy value for each candidate. Our evaluations show that the algorithm behaves like a linguist and its fuzzy values are close to those proposed by linguists (using RMSE and MAE). The dataset and a demo page are publicly available at https://portal.sina.birzeit.edu/synonyms.	翻訳日:2023-02-07 19:30:31 公開日:2023-02-04
# PubGraph: 大規模な科学的時間的知識グラフ PubGraph: A Large Scale Scientific Temporal Knowledge Graph ( http://arxiv.org/abs/2302.02231v1 ) ライセンス: Link先を確認	Kian Ahrabian, Xinwei Du, Richard Delwin Myloth, Arun Baalaaji Sankar Ananthan, Jay Pujara	(参考訳) 研究出版物は、新しい発見、方法、技術、洞察の形で科学的進歩を共有するための主要な手段である。出版物は、内容分析と計量書誌学的構造の両方の観点から研究されてきたが、科学研究のより包括的な研究への障壁は、公開された大規模データや資源の不足である。本稿では、大規模な時間的知識グラフ(KG)の形をとる、科学的進歩を研究するための新たなリソースPubGraphを提案する。PubGraphは4億3200万(432M)以上のノードと154.9億(15.49B)のエッジを含み、広く使われているWikidataオントロジーにマッピングされている。異なる規模での実験を可能にするため、PubGraphからサイズの異なる3つのKGを抽出する。これらのKGを用いて、時間的に整合した訓練・検証・テスト分割を備えた、トランスダクティブおよびインダクティブ設定のための新しいリンク予測ベンチマークを導入する。さらに、PubGraphにより適した2つの新しい帰納的学習手法を開発する。これらは明示的な特徴を持たない未知のノード上で動作し、大規模なKGにスケールし、既存モデルを上回る性能を示す。結果は、過去の引用の構造的特徴だけで、新たな出版物に関する高品質な予測が可能であることを示している。また、敵対的なコミュニティベースのリンク予測設定、ゼロショット帰納的学習、大規模学習など、KGモデルにとっての新たな課題を特定する。 Research publications are the primary vehicle for sharing scientific progress in the form of new discoveries, methods, techniques, and insights. Publications have been studied from the perspectives of both content analysis and bibliometric structure, but a barrier to more comprehensive studies of scientific research is a lack of publicly accessible large-scale data and resources. In this paper, we present PubGraph, a new resource for studying scientific progress that takes the form of a large-scale temporal knowledge graph (KG). It contains more than 432M nodes and 15.49B edges mapped to the popular Wikidata ontology. We extract three KGs with varying sizes from PubGraph to allow experimentation at different scales. Using these KGs, we introduce a new link prediction benchmark for transductive and inductive settings with temporally-aligned training, validation, and testing partitions. Moreover, we develop two new inductive learning methods better suited to PubGraph, operating on unseen nodes without explicit features, scaling to large KGs, and outperforming existing models. 
Our results demonstrate that structural features of past citations are sufficient to produce high-quality predictions about new publications. We also identify new challenges for KG models, including an adversarial community-based link prediction setting, zero-shot inductive learning, and large-scale learning.	翻訳日:2023-02-07 19:30:07 公開日:2023-02-04
# フェルミオンガウス状態のエンタングルメント容量 Entanglement capacity of fermionic Gaussian states ( http://arxiv.org/abs/2302.02229v1 ) ライセンス: Link先を確認	Youyi Huang and Lu Wei	(参考訳) フェルミオンガウス状態上の量子二部系のエンタングルメントの度合いを推定する際に、エンタングルメントエントロピーの代替となるエンタングルメント容量について研究する。特に、粒子数制約がある場合とない場合の2つのケースについて、平均容量の厳密な公式と漸近公式を導出する。後者の場合、得られた公式は文献にある平均容量に関するいくつかの部分的な結果を一般化する。結果の導出の鍵となる要素は、フェルミオンガウス状態のエンタングルメントエントロピーの研究でごく最近開発された、有限和を簡約するための新しいツール群である。 We study the capacity of entanglement as an alternative to entanglement entropies in estimating the degree of entanglement of quantum bipartite systems over fermionic Gaussian states. In particular, we derive the exact and asymptotic formulas of average capacity of two different cases - with and without particle number constraints. For the latter case, the obtained formulas generalize some partial results of average capacity in the literature. The key ingredient in deriving the results is a set of new tools for simplifying finite summations developed very recently in the study of entanglement entropy of fermionic Gaussian states.	翻訳日:2023-02-07 19:29:32 公開日:2023-02-04
# 単射因果モデルの反事実識別可能性 Counterfactual Identifiability of Bijective Causal Models ( http://arxiv.org/abs/2302.02228v1 ) ライセンス: Link先を確認	Arash Nasr-Esfahany, Mohammad Alizadeh, Devavrat Shah	(参考訳) 文献で広く使われている複数の因果関係モデルを一般化するクラスであるBGM(Bijective Generation Mechanism)を用いた因果関係モデルの因果関係同定可能性について検討した。本研究では,観測不能な3つの共通因果構造に対して,BGMの学習を構造的生成モデルとして活用する実践的学習手法を提案する。学習されたBGMは効果的な反ファクト推定を可能にし、様々な深い条件生成モデルを用いて得ることができる。本手法を視覚的タスクで評価し,実世界のビデオストリーミングシミュレーションタスクにおけるその応用を実証する。 We study counterfactual identifiability in causal models with bijective generation mechanisms (BGM), a class that generalizes several widely-used causal models in the literature. We establish their counterfactual identifiability for three common causal structures with unobserved confounding, and propose a practical learning method that casts learning a BGM as structured generative modeling. Learned BGMs enable efficient counterfactual estimation and can be obtained using a variety of deep conditional generative models. We evaluate our techniques in a visual task and demonstrate its application in a real-world video streaming simulation task.	翻訳日:2023-02-07 19:29:19 公開日:2023-02-04
# TAP: ラベルなしデータからのクロスモーダルな知識伝達のための注意パッチ TAP: The Attention Patch for Cross-Modal Knowledge Transfer from Unlabeled Data ( http://arxiv.org/abs/2302.02224v1 ) ライセンス: Link先を確認	Yinsong Wang, Shahin Shahrampour	(参考訳) 本研究は,クロスモーダル学習とセミ教師あり学習の交点について検討し,未ラベルのモーダルから欠落情報を借りることにより,一次モーダルの教師あり学習性能を向上させることを目的とする。ナダラヤ・ワトソン(NW)カーネル回帰の観点からこの問題を考察し、この定式化が暗黙的にカーネル化されたクロスアテンションモジュールにつながることを示す。そこで本研究では,ラベルのないモダリティからデータレベル知識の転送を可能にする単純なニューラルネットワークプラグインである attention patch (tap) を提案する。実世界の3つのデータセット上で数値シミュレーションを行い、TAPのそれぞれの側面を調べ、ニューラルネットワークにおけるTAP統合が、ラベルのないモダリティを用いて一般化性能を向上させることを示す。 This work investigates the intersection of cross modal learning and semi supervised learning, where we aim to improve the supervised learning performance of the primary modality by borrowing missing information from an unlabeled modality. We investigate this problem from a Nadaraya Watson (NW) kernel regression perspective and show that this formulation implicitly leads to a kernelized cross attention module. To this end, we propose The Attention Patch (TAP), a simple neural network plugin that allows data level knowledge transfer from the unlabeled modality. We provide numerical simulations on three real world datasets to examine each aspect of TAP and show that a TAP integration in a neural network can improve generalization performance using the unlabeled modality.	翻訳日:2023-02-07 19:29:10 公開日:2023-02-04
# multi-armed adversarial attack detection に対する minimax アプローチ A Minimax Approach Against Multi-Armed Adversarial Attacks Detection ( http://arxiv.org/abs/2302.02216v1 ) ライセンス: Link先を確認	Federica Granese, Marco Romanelli, Siddharth Garg, Pablo Piantanida	(参考訳) 複数のアルゴリズムと目標損失関数を同時に使用するマルチアーム対向攻撃は、検出機構の特定の側情報を必要としない状態で、最先端の対向検知器を騙すことに成功している。問題の定式化により,複数の事前学習検出器のソフト確率出力をミニマックス法に従って集約する解を提案することができる。提案するフレームワークは数学的に健全で実装が容易でモジュール化されており、既存の検出器や将来の検出器を統合することができる。一般的なデータセット(例えば CIFAR10 や SVHN など)の広範な評価を通じて、我々のアグリゲーションは、多武装の敵攻撃に対する個々の最先端検出器よりも一貫して優れており、利用可能なメソッドのレジリエンスを改善する効果的なソリューションであることを示す。 Multi-armed adversarial attacks, in which multiple algorithms and objective loss functions are simultaneously used at evaluation time, have been shown to be highly successful in fooling state-of-the-art adversarial examples detectors while requiring no specific side information about the detection mechanism. By formalizing the problem at hand, we can propose a solution that aggregates the soft-probability outputs of multiple pre-trained detectors according to a minimax approach. The proposed framework is mathematically sound, easy to implement, and modular, allowing for integrating existing or future detectors. Through extensive evaluation on popular datasets (e.g., CIFAR10 and SVHN), we show that our aggregation consistently outperforms individual state-of-the-art detectors against multi-armed adversarial attacks, making it an effective solution to improve the resilience of available methods.	翻訳日:2023-02-07 19:28:55 公開日:2023-02-04
# CNNを用いた教師なしリフティングによる変分多チャンネル多クラスセグメンテーション Variational multichannel multiclass segmentation using unsupervised lifting with CNNs ( http://arxiv.org/abs/2302.02214v1 ) ライセンス: Link先を確認	Nadja Gruber, Johannes Schwab, Sebastien Court, Elke Gizewski, Markus Haltmeier	(参考訳) 本稿では、変分エネルギー汎関数と深層畳み込みニューラルネットワークを組み合わせた教師なし画像セグメンテーション手法を提案する。変分部分は、複数の入力画像から有用な情報を同時に抽出できる、最近のマルチチャネル多相Chan-Veseモデルに基づいている。与えられた画像を$K$個の異なる領域に分割する柔軟な多クラスセグメンテーション手法を実装する。画像の事前分解のために畳み込みニューラルネットワーク(CNN)を用いる。その後、セグメンテーション汎関数を最小化することで、最終的なセグメンテーションが完全に教師なしで得られる。セグメンテーションの出発点となる有益な特徴マップの抽出に特に重点を置く。初期の結果は、提案手法がテクスチャ画像や医用画像などさまざまな種類の画像の領域を分解・分割できることを示しており、その性能を別の多相セグメンテーション手法と比較する。 We propose an unsupervised image segmentation approach, that combines a variational energy functional and deep convolutional neural networks. The variational part is based on a recent multichannel multiphase Chan-Vese model, which is capable to extract useful information from multiple input images simultaneously. We implement a flexible multiclass segmentation method that divides a given image into $K$ different regions. We use convolutional neural networks (CNNs) targeting a pre-decomposition of the image. By subsequently minimising the segmentation functional, the final segmentation is obtained in a fully unsupervised manner. Special emphasis is given to the extraction of informative feature maps serving as a starting point for the segmentation. The initial results indicate that the proposed method is able to decompose and segment the different regions of various types of images, such as texture and medical images, and we compare its performance with another multiphase segmentation method.	翻訳日:2023-02-07 19:28:37 公開日:2023-02-04
# CosPGD : 画素単位の予測タスクに対する統一的なホワイトボックス敵対的攻撃 CosPGD: a unified white-box adversarial attack for pixel-wise prediction tasks ( http://arxiv.org/abs/2302.02213v1 ) ライセンス: Link先を確認	Shashank Agnihotri and Margret Keuper	(参考訳) ニューラルネットワークは多くのタスクで高精度な予測を可能にするが、わずかな入力摂動に対してさえ頑健性を欠くことが、多くの実世界アプリケーションへの展開を妨げている。そのため、代表的なprojected gradient descent (PGD) 攻撃やその後の研究・ベンチマークなど、ニューラルネットワークの頑健性評価に向けた最近の研究が大きな注目を集めている。しかし、こうした手法は主に分類タスクに焦点を当てており、セマンティックセグメンテーション、オプティカルフロー、視差推定といった画素単位の予測タスクの分析を特に扱うアプローチはごくわずかである。注目すべき例外は最近提案されたSegPGD攻撃であり、セマンティックセグメンテーションの評価における画素単位の攻撃の重要性を示した。SegPGDは画素単位の分類(すなわちセグメンテーション)に限定されるが、本研究では、任意の画素単位の予測タスクに対する専用の攻撃を統一的な設定で最適化できる、新しいホワイトボックス敵対的攻撃CosPGDを提案する。CosPGDは予測と正解(ground truth)のコサイン類似度を利用し、分類タスクから回帰設定へ直接拡張する。さらに、セマンティックセグメンテーションに加えてオプティカルフローと視差推定においても、CosPGDの優れた性能を実証的に示す。 While neural networks allow highly accurate predictions in many tasks, their lack in robustness towards even slight input perturbations hampers their deployment in many real-world applications. Recent research towards evaluating the robustness of neural networks such as the seminal projected gradient descent (PGD) attack and subsequent works and benchmarks have therefore drawn significant attention. Yet, such methods focus predominantly on classification tasks, while only a few approaches specifically address the analysis of pixel-wise prediction tasks such as semantic segmentation, optical flow, or disparity estimation. One notable exception is the recently proposed SegPGD attack, which could showcase the importance of pixel-wise attacks for evaluating semantic segmentation. While SegPGD is limited to pixel-wise classification (i.e. segmentation), in this work, we propose CosPGD, a novel white-box adversarial attack that allows to optimize dedicated attacks for any pixel-wise prediction task in a unified setting. It leverages the cosine similarity between the predictions and ground truth to extend directly from classification tasks to regression settings. 
Further, we empirically show the superior performance of CosPGD for semantic segmentation as well as for optical flow and disparity estimation.	翻訳日:2023-02-07 19:28:23 公開日:2023-02-04
# 環境不均質性を考慮した線形関数近似による連立時間差分学習 Federated Temporal Difference Learning with Linear Function Approximation under Environmental Heterogeneity ( http://arxiv.org/abs/2302.02212v1 ) ライセンス: Link先を確認	Han Wang, Aritra Mitra, Hamed Hassani, George J. Pappas, James Anderson	(参考訳) 政策評価問題を考慮して,環境不均質性下での連帯強化学習の研究を開始する。我々のセットアップは、同じ状態とアクション空間を共有するが、報酬関数と状態遷移カーネルが異なる環境と相互作用する$N$エージェントを含んでいる。エージェントが中央サーバーを介して通信できると仮定すると、情報交換は共通のポリシーを評価するプロセスを早めるだろうか? そこで我々は,マルコフ的サンプリング,エージェントの環境の不均一性,通信の節約のための複数の局所的更新を考慮しつつ,線形関数近似を用いたフェデレーション時間差学習アルゴリズム(TD)の総合的有限時間解析を行った。私たちの分析はいくつかの新しい材料に依存しています i) エージェントの基本マルコフ決定過程(MDPs)における不均一性の関数としてのTD固定点上の摂動境界の導出 (II)フェデレートされたTDアルゴリズムの力学を密に近似する仮想MDPを導入し、 (iii) 仮想MDPを用いて、フェデレーション最適化に明示的な接続を行う。これらの部品を組み立てることで、低均一性状態において、モデル推定の交換がエージェント数の線形収束速度向上につながることを厳密に証明する。 We initiate the study of federated reinforcement learning under environmental heterogeneity by considering a policy evaluation problem. Our setup involves $N$ agents interacting with environments that share the same state and action space but differ in their reward functions and state transition kernels. Assuming agents can communicate via a central server, we ask: Does exchanging information expedite the process of evaluating a common policy? To answer this question, we provide the first comprehensive finite-time analysis of a federated temporal difference (TD) learning algorithm with linear function approximation, while accounting for Markovian sampling, heterogeneity in the agents' environments, and multiple local updates to save communication. Our analysis crucially relies on several novel ingredients: (i) deriving perturbation bounds on TD fixed points as a function of the heterogeneity in the agents' underlying Markov decision processes (MDPs); (ii) introducing a virtual MDP to closely approximate the dynamics of the federated TD algorithm; and (iii) using the virtual MDP to make explicit connections to federated optimization. 
Putting these pieces together, we rigorously prove that in a low-heterogeneity regime, exchanging model estimates leads to linear convergence speedups in the number of agents.	翻訳日:2023-02-07 19:28:00 公開日:2023-02-04
# NeuRI: 帰納的ルール推論によるDNN生成の多様化 NeuRI: Diversifying DNN Generation via Inductive Rule Inference ( http://arxiv.org/abs/2302.02261v1 ) ライセンス: Link先を確認	Jiawei Liu, Jinjun Peng, Yuyao Wang, Lingming Zhang	(参考訳) ディープラーニング(DL)は、絶えず進化するDLライブラリとコンパイラに支えられ、意思決定の改善やプロセスの自動化のためにさまざまな業界で広く使われている。DLシステムの正しさは、DLアプリケーションへの信頼にとって極めて重要である。そのため、最近の一連の研究は、DLシステムをファジングするためのテストケース(すなわちDNNモデルとその入力)の自動合成に取り組んでいる。しかし、既存のモデル生成器は、演算子の制約を広範にモデル化する能力を欠くため、限られた数の演算子しか扱えない。この課題に対処するため、数百種類の演算子からなる有効かつ多様なDLモデルを生成する完全自動のアプローチNeuRIを提案する。NeuRIは次の3段階のプロセスを採用する。(i) さまざまな情報源から有効および無効なAPIトレースを収集する。(ii) トレースに帰納的プログラム合成を適用し、有効なモデルを構築するための制約を推論する。(iii) シンボリック演算子と具体的(concrete)演算子をコンコリックに組み合わせてハイブリッドなモデル生成を行う。評価の結果、NeuRIはTensorFlowとPyTorchの分岐カバレッジを最先端手法に対してそれぞれ51%と15%改善した。4か月の間に、NeuRIはPyTorchとTensorFlowで87件の新しいバグを発見し、うち64件はすでに修正または確認済みである。また、PyTorchが高優先度とラベル付けしたバグが8件あり、これは同期間の高優先度バグ全体の10%を占める。さらに、オープンソース開発者は、我々が報告したエラー誘発モデルを「高品質」かつ「実用上一般的」と評価している。 Deep Learning (DL) is prevalently used in various industries to improve decision-making and automate processes, driven by the ever-evolving DL libraries and compilers. The correctness of DL systems is crucial for trust in DL applications. As such, the recent wave of research has been studying the automated synthesis of test-cases (i.e., DNN models and their inputs) for fuzzing DL systems. However, existing model generators only subsume a limited number of operators, for lacking the ability to pervasively model operator constraints. To address this challenge, we propose NeuRI, a fully automated approach for generating valid and diverse DL models composed of hundreds of types of operators. NeuRI adopts a three-step process: (i) collecting valid and invalid API traces from various sources; (ii) applying inductive program synthesis over the traces to infer the constraints for constructing valid models; and (iii) performing hybrid model generation by incorporating both symbolic and concrete operators concolically. 
Our evaluation shows that NeuRI improves branch coverage of TensorFlow and PyTorch by 51% and 15% over the state-of-the-art. Within four months, NeuRI finds 87 new bugs for PyTorch and TensorFlow, with 64 already fixed or confirmed, and 8 high-priority bugs labeled by PyTorch, constituting 10% of all high-priority bugs of the period. Additionally, open-source developers regard error-inducing models reported by us as "high-quality" and "common in practice".	翻訳日:2023-02-07 19:21:17 公開日:2023-02-04
# CLiNet:2次元および3次元における道路ネットワーク中心線の共同検出 CLiNet: Joint Detection of Road Network Centerlines in 2D and 3D ( http://arxiv.org/abs/2302.02259v1 ) ライセンス: Link先を確認	David Paz, Srinidhi Kalgundi Srinivas, Yunchao Yao, and Henrik I. Christensen	(参考訳) 本研究は,2次元と3次元で共同で特徴をローカライズすることで,画像データに基づく中心線の共同検出のための新しいアプローチを提案する。視覚手がかりの検出に焦点を当てた既存の研究とは対照的に,都市運転タスクに直結する特徴抽出手法について検討する。 AV Breadcrumbsと呼ばれる大規模都市走行データセットをベクトル地図表現と射影幾何学を利用して自動的にラベル付けし,900,000以上の画像に注釈を付ける。本研究は,様々な都市走行シナリオにおける動的シーンモデリングの可能性を示す。本モデルではF1スコアが0.684、平均正規化深度誤差が2.083である。コードとデータアノテーションが公開されている。 This work introduces a new approach for joint detection of centerlines based on image data by localizing the features jointly in 2D and 3D. In contrast to existing work that focuses on detection of visual cues, we explore feature extraction methods that are directly amenable to the urban driving task. To develop and evaluate our approach, a large urban driving dataset dubbed AV Breadcrumbs is automatically labeled by leveraging vector map representations and projective geometry to annotate over 900,000 images. Our results demonstrate potential for dynamic scene modeling across various urban driving scenarios. Our model achieves an F1 score of 0.684 and an average normalized depth error of 2.083. The code and data annotations are publicly available.	翻訳日:2023-02-07 19:20:53 公開日:2023-02-04
# 同時音楽生成と分離のためのマルチソース拡散モデル Multi-Source Diffusion Models for Simultaneous Music Generation and Separation ( http://arxiv.org/abs/2302.02257v1 ) ライセンス: Link先を確認	Giorgio Mariani, Irene Tallini, Emilian Postolache, Michele Mancusi, Luca Cosmo, Emanuele Rodolà	(参考訳) 本研究では、コンテキストを共有する音源の同時確率密度のスコアを学習することで、音楽合成と音源分離の両方が可能な拡散ベースの生成モデルを定義する。古典的な全体推論タスク(すなわち、ミックスの生成と音源の分離)に加えて、他の音源を条件として音源の一部を生成する(例えば、ドラムによく合うピアノトラックを演奏する)、音源補完(source imputation)という部分推論タスクも導入して実験する。さらに、分離タスクのための新しい推論手法を導入する。音源分離の標準データセットであるSlakh2100でモデルを訓練し、生成設定では定性的な結果を提供し、分離設定では競争力のある定量的結果を示す。本手法は、生成と分離の両方を扱える単一モデルの最初の例であり、汎用的な音声モデルへの一歩となる。 In this work, we define a diffusion-based generative model capable of both music synthesis and source separation by learning the score of the joint probability density of sources sharing a context. Alongside the classic total inference tasks (i.e. generating a mixture, separating the sources), we also introduce and experiment on the partial inference task of source imputation, where we generate a subset of the sources given the others (e.g., play a piano track that goes well with the drums). Additionally, we introduce a novel inference method for the separation task. We train our model on Slakh2100, a standard dataset for musical source separation, provide qualitative results in the generation settings, and showcase competitive quantitative results in the separation setting. Our method is the first example of a single model that can handle both generation and separation tasks, thus representing a step toward general audio models.	翻訳日:2023-02-07 19:20:43 公開日:2023-02-04
# 学習型レンズレスイメージングによる非知覚的識別 Human-Imperceptible Identification with Learnable Lensless Imaging ( http://arxiv.org/abs/2302.02255v1 ) ライセンス: Link先を確認	Thuong Nguyen Canh, Trung Thanh Ngo, Hajime Nagahara	(参考訳) レンズレスイメージングは、人間が被写体を認識できないが、機械が情報を推測するのに十分な情報を含む、大きくぼやけた画像を撮影することで、視覚プライバシを保護する。残念ながら、視覚的プライバシー保護は、認識精度の低下と、その逆が伴う。認識精度を維持しつつ、視覚プライバシを保護する学習可能なレンズレスイメージングフレームワークを提案する。得られた画像が人間に知覚できないようにするために,全変動,可逆性,および制限等尺性に基づくいくつかの損失関数を設計した。主観的評価に基づいて,プライバシー保護と曖昧さが個人識別に及ぼす影響を定量的に検討した。さらに,光リソグラフィー方式のマスクを用いたレンズレス画像のハードウェア実現によるシミュレーションの検証を行った。 Lensless imaging protects visual privacy by capturing heavily blurred images that are imperceptible for humans to recognize the subject but contain enough information for machines to infer information. Unfortunately, protecting visual privacy comes with a reduction in recognition accuracy and vice versa. We propose a learnable lensless imaging framework that protects visual privacy while maintaining recognition accuracy. To make captured images imperceptible to humans, we designed several loss functions based on total variation, invertibility, and the restricted isometry property. We studied the effect of privacy protection with blurriness on the identification of personal identity via a quantitative method based on a subjective evaluation. Moreover, we validate our simulation by implementing a hardware realization of lensless imaging with photo-lithographically printed masks.	翻訳日:2023-02-07 19:20:29 公開日:2023-02-04
# 密度特徴を持つ低域MDPにおける強化学習 Reinforcement Learning in Low-Rank MDPs with Density Features ( http://arxiv.org/abs/2302.02252v1 ) ライセンス: Link先を確認	Audrey Huang, Jinglin Chen, Nan Jiang	(参考訳) 低ランクな遷移を持つMDP -- すなわち、遷移行列は、左右の2つの行列の積に分解できる -- は、抽出可能な学習を可能にする非常に代表的な構造である。左行列は、値に基づく学習のための表現関数近似を可能にし、広く研究されている。そこで本研究では,密度特性を用いたサンプル効率学習,すなわち,状態占有分布の強力なモデルを生成する正しい行列について検討する。この設定は、教師なし学習をRLで活用するだけでなく、凸RLのプラグインソリューションを可能にする。オフライン環境では,非探索的なデータを処理可能な占有者のオフポリシー推定アルゴリズムを提案する。これをサブルーチンとして、探索的データ分布をレベルバイレベルに構築するオンラインアルゴリズムをさらに考案する。中心的な技術的課題として、占有率推定の付加誤差は、データカバレッジの乗法的定義とは相容れない。到達性のような強い仮定がなければ、この非互換性は、新しい技術ツールによって克服された指数的エラーの爆発を引き起こす。また, 密度特徴が不明であり, 指数関数的に大きな候補集合から学習する必要がある場合, 表現学習環境にも容易に拡張できる。 MDPs with low-rank transitions -- that is, the transition matrix can be factored into the product of two matrices, left and right -- is a highly representative structure that enables tractable learning. The left matrix enables expressive function approximation for value-based learning and has been studied extensively. In this work, we instead investigate sample-efficient learning with density features, i.e., the right matrix, which induce powerful models for state-occupancy distributions. This setting not only sheds light on leveraging unsupervised learning in RL, but also enables plug-in solutions for convex RL. In the offline setting, we propose an algorithm for off-policy estimation of occupancies that can handle non-exploratory data. Using this as a subroutine, we further devise an online algorithm that constructs exploratory data distributions in a level-by-level manner. As a central technical challenge, the additive error of occupancy estimation is incompatible with the multiplicative definition of data coverage. In the absence of strong assumptions like reachability, this incompatibility easily leads to exponential error blow-up, which we overcome via novel technical tools. Our results also readily extend to the representation learning setting, when the density features are unknown and must be learned from an exponentially large candidate set.	翻訳日:2023-02-07 19:20:16 公開日:2023-02-04
# ジャマー耐性周波数とパワーアロケーションのための深層強化学習の一般化 Generalization of Deep Reinforcement Learning for Jammer-Resilient Frequency and Power Allocation ( http://arxiv.org/abs/2302.02250v1 ) ライセンス: Link先を確認	Swatantra Kafle, Jithin Jagannath, Zackary Kane, Noor Biswas, Prem Sagar Vasanth Kumar, Anu Jagannath	(参考訳) 我々は,深層強化学習モデルの一般化能力を強調しつつ,結合周波数と電力配分の問題に取り組む。既存の手法の多くは、事前決定された無線ネットワークシナリオの強化学習ベースのワイヤレス問題を解決する。訓練されたエージェントのパフォーマンスはネットワークに非常に特有であり、異なるネットワーク運用シナリオ(例えば、サイズ、周辺、移動性など)で使用されると劣化する傾向がある。本稿では,分散マルチエージェント環境におけるデプロイモデルの推論において,より高度な一般化機能を実現するためのトレーニング強化手法を提案する。これらの結果から,従来は見つからなかった異なるサイズとアーキテクチャの無線ネットワーク上で,提案手法のトレーニングと推論性能が向上したことを示す。さらに重要なことは、実用的な影響を証明するために、組込みソフトウェア定義無線にエンドツーエンドのソリューションを実装し、オーバー・ザ・エア評価を用いて検証したことである。 We tackle the problem of joint frequency and power allocation while emphasizing the generalization capability of a deep reinforcement learning model. Most of the existing methods solve reinforcement learning-based wireless problems for a specific pre-determined wireless network scenario. The performance of a trained agent tends to be very specific to the network and deteriorates when used in a different network operating scenario (e.g., different in size, neighborhood, and mobility, among others). We demonstrate our approach to enhance training to enable a higher generalization capability during inference of the deployed model in a distributed multi-agent setting in a hostile jamming environment. With all these, we show the improved training and inference performance of the proposed methods when tested on previously unseen simulated wireless networks of different sizes and architectures. More importantly, to prove practical impact, the end-to-end solution was implemented on the embedded software-defined radio and validated using over-the-air evaluation.	翻訳日:2023-02-07 19:19:57 公開日:2023-02-04
# 視覚コレクション拡張のための自己教師付きマルチビューディスタングル Self-supervised Multi-view Disentanglement for Expansion of Visual Collections ( http://arxiv.org/abs/2302.02249v1 ) ライセンス: Link先を確認	Nihal Jain, Praneetha Vaddamanu, Paridhi Maheshwari, Vishwa Vinay, Kuldeep Kulkarni	(参考訳) 画像検索エンジンは、クエリ画像に関連する画像の検索を可能にする。本研究では,類似画像に対するクエリが画像の集合から導出されるような設定について検討する。視覚的検索では、類似度の測定は複数の軸、あるいはスタイルや色などのビューに沿って行われる。我々は、特徴抽出器のセットへのアクセスを想定し、それぞれが特定のビューの表現を計算する。本研究の目的は,複数視点から計算した表現上の類似性を効果的に結合した検索アルゴリズムを設計することである。そこで本研究では,視間重なりを最小限に抑えるために,画像の絡み合った視点特異的表現を抽出する自己教師あり学習法を提案する。これによって、ビュー上の分散としてコレクションの意図を計算することができることを示す。クエリコレクションの意図にマッチする候補拡張画像を優先順位付けすることにより,効果的な検索を行う方法を示す。最後に,本稿で提示した手法を用いて,複数のコレクションを合成して検索することにより,画像検索のための新たな検索機構を提案する。 Image search engines enable the retrieval of images relevant to a query image. In this work, we consider the setting where a query for similar images is derived from a collection of images. For visual search, the similarity measurements may be made along multiple axes, or views, such as style and color. We assume access to a set of feature extractors, each of which computes representations for a specific view. Our objective is to design a retrieval algorithm that effectively combines similarities computed over representations from multiple views. To this end, we propose a self-supervised learning method for extracting disentangled view-specific representations for images such that the inter-view overlap is minimized. We show how this allows us to compute the intent of a collection as a distribution over views. We show how effective retrieval can be performed by prioritizing candidate expansion images that match the intent of a query collection. Finally, we present a new querying mechanism for image search enabled by composing multiple collections and perform retrieval under this setting using the techniques presented in this paper.	翻訳日:2023-02-07 19:19:43 公開日:2023-02-04
# 二元分類におけるラベル保護のためのganベース連合学習 GAN-based federated learning for label protection in binary classification ( http://arxiv.org/abs/2302.02245v1 ) ライセンス: Link先を確認	Yujin Han, Leying Guan	(参考訳) 新たな技術として、垂直連合学習は異なるデータソースと連携して、データ交換なしで機械学習モデルを共同訓練する。しかし、フェデレーション学習は複雑な暗号アルゴリズムとセキュアな計算プロトコルによるモデリングにおいて計算コストが高く非効率である。分割学習はこれらの課題を回避する代替ソリューションを提供する。しかし、バニラ分割学習は依然としてプライバシー漏洩に悩まされている。本稿では,GAFM(Generative Adversarial Federated Model)を提案する。GAN(Generative Adversarial Network)とバニラ分割学習フレームワークを統合し,バイナリ分類タスクにおける勾配からのラベル漏洩を防止する。この提案をmarvel、max norm、splitnnといった既存モデルと比較し、gafmは分類精度とラベルのプライバシー保護のトレードオフに関して大きな改善を示している。また,GAFMがベースラインよりも改善できる理由をヒューリスティックに正当化し,SplitNNと比較して勾配摂動によるラベル保護が可能であることを示す。 As an emerging technique, vertical federated learning collaborates with different data sources to jointly train a machine learning model without data exchange. However, federated learning is computationally expensive and inefficient in modeling due to complex encryption algorithms and secure computation protocols. Split learning offers an alternative solution to circumvent these challenges. Despite this, vanilla split learning still suffers privacy leakage. Here, we propose the Generative Adversarial Federated Model (GAFM), which integrates the vanilla split learning framework with the Generative Adversarial Network (GAN) for protection against label leakage from gradients in binary classification tasks. We compare our proposal to existing models, including Marvell, Max Norm, and SplitNN, on three publicly available datasets, where GAFM shows significant improvement regarding the trade-off between classification accuracy and label privacy protection. We also provide heuristic justification for why GAFM can improve over baselines and demonstrate that GAFM offers label protection through gradient perturbation compared to SplitNN.	翻訳日:2023-02-07 19:19:30 公開日:2023-02-04
# 等角化半教師付きランダム森林の分類と異常検出 Conformalized semi-supervised random forest for classification and abnormality detection ( http://arxiv.org/abs/2302.02237v1 ) ライセンス: Link先を確認	Yujin Han, Mingwenchan Xu, Leying Guan	(参考訳) 従来の分類器は、トレーニングとテストサンプルが同じ分布から生成されるという前提の下でラベルを推論する。この仮定は、医療診断やネットワークアタック検出などの安全クリティカルな応用において問題となる可能性がある。本稿では,トレーニングデータとテストデータが異なる分布を持つ場合のマルチクラス分類問題について考察する。本研究では,整合型半教師付きランダムフォレスト(CSForest)を提案する。これは,設定値の予測を$C(x)$で構成し,正しいクラスラベルを所望の確率で含むとともに,効率よく外れ値を検出する。本提案手法は,提案手法の強みを示すために,合成例と実データアプリケーションの両方において,他の最先端手法と比較する。 Traditional classifiers infer labels under the premise that the training and test samples are generated from the same distribution. This assumption can be problematic for safety-critical applications such as medical diagnosis and network attack detection. In this paper, we consider the multi-class classification problem when the training data and the test data may have different distributions. We propose conformalized semi-supervised random forest (CSForest), which constructs set-valued predictions $C(x)$ to include the correct class label with desired probability while detecting outliers efficiently. We compare the proposed method to other state-of-art methods in both a synthetic example and a real data application to demonstrate the strength of our proposal.	翻訳日:2023-02-07 19:19:11 公開日:2023-02-04
# 二元系ボース-アインシュタイン凝縮体のポラロン Polarons in Binary Bose-Einstein Condensates ( http://arxiv.org/abs/2206.13738v3 ) ライセンス: Link先を確認	Ning Liu and Z. C. Tu	(参考訳) 不純物とボース・アインシュタイン凝縮の集団励起はボース・ポーラロンの出現に繋がる。本稿では,Lee-Low-Pines変分アプローチの枠組みにおいて,二成分ボース・アインシュタイン凝縮物に浸漬した単一不純物の性質について検討する。 2種類の効果的な不純物-フォノン相互作用を持つ有効Fröhlich Hamiltonianを導出する。低フォノンモードと結合した不純物の挙動は相分離に対する安定性条件によって制約される。比重および不等質量の2成分浴におけるボースポーラロンのエネルギー,有効質量,およびフォノン数を明示的に解析した。例えば、ポラロンエネルギーの1つの分枝は、下分枝が減少している間に、種間散乱長の単調に増加する関数である。特に, 低いフォノンモードと結合した不純物の分岐は, 相分離近傍で劇的な変化を示す。不等質量ボソンの場合、2つの枝が種間質量の置換によってつながっていることが分かる。以上の結果は,多成分のボース浴におけるポーラロンの挙動を基礎的に理解する。 Impurities coupled with collective excitations of Bose-Einstein condensates lead to the emergence of Bose polarons. In this paper, we investigate the properties of a single impurity immersed in binary Bose-Einstein condensates in the framework of the Lee-Low-Pines variational approach. We derive an effective Fröhlich Hamiltonian with two kinds of effective impurity-phonon interactions. The behavior of impurity coupled with the lower phonon mode is constrained by the stability condition against phase separation. We show explicit analytical results of the energy, effective mass, and phonon number for Bose polaron in interacting binary baths with equal mass and unequal mass of species. For the equal-mass boson bath, we find the opposite behaviors of two branches in terms of the scattering length between two species, e.g., one branch of polaron energy is a monotonically increasing function of the interspecific scattering length while the lower branch is decreasing. Especially, the branch of impurities coupled with the lower phonon modes exhibits a dramatic change in the vicinity of phase separation. In the case of unequal-mass bosons, we find two branches are connected by the permutation of interspecific mass. The above results provide a fundamental understanding of the behaviors of polarons in Bose baths with multiple components.	翻訳日:2023-02-07 12:55:03 公開日:2023-02-04
# 画像検索における不確実性定量化のためのベイズ計量学習 Bayesian Metric Learning for Uncertainty Quantification in Image Retrieval ( http://arxiv.org/abs/2302.01332v2 ) ライセンス: Link先を確認	Frederik Warburg, Marco Miani, Silas Brack, Soren Hauberg	(参考訳) 計量学習のための最初のベイズエンコーダを提案する。従来の研究では、ニューラル・アモーティゼーションに頼るのではなく、Laplace Approximationでネットワーク重みの分布を学習する。まず、対照的な損失が有効なログポストであることを示す。次に、正の確定ヘッシアンを保証する3つの方法を提案する。最後に,一般化ガウスニュートン近似の新たな分解法を提案する。実験の結果,laplacian metric learner (lam) は不確かさを推定し,分散のサンプルを確実に検出し,最先端の予測性能が得られることがわかった。 We propose the first Bayesian encoder for metric learning. Rather than relying on neural amortization as done in prior works, we learn a distribution over the network weights with the Laplace Approximation. We actualize this by first proving that the contrastive loss is a valid log-posterior. We then propose three methods that ensure a positive definite Hessian. Lastly, we present a novel decomposition of the Generalized Gauss-Newton approximation. Empirically, we show that our Laplacian Metric Learner (LAM) estimates well-calibrated uncertainties, reliably detects out-of-distribution examples, and yields state-of-the-art predictive performance.	翻訳日:2023-02-07 12:48:08 公開日:2023-02-04

Title

Authors

Abstract

論文公表日・翻訳日

# MIMOネットワークにおけるクロス層フェデレーション学習最適化

Cross-Layer Federated Learning Optimization in MIMO Networks ( http://arxiv.org/abs/2302.14648v1 )

ライセンス: Link先を確認

Sihua Wang and Mingzhe Chen and Cong Shen and Changchuan Yin and Christopher G. Brinton

(参考訳) 本稿では,ディジタル変調とaircomp(over-the-air computation)を用いた現実的無線多入力多重出力(mimo)通信システム上でのフェデレーション学習(fl)の性能最適化について検討する。特に、エッジデバイスが(ローカル収集データを用いて訓練された)ローカルFLモデルをビームフォーミングを用いてパラメータサーバ(PS)に送信し、送信予定デバイスの数を最大化するMIMOシステムを考える。中央コントローラとして機能するPSは、受信したローカルFLモデルを使用してグローバルFLモデルを生成し、それを全デバイスにブロードキャストする。無線ネットワークの帯域幅が限られているため、効率的な無線データアグリゲーションを実現するためにAirCompが採用されている。しかし、無線チャネルのフェードはAirCompベースのFLスキームにおいて集約歪みを生じさせる。そこで本研究では,ディジタル変調とaircompを組み合わせたfederated averaging(fedavg)アルゴリズムを提案する。これは、現在のflモデルパラメータに基づいてビームフォーミング行列を動的に調整し、送信誤差を最小化し、fl性能を確保する最適化問題として定式化されたジョイント送信受信ビームフォーミング設計によって達成される。この目的を達成するために,まずビームフォーミング行列がfedavgの性能に与える影響を解析的に特徴付ける。この関係に基づいて、人工知能ニューラルネットワーク(ANN)を用いて、全デバイスの局所FLモデルを推定し、将来のモデル伝送のためにPSのビーム形成行列を調整する。提案手法のアルゴリズム的利点と性能改善は, 広範囲な数値実験により実証された。

This paper studies the performance optimization of federated learning (FL) deployed over a realistic wireless multiple-input multiple-output (MIMO) communication system with digital modulation and over-the-air computation (AirComp). In particular, a MIMO system is considered in which edge devices transmit their local FL models (trained using their locally collected data) to a parameter server (PS) using beamforming to maximize the number of devices scheduled for transmission. The PS, acting as a central controller, generates a global FL model using the received local FL models and broadcasts it back to all devices. Due to the limited bandwidth in a wireless network, AirComp is adopted to enable efficient wireless data aggregation. However, fading of wireless channels can produce aggregate distortions in an AirComp-based FL scheme. To tackle this challenge, we propose a modified federated averaging (FedAvg) algorithm that combines digital modulation with AirComp to mitigate wireless fading while ensuring communication efficiency. This is achieved by a joint transmit and receive beamforming design, which is formulated as an optimization problem to dynamically adjust the beamforming matrices based on current FL model parameters so as to minimize the transmission error and ensure the FL performance. To achieve this goal, we first analytically characterize how the beamforming matrices affect the performance of FedAvg in different iterations. Based on this relationship, an artificial neural network (ANN) is used to estimate the local FL models of all devices and adjust the beamforming matrices at the PS for future model transmission. The algorithmic advantages and improved performance of the proposed methodologies are demonstrated through extensive numerical experiments.

翻訳日:2023-03-05 05:44:47 公開日:2023-02-04
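
For readers unfamiliar with the FedAvg baseline that the entry above modifies, the following is a minimal sketch of the ideal, noise-free aggregation step only. It does not model beamforming, AirComp, digital modulation, or channel fading, and the function name `fedavg`, the list-of-floats model representation, and the sample-count weighting are illustrative assumptions rather than the authors' implementation.

```python
def fedavg(local_models, sample_counts):
    """Weighted average of local parameter vectors (hypothetical helper).

    local_models: list of equal-length lists of floats, one per device.
    sample_counts: number of local training samples per device (assumed weights).
    """
    if not local_models or len(local_models) != len(sample_counts):
        raise ValueError("need one sample count per local model")
    total = sum(sample_counts)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    dim = len(local_models[0])
    global_model = [0.0] * dim
    for weights, n in zip(local_models, sample_counts):
        if len(weights) != dim:
            raise ValueError("all local models must have the same length")
        for i, value in enumerate(weights):
            # Each device contributes in proportion to its share of the data.
            global_model[i] += (n / total) * value
    return global_model


if __name__ == "__main__":
    # Two devices; the second holds three times as much data as the first.
    print(fedavg([[1.0, 2.0], [3.0, 4.0]], [1, 3]))
```

In a wireless setting, the sum inside this average is what over-the-air computation would approximate, with fading appearing as distortion of each device's contribution.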

# テキスト処理サービスの信頼性の自動評価の進歩

Advances in Automatically Rating the Trustworthiness of Text Processing Services ( http://arxiv.org/abs/2302.09079v1 )

ライセンス: Link先を確認

Biplav Srivastava, Kausik Lakkaraju, Mariana Bernagozzi, Marco Valtorta

(参考訳) AIサービスは、データ、モデル、あるいはユーザの変化を受けると不安定な振る舞いを持つことが知られている。このような行動は、欠席や委任によって引き起こされたとしても、AIが人間と働くときの信頼の問題につながる。消費者がAIのソースコードやトレーニングデータにアクセスできないブラックボックス設定でAIサービスを評価する現在のアプローチは限られている。コンシューマはai開発者のドキュメンテーションに頼り、前述のようにシステムが構築されていることを信頼する必要がある。さらに、AIコンシューマがサービスを再利用して顧客に販売する他のサービスを構築する場合、コンシューマはサービスプロバイダ(データとモデルプロバイダの両方)のリスクにさらされます。この文脈での私たちのアプローチは、食品産業における健康促進のための栄養ラベル付けの成功にインスパイアされ、独立した利害関係者の視点からAIサービスの評価と評価を目指しています。評価はAIシステムの行動を伝える手段となり、消費者がリスクを知らせ、情報的な決定を下すことができる。本稿では,まず,ユーザ研究に期待できるテキストベースの機械翻訳aiサービスのための評価手法の開発動向について述べる。次に、原則化されたマルチモーダルな因果評価手法の課題とビジョンと、健康や食品レコメンデーションといった現実のシナリオにおける意思決定支援の意義について概説する。

AI services are known to have unstable behavior when subjected to changes in data, models or users. Such behaviors, whether triggered by omission or commission, lead to trust issues when AI works with humans. The current approach of assessing AI services in a black box setting, where the consumer does not have access to the AI's source code or training data, is limited. The consumer has to rely on the AI developer's documentation and trust that the system has been built as stated. Further, if the AI consumer reuses the service to build other services which they sell to their customers, the consumer is exposed to the risks of the service providers (both data and model providers). Our approach, in this context, is inspired by the success of nutritional labeling in the food industry to promote health and seeks to assess and rate AI services for trust from the perspective of an independent stakeholder. The ratings become a means to communicate the behavior of AI systems so that the consumer is informed about the risks and can make an informed decision. In this paper, we will first describe recent progress in developing rating methods for text-based machine translator AI services that have been found promising with user studies. Then, we will outline the challenges and vision for principled, multi-modal, causality-based rating methodologies and their implications for decision-support in real-world scenarios like health and food recommendation.

翻訳日:2023-02-26 14:55:17 公開日:2023-02-04

# 形態分化に基づく人工神経回路を用いた悪性脳腫瘍の温熱解析

Thermal Analysis of Malignant Brain Tumors by Employing a Morphological Differentiation-Based Method in Conjunction with Artificial Neural Network ( http://arxiv.org/abs/2302.10271v1 )

ライセンス: Link先を確認

Hamed Hani, Afsaneh Mojra

(参考訳) 本研究では,脳腫瘍の悪性度を検出するために,組織表面の温度分布を利用した形態分化に基づく方法を提案する。腫瘍CTでは悪性腫瘍の異常な形状を記述するために2つの異なるシナリオが実装されている。第1のシナリオでは腫瘍はポリゴンベースプリズムと見なされ、第2のシナリオでは星型ベースプリズムと見なされている。ポリゴンの側面や恒星の翼の数を増やすことで、悪性度が増大した。腫瘍に対して一定の熱発生が検討され,両腫瘍モデル上のPYTHONスクリプトとリンクしたABAQUSソフトウェアを用いて上組織表面の温度変化を研究する有限要素解析が行われた。この温度分布は10のパラメータによって特徴づけられる。各シナリオでは、これらのパラメータの98セットを放射基底関数ニューラルネットワーク(RBFNN)の入力として使用し、出力として側面や翼の数を選択している。 RBFNNはその形態に基づいて腫瘍の悪性度を特定するために訓練されている。 RBFNNの結果によると,本手法は良性腫瘍と悪性腫瘍の鑑別が可能であり,悪性度を高い精度で推定できる。

In this study, a morphological differentiation-based method has been introduced which employs the temperature distribution on the tissue surface to detect a brain tumor's malignancy. According to common tumor CT scans, two different scenarios have been implemented to describe the irregular shape of the malignant tumor. In the first scenario, the tumor has been considered as a polygon-base prism and in the second one, it has been considered as a star-shaped-base prism. By increasing the number of sides of the polygon or wings of the star, the degree of malignancy has been increased. Constant heat generation has been considered for the tumor and finite element analysis has been conducted by the ABAQUS software linked with a PYTHON script on both tumor models to study temperature variations on the top tissue surface. This temperature distribution has been characterized by 10 parameters. In each scenario, 98 sets of these parameters have been used as inputs of a radial basis function neural network (RBFNN) and the number of sides or wings has been selected to be the output. The RBFNN has been trained to identify the malignancy of a tumor based on its morphology. According to the RBFNN results, the proposed method has been capable of differentiating between benign and malignant tumors and estimating the degree of malignancy with high accuracy.

翻訳日:2023-02-26 14:36:42 公開日:2023-02-04

# 量子相対論

Quantum Relativity ( http://arxiv.org/abs/2302.10216v1 )

ライセンス: Link先を確認

Michael Spanner

(参考訳) 量子力学におけるベルの不等式の意味を考慮し、古典的局所性と因果性を量子物理学に取り戻すために新しい量子公準が提案されている: 検出された量子事象間の相対座標のみが有効な可観測量である。この公準は、量子力学が不完全であるというEPRの見解を支持する一方で、量子を超えるものは何も存在しないというボーアの見解とも両立する。この新しい公準は、量子事象の実験的検出の間の相関のみが真の古典的存在を持つという、より一般的な量子相対性原理から導かれる。量子相対性理論は、量子世界と古典世界を区別する枠組みを提供する。

Starting with a consideration of the implication of Bell inequalities in quantum mechanics, a new quantum postulate is suggested in order to restore classical locality and causality to quantum physics: only the relative coordinates between detected quantum events are valid observables. This postulate supports the EPR view that quantum mechanics is incomplete, while also staying compatible with the Bohr view that nothing exists beyond the quantum. The new postulate follows from a more general principle of quantum relativity, which states that only correlations between experimental detections of quantum events have a real classical existence. Quantum relativity provides a framework to differentiate the quantum and classical world.

翻訳日:2023-02-26 14:35:56 公開日:2023-02-04

# 低次元カオスによるスパースシステム同定のベンチマーク

Benchmarking sparse system identification with low-dimensional chaos ( http://arxiv.org/abs/2302.10787v1 )

ライセンス: Link先を確認

Alan A. Kaptanoglu and Lanyue Zhang and Zachary G. Nicolaou and Urban Fasel and Steven L. Brunton

(参考訳) スパース・システム同定(英: Sparse System Identification)は、力学系の進化を記述し、モデルの複雑さと精度のバランスをとる擬似微分方程式を得るデータ駆動プロセスである。科学的領域間でのシステム識別には急速な革新があったが、様々な力学系で評価される大規模方法論比較の文献にはまだ差がある。本研究では,カオスシステムのdysts標準データベースを用いて,分散回帰型を体系的にベンチマークする。特に,このオープンソースツールを用いて,異なるシステム識別手法を定量的に比較する方法を実証する。このベンチマークをどのように利用できるかを説明するために、非線形力学最適化問題(SINDy)のスパース同定を解くための4つのアルゴリズムを比較し、元のアルゴリズムと最近の混合整数離散アルゴリズムの強い性能を求める。いずれの場合も,SINDyの雑音頑健性を改善し,統計的比較を行うためにアンサンブルを用いた。さらに,SINDyの弱い定式化が,クリーンデータにおいても従来の手法よりも大幅に改善されていることを示す。最後に,シンディアルゴリズムから生成するパレート・オプティカルモデルが方程式の性質にどのように依存しているかを考察し,カオス量,スケール分離量,非線形度,構文複雑性を定量化する力学特性の組に対して,その性能が有意な依存性を示さないことを見出した。

Sparse system identification is the data-driven process of obtaining parsimonious differential equations that describe the evolution of a dynamical system, balancing model complexity and accuracy. There has been rapid innovation in system identification across scientific domains, but there remains a gap in the literature for large-scale methodological comparisons that are evaluated on a variety of dynamical systems. In this work, we systematically benchmark sparse regression variants by utilizing the dysts standardized database of chaotic systems. In particular, we demonstrate how this open-source tool can be used to quantitatively compare different methods of system identification. To illustrate how this benchmark can be utilized, we perform a large comparison of four algorithms for solving the sparse identification of nonlinear dynamics (SINDy) optimization problem, finding strong performance of the original algorithm and a recent mixed-integer discrete algorithm. In all cases, we used ensembling to improve the noise robustness of SINDy and provide statistical comparisons. In addition, we show very compelling evidence that the weak SINDy formulation provides significant improvements over the traditional method, even on clean data. Lastly, we investigate how Pareto-optimal models generated from SINDy algorithms depend on the properties of the equations, finding that the performance shows no significant dependence on a set of dynamical properties that quantify the amount of chaos, scale separation, degree of nonlinearity, and the syntactic complexity.

翻訳日:2023-02-26 13:58:33 公開日:2023-02-04

# モバイルアプリデータを用いた社会経済的幸福の予測 : フランスを事例として

Predicting Socio-Economic Well-being Using Mobile Apps Data: A Case Study of France ( http://arxiv.org/abs/2301.09986v2 )

ライセンス: Link先を確認

Rahul Goel, Angelo Furno, Rajesh Sharma

(参考訳) 社会経済指標は、国の全体状態を評価する文脈を提供する。これらの指標には教育、性別、貧困、雇用、その他の要因に関する情報が含まれる。そのため、社会調査や政府の監視には信頼性と正確性が不可欠である。国勢調査など現在のデータソースの多くは、人口カバレッジが乏しいか、更新頻度が低い。それでも、コールデータレコード(CDR)やモバイルアプリの利用といった代替データソースは、社会経済的指標を特定するための費用対効果と最新の情報源として機能する。本研究では,モバイルアプリデータを用いて社会経済的特徴を予測する。約3000万のユーザが550,000平方km以上を分散し,25,000以上の基地局を運用する,数千のモバイルアプリケーションのトラフィックをキャプチャするデータを用いた大規模調査を行った。データセットはフランス全土をカバーし、2019年3月16日から6月6日までの2.5ヶ月以上に及ぶ。アプリの利用パターンを使うことで、最良のモデルは社会経済指標を見積もることができる(決定係数 R² は最大0.66)。さらに,モデルの説明可能性を用いて,モバイルアプリの利用パターンがIRISの社会経済格差を明らかにする可能性を見出した。本研究は,進化するネットワークパターンを理解するためのユーザ時間的ネットワーク分析や,代替データソースの探索など,今後の介入に対するいくつかの方法を提供する。

Socio-economic indicators provide context for assessing a country's overall condition. These indicators contain information about education, gender, poverty, employment, and other factors. Therefore, reliable and accurate information is critical for social research and government policing. Most data sources available today, such as censuses, have sparse population coverage or are updated infrequently. Nonetheless, alternative data sources, such as call data records (CDR) and mobile app usage, can serve as cost-effective and up-to-date sources for identifying socio-economic indicators. This work investigates mobile app data to predict socio-economic features. We present a large-scale study using data that captures the traffic of thousands of mobile applications by approximately 30 million users distributed over 550,000 square km and served by over 25,000 base stations. The dataset covers the whole of France and spans more than 2.5 months, from 16th March 2019 to 6th June 2019. Using the app usage patterns, our best model can estimate socio-economic indicators (attaining an R-squared score of up to 0.66). Furthermore, using the models' explainability, we discover that mobile app usage patterns have the potential to reveal socio-economic disparities in IRIS. Insights of this study provide several avenues for future interventions, including user temporal network analysis to understand evolving network patterns and exploration of alternative data sources.

翻訳日:2023-02-19 13:45:41 公開日:2023-02-04
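
The entry above reports model quality as an R-squared score. As a reminder of what that number measures, here is a minimal sketch of the standard coefficient of determination, 1 - SS_res / SS_tot. The function name `r_squared` and the toy values are illustrative assumptions and are unrelated to the study's data or code.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean = sum(y_true) / len(y_true)
    # Residual sum of squares: error left after using the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean.
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R-squared is undefined when y_true is constant")
    return 1 - ss_res / ss_tot


if __name__ == "__main__":
    # Toy values only: a perfect fit, then a mean-only predictor.
    print(r_squared([1, 2, 3], [1, 2, 3]))
    print(r_squared([1, 2, 3], [2, 2, 2]))
```

A score of 0 means the model does no better than predicting the mean of the indicator, and a score of 1 means a perfect fit.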

# 米国における生活支援: オープンデータセット

Assisted Living in the United States: an Open Dataset ( http://arxiv.org/abs/2212.14092v2 )

ライセンス: Link先を確認

Anton Stengel, Jaan Altosaar, Rebecca Dittrich, Noemie Elhadad

(参考訳) 補助生活施設(英: assisted living facility、alf)は、誰かが生活し、交通などの社会的支援を受け、トイレやドレッシングといった日常生活の活動を補助する場所である。 alfsが重要な役割を担っているにもかかわらず、メディケアの認定を受ける必要はなく、これらの施設の公共の国立データベースも存在しない。アメリカ合衆国で最初のALFの公開データセットを公開し、50の州とDC全てを44,638の施設と120万のベッドでカバーした。このデータセットは、既存の公衆衛生問題に対する答えを提供するだけでなく、必要な施設を見つけるのに役立つ。このデータセットは, 人種, 障害, 所得などの健康格差に関連する郡レベルの社会経済変数について, 閉データを用いたALFの全国調査[4]の結果を再現して検証した。このデータセットの価値を示すために、コミュニティベースのケアへのアクセスを評価するための新しいメトリクスも提案する。必要な個人がalfに到達するために移動しなければならない平均距離を計算する。データセットと関連するコードはgithub.com/antonstengel/assisted-living-dataで入手できる。

An assisted living facility (ALF) is a place where someone can live, have access to social supports such as transportation, and receive assistance with the activities of daily living such as toileting and dressing. Despite the important role of ALFs, they are not required to be certified with Medicare and there is no public national database of these facilities. We present the first public dataset of ALFs in the United States, covering all 50 states and DC with 44,638 facilities and over 1.2 million beds. This dataset can help provide answers to existing public health questions as well as help those in need find a facility. The dataset was validated by replicating the results of a nationwide study of ALFs that uses closed data [4], where the prevalence of ALFs is assessed with respect to county-level socioeconomic variables related to health disparity such as race, disability, and income. To showcase the value of this dataset, we also propose a novel metric to assess access to community-based care. We calculate the average distance an individual in need must travel in order to reach an ALF. The dataset and all relevant code are available at github.com/antonstengel/assisted-living-data.

翻訳日:2023-02-19 13:22:21 公開日:2023-02-04
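
The access metric described in the entry above (average distance an individual must travel to reach an ALF) can be sketched as follows. This is only an illustration under stated assumptions: it uses great-circle (haversine) distance to the nearest facility and an unweighted mean, whereas the authors may use road distance or population weighting. The names `haversine_km` and `mean_distance_to_nearest` and the coordinates are hypothetical and not taken from the linked repository.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, a common approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def mean_distance_to_nearest(people, facilities):
    """Average, over people, of the distance to each person's closest facility."""
    if not people or not facilities:
        raise ValueError("need at least one person and one facility")
    total = 0.0
    for plat, plon in people:
        total += min(haversine_km(plat, plon, flat, flon) for flat, flon in facilities)
    return total / len(people)


if __name__ == "__main__":
    # Hypothetical coordinates: one facility, two individuals.
    facilities = [(0.0, 0.0)]
    people = [(0.0, 0.0), (0.0, 1.0)]
    print(round(mean_distance_to_nearest(people, facilities), 1))
```

The brute-force nearest-facility search is quadratic in the input sizes; a spatial index would be a natural replacement at the scale of tens of thousands of facilities.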

# フェイクニュースにおけるジェンダーバイアス:分析

Gender Bias in Fake News: An Analysis ( http://arxiv.org/abs/2209.11984v3 )

ライセンス: Link先を確認

Navya Sahadevan, Deepak P

(参考訳) 偽ニュースに関するデータサイエンスの研究は近年、大きな公開ベンチマークデータセットの出現によって、非常に勢いを増している。ジェンダーバイアスはニュースメディアを広める問題であるとするメディア研究の中で、確立されているが、ジェンダーバイアスとフェイクニュースの関係についてはほとんど調査されていない。本研究では,公開ベンチマークデータセットよりも単純で透明なレキシコンベースの手法を活用し,性バイアスvis-a-vis偽ニュースを初めて実証的に分析する。本分析により, 偽ニュースにおける性バイアスの頻度は, 3つの顔, 豊富, 感情, 近位語にまたがる。この分析から得られた知見は、フェイクニュースの研究においてジェンダーバイアスが重要な考慮事項である必要があるという強い議論をもたらす。

Data science research into fake news has gathered much momentum in recent years, arguably facilitated by the emergence of large public benchmark datasets. While it has been well-established within media studies that gender bias is an issue that pervades news media, there has been very little exploration into the relationship between gender bias and fake news. In this work, we provide the first empirical analysis of gender bias vis-a-vis fake news, leveraging simple and transparent lexicon-based methods over public benchmark datasets. Our analysis establishes the increased prevalence of gender bias in fake news across three facets, viz., abundance, affect and proximal words. The insights from our analysis provide a strong argument that gender bias needs to be an important consideration in research into fake news.

翻訳日:2023-02-19 11:21:47 公開日:2023-02-04

# スリランカにおけるビデオレビューをアンボックスするyoutubeスマートフォンの感情分析

Sentiment Analysis on YouTube Smart Phone Unboxing Video Reviews in Sri Lanka ( http://arxiv.org/abs/2302.03496v1 )

ライセンス: Link先を確認

Sherina Sally

(参考訳) 製品関連レビューは、主にYouTubeのビデオで共有されるユーザー体験に基づいている。 2021年に世界で2番目に人気のあるウェブサイトである。人々は購入する前に、全体的なフィードバックを集め、価値のある決定を下すために、最近リリースされた製品でビデオを見ることを好む。これらのビデオは、技術材料に熱心であるvloggerたちによって作成され、フィードバックは通常、製品やブランドの経験豊富なユーザーによって置かれる。ユーザレビューの感情を分析することは、製品全般に対する有用な洞察を与えます。この調査は、2021年にリリースされたiPhone 13、Google Pixel 6、Samsung Galaxy S21の3つのスマートフォンレビューに焦点を当てている。語彙と規則に基づく感情分析ツールであるVADERは、それぞれのコメントを適切な正または負の向きに分類するために使用された。 3つのスマートフォンはいずれもユーザーの視点から肯定的な評価を示し、iphone 13は肯定的なレビュー数が最も多い。得られたモデルはNaïve Bayes, Decision Tree, Support Vector Machineを使ってテストされている。これら3つの分類器のうち、Support Vector Machineはより高い精度とF1スコアを示している。

Product-related reviews are based on users' experiences that are mostly shared on videos in YouTube. It is the second most popular website globally in 2021. People prefer to watch videos on recently released products prior to purchasing, in order to gather overall feedback and make worthy decisions. These videos are created by vloggers who are enthusiastic about technical materials and feedback is usually placed by experienced users of the product or its brand. Analyzing the sentiment of the user reviews gives useful insights into the product in general. This study is focused on three smartphone reviews, namely, Apple iPhone 13, Google Pixel 6, and Samsung Galaxy S21 which were released in 2021. VADER, which is a lexicon and rule-based sentiment analysis tool was used to classify each comment to its appropriate positive or negative orientation. All three smartphones show a positive sentiment from the users' perspective and iPhone 13 has the highest number of positive reviews. The resulting models have been tested using Naïve Bayes, Decision Tree, and Support Vector Machine. Among these three classifiers, Support Vector Machine shows higher accuracies and F1-scores.

翻訳日:2023-02-08 16:16:27 公開日:2023-02-04
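
The labelling step described in the entry above can be sketched as follows, assuming each comment already has a VADER-style compound score in [-1, 1]. The sketch does not call VADER itself. The ±0.05 cut-off is the commonly cited convention for compound scores and is an assumption here, since the abstract does not state the threshold used. The function names, product keys, and scores are hypothetical.

```python
from collections import Counter


def label_from_compound(compound, threshold=0.05):
    """Map a compound score in [-1, 1] to a coarse sentiment label."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


def summarize(scores_by_product):
    """Count labels per product from precomputed compound scores."""
    return {
        product: Counter(label_from_compound(s) for s in scores)
        for product, scores in scores_by_product.items()
    }


if __name__ == "__main__":
    # Made-up scores for two hypothetical products.
    example = {"phone_a": [0.6, -0.3, 0.0, 0.9], "phone_b": [0.1, -0.7]}
    for product, counts in summarize(example).items():
        print(product, dict(counts))
```

Labels produced this way could then serve as training targets for classifiers such as the Naïve Bayes, Decision Tree, and Support Vector Machine models the abstract mentions.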

# 生体信号と浅部機械学習によるハンチントン病予後自動診断

Automated Huntington's Disease Prognosis via Biomedical Signals and Shallow Machine Learning ( http://arxiv.org/abs/2302.03605v1 )

ライセンス: Link先を確認

Sucheer Maddury

(参考訳) ハンティントン病(英: huntington's disease、hd)は、hdの早期予後は患者の生活の質を著しく改善するが、患者の寿命を制限する稀な遺伝的決定脳障害である。現在のHD予後法には、臨床および画像因子などの様々な複雑なバイオマーカーの使用が含まれるが、これらの手法には、そのリソース需要や、症状や非症状の患者を区別できないことなど、多くの欠点がある。定量的なバイオメディカルシグナルは統合失調症などの他の神経疾患の診断に使われており、hd患者の異常を暴露する可能性がある。本研究は, 心電図, 心電図, 機能的近赤外分光データを用いて, 27例のHD陽性患者, 36例, 6例の未知の患者を対象に, プレメイド, 認定データセットを用いた。最初にデータを前処理し、変換信号と生信号の両方から様々な特徴を抽出し、その後、多くの浅い機械学習技術を適用した。最大精度はスケールアウトしたExtremely Randomized Treesアルゴリズムにより達成され、受信者特性0.963の曲線下と91.353%の精度で達成された。その後の機能分析の結果、60.865%がp<0.05であり、生信号の特徴が最も重要であることがわかった。以上の結果から,hdの異常をマークする神経信号と心臓信号の有望性,および疾患の進行についての評価を行った。

Huntington's disease (HD) is a rare, genetically-determined brain disorder that limits the life of the patient, although early prognosis of HD can substantially improve the patient's quality of life. Current HD prognosis methods include using a variety of complex biomarkers such as clinical and imaging factors; however, these methods have many shortfalls, such as their resource demand and failure to distinguish symptomatic and asymptomatic patients. Quantitative biomedical signaling has been used for diagnosis of other neurological disorders such as schizophrenia, and has potential for exposing abnormalities in HD patients. In this project, we used a premade, certified dataset collected at a clinic with 27 HD positive patients, 36 controls, and 6 unknowns with electroencephalography, electrocardiography, and functional near-infrared spectroscopy data. We first preprocessed the data and extracted a variety of features from both the transformed and raw signals, after which we applied a plethora of shallow machine learning techniques. We found the highest accuracy was achieved by a scaled-out Extremely Randomized Trees algorithm, with area under the curve of the receiver operating characteristic of 0.963 and accuracy of 91.353%. The subsequent feature analysis showed that 60.865% of the features had p<0.05, with the features from the raw signal being most significant. The results indicate the promise of neural and cardiac signals for marking abnormalities in HD, as well as evaluating the progression of the disease in

翻訳日:2023-02-08 15:40:10 公開日:2023-02-04

# PartitionVAE -- 人間の解釈可能なVAE

PartitionVAE -- a human-interpretable VAE ( http://arxiv.org/abs/2302.03689v1 )

ライセンス: Link先を確認

Fareed Sheriff, Sameer Pai

(参考訳) 可変オートエンコーダ(VAE)は、入力画像空間の分布を、その分布に関する事前情報を前提とせず明示的に学習するオートエンコーダである。これにより、潜在空間の分布において互いに近い類似のサンプルを分類することができる。 VAEは古典的には、遅延空間は通常の分布であると仮定するが、多くの分布先行は機能し、損失関数のK-L発散項を通じてこの仮定を符号化する。 While VAEs learn the distribution of the latent space and naturally make each dimension in the latent space as disjoint from the others as possible, they do not group together similar features -- the image space feature represented by one unit of the representation layer does not necessarily have high correlation with the feature represented by a neighboring unit of the representation layer. This makes it difficult to interpret VAEs since the representation layer is not structured in a way that is easy for humans to parse. We aim to make a more interpretable VAE by partitioning the representation layer into disjoint sets of units. Partitioning the representation layer into disjoint sets of interconnected units yields a prior that features of the input space to this new VAE, which we call a partition VAE or PVAE, are grouped together by correlation -- for example, if our image space were the space of all ping ping game images (a somewhat complex image space we use to test our architecture) then we would hope the partitions in the representation layer each learned some large feature of the image like the characteristics of the ping pong table or the characteristics and position of the players or the ball. また、PVAEにコスト削減策として、サブレゾリューションを追加します。長時間GPUトレーニング環境にアクセスできず、Google Colab Proは費用がかかるため、入力画像からスケールダウンした寸法の画像を一定要素で出力することにより、PVAEの複雑さを低減しようとするため、モデルのより小さなバージョンを出力せざるを得ない。次に、隣接する画素を補間することで、損失と訓練を計算する解像度を高める。 MNISTとSports10でPVAEをチューニングし、その有効性をテストする。

VAEs, or variational autoencoders, are autoencoders that explicitly learn the distribution of the input image space rather than assuming no prior information about the distribution. This allows them to classify similar samples close to each other in the latent space's distribution. VAEs classically assume the latent space is normally distributed, though many distribution priors work, and they encode this assumption through a K-L divergence term in the loss function. While VAEs learn the distribution of the latent space and naturally make each dimension in the latent space as disjoint from the others as possible, they do not group together similar features -- the image space feature represented by one unit of the representation layer does not necessarily have high correlation with the feature represented by a neighboring unit of the representation layer. This makes it difficult to interpret VAEs since the representation layer is not structured in a way that is easy for humans to parse. We aim to make a more interpretable VAE by partitioning the representation layer into disjoint sets of units. Partitioning the representation layer into disjoint sets of interconnected units yields a prior that features of the input space to this new VAE, which we call a partition VAE or PVAE, are grouped together by correlation -- for example, if our image space were the space of all ping pong game images (a somewhat complex image space we use to test our architecture) then we would hope the partitions in the representation layer each learned some large feature of the image like the characteristics of the ping pong table or the characteristics and position of the players or the ball. We also add to the PVAE a cost-saving measure: subresolution. Because we do not have access to GPU training environments for long periods of time and Google Colab Pro costs money, we attempt to decrease the complexity of the PVAE by outputting an image with dimensions scaled down from the input image by a constant factor, thus forcing the model to output a smaller version of the image. We then increase the resolution to calculate loss and train by interpolating through neighboring pixels. We train a tuned PVAE on MNIST and Sports10 to test its effectiveness.
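
The two ingredients named in the abstract, the K-L divergence term and subresolution, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a diagonal Gaussian posterior parameterized by `mu` and `log_var` with a standard normal prior, and it uses plain pixel repetition as the simplest stand-in for the interpolation step; both function names are hypothetical.

```python
import math


def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over latent units."""
    return 0.5 * sum(
        math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var)
    )


def upscale_nearest(image, factor):
    """Upscale a 2-D list of pixels by an integer factor by repeating pixels.

    Assumption: a stand-in for the interpolation mentioned in the abstract,
    whose exact scheme is not specified there.
    """
    out = []
    for row in image:
        wide = [p for p in row for _ in range(factor)]
        out.extend([list(wide) for _ in range(factor)])
    return out
```

In a subresolution setup of this kind, the decoder output would be upscaled back to the input size before the reconstruction loss is computed, and the K-L term would be added to that loss.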

翻訳日:2023-02-08 15:13:35 公開日:2023-02-04

# インテリジェント交通システムにおける交通光制御のための深層強化学習

Deep Reinforcement Learning for Traffic Light Control in Intelligent Transportation Systems ( http://arxiv.org/abs/2302.03669v1 )

ライセンス: Link先を確認

Xiao-Yang Liu, Ming Zhu, Sem Borst, and Anwar Walid

(参考訳) インテリジェントトランスポートシステム(ITS)におけるスマートトラヒックライトは、交通効率を大幅に向上させ、混雑を低減するために考えられている。道路網におけるリアルタイム交通状況に基づいて信号機を適応的に制御する手法として,深部強化学習(DRL)がある。しかし、従来の手法はスケーラビリティに乏しい。 In this paper, we investigate deep reinforcement learning to control traffic lights, and both theoretical analysis and numerical experiments show that the intelligent behavior ``greenwave" (i.e., a vehicle will see a progressive cascade of green lights, and not have to brake at any intersection) emerges naturally a grid road network, which is proved to be the optimal policy in an avenue with multiple cross streets. As a first step, we use two DRL algorithms for the traffic light control problems in two scenarios. In a single road intersection, we verify that the deep Q-network (DQN) algorithm delivers a thresholding policy; and in a grid road network, we adopt the deep deterministic policy gradient (DDPG) algorithm. Secondly, numerical experiments show that the DQN algorithm delivers the optimal control, and the DDPG algorithm with passive observations has the capability to produce on its own a high-level intelligent behavior in a grid road network, namely, the ``greenwave" policy emerges. また、5 \times 10$グリッドロードネットワークで ``greenwave" パターンを検証する。第3に, 実験結果に示された「グリーンウェーブ」ポリシーは, 特定交通モデル(複数道路を横断する道路)において最適であることが証明されたため, DRLアルゴリズムが好ましい解を生成することを示す。単一の道路交差点とグリッド道路ネットワークの両方で配信されたポリシーは、DRLアルゴリズムのスケーラビリティを示している。

Smart traffic lights in intelligent transportation systems (ITSs) are envisioned to greatly increase traffic efficiency and reduce congestion. Deep reinforcement learning (DRL) is a promising approach to adaptively control traffic lights based on the real-time traffic situation in a road network. However, conventional methods may suffer from poor scalability. In this paper, we investigate deep reinforcement learning to control traffic lights, and both theoretical analysis and numerical experiments show that the intelligent behavior "greenwave" (i.e., a vehicle will see a progressive cascade of green lights, and not have to brake at any intersection) emerges naturally in a grid road network, and is proved to be the optimal policy in an avenue with multiple cross streets. As a first step, we use two DRL algorithms for the traffic light control problems in two scenarios. In a single road intersection, we verify that the deep Q-network (DQN) algorithm delivers a thresholding policy; and in a grid road network, we adopt the deep deterministic policy gradient (DDPG) algorithm. Secondly, numerical experiments show that the DQN algorithm delivers the optimal control, and the DDPG algorithm with passive observations has the capability to produce on its own a high-level intelligent behavior in a grid road network, namely, the "greenwave" policy emerges. We also verify the "greenwave" patterns in a $5 \times 10$ grid road network. Thirdly, the "greenwave" patterns demonstrate that DRL algorithms produce favorable solutions since the "greenwave" policy shown in experiment results is proved to be optimal in a specified traffic model (an avenue with multiple cross streets). The policies delivered in both a single road intersection and a grid road network demonstrate the scalability of DRL algorithms.
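
To make the "greenwave" pattern concrete, here is a small sketch of a fixed-timing version along a single avenue. It is only an illustration of the pattern described in the abstract, not the learned DQN or DDPG policy. The constant vehicle speed, the common cycle length, and all function names are assumptions.

```python
def greenwave_offsets(positions_m, speed_mps, cycle_s):
    """Start of the green phase at each intersection, shifted by travel time."""
    return [(p / speed_mps) % cycle_s for p in positions_m]


def is_green(t, offset, cycle_s, green_s):
    """True if a light whose green phase starts at `offset` is green at time t."""
    return (t - offset) % cycle_s < green_s


def drives_through(positions_m, speed_mps, cycle_s, green_s, offsets):
    """True if a vehicle leaving position 0 at t=0 meets green at every light."""
    return all(
        is_green(p / speed_mps, off, cycle_s, green_s)
        for p, off in zip(positions_m, offsets)
    )
```

With intersections every 100 m, a speed of 10 m/s, a 60 s cycle and 30 s of green, the staggered offsets let the vehicle pass without braking, while identical offsets at every intersection stop it at the fourth light.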

翻訳日:2023-02-08 15:10:33 公開日:2023-02-04

# 量子回路の完全等式理論

A Complete Equational Theory for Quantum Circuits ( http://arxiv.org/abs/2206.10577v2 )

ライセンス: Link先を確認

Alexandre Cl\'ement, Nicolas Heurtel, Shane Mansfield, Simon Perdrix, Beno\^it Valiron

(参考訳) 量子回路に対する最初の完全方程式理論を導入する。より正確には、2つの回路が同じユニタリ写像を表現していることと、2つの回路が一方を他方に変換できるかどうかを方程式を用いて証明する一連の回路方程式を導入する。この証明は、基本ゲートを用いて定義されるマルチコントロールゲートの性質と、線形光回路への量子回路の符号化に基づくもので、完全な公理化であることが証明されている。

We introduce the first complete equational theory for quantum circuits. More precisely, we introduce a set of circuit equations that we prove to be sound and complete: two circuits represent the same unitary map if and only if they can be transformed one into the other using the equations. The proof is based on the properties of multi-controlled gates -- that are defined using elementary gates -- together with an encoding of quantum circuits into linear optical circuits, which have been proved to have a complete axiomatisation.
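
The statement "two circuits represent the same unitary map" can be checked numerically for small cases. The sketch below verifies the well-known single-qubit identity HZH = X with plain Python lists. It illustrates only the semantic side of the claim (equal unitaries), not the paper's equational rewriting system, and the helper names are hypothetical.

```python
import math


def matmul(a, b):
    """Multiply two square matrices given as nested lists."""
    n = len(a)
    return [
        [sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def same_unitary(a, b, tol=1e-9):
    """Entrywise comparison of two matrices up to a numerical tolerance."""
    n = len(a)
    return all(abs(a[i][j] - b[i][j]) <= tol for i in range(n) for j in range(n))


s = 1 / math.sqrt(2)
H = [[s, s], [s, -s]]    # Hadamard gate
Z = [[1, 0], [0, -1]]    # Pauli-Z gate
X = [[0, 1], [1, 0]]     # Pauli-X gate

# Circuit "H, then Z, then H" as a matrix product.
HZH = matmul(matmul(H, Z), H)
```

A complete equational theory guarantees that every such numerical equality can also be derived by rewriting one circuit into the other with the given equations.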

翻訳日:2023-02-08 12:44:23 公開日:2023-02-04

# 自由形電磁逆設計のためのニューラルネットワークに基づくサロゲート解法

A neural operator-based surrogate solver for free-form electromagnetic inverse design ( http://arxiv.org/abs/2302.01934v1 )

ライセンス: Link先を確認

Yannick Augenstein, Taavi Rep\"an, Carsten Rockstuhl

(参考訳) ニューラルネットワークは、科学機械学習の文脈で偏微分方程式を解く強力なツールとして登場した。本稿では,改良したフーリエニューラル演算子を電磁散乱問題のサロゲート解法として実装し,そのデータ効率を既存の手法と比較する。さらに,自由形,完全3次元電磁散乱器の勾配に基づくナノフォトニクス逆設計への応用を実証する。

Neural operators have emerged as a powerful tool for solving partial differential equations in the context of scientific machine learning. Here, we implement and train a modified Fourier neural operator as a surrogate solver for electromagnetic scattering problems and compare its data efficiency to existing methods. We further demonstrate its application to the gradient-based nanophotonic inverse design of free-form, fully three-dimensional electromagnetic scatterers, an area that has so far eluded the application of deep learning techniques.

翻訳日:2023-02-07 21:08:35 公開日:2023-02-04

# 因果レンズによるバイアスに対する感情分析システムの評価

Rating Sentiment Analysis Systems for Bias through a Causal Lens ( http://arxiv.org/abs/2302.02038v1 )

ライセンス: Link先を確認

Kausik Lakkaraju, Biplav Srivastava, Marco Valtorta

(参考訳) 感情分析システム(sass)はデータ駆動型人工知能(ai)システムであり、テキストの一部が与えられたとき、入力で表現される極性と感情の強さを伝える1つ以上の数字を割り当てる。他の自動機械学習システムと同様に、入力の(小さな)変化が出力の劇的な揺らぎを引き起こすようなモデルの不確実性を示すことも知られている。これは、入力が性別や人種のような保護された特徴と関連付けられている場合、特に問題となる。本稿では,テキスト入力の他の構成要素,例えば選択された感情語が固定された場合でも,出力感情が保護変数に敏感であるかどうかをテストするために,制御因果設定において入力が摂動しているsassを評価し評価する新しい手法を提案する。次に、結果を使用してラベル(レーティング)を細かなレベルと全体的なレベルに割り当て、入力変更に対するsasの堅牢さを伝達します。評価は、SASを比較し、行動に基づいてそれらの中から選択する原則として機能する。これは、すべてのユーザー、特に既存のsassを再利用してより大きなaiシステムを構築しているが、比較するためのコードやトレーニングデータにアクセスできない開発者にとって有益である。

Sentiment Analysis Systems (SASs) are data-driven Artificial Intelligence (AI) systems that, given a piece of text, assign one or more numbers conveying the polarity and emotional intensity expressed in the input. Like other automatic machine learning systems, they have also been known to exhibit model uncertainty where a (small) change in the input leads to drastic swings in the output. This can be especially problematic when inputs are related to protected features like gender or race since such behavior can be perceived as a lack of fairness, i.e., bias. We introduce a novel method to assess and rate SASs where inputs are perturbed in a controlled causal setting to test if the output sentiment is sensitive to protected variables even when other components of the textual input, e.g., chosen emotion words, are fixed. We then use the result to assign labels (ratings) at fine-grained and overall levels to convey the robustness of the SAS to input changes. The ratings serve as a principled basis to compare SASs and choose among them based on behavior. This benefits all users, especially developers who reuse off-the-shelf SASs to build larger AI systems but do not have access to their code or training data for comparison.
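
The perturbation idea can be sketched as follows: keep the emotion word fixed, vary only the protected term, and measure how far the output moves. The lexicon-based scorer, the deliberately biased lexicon, and the function names are toy assumptions for illustration. They are not the rating method or any system evaluated in the paper.

```python
# Toy sentiment lexicons (hypothetical). BIASED adds a spurious weight
# to a gendered pronoun to simulate a biased system.
LEXICON = {"happy": 1.0, "sad": -1.0, "angry": -0.8}
BIASED = dict(LEXICON, she=-0.2)


def score(text, lexicon):
    """Sum of word scores; unknown words count as neutral."""
    return sum(lexicon.get(word, 0.0) for word in text.lower().split())


def sensitivity(template, groups, lexicon):
    """Spread of scores when only the protected term in the template changes."""
    scores = [score(template.format(person=g), lexicon) for g in groups]
    return max(scores) - min(scores)
```

A spread of zero means the output did not react to the protected variable for that template. A rating scheme could then aggregate such spreads over many templates and emotion words.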

翻訳日:2023-02-07 20:43:25 公開日:2023-02-04

# GDB: Gated Convolutionsベースのドキュメントバイナリ化

GDB: Gated convolutions-based Document Binarization ( http://arxiv.org/abs/2302.02073v1 )

ライセンス: Link先を確認

Zongyuan Yang, Yongping Xiong, Guibin Wu

(参考訳) ドキュメントビナライゼーションは多くの文書分析タスクにおいて重要な前処理ステップである。しかし,既存の方法では,バニラ畳み込みの公平な処理や境界情報による適切な監視を伴わないストロークエッジの抽出などにより,ストロークエッジを微細に抽出することはできない。本稿では、ゲーティング値の学習としてテキスト抽出を定式化し、不正確なストロークエッジ抽出の問題を解決するために、エンドツーエンドのゲート畳み込みネットワーク(GDB)を提案する。ゲート畳み込みを適用して、異なる注意でストロークの特徴を選択的に抽出する。提案する枠組みは2段階からなる。まず、余分なエッジブランチを持つ粗いサブネットワークをトレーニングし、プリオリマスクとエッジを入力してより正確な特徴マップを得る。次に、シャープエッジに基づくゲート畳み込みにより第1段の出力を洗練するために、改良サブネットワークをカスケードする。グローバル情報に関しては、GDBにはローカル機能とグローバル機能を組み合わせたマルチスケール操作も含まれている。 2009年から2019年にかけて,dibco(document image binarization contest)データセットの総合実験を行った。実験の結果,提案手法は平均値で最先端手法を上回り,6つのベンチマークデータセットで上位ランキングを得た。

Document binarization is a key pre-processing step for many document analysis tasks. However, existing methods cannot extract stroke edges finely, mainly due to the fair-treatment nature of vanilla convolutions and the extraction of stroke edges without adequate supervision by boundary-related information. In this paper, we formulate text extraction as the learning of gating values and propose an end-to-end gated convolutions-based network (GDB) to solve the problem of imprecise stroke edge extraction. The gated convolutions are applied to selectively extract the features of strokes with different attention. Our proposed framework consists of two stages. Firstly, a coarse sub-network with an extra edge branch is trained to get more precise feature maps by feeding a priori mask and edge. Secondly, a refinement sub-network is cascaded to refine the output of the first stage by gated convolutions based on the sharp edge. For global information, GDB also contains a multi-scale operation to combine local and global features. We conduct comprehensive experiments on ten Document Image Binarization Contest (DIBCO) datasets from 2009 to 2019. Experimental results show that our proposed methods outperform the state-of-the-art methods in terms of all metrics on average and achieve top ranking on six benchmark datasets.
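
A gated convolution multiplies each feature response by a learned gate in (0, 1), so positions are no longer treated equally as in a vanilla convolution. The one-dimensional sketch below shows that mechanism in its generic form, assuming a sigmoid gate. It is not the GDB architecture, and the function names and weights are illustrative only.

```python
import math


def conv1d(x, w):
    """Valid-mode 1-D cross-correlation, as used in deep learning layers."""
    k = len(w)
    return [sum(x[i + j] * w[j] for j in range(k)) for i in range(len(x) - k + 1)]


def gated_conv1d(x, w_feature, w_gate):
    """Feature response modulated elementwise by a sigmoid gate."""
    features = conv1d(x, w_feature)
    gates = [1.0 / (1.0 + math.exp(-g)) for g in conv1d(x, w_gate)]
    return [f * g for f, g in zip(features, gates)]
```

With zero gate weights every gate equals 0.5, and strongly negative gate weights suppress the features almost entirely, which is how a gate can select stroke regions over background.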

翻訳日:2023-02-07 20:35:18 公開日:2023-02-04

# 事前学習モデルによる意味誘導画像の拡張

Semantic-Guided Image Augmentation with Pre-trained Models ( http://arxiv.org/abs/2302.02070v1 )

ライセンス: Link先を確認

Bohan Li, Xinghao Wang, Xiao Xu, Yutai Hou, Yunlong Feng, Feng Wang, Wanxiang Che

(参考訳) 画像拡張は、コンピュータビジョンにおけるデータの不足を軽減する共通のメカニズムである。既存の画像増倍法は、しばしば元の画像の増倍に事前定義された変換や混合を適用するが、局所的にしか変化しない。これにより、意味情報の維持と画像の多様性の向上のバランスを見つけるのに苦労する。本稿では,事前学習モデル(SIP)を用いたセマンティック誘導画像拡張手法を提案する。具体的には、SIPは画像ラベルとキャプションでプロンプトを構築し、事前訓練された安定拡散モデルのイメージ・ツー・イメージ生成プロセスをより良くガイドする。元の画像に含まれる意味情報はよく保存でき、拡張された画像は依然として多様性を維持している。実験の結果、SIPは一般的に使用されている2つのバックボーン、すなわちResNet-50とViTを平均して7つのデータセットで12.60%、2.07%改善できることがわかった。さらに、SIPは最高の画像拡張ベースラインRandAugmentを2つのバックボーンで4.46%、1.23%上回るだけでなく、ベースラインと自然に統合することでパフォーマンスも向上する。拡張画像の多様性,テキストプロンプトのアブレーション研究,生成画像の事例研究など,sipの詳細な解析を行った。

Image augmentation is a common mechanism to alleviate data scarcity in computer vision. Existing image augmentation methods often apply pre-defined transformations or mixup to augment the original image, but only locally vary the image. This makes them struggle to find a balance between maintaining semantic information and improving the diversity of augmented images. In this paper, we propose a Semantic-guided Image augmentation method with Pre-trained models (SIP). Specifically, SIP constructs prompts with image labels and captions to better guide the image-to-image generation process of the pre-trained Stable Diffusion model. The semantic information contained in the original images can be well preserved, and the augmented images still maintain diversity. Experimental results show that SIP can improve two commonly used backbones, i.e., ResNet-50 and ViT, by 12.60% and 2.07% on average over seven datasets, respectively. Moreover, SIP not only outperforms the best image augmentation baseline RandAugment by 4.46% and 1.23% on two backbones, but also further improves the performance by integrating naturally with the baseline. A detailed analysis of SIP is presented, including the diversity of augmented images, an ablation study on textual prompts, and a case study on the generated images.

翻訳日:2023-02-07 20:34:53 公開日:2023-02-04

# 学習とアンラーニングを組み込んだヘテロジニアス連合知識グラフ

Heterogeneous Federated Knowledge Graph Embedding Learning and Unlearning ( http://arxiv.org/abs/2302.02069v1 )

ライセンス: Link先を確認

Xiangrong Zhu and Guangyao Li and Wei Hu

(参考訳) Federated Learning(FL)は最近、生データを共有せずに分散クライアント間でグローバル機械学習モデルをトレーニングするパラダイムとして登場した。知識グラフ(KG)埋め込みは、多くの知識駆動アプリケーションのバックボーンとして機能する連続ベクトル空間におけるKGを表す。有望な組み合わせとして、フェデレーションkg埋め込みは、ローカルデータのプライバシーを保ちながら、異なるクライアントから学んだ知識を十分に活用することができる。しかし、データの異質性や知識の忘れといった現実的な問題はいまだに残っている。本稿では,不均一なKG埋め込み学習とアンラーニングのための新しいFLフレームワークであるFedLUを提案する。データの不均一性による局所最適化とグローバル収束のドリフトに対処するため,局所的な知識をグローバルに伝達し,グローバルな知識を吸収する相互知識蒸留を提案する。さらに, 遡及的干渉と受動的減衰を組み合わせた認知神経科学に基づく未学習手法を提案し, 知識蒸留を再利用して, 地域顧客からの特定の知識を消去し, グローバルモデルに伝播させる手法を提案する。我々は最新技術の現実的な性能を評価するための新しいデータセットを構築する。大規模な実験により、FedLUはリンク予測と知識忘れの両方において優れた結果が得られることが示された。

Federated Learning (FL) has recently emerged as a paradigm to train a global machine learning model across distributed clients without sharing raw data. Knowledge Graph (KG) embedding represents KGs in a continuous vector space, serving as the backbone of many knowledge-driven applications. As a promising combination, federated KG embedding can fully take advantage of knowledge learned from different clients while preserving the privacy of local data. However, realistic problems such as data heterogeneity and knowledge forgetting remain to be addressed. In this paper, we propose FedLU, a novel FL framework for heterogeneous KG embedding learning and unlearning. To cope with the drift between local optimization and global convergence caused by data heterogeneity, we propose mutual knowledge distillation to transfer local knowledge to the global model and absorb global knowledge back. Moreover, we present an unlearning method based on cognitive neuroscience, which combines retroactive interference and passive decay to erase specific knowledge from local clients and propagate to the global model by reusing knowledge distillation. We construct new datasets for assessing the realistic performance of state-of-the-art methods. Extensive experiments show that FedLU achieves superior results in both link prediction and knowledge forgetting.

翻訳日:2023-02-07 20:34:28 公開日:2023-02-04

# live experience matters: ソーシャルメディア上で物質を使用する人に対するスティグマの自動検出

Lived Experience Matters: Automatic Detection of Stigma toward People Who Use Substances on Social Media ( http://arxiv.org/abs/2302.02064v1 )

ライセンス: Link先を確認

Salvatore Giorgi, Douglas Bellew, Daniel Roy Sadek Habib, Joao Sedoc, Chase Smitterberg, Amanda Devoto, McKenzie Himelein-Wachowiak, and Brenda Curtis

(参考訳) 物質(PWUS)を使用する人々に対するスティグマは、治療を求める主要な障壁である。さらに、治療中の患者は、より高いスティグマティゼーションを経験すれば脱落する傾向が強い。ヘイトスピーチと毒性の関連概念は、脆弱な人口を対象としたものを含むが、自動コンテンツモデレーション研究、スティグマ(stigma)、特に物質を使用する人はそうではない。本稿では、約5000の公開Reddit投稿のデータセットを用いて、PWUSに対するスティグマについて検討する。我々は,PWUSに対するスティグマの存在について,各投稿に注釈を付けるように依頼し,物質使用経験に関する一連の質問に回答するクラウドソースアノテーションタスクを実施した。結果、物質を使ったり、薬物使用障害の人を知っている労働者は、投稿を汚職として評価する傾向が強いことがわかった。これに基づいて、redditの投稿にスティグマタイジング(stigmatizing)とラベル付けする、生きた物質使用経験のある労働者を集中させる、教師付き機械学習フレームワークを使用します。コメントレベルの言語に加えて、個人レベルの人口層をモデル化すると、分類精度は0.69で、モデリング言語だけで17%向上している。最後に、pwusの物質と、他の言語(「人々」や「彼ら」)を取り巻く言語に同意しない人々、そして「アドディクト」のような用語がスティグマタイジングであるのに対し、pwusは特定の物質に関する議論をよりスティグマタイジングするのと対照的に)を区別する言語学者の手がかりを探究する。本研究は, 物質使用におけるスティグマの知覚特性について考察した。さらに、これらの結果は、これらの機械学習タスクの主観的な性質をさらに確立し、彼らの社会的コンテキストを理解する必要性を強調している。

Stigma toward people who use substances (PWUS) is a leading barrier to seeking treatment. Further, those in treatment are more likely to drop out if they experience higher levels of stigmatization. While related concepts of hate speech and toxicity, including those targeted toward vulnerable populations, have been the focus of automatic content moderation research, stigma and, in particular, people who use substances have not. This paper explores stigma toward PWUS using a data set of roughly 5,000 public Reddit posts. We performed a crowd-sourced annotation task where workers are asked to annotate each post for the presence of stigma toward PWUS and answer a series of questions related to their experiences with substance use. Results show that workers who use substances or know someone with a substance use disorder are more likely to rate a post as stigmatizing. Building on this, we use a supervised machine learning framework that centers workers with lived substance use experience to label each Reddit post as stigmatizing. Modeling person-level demographics in addition to comment-level language results in a classification accuracy (as measured by AUC) of 0.69 -- a 17% increase over modeling language alone. Finally, we explore the linguistic cues which distinguish stigmatizing content: PWUS and those who do not use substances agree that language around othering ("people", "they") and terms like "addict" are stigmatizing, while PWUS (as opposed to those who do not) find discussions around specific substances more stigmatizing. Our findings offer insights into the nature of perceived stigma in substance use. Additionally, these results further establish the subjective nature of such machine learning tasks, highlighting the need for understanding their social contexts.

翻訳日:2023-02-07 20:34:10 公開日:2023-02-04

# 履歴依存型動的文脈を用いた強化学習

Reinforcement Learning with History-Dependent Dynamic Contexts ( http://arxiv.org/abs/2302.02061v1 )

ライセンス: Link先を確認

Guy Tennenholtz, Nadav Merlis, Lior Shani, Martin Mladenov, Craig Boutilier

(参考訳) 動的文脈マルコフ決定プロセス(dcmdps)は、文脈が時間とともに変化する非マルコフ環境を扱うためにコンテキスト境界mdpフレームワークを一般化した、歴史依存環境のための新しい強化学習フレームワークである。本モデルでは,文脈遷移を決定するためにアグリゲーション関数を活用し,履歴長に対する指数関数依存を破るロジスティックdcmdpsに着目した特別ケースを検討する。この特別な構造により、後悔の限界を定めている上位信頼境界型アルゴリズムを導出することができる。この理論結果に動機づけられ,潜在空間に計画し,歴史依存的特徴よりも楽観的手法を用いたロジスティックdcmdpsのための実用的なモデルベースアルゴリズムを提案する。提案手法の有効性を,レコメンデーションに応じてユーザ動作のダイナミクスが進化するレコメンデーションタスク(MovieLensデータを用いた)に示す。

We introduce Dynamic Contextual Markov Decision Processes (DCMDPs), a novel reinforcement learning framework for history-dependent environments that generalizes the contextual MDP framework to handle non-Markov environments, where contexts change over time. We consider special cases of the model, with a focus on logistic DCMDPs, which break the exponential dependence on history length by leveraging aggregation functions to determine context transitions. This special structure allows us to derive an upper-confidence-bound style algorithm for which we establish regret bounds. Motivated by our theoretical results, we introduce a practical model-based algorithm for logistic DCMDPs that plans in a latent space and uses optimism over history-dependent features. We demonstrate the efficacy of our approach on a recommendation task (using MovieLens data) where user behavior dynamics evolve in response to recommendations.

翻訳日:2023-02-07 20:33:34 公開日:2023-02-04

# マスキング言語モデルにおける表現不足

Representation Deficiency in Masked Language Modeling ( http://arxiv.org/abs/2302.02060v1 )

ライセンス: Link先を確認

Yu Meng, Jitin Krishnan, Sinong Wang, Qifan Wang, Yuning Mao, Han Fang, Marjan Ghazvininejad, Jiawei Han, Luke Zettlemoyer

(参考訳) Masked Language Modeling (MLM) は、その単純さと有効性から、双方向テキストエンコーダを事前学習するための最も顕著なアプローチの1つである。 MLMに関する注目すべき懸念は、特別な$\texttt{[MASK]}$シンボルが事前トレーニングデータと下流データの間に相違を引き起こすことである。我々は、MLM事前学習が、$\texttt{[MASK]}$トークンのみを表すために、いくつかのモデル次元を割り当て、結果として、実際のトークンに対する表現不足が生じ、$\texttt{[MASK]}$トークンを使わずに下流データに適用された場合、事前訓練されたモデルの表現が制限されることを経験的および理論的に示す。そこで本研究では,Masked Autoencoder アーキテクチャを MLM で事前トレーニングする MAE-LM を提案し,$\texttt{[MASK]}$ トークンをエンコーダから除外する。実験により,MAE-LMは実トークン表現におけるモデル次元の利用を改良し,GLUEおよびSQuADベンチマークで微調整した場合,MAE-LMは異なる事前学習設定とモデルサイズでMLM事前学習モデルより一貫して優れることを示した。

Masked Language Modeling (MLM) has been one of the most prominent approaches for pretraining bidirectional text encoders due to its simplicity and effectiveness. One notable concern about MLM is that the special $\texttt{[MASK]}$ symbol causes a discrepancy between pretraining data and downstream data as it is present only in pretraining but not in fine-tuning. In this work, we offer a new perspective on the consequence of such a discrepancy: We demonstrate empirically and theoretically that MLM pretraining allocates some model dimensions exclusively for representing $\texttt{[MASK]}$ tokens, resulting in a representation deficiency for real tokens and limiting the pretrained model's expressiveness when it is adapted to downstream data without $\texttt{[MASK]}$ tokens. Motivated by the identified issue, we propose MAE-LM, which pretrains the Masked Autoencoder architecture with MLM where $\texttt{[MASK]}$ tokens are excluded from the encoder. Empirically, we show that MAE-LM improves the utilization of model dimensions for real token representations, and MAE-LM consistently outperforms MLM-pretrained models across different pretraining settings and model sizes when fine-tuned on the GLUE and SQuAD benchmarks.
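
The difference between the two pretraining setups comes down to what the encoder sees. The sketch below contrasts the two encoder inputs for the same masked positions. It only illustrates the input split described in the abstract; how MAE-LM feeds positions to the encoder and how its decoder handles the masked slots are not specified there, so the position-token pairs and function names are assumptions.

```python
MASK = "[MASK]"


def mlm_encoder_input(tokens, masked_positions):
    """Standard MLM: masked positions are replaced by the [MASK] symbol."""
    masked = set(masked_positions)
    return [MASK if i in masked else tok for i, tok in enumerate(tokens)]


def mae_lm_encoder_input(tokens, masked_positions):
    """MAE-LM style: masked positions are dropped from the encoder input.

    Each surviving token keeps its original position index (an assumption
    about how position information would be preserved).
    """
    masked = set(masked_positions)
    return [(i, tok) for i, tok in enumerate(tokens) if i not in masked]
```

Because the second encoder never processes `[MASK]`, none of its capacity needs to be reserved for representing that symbol, which is the deficiency the abstract describes.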

翻訳日:2023-02-07 20:33:17 公開日:2023-02-04

# 意味セグメンテーションのための意味拡散ネットワーク

Semantic Diffusion Network for Semantic Segmentation ( http://arxiv.org/abs/2302.02057v1 )

ライセンス: Link先を確認

Haoru Tan, Sitong Wu, Jimin Pi

(参考訳) 境界領域の正確かつ正確な予測はセマンティックセグメンテーションに不可欠である。しかし、一般に使用される畳み込み演算子は、局所的な詳細情報を滑らかにぼかす傾向があるため、深いモデルが正確な境界予測を生成するのが困難である。本稿では,意味的境界意識を高めるためのオペレータレベルのアプローチを提案し,深い意味的セグメンテーションモデルの予測を改善する。具体的には,まず境界特徴強調を異方性拡散過程として定式化する。次に、パラメータ化意味差畳み込み演算子と特徴融合モジュールとを含む拡散過程を近似する、意味拡散ネットワーク(SDN)と呼ばれる新しい学習可能なアプローチを提案する。我々のSDNは、元の機能からクラス間境界強化機能への微分可能なマッピングを構築することを目的としています。提案するsdnは、既存のエンコーダ/デコーダセグメンテーションモデルに簡単に接続可能な、効率的で柔軟なモジュールである。広範な実験により,提案手法は,公開ベンチマークに挑戦する上で,いくつかの典型的および最先端のセグメンテーションベースラインモデルに対して一貫した改善を達成可能であることが示された。コードはまもなくリリースされる。

Precise and accurate predictions over boundary areas are essential for semantic segmentation. However, the commonly-used convolutional operators tend to smooth and blur local detail cues, making it difficult for deep models to generate accurate boundary predictions. In this paper, we introduce an operator-level approach to enhance semantic boundary awareness, so as to improve the prediction of the deep semantic segmentation model. Specifically, we first formulate the boundary feature enhancement as an anisotropic diffusion process. We then propose a novel learnable approach called semantic diffusion network (SDN) to approximate the diffusion process, which contains a parameterized semantic difference convolution operator followed by a feature fusion module. Our SDN aims to construct a differentiable mapping from the original feature to the inter-class boundary-enhanced feature. The proposed SDN is an efficient and flexible module that can be easily plugged into existing encoder-decoder segmentation models. Extensive experiments show that our approach can achieve consistent improvements over several typical and state-of-the-art segmentation baseline models on challenging public benchmarks. The code will be released soon.
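
For readers unfamiliar with anisotropic diffusion, the sketch below implements one explicit step of the classical Perona-Malik scheme on a 1-D signal: small differences are smoothed, large differences (edges) are preserved. This is the textbook process with a fixed exponential conductance, used here as an assumption. It is not the learnable SDN operator, whose parameterization is not given in the abstract.

```python
import math


def diffusion_step(u, kappa=1.0, lam=0.25):
    """One explicit Perona-Malik step with zero flux at the boundaries."""

    def flux(d):
        # Conductance exp(-(d/kappa)^2) shrinks toward 0 for large differences.
        return math.exp(-((d / kappa) ** 2)) * d

    out = []
    for i, v in enumerate(u):
        right = flux(u[i + 1] - v) if i + 1 < len(u) else 0.0
        left = flux(u[i - 1] - v) if i > 0 else 0.0
        out.append(v + lam * (left + right))
    return out
```

Applied to `[0.0, 0.1, 0.0]`, the small bump is flattened, while a sharp step such as `[0.0, 0.0, 10.0, 10.0]` is left essentially untouched. This edge-preserving behavior is what motivates using diffusion to enhance boundaries.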

翻訳日:2023-02-07 20:32:48 公開日:2023-02-04

# 分子エンベディングのハーネス化シミュレーション

Harnessing Simulation for Molecular Embeddings ( http://arxiv.org/abs/2302.02055v1 )

ライセンス: Link先を確認

Christopher Fifty, Joseph M. Paggi, Ehsan Amid, Jure Leskovec, Ron Dror

(参考訳) 深層学習は、何十年にもわたって計算生物学の進歩を解き放ったが、ラベル付きデータが乏しく、自己教師付き学習の利点が無視できるため、深層学習の技法を分子領域に拡張することは困難であることが証明されている。この研究では、異なるアプローチを探求します。深層強化学習とロボット工学の手法に着想を得て,分子組込みの開発に物理に基づく分子シミュレーションを応用した。グラフニューラルネットワークをシミュレーションデータに適合させることで、シミュレーション中の生物学的ターゲットと同じような相互作用を示す分子が埋め込み空間で同様の表現を発達させる。これらの埋め込みは、現実世界のデータに基づいて訓練された下流モデルの特徴空間を初期化して、シミュレーション中に学んだ情報を分子予測タスクにエンコードする。実験結果から,本手法は実世界の分子予測タスクにおける既存のディープラーニングモデルの性能を,下流モデルに最小限の修正を加えて38%向上させ,ハイパーパラメータチューニングを不要とした。

While deep learning has unlocked advances in computational biology once thought to be decades away, extending deep learning techniques to the molecular domain has proven challenging, as labeled data is scarce and the benefit from self-supervised learning can be negligible in many cases. In this work, we explore a different approach. Inspired by methods in deep reinforcement learning and robotics, we explore harnessing physics-based molecular simulation to develop molecular embeddings. By fitting a Graph Neural Network to simulation data, molecules that display similar interactions with biological targets under simulation develop similar representations in the embedding space. These embeddings can then be used to initialize the feature space of downstream models trained on real-world data to encode information learned during simulation into a molecular prediction task. Our experimental findings indicate this approach improves the performance of existing deep learning models on real-world molecular prediction tasks by as much as 38% with minimal modification to the downstream model and no hyperparameter tuning.

翻訳日:2023-02-07 20:32:31 公開日:2023-02-04

# 動的グラフ予測による多変量時系列異常検出

Multivariate Time Series Anomaly Detection via Dynamic Graph Forecasting ( http://arxiv.org/abs/2302.02051v1 )

ライセンス: Link先を確認

Katrina Chen, Mingbin Feng, Tony S. Wirjanto

(参考訳) 単変量時系列の異常はしばしば、歴史的観測の大多数からの時間的パターンからの異常値と逸脱を指す。多変量時系列では、異常は時間とともに相関のような系列間の関係の異常な変化を指す。既存の研究は、グラフニューラルネットワークを通してそのような系列間関係をモデル化することができる。しかし、ほとんどの作業は、異常な関係を明示的に検出するために調整されていない時系列予測タスクや再構築タスクを支援するために、グローバルまたはコンテキストウィンドウ内で静的グラフを学習することに落ち着く。他の作品では、時系列グラフのリストの再構築や予測に基づいて異常を検出し、グラフの離散的な性質によってデータ内の時間的パターンをとらえる能力を不注意に弱めている。本研究では,動的時系列グラフのリストに基づく多変量時系列異常検出フレームワークDyGraphADを提案する。その中核となる考え方は、グラフ予測タスクと時系列予測タスクを同時に支援するために、グラフの進化する性質を活用することにより、シリーズ間関係とシリーズ内時間パターンの正常状態から異常状態への偏差に基づいて異常を検出することである。実世界のデータセットに関する数値実験により,DyGraphADはベースライン異常検出手法よりも優れた性能を示した。

Anomalies in univariate time series often refer to abnormal values and deviations from the temporal patterns of the majority of historical observations. In multivariate time series, anomalies also refer to abnormal changes in the inter-series relationship, such as correlation, over time. Existing studies have been able to model such inter-series relationships through graph neural networks. However, most works settle on learning a static graph globally or within a context window to assist a time series forecasting task or a reconstruction task, whose objective is not tailored to explicitly detect the abnormal relationship. Some other works detect anomalies based on reconstructing or forecasting a list of inter-series graphs, which inadvertently weakens their power to capture temporal patterns within the data due to the discrete nature of graphs. In this study, we propose DyGraphAD, a multivariate time series anomaly detection framework based upon a list of dynamic inter-series graphs. The core idea is to detect anomalies based on the deviation of inter-series relationships and intra-series temporal patterns from normal to anomalous states, by leveraging the evolving nature of the graphs in order to assist a graph forecasting task and a time series forecasting task simultaneously. Our numerical experiments on real-world datasets demonstrate that DyGraphAD outperforms baseline anomaly detection approaches.
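
A "list of dynamic inter-series graphs" can be pictured as one weighted adjacency matrix per time window. The sketch below builds such a list from sliding-window Pearson correlations, chosen because the abstract names correlation as an example of an inter-series relationship. How DyGraphAD actually constructs and forecasts its graphs is not stated in the abstract, so this construction and the function names are assumptions.

```python
import math


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either series is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


def dynamic_graphs(series, window, step):
    """One correlation matrix (graph) per sliding window over all series."""
    length = len(series[0])
    graphs = []
    for start in range(0, length - window + 1, step):
        chunk = [s[start:start + window] for s in series]
        graphs.append([[pearson(a, b) for b in chunk] for a in chunk])
    return graphs
```

If two series move together in one window and in opposite directions in the next, the corresponding edge weight flips from about 1 to about -1. A relationship change of this kind is invisible to a single static graph.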

翻訳日:2023-02-07 20:32:12 公開日:2023-02-04

# REaLTabFormer: トランスフォーマーを用いたリアルリレーショナルデータとタブラルデータの生成

REaLTabFormer: Generating Realistic Relational and Tabular Data using Transformers ( http://arxiv.org/abs/2302.02041v1 )

ライセンス: Link先を確認

Aivin V. Solatorio and Olivier Dupriez

(参考訳) タブラルデータ(tabular data)は、データ編成の一般的な形式である。複数のモデルは、観察が独立した合成表型データセットを生成することができるが、リレーショナルデータセットを生成する能力を持つものは少ない。テーブルとテーブル間の関係の両方をモデル化する必要があるため、関係データのモデリングは難しい。 realtabformer (realistic relational and tabular transformer), 表型および関係型データ生成モデルであるrealtabformer (realistic relational and tabular transformer) を導入する。まず、自己回帰型gpt-2モデルを使用して親テーブルを作成し、次にsequence-to-sequence(seq2seq)モデルを使用して親テーブル上のリレーショナルデータセットを生成する。我々は,データのコピーを防止するためにターゲットマスキングを実装し,オーバーフィッティングを検出するために$q_{\delta}$ statistic and statistical bootstrappingを提案する。実世界のデータセットを用いた実験では、REaLTabFormerはベースラインモデルよりもリレーショナル構造をよりよくキャプチャする。 REaLTabFormerは、微調整を必要とせずに大規模な非リレーショナルデータセットに対して、予測タスク"out-of-box"の最先端結果も達成している。

Tabular data is a common form of organizing data. Multiple models are available to generate synthetic tabular datasets where observations are independent, but few have the ability to produce relational datasets. Modeling relational data is challenging as it requires modeling both a "parent" table and its relationships across tables. We introduce REaLTabFormer (Realistic Relational and Tabular Transformer), a tabular and relational synthetic data generation model. It first creates a parent table using an autoregressive GPT-2 model, then generates the relational dataset conditioned on the parent table using a sequence-to-sequence (Seq2Seq) model. We implement target masking to prevent data copying and propose the $Q_{\delta}$ statistic and statistical bootstrapping to detect overfitting. Experiments using real-world datasets show that REaLTabFormer captures the relational structure better than a baseline model. REaLTabFormer also achieves state-of-the-art results on prediction tasks, "out-of-the-box", for large non-relational datasets without needing fine-tuning.

翻訳日:2023-02-07 20:31:51 公開日:2023-02-04

# 残留膜電位によるANN-SNN変換誤差の低減

Reducing ANN-SNN Conversion Error through Residual Membrane Potential ( http://arxiv.org/abs/2302.02091v1 )

ライセンス: Link先を確認

Zecheng Hao, Tong Bu, Jianhao Ding, Tiejun Huang, Zhaofei Yu

(参考訳) スパイキングニューラルネットワーク(SNN)は、ニューロモルフィックチップ上での低消費電力と高速計算という特有の性質により、広く学術的な注目を集めている。 SNNの様々なトレーニング手法の中で、ANN-SNN変換は大規模データセット上でANNと同等の性能を示している。しかし,活性化層へのスパイク到来の時間的順序の違いによって生じる偏差を指す不均一性誤差(unevenness error)は効果的に解決されておらず,短いタイムステップの条件下ではSNNの性能を著しく低下させている。本稿では,不均一性誤差の詳細な解析を行い,これを4つのカテゴリに分類する。そのうち、ANNの出力がゼロであるのに対しSNNの出力がゼロより大きい場合が最大の割合を占めることを指摘する。これに基づき,この場合の必要十分条件を理論的に証明し,不均一性誤差を低減するための残留膜電位に基づく最適化手法を提案する。実験の結果,提案手法はCIFAR-10, CIFAR-100, ImageNetデータセット上で最先端の性能を実現することがわかった。例えば、ImageNetでは10タイムステップでトップ1精度64.32%に達する。我々の知る限り、ANN-SNN変換が複雑なデータセット上で高い精度と超低レイテンシを同時に達成したのはこれが初めてである。コードは https://github.com/hzc1208/ANN2SNN_SRP で入手できる。

Spiking Neural Networks (SNNs) have received extensive academic attention due to their low power consumption and high-speed computing on neuromorphic chips. Among the various training methods for SNNs, ANN-SNN conversion has shown performance equivalent to that of ANNs on large-scale datasets. However, unevenness error, which refers to the deviation caused by different temporal sequences of spike arrival on activation layers, has not been effectively resolved and seriously degrades the performance of SNNs under short time-steps. In this paper, we make a detailed analysis of unevenness error and divide it into four categories. We point out that the case where the ANN output is zero while the SNN output is larger than zero accounts for the largest percentage. Based on this, we theoretically prove the sufficient and necessary conditions of this case and propose an optimization strategy based on residual membrane potential to reduce unevenness error. The experimental results show that the proposed method achieves state-of-the-art performance on the CIFAR-10, CIFAR-100, and ImageNet datasets. For example, we reach a top-1 accuracy of 64.32% on ImageNet with 10 time-steps. To the best of our knowledge, this is the first time ANN-SNN conversion has simultaneously achieved high accuracy and ultra-low latency on a complex dataset. Code is available at https://github.com/hzc1208/ANN2SNN_SRP.

翻訳日:2023-02-07 20:24:57 公開日:2023-02-04

# MOMA: 自己教師あり学習の教師モデルからの蒸留

MOMA: Distill from Self-Supervised Teachers ( http://arxiv.org/abs/2302.02089v1 )

ライセンス: Link先を確認

Yuchong Yao, Nandakishor Desai, Marimuthu Palaniswami

(参考訳) コントラスト学習とマスク画像モデリングは自己教師あり表現学習において卓越した性能を示しており、それぞれモーメンタムコントラスト(MoCo)とマスクオートエンコーダ(MAE)が最先端である。本研究では,事前学習済みのMoCoとMAEから自己教師ありの方法で蒸留を行い,両方のパラダイムの知識を組み合わせるMOMAを提案する。提案するMOMAフレームワークに3つの異なる知識伝達機構を導入する。 1) 事前学習済みのMoCoをMAEに蒸留する。 2) 事前学習済みのMAEをMoCoに蒸留する。 3) 事前学習済みのMoCoとMAEをランダムに初期化した生徒モデルに蒸留する。蒸留中、教師と生徒には、それぞれオリジナルの入力とマスクされた入力が与えられる。教師の正規化表現と生徒の投影表現とを整合させることにより学習を行う。この単純な設計は、非常に高いマスク比と大幅に削減された訓練エポック数による効率的な計算をもたらし、蒸留ターゲットに関する追加の考慮を必要としない。実験により、MOMAは既存の最先端手法に匹敵する性能を持つコンパクトな生徒モデルを提供し、双方の自己教師あり学習パラダイムの力を組み合わせていることが示された。コンピュータビジョンにおける様々なベンチマークで競争力のある結果を示す。本手法が,大規模事前学習モデルからの知識の伝達と適応に関する知見を,計算効率の良い形で提供することを願っている。

Contrastive Learning and Masked Image Modelling have demonstrated exceptional performance on self-supervised representation learning, where Momentum Contrast (i.e., MoCo) and Masked AutoEncoder (i.e., MAE) are the state-of-the-art, respectively. In this work, we propose MOMA to distill from pre-trained MoCo and MAE in a self-supervised manner to combine the knowledge from both paradigms. We introduce three different mechanisms of knowledge transfer in the proposed MOMA framework: (1) Distill pre-trained MoCo to MAE. (2) Distill pre-trained MAE to MoCo. (3) Distill pre-trained MoCo and MAE to a randomly initialized student. During the distillation, the teacher and the student are fed with original inputs and masked inputs, respectively. The learning is enabled by aligning the normalized representations from the teacher and the projected representations from the student. This simple design leads to efficient computation with an extremely high mask ratio and dramatically reduced training epochs, and does not require extra considerations on the distillation target. The experiments show MOMA delivers compact student models with comparable performance to existing state-of-the-art methods, combining the power of both self-supervised learning paradigms. It presents competitive results against different benchmarks in computer vision. We hope our method provides insight into transferring and adapting the knowledge from large-scale pre-trained models in a computationally efficient way.

翻訳日:2023-02-07 20:24:35 公開日:2023-02-04

# AV-NeRF:リアルワールドオーディオ映像合成のためのニューラルネットワーク学習

AV-NeRF: Learning Neural Fields for Real-World Audio-Visual Scene Synthesis ( http://arxiv.org/abs/2302.02088v1 )

ライセンス: Link先を確認

Susan Liang, Chao Huang, Yapeng Tian, Anurag Kumar, Chenliang Xu

(参考訳) 複雑な世界に対する人間の認識は、マルチモーダル信号の包括的な分析に依存しており、オーディオとビデオ信号の共起は、人間に豊かな手がかりを与える。本稿では,実世界における新しい映像シーン合成について述べる。オーディオ映像シーンの映像録画を前提として,その映像シーン内の任意のカメラ軌跡に沿って,空間的音声で新しい映像を合成する。音声合成にNeRFモデルを直接用いることは、事前知識の欠如と音響監督のために不十分である。この課題に対処するために,我々はまず,従来の音声伝搬の知識をNeRFに統合した音響認識型音声生成モジュールを提案し,そこで音声生成と視覚環境の3次元幾何を関連づける。また,音源に対する視聴方向を表す座標変換モジュールを提案する。このような方向変換は、モデルが音源中心の音響場を学ぶのに役立つ。さらに,頭部関連インパルス応答関数を用いて擬似バイノーラル音声を合成し,トレーニングを強化するデータ拡張を行う。実世界の映像シーンにおけるモデルの有用性を質的かつ定量的に実証する。我々は興味のある読者に、説得力のある比較のためにビデオ結果を見るよう勧める。

Human perception of the complex world relies on a comprehensive analysis of multi-modal signals, and the co-occurrences of audio and video signals provide humans with rich cues. This paper focuses on novel audio-visual scene synthesis in the real world. Given a video recording of an audio-visual scene, the task is to synthesize new videos with spatial audio along arbitrary novel camera trajectories in that audio-visual scene. Directly using a NeRF-based model for audio synthesis is insufficient due to its lack of prior knowledge and acoustic supervision. To tackle these challenges, we first propose an acoustic-aware audio generation module that integrates our prior knowledge of audio propagation into NeRF, in which we associate audio generation with the 3D geometry of the visual environment. In addition, we propose a coordinate transformation module that expresses a viewing direction relative to the sound source. Such a direction transformation helps the model learn sound source-centric acoustic fields. Moreover, we utilize a head-related impulse response function to synthesize pseudo binaural audio for data augmentation that strengthens training. We qualitatively and quantitatively demonstrate the advantage of our model on real-world audio-visual scenes. We refer interested readers to our video results for convincing comparisons.

翻訳日:2023-02-07 20:24:06 公開日:2023-02-04

# ボルン則 -- 公理か結果か?

The Born Rule -- Axiom or Result? ( http://arxiv.org/abs/2302.02086v1 )

ライセンス: Link先を確認

Jay Lawrence and Philip Goyal

(参考訳) ボルン則(英: Born rule)は、標準的な教科書で示される量子論の標準版において、崩壊公理の一部である。ここでは、その特徴である二次的依存性が、他の公理に加えてただ1つの物理的仮定から導かれることを示す。すなわち、ある特定の測定結果(例えば状態$\phi_k$)の確率は、測定する観測量の固有状態の1つがその結果に対応する限り、どの観測量を測定するかの選択に依存しない、という仮定である。私たちはこの仮定を「観測量独立性」と呼ぶ。結果として、ボルン則は公理のリストから完全に排除することはできないが、原則として、より物理的な言明に還元できる。我々の説明は、量子論の標準的な講義を受講した上級の学部生や大学院生に適している。理論の特定の解釈には依存しない。

The Born rule is part of the collapse axiom in the standard version of quantum theory, as presented by standard textbooks on the subject. We show here that its signature quadratic dependence follows from a single additional physical assumption beyond the other axioms - namely, that the probability of a particular measurement outcome (the state $\phi_k$, say) is independent of the choice of observable to be measured, so long as one of its eigenstates corresponds to that outcome. We call this assumption "observable independence." As a consequence, the Born rule cannot be completely eliminated from the list of axioms, but it can, in principle, be reduced to a more physical statement. Our presentation is suitable for advanced undergraduates or graduate students who have taken a standard course in quantum theory. It does not depend on any particular interpretation of the theory.
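To make the "quadratic dependence" concrete, here is a minimal sketch that evaluates outcome probabilities $|\langle\phi_k|\psi\rangle|^2$ for a two-level system. The function name `born_probabilities`, the example state, and the choice of basis are illustrative assumptions and do not come from the paper, which derives the rule rather than computing with it.

```python
import math


def born_probabilities(state, basis):
    """Return |<phi_k|psi>|^2 for each basis vector phi_k.

    `state` and every entry of `basis` are lists of complex amplitudes.
    The basis is assumed orthonormal and the state normalized.
    """
    probabilities = []
    for phi in basis:
        # Inner product <phi|psi>: conjugate the components of phi.
        amplitude = sum(p.conjugate() * s for p, s in zip(phi, state))
        probabilities.append(abs(amplitude) ** 2)
    return probabilities


# Hypothetical example: (|0> + i|1>)/sqrt(2) measured in the computational basis.
psi = [1 / math.sqrt(2) + 0j, 1j / math.sqrt(2)]
computational_basis = [[1 + 0j, 0j], [0j, 1 + 0j]]
probs = born_probabilities(psi, computational_basis)
```

For a normalized state and an orthonormal basis the returned probabilities add up to one, which is a quick sanity check on any input.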

翻訳日:2023-02-07 20:23:48 公開日:2023-02-04

# 心の理論は、大きな言語モデルで自然発生的に現れたかもしれない

Theory of Mind May Have Spontaneously Emerged in Large Language Models ( http://arxiv.org/abs/2302.02083v1 )

ライセンス: Link先を確認

Michal Kosinski

(参考訳) 心の理論、または他人に観察不能な精神状態をもたらす能力は、人間の社会的相互作用、コミュニケーション、共感、自己意識、道徳の中心である。人間のToMテストに広く用いられている古典的偽理解タスクを,事例や事前学習を伴わずに,いくつかの言語モデルに管理する。その結果,2022年以前のモデルでは,ToMタスクを解く能力がほとんどないことがわかった。しかし、2022年1月のGPT-3(davinci-002)では、ToMタスクの70%が解決された。さらに、2022年11月版(davinci-003)では、ToMタスクの93%が解決された。これらの結果から,ToM様の能力は言語モデルの言語能力向上の副産物として自然に出現した可能性が示唆された。

Theory of mind (ToM), or the ability to impute unobservable mental states to others, is central to human social interactions, communication, empathy, self-consciousness, and morality. We administer classic false-belief tasks, widely used to test ToM in humans, to several language models, without any examples or pre-training. Our results show that models published before 2022 show virtually no ability to solve ToM tasks. Yet, the January 2022 version of GPT-3 (davinci-002) solved 70% of ToM tasks, a performance comparable with that of seven-year-old children. Moreover, its November 2022 version (davinci-003), solved 93% of ToM tasks, a performance comparable with that of nine-year-old children. These findings suggest that ToM-like ability (thus far considered to be uniquely human) may have spontaneously emerged as a byproduct of language models' improving language skills.

翻訳日:2023-02-07 20:23:33 公開日:2023-02-04

# Gated FusionによるNLPモデルの後方互換性向上

Improving Prediction Backward-Compatibility in NLP Model Upgrade with Gated Fusion ( http://arxiv.org/abs/2302.02080v1 )

ライセンス: Link先を確認

Yi-An Lai, Elman Mansimov, Yuqing Xie, Yi Zhang

(参考訳) ニューラルモデルを新しいバージョンにアップグレードする場合、レガシバージョンで遭遇しなかった新しいエラーを、レグレッションエラー(regress error)として導入することができる。モデルアップグレード中のこの一貫性のない振る舞いは、しばしば精度向上の利点を上回り、新しいモデルの採用を妨げる。モデルアップグレードからの回帰誤差を軽減するため、蒸留とアンサンブルは性能に大きな妥協なしに実現可能であることが証明された。進歩にもかかわらず、これらのアプローチは回帰の漸進的な削減を達成し、後方互換性のあるモデルアップグレードには程遠い。本研究では,古いモデルと新しいモデルの間で予測を混合する学習を通じて,後方互換性を促進する新しい手法gated fusionを提案する。 2つの異なるモデルアップグレードシナリオにおける実験結果から,提案手法は回帰誤差を平均62%削減し,最強のベースラインを平均25%上回る結果となった。

When upgrading neural models to a newer version, new errors that were not encountered in the legacy version can be introduced, known as regression errors. This inconsistent behavior during model upgrade often outweighs the benefits of accuracy gain and hinders the adoption of new models. To mitigate regression errors from model upgrade, distillation and ensemble have proven to be viable solutions without significant compromise in performance. Despite the progress, these approaches attained an incremental reduction in regression which is still far from achieving backward-compatible model upgrade. In this work, we propose a novel method, Gated Fusion, that promotes backward compatibility via learning to mix predictions between old and new models. Empirical results on two distinct model upgrade scenarios show that our method reduces the number of regression errors by 62% on average, outperforming the strongest baseline by an average of 25%.
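The idea of mixing old and new predictions, and of counting regression errors, can be sketched as follows. This is only an illustration under stated assumptions: the abstract says the mixing is learned, whereas here `gate` is a fixed scalar, and mixing class probabilities (rather than logits) is a choice made for this sketch. All function names are hypothetical.

```python
def gated_fusion(old_probs, new_probs, gate):
    """Mix two class-probability vectors; `gate` in [0, 1] is the weight on the old model."""
    if not 0.0 <= gate <= 1.0:
        raise ValueError("gate must lie in [0, 1]")
    return [gate * o + (1.0 - gate) * n for o, n in zip(old_probs, new_probs)]


def argmax(values):
    """Index of the largest entry."""
    return max(range(len(values)), key=values.__getitem__)


def count_regression_errors(old_preds, new_preds, labels):
    """Count examples that the old model classified correctly and the new model gets wrong."""
    return sum(
        1 for o, n, y in zip(old_preds, new_preds, labels) if o == y and n != y
    )


# Hypothetical example: the old model is confident in class 0, the new one leans to class 1.
old = [0.9, 0.1]
new = [0.4, 0.6]
fused = gated_fusion(old, new, gate=0.7)
```

With a gate close to 1 the fused prediction follows the old model, which avoids regressions but also forfeits the new model's gains; a learned gate is meant to pick the trade-off per input.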

翻訳日:2023-02-07 20:23:15 公開日:2023-02-04

# FGSI:細粒度意味情報に基づく関係抽出のための距離スーパービジョン

FGSI: Distant Supervision for Relation Extraction method based on Fine-Grained Semantic Information ( http://arxiv.org/abs/2302.02078v1 )

ライセンス: Link先を確認

Chenghong Sun, Weidong Ji, Guohui Zhou, Hui Guo, Zengxiang Yin and Yuqi Yue

(参考訳) 関係抽出の主な目的は、文の意味理解と知識グラフの構築において重要な役割を担っている、文内のエンティティのタグ付きペア間の意味関係を抽出することである。本稿では,文内のキーセマンティック情報が,エンティティ間の関係抽出において重要な役割を果たすことを提案する。文内のキーセマンティック情報がエンティティ関係抽出において重要な役割を果たすという仮説を提案する。そして,この仮説に基づき,文の内部から実体の位置に応じて文を3つのセグメントに分割し,文内部の微細な意味的特徴を文内注意機構を通じて発見し,無関係な雑音情報の干渉を低減する。提案する関係抽出モデルは、利用可能なポジティブな意味情報を十分に活用することができる。実験の結果,提案手法は既存手法と比較して精度-リコール曲線とp@n値が向上し,本モデルの有効性が証明された。

The main purpose of relation extraction is to extract the semantic relationships between tagged pairs of entities in a sentence, which plays an important role in the semantic understanding of sentences and the construction of knowledge graphs. In this paper, we propose the hypothesis that the key semantic information within a sentence plays a key role in entity relation extraction. Based on this hypothesis, we split the sentence into three segments according to the positions of the entities within the sentence, and find the fine-grained semantic features inside the sentence through an intra-sentence attention mechanism to reduce the interference of irrelevant noise information. The proposed relation extraction model can make full use of the available positive semantic information. The experimental results show that the proposed relation extraction model improves the precision-recall curves and P@N values compared with existing methods, which demonstrates the effectiveness of this model.
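The P@N metric mentioned in the abstract is the precision among the N most confident predictions. A minimal sketch, assuming each prediction is given as a `(confidence, is_correct)` pair; the function name and the example data are hypothetical.

```python
def precision_at_n(scored_predictions, n):
    """Precision among the n highest-confidence predictions.

    scored_predictions: list of (confidence, is_correct) pairs.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not scored_predictions:
        raise ValueError("need at least one prediction")
    ranked = sorted(scored_predictions, key=lambda pair: pair[0], reverse=True)
    top = ranked[:n]
    return sum(1 for _, correct in top if correct) / len(top)


# Hypothetical extracted relations with model confidence and correctness.
predictions = [(0.9, True), (0.8, False), (0.7, True), (0.4, True), (0.2, False)]
p_at_2 = precision_at_n(predictions, 2)
p_at_3 = precision_at_n(predictions, 3)
```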

翻訳日:2023-02-07 20:23:00 公開日:2023-02-04

# クロス周波数時系列メタフォアキャスティング

Cross-Frequency Time Series Meta-Forecasting ( http://arxiv.org/abs/2302.02077v1 )

ライセンス: Link先を確認

Mike Van Ness, Huibin Shen, Hao Wang, Xiaoyong Jin, Danielle C. Maddix, Karthick Gopalswamy

(参考訳) meta-forecastingは、メタラーニングと時系列予測を組み合わせた新しい分野だ。 meta-forecastingの目標は、ソース時系列のコレクションをトレーニングし、新しい時系列に1回ずつ一般化することだ。メタ予測における従来のアプローチは競合性能を実現するが、サンプリング周波数ごとに個別のモデルを訓練する制限がある。本研究では,様々なサンプリング周波数のメタフォアキャスティングを調査し,新しいモデルである連続周波数アダプタ(cfa)を導入し,周波数不変表現を学習する。我々は、CFAが周波数を一般化する際の性能を大幅に改善し、より大規模なマルチ周波数データセットを予測するための第一歩となることを発見した。

Meta-forecasting is a newly emerging field which combines meta-learning and time series forecasting. The goal of meta-forecasting is to train over a collection of source time series and generalize to new time series one-at-a-time. Previous approaches in meta-forecasting achieve competitive performance, but with the restriction of training a separate model for each sampling frequency. In this work, we investigate meta-forecasting over different sampling frequencies, and introduce a new model, the Continuous Frequency Adapter (CFA), specifically designed to learn frequency-invariant representations. We find that CFA greatly improves performance when generalizing to unseen frequencies, providing a first step towards forecasting over larger multi-frequency datasets.

翻訳日:2023-02-07 20:22:42 公開日:2023-02-04

# X-ReID: アイデンティティレベル人物再識別のためのクロスインスタンス変換器

X-ReID: Cross-Instance Transformer for Identity-Level Person Re-Identification ( http://arxiv.org/abs/2302.02075v1 )

ライセンス: Link先を確認

Leqi Shen, Tao He, Yuchen Guo, Guiguang Ding

(参考訳) 現在、ほとんどの既存の人物再識別方法は、単一の画像からのみ抽出されるインスタンスレベル機能を使用している。しかし、これらのインスタンスレベルの特徴は、各アイデンティティの外観が異なる画像で大きく異なるため、容易に識別情報を無視することができる。したがって、各アイデンティティの異なるイメージ間で共有できるidレベルの機能を利用する必要がある。本稿では,同一人物画像から同一人物画像への情報をクロスアテンションで取り込み,より統一的で識別可能な歩行者情報を得ることにより,アイデンティティ・レベル特徴に対するインスタンス・レベル特徴の促進を提案する。 x-reid という新しいトレーニングフレームワークを提案する。具体的には、cross intra-identity instances module(intrax)は異なるidentityインスタンスを融合してidレベルの知識を転送し、インスタンスレベルの機能をよりコンパクトにする。 InterX(Cross Inter-Identity Instances Module)は、アイデンティティ内の変動を最小限に抑え、アイデンティティ間の変動を最大化する、異なるアイデンティティではなく、同じアイデンティティに対する注意応答を改善するために、ハードポジティとハードポジティのインスタンスを含む。ベンチマークデータセットに関する広範な実験は、既存の作業よりも優れた方法を示している。特にMSMT17では,2位に比べて1.1%のmAP改善が得られた。

Currently, most existing person re-identification methods use Instance-Level features, which are extracted only from a single image. However, these Instance-Level features can easily ignore discriminative information because the appearance of each identity varies greatly across different images. Thus, it is necessary to exploit Identity-Level features, which can be shared across different images of each identity. In this paper, we propose to promote Instance-Level features to Identity-Level features by employing cross-attention to incorporate information from one image into another of the same identity, so that more unified and discriminative pedestrian information can be obtained. We propose a novel training framework named X-ReID. Specifically, a Cross Intra-Identity Instances module (IntraX) fuses different intra-identity instances to transfer Identity-Level knowledge and make Instance-Level features more compact. A Cross Inter-Identity Instances module (InterX) involves hard positive and hard negative instances to improve the attention response to the same identity rather than to different identities, which minimizes intra-identity variation and maximizes inter-identity variation. Extensive experiments on benchmark datasets show the superiority of our method over existing works. In particular, on the challenging MSMT17, our proposed method gains a 1.1% mAP improvement over the second-best method.

翻訳日:2023-02-07 20:22:31 公開日:2023-02-04

# 量子計算:大規模クリティカルインフラストラクチャのための効率的なネットワークパーティショニング

Quantum computation: Efficient network partitioning for large scale critical infrastructures ( http://arxiv.org/abs/2302.02074v1 )

ライセンス: Link先を確認

Saikat Ray Majumder, Annarita Giani, Weiwei Shen, Bogdan Neculaes, Daiwei Zhu, and Sonika Johri

(参考訳) 量子コンピュータは、古典的なコンピュータにとって困難な特定の計算問題に取り組むための、有効な代替手段として現れつつある。閉じ込められたイオンに基づく量子ハードウェアの急速な発展により、これらのシステムで効率的に解くことができるリスク管理問題を特定する実践的な動機がある。本稿では,重要なインフラにおけるリスクを分析する手段としてネットワーク分割に着目し,その実装に量子的アプローチを提案する。これは潜在的なスピードアップ量子コンピュータがスパースグラフラプラシアンの固有値と固有ベクトルを識別できる可能性に基づいており、これは古典的コンピュータ上の時間とメモリによって制約される手順である。

Quantum computers are emerging as a viable alternative to tackle certain computational problems that are challenging for classical computers. With the rapid development of quantum hardware such as those based on trapped ions, there is practical motivation for identifying risk management problems that are efficiently solvable with these systems. Here we focus on network partitioning as a means for analyzing risk in critical infrastructures and present a quantum approach for its implementation. It is based on the potential speedup quantum computers can provide in the identification of eigenvalues and eigenvectors of sparse graph Laplacians, a procedure which is constrained by time and memory on classical computers.
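For orientation, the classical procedure that the abstract describes as time- and memory-bound can be sketched as spectral bisection: split the nodes by the sign of the Fiedler vector, the eigenvector of the second-smallest Laplacian eigenvalue. The sketch below is not the quantum method of the paper. It assumes a small, connected, undirected graph given as a dense 0/1 adjacency matrix, and it approximates the Fiedler vector by power iteration; the function name is hypothetical.

```python
def fiedler_partition(adjacency, iterations=500):
    """Bisect a connected undirected graph by the sign of its Fiedler vector.

    The Fiedler vector is approximated by power iteration on (c*I - L),
    where L = D - A is the graph Laplacian, while repeatedly removing the
    constant component (the eigenvector of eigenvalue 0).
    """
    n = len(adjacency)
    degrees = [sum(row) for row in adjacency]
    c = 2 * max(degrees)  # upper bound on the largest Laplacian eigenvalue

    def shifted_laplacian_times(v):
        # (c*I - L) v = (c - d_i) v_i + sum_j A_ij v_j
        return [
            (c - degrees[i]) * v[i] + sum(adjacency[i][j] * v[j] for j in range(n))
            for i in range(n)
        ]

    v = [float(i) for i in range(n)]  # deterministic, non-constant start vector
    for _ in range(iterations):
        mean = sum(v) / n
        v = [x - mean for x in v]  # project out the constant vector
        v = shifted_laplacian_times(v)
        norm = sum(x * x for x in v) ** 0.5
        v = [x / norm for x in v]

    mean = sum(v) / n
    v = [x - mean for x in v]
    part_a = [i for i, x in enumerate(v) if x < 0]
    part_b = [i for i, x in enumerate(v) if x >= 0]
    return part_a, part_b


# Hypothetical network: two triangles {0,1,2} and {3,4,5} joined by the edge 2-3.
graph = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
part_a, part_b = fiedler_partition(graph)
```

The dense matrix-vector products here cost O(n^2) per iteration, which illustrates why eigenvector computation becomes the bottleneck for large infrastructure networks.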

翻訳日:2023-02-07 20:22:07 公開日:2023-02-04

# 適応的拡張意味情報を組み合わせた知識グラフ補完法

Knowledge Graph Completion Method Combined With Adaptive Enhanced Semantic Information ( http://arxiv.org/abs/2302.02116v1 )

ライセンス: Link先を確認

Weidong Ji, Zengxiang Yin, Guohui Zhou, Yuqi Yue, Xinru Zhang, Chenghong Sun

(参考訳) 翻訳モデルは知識グラフ補完の過程において、トリアドの豊富な意味情報を無視する傾向にある。本稿では,適応的に強化された意味情報を含む知識グラフ補完手法を構築する。 BERTモデルを微調整して、トリアドに固有の隠された意味情報を取得し、その注意特徴埋め込み法を用いて、正及び負の3つのトライアドの関係と実体間の意味的注意スコアを算出し、それらを構造情報に組み込んで意味情報に対するソフト制約ルールを形成する。このルールは、意味情報の適応的な拡張を実現するために、元の翻訳モデルに追加される。さらに、効果に対する高次元ベクトルの効果を考慮し、bert-whitening法を用いて次元を縮小し、より効率的な意味ベクトル表現を生成する。実験による比較の結果,fb15kおよびwin18データセットにおいて,本手法の有効性と有効性を検証した原文翻訳モデルと比較して約2.6%の数値改善が得られた。

Translation models tend to ignore the rich semantic information in triples in the process of knowledge graph completion. To remedy this shortcoming, this paper constructs a knowledge graph completion method that incorporates adaptively enhanced semantic information. The hidden semantic information inherent in the triple is obtained by fine-tuning the BERT model, and the attention feature embedding method is used to calculate the semantic attention scores between relations and entities in positive and negative triples and incorporate them into the structural information to form a soft constraint rule for semantic information. The rule is added to the original translation model to realize the adaptive enhancement of semantic information. In addition, the method takes into account the effect of high-dimensional vectors on performance, and uses the BERT-whitening method to reduce the dimensionality and generate a more efficient semantic vector representation. In experimental comparisons, the proposed method performs better on both the FB15K and WN18 datasets, with a numerical improvement of about 2.6% compared with the original translation model, which verifies the reasonableness and effectiveness of the method.
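As background for the "translation model" family the abstract builds on, the sketch below scores a triple in the style of TransE, where a relation acts as a translation from head to tail embedding. The abstract does not name the specific translation model used, so TransE is an assumption here, and the semantic soft constraint proposed in the paper is not modeled. The embeddings and function names are hypothetical.

```python
def transe_score(head, relation, tail):
    """L1 distance ||h + r - t||; a lower score means a more plausible triple."""
    return sum(abs(h + r - t) for h, r, t in zip(head, relation, tail))


def margin_ranking_loss(positive_score, negative_score, margin=1.0):
    """Hinge loss that is zero once the positive triple scores at least `margin` lower."""
    return max(0.0, margin + positive_score - negative_score)


# Hypothetical 2-dimensional embeddings.
h = [0.1, 0.2]
r = [0.3, 0.1]
t_positive = [0.4, 0.3]   # close to h + r
t_negative = [1.0, -0.5]  # corrupted tail

pos = transe_score(h, r, t_positive)
neg = transe_score(h, r, t_negative)
loss = margin_ranking_loss(pos, neg)
```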

翻訳日:2023-02-07 20:16:24 公開日:2023-02-04

# 複素ベリー位相と不完全非エルミート相転移

Complex Berry phase and imperfect non-Hermitian phase transitions ( http://arxiv.org/abs/2302.02114v1 )

ライセンス: Link先を確認

Stefano Longhi and Liang Feng

(参考訳) 有効な非エルミート・ハミルトニアンによって記述される多くの古典系・量子系では、系の非エルミートパラメータが臨界値を超えると、完全に実数のエネルギースペクトルから複素スペクトルへのスペクトル相転移が観測される。典型的な例はパリティ・時間(PT)対称性を持つ系であり、そこではPT対称性が破れていない相においてエネルギースペクトルは完全に実数のままであり、一方でPT対称性が破れた相では複素エネルギーへの遷移が観測される。このようなスペクトル相転移は普遍的に鋭い。しかし、系をゆっくりと周期的に循環させると、系の周期的な断熱発展に付随する複素ベリー位相のために、相転移は滑らかに、すなわち不完全になりうる。この注目すべき現象を、外部直流場を受ける2バンド非エルミート格子のPT対称なクラスにおけるワニエ・シュタルク梯子のスペクトル相転移を考えることで例示し、ザック位相(ブリルアンゾーン全体にわたって発展するブロッホ固有状態が獲得するベリー位相)の虚部がゼロでないことが、不完全なスペクトル相転移の原因であることを明らかにする。

In many classical and quantum systems described by an effective non-Hermitian Hamiltonian, spectral phase transitions, from an entirely real energy spectrum to a complex spectrum, can be observed as a non-Hermitian parameter in the system is increased above a critical value. A paradigmatic example is provided by systems possessing parity-time (PT) symmetry, where the energy spectrum remains entirely real in the unbroken PT phase while a transition to complex energies is observed in the broken PT phase. Such spectral phase transitions are universally sharp. However, when the system is slowly and periodically cycled, the phase transition can become smooth, i.e., imperfect, owing to the complex Berry phase associated with the cyclic adiabatic evolution of the system. This remarkable phenomenon is illustrated by considering the spectral phase transition of the Wannier-Stark ladders in a PT-symmetric class of two-band non-Hermitian lattices subjected to an external dc field, revealing that a non-vanishing imaginary part of the Zak phase -- the Berry phase picked up by a Bloch eigenstate evolving across the entire Brillouin zone -- is responsible for imperfect spectral phase transitions.

翻訳日:2023-02-07 20:16:05 公開日:2023-02-04

# コードリポジトリの振る舞いデータによるセキュリティパッチの検出

Detecting Security Patches via Behavioral Data in Code Repositories ( http://arxiv.org/abs/2302.02112v1 )

ライセンス: Link先を確認

Nitzan Farhi, Noam Koenigstein, Yuval Shavitt

(参考訳) 今日のソフトウェアの大部分は、gitのような協調バージョン管理ツールを使って共同開発されている。脆弱性が検出され、修正されると、ソフトウェアを開発する開発者は、セキュリティの危険性をユーザコミュニティに警告し、セキュリティパッチを統合するよう促すために、Common Vulnerabilities and Exposures(CVEレコード)を発行する。しかし、一部の企業は脆弱性を公表せず、リポジトリを更新している。その結果、ユーザは脆弱性に気づいておらず、露出し続ける可能性がある。本稿では,Gitリポジトリの開発者動作のみを使用して,修正に伴うコード自体やコメント(コミットメッセージ)を分析することなく,セキュリティパッチを自動的に識別するシステムを提案する。秘密のセキュリティパッチを88.3%、F1スコア89.8%で公開できることを示した。この問題に対する言語的な解決法が提示されたのは今回が初めてである。

The vast majority of software today is developed collaboratively using version control tools such as Git. It is common practice that once a vulnerability is detected and fixed, the developers behind the software issue a Common Vulnerabilities and Exposures (CVE) record to alert the user community to the security hazard and urge them to integrate the security patch. However, some companies might not disclose their vulnerabilities and just update their repository. As a result, users are unaware of the vulnerability and may remain exposed. In this paper, we present a system to automatically identify security patches using only the developer behavior in the Git repository, without analyzing the code itself or the remarks that accompany the fix (commit message). We show that we can reveal concealed security patches with an accuracy of 88.3% and an F1 score of 89.8%. This is the first time that a language-oblivious solution to this problem has been presented.

翻訳日:2023-02-07 20:15:40 公開日:2023-02-04

# 視覚トランスフォーマーにおける知識蒸留 : 批判的レビュー

Knowledge Distillation in Vision Transformers: A Critical Review ( http://arxiv.org/abs/2302.02108v1 )

ライセンス: Link先を確認

Gousia Habib, Tausifa Jan Saleem, Brejesh Lall

(参考訳) 自然言語処理(nlp)では、トランスフォーマーはすでに注意に基づくエンコーダ・デコーダモデルを利用してこの分野に革命をもたらしている。近年,コンピュータビジョン(CV)にトランスフォーマーのようなアーキテクチャを採用し,画像分類やオブジェクト検出,セマンティックセグメンテーションといったタスクにおいて,これらのアーキテクチャの優れた性能を報告している。ビジョントランスフォーマー(ViT)は、競合するモデリング能力のために、畳み込みニューラルネットワーク(CNN)よりも優れたパフォーマンスを誇示している。しかし、これらのアーキテクチャは膨大な計算資源を必要とするため、リソース制約されたアプリケーションにこれらのモデルをデプロイすることは困難である。圧縮変圧器や拡張畳み込み、min-maxプール、1D畳み込みなどの圧縮関数など、この問題に対処する多くのソリューションが開発されている。モデル圧縮は最近、潜在的な治療としてかなりの研究の注目を集めている。重み量子化,重み多重化,プルーニング,知識蒸留 (kd) などの文献において,モデル圧縮法が提案されている。しかしながら、重み量子化、プルーニング、重み多重化といったテクニックは、圧縮を実行するための複雑なパイプラインを必要とする。 KDは、比較的単純なモデルが複雑なモデルと同じくらい正確にタスクを実行できる、シンプルで効果的なモデル圧縮技術であることが分かってきた。本稿では,vitモデルの効果的圧縮のためのkdに基づく様々な手法について述べる。この論文は、kdがこれらのモデルの計算とメモリ要求を減らす上で果たす役割を解明している。本稿は、まだ解決されていないViTが直面する様々な課題についても述べる。

In Natural Language Processing (NLP), Transformers have already revolutionized the field by utilizing an attention-based encoder-decoder model. Recently, some pioneering works have employed Transformer-like architectures in Computer Vision (CV) and have reported outstanding performance of these architectures in tasks such as image classification, object detection, and semantic segmentation. Vision Transformers (ViTs) have demonstrated impressive performance improvements over Convolutional Neural Networks (CNNs) due to their competitive modelling capabilities. However, these architectures demand massive computational resources, which makes these models difficult to deploy in resource-constrained applications. Many solutions have been developed to combat this issue, such as compressive transformers and compression functions such as dilated convolution, min-max pooling, and 1D convolution. Model compression has recently attracted considerable research attention as a potential remedy. A number of model compression methods have been proposed in the literature, such as weight quantization, weight multiplexing, pruning, and Knowledge Distillation (KD). However, techniques like weight quantization, pruning, and weight multiplexing typically involve complex pipelines for performing the compression. KD has been found to be a simple and highly effective model compression technique that allows a relatively simple model to perform tasks almost as accurately as a complex model. This paper discusses various KD-based approaches for effective compression of ViT models. The paper elucidates the role played by KD in reducing the computational and memory requirements of these models. The paper also presents the various challenges faced by ViTs that are yet to be resolved.
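As a reminder of what KD optimizes in its most common form, the sketch below computes the soft-target distillation loss: the KL divergence between temperature-softened teacher and student distributions, scaled by the squared temperature. This is the generic formulation, not any specific ViT method surveyed in the paper; the logits and function names are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax of logits divided by `temperature`."""
    scaled = [z / temperature for z in logits]
    largest = max(scaled)
    exps = [math.exp(z - largest) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions, scaled by T^2."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)
    return temperature ** 2 * kl


# Hypothetical logits for a 3-class problem.
teacher = [2.0, 0.5, -1.0]
student = [0.1, 0.2, 0.3]
loss_mismatch = distillation_loss(student, teacher)
loss_match = distillation_loss(teacher, teacher)
```

In practice this term is usually combined with the ordinary cross-entropy on ground-truth labels; the weighting between the two is a hyperparameter.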

翻訳日:2023-02-07 20:15:26 公開日:2023-02-04

# HardSATGEN: 困難なSAT式生成の難しさの理解と構造・困難性を考慮した強力なベースライン

HardSATGEN: Understanding the Difficulty of Hard SAT Formula Generation and A Strong Structure-Hardness-Aware Baseline ( http://arxiv.org/abs/2302.02104v1 )

ライセンス: Link先を確認

Yang Li, Xinyan Chen, Wenxuan Guo, Xijun Li, Wanqian Luo, Junhua Huang, Hui-Ling Zhen, Mingxuan Yuan, Junchi Yan

(参考訳) 産業SAT公式生成は、実用SATアプリケーションにおけるヒューリスティックな開発と学習に基づく手法の急増にとって、重要かつ困難な課題である。既存のSAT生成アプローチでは、グローバルな構造特性を同時に捉えることはほとんどできず、様々な下流のエンゲージメントにとって有害な計算硬度を維持することができる。この目的のために,本研究では,従来の学習方法の制限について,従来の計算困難度を再現するための詳細な解析を行った。産業用公式が明らかなコミュニティ構造と過分な部分構造を示すことから,論理構造のセマンティックな形成が困難であることを示す上で,SAT式生成のためのニューラルスプリット・マージ・パラダイムにきめ細かな制御機構を導入し,産業用ベンチマークの構造的・計算的特性をよりよく回復させるHardSATGENを提案する。計算硬度とグローバルな構造特性の同時把握を両立させる手法として, 民間企業データの評価や, 実用化におけるハイパーパラメータチューニングなどの実験結果から, ハードサトゲンの有意な優位性が確認された。我々の最高の知識に対する最高の手法と比較すると、平均的な性能向上は構造統計で38.5%、計算メトリクスで88.4%、生成したインスタンスでチューニングされたソルバ開発を導く効果で140.7%以上を達成している。

Industrial SAT formula generation is a critical yet challenging task for heuristic development and the surging learning-based methods in practical SAT applications. Existing SAT generation approaches can hardly capture the global structural properties and maintain plausible computational hardness at the same time, which can be hazardous for the various downstream engagements. To this end, we first present an in-depth analysis of the limitations of previous learning methods in reproducing the computational hardness of original instances, which may stem from the inherent homogeneity in their adopted split-merge procedure. Building on the observations that industrial formulae exhibit clear community structure and that oversplit substructures make the semantic formation of logical structures difficult, we propose HardSATGEN, which introduces a fine-grained control mechanism to the neural split-merge paradigm for SAT formula generation to better recover the structural and computational properties of the industrial benchmarks. Experimental results, including evaluations on private corporate data and hyperparameter tuning over solvers in practical use, show the significant superiority of HardSATGEN, which is the only method that successfully augments formulae while maintaining similar computational hardness and capturing the global structural properties simultaneously. Compared to the best previous methods known to us, the average performance gains reach 38.5% in structural statistics, 88.4% in computational metrics, and over 140.7% in the effectiveness of guiding solver development tuned by our generated instances.

翻訳日:2023-02-07 20:15:01 公開日:2023-02-04

# grande: 指向型マルチグラフによるニューラルモデルとアンチマネーロンダリングへの応用

GRANDE: a neural model over directed multigraphs with application to anti-money laundering ( http://arxiv.org/abs/2302.02101v1 )

ライセンス: Link先を確認

Ruofan Wu, Boqun Ma, Hong Jin, Wenlong Zhao, Weiqiang Wang, Tianyi Zhang

(参考訳) 近年,金融リスクマネジメント(FRM)分野へのグラフ表現学習技術の応用が注目されている。第一に、トランザクションネットワークは本質的にはマルチグラフを指向しており、現行のグラフニューラルネットワーク(gnn)のほとんどでは適切に処理できない。第二に、アンチマネーロンダリング(AML)のようなFRMシナリオにおける重要な問題は、リスクのあるトランザクションを識別することであり、ノード中心のメッセージパッシングプロトコルに従う一般的なGNN設計によって完全に活用されていない、リッチエッジレベルの機能を備えたエッジ分類問題に最も自然に陥る。本稿では,指向型マルチグラフ上でのニューラルモデルの設計側面を体系的に検討し,方向情報を効率的に取り入れることで,上記の課題を克服する新しいgnnプロトコルを開発し,エッジ・ツー・ノード二重グラフの拡張を用いた新しいメッセージパッシング方式を用いてエッジ関連タスクを対象とする拡張を提案する。 GRANDEと呼ばれる具体的なGNNアーキテクチャは提案プロトコルを用いて導出され、時間動的グラフのさらなる改良と一般化がなされている。 GRANDEモデルを現実世界のマネーロンダリングタスクと公開データセットの両方に適用する。実験により, 動的グラフモデリングと有向グラフモデリングにおいて, 最近の最先端モデルよりもgrandeアーキテクチャが優れていることが示された。

The application of graph representation learning techniques to the area of financial risk management (FRM) has attracted significant attention recently. However, directly modeling transaction networks using graph neural models remains challenging. First, transaction networks are directed multigraphs by nature, which cannot be properly handled by most current off-the-shelf graph neural networks (GNNs). Second, a crucial problem in FRM scenarios like anti-money laundering (AML) is to identify risky transactions, which is most naturally cast as an edge classification problem with rich edge-level features; these features are not fully exploited by the prevailing GNN design that follows node-centric message passing protocols. In this paper, we present a systematic investigation of design aspects of neural models over directed multigraphs and develop a novel GNN protocol that overcomes the above challenges by efficiently incorporating directional information. We also propose an enhancement that targets edge-related tasks using a novel message passing scheme over an extension of the edge-to-node dual graph. A concrete GNN architecture called GRANDE is derived using the proposed protocol, with several further improvements and generalizations to temporal dynamic graphs. We apply the GRANDE model to both a real-world anti-money laundering task and public datasets. Experimental evaluations show the superiority of the proposed GRANDE architecture over recent state-of-the-art models on dynamic graph modeling and directed graph modeling.

翻訳日:2023-02-07 20:14:33 公開日:2023-02-04

# PLCプロセス制御における異常検出のための教師なしアンサンブル法

Unsupervised Ensemble Methods for Anomaly Detection in PLC-based Process Control ( http://arxiv.org/abs/2302.02097v1 )

ライセンス: Link先を確認

Emmanuel Aboah Boateng, and Bruce J. W

(参考訳) プログラム可能なロジックコントローラ(PLC)ベースの産業制御システム(ICS)は、重要なインフラを監視し制御するために使用される。 ICSにおける通信ネットワークの統合とIoTアプローチは、サイバー攻撃に対するICSの脆弱性を増大させた。本研究はPLCベースのICSにおける異常検出のための新しい教師なし機械学習アンサンブル手法を提案する。本研究は, 決定係数に基づく学習アルゴリズムを用いた重み付き投票アンサンブルアプローチと, 孤立林メタ検出器を用いた積み重ね型アンサンブルアプローチの2つの手法を提案する。 2つのアンサンブル法を,複数の攻撃シナリオを想定したオープンソースのplcベースのicを用いて解析した。この研究は、重み付けされた投票アンサンブル法のための4つの異なる学習モデルを考える。 5つのアンサンブル法を駆動する多種多様なベース検出器の比較性能解析を行った。その結果,分離林メタ検出器を用いた積み重ね型アンサンブル法は,過去のすべての性能指標よりも優れた性能を示した。また,分離森林メタ検出器を持つ積み重ね型アンサンブルのような効果的なアンサンブル手法は,任意のICSデータセットの異常を確実に検出できることが示唆された。最後に, 統計的仮説を用いて実験結果を検証した。

Programmable logic controller (PLC) based industrial control systems (ICS) are used to monitor and control critical infrastructure. Integration of communication networks and an Internet of Things approach in ICS has increased ICS vulnerability to cyber-attacks. This work proposes novel unsupervised machine learning ensemble methods for anomaly detection in PLC-based ICS. The work presents two broad approaches to anomaly detection: a weighted voting ensemble approach with a learning algorithm based on the coefficient of determination, and a stacking-based ensemble approach using an isolation forest meta-detector. The two ensemble methods were analyzed via an open-source PLC-based ICS subjected to multiple attack scenarios as a case study. The work considers four different learning models for the weighted voting ensemble method. Comparative performance analyses of five ensemble methods driven by diverse base detectors are presented. Results show that the stacking-based ensemble method using an isolation forest meta-detector achieves superior performance to previous work on all performance metrics. Results also suggest that effective unsupervised ensemble methods, such as a stacking-based ensemble with an isolation forest meta-detector, can robustly detect anomalies in arbitrary ICS datasets. Finally, the presented results were validated using statistical hypothesis tests.

翻訳日:2023-02-07 20:14:08 公開日:2023-02-04

# 個人フェアネスの行列推定

Matrix Estimation for Individual Fairness ( http://arxiv.org/abs/2302.02096v1 )

ライセンス: Link先を確認

Cindy Y. Zhang, Sarah H. Cen, Devavrat Shah

(参考訳) 近年、アルゴリズム的公正性の複数の概念が生まれている。そのような概念の1つは個人公正(IF)であり、類似した個人が同様の扱いを受けることを要求する。並行して、行列推定(ME)は、欠損値を含むノイズの多いデータを扱うための自然なパラダイムとして現れた。本研究では、この2つの概念を結びつける。 MEを用いたデータの前処理は、性能を犠牲にすることなくアルゴリズムのIFを改善できることを示す。具体的には,データ前処理に特異値しきい値処理(SVT)と呼ばれる一般的なME手法を用いることで,適切な条件下で強力なIF保証が得られることを示す。次に、類似した条件下では、SVT前処理が一致性を持ち、近似的にミニマックス最適な推定値も与えることを示す。したがって、ME前処理ステップは、前述の条件の下では、ベースアルゴリズムの予測誤差を増加させない、すなわち、フェアネスとパフォーマンスのトレードオフを課さない。これらの結果を合成データと実データで検証する。

In recent years, multiple notions of algorithmic fairness have arisen. One such notion is individual fairness (IF), which requires that individuals who are similar receive similar treatment. In parallel, matrix estimation (ME) has emerged as a natural paradigm for handling noisy data with missing values. In this work, we connect the two concepts. We show that pre-processing data using ME can improve an algorithm's IF without sacrificing performance. Specifically, we show that using a popular ME method known as singular value thresholding (SVT) to pre-process the data provides a strong IF guarantee under appropriate conditions. We then show that, under analogous conditions, SVT pre-processing also yields estimates that are consistent and approximately minimax optimal. As such, the ME pre-processing step does not, under the stated conditions, increase the prediction error of the base algorithm, i.e., does not impose a fairness-performance trade-off. We verify these results on synthetic and real data.

翻訳日:2023-02-07 20:13:49 公開日:2023-02-04
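
To make the pre-processing step in the abstract above more concrete, here is a minimal sketch of singular value thresholding on a partially observed matrix. It assumes a hard-threshold variant in which unobserved entries are zero-filled and the result is rescaled by the observed fraction; the paper's exact estimator, threshold choice, and conditions may differ, and the name `svt_estimate` and the synthetic data are assumptions.

```python
import numpy as np


def svt_estimate(observed, mask, threshold):
    """Hard singular value thresholding for a partially observed matrix.

    observed:  2-D array of noisy values (entries where mask is False are ignored)
    mask:      boolean array, True where an entry was observed
               (assumed to contain at least one True entry)
    threshold: singular values below this are discarded
    """
    p_hat = mask.mean()                      # fraction of observed entries
    filled = np.where(mask, observed, 0.0)   # zero-fill the missing entries
    u, s, vt = np.linalg.svd(filled, full_matrices=False)
    s_kept = np.where(s >= threshold, s, 0.0)
    # Rebuild from the retained components; rescale to offset the zero-filling.
    return (u * s_kept) @ vt / p_hat


# Hypothetical usage on a synthetic rank-1 matrix with about 20% missing entries.
rng = np.random.default_rng(0)
truth = np.outer(rng.normal(size=6), rng.normal(size=5))
mask = rng.random(truth.shape) < 0.8
estimate = svt_estimate(truth, mask, threshold=1.0)
```

A downstream algorithm would then be trained on `estimate` instead of the raw data; the fairness and error guarantees discussed in the abstract concern that pipeline, not this sketch.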

# ナレッジエンハンスドニューラルマシン推論:レビュー

Knowledge-enhanced Neural Machine Reasoning: A Review ( http://arxiv.org/abs/2302.02093v1 )

ライセンス: Link先を確認

Tanmoy Chowdhury, Chen Ling, Xuchao Zhang, Xujiang Zhao, Guangji Bai, Jian Pei, Haifeng Chen, Liang Zhao

(参考訳) 知識強化型ニューラルマシン推論は、多くの実用的応用を持つ、最先端でありながら挑戦的な研究分野として大きな注目を集めている。過去数年間、さまざまな形態の外部知識を活用して深層モデルの推論能力を強化し、効果的な知識統合、暗黙の知識マイニング、トラクタビリティと最適化の問題といった課題に取り組む研究が数多く行われてきた。しかし、様々なアプリケーションドメインにまたがる既存の知識強化型推論技術に関する包括的な技術的レビューは不足している。本調査は, 既存の知識強化手法を2つの主要なカテゴリと4つのサブカテゴリに分類する新しい分類法を導入し, この分野の最近の進歩を詳細に検討する。我々は,これらの手法を体系的に議論し,その相関性,強み,限界を強調する。最後に、現在のアプリケーションドメインを解明し、将来の研究の有望な展望に関する洞察を提供する。

Knowledge-enhanced neural machine reasoning has garnered significant attention as a cutting-edge yet challenging research area with numerous practical applications. Over the past few years, plenty of studies have leveraged various forms of external knowledge to augment the reasoning capabilities of deep models, tackling challenges such as effective knowledge integration, implicit knowledge mining, and problems of tractability and optimization. However, there is a dearth of a comprehensive technical review of the existing knowledge-enhanced reasoning techniques across the diverse range of application domains. This survey provides an in-depth examination of recent advancements in the field, introducing a novel taxonomy that categorizes existing knowledge-enhanced methods into two primary categories and four subcategories. We systematically discuss these methods and highlight their correlations, strengths, and limitations. Finally, we elucidate the current application domains and provide insight into promising prospects for future research.

翻訳日:2023-02-07 20:13:34 公開日:2023-02-04

# ロバスト学習のための補間:測地線データ拡張

Interpolation for Robust Learning: Data Augmentation on Geodesics ( http://arxiv.org/abs/2302.02092v1 )

ライセンス: Link先を確認

Jiacheng Zhu, Jielin Qiu, Aritra Guha, Zhuolin Yang, Xuanlong Nguyen, Bo Li, Ding Zhao

(参考訳) 本稿では,トレーニングデータ分布の補間を通じて,モデルの性能の観点からロバスト性を研究・促進することを提案する。具体的には,(1)異なるカテゴリーのサブポピュレーション分布を接続する測地線上で,ワーストケースのWasserstein barycenterを求めることでデータを拡張する。 (2) サブポピュレーション分布を接続する連続測地路上で性能がより滑らかになるようにモデルを正則化する。 (3) さらに,ロバスト性向上の理論的保証を提供し,測地線上の位置とサンプルサイズがそれぞれどのように寄与するかを検討する。 CIFAR-100とImageNetを含む4つのデータセットに対する実験的検証により,提案手法の有効性が確立された。例えば,提案手法は,CIFAR10におけるベースラインの証明可能なロバスト性を最大$7.7\%$,CIFAR-100における経験的ロバスト性を$16.8\%$改善する。我々の研究は、Wasserstein測地線に基づく補間によるモデルロバスト性の新しい視点と、既存のロバストトレーニング手法と組み合わせることができる実用的なオフザシェルフ戦略を提供する。

We propose to study and promote the robustness of a model as per its performance through the interpolation of training data distributions. Specifically, (1) we augment the data by finding the worst-case Wasserstein barycenter on the geodesic connecting subpopulation distributions of different categories. (2) We regularize the model for smoother performance on the continuous geodesic path connecting subpopulation distributions. (3) Additionally, we provide a theoretical guarantee of robustness improvement and investigate how the geodesic location and the sample size contribute, respectively. Experimental validations of the proposed strategy on four datasets, including CIFAR-100 and ImageNet, establish the efficacy of our method, e.g., our method improves the baselines' certifiable robustness on CIFAR10 up to $7.7\%$, with $16.8\%$ on empirical robustness on CIFAR-100. Our work provides a new perspective of model robustness through the lens of Wasserstein geodesic-based interpolation with a practical off-the-shelf strategy that can be combined with existing robust training methods.

翻訳日:2023-02-07 20:13:19 公開日:2023-02-04
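
The idea of interpolating along a Wasserstein geodesic can be illustrated in one dimension, where the optimal coupling between two equal-size empirical distributions simply pairs the sorted samples. The sketch below is a toy under that assumption only; the abstract's method works with class-conditional data distributions and searches for a worst-case barycenter, which is not reproduced here. The function name `wasserstein_geodesic_1d` is hypothetical.

```python
def wasserstein_geodesic_1d(xs, ys, t):
    """Point at position t in [0, 1] on the Wasserstein geodesic between two
    1-D empirical distributions with the same number of equally weighted samples.

    In one dimension the optimal transport plan matches sorted samples, so the
    interpolated distribution is obtained by moving each matched pair linearly.
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample counts")
    return [(1 - t) * a + t * b for a, b in zip(sorted(xs), sorted(ys))]


# Hypothetical samples from two subpopulations.
class_a = [0.0, 2.0, 1.0]
class_b = [10.0, 12.0, 11.0]
midpoint = wasserstein_geodesic_1d(class_a, class_b, 0.5)  # [5.0, 6.0, 7.0]
```

Sweeping `t` from 0 to 1 produces the family of intermediate distributions from which augmented samples could be drawn.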

# ピラミッド型マルチモーダル変圧器を用いた効率的なエンドツーエンドビデオ質問応答

Efficient End-to-End Video Question Answering with Pyramidal Multimodal Transformer ( http://arxiv.org/abs/2302.02136v1 )

ライセンス: Link先を確認

Min Peng, Chongyang Wang, Yu Shi, Xiang-Dong Zhou

(参考訳) 本稿では,大量の特徴抽出器を用いた大規模事前学習が現在普及しているビデオQA(End-to-end Video Question Answering)を提案する。ピラミッド型マルチモーダルトランス (PMT) モデルでこれを実現し、学習可能な単語埋め込み層といくつかの畳み込み層とトランスフォーマー層を組み込む。異方性ピラミッドを用いて、異なる時空間スケールにわたるビデオ言語相互作用を実現する。左右の接続を持つボトムアップ経路とトップダウン経路を含む標準ピラミッドに加えて、異なるスケールで視覚特徴ストリームを空間的・時間的サブストリームに分解し、局所的・グローバル的意味論の整合性を保ちながら言語意味論との相互作用を実装する新しい戦略が提案されている。我々は,5つのビデオQAベンチマークにおいて,最先端手法に対して高い計算効率で高い性能を示す。本研究は,再利用可能な事前学習重み付き特徴抽出器とピラミッドの有効性を活かし,テキスト対ビデオ検索の競争結果を得るモデルのスケーラビリティを示す。

This paper presents a new method for end-to-end Video Question Answering (VideoQA), aside from the current popularity of using large-scale pre-training with huge feature extractors. We achieve this with a pyramidal multimodal transformer (PMT) model, which simply incorporates a learnable word embedding layer, a few convolutional and transformer layers. We use the anisotropic pyramid to fulfill video-language interactions across different spatio-temporal scales. In addition to the canonical pyramid, which includes both bottom-up and top-down pathways with lateral connections, novel strategies are proposed to decompose the visual feature stream into spatial and temporal sub-streams at different scales and implement their interactions with the linguistic semantics while preserving the integrity of local and global semantics. We demonstrate better or on-par performances with high computational efficiency against state-of-the-art methods on five VideoQA benchmarks. Our ablation study shows the scalability of our model that achieves competitive results for text-to-video retrieval by leveraging feature extractors with reusable pre-trained weights, and also the effectiveness of the pyramid.

翻訳日:2023-02-07 20:07:20 公開日:2023-02-04

# 最寄りのトレーニングセットを最適かつ正確に削減する

Reducing Nearest Neighbor Training Sets Optimally and Exactly ( http://arxiv.org/abs/2302.02132v1 )

ライセンス: Link先を確認

Josiah Rohrer and Simon Weber

(参考訳) 最近傍分類では、分類が与えられた$\mathbb{R}^d$内の点の訓練集合$P$が、$\mathbb{R}^d$のすべての点を分類するために使用される。すなわち、各点は$P$における最近傍と同じ分類を得る。最近、Eppstein [SOSA'22] は関連する訓練点、すなわち$P$と$P\setminus\{p\}$が異なる分類を誘導するような点$p\in P$を検出するアルゴリズムを開発した。本研究では、$P$と$P'$が同じ分類を誘導するような最小濃度の縮小訓練集合$P'\subseteq P$を見つける問題を検討する。 $P$が一般の位置にある場合、関連する点の集合はそのような最小濃度の縮小訓練集合であることを示す。さらに、退化している可能性のある$P$に対して最小濃度の縮小訓練集合を見つける問題は、$d=1$でPに属し、$d\geq 2$でNP完全であることを示す。

In nearest-neighbor classification, a training set $P$ of points in $\mathbb{R}^d$ with given classification is used to classify every point in $\mathbb{R}^d$: Every point gets the same classification as its nearest neighbor in $P$. Recently, Eppstein [SOSA'22] developed an algorithm to detect the relevant training points, those points $p\in P$, such that $P$ and $P\setminus\{p\}$ induce different classifications. We investigate the problem of finding the minimum cardinality reduced training set $P'\subseteq P$ such that $P$ and $P'$ induce the same classification. We show that the set of relevant points is such a minimum cardinality reduced training set if $P$ is in general position. Furthermore, we show that finding a minimum cardinality reduced training set for possibly degenerate $P$ is in P for $d=1$, and NP-complete for $d\geq 2$.

翻訳日:2023-02-07 20:06:59 公開日:2023-02-04
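
For intuition about relevant points, the case $d=1$ with distinct coordinates can be worked out by hand: removing a point changes the induced classification exactly when one of its neighbours in sorted order carries a different label. The sketch below implements only this toy case and then checks that the reduced set classifies sample queries the same way. It is not Eppstein's algorithm and not the paper's construction; the function names and the data are assumptions.

```python
def relevant_points_1d(points):
    """Relevant points of a 1-D training set.

    points: list of (x, label) with pairwise distinct x and at least two points.
    A point is kept when an adjacent point (in sorted order) has a different
    label, because only then does removing it change some nearest-neighbour
    classification.
    """
    ordered = sorted(points)
    relevant = []
    for i, (x, label) in enumerate(ordered):
        neighbour_labels = []
        if i > 0:
            neighbour_labels.append(ordered[i - 1][1])
        if i + 1 < len(ordered):
            neighbour_labels.append(ordered[i + 1][1])
        if any(other != label for other in neighbour_labels):
            relevant.append((x, label))
    return relevant


def classify(points, query):
    """Label of the training point nearest to query."""
    return min(points, key=lambda p: abs(p[0] - query))[1]


# Hypothetical training set: two red points followed by three blue points.
training = [(0, "r"), (1, "r"), (2, "b"), (3, "b"), (4, "b")]
reduced = relevant_points_1d(training)  # [(1, "r"), (2, "b")]
```

Only the two points on either side of the class boundary survive, and queries away from the tie point 1.5 receive the same label from `training` and from `reduced`.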

# オープンピット採掘における地球移動機器と環境相互作用の考察

Inferencing the earth moving equipment-environment interaction in open pit mining ( http://arxiv.org/abs/2302.02130v1 )

ライセンス: Link先を確認

M. Balamurali

(参考訳) 鉱業において、グレードコントロールは一般的に、地層からの材料発掘方法にほとんど、あるいは全く注意を払わずに、爆風孔サンプリングと鉱石制御ブロックモデルの推定に重点を置いている。トラックを積み込む過程において、個々のバケットの負荷の変動性は、トラックのペイロードの変動性を決定する。したがって、正確な物質移動には、掘削過程とバケットと環境との相互作用に関する十分な知識が必要である。しかし、機器は予期せぬ遅延、障害、欠陥のためにしばしば名目上状態に陥る。このような乱れの大量発生は、統計力とバイアス推定を減少させる情報損失を引き起こし、生産の不確実性が増大する。利用可能なデータソースからマシンと環境の相互作用に関する知識の欠如を推測する信頼性の高い手法は、物質の動きを正確にモデル化するのに不可欠である。本研究では,教師なしクラスタリングを行い,欠落情報を予測する2段階の手法を実装した。第1の方法はDBSCANベースの空間クラスタリングであり、デガーとバケットの位置データを接続されたロードセグメントに分割する。セグメント化バケット掘削位置の明瞭なパターンが観察された。第2のモデルは、クラスタ化されたデータでトレーニングされたガウス過程の回帰を利用して、テストクラスタの平均位置を推測する。その後、バケット掘削場所は推定平均位置で異なる期間シミュレーションされ、既知のバケット掘削場所と比較された。この方法は西オーストラリアのピルバラにある露天掘り鉱山で試験された。その結果,バケット環境相互作用の欠落情報を参照することで,坑夫が連続的に物質移動を追跡できるという利点が得られた。

In mining, grade control generally focuses on blast hole sampling and the estimation of ore control block models, with little or no attention given to how the materials are excavated from the ground. In the process of loading trucks, the underlying variability of the individual bucket loads determines the variability of the truck payload. Hence, accurately modeling material movement demands good knowledge of the excavation process and of the bucket's interaction with the environment. However, equipment frequently goes into off-nominal states due to unexpected delays, disturbances, or faults. The large number of such disturbances causes information loss that reduces statistical power and biases estimates, leading to increased uncertainty in production. A reliable method that infers the missing knowledge about the interaction between the machine and the environment from the available data sources is vital for accurately modeling the material movement. In this study, a two-step method was implemented that performed unsupervised clustering and then predicted the missing information. The first step is DBSCAN-based spatial clustering, which divides the digger and bucket positional data into connected loading segments. Clear patterns of segmented bucket dig positions were observed. The second step uses Gaussian process regression, trained on the clustered data, to infer the mean locations of the test clusters. Bucket dig locations were then simulated at the inferred mean locations for different durations and compared against the known bucket dig locations. This method was tested at an open pit mine in the Pilbara of Western Australia. The results demonstrate the advantage of the proposed method in inferring the missing information about bucket-environment interactions, which enables miners to continuously track the material movement.

翻訳日:2023-02-07 20:06:39 公開日:2023-02-04
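
The first step described in the abstract above, density-based clustering of positions, can be sketched with a small textbook DBSCAN. This is a plain-Python illustration on made-up 2-D coordinates, not the study's implementation; the parameter values and the data are assumptions, and the second step (Gaussian process regression on the clustered data) is not shown.

```python
from math import dist


def dbscan(points, eps, min_samples):
    """Textbook DBSCAN. Returns one label per point: 0, 1, ... for clusters
    and -1 for noise. A point is a core point when at least min_samples
    points (itself included) lie within distance eps."""
    NOISE = -1
    labels = [None] * len(points)  # None marks a point not yet visited

    def region(i):
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = region(i)
        if len(neighbours) < min_samples:
            labels[i] = NOISE
            continue
        labels[i] = cluster
        queue = [j for j in neighbours if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # noise reachable from a core point becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbours = region(j)
            if len(j_neighbours) >= min_samples:
                queue.extend(j_neighbours)  # j is a core point: keep expanding
        cluster += 1
    return labels


# Hypothetical bucket dig positions (easting, northing): two loading segments
# and one stray reading.
positions = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
segments = dbscan(positions, eps=1.5, min_samples=3)
```

Each non-negative label corresponds to one connected loading segment; per-segment mean locations would be the natural regression targets for a second step.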

# 時間グラフの相互作用順序予測

Interaction Order Prediction for Temporal Graphs ( http://arxiv.org/abs/2302.02128v1 )

ライセンス: Link先を確認

Nayana Bannur and Mashrin Srivastava and Harsha Vardhan

(参考訳) グラフにおけるリンク予測は、広く研究されているタスクである。ナレッジグラフの補完、コンテンツ/コンテンツの推薦、ソーシャルネットワークの推薦など、さまざまな分野に適用されている。ほとんどの研究の最初の焦点は静的グラフにおけるリンク予測であった。しかし、最近は時間グラフのモデリングに関する多くの研究が行われており、その結果、時間グラフのリンク予測が研究されている。しかし、既存の研究のほとんどはリンク形成の順序に焦点を合わせておらず、リンクの存在を予測しているに過ぎない。本研究では,ノード間相互作用の順序を予測することを目的とする。

Link prediction in graphs is a task that has been widely investigated. It has been applied in various domains such as knowledge graph completion, content/item recommendation, social network recommendations and so on. The initial focus of most research was on link prediction in static graphs. However, there has recently been abundant work on modeling temporal graphs, and consequently one of the tasks that has been researched is link prediction in temporal graphs. However, most of the existing work does not focus on the order of link formation, and only predicts the existence of links. In this study, we aim to predict the order of node interactions.

翻訳日:2023-02-07 20:06:12 公開日:2023-02-04

# Geometric Prior and Contrastive similarity を用いた3次元医用画像分割法

Weakly-Supervised 3D Medical Image Segmentation using Geometric Prior and Contrastive Similarity ( http://arxiv.org/abs/2302.02125v1 )

ライセンス: Link先を確認

Hao Du, Qihua Dong, Yan Xu, Jing Liao

(参考訳) 医用画像分割は、コンピュータ支援診断において最も重要な前処理手順であるが、医療画像(低コントラスト組織、非ホモジェネティックテクスチャ)によって生じるセグメントや様々なアーティファクトの複雑な形状のために非常に難しい課題である。本稿では,幾何学的先行的および対比的類似性を,損失ベースで弱教師付きセグメンテーションフレームワークに組み込む,単純かつ効果的なセグメンテーションフレームワークを提案する。ポイントクラウド上に構築された幾何学的事前構築は、境界ボックスアノテーション(高さと幅)の固有の性質よりも優れた監督を行う弱教師付きセグメンテーション提案に細かな幾何学を提供する。さらに, コントラスト組織を識別するために, 臓器の画素がコントラスト埋め込み空間に集まることを促すために, コントラスト類似性を提案する。提案するコントラスト埋め込み空間は、従来のグレー空間の貧弱な表現を補うことができる。弱教師付きセグメンテーションフレームワークの有効性とロバスト性を検証するため,大規模な実験を行った。提案されたフレームワークは、LiTS 2017 Challenge、KiTS 2021 Challenge、LPBA40といった、現在最先端の弱い教師付き手法よりも優れている。また,提案手法を解析し,各コンポーネントの性能評価を行った。

Medical image segmentation is one of the most important pre-processing procedures in computer-aided diagnosis, but it is also a very challenging task due to the complex shapes of segments and the various artifacts caused by medical imaging (e.g., low-contrast tissues and non-homogeneous textures). In this paper, we propose a simple yet effective segmentation framework that incorporates the geometric prior and contrastive similarity into the weakly-supervised segmentation framework in a loss-based fashion. The proposed geometric prior, built on point clouds, provides meticulous geometry to the weakly-supervised segmentation proposal, which serves as better supervision than the inherent property of the bounding-box annotation (i.e., height and width). Furthermore, we propose contrastive similarity to encourage organ pixels to gather around in the contrastive embedding space, which helps better distinguish low-contrast tissues. The proposed contrastive embedding space can make up for the poor representation of the conventionally used gray space. Extensive experiments are conducted to verify the effectiveness and robustness of the proposed weakly-supervised segmentation framework. The proposed framework is superior to state-of-the-art weakly-supervised methods on the following publicly accessible datasets: LiTS 2017 Challenge, KiTS 2021 Challenge, and LPBA40. We also dissect our method and evaluate the performance of each component.

翻訳日:2023-02-07 20:06:03 公開日:2023-02-04

# Transform, Contrast and Tell: Coherent Entity-Aware Multi-Image Captioning

Transform, Contrast and Tell: Coherent Entity-Aware Multi-Image Captioning ( http://arxiv.org/abs/2302.02124v1 )

ライセンス: Link先を確認

Jingqiang Chen

(参考訳) コヒーレントエンティティ対応マルチイメージキャプションは,複数の隣接画像に対するコヒーレントキャプションをニュースドキュメントに生成することを目的としている。隣接する画像の間には、同じ実体や出来事をしばしば記述するため、コヒーレンスな関係がある。これらの関係は、エンティティ対応のマルチイメージキャプションにおいて重要であるが、エンティティ対応のシングルイメージキャプションでは無視される。既存の作品の多くは単一画像キャプションに焦点を当てているが、複数画像キャプションはこれまでに研究されていない。そこで本稿では,コヒーレンス関係を利用したコヒーレントなエンティティ対応多画像キャプションモデルを提案する。このモデルはトランスフォーマーベースのキャプション生成モデルと2種類のコントラスト学習ベースのコヒーレンス機構から構成される。生成モデルは、画像及び付随するテキストに注意を払ってキャプションを生成する。水平コヒーレンス機構は、キャプションを隣接画像のキャプションとコヒーレントにすることを目的としている。垂直コヒーレンス機構は、キャプションを画像と付随するテキストと一貫性を持たせることを目的としている。キャプション間のコヒーレンスを評価するために,2つのコヒーレンス評価指標を提案する。新しいデータセットDM800Kは、既存の2つのデータセットであるGoodNewsとNYT800Kよりもドキュメント当たりの画像が多く、マルチイメージキャプションに適している。 3つのデータセットで実験したところ,提案したキャプションモデルは,単画像キャプション評価により6つのベースラインを上回り,生成したキャプションはコヒーレンス評価や人間評価によりベースラインよりもコヒーレントであることがわかった。

Coherent entity-aware multi-image captioning aims to generate coherent captions for multiple adjacent images in a news document. There are coherence relationships among adjacent images because they often describe the same entities or events. These relationships are important for entity-aware multi-image captioning, but are neglected in entity-aware single-image captioning. Most existing work focuses on single-image captioning, while multi-image captioning has not been explored before. Hence, this paper proposes a coherent entity-aware multi-image captioning model that makes use of coherence relationships. The model consists of a Transformer-based caption generation model and two types of contrastive learning-based coherence mechanisms. The generation model generates the caption by paying attention to the image and the accompanying text. The horizontal coherence mechanism aims to make the caption coherent with the captions of adjacent images. The vertical coherence mechanism aims to make the caption coherent with the image and the accompanying text. To evaluate coherence between captions, two coherence evaluation metrics are proposed. A new dataset, DM800K, is constructed; it has more images per document than the two existing datasets GoodNews and NYT800K and is more suitable for multi-image captioning. Experiments on the three datasets show that the proposed captioning model outperforms 6 baselines according to single-image captioning evaluations, and that the generated captions are more coherent than those of the baselines according to coherence evaluations and human evaluations.

翻訳日:2023-02-07 20:05:39 公開日:2023-02-04

# 体重、注意は必要か? AEIUOrder:トランスフォーマーにおける層重行列のグリーディ順序付けによる翻訳の改善

Weight, Is Attention All We Need? AEIUOrder: Greedy Ordering of Layer Weight Matrices in Transformer Improves Translation ( http://arxiv.org/abs/2302.02123v1 )

ライセンス: Link先を確認

Elicia Ye

(参考訳) 先行研究では、トランスフォーマベースのエンコーダ・デコーダアーキテクチャの内部構造と機能について、マルチヘッドアテンションとフィードフォワードサブレイヤーのレベルで理解しようと試みている。解釈は、エンコーダとデコーダに焦点を合わせ、セルフアテンション、クロスアテンション、フィードフォワードサブレイヤーの組合せ可能性に焦点を当てている。トランスフォーマーのサブ層抽象に飛び込み、その層重行列を置換することで翻訳の質を向上させることができるか? 本稿では,ランダム行列理論 (rmt) の指標を用いて,エンコーダ内の層重み行列を規則的に順序付けし,エンコーダの順序付けを逆転させる手法を提案する。目的は、デコーダ構造がエンコーダの逆過程を表現するのに役立ちながら、エンコーダの完全訓練性を最大化することである。標準トランスフォーマー(6層、モデル次元512)では、IWSLT 2016ドイツ語翻訳タスクで34.62点(ベースライン34.31点)、WMT 2014英語翻訳タスクで27.95点(ベースライン27.91点)を達成している。 AEIUOrderは、様々な深さと埋め込み次元を持つトランスフォーマーでも実現されており、浅いスリムなモデルよりもより深く、より広いモデルで大幅に改善されている。例えば、8層、768次元、4層、1024次元変換器は、IWSLT 2016の英独翻訳タスク(28.53と28.97)でそれぞれ29.1と29.31のBLEUスコアを達成している。以上の結果から, RMTをモチベーションとした手法は, 層重行列を優雅に並べ替えることで, 表現を学習し, 翻訳をより効果的に生成する。

Prior work has attempted to understand the internal structures and functionalities of Transformer-based encoder-decoder architectures on the level of multi-head attention and feed-forward sublayers. Interpretations have focused on the encoder and decoder, along with the combinatorial possibilities of the self-attention, cross-attention, and feed-forward sublayers. Could we improve the quality of translation by diving into the Transformer sublayer abstractions and permuting its layer weight matrices? We propose AEIUOrder to greedily reorder layer weight matrices in the encoder by their well-trainedness, as measured by Random Matrix Theory (RMT) metrics, and reverse the ordering scheme for the encoder. The objective is to maximize Total well-trainedness in the encoder while the decoder structure serves to represent the reverse process of encoding. On the standard Transformer (6 layers, model dimension 512), AEIUOrder achieves a BLEU score of 34.62 (baseline 34.31) on the IWSLT 2016 German-to-English translation task, and 27.95 BLEU on the WMT 2014 English-to-German translation task (baseline 27.91). AEIUOrder is also realized on Transformers with various depths and embedding dimensions, showing more significant improvements on deeper, wider models than on their shallower, slimmer counterparts. For instance, the 8-layer, 768-dimension and the 4-layer, 1024-dimension Transformers achieve respective BLEU scores of 29.1 and 29.31 on the IWSLT 2016 English-to-German translation task (28.53 and 28.97 for the respective baselines). Our results suggest that the RMT-motivated approach of maximizing Total well-trainedness, by greedily reordering the layer weight matrices, helps the model learn representations and generate translations more effectively.

翻訳日:2023-02-07 20:05:11 公開日:2023-02-04

# クロスドメイン戦略に基づく偽ニュース検出のためのXAIモデル

A New cross-domain strategy based XAI models for fake news detection ( http://arxiv.org/abs/2302.02122v1 )

ライセンス: Link先を確認

Deepak Kanneganti

(参考訳) 本研究では,事前学習モデルにおける偽ニュース検出のための4レベルクロスドメイン戦略を提案する。クロスドメインテキスト分類は、ソースドメインの知識を使用してモデルをターゲットドメインに適応させるタスクである。これらの複雑なモデルの振る舞いを理解するには説明可能性が不可欠である。微調整したBERTモデルを用い、異なるドメインのデータセットによる複数の実験でクロスドメイン分類を行う。 Anchor、ELI5、LIME、SHAPなどの説明モデルは、クロスドメインの各レベルに対する新しい説明可能なアプローチを設計するために使用される。実験分析により、クロスドメインの異なるレベルに対する理想的なXAIモデルの組が得られた。

In this study, we present a four-level cross-domain strategy for fake news detection on pre-trained models. Cross-domain text classification is the task of adapting a model to a target domain by using knowledge of the source domain. Explainability is crucial in understanding the behaviour of these complex models. A fine-tuned BERT model is used to perform cross-domain classification in several experiments using datasets from different domains. Explanatory models such as Anchor, ELI5, LIME and SHAP are used to design a novel explainable approach to the cross-domain levels. The experimental analysis yields an ideal pair of XAI models for the different cross-domain levels.

翻訳日:2023-02-07 20:04:34 公開日:2023-02-04

# 自己再生による多様性誘導型環境設計

Diversity Induced Environment Design via Self-Play ( http://arxiv.org/abs/2302.02119v1 )

ライセンス: Link先を確認

Dexun Li, Wenjun Li, Pradeep Varakantham

(参考訳) 環境の適切な分布を設計する最近の研究は、効果的な汎用エージェントの訓練を約束していることを示している。その成功の一部は、エージェントの能力の最前線で環境インスタンス(またはレベル)を生成する適応的なカリキュラム学習の形式が原因である。しかし、このような環境設計フレームワークは、しばしば挑戦的な設計空間において効果的なレベルを見つけるのに苦労し、環境とのコストのかかる相互作用を必要とする。本稿では,Unsupervised Environment Design (UED) フレームワークに多様性を導入することを目的とする。具体的には,与えられたレベルを表す観測/隠蔽状態を特定するタスク非依存の手法を提案する。この手法の結果は, 2つのレベル間の多様性を特徴付けるために利用され, 有効性能に欠かせないことが示されている。さらに, サンプリング効率を向上させるため, 環境生成装置が学習エージェントにとって非常に有益な環境を自動的に生成できるセルフプレイ技術も取り入れた。提案手法は,DivSP(DivSP)による環境設計であり,既存の手法よりも優れた性能を示す。

Recent work on designing an appropriate distribution of environments has shown promise for training effective generally capable agents. Its success is partly because of a form of adaptive curriculum learning that generates environment instances (or levels) at the frontier of the agent's capabilities. However, such an environment design framework often struggles to find effective levels in challenging design spaces and requires costly interactions with the environment. In this paper, we aim to introduce diversity in the Unsupervised Environment Design (UED) framework. Specifically, we propose a task-agnostic method to identify observed/hidden states that are representative of a given level. The outcome of this method is then utilized to characterize the diversity between two levels, which as we show can be crucial to effective performance. In addition, to improve sampling efficiency, we incorporate the self-play technique that allows the environment generator to automatically generate environments that are of great benefit to the training agent. Quantitatively, our approach, Diversity-induced Environment Design via Self-Play (DivSP), shows compelling performance over existing methods.

翻訳日:2023-02-07 20:04:26 公開日:2023-02-04

# Visual Commonsense Reasoningのためのビジョンアテンションの学習

Learning to Agree on Vision Attention for Visual Commonsense Reasoning ( http://arxiv.org/abs/2302.02117v1 )

ライセンス: Link先を確認

Zhenyang Li, Yangyang Guo, Yangyang Guo, Fan Liu, Liqiang Nie, Mohan Kankanhalli

(参考訳) visual commonsense reasoning (vcr) は、視覚推論の分野では重要なが困難な研究課題である。 vcrモデルは一般的に、画像に関するテキスト質問に応答することを目的としており、その後、前回の応答プロセスの合理化予測を行う。これら2つのプロセスは逐次的かつ相互に絡み合っているが、既存のメソッドは常にこれらを2つの独立したマッチングベースのインスタンスと見なしている。したがって、2つのプロセス間の重要な関係を無視し、最適化されたモデル性能に繋がる。本稿では,これら2つのプロセスを統一的な枠組みで効果的に処理する新しい視覚的アライメント手法を提案する。そこで我々はまず,各プロセスで生成した視覚注意マップを集約する再認識モジュールを設計する。その後、2つの注意マップのセットを注意深く並べて、同じ画像領域に基づいて2つのプロセスを導く。本稿では,本手法を従来の注意と最近のTransformerモデルの両方に適用し,VCRベンチマークデータセット上で広範な実験を行う。その結果,アテンションアライメントモジュールにより,本手法は基本手法よりも大幅に改善され,両手法の結合性および提案手法の有効性が明らかとなった。

Visual Commonsense Reasoning (VCR) remains a significant yet challenging research problem in the realm of visual reasoning. A VCR model generally aims at answering a textual question regarding an image, followed by the rationale prediction for the preceding answering process. Though these two processes are sequential and intertwined, existing methods always consider them as two independent matching-based instances. They, therefore, ignore the pivotal relationship between the two processes, leading to sub-optimal model performance. This paper presents a novel visual attention alignment method to efficaciously handle these two processes in a unified framework. To achieve this, we first design a re-attention module for aggregating the vision attention map produced in each process. Thereafter, the resultant two sets of attention maps are carefully aligned to guide the two processes to make decisions based on the same image regions. We apply this method to both conventional attention and the recent Transformer models and carry out extensive experiments on the VCR benchmark dataset. The results demonstrate that with the attention alignment module, our method achieves a considerable improvement over the baseline methods, evidently revealing the feasibility of the coupling of the two processes as well as the effectiveness of the proposed method.

翻訳日:2023-02-07 20:04:09 公開日:2023-02-04

# テンソル回復が保証された低ランク性と滑らかさ

Guaranteed Tensor Recovery Fused Low-rankness and Smoothness ( http://arxiv.org/abs/2302.02155v1 )

ライセンス: Link先を確認

Hailin Wang, Jiangjun Peng, Wenjin Qin, Jianjun Wang and Deyu Meng

(参考訳) したがって、テンソルデータ回復タスクは近年多くの研究の注目を集めている。このような不正な問題を解くには、一般に、テンソルデータに基づく固有の事前構造を探索し、復元テンソルの音響推定を導くためのある種の正規化項として定式化する必要がある。近年の研究では、異なるテンソルモードにまたがる2つの洞察力に富んだテンソル前置法、すなわち大域的低ランク性 (l) と局所的滑らか性 (s) が適用され、これは常に2つの別々の正規化項の和としてリカバリモデルに符号化されている。しかし、低ランクテンソルの回復に関する主要な理論的な発展とは異なり、これらのl+sモデルは理論上の正確な再現性保証を持っておらず、実際の手法では信頼性に欠ける。この重要な問題に対して、本研究では、テンソルのlとsのプリエントを同時にエンコードする一意な正規化項を構築する。特に、この単一正則化器をリカバリモデルに組み込むことで、テンソル完備化(TC)とテンソル頑健成分分析(TRPCA)という2つの典型的なテンソルリカバリタスクの正確なリカバリ保証を厳格に証明することができる。我々の知る限りでは、これはテンソルリカバリのためのすべての関連するL+S法の中では初めての正確な復元結果である。様々な視覚的テンソルデータを持つ複数のTCおよびTRPCAタスクにおいて、他の多くのSOTA法よりも重要な回復精度の改善が広範な実験で観測された。典型的には、カラー画像の塗装作業において、欠落率が非常に大きい場合(例えば99.5%)に動作可能性能が得られるが、この課題では全ピアが完全に失敗する。

The tensor data recovery task has attracted much research attention in recent years. Solving such an ill-posed problem generally requires exploring the intrinsic prior structures underlying tensor data and formulating them as certain forms of regularization terms that guide a sound estimate of the restored tensor. Recent research has made significant progress by adopting two insightful tensor priors, i.e., global low-rankness (L) and local smoothness (S) across different tensor modes, which are always encoded into the recovery models as a sum of two separate regularization terms. However, unlike the primary theoretical developments on low-rank tensor recovery, these joint L+S models have no theoretical exact-recovery guarantees yet, which makes the methods less reliable in practice. To address this crucial issue, in this work we build a unique regularization term that essentially encodes both the L and S priors of a tensor simultaneously. In particular, by equipping the recovery models with this single regularizer, we can rigorously prove exact-recovery guarantees for two typical tensor recovery tasks, i.e., tensor completion (TC) and tensor robust principal component analysis (TRPCA). To the best of our knowledge, these should be the first exact-recovery results among all related L+S methods for tensor recovery. Significant improvements in recovery accuracy over many other SOTA methods are observed in extensive experiments on several TC and TRPCA tasks with various kinds of visual tensor data. Typically, our method achieves workable performance when the missing rate is extremely large, e.g., 99.5%, for the color image inpainting task, while all its peers totally fail in such a challenging case.

翻訳日:2023-02-07 19:58:52 公開日:2023-02-04

# この腸は存在しない:リアルな無線カプセル内視鏡画像生成のためのマルチスケール残差オートエンコーダ

This Intestine Does Not Exist: Multiscale Residual Variational Autoencoder for Realistic Wireless Capsule Endoscopy Image Generation ( http://arxiv.org/abs/2302.02150v1 )

ライセンス: Link先を確認

Dimitrios E. Diamantis, Panagiota Gatoula, Anastasios Koulaouzidis, and Dimitris K. Iakovidis

(参考訳) 医用画像合成は、画像ベースの臨床決定支援(CDS)システムにおいて、機械学習アルゴリズムのトレーニングに必要な注釈付き医療データの限られた可用性に対応するための、有望なソリューションとして登場した。この目的のために、GAN(Generative Adversarial Networks)は、データ拡張のための合成画像を生成するアルゴリズムトレーニングプロセスを支援するために主に適用されてきた。しかし、Wireless Capsule Endoscopy (WCE)の分野では、既存の公開アノテーションデータセットの限られた内容の多様性とサイズは、GANのトレーニング安定性と合成性能の両方に悪影響を及ぼす。 WCE画像合成のための実行可能なソリューションとして,新しい変分オートエンコーダアーキテクチャ,すなわち "This Intestine Does Not Exist" (TIDE)を提案する。提案するアーキテクチャは,多スケールな特徴抽出畳み込みブロックと残差接続を含み,限られた数のトレーニング画像でも高品質で多様なデータセットを生成できる。利用可能なデータセットの増大を指向した現在のアプローチとは対照的に,本研究では,TIDEを用いて実WCEデータセットを人工的に生成したデータセットに置き換えることが,分類性能を損なうことなく可能であることを示す。さらに、経験豊富なWCEスペシャリストによる質的およびユーザ評価研究は、TIDEによって合成された正常なWCE画像と異常なWCE画像の両方が十分に現実的であるという医学的観点から検証する。

Medical image synthesis has emerged as a promising solution to address the limited availability of annotated medical data needed for training machine learning algorithms in the context of image-based Clinical Decision Support (CDS) systems. To this end, Generative Adversarial Networks (GANs) have been mainly applied to support the algorithm training process by generating synthetic images for data augmentation. However, in the field of Wireless Capsule Endoscopy (WCE), the limited content diversity and size of existing publicly available annotated datasets adversely affect both the training stability and the synthesis performance of GANs. Aiming at a viable solution for WCE image synthesis, a novel Variational Autoencoder architecture is proposed, namely "This Intestine Does not Exist" (TIDE). The proposed architecture comprises multiscale feature extraction convolutional blocks and residual connections, which enable the generation of high-quality and diverse datasets even with a limited number of training images. Contrary to current approaches, which are oriented towards the augmentation of the available datasets, this study demonstrates that, using TIDE, real WCE datasets can be fully substituted by artificially generated ones without compromising classification performance. Furthermore, qualitative and user evaluation studies by experienced WCE specialists validate from a medical viewpoint that both the normal and abnormal WCE images synthesized by TIDE are sufficiently realistic.

翻訳日:2023-02-07 19:58:22 公開日:2023-02-04

# 神経オートマトンに対する不変量

Invariants for neural automata ( http://arxiv.org/abs/2302.02149v1 )

ライセンス: Link先を確認

Jone Uria-Albizuri, Giovanni Sirio Carmantini, Peter beim Graben, Serafim Rodrigues

(参考訳) 神経力学系の計算モデリングは、しばしばニューラルネットワークとシンボリックダイナミクスを展開する。ベクトル記号アーキテクチャと呼ばれるフレームワーク内でこれらのアプローチを組み合わせる特別な方法は、神経オートマトンにつながる。私たちがこの枠組みで追求した興味深い研究の方向性は、ニューラルオートマトンとして表現される神経力学へのシンボリックダイナミクスのマッピングを検討することです。この表現論は、脳がチューリング計算をどのように実装するのかといった質問を可能にする。具体的には、この表現理論において、ニューラルオートマトンは、記号とシンボル文字列を数値に割り当てることによって生じる。この代入記号計算は、実位相空間における状態ベクトルの軌跡によって表現され、実空間の測定と実験データとの統計的相関解析を可能にする。しかし、これらの割り当ては通常、完全に任意である。したがって、そのような表現の下で観察されるダイナミクスのどの側面がダイナミクスに固有のものであり、どれがそうではないのかという問題に対処するのは理にかなっている。本研究では,異なる符号化条件下での神経オートマトンの対称性と不変量を調べるための形式的厳密な数学的枠組みを考案する。中心的な概念として、そのようなシステムに対する平等のパターンを定義する。我々は、ニューラルネットワークの平均活性化レベルなど、異なるマクロ可観測性を検討し、その不変性を求める。この結果から, 平均アクティベーションは変化しないものの, 同一性のパターン上で定義されるステップ関数のみが再符号化の下で不変であることが示唆された。我々の研究は、特定のエンコーディングに依存し、ダイナミクスに固有のものではないコンバウンディング結果を避けるために、ニューロシンボリックプロセッサを用いた実世界の計測の回帰研究において極めて重要である可能性がある。

Computational modeling of neurodynamical systems often deploys neural networks and symbolic dynamics. A particular way of combining these approaches within a framework called vector symbolic architectures leads to neural automata. An interesting research direction we have pursued under this framework has been to consider mapping symbolic dynamics onto neurodynamics, represented as neural automata. This representation theory enables us to ask questions such as how the brain implements Turing computations. Specifically, in this representation theory, neural automata result from the assignment of symbols and symbol strings to numbers, known as G\"odel encoding. Under this assignment, symbolic computation becomes represented by trajectories of state vectors in a real phase space, which allows for statistical correlation analyses with real-world measurements and experimental data. However, these assignments are usually completely arbitrary. Hence, it makes sense to address the question of which aspects of the dynamics observed under such a representation are intrinsic to the dynamics and which are not. In this study, we develop a formally rigorous mathematical framework for the investigation of symmetries and invariants of neural automata under different encodings. As a central concept we define patterns of equality for such systems. We consider different macroscopic observables, such as the mean activation level of the neural network, and ask for their invariance properties. Our main result shows that only step functions that are defined over those patterns of equality are invariant under recodings, while the mean activation is not. Our work could be of substantial importance for related regression studies of real-world measurements with neurosymbolic processors for avoiding confounding results that are dependent on a particular encoding and not intrinsic to the dynamics.

翻訳日:2023-02-07 19:57:56 公開日:2023-02-04
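
The dependence on an arbitrary symbol-to-number assignment described in this abstract can be illustrated with a small sketch. It assumes a simple base-N expansion as the Gödel encoding; the function name `godel_encode` and the two code tables are hypothetical and are not taken from the paper.

```python
def godel_encode(symbols, code):
    """Map a symbol string to a number in [0, 1) via a base-N expansion.

    `code` assigns each symbol a digit in {0, ..., N-1}; the k-th symbol
    contributes digit * N**-(k+1). This is an illustrative assumption,
    not the paper's exact construction.
    """
    n = len(code)
    return sum(code[s] * n ** -(k + 1) for k, s in enumerate(symbols))


# Two arbitrary (hypothetical) assignments of the same alphabet to digits.
code_a = {"a": 0, "b": 1, "c": 2}
code_b = {"a": 2, "b": 0, "c": 1}

word = "abca"
x_a = godel_encode(word, code_a)  # 0/3 + 1/9 + 2/27 + 0/81
x_b = godel_encode(word, code_b)  # 2/3 + 0/9 + 1/27 + 2/81

print(x_a, x_b)
```

The same string lands at different points of the unit interval under the two encodings, so any observable computed directly from these numbers (such as a mean) can change under recoding, while the fact that the first and last symbols are equal is preserved by both.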

# トラップイオン量子計算と量子シミュレーションのためのエンタングルゲート

Entangling gates for trapped-ion quantum computation and quantum simulation ( http://arxiv.org/abs/2302.02148v1 )

ライセンス: Link先を確認

Zhengyang Cai, Chunyang Luan, Lingfeng Ou, Hengchao Tu, Zihan Yin, Jing-Ning Zhang, and Kihwan Kim

(参考訳) トラップイオン系は1995年にciracとzollerによって量子ゲートの最初のスキームが提唱されて以来、実用的な量子計算と量子シミュレーションのための主要なプラットフォームとなっている。閉じ込められたイオンを持つ量子ゲートは、全ての物理プラットフォームの中で最も高い忠実度を示している。近年, 振幅, 位相, 周波数変調, 多周波印加などの量子ゲートの高度なスキームが開発され, ゲートの高速化, 多数の不完全性に対する堅牢化, および複数の量子ビットに適用されている。ここでは、イオンを閉じ込めた量子ゲートの基本原理と最近の発展について述べる。

The trapped-ion system has been a leading platform for practical quantum computation and quantum simulation since the first scheme of a quantum gate was proposed by Cirac and Zoller in 1995. Quantum gates with trapped ions have shown the highest fidelity among all physical platforms. Recently, sophisticated schemes of quantum gates such as amplitude, phase, frequency modulation, or multi-frequency application, have been developed to make the gates fast, robust to many types of imperfections, and applicable to multiple qubits. Here, we review the basic principle and recent development of quantum gates with trapped ions.

翻訳日:2023-02-07 19:57:28 公開日:2023-02-04

# スピン軌道結合ボース・アインシュタイン凝縮体を用いたキャビティqedの多重安定性

Multi-Stability in Cavity QED with Spin-Orbit Coupled Bose-Einstein Condensate ( http://arxiv.org/abs/2302.02147v1 )

ライセンス: Link先を確認

Kashif Ammar Yasir and Gao Xianlong

(参考訳) スピン軌道結合ボース・アインシュタイン凝縮体を含むキャビティ系において,強いポンプレーザーにより駆動される定常多重安定性の発生について検討した。印加された磁場はボース・アインシュタイン凝縮体を擬スピン状態へ分割し、超低温原子と直接相互作用する2つの対向伝播ラマンレーザーの運動量に敏感となる。全てのサブシステムに対して定常状態のダイナミクスを制御した後、キャビティ・原子系の以前の研究と異なり、キャビティ・フォトン数の多安定挙動の出現を示す。しかし、このマルチスタビリティは関連するシステムパラメータで調整できる。さらに, 擬スピン-$\uparrow$ およびスピン-$\downarrow$ 状態の原子集団に対する混合安定挙動の出現を, いわゆる双不安定な形で示す。これらの原子数状態の集合的挙動は、スピン軌道カップリングとゼーマン場効果によって強化および制御できる、両方のスピン状態の集団間の遷移界面を持つ。さらに, 擬似スピン状態の機械的散逸速度を増加させることにより, 二次界面が出現することを示す。これらの界面は、空洞によって媒介される合成スピン状態の非自明な挙動によって引き起こされる可能性がある。我々の発見は光スイッチングの課題に欠かせないだけでなく、空洞量子電磁力学による合成原子状態の力学的側面の研究の基礎となるかもしれない。

We investigate the occurrence of steady-state multi-stability in a cavity system containing a spin-orbit coupled Bose-Einstein condensate and driven by a strong pump laser. The applied magnetic field splits the Bose-Einstein condensate into pseudo-spin states, which then become momentum sensitive with two counter-propagating Raman lasers directly interacting with ultra-cold atoms. After determining the steady-state dynamics for all associated subsystems, we show the emergence of multi-stable behavior of the cavity photon number, which is unlike previous investigations of cavity-atom systems. However, this multi-stability can be tuned with associated system parameters. Further, we illustrate the occurrence of mixed-stability behavior for the atomic population of the pseudo spin-$\uparrow$ and spin-$\downarrow$ states, which appear in a so-called bi-unstable form. The collective behavior of these atomic number states interestingly possesses a transitional interface among the populations of both spin states, which can be enhanced and controlled by spin-orbit coupling and Zeeman field effects. Furthermore, we illustrate the emergence of a secondary interface mediated by increasing the mechanical dissipation rate of the pseudo-spin states. These interfaces could be caused by the non-trivial behavior of synthetic spin states mediated by the cavity. Our findings are not only crucial for the subject of optical switching, but also could provide a foundation for future studies on the mechanical aspects of synthetic atomic states with cavity quantum electrodynamics.

翻訳日:2023-02-07 19:57:17 公開日:2023-02-04

# 能力属性と注意機構による解釈可能な知識追跡の強化

Augmenting Interpretable Knowledge Tracing by Ability Attribute and Attention Mechanism ( http://arxiv.org/abs/2302.02146v1 )

ライセンス: Link先を確認

Yuqi Yue, Xiaoqing Sun, Weidong Ji, Zengxiang Yin, Chenghong Sun

(参考訳) 知識追跡は、学生の過去の回答シーケンスをモデル化し、運動中の知識獲得の変化を追跡し、将来の学習性能を予測することを目的としている。既存のアプローチのほとんどは、生徒の能力が常に個人によって変化または変化しているという事実を無視し、モデル予測の解釈可能性に欠けている。そこで本稿では,能力特性と注意機構に基づく新しいモデルを提案する。まず, 学生の能力特性を把握し, 生徒を類似能力を持つグループに動的に割り当て, 演習の注意重みを計算し, モデルの解釈可能性を高めることで, 演習のスキルとの関連性を定量化する。大規模実験を行い,実オンライン教育データセットの評価を行った。その結果,提案モデルが5つの代表的な知識トレースモデルよりも性能予測に優れていることが判明し,モデル予測結果が推論経路を通じて説明される。

Knowledge tracing aims to model students' past answer sequences to track the change in their knowledge acquisition during exercise activities and to predict their future learning performance. Most existing approaches ignore the fact that students' abilities are constantly changing or vary between individuals, and lack interpretability of model predictions. To this end, in this paper, we propose a novel model based on ability attributes and an attention mechanism. We first segment the interaction sequences and capture students' ability attributes, then dynamically assign students to groups with similar abilities, and quantify the relevance of the exercises to the skill by calculating the attention weights between the exercises and the skill to enhance the interpretability of the model. We conducted extensive experiments and evaluated the model on real online education datasets. The results confirm that the proposed model is better at predicting performance than five well-known representative knowledge tracing models, and the model prediction results are explained through an inference path.

翻訳日:2023-02-07 19:56:51 公開日:2023-02-04

# LipFormer:ビジュアルランドマーク変換器による未知話者の読唇学習

LipFormer: Learning to Lipread Unseen Speakers based on Visual-Landmark Transformers ( http://arxiv.org/abs/2302.02141v1 )

ライセンス: Link先を確認

Feng Xue, Yu Li, Deyin Liu, Yincen Xie, Lin Wu, Richang Hong

(参考訳) lipreadingは、ビデオ中の話者の音声を自然言語に理解し、さらに翻訳することを指す。 state-of-the-art lipreading methodはオーバーラップスピーカーの解釈に優れており、トレーニングセットと推論セットの両方に話者が現れている。しかし,これらの手法の一般化は,訓練銀行における話者数の制限や,異なる話者に対する唇の形状・色の違いによる視覚的変化により,破滅的な性能劣化を引き起こす。したがって、唇の目に見える変化によってのみ、モデルオーバーフィットを引き起こす傾向がある。この問題に対処するために、話者の身元に関係なく唇の動きを記述できる視覚的・ランドマーク横断のマルチモーダル機能を提案する。次に,視覚ランドマークトランスフォーマー,すなわちリップフォーマーに基づく文レベルのリップリードフレームワークを開発した。特に、リップフォーマーは、唇の動きの流れ、顔のランドマークの流れ、および交叉モーダル融合からなる。 2つのストリームからの埋め込みは、視覚とランドマークの調整を達成するためにクロスアテンションモジュールに供給される自己アテンションによって生成される。最後に、得られた融合機能は、カスケードSeq2seqモデルで出力テキストにデコードできる。実験により,本手法は未知話者へのモデル一般化を効果的に促進できることが示された。

Lipreading refers to understanding and further translating the speech of a speaker in a video into natural language. State-of-the-art lipreading methods excel in interpreting overlap speakers, i.e., speakers who appear in both the training and inference sets. However, generalizing these methods to unseen speakers incurs catastrophic performance degradation due to the limited number of speakers in the training bank and the evident visual variations caused by the shape/color of lips for different speakers. Therefore, merely depending on the visible changes of lips tends to cause model overfitting. To address this problem, we propose to use multi-modal features across visuals and landmarks, which can describe the lip motion irrespective of the speaker identities. Then, we develop a sentence-level lipreading framework based on visual-landmark transformers, namely LipFormer. Specifically, LipFormer consists of a lip motion stream, a facial landmark stream, and a cross-modal fusion. The embeddings from the two streams are produced by self-attention and are fed to the cross-attention module to achieve the alignment between visuals and landmarks. Finally, the resulting fused features can be decoded to output texts by a cascade seq2seq model. Experiments demonstrate that our method can effectively enhance the model generalization to unseen speakers.

翻訳日:2023-02-07 19:56:34 公開日:2023-02-04
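
The cross-attention step mentioned in this abstract can be sketched generically as scaled dot-product attention between two feature streams. This is not LipFormer's actual architecture: the learned projections are omitted, and the array names and sizes (`visual`, `landmarks`, 6 frames, 8 dimensions) are assumptions chosen only for illustration.

```python
import numpy as np


def softmax(z, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(queries, keys, values):
    """Scaled dot-product attention where queries come from one stream
    and keys/values from another (no learned projections in this sketch)."""
    d = queries.shape[-1]
    weights = softmax(queries @ keys.T / np.sqrt(d))  # (n_queries, n_keys)
    return weights @ values, weights


rng = np.random.default_rng(0)
visual = rng.normal(size=(6, 8))     # hypothetical per-frame lip features
landmarks = rng.normal(size=(6, 8))  # hypothetical per-frame landmark features

# Each visual frame attends over all landmark frames.
fused, weights = cross_attention(visual, landmarks, landmarks)
print(fused.shape, weights.sum(axis=1))
```

Each row of `weights` is a probability distribution over landmark frames, so every fused vector is a convex combination of landmark features selected by the corresponding visual frame.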

# ボトムアップ自己組織特性をもつ動的方程式は、損失関数のない正確な動的階層を学習する

Dynamical Equations With Bottom-up Self-Organizing Properties Learn Accurate Dynamical Hierarchies Without Any Loss Function ( http://arxiv.org/abs/2302.02140v1 )

ライセンス: Link先を確認

Danilo Vasconcellos Vargas, Tham Yik Foong, Heng Zhang

(参考訳) 自己組織化は自然と心に普遍的である。しかし、機械学習と認知理論は依然として主題にほとんど触れていない。ハードルは、一般的なパターンを動的方程式の観点で定義することは困難であり、再順序付けによって学習できるシステムを設計することは、まだ見ることができないことである。本稿では,非線形力学の領域において正のフィードバックループと負のフィードバックループでパターンを定義できる学習システムを提案する。実験により、このようなシステムは時間と空間の相関関係をマッピングでき、階層構造を逐次データから学べることが明らかとなった。その結果は、最先端の教師なし学習アルゴリズムを8つの実験のうち7つと現実世界の2つの問題で上回るほど正確だ。興味深いことに、システムの動的性質は本質的に適応し、入力構造が変化すると化学・熱力学の相転移に似た現象を引き起こす。この研究は、自己組織化によってパターン認識が実現し、目的や余分な機能を持たずに単純な動的方程式から知的な振る舞いが生まれることを示唆している。

Self-organization is ubiquitous in nature and mind. However, machine learning and theories of cognition still barely touch the subject. The hurdle is that general patterns are difficult to define in terms of dynamical equations and designing a system that could learn by reordering itself is still to be seen. Here, we propose a learning system, where patterns are defined within the realm of nonlinear dynamics with positive and negative feedback loops, allowing attractor-repeller pairs to emerge for each pattern observed. Experiments reveal that such a system can map temporal to spatial correlation, enabling hierarchical structures to be learned from sequential data. The results are accurate enough to surpass state-of-the-art unsupervised learning algorithms in seven out of eight experiments as well as two real-world problems. Interestingly, the dynamic nature of the system makes it inherently adaptive, giving rise to phenomena similar to phase transitions in chemistry/thermodynamics when the input structure changes. Thus, the work here sheds light on how self-organization can allow for pattern recognition and hints at how intelligent behavior might emerge from simple dynamic equations without any objective/loss function.

翻訳日:2023-02-07 19:56:13 公開日:2023-02-04

# HSICを用いたグラフニューラルネットワークの構造記述

Structural Explanations for Graph Neural Networks using HSIC ( http://arxiv.org/abs/2302.02139v1 )

ライセンス: Link先を確認

Ayato Toyokuni, Makoto Yamada

(参考訳) グラフニューラルネットワーク(GNN)は、グラフィカルなタスクをエンドツーエンドで処理するニューラルネットワークの一種である。近年,グラフ分類やリンク予測,レコメンデーションなど,さまざまなタスクで高いパフォーマンスを達成しているため,機械学習やデータマイニングコミュニティでは,gnnが注目を集めている。しかしながら、gnnの複雑なダイナミクスは、グラフの機能のどの部分が予測により強く寄与するかを理解するのを難しくする。解釈可能性問題に対処するため,近年,様々なGNN説明法が提案されている。本研究では,Hilbert-Schmidt independent criterion (HSIC) を用いて,2変数間の非線型依存性をカーネルを通して捉えることにより,グラフの有意な構造を検出するフレキシブルモデル非依存的説明法を提案する。具体的には、グループラッソと融合ラッソに基づくノード説明法を用いて、ノード説明のためのGraphLIME法を拡張する。 GraphLIMEによるグループと融合正規化は、サブ構造単位におけるGNNの解釈を可能にする。次に,提案手法を逐次グラフ分類タスクの説明に利用できることを示す。実験により,対象とするグラフの重要な構造を様々な設定で識別できることを実証した。

Graph neural networks (GNNs) are a type of neural model that tackle graphical tasks in an end-to-end manner. Recently, GNNs have been receiving increased attention in machine learning and data mining communities because of the higher performance they achieve in various tasks, including graph classification, link prediction, and recommendation. However, the complicated dynamics of GNNs make it difficult to understand which parts of the graph features contribute more strongly to the predictions. To handle the interpretability issues, recently, various GNN explanation methods have been proposed. In this study, a flexible model agnostic explanation method is proposed to detect significant structures in graphs using the Hilbert-Schmidt independence criterion (HSIC), which captures the nonlinear dependency between two variables through kernels. More specifically, we extend the GraphLIME method for node explanation with a group lasso and a fused lasso-based node explanation method. The group and fused regularization with GraphLIME enables the interpretation of GNNs in substructure units. Then, we show that the proposed approach can be used for the explanation of sequential graph classification tasks. Through experiments, it is demonstrated that our method can identify crucial structures in a target graph in various settings.

翻訳日:2023-02-07 19:55:54 公開日:2023-02-04
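
As a rough illustration of the dependence measure named in this abstract, the following sketch computes the standard biased empirical HSIC estimate, $\mathrm{tr}(KHLH)/(n-1)^2$, with Gaussian kernels. The kernel choice, the bandwidth `sigma=1.0`, and the toy data are assumptions; the paper's explanation method built on top of HSIC is not reproduced here.

```python
import numpy as np


def gaussian_kernel(x, sigma=1.0):
    """Gram matrix of the Gaussian (RBF) kernel for samples in x."""
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    sq = np.sum(x ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * x @ x.T
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma ** 2))


def hsic(x, y, sigma=1.0):
    """Biased empirical HSIC: trace(K H L H) / (n - 1)^2."""
    n = len(x)
    K = gaussian_kernel(x, sigma)
    L = gaussian_kernel(y, sigma)
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2


rng = np.random.default_rng(0)
x = rng.normal(size=200)
dependent = hsic(x, x ** 2)                  # nonlinear dependence
independent = hsic(x, rng.normal(size=200))  # independent samples

print(dependent, independent)
```

The pair `(x, x**2)` has essentially zero linear correlation, yet its HSIC value is clearly larger than that of the independent pair, which is the property that makes HSIC useful for scoring nonlinear feature relevance.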

# FedSpectral+:Federated Learningを用いたスペクトルクラスタリング

FedSpectral+: Spectral Clustering using Federated Learning ( http://arxiv.org/abs/2302.02137v1 )

ライセンス: Link先を確認

Janvi Thakkar, Devvrat Joshi

(参考訳) グラフのクラスタリングはよく知られた研究問題であり、特にインターネットやソーシャルネットワークのデータのほとんどはグラフの形式である。組織は、グラフデータセットのクラスタリングを見つけるために、スペクトルクラスタリングアルゴリズムを広く使っている。しかし、スペクトルクラスタリングを大規模データセットに適用することは、計算オーバーヘッドのため困難である。分散スペクトルクラスタリングアルゴリズムは存在するが、データプライバシとクライアント間の通信コストの増大という問題に直面している。そこで本稿では,これらの問題を克服するために,フェデレートラーニング(FL)を用いたスペクトルクラスタリングアルゴリズムを提案する。 FLは、ユーザの生データを収集するのではなく、各学習者のモデルパラメータを蓄積し、スケーラビリティとデータのプライバシを提供するプライバシー保護アルゴリズムである。我々はFedSpectralとFedSpectral+の2つのアプローチを開発した。 FedSpectralは、局所スペクトルクラスタリングラベルを使用して、類似性グラフを作成することで、グローバルスペクトルクラスタリングを集約するベースラインアプローチである。最先端のアプローチであるfeedspectral+は、power iterationメソッドを使用して、クライアント間で分散された生情報にアクセスせずにグラフデータ全体を組み込むことで、グローバルスペクトル埋め込みを学ぶ。さらに,分散アプローチのクラスタリング品質をオリジナル/非flクラスタリングと比較するために,独自の類似度指標を設計した。提案手法であるfeedspectral+は98.85%と99.8%の類似性を持ち、ego-facebookとeメール-eu-coreデータセットのグローバルクラスタリングに匹敵する。

Clustering in graphs has been a well-known research problem, particularly because most Internet and social network data is in the form of graphs. Organizations widely use spectral clustering algorithms to find clusterings in graph datasets. However, applying spectral clustering to a large dataset is challenging due to computational overhead. While distributed spectral clustering algorithms exist, they face the problems of data privacy and increased communication costs between the clients. Thus, in this paper, we propose a spectral clustering algorithm using federated learning (FL) to overcome these issues. FL is a privacy-protecting algorithm that accumulates model parameters from each local learner rather than collecting users' raw data, thus providing both scalability and data privacy. We developed two approaches: FedSpectral and FedSpectral+. FedSpectral is a baseline approach that uses local spectral clustering labels to aggregate the global spectral clustering by creating a similarity graph. FedSpectral+, a state-of-the-art approach, uses the power iteration method to learn the global spectral embedding by incorporating the entire graph data without access to the raw information distributed among the clients. We further designed our own similarity metric to compare the clustering quality of the distributed approach to that of the original/non-FL clustering. The proposed approach FedSpectral+ obtained a similarity of 98.85% and 99.8%, comparable to that of global clustering on the ego-Facebook and email-Eu-core datasets.

翻訳日:2023-02-07 19:55:33 公開日:2023-02-04
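
The power iteration method named in this abstract can be sketched on a single machine as follows. This is only the centralized building block: the federated aggregation across clients described in the paper is not shown, and the toy graph, the function name `power_iteration`, and the tolerance values are assumptions.

```python
import numpy as np


def power_iteration(M, num_iters=500, tol=1e-10, seed=0):
    """Approximate the dominant eigenpair of a symmetric matrix M."""
    rng = np.random.default_rng(seed)
    v = rng.normal(size=M.shape[0])
    v /= np.linalg.norm(v)
    for _ in range(num_iters):
        w = M @ v
        w /= np.linalg.norm(w)
        converged = np.linalg.norm(w - v) < tol
        v = w
        if converged:
            break
    eigenvalue = v @ M @ v  # Rayleigh quotient
    return eigenvalue, v


# Toy graph: a triangle (0, 1, 2) with a pendant node 3 attached to node 2.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
deg = A.sum(axis=1)
# Symmetrically normalized adjacency D^{-1/2} A D^{-1/2}.
M = A / np.sqrt(np.outer(deg, deg))

eigenvalue, v = power_iteration(M)
print(eigenvalue, v)
```

For a connected, non-bipartite graph the dominant eigenvalue of the normalized adjacency is 1 and its eigenvector is proportional to the square roots of the node degrees, which gives a simple check on the iteration. Spectral clustering would use further eigenvectors, obtained for example by deflation, which this sketch omits.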

# 協調型マルチエージェント強化学習のための個別グローバルマックスを伴わない二重自己認識値分解フレームワーク

Dual Self-Awareness Value Decomposition Framework without Individual Global Max for Cooperative Multi-Agent Reinforcement Learning ( http://arxiv.org/abs/2302.02180v1 )

ライセンス: Link先を確認

Zhiwei Xu, Bin Zhang, Dapeng Li, Guangchong Zhou, Zeren Zhang, Guoliang Fan

(参考訳) 協調型マルチエージェント強化学習分野では, 値分解法が徐々に普及している。しかしながら、ほとんど全ての値分解法は、値分解法が解決できる問題の範囲を制限する、個人的グローバルマックス(IGM)原理またはその変種に従う。心理学における二重自己認識の概念に着想を得て, IGMの前提を完全に否定する二重自己認識価値分解フレームワークを提案する。各エージェントは、アクションを実行するegoポリシと、クレジット割り当てに参加する変更ego値関数で構成される。値関数の分解は明示的な探索手順を用いてIMGの仮定を無視することができる。また,アルゴリズムが局所的に最適になるのを避けるために,新たなエゴ探索機構を提案する。 IGMを含まない最初の完全値分解法として,提案手法は様々な協調作業において望ましい性能を実現する。

Value decomposition methods have gradually become popular in the cooperative multi-agent reinforcement learning field. However, almost all value decomposition methods follow the Individual Global Max (IGM) principle or its variants, which restricts the range of issues that value decomposition methods can resolve. Inspired by the notion of dual self-awareness in psychology, we propose a dual self-awareness value decomposition framework that entirely rejects the IGM premise. Each agent consists of an ego policy that carries out actions and an alter ego value function that takes part in credit assignment. The value function factorization can ignore the IGM assumption by using an explicit search procedure. We also suggest a novel anti-ego exploration mechanism to avoid the algorithm becoming stuck in a local optimum. As the first fully IGM-free value decomposition method, our proposed framework achieves desirable performance in various cooperative tasks.

翻訳日:2023-02-07 19:48:38 公開日:2023-02-04
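
The Individual Global Max (IGM) principle that this abstract argues against requires the greedy joint action under the joint value function to coincide with each agent's individually greedy action. A minimal tabular check for two agents might look like the sketch below; the Q-tables and the function name `satisfies_igm` are hypothetical, and ties between actions are ignored.

```python
import numpy as np


def satisfies_igm(q_tot, q_individual):
    """Check whether argmax of the joint Q-table equals the tuple of
    per-agent argmaxes (tabular case, ties ignored)."""
    joint_greedy = np.unravel_index(np.argmax(q_tot), q_tot.shape)
    local_greedy = tuple(int(np.argmax(q)) for q in q_individual)
    return tuple(int(a) for a in joint_greedy) == local_greedy


# Hypothetical individual utilities for two agents with two actions each.
q1 = np.array([1.0, 3.0])
q2 = np.array([2.0, 0.5])

# Additive mixing: the joint value is the sum of individual utilities.
q_additive = q1[:, None] + q2[None, :]

# A joint payoff whose best joint action is (0, 0), which disagrees with
# the individually greedy actions (1, 0) of q1 and q2.
q_non_igm = np.array([[8.0, -12.0],
                      [-12.0, 6.0]])

print(satisfies_igm(q_additive, [q1, q2]))  # → True
print(satisfies_igm(q_non_igm, [q1, q2]))   # → False
```

Additive mixing satisfies IGM by construction, whereas the second table shows a joint value that these individual utilities cannot represent greedily, which is the kind of task an IGM-free method is meant to cover.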

# ハイウェイマージのための教師なしスキル発見による階層学習

Hierarchical Learning with Unsupervised Skill Discovery for Highway Merging Applications ( http://arxiv.org/abs/2302.02179v1 )

ライセンス: Link先を確認

Yigit Gurses, Kaan Buyukdemirci, and Yildiray Yildiz

(参考訳) 人間や自律的なドライバーとの密集したトラフィックの運転は、ダイナミックな環境の変化に素早く反応する能力とともに、高いレベルの計画と推論を必要とする課題である。本研究では,学習動作プリミティブを動作として利用する階層的学習手法を提案する。モーションプリミティブは、所定の報酬関数なしで教師なしスキル発見を使用して取得され、異なるシナリオで再利用することができる。これにより、さまざまな振る舞いを持つ複数のモデルを取得する必要のあるアプリケーション全体のトレーニング時間を短縮できる。シミュレーションの結果,提案手法は,ベースライン強化学習法と比較して,トレーニングの少ないドライバモデルで高い性能が得られることが示された。

Driving in dense traffic with human and autonomous drivers is a challenging task that requires high level planning and reasoning along with the ability to react quickly to changes in a dynamic environment. In this study, we propose a hierarchical learning approach that uses learned motion primitives as actions. Motion primitives are obtained using unsupervised skill discovery without a predetermined reward function, allowing them to be reused in different scenarios. This can reduce the total training time for applications that need to obtain multiple models with varying behavior. Simulation results demonstrate that the proposed approach yields driver models that achieve higher performance with less training compared to baseline reinforcement learning methods.

翻訳日:2023-02-07 19:48:24 公開日:2023-02-04

# 構築文法は、ニューラルネットワークモデルにユニークな洞察を与える

Construction Grammar Provides Unique Insight into Neural Language Models ( http://arxiv.org/abs/2302.02178v1 )

ライセンス: Link先を確認

Leonie Weissweiler, Taiqi He, Naoki Otani, David R. Mortensen, Lori Levin, Hinrich Sch\"utze

(参考訳) 建設文法 (CxG) は, 大規模事前学習言語モデル (PLM) の性能を, 構造と意味に関して調査する研究の基盤として最近利用されている。本稿では,本研究の継続と拡張について提案する。我々は、CxGを念頭に置いて設計されていない探索手法と、特定の構成のために設計された探索手法を考察する。我々は,過去の研究を詳細に分析し,この新たな分野が直面する最も重要な課題と研究課題について考察する。

Construction Grammar (CxG) has recently been used as the basis for probing studies that have investigated the performance of large pretrained language models (PLMs) with respect to the structure and meaning of constructions. In this position paper, we make suggestions for the continuation and augmentation of this line of research. We look at probing methodology that was not designed with CxG in mind, as well as probing methodology that was designed for specific constructions. We analyse selected previous work in detail, and provide our view of the most important challenges and research questions that this promising new field faces.

翻訳日:2023-02-07 19:48:13 公開日:2023-02-04

# フーリエ変換を用いたニューラル時系列解析:サーベイ

Neural Time Series Analysis with Fourier Transform: A Survey ( http://arxiv.org/abs/2302.02173v1 )

ライセンス: Link先を確認

Kun Yi and Qi Zhang and Shoujin Wang and Hui He and Guodong Long and Zhendong Niu

(参考訳) 近年、フーリエ変換が深層ニューラルネットワークに広く導入され、時系列解析の精度と効率の両面で最先端技術が進歩している。効率性やグローバルビューなどの時系列解析におけるフーリエ変換の利点は急速に研究され、時系列解析のための有望なディープラーニングパラダイムが提示されている。しかし、この新興地域では注目が高まり、研究が盛んになっているが、既存の研究の体系的な見直しが欠如している。そこで本稿では,フーリエ変換を用いた時系列解析の研究の包括的レビューを行う。我々は,最新の研究成果を体系的に調査し,要約することを目的とする。そこで我々は,既存のニューラルネットワーク時系列解析手法を特徴,利用パラダイム,ネットワーク設計,応用の4つの観点から分類する新しい分類法を提案する。我々はまた、この活気ある地域で新しい研究の方向性を共有している。

Recently, the Fourier transform has been widely introduced into deep neural networks to further advance the state of the art regarding both accuracy and efficiency of time series analysis. The advantages of the Fourier transform for time series analysis, such as efficiency and global view, have been rapidly explored and exploited, exhibiting a promising deep learning paradigm for time series analysis. However, although increasing attention has been attracted and research is flourishing in this emerging area, a systematic review of the variety of existing studies in the area is lacking. To this end, in this paper, we provide a comprehensive review of studies on neural time series analysis with the Fourier transform. We aim to systematically investigate and summarize the latest research progress. Accordingly, we propose a novel taxonomy to categorize existing neural time series analysis methods from four perspectives, including characteristics, usage paradigms, network design, and applications. We also share some new research directions in this vibrant area.

翻訳日:2023-02-07 19:48:05 公開日:2023-02-04
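
The "global view" that the Fourier transform gives of a time series can be illustrated with a short NumPy sketch. The sampling rate, the two sinusoidal components, and the 8 Hz cutoff are assumptions chosen so that both frequencies fall exactly on FFT bins; no method from the surveyed papers is reproduced.

```python
import numpy as np

fs = 100.0                  # assumed sampling rate in Hz
n = 200                     # two seconds of samples
t = np.arange(n) / fs

# Toy series: a 5 Hz component plus a weaker 12 Hz component.
signal = np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 12 * t)

# Real-input FFT and the frequency of each bin.
spectrum = np.fft.rfft(signal)
freqs = np.fft.rfftfreq(n, d=1.0 / fs)

# The bin with the largest magnitude identifies the dominant period.
dominant = freqs[np.argmax(np.abs(spectrum))]

# Frequency-domain low-pass filter: zero all bins above 8 Hz, then invert.
filtered_spectrum = np.where(freqs <= 8.0, spectrum, 0.0)
filtered = np.fft.irfft(filtered_spectrum, n=n)

print(dominant)
```

Every frequency bin depends on all time steps at once, so a single operation in the frequency domain (here, zeroing bins) acts globally on the series, and the transform itself costs O(n log n).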

# 位置依存質量を持つ非対称発振器の厳密解とコヒーレント状態

Exact solution and coherent states of an asymmetric oscillator with position-dependent mass ( http://arxiv.org/abs/2302.02172v1 )

ライセンス: Link先を確認

Bruno G. da Costa, Ignacio S. Gomez, and Biswanath Rath

(参考訳) 位置依存質量をもつ変形振動子 [da Costa et al., J. Math. Phys. {\bf 62}, 092101 (2021)] の問題を、運動エネルギーとポテンシャルエネルギーの両方に質量関数の効果を導入することで、古典的および量子的形式論において再検討する。得られたハミルトニアンは、通常の位相空間 $(x, p)$ から変形した位相空間 $(x_\gamma, \Pi_\gamma)$ への点正準変換によってモース振動子に写像される。モースポテンシャルと同様に、変形振動子は古典形式論における無調和振動運動に対応する位相空間における束縛軌道を示し、従って量子形式論における離散スペクトルを持つ束縛状態を示す。一方、位相空間における開軌道は散乱状態と連続エネルギースペクトルと関連している。因子化法を用いて、時間進化とその不確実性などのコヒーレントな状態の特性について検討する。高速な局所化(古典的および量子的)は、非対称な位置依存質量のためにコヒーレントな状態に対して報告される。不確実性関係の時間進化の振動も観察され、変形が増加するにつれて振幅が増加する。

We revisit the problem of the deformed oscillator with position-dependent mass [da Costa et al., J. Math. Phys. {\bf 62}, 092101 (2021)] in the classical and quantum formalisms, by introducing the effect of the mass function in both kinetic and potential energies. The resulting Hamiltonian is mapped into a Morse oscillator by means of a point canonical transformation from the usual phase space $(x, p)$ to a deformed one $(x_\gamma, \Pi_\gamma)$. Similar to the Morse potential, the deformed oscillator presents bound trajectories in phase space corresponding to an anharmonic oscillatory motion in classical formalism and, therefore, bound states with a discrete spectrum in quantum formalism. On the other hand, open trajectories in phase space are associated with scattering states and continuous energy spectrum. Employing the factorization method, we investigate the properties of the coherent states, such as the time evolution and their uncertainties. A fast localization, classical and quantum, is reported for the coherent states due to the asymmetrical position-dependent mass. An oscillation of the time evolution of the uncertainty relationship is also observed, whose amplitude increases as the deformation increases.

翻訳日:2023-02-07 19:47:52 公開日:2023-02-04

# 制約付き連続多目的最適化問題のキャラクタリゼーション:性能空間の観点から

Characterization of Constrained Continuous Multiobjective Optimization Problems: A Performance Space Perspective ( http://arxiv.org/abs/2302.02170v1 )

ライセンス: Link先を確認

Aljo\v{s}a Vodopija, Tea Tu\v{s}ar, Bogdan Filipi\v{c}

(参考訳) 制約付き多目的最適化はここ数年で大きな関心を集めている。しかし、制約付き多目的最適化問題(CMOP)はまだ満足できない。したがって、ベンチマークに十分なCMOPの選択は困難であり、形式的な背景が欠けている。本稿では,パフォーマンス空間の観点からCMOPを探索し,この問題に対処する。まず,制約付き多目的最適化のための新しい性能評価手法を提案する。この方法論は、パレートフロントと制約満足度を近似する性能を同時に測定する最初の試みを提供する。第二に、アルゴリズムの性能を区別する最適化問題の能力を測定する手法を提案する。最後に、このアプローチはCMOPの8つの頻繁に使用される人工テストスイートと対比するために使用される。実験結果から,3つのよく知られた多目的最適化アルゴリズムの判別において,どのスイートの方が効率的かが明らかとなった。ベンチマーク設計者は、これらの結果を使用して、必要に応じて最も適切なcmopsを選択することができる。

Constrained multiobjective optimization has gained much interest in the past few years. However, constrained multiobjective optimization problems (CMOPs) are still unsatisfactorily understood. Consequently, the choice of adequate CMOPs for benchmarking is difficult and lacks a formal background. This paper addresses this issue by exploring CMOPs from a performance space perspective. First, it presents a novel performance assessment approach designed explicitly for constrained multiobjective optimization. This methodology offers a first attempt to simultaneously measure the performance in approximating the Pareto front and constraint satisfaction. Secondly, it proposes an approach to measure the capability of the given optimization problem to differentiate among algorithm performances. Finally, this approach is used to contrast eight frequently used artificial test suites of CMOPs. The experimental results reveal which suites are more efficient in discerning between three well-known multiobjective optimization algorithms. Benchmark designers can use these results to select the most appropriate CMOPs for their needs.

翻訳日:2023-02-07 19:47:29 公開日:2023-02-04

# この予測を反転させるには、いくつの、どのトレーニングポイントを削除する必要があるか?

How Many and Which Training Points Would Need to be Removed to Flip this Prediction? ( http://arxiv.org/abs/2302.02169v1 )

ライセンス: Link先を確認

Jinghan Yang, Sarthak Jain, Byron C. Wallace

(参考訳) 与えられたテストポイント $x_t$ について、$\mathcal{S}_t$ を構成するインスタンスがトレーニング前に削除されていたならば分類が変わっていたであろう、トレーニングデータの最小部分集合 $\mathcal{S}_t$ を特定する問題を考える。このような集合の同定にはいくつかの理由がある。まず、$\mathcal{S}_t$ の濃度はロバスト性の尺度を提供する($|\mathcal{S}_t|$ が $x_t$ で小さい場合は、対応する予測に対する自信が低くなるかもしれない)。第二に、$\mathcal{S}_t$ の尋問は、特定のモデル予測に異議を唱えるための新しいメカニズムを提供するかもしれない:$\mathcal{S}_t$ の点が誤ってラベル付けされたり無関係であったりした場合、これは関連する予測を覆すために議論するかもしれない。 brute-force による $\mathcal{S}_t$ の識別は難解である。我々は、影響関数に基づいて $\mathcal{S}_t$ を求めるための比較的高速な近似法を提案し、単純な凸テキスト分類モデルにおいて、これらのアプローチは、予測をひっくり返すような、比較的小さなトレーニング例のセットをうまく識別できることを発見した。我々の知る限り、これは機械学習の文脈で与えられた予測を反転させるのに必要な最小限のトレーニングセットを特定することの問題を調査する最初の試みである。

We consider the problem of identifying a minimal subset of training data $\mathcal{S}_t$ such that if the instances comprising $\mathcal{S}_t$ had been removed prior to training, the categorization of a given test point $x_t$ would have been different. Identifying such a set may be of interest for a few reasons. First, the cardinality of $\mathcal{S}_t$ provides a measure of robustness (if $|\mathcal{S}_t|$ is small for $x_t$, we might be less confident in the corresponding prediction), which we show is correlated with but complementary to predicted probabilities. Second, interrogation of $\mathcal{S}_t$ may provide a novel mechanism for contesting a particular model prediction: If one can make the case that the points in $\mathcal{S}_t$ are wrongly labeled or irrelevant, this may argue for overturning the associated prediction. Identifying $\mathcal{S}_t$ via brute-force is intractable. We propose comparatively fast approximation methods to find $\mathcal{S}_t$ based on influence functions, and find that -- for simple convex text classification models -- these approaches can often successfully identify relatively small sets of training examples which, if removed, would flip the prediction. To our knowledge, this is the first work to investigate the problem of identifying a minimal training set necessary to flip a given prediction in the context of machine learning.

翻訳日:2023-02-07 19:47:17 公開日:2023-02-04

# AUTOLYCUS: 決定木モデルに対するモデル抽出攻撃のための説明可能なAI(XAI)の爆発

AUTOLYCUS: Exploiting Explainable AI (XAI) for Model Extraction Attacks against Decision Tree Models ( http://arxiv.org/abs/2302.02162v1 )

ライセンス: Link先を確認

Abdullah Caglar Oksuz, Anisa Halimi, Erman Ayday

(参考訳) モデル抽出攻撃は、メンバシップ推論攻撃とモデル反転攻撃とともに、機械学習モデルをターゲットにする最も顕著な敵手法の1つである。一方、説明可能な人工知能(XAI)は、AIの背後にある意思決定プロセスを説明するためのテクニックと手順のセットである。 XAIはAIモデルの背後にある理由を理解するための優れたツールですが、そのような啓示のために提供されるデータは、セキュリティとプライバシの脆弱性を生み出します。本稿では,LIMEによるモデル抽出攻撃であるAUTOLYCUSを提案する。この攻撃は,決定木モデルの決定境界を推測し,対象モデルと同じような振る舞いをする抽出サロゲートモデルを作成する。

Model extraction attacks are among the most prominent adversarial techniques for targeting machine learning models, along with membership inference attacks and model inversion attacks. On the other hand, Explainable Artificial Intelligence (XAI) is a set of techniques and procedures to explain the decision-making process behind AI. XAI is a great tool to understand the reasoning behind AI models, but the data provided for such revelation creates security and privacy vulnerabilities. In this poster, we propose AUTOLYCUS, a model extraction attack that exploits the explanations provided by LIME to infer the decision boundaries of decision tree models and create extracted surrogate models that behave similarly to a target model.

翻訳日:2023-02-07 19:46:48 公開日:2023-02-04

# 涙を伴う有向非循環グラフ

Directed Acyclic Graphs With Tears ( http://arxiv.org/abs/2302.02160v1 )

ライセンス: Link先を確認

Zhichao Chen, Zhiqiang Ge

(参考訳) ベイズネットワークは、産業プロセスにおける障害の検出と診断によく用いられる手法である。ベイズネットワークの基礎はデータから有向非巡回グラフ(DAG)を学習する構造学習である。しかし、探索空間はプロセス変数の増加とともに超指数的にスケールするので、データ駆動型構造学習は難しい問題となる。この目的のために、NOTEARs法によるDAGは、離散最適化を連続最適化問題に変換するだけでなく、ディープラーニングフレームワークとの互換性もよく研究されている。それでも、NOTEARベースの手法には依然として課題がある。 1) 実現不可能な解は,勾配降下に基づく最適化パラダイムから得られる。 2)学習グラフの非循環を約束する切断操作。この作品において,挑戦の理由は 1) 理論的に解析を行い, 課題2を緩和するために混合整数計画に基づくDAGs with Tears法という新しい手法を提案する。さらに, 先行知識を新たな手法に取り入れることで, 産業プロセスにおいて構造学習をより実用的で有用なものにすることができる。最後に, ケーススタディとして, 数値例と工業例を用いて, 開発手法の優位性を実証する。

Bayesian networks are a frequently used method for fault detection and diagnosis in industrial processes. The basis of a Bayesian network is structure learning, which learns a directed acyclic graph (DAG) from data. However, the search space scales super-exponentially with the number of process variables, which makes data-driven structure learning a challenging problem. To this end, DAGs with NOTEARS methods are being well studied, not only for their conversion of the discrete optimization into a continuous optimization problem but also for their compatibility with deep learning frameworks. Nevertheless, challenges remain for NOTEARS-based methods: 1) infeasible solutions resulting from the gradient descent-based optimization paradigm; 2) the truncation operation needed to guarantee that the learned graph is acyclic. In this work, the reason for challenge 1) is analyzed theoretically, and a novel method named DAGs with Tears is proposed based on mixed-integer programming to alleviate challenge 2). In addition, prior knowledge can be incorporated into the newly proposed method, making structure learning more practical and useful in industrial processes. Finally, a numerical example and an industrial example are adopted as case studies to demonstrate the superiority of the developed method.

翻訳日:2023-02-07 19:46:35 公開日:2023-02-04
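As a rough illustration of the continuous reformulation that the abstract above attributes to the NOTEARS family, the sketch below evaluates the well-known NOTEARS acyclicity function h(W) = tr(exp(W ∘ W)) − d, which is zero exactly when the weighted adjacency matrix W describes a DAG. This is the original NOTEARS constraint, not the mixed-integer method proposed in the paper; the helper names, the truncated power series, and the two example matrices are assumptions made for illustration.

```python
def matmul(a, b):
    """Multiply two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def acyclicity(w, terms=30):
    """NOTEARS-style h(W) = tr(exp(W ∘ W)) - d via a truncated power series."""
    d = len(w)
    squared = [[w[i][j] ** 2 for j in range(d)] for i in range(d)]  # W ∘ W
    # `power` holds (W ∘ W)^k / k!, starting from the identity (k = 0).
    power = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
    trace_sum = float(d)
    for k in range(1, terms):
        power = [[x / k for x in row] for row in matmul(power, squared)]
        trace_sum += sum(power[i][i] for i in range(d))
    return trace_sum - d


# Hypothetical examples: a chain 0 -> 1 -> 2 (acyclic) and a 2-cycle 0 <-> 1.
dag = [[0.0, 1.5, 0.0], [0.0, 0.0, -2.0], [0.0, 0.0, 0.0]]
cyclic = [[0.0, 1.0], [1.0, 0.0]]
h_dag = acyclicity(dag)
h_cyclic = acyclicity(cyclic)
```

A gradient-based solver drives h(W) towards zero only approximately, which is one way to read the abstract's remark that a truncation step is needed to obtain an exactly acyclic graph.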

# TrajMatch: 軌道マッチングによる道路側LiDARの自動時空間校正に向けて

TrajMatch: Towards Automatic Spatio-temporal Calibration for Roadside LiDARs through Trajectory Matching ( http://arxiv.org/abs/2302.02157v1 )

ライセンス: Link先を確認

Haojie Ren, Sha Zhang, Sugang Li, Yao Li, Xinchen Li, Jianmin Ji, Yu Zhang, Yanyong Zhang

(参考訳) 近年,道路脇にLiDARなどのセンサを配置して交通状況を監視し,自動運転車の認識を支援することが普及している。自動運転車とは異なり、路面センサーは通常異なるサブシステムに関連付けられ、時間と空間の同期が欠如している。キャリブレーションは、中央サーバが異なるロケーションインフラストラクチャによって生成されたデータを融合し、センシング範囲と検出ロバスト性を改善するための重要な技術である。残念ながら、既存のキャリブレーションアルゴリズムは、LiDARが著しく重複しているか、時間キャリブレーションが既に達成されていると仮定することが多い。これらの仮定が常に現実世界に当てはまるわけではないため、既存のアルゴリズムによる校正結果はしばしば不十分であり、人間の関与が常に必要であり、高い労働コストをもたらす。本稿では,道路沿いのLiDARを時間と空間の両方で自動校正できる最初のシステムであるTrajMatchを提案する。主なアイデアは、特別な特徴を抽出するのではなく、検出/追跡タスクの結果に基づいて自動的にセンサーを調整することだ。さらに,本手法の有効性を実験的に検証し,複数のキャリブレーションにおけるパラメータ反復の指導にも利用できることを示す。最後に,TrajMatchの性能を評価するために,シミュレーションデータセットLiDARnet-sim 1.0と実世界のデータセットの2つのデータセットを収集した。実験の結果,trajmatchは10cm未満の空間キャリブレーション誤差と1.5ms未満の時間キャリブレーション誤差を達成できた。

Recently, it has become popular to deploy sensors such as LiDARs on the roadside to monitor passing traffic and assist autonomous vehicle perception. Unlike autonomous vehicle systems, roadside sensors are usually affiliated with different subsystems and lack synchronization in both time and space. Calibration is a key technology that allows the central server to fuse the data generated by infrastructure at different locations, which can improve the sensing range and detection robustness. Unfortunately, existing calibration algorithms often assume that the LiDARs overlap significantly or that temporal calibration has already been achieved. Since these assumptions do not always hold in the real world, the calibration results from existing algorithms are often unsatisfactory and always need human involvement, which brings high labor costs. In this paper, we propose TrajMatch, the first system that can automatically calibrate roadside LiDARs in both time and space. The main idea is to automatically calibrate the sensors based on the results of the detection/tracking task instead of extracting special features. Furthermore, we propose a mechanism for evaluating calibration parameters that is consistent with our algorithm, and we demonstrate the effectiveness of this scheme experimentally; it can also be used to guide parameter iterations for multiple calibrations. Finally, to evaluate the performance of TrajMatch, we collect two datasets: a simulated dataset, LiDARnet-sim 1.0, and a real-world dataset. Experimental results show that TrajMatch can achieve a spatial calibration error of less than 10 cm and a temporal calibration error of less than 1.5 ms.

翻訳日:2023-02-07 19:46:18 公開日:2023-02-04

# 低ビットビジョン変換器の無振動量子化

Oscillation-free Quantization for Low-bit Vision Transformers ( http://arxiv.org/abs/2302.02210v1 )

ライセンス: Link先を確認

Shih-Yang Liu, Zechun Liu, Kwang-Ting Cheng

(参考訳) 重み振動は量子化対応トレーニングの望ましくない副作用であり、量子化された重みは2つの量子化レベルの間で頻繁にジャンプし、トレーニングの不安定性と準最適最終モデルをもたらす。学習可能なスケーリング係数である$\textit{de facto}$の量子化設定は、重みの振動を増大させる。本研究では,学習可能なスケーリング因子と量的重み振動との関係について検討し,vitをケースドライバとして活用し,その発見と改善について検討した。さらに、量子化重みの相互依存性が$\textit{query}$と$\textit{key}$の自己アテンション層であることから、ViTは振動に弱いことが判明した。そこで,本研究では, 統計的量量化($\rm StatsQ$)による量子化ロバスト性の向上と, 一般的な学習可能スケール法と比較しての信頼性向上($\rm CGA$)による重み付けを凍結し, 発振重みを緩和する($\textit{high confidence}$, $\textit{query}$-$\textit{key}$再パラメータ化($\rm QKR$)によるクエリキーの相互交叉振動の解消と, 結果の勾配推定の緩和を行う($\rm QKR$)3つの手法を提案する。広汎な実験により、これらの手法は重量振動を緩和し、一貫して画像ネットの精度を向上することを示した。具体的には、我々の2ビットのDeiT-T/DeiT-Sアルゴリズムは、それぞれ9.8%と7.7%で先行技術を上回っている。コードは補足資料に含まれており、リリースされます。

Weight oscillation is an undesirable side effect of quantization-aware training, in which quantized weights frequently jump between two quantized levels, resulting in training instability and a sub-optimal final model. We discover that the learnable scaling factor, a widely used $\textit{de facto}$ setting in quantization, aggravates weight oscillation. In this study, we investigate the connection between the learnable scaling factor and quantized weight oscillation and use ViT as a case driver to illustrate the findings and remedies. In addition, we find that the interdependence between quantized weights in the $\textit{query}$ and $\textit{key}$ of a self-attention layer makes ViT vulnerable to oscillation. We therefore propose three techniques: statistical weight quantization ($\rm StatsQ$) to improve quantization robustness compared to the prevalent learnable-scale-based method; confidence-guided annealing ($\rm CGA$) that freezes the weights with $\textit{high confidence}$ and calms the oscillating weights; and $\textit{query}$-$\textit{key}$ reparameterization ($\rm QKR$) to resolve the query-key intertwined oscillation and mitigate the resulting gradient misestimation. Extensive experiments demonstrate that these techniques successfully abate weight oscillation and consistently achieve substantial accuracy improvements on ImageNet. Specifically, our 2-bit DeiT-T/DeiT-S algorithms outperform the previous state of the art by 9.8% and 7.7%, respectively. The code is included in the supplementary material and will be released.

翻訳日:2023-02-07 19:39:54 公開日:2023-02-04
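The weight oscillation described in the abstract above can be reproduced in a toy setting. The following is a minimal sketch, assuming a single latent weight, a fixed unit scale, a squared-error loss, and a straight-through estimator; none of these choices come from the paper, and the sketch does not implement StatsQ, CGA, or QKR. It only shows how a quantized weight can keep jumping between two levels when the optimum lies between them.

```python
import math


def quantize(w, scale=1.0):
    """Round-half-up to the nearest multiple of `scale`."""
    return math.floor(w / scale + 0.5) * scale


def simulate(w=0.45, target=0.4, lr=0.1, steps=50):
    """Train one latent weight so that its quantized value approaches `target`."""
    history = []
    for _ in range(steps):
        q = quantize(w)
        history.append(q)
        # Straight-through estimator: treat dq/dw as 1, so the gradient of
        # (q - target)^2 with respect to the latent weight is 2 * (q - target).
        grad = 2.0 * (q - target)
        w -= lr * grad
    return history


history = simulate()
# Count how often the quantized weight changes level between consecutive steps.
flips = sum(1 for a, b in zip(history, history[1:]) if a != b)
```

Because the target 0.4 sits between the levels 0 and 1, neither level has zero gradient, so the latent weight is pushed back and forth across the rounding threshold instead of settling.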

# 関係型Weisfeiler-Lemanによるリンク予測の一理論

A Theory of Link Prediction via Relational Weisfeiler-Leman ( http://arxiv.org/abs/2302.02209v1 )

ライセンス: Link先を確認

Xingyue Huang, Miguel Romero Orth, İsmail İlkan Ceylan, Pablo Barceló

(参考訳) グラフニューラルネットワークは、グラフ構造化データ上での表現学習のための顕著なモデルである。これらのモデルの能力と制限は単純なグラフではよく理解されているが、知識グラフの文脈では、我々の理解は極めて不完全である。この研究の目的は、リンク予測の顕著なタスクに関連する知識グラフに対するグラフニューラルネットワークの展望を体系的に理解することである。我々の分析は、一見無関係なモデルに対する統一的な視点を必要とし、他のモデルもアンロックする。様々なモデルの表現力は、異なる初期化規則を持つ対応する関係性Weisfeiler-Lemanアルゴリズムによって特徴づけられる。この分析は、グラフニューラルネットワークのクラスによってキャプチャされる関数のクラスを正確に論理的に特徴づけるために拡張される。提案手法を実証的に検証した実践的設計選択の利点を理論的に説明する。

Graph neural networks are prominent models for representation learning over graph-structured data. While the capabilities and limitations of these models are well understood for simple graphs, our understanding remains highly incomplete in the context of knowledge graphs. The goal of this work is to provide a systematic understanding of the landscape of graph neural networks for knowledge graphs pertaining to the prominent task of link prediction. Our analysis entails a unifying perspective on seemingly unrelated models and unlocks a series of other models. The expressive power of various models is characterized via a corresponding relational Weisfeiler-Leman algorithm with different initialization regimes. This analysis is extended to provide a precise logical characterization of the class of functions captured by a class of graph neural networks. Our theoretical findings explain the benefits of some widely employed practical design choices, which are validated empirically.

翻訳日:2023-02-07 19:39:17 公開日:2023-02-04
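To make the idea of a relational Weisfeiler-Leman test in the abstract above more concrete, here is a generic relational colour refinement on a small knowledge graph. It is a simplified sketch under stated assumptions: nodes start with a uniform colour, only incoming edges are aggregated, and the function name and toy triples are hypothetical. The paper's own algorithm, which characterizes link prediction models and uses several initialization regimes, may differ in all of these respects.

```python
def relational_wl(num_nodes, triples, rounds=3):
    """Refine node colours using (relation, neighbour colour) multisets.

    `triples` is a list of (head, relation, tail) facts.
    Returns one integer colour per node.
    """
    colors = [0] * num_nodes  # uniform initial colouring
    for _ in range(rounds):
        signatures = []
        for v in range(num_nodes):
            # Multiset of (relation, colour of head) over edges pointing at v.
            incoming = sorted((r, colors[h]) for h, r, t in triples if t == v)
            signatures.append((colors[v], tuple(incoming)))
        # Map each distinct signature to a compact integer colour.
        palette = {sig: i for i, sig in enumerate(sorted(set(signatures)))}
        colors = [palette[sig] for sig in signatures]
    return colors


# Hypothetical toy graph: node 2 is "liked" by node 0, node 3 is "hated" by node 1.
triples = [(0, "likes", 2), (1, "hates", 3)]
colors = relational_wl(4, triples)
```

Nodes 2 and 3 receive different colours only because the relation labels differ; a colour refinement that ignored relation types would not separate them.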

# 対向摂動下におけるロバスト認証制御

Certified Robust Control under Adversarial Perturbations ( http://arxiv.org/abs/2302.02208v1 )

ライセンス: Link先を確認

Jinghan Yang, Hunmin Kim, Wenbin Wan, Naira Hovakimyan, Yevgeniy Vorobeychik

(参考訳) 自律システムは、高次元の生の入力を、意思決定と制御に使用される予測に変換する機械学習技術にますます依存している。しかし、これらの入力を悪意を持って操作し、その結果、予測することが容易であることが多い。逆入力摂動に対する予測のロバスト性を検証する効果的な手法が提案されているが、予測を下流で利用するための制御システムから切り離されている。本稿では, 逆入力摂動に対する制御の正当性を得るために, 原入力摂動に対する予測の頑健性検証を構成するための最初の手法を提案する。我々は、適応車両制御のケーススタディを用いて、我々のアプローチを説明し、広範囲な実験を通して得られたエンドツーエンド証明書の価値を示す。

Autonomous systems increasingly rely on machine learning techniques to transform high-dimensional raw inputs into predictions that are then used for decision-making and control. However, it is often easy to maliciously manipulate such inputs and, as a result, predictions. While effective techniques have been proposed to certify the robustness of predictions to adversarial input perturbations, such techniques have been disembodied from control systems that make downstream use of the predictions. We propose the first approach for composing robustness certification of predictions with respect to raw input perturbations with robust control to obtain certified robustness of control to adversarial input perturbations. We use a case study of adaptive vehicle control to illustrate our approach and show the value of the resulting end-to-end certificates through extensive experiments.

翻訳日:2023-02-07 19:39:05 公開日:2023-02-04

# ラプラシアンicpによる3次元頭部メッシュのプログレッシブ登録

Laplacian ICP for Progressive Registration of 3D Human Head Meshes ( http://arxiv.org/abs/2302.02194v1 )

ライセンス: Link先を確認

Nick Pears, Hang Dai, Will Smith and Hao Sun

(参考訳) 古典的非剛性イテレーティブ・クローズト・ポイント(N-ICP)の高効率な変種であるプログレッシブ3次元登録フレームワークを提案する。変形正則化にLaplace-Beltrami演算子を用いるので、全体のプロセスはLaplacian ICP (L-ICP) とみなす。これは「イテレーション毎の小さな変形」という仮定を生かし、徐々に粗くなり、フレキシブルな変形モデル、対応集合の数の増加、より洗練された対応推定を利用する。対応マッチングは、ドメイン固有の特徴抽出器から派生した予め定義された頂点サブセット内でのみ許可される。さらに,アノテーション転送に基づく3次元非剛性登録のための新しいベンチマークと2つの評価指標を提案する。これを、3d human head scans(headspace)の公開データセット上で評価するために使用します。この手法は頑丈であり、最も一般的な古典的手法と比較して計算時間のごく一部しか必要としないが、登録性能は同等である。

We present a progressive 3D registration framework that is a highly-efficient variant of classical non-rigid Iterative Closest Points (N-ICP). Since it uses the Laplace-Beltrami operator for deformation regularisation, we view the overall process as Laplacian ICP (L-ICP). This exploits a `small deformation per iteration' assumption and is progressively coarse-to-fine, employing an increasingly flexible deformation model, an increasing number of correspondence sets, and increasingly sophisticated correspondence estimation. Correspondence matching is only permitted within predefined vertex subsets derived from domain-specific feature extractors. Additionally, we present a new benchmark and a pair of evaluation metrics for 3D non-rigid registration, based on annotation transfer. We use this to evaluate our framework on a publicly-available dataset of 3D human head scans (Headspace). The method is robust and only requires a small fraction of the computation time compared to the most popular classical approach, yet has comparable registration performance.

翻訳日:2023-02-07 19:38:51 公開日:2023-02-04

# 時間力学を用いた表面符号回路のハードウェア要件の緩和

Relaxing Hardware Requirements for Surface Code Circuits using Time-dynamics ( http://arxiv.org/abs/2302.02192v1 )

ライセンス: Link先を確認

Matt McEwen, Dave Bacon, Craig Gidney

(参考訳) 量子誤り訂正(QEC)符号の典型的な時間依存ビューは、ハードウェア上で実行可能な回路への分解においてかなりの自由を隠蔽する。領域検出の概念を用いて、静的QEC符号を回路に分解する代わりに、時間動的QEC回路を直接設計する。特に、曲面符号の標準的な回路構成を改善し、正方形格子の代わりに六角形格子に埋め込み、CNOTやCZゲートの代わりにISWAPゲートを使用し、量子ビットデータを交換して役割を計測し、実行中に物理量子ビットグリッドの周りに論理的パッチを移動させる新しい回路を提示する。これらの構造は全て追加のエンタングルゲート層を使用しず、基本的に同じ論理的性能を示し、標準的なサーフェスコード回路の25%以内のテラクオプフットプリントを有する。これらの回路は、ハードウェアの需要を緩和しながら、標準的なサーフェスコード回路と本質的に同じ論理性能を達成するため、量子ハードウェアエンジニアにとって大きな関心を持つだろう。

The typical time-independent view of quantum error correction (QEC) codes hides significant freedom in the decomposition into circuits that are executable on hardware. Using the concept of detecting regions, we design time-dynamic QEC circuits directly instead of designing static QEC codes to decompose into circuits. In particular, we improve on the standard circuit constructions for the surface code, presenting new circuits that can embed on a hexagonal grid instead of a square grid, that can use ISWAP gates instead of CNOT or CZ gates, that can exchange qubit data and measure roles, and that move logical patches around the physical qubit grid while executing. All these constructions use no additional entangling gate layers and display essentially the same logical performance, having teraquop footprints within 25% of the standard surface code circuit. We expect these circuits to be of great interest to quantum hardware engineers, because they achieve essentially the same logical performance as standard surface code circuits while relaxing demands on hardware.

翻訳日:2023-02-07 19:38:34 公開日:2023-02-04

# 3GPPMIMOシステムにおけるパイロットフリー伝送の教師なし学習

Unsupervised Learning for Pilot-free Transmission in 3GPP MIMO Systems ( http://arxiv.org/abs/2302.02191v1 )

ライセンス: Link先を確認

Omar M. Sleem, Mohamed Salah Ibrahim, Akshay Malhotra, Mihaela Beluri, Philip Pietraski

(参考訳) 参照信号のオーバーヘッド低減は、近年、システムのスペクトル効率を改善する効果的なソリューションとして進化している。本稿では,復調基準信号(DM-RS)が不要な新しいダウンリンクデータ構造を提案する。提案したデータ転送構造は,ユーザデータの一部を複数のサブバンドにまたがる簡単な繰り返しステップを含む。ユーザ側で繰り返し構造を利用すると,正準相関分析により信頼性の高いリカバリが可能となる。また、OFDMシステムにおけるCCA性能を高めるための2つの効果的なメカニズムを提案し、その1つは繰り返しパターンの選択であり、もう1つは重度周波数選択性の問題に対処するものである。提案手法は複雑さとパフォーマンスのトレードオフが良好であり,実用的な実装が期待できる。 3gppリンクレベルテストベンチを用いた数値実験により,提案手法が最先端手法よりも優れていることを示す。

Reference signal overhead reduction has recently emerged as an effective solution for improving system spectral efficiency. This paper introduces a new downlink data structure that is free from demodulation reference signals (DM-RS) and hence does not require any channel estimation at the receiver. The proposed data transmission structure involves a simple repetition of part of the user data across different sub-bands. By exploiting the repetition structure at the user side, it is shown that reliable recovery is possible via canonical correlation analysis (CCA). This paper also proposes two effective mechanisms for boosting CCA performance in OFDM systems: one for repetition pattern selection and another to deal with severe frequency selectivity. The proposed approach exhibits a favorable complexity-performance tradeoff, rendering it appealing for practical implementation. Numerical results, obtained using a 3GPP link-level testbench, demonstrate the superiority of the proposed approach relative to state-of-the-art methods.

翻訳日:2023-02-07 19:38:15 公開日:2023-02-04

# 適切な信頼性、説明可能なAI、人間とAIのコラボレーション、人間とAIの相補性

Appropriate Reliance, Explainable AI, Human-AI Collaboration, Human-AI Complementarity ( http://arxiv.org/abs/2302.02187v1 )

ライセンス: Link先を確認

Max Schemmer, Niklas Kühl, Carina Benz, Andrea Bartos, Gerhard Satzger

(参考訳) AIアドバイスは、例えば投資や治療決定において、ますます人気が高まっている。このアドバイスは一般的に不完全であるため、意思決定者は、実際にそのアドバイスに従うかどうかを判断しなければならない。しかし、現在の適切な信頼に関する研究には、まだ共通の定義と運用上の測定概念が欠けている。さらに、この行動に影響を及ぼす要因を理解するのに役立つ深い行動実験は行われていない。本稿では,AoR(Adropriateness of Reliance)を基礎となる,定量的な2次元計測概念として提案する。我々は、aiアドバイスに説明を提供する効果を分析する研究モデルを開発した。 200人の参加者による実験では、これらの説明がAoRにどのように影響し、AIアドバイスの有効性を示す。我々の研究は、依存行動の分析とAIアドバイザの目的設計のための基本的な概念に貢献する。

AI advice is becoming increasingly popular, e.g., in investment and medical treatment decisions. As this advice is typically imperfect, decision-makers have to exercise discretion as to whether to actually follow it: they have to "appropriately" rely on correct advice and turn down incorrect advice. However, current research on appropriate reliance still lacks a common definition as well as an operational measurement concept. Additionally, no in-depth behavioral experiments have been conducted that help understand the factors influencing this behavior. In this paper, we propose Appropriateness of Reliance (AoR) as an underlying, quantifiable two-dimensional measurement concept. We develop a research model that analyzes the effect of providing explanations for AI advice. In an experiment with 200 participants, we demonstrate how these explanations influence the AoR and, thus, the effectiveness of AI advice. Our work contributes fundamental concepts for the analysis of reliance behavior and the purposeful design of AI advisors.

翻訳日:2023-02-07 19:38:02 公開日:2023-02-04

# モバイルデバイス上でのリアルタイム画像復調

Real-Time Image Demoireing on Mobile Devices ( http://arxiv.org/abs/2302.02184v1 )

ライセンス: Link先を確認

Yuxin Zhang, Mingbao Lin, Xunchao Li, Han Liu, Guozhi Wang, Fei Chao, Shuai Ren, Yafei Wen, Xiaoxin Chen, Rongrong Ji

(参考訳) モアレパターンは、デジタルスクリーンの写真を撮るときに頻繁に現れ、画質を大幅に劣化させる。画像復号化におけるCNNの進歩にもかかわらず、既存のネットワークは設計が重く、モバイルデバイスに冗長な計算負荷をもたらす。本稿では,デモレーアネットワークの高速化に関する最初の研究を開始し,モバイルデバイス上でのリアルタイム展開に向けた動的デモレーア・アクセラレーション手法(dda)を提案する。私たちの刺激は、モアレパターンが画像全体に不均衡に分散されることが、シンプルで普遍的な事実に起因しています。その結果、過剰な計算は非モアレ領域で無駄にされる。したがって,画像パッチの複雑さに比例して計算コストを再配置する。この目的を達成するために,moireパターンのカラフルさと頻度情報の両方を考慮した新しいmoire preを設計し,画像パッチの複雑さを測定する。そして,より大規模なネットワークを用いて画像パッチを復元し,より複雑な画像パッチを小さなネットワークに割り当て,計算負担を軽減する。最終的に、パラメータの重荷を避けるためにパラメータ共有スーパーネットパラダイムですべてのネットワークをトレーニングします。いくつかのベンチマークにおいて,提案したDDAの有効性を示す実験を行った。さらに、snapdragon 8 gen 1のチップを搭載したvivo x80 proスマートフォンで評価された加速度は、この手法が推定時間を劇的に短縮し、モバイルデバイスでのリアルタイム画像の復調に繋がることを示している。ソースコードとモデルはhttps://github.com/zyxxmu/ddaでリリース

Moire patterns appear frequently when taking photos of digital screens, drastically degrading the image quality. Despite the advance of CNNs in image demoireing, existing networks have heavy designs, causing a redundant computation burden for mobile devices. In this paper, we launch the first study on accelerating demoireing networks and propose a dynamic demoireing acceleration method (DDA) towards real-time deployment on mobile devices. Our motivation stems from a simple yet universal fact: moire patterns are often unevenly distributed across an image. Consequently, excessive computation is wasted on non-moire areas. Therefore, we reallocate computation costs in proportion to the complexity of image patches. To achieve this aim, we measure the complexity of an image patch by designing a novel moire prior that considers both the colorfulness and the frequency information of moire patterns. Then, we restore image patches of higher complexity using larger networks, while patches of lower complexity are assigned to smaller networks to relieve the computation burden. Finally, we train all networks in a parameter-shared supernet paradigm to avoid an additional parameter burden. Extensive experiments on several benchmarks demonstrate the efficacy of our proposed DDA. In addition, the acceleration evaluated on the VIVO X80 Pro smartphone equipped with a Snapdragon 8 Gen 1 chip shows that our method can drastically reduce the inference time, leading to real-time image demoireing on mobile devices. Source codes and models are released at https://github.com/zyxxmu/DDA

翻訳日:2023-02-07 19:37:49 公開日:2023-02-04

# 非定常入力駆動環境におけるオンライン強化学習のための局所的制約付きポリシー最適化

Locally Constrained Policy Optimization for Online Reinforcement Learning in Non-Stationary Input-Driven Environments ( http://arxiv.org/abs/2302.02182v1 )

ライセンス: Link先を確認

Pouya Hamadanian, Arash Nasr-Esfahany, Siddartha Sen, Malte Schwarzkopf, Mohammad Alizadeh

(参考訳) 非定常的な入力駆動環境におけるオンライン強化学習(RL)について検討した。オンラインRLは破滅的忘れ(CF)のため、このような環境では困難である。エージェントは新しい経験を訓練するとき、事前の知識を忘れがちです。この問題を軽減するための以前のアプローチでは、タスクラベル(実際には利用できないことが多い)や、不安定でパフォーマンスが悪い可能性のあるオフポリシーメソッドを想定している。本稿では,政策出力を古い経験に固定し,現在の経験への回帰を最適化することでCFと戦う,地方制約付き政策最適化(LCPO)を提案する。このアンカリングを行うため、LCPOは現在の入力分布の外にある経験からのサンプルを使用してポリシー最適化を局所的に制約する。 2つのジムおよびコンピュータシステム環境でlcpoを様々な合成および実入力トレースで評価し、オンライン環境では最先端のオン・ポリシーおよびオフ・ポリシーrl法を上回り、全入力トレースで事前訓練されたオフラインエージェントと同等の結果を得る。

We study online Reinforcement Learning (RL) in non-stationary input-driven environments, where a time-varying exogenous input process affects the environment dynamics. Online RL is challenging in such environments due to catastrophic forgetting (CF): the agent tends to forget prior knowledge as it trains on new experiences. Prior approaches to mitigate this issue assume task labels (which are often not available in practice) or use off-policy methods that can suffer from instability and poor performance. We present Locally Constrained Policy Optimization (LCPO), an on-policy RL approach that combats CF by anchoring policy outputs on old experiences while optimizing the return on current experiences. To perform this anchoring, LCPO locally constrains policy optimization using samples from experiences that lie outside of the current input distribution. We evaluate LCPO in two gym and computer systems environments with a variety of synthetic and real input traces, and find that it outperforms state-of-the-art on-policy and off-policy RL methods in the online setting, while achieving results on par with an offline agent pre-trained on the whole input trace.

翻訳日:2023-02-07 19:37:25 公開日:2023-02-04

# gan発生器がリアルタイムにネットワークを反転させるモデルステッチングと可視化

Model Stitching and Visualization How GAN Generators can Invert Networks in Real-Time ( http://arxiv.org/abs/2302.02181v1 )

ライセンス: Link先を確認

Rudolf Herdt (1 and 2), Maximilian Schmidt (1 and 2), Daniel Otero Baguer (1 and 2), Jean Le'Clerc Arrastia (1 and 2), Peter Maass (1 and 2) ((1) University of Bremen, (2) aisencia)

(参考訳) 医療分野における批判的応用は、深層学習手法による決定を解釈するために、追加情報を迅速に提供する必要がある。本研究では,畳み込みを利用したGANジェネレータを用いて,分類とセマンティックセグメンテーションネットワークの活性化を高速かつ正確に可視化する手法を提案する。 afhq野生動物データセットの動物画像とステンド組織標本の現実世界のデジタル病理スキャンを用いて実験を行った。提案手法は,これらのデータセット上で,約2桁高速に動作しながら,確立された勾配降下法に匹敵する結果を与える。

Critical applications, such as in the medical field, require the rapid provision of additional information to interpret decisions made by deep learning methods. In this work, we propose a fast and accurate method to visualize activations of classification and semantic segmentation networks by stitching them with a GAN generator utilizing convolutions. We test our approach on images of animals from the AFHQ wild dataset and real-world digital pathology scans of stained tissue samples. Our method provides comparable results to established gradient descent methods on these datasets while running about two orders of magnitude faster.

翻訳日:2023-02-07 19:37:02 公開日:2023-02-04

# 効率的なConvNetによる画像の劣化再考

Revisiting Image Deblurring with an Efficient ConvNet ( http://arxiv.org/abs/2302.02234v1 )

ライセンス: Link先を確認

Lingyan Ruan, Mojtaba Bemana, Hans-peter Seidel, Karol Myszkowski, Bin Chen

(参考訳) Image Deblurringは、ぼやけた画像から潜むシャープなイメージを復元することを目的としており、コンピュータビジョンに幅広い応用がある。畳み込みニューラルネットワーク(cnns)は長年にわたってこの領域でよく機能しており、最近ではトランスフォーマーと呼ばれる別のネットワークアーキテクチャがさらに強力な性能を示している。 mhsa(multi-head self-attention)メカニズムは、cnnよりも大きな受容野と優れた入力コンテンツ適応性を提供する。しかし、mhsaは入力解像度に対して二次的に増加する高い計算コストを要求するため、高分解能画像デブラリングタスクでは実用的でない。本研究では,大規模な実効性受容場(ERF)を特徴とする軽量CNNネットワークを提案する。我々の鍵となる設計はLaKDと呼ばれる効率的なCNNブロックで、大きなカーネル深さの畳み込みと空間チャネルの混合構造を備えており、トランスフォーマーと同等あるいは大きいRFを実現するが、パラメータスケールは小さい。具体的には,パラメータが32%少なく,MACが39%少ないデフォーカス/モーションデブロアリングベンチマークデータセット上で,最先端のRestormer上で+0.17dB / +0.43dB PSNRを達成する。大規模な実験は、ネットワークの性能と各モジュールの有効性を実証する。さらに,ERFを定量的に特徴付け,ネットワーク性能に高い相関性を示すコンパクトで直感的なERFメータ指標を提案する。この研究によって、CNNとTransformerのアーキテクチャが、イメージの損なうようなタスクを超えて、さらに長所と短所を探求できることを期待しています。

Image deblurring aims to recover the latent sharp image from its blurry counterpart and has a wide range of applications in computer vision. Convolutional neural networks (CNNs) have performed well in this domain for many years, and more recently an alternative network architecture, the Transformer, has demonstrated even stronger performance. One can attribute its superiority to the multi-head self-attention (MHSA) mechanism, which offers a larger receptive field and better input content adaptability than CNNs. However, as MHSA demands high computational costs that grow quadratically with the input resolution, it becomes impractical for high-resolution image deblurring tasks. In this work, we propose a unified lightweight CNN that features a large effective receptive field (ERF) and demonstrates comparable or even better performance than Transformers at a lower computational cost. Our key design is an efficient CNN block dubbed LaKD, equipped with a large-kernel depth-wise convolution and a spatial-channel mixing structure, attaining an ERF comparable to or larger than that of Transformers but with a smaller parameter scale. Specifically, we achieve +0.17dB / +0.43dB PSNR over the state-of-the-art Restormer on defocus / motion deblurring benchmark datasets with 32% fewer parameters and 39% fewer MACs. Extensive experiments demonstrate the superior performance of our network and the effectiveness of each module. Furthermore, we propose a compact and intuitive ERFMeter metric that quantitatively characterizes the ERF and shows a high correlation with network performance. We hope this work can inspire the research community to further explore the pros and cons of CNN and Transformer architectures beyond image deblurring tasks.

翻訳日:2023-02-07 19:31:11 公開日:2023-02-04
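The abstract above pairs a large-kernel depth-wise convolution with channel mixing to keep the parameter count small. The arithmetic below sketches why that combination is cheap compared with a dense convolution of the same kernel size. The kernel size 9 and channel width 64 are assumed purely for illustration and are not the LaKD configuration; biases are ignored.

```python
def dense_conv_params(kernel, channels_in, channels_out):
    """Weights in a standard k x k convolution (bias ignored)."""
    return kernel * kernel * channels_in * channels_out


def depthwise_separable_params(kernel, channels):
    """Weights in a k x k depth-wise convolution followed by a 1 x 1 channel mix."""
    depthwise = kernel * kernel * channels  # one k x k filter per channel
    pointwise = channels * channels         # 1 x 1 convolution mixing channels
    return depthwise + pointwise


# Hypothetical layer sizes, chosen only to make the comparison concrete.
dense = dense_conv_params(9, 64, 64)
separable = depthwise_separable_params(9, 64)
ratio = dense / separable
```

With these assumed sizes the dense layer needs 331,776 weights and the separable one 9,280, so enlarging the kernel grows the depth-wise term only linearly in the channel count.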

# アラビア語の同義語強化のためのベンチマークとスコーリングアルゴリズム

A Benchmark and Scoring Algorithm for Enriching Arabic Synonyms ( http://arxiv.org/abs/2302.02232v1 )

ライセンス: Link先を確認

Sana Ghanem, Mustafa Jarrar, Radi Jarrar, Ibrahim Bounhas

(参考訳) 本稿では,同義語強度をファジィ値として考慮し,与えられたシンセセットを拡張するタスクについて述べる。 mono/multilingual synsetとしきい値(ファジィ値 [0-1])が与えられたとき、我々の目標は、既存のレキシコンからこのしきい値を超える新しいシノニムを抽出することである。アルゴリズムとベンチマークデータセットという2つのコントリビューションを提示します。データセットは500シンセットの3K候補シノニムで構成されている。各候補は4人の言語学者によってファジィ値で注釈付けされる。データセットは重要です (i)同義語に関する言語学者(dis/)の語義を理解することに加えて 2) データセットをベースラインとして,アルゴリズムの評価を行う。提案アルゴリズムは,既存の語彙から同義語を抽出し,各候補に対するファジィ値を算出する。評価の結果,このアルゴリズムは言語学者のように振る舞うことができ,ファジィ値は言語学者によって提案されたものに近い(RMSEとMAEを用いて)。データセットとデモページはhttps://portal.sina.birzeit.edu/synonymsで公開されている。

This paper addresses the task of extending a given synset with additional synonyms, taking into account synonymy strength as a fuzzy value. Given a mono/multilingual synset and a threshold (a fuzzy value in [0-1]), our goal is to extract new synonyms above this threshold from existing lexicons. Our contribution is twofold: an algorithm and a benchmark dataset. The dataset consists of 3K candidate synonyms for 500 synsets. Each candidate synonym is annotated with a fuzzy value by four linguists. The dataset is important for (i) understanding how much linguists (dis/)agree on synonymy and (ii) serving as a baseline to evaluate our algorithm. Our proposed algorithm extracts synonyms from existing lexicons and computes a fuzzy value for each candidate. Our evaluations show that the algorithm behaves like a linguist and that its fuzzy values are close to those proposed by linguists (as measured by RMSE and MAE). The dataset and a demo page are publicly available at https://portal.sina.birzeit.edu/synonyms.

翻訳日:2023-02-07 19:30:31 公開日:2023-02-04
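The evaluation described in the abstract above compares algorithmic fuzzy values with linguists' annotations using RMSE and MAE, and keeps candidates above a threshold. A minimal sketch of those computations follows; the candidate names and all scores are invented placeholders, not values from the benchmark, and the paper's scoring algorithm itself is not reproduced.

```python
import math


def rmse(predicted, reference):
    """Root mean squared error between two equally long score lists."""
    return math.sqrt(
        sum((p - r) ** 2 for p, r in zip(predicted, reference)) / len(reference)
    )


def mae(predicted, reference):
    """Mean absolute error between two equally long score lists."""
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(reference)


def above_threshold(candidates, scores, threshold):
    """Keep candidates whose fuzzy value reaches the threshold."""
    return [c for c, s in zip(candidates, scores) if s >= threshold]


# Hypothetical candidates with made-up algorithm and averaged linguist scores.
candidates = ["cand_a", "cand_b", "cand_c"]
algorithm_scores = [1.0, 0.5, 0.0]
linguist_scores = [0.8, 0.5, 0.2]

error_rmse = rmse(algorithm_scores, linguist_scores)
error_mae = mae(algorithm_scores, linguist_scores)
accepted = above_threshold(candidates, algorithm_scores, threshold=0.5)
```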

# PubGraph: 大規模科学的一時的知識グラフ

PubGraph: A Large Scale Scientific Temporal Knowledge Graph ( http://arxiv.org/abs/2302.02231v1 )

ライセンス: Link先を確認

Kian Ahrabian, Xinwei Du, Richard Delwin Myloth, Arun Baalaaji Sankar Ananthan, Jay Pujara

(参考訳) 研究出版物は、新しい発見、方法、技術、洞察の形で科学的進歩を共有するための主要な手段である。出版物は、コンテンツ分析と書誌構造の両方の観点から研究されてきたが、科学研究のより包括的な研究への障壁は、広くアクセス可能な大規模データや資源の欠如である。本稿では,大規模時間知識グラフ(KG)の形式を取り入れた科学的進歩を研究するための新たな資料PubGraphを提案する。 432万以上のノードと15.49Bのエッジがウィキデータオントロジーにマッピングされている。 PubGraphから異なるサイズの3つのKGを抽出し、異なるスケールでの実験を可能にする。これらのkgsを用いて,時間的に調整されたトレーニング,検証,テストパーティションを含むトランスダクティブおよびインダクティブ設定のための新しいリンク予測ベンチマークを導入する。さらに,pubgraphに適合する2つの新しい帰納的学習手法を開発し,明示的な特徴を伴わずに未認識ノード上で動作し,大規模なkgsにスケールし,既存モデルを上回るパフォーマンスを示す。その結果,過去の引用の構造的特徴は,新たな出版物の質の高い予測に十分であることがわかった。また,敵対的なコミュニティベースリンク予測設定,ゼロショットインダクティブ学習,大規模学習など,kgモデルの新たな課題を特定する。

Research publications are the primary vehicle for sharing scientific progress in the form of new discoveries, methods, techniques, and insights. Publications have been studied from the perspectives of both content analysis and bibliometric structure, but a barrier to more comprehensive studies of scientific research is a lack of publicly accessible large-scale data and resources. In this paper, we present PubGraph, a new resource for studying scientific progress that takes the form of a large-scale temporal knowledge graph (KG). It contains more than 432M nodes and 15.49B edges mapped to the popular Wikidata ontology. We extract three KGs with varying sizes from PubGraph to allow experimentation at different scales. Using these KGs, we introduce a new link prediction benchmark for transductive and inductive settings with temporally-aligned training, validation, and testing partitions. Moreover, we develop two new inductive learning methods better suited to PubGraph, operating on unseen nodes without explicit features, scaling to large KGs, and outperforming existing models. Our results demonstrate that structural features of past citations are sufficient to produce high-quality predictions about new publications. We also identify new challenges for KG models, including an adversarial community-based link prediction setting, zero-shot inductive learning, and large-scale learning.

翻訳日:2023-02-07 19:30:07 公開日:2023-02-04

# フェルミオンガウス状態の絡み合い容量

Entanglement capacity of fermionic Gaussian states ( http://arxiv.org/abs/2302.02229v1 )

ライセンス: Link先を確認

Youyi Huang and Lu Wei

(参考訳) フェルミオンガウス状態上での量子二部体の絡み合いの度合いを推定する際のエンタングルメントエントロピーの代替としてエンタングルメントの容量について検討する。特に、粒子数制約なしに、2つの異なるケースの平均容量の正確な漸近公式を導出する。後者の場合、得られた式は文学における平均容量の部分的な結果を一般化する。結果の導出の鍵となる要素は、フェルミオンガウス状態の絡み合いエントロピーの研究で最近開発された有限和を単純化するための新しいツールセットである。

We study the capacity of entanglement as an alternative to entanglement entropies in estimating the degree of entanglement of quantum bipartite systems over fermionic Gaussian states. In particular, we derive the exact and asymptotic formulas of the average capacity for two different cases: with and without particle number constraints. For the latter case, the obtained formulas generalize some partial results on the average capacity in the literature. The key ingredient in deriving the results is a set of new tools for simplifying finite summations, developed very recently in the study of the entanglement entropy of fermionic Gaussian states.

翻訳日:2023-02-07 19:29:32 公開日:2023-02-04
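For readers unfamiliar with the quantity named in the abstract above, the capacity of entanglement is commonly defined as the variance of the modular Hamiltonian $-\ln\rho_A$, i.e. $C = \mathrm{Tr}(\rho_A \ln^2 \rho_A) - (\mathrm{Tr}\,\rho_A \ln \rho_A)^2$. The sketch below evaluates entropy and capacity from the eigenvalues of a reduced density matrix. It is only an illustration of the definition; the averaging over fermionic Gaussian states carried out in the paper is not attempted, and the example spectra are arbitrary.

```python
import math


def entropy_and_capacity(eigenvalues):
    """Return (von Neumann entropy, capacity of entanglement) for a spectrum.

    `eigenvalues` are the eigenvalues of a reduced density matrix and are
    assumed to be non-negative and to sum to one.
    """
    probs = [p for p in eigenvalues if p > 0.0]  # 0 * ln(0) is taken as 0
    entropy = -sum(p * math.log(p) for p in probs)
    second_moment = sum(p * math.log(p) ** 2 for p in probs)
    return entropy, second_moment - entropy ** 2


# A flat spectrum has maximal entropy but zero capacity.
flat_entropy, flat_capacity = entropy_and_capacity([0.25, 0.25, 0.25, 0.25])
# A non-flat two-level spectrum has non-zero capacity.
skew_entropy, skew_capacity = entropy_and_capacity([0.9, 0.1])
```

For a two-level spectrum (p, 1 − p) the capacity reduces to p(1 − p) ln²(p / (1 − p)), which gives a quick consistency check on the second example.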

# 単射因果モデルの反事実識別可能性

Counterfactual Identifiability of Bijective Causal Models ( http://arxiv.org/abs/2302.02228v1 )

ライセンス: Link先を確認

Arash Nasr-Esfahany, Mohammad Alizadeh, Devavrat Shah

(参考訳) 文献で広く使われている複数の因果関係モデルを一般化するクラスであるBGM(Bijective Generation Mechanism)を用いた因果関係モデルの因果関係同定可能性について検討した。本研究では,観測不能な3つの共通因果構造に対して,BGMの学習を構造的生成モデルとして活用する実践的学習手法を提案する。学習されたBGMは効果的な反ファクト推定を可能にし、様々な深い条件生成モデルを用いて得ることができる。本手法を視覚的タスクで評価し,実世界のビデオストリーミングシミュレーションタスクにおけるその応用を実証する。

We study counterfactual identifiability in causal models with bijective generation mechanisms (BGM), a class that generalizes several widely used causal models in the literature. We establish their counterfactual identifiability for three common causal structures with unobserved confounding, and propose a practical learning method that casts learning a BGM as structured generative modeling. Learned BGMs enable efficient counterfactual estimation and can be obtained using a variety of deep conditional generative models. We evaluate our techniques in a visual task and demonstrate their application in a real-world video streaming simulation task.

翻訳日:2023-02-07 19:29:19 公開日:2023-02-04

# TAP: ラベルなしデータからのクロスモーダルな知識伝達のための注意パッチ

TAP: The Attention Patch for Cross-Modal Knowledge Transfer from Unlabeled Data ( http://arxiv.org/abs/2302.02224v1 )

ライセンス: Link先を確認

Yinsong Wang, Shahin Shahrampour

(参考訳) 本研究は,クロスモーダル学習とセミ教師あり学習の交点について検討し,未ラベルのモーダルから欠落情報を借りることにより,一次モーダルの教師あり学習性能を向上させることを目的とする。ナダラヤ・ワトソン(NW)カーネル回帰の観点からこの問題を考察し、この定式化が暗黙的にカーネル化されたクロスアテンションモジュールにつながることを示す。そこで本研究では,ラベルのないモダリティからデータレベル知識の転送を可能にする単純なニューラルネットワークプラグインである attention patch (tap) を提案する。実世界の3つのデータセット上で数値シミュレーションを行い、TAPのそれぞれの側面を調べ、ニューラルネットワークにおけるTAP統合が、ラベルのないモダリティを用いて一般化性能を向上させることを示す。

This work investigates the intersection of cross-modal learning and semi-supervised learning, where we aim to improve the supervised learning performance of the primary modality by borrowing missing information from an unlabeled modality. We investigate this problem from a Nadaraya-Watson (NW) kernel regression perspective and show that this formulation implicitly leads to a kernelized cross-attention module. To this end, we propose The Attention Patch (TAP), a simple neural network plugin that allows data-level knowledge transfer from the unlabeled modality. We provide numerical simulations on three real-world datasets to examine each aspect of TAP and show that integrating TAP into a neural network can improve generalization performance using the unlabeled modality.

翻訳日:2023-02-07 19:29:10 公開日:2023-02-04
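The link between Nadaraya-Watson regression and attention mentioned in the abstract above can be seen in one dimension: with a Gaussian kernel, the NW weights are a softmax over negative squared distances between a query and the keys. The sketch below shows only this textbook correspondence, under assumed scalar inputs and a hand-picked bandwidth; it is not the TAP module and does not involve a second modality.

```python
import math


def nw_attention(query, keys, values, bandwidth=1.0):
    """Nadaraya-Watson estimate with a Gaussian kernel, written as attention.

    Returns the prediction and the normalized kernel (attention) weights.
    """
    # Gaussian kernel scores in log space: -(q - k)^2 / (2 h^2).
    scores = [-(query - k) ** 2 / (2.0 * bandwidth ** 2) for k in keys]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]  # softmax over the scores
    prediction = sum(w * v for w, v in zip(weights, values))
    return prediction, weights


# Hypothetical data: three labelled points and a query at the middle key.
prediction, weights = nw_attention(1.0, keys=[0.0, 1.0, 2.0], values=[0.0, 10.0, 20.0])
```

Here the keys play the role of stored inputs and the values the role of their labels, so the prediction is a distance-weighted average of the labels.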

# multi-armed adversarial attack detection に対する minimax アプローチ

A Minimax Approach Against Multi-Armed Adversarial Attacks Detection ( http://arxiv.org/abs/2302.02216v1 )

ライセンス: Link先を確認

Federica Granese, Marco Romanelli, Siddharth Garg, Pablo Piantanida

(参考訳) 複数のアルゴリズムと目標損失関数を同時に使用するマルチアーム対向攻撃は、検出機構の特定の側情報を必要としない状態で、最先端の対向検知器を騙すことに成功している。問題の定式化により,複数の事前学習検出器のソフト確率出力をミニマックス法に従って集約する解を提案することができる。提案するフレームワークは数学的に健全で実装が容易でモジュール化されており、既存の検出器や将来の検出器を統合することができる。一般的なデータセット(例えば CIFAR10 や SVHN など)の広範な評価を通じて、我々のアグリゲーションは、多武装の敵攻撃に対する個々の最先端検出器よりも一貫して優れており、利用可能なメソッドのレジリエンスを改善する効果的なソリューションであることを示す。

Multi-armed adversarial attacks, in which multiple algorithms and objective loss functions are simultaneously used at evaluation time, have been shown to be highly successful in fooling state-of-the-art adversarial examples detectors while requiring no specific side information about the detection mechanism. By formalizing the problem at hand, we can propose a solution that aggregates the soft-probability outputs of multiple pre-trained detectors according to a minimax approach. The proposed framework is mathematically sound, easy to implement, and modular, allowing for integrating existing or future detectors. Through extensive evaluation on popular datasets (e.g., CIFAR10 and SVHN), we show that our aggregation consistently outperforms individual state-of-the-art detectors against multi-armed adversarial attacks, making it an effective solution to improve the resilience of available methods.

翻訳日:2023-02-07 19:28:55 公開日:2023-02-04

# CNNを用いた教師なしリフトを用いた変分多チャンネルセグメンテーション

Variational multichannel multiclass segmentation using unsupervised lifting with CNNs ( http://arxiv.org/abs/2302.02214v1 )

ライセンス: Link先を確認

Nadja Gruber, Johannes Schwab, Sebastien Court, Elke Gizewski, Markus Haltmeier

(参考訳) 本稿では、変動エネルギー関数と深部畳み込みニューラルネットワークを組み合わせた教師なし画像分割手法を提案する。この変動部分は、複数の入力画像から有用な情報を同時に抽出できる、最近のマルチチャネルマルチフェーズChan-Veseモデルに基づいている。与えられた画像を $K$ 個の異なる領域に分割するフレキシブルなマルチクラスセグメンテーション手法を実装した。画像の事前分解を目的とした畳み込みニューラルネットワーク(CNN)を用いる。その後、セグメント化関数を最小化することにより、最終的なセグメント化は完全に教師なしの方法で得られる。セグメンテーションの出発点となる情報的特徴マップの抽出に特に重点が置かれている。提案手法は,テクスチャや医用画像などの様々な種類の画像の領域を分解・分割し,その性能を他の多相分割法と比較できることを示す。

We propose an unsupervised image segmentation approach that combines a variational energy functional and deep convolutional neural networks. The variational part is based on a recent multichannel multiphase Chan-Vese model, which is capable of extracting useful information from multiple input images simultaneously. We implement a flexible multiclass segmentation method that divides a given image into $K$ different regions. We use convolutional neural networks (CNNs) targeting a pre-decomposition of the image. By subsequently minimising the segmentation functional, the final segmentation is obtained in a fully unsupervised manner. Special emphasis is given to the extraction of informative feature maps serving as a starting point for the segmentation. The initial results indicate that the proposed method is able to decompose and segment the different regions of various types of images, such as texture and medical images, and we compare its performance with another multiphase segmentation method.

翻訳日:2023-02-07 19:28:37 公開日:2023-02-04

# CosPGD : 画素単位の予測タスクに対する一貫したホワイトボックス対向攻撃

CosPGD: a unified white-box adversarial attack for pixel-wise prediction tasks ( http://arxiv.org/abs/2302.02213v1 )

ライセンス: Link先を確認

Shashank Agnihotri and Margret Keuper

(参考訳) ニューラルネットワークは、多くのタスクで高精度な予測を可能にするが、わずかな入力摂動に対する堅牢性の欠如は、多くの現実世界アプリケーションでのデプロイメントを妨げている。近年のニューラルネットのロバスト性評価に向けた研究は, 先駆的な projected gradient descent (PGD) 攻撃やその後の研究, ベンチマークなどに大きな注目を集めている。しかし,このような手法は主に分類タスクに焦点をあてるが,意味セグメンテーションやオプティカルフロー,視差推定といった画素単位の予測タスクの分析を特に扱うアプローチはごくわずかである。注目すべき例外は、最近提案されたSegPGD攻撃であり、セマンティックセグメンテーションを評価するためのピクセルワイズアタックの重要性を示す可能性がある。 SegPGDはピクセル単位の分類(セグメンテーション)に限られるが、本研究では、任意のピクセル単位の予測タスクに対する専用の攻撃を統一環境で最適化できる、新しいホワイトボックス敵対的攻撃であるCosPGDを提案する。予測と基底真理の間のコサイン類似性を利用して、分類タスクから回帰設定へ直接拡張する。さらに, セマンティクスセグメンテーションにおけるCosPGDの優れた性能と, オプティカルフローと視差推定について実証的に示す。

While neural networks allow highly accurate predictions in many tasks, their lack of robustness towards even slight input perturbations hampers their deployment in many real-world applications. Recent research towards evaluating the robustness of neural networks such as the seminal projected gradient descent (PGD) attack and subsequent works and benchmarks have therefore drawn significant attention. Yet, such methods focus predominantly on classification tasks, while only a few approaches specifically address the analysis of pixel-wise prediction tasks such as semantic segmentation, optical flow, or disparity estimation. One notable exception is the recently proposed SegPGD attack, which could showcase the importance of pixel-wise attacks for evaluating semantic segmentation. While SegPGD is limited to pixel-wise classification (i.e. segmentation), in this work, we propose CosPGD, a novel white-box adversarial attack that allows optimizing dedicated attacks for any pixel-wise prediction task in a unified setting. It leverages the cosine similarity between the predictions and ground truth to extend directly from classification tasks to regression settings. Further, we empirically show the superior performance of CosPGD for semantic segmentation as well as for optical flow and disparity estimation.

翻訳日:2023-02-07 19:28:23 公開日:2023-02-04
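
The CosPGD abstract above says only that the attack uses the cosine similarity between predictions and ground truth. The sketch below shows one way such a per-pixel similarity could be computed and used to scale a per-pixel loss. The scaling rule and the function name `similarity_weighted_loss` are assumptions made for illustration and are not taken from the paper.

```python
import math

def softmax(logits):
    """Numerically stable softmax over one pixel's class logits."""
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def similarity_weighted_loss(pixel_logits, pixel_labels):
    """Mean per-pixel cross-entropy, each term scaled by the cosine similarity
    between the predicted distribution and the one-hot target (assumed rule)."""
    num_classes = len(pixel_logits[0])
    total = 0.0
    for logits, label in zip(pixel_logits, pixel_labels):
        probs = softmax(logits)
        one_hot = [1.0 if c == label else 0.0 for c in range(num_classes)]
        similarity = cosine_similarity(probs, one_hot)
        cross_entropy = -math.log(probs[label])
        total += similarity * cross_entropy
    return total / len(pixel_labels)

# Illustrative input: one pixel, two classes, uninformative logits
loss = similarity_weighted_loss([[0.0, 0.0]], [0])
```

Because cosine similarity is defined for any pair of vectors, the same weighting idea carries over to regression-style outputs such as flow vectors, which matches the motivation stated in the abstract.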

# 環境不均質性を考慮した線形関数近似による連立時間差分学習

Federated Temporal Difference Learning with Linear Function Approximation under Environmental Heterogeneity ( http://arxiv.org/abs/2302.02212v1 )

ライセンス: Link先を確認

Han Wang, Aritra Mitra, Hamed Hassani, George J. Pappas, James Anderson

(参考訳) 政策評価問題を考慮して,環境不均質性下での連帯強化学習の研究を開始する。我々のセットアップは、同じ状態とアクション空間を共有するが、報酬関数と状態遷移カーネルが異なる環境と相互作用する$N$エージェントを含んでいる。エージェントが中央サーバーを介して通信できると仮定すると、情報交換は共通のポリシーを評価するプロセスを早めるだろうか? そこで我々は,マルコフ的サンプリング,エージェントの環境の不均一性,通信の節約のための複数の局所的更新を考慮しつつ,線形関数近似を用いたフェデレーション時間差学習アルゴリズム(TD)の総合的有限時間解析を行った。私たちの分析はいくつかの新しい材料に依存しています i) エージェントの基本マルコフ決定過程(MDPs)における不均一性の関数としてのTD固定点上の摂動境界の導出 (II)フェデレートされたTDアルゴリズムの力学を密に近似する仮想MDPを導入し、 (iii) 仮想MDPを用いて、フェデレーション最適化に明示的な接続を行う。これらの部品を組み立てることで、低均一性状態において、モデル推定の交換がエージェント数の線形収束速度向上につながることを厳密に証明する。

We initiate the study of federated reinforcement learning under environmental heterogeneity by considering a policy evaluation problem. Our setup involves $N$ agents interacting with environments that share the same state and action space but differ in their reward functions and state transition kernels. Assuming agents can communicate via a central server, we ask: Does exchanging information expedite the process of evaluating a common policy? To answer this question, we provide the first comprehensive finite-time analysis of a federated temporal difference (TD) learning algorithm with linear function approximation, while accounting for Markovian sampling, heterogeneity in the agents' environments, and multiple local updates to save communication. Our analysis crucially relies on several novel ingredients: (i) deriving perturbation bounds on TD fixed points as a function of the heterogeneity in the agents' underlying Markov decision processes (MDPs); (ii) introducing a virtual MDP to closely approximate the dynamics of the federated TD algorithm; and (iii) using the virtual MDP to make explicit connections to federated optimization. Putting these pieces together, we rigorously prove that in a low-heterogeneity regime, exchanging model estimates leads to linear convergence speedups in the number of agents.

翻訳日:2023-02-07 19:28:00 公開日:2023-02-04
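
The setting in the abstract above (several agents, local TD updates, a central server that averages) can be sketched as follows. This is a toy illustration under stated assumptions: two-state environments, one-hot (tabular) features as a special case of linear function approximation, a constant step size, and plain parameter averaging at the server. The helper names `local_td0`, `federated_td`, and `make_env` are hypothetical, and the code is not the algorithm analyzed in the paper.

```python
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def local_td0(theta, env, state, steps, gamma, alpha, rng):
    """Run `steps` TD(0) updates on one agent's own environment."""
    theta = list(theta)
    features, rewards, transitions = env["features"], env["rewards"], env["transitions"]
    for _ in range(steps):
        # Markovian sampling: the next state depends on the current state
        next_state = rng.choices(range(len(rewards)), weights=transitions[state])[0]
        td_error = (rewards[state]
                    + gamma * dot(theta, features[next_state])
                    - dot(theta, features[state]))
        theta = [t + alpha * td_error * f for t, f in zip(theta, features[state])]
        state = next_state
    return theta, state

def federated_td(envs, rounds=200, local_steps=10, gamma=0.5, alpha=0.05, seed=0):
    """Alternate local TD(0) updates with server-side averaging."""
    rng = random.Random(seed)
    theta = [0.0] * len(envs[0]["features"][0])
    states = [0] * len(envs)
    for _ in range(rounds):
        local_thetas = []
        for i, env in enumerate(envs):
            local_theta, states[i] = local_td0(
                theta, env, states[i], local_steps, gamma, alpha, rng)
            local_thetas.append(local_theta)
        # Server step: average the agents' parameter vectors
        theta = [sum(column) / len(envs) for column in zip(*local_thetas)]
    return theta

def make_env(reward_state0, p_stay):
    """Two-state environment under a fixed policy (illustrative values only)."""
    return {"features": [[1.0, 0.0], [0.0, 1.0]],
            "rewards": [reward_state0, 0.0],
            "transitions": [[p_stay, 1.0 - p_stay], [0.5, 0.5]]}

# Two agents whose rewards and transition kernels differ slightly
envs = [make_env(0.9, 0.45), make_env(1.1, 0.55)]
theta = federated_td(envs)
```

With mild heterogeneity as above, the averaged estimate should settle near the value function of an "average" environment, which is the intuition behind the virtual MDP mentioned in the abstract.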

# NeuRI: 帰納的ルール推論によるDNN生成の多様化

NeuRI: Diversifying DNN Generation via Inductive Rule Inference ( http://arxiv.org/abs/2302.02261v1 )

ライセンス: Link先を確認

Jiawei Liu, Jinjun Peng, Yuyao Wang, Lingming Zhang

(参考訳) ディープラーニング(DL)は、意思決定を改善し、プロセスを自動化するために様々な業界で広く使われています。 DLシステムの正確性は、DLアプリケーションの信頼性に不可欠である。このように、最近の研究の波は、ファジィDLシステムのためのテストケース(DNNモデルとその入力)の自動合成の研究である。しかし、既存のモデルジェネレータは演算子制約を広くモデル化する能力がないため、限られた数の演算子のみを仮定する。この課題に対処するために,数百種類の演算子からなる有効かつ多様なDLモデルを生成するための,完全に自動化されたアプローチであるNeuRIを提案する。 NeuRIは3段階のプロセスを採用しています。 i) 各種情報源から有効かつ無効なAPIトレースを収集すること。 (ii)有効なモデルを構築するための制約を推測するために、トレースに帰納的プログラム合成を適用すること。 (iii)シンボリック演算子とコンクリート演算子を共用してハイブリッドモデルを生成すること。我々の評価によると、NeuRIはTensorFlowとPyTorchのブランチカバレッジを最先端よりも51%、15%改善している。 4ヶ月以内に、NeuRIはPyTorchとTensorFlowの87の新しいバグを発見し、64がすでに修正または確認されており、PyTorchがラベル付けした8つの優先度の高いバグが、この期間のすべての優先度の高いバグの10%を構成している。さらに、オープンソース開発者は、当社が報告したエラー誘発モデルを“高品質”と“実践上の一般的な”とみなしています。

Deep Learning (DL) is prevalently used in various industries to improve decision-making and automate processes, driven by the ever-evolving DL libraries and compilers. The correctness of DL systems is crucial for trust in DL applications. As such, the recent wave of research has been studying the automated synthesis of test cases (i.e., DNN models and their inputs) for fuzzing DL systems. However, existing model generators only subsume a limited number of operators, because they lack the ability to pervasively model operator constraints. To address this challenge, we propose NeuRI, a fully automated approach for generating valid and diverse DL models composed of hundreds of types of operators. NeuRI adopts a three-step process: (i) collecting valid and invalid API traces from various sources; (ii) applying inductive program synthesis over the traces to infer the constraints for constructing valid models; and (iii) performing hybrid model generation by incorporating both symbolic and concrete operators concolically. Our evaluation shows that NeuRI improves branch coverage of TensorFlow and PyTorch by 51% and 15% over the state-of-the-art. Within four months, NeuRI finds 87 new bugs for PyTorch and TensorFlow, with 64 already fixed or confirmed, and 8 high-priority bugs labeled by PyTorch, constituting 10% of all high-priority bugs of the period. Additionally, open-source developers regard error-inducing models reported by us as "high-quality" and "common in practice".

翻訳日:2023-02-07 19:21:17 公開日:2023-02-04

# CLiNet:2次元および3次元における道路ネットワーク中心線の共同検出

CLiNet: Joint Detection of Road Network Centerlines in 2D and 3D ( http://arxiv.org/abs/2302.02259v1 )

ライセンス: Link先を確認

David Paz, Srinidhi Kalgundi Srinivas, Yunchao Yao, and Henrik I. Christensen

(参考訳) 本研究は,2次元と3次元で共同で特徴をローカライズすることで,画像データに基づく中心線の共同検出のための新しいアプローチを提案する。視覚手がかりの検出に焦点を当てた既存の研究とは対照的に,都市運転タスクに直結する特徴抽出手法について検討する。 AV Breadcrumbsと呼ばれる大規模都市走行データセットをベクトル地図表現と射影幾何学を利用して自動的にラベル付けし,900,000以上の画像に注釈を付ける。本研究は,様々な都市走行シナリオにおける動的シーンモデリングの可能性を示す。本モデルではF1スコアが0.684、平均正規化深度誤差が2.083である。コードとデータアノテーションが公開されている。

This work introduces a new approach for joint detection of centerlines based on image data by localizing the features jointly in 2D and 3D. In contrast to existing work that focuses on detection of visual cues, we explore feature extraction methods that are directly amenable to the urban driving task. To develop and evaluate our approach, a large urban driving dataset dubbed AV Breadcrumbs is automatically labeled by leveraging vector map representations and projective geometry to annotate over 900,000 images. Our results demonstrate potential for dynamic scene modeling across various urban driving scenarios. Our model achieves an F1 score of 0.684 and an average normalized depth error of 2.083. The code and data annotations are publicly available.

翻訳日:2023-02-07 19:20:53 公開日:2023-02-04

# 同時音楽生成と分離のためのマルチソース拡散モデル

Multi-Source Diffusion Models for Simultaneous Music Generation and Separation ( http://arxiv.org/abs/2302.02257v1 )

ライセンス: Link先を確認

Giorgio Mariani, Irene Tallini, Emilian Postolache, Michele Mancusi, Luca Cosmo, Emanuele Rodolà

(参考訳) 本研究では、文脈を共有するソースの結合確率密度のスコアを学習することにより、音楽合成と音源分離の両方が可能な拡散ベース生成モデルを定義する。古典的総推論タスク(例えば、混合を生成し、ソースを分離する)と並行して、ソースインプテーションの部分的推論タスクを紹介し、実験を行い、他のソースが与えられたもとでソースのサブセットを生成する(例えば、ドラムとうまく連携するピアノトラックを弾く)。さらに,分離タスクに対する新たな推論手法を提案する。我々は、音源分離のための標準データセットであるSlakh2100でモデルをトレーニングし、生成環境における質的結果を提供し、分離設定における競争的定量的結果を示す。本手法は,生成と分離の両方を処理可能な単一モデルの最初の例である。

In this work, we define a diffusion-based generative model capable of both music synthesis and source separation by learning the score of the joint probability density of sources sharing a context. Alongside the classic total inference tasks (i.e. generating a mixture, separating the sources), we also introduce and experiment on the partial inference task of source imputation, where we generate a subset of the sources given the others (e.g., play a piano track that goes well with the drums). Additionally, we introduce a novel inference method for the separation task. We train our model on Slakh2100, a standard dataset for musical source separation, provide qualitative results in the generation settings, and showcase competitive quantitative results in the separation setting. Our method is the first example of a single model that can handle both generation and separation tasks, thus representing a step toward general audio models.

翻訳日:2023-02-07 19:20:43 公開日:2023-02-04

# 学習型レンズレスイメージングによる非知覚的識別

Human-Imperceptible Identification with Learnable Lensless Imaging ( http://arxiv.org/abs/2302.02255v1 )

ライセンス: Link先を確認

Thuong Nguyen Canh, Trung Thanh Ngo, Hajime Nagahara

(参考訳) レンズレスイメージングは、人間が被写体を認識できないが、機械が情報を推測するのに十分な情報を含む、大きくぼやけた画像を撮影することで、視覚プライバシを保護する。残念ながら、視覚的プライバシー保護は、認識精度の低下と、その逆が伴う。認識精度を維持しつつ、視覚プライバシを保護する学習可能なレンズレスイメージングフレームワークを提案する。得られた画像が人間に知覚できないようにするために,全変動,可逆性,および制限等尺性に基づくいくつかの損失関数を設計した。主観的評価に基づいて,プライバシー保護と曖昧さが個人識別に及ぼす影響を定量的に検討した。さらに,光リソグラフィー方式のマスクを用いたレンズレス画像のハードウェア実現によるシミュレーションの検証を行った。

Lensless imaging protects visual privacy by capturing heavily blurred images in which humans cannot recognize the subject but which contain enough information for machines to perform inference. Unfortunately, protecting visual privacy comes with a reduction in recognition accuracy and vice versa. We propose a learnable lensless imaging framework that protects visual privacy while maintaining recognition accuracy. To make captured images imperceptible to humans, we designed several loss functions based on total variation, invertibility, and the restricted isometry property. We studied the effect of privacy protection with blurriness on the identification of personal identity via a quantitative method based on a subjective evaluation. Moreover, we validate our simulation by implementing a hardware realization of lensless imaging with photo-lithographically printed masks.

翻訳日:2023-02-07 19:20:29 公開日:2023-02-04

# 密度特徴を持つ低域MDPにおける強化学習

Reinforcement Learning in Low-Rank MDPs with Density Features ( http://arxiv.org/abs/2302.02252v1 )

ライセンス: Link先を確認

Audrey Huang, Jinglin Chen, Nan Jiang

(参考訳) 低ランクな遷移を持つMDP -- すなわち、遷移行列は、左右の2つの行列の積に分解できる -- は、抽出可能な学習を可能にする非常に代表的な構造である。左行列は、値に基づく学習のための表現関数近似を可能にし、広く研究されている。そこで本研究では,密度特性を用いたサンプル効率学習,すなわち,状態占有分布の強力なモデルを生成する正しい行列について検討する。この設定は、教師なし学習をRLで活用するだけでなく、凸RLのプラグインソリューションを可能にする。オフライン環境では,非探索的なデータを処理可能な占有者のオフポリシー推定アルゴリズムを提案する。これをサブルーチンとして、探索的データ分布をレベルバイレベルに構築するオンラインアルゴリズムをさらに考案する。中心的な技術的課題として、占有率推定の付加誤差は、データカバレッジの乗法的定義とは相容れない。到達性のような強い仮定がなければ、この非互換性は、新しい技術ツールによって克服された指数的エラーの爆発を引き起こす。また, 密度特徴が不明であり, 指数関数的に大きな候補集合から学習する必要がある場合, 表現学習環境にも容易に拡張できる。

MDPs with low-rank transitions -- that is, MDPs whose transition matrix can be factored into the product of two matrices, left and right -- form a highly representative structure that enables tractable learning. The left matrix enables expressive function approximation for value-based learning and has been studied extensively. In this work, we instead investigate sample-efficient learning with density features, i.e., the right matrix, which induce powerful models for state-occupancy distributions. This setting not only sheds light on leveraging unsupervised learning in RL, but also enables plug-in solutions for convex RL. In the offline setting, we propose an algorithm for off-policy estimation of occupancies that can handle non-exploratory data. Using this as a subroutine, we further devise an online algorithm that constructs exploratory data distributions in a level-by-level manner. As a central technical challenge, the additive error of occupancy estimation is incompatible with the multiplicative definition of data coverage. In the absence of strong assumptions like reachability, this incompatibility easily leads to exponential error blow-up, which we overcome via novel technical tools. Our results also readily extend to the representation learning setting, when the density features are unknown and must be learned from an exponentially large candidate set.

翻訳日:2023-02-07 19:20:16 公開日:2023-02-04
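
The abstract above notes that the right factor of a low-rank transition matrix (the density features) yields a model for state-occupancy distributions. The sketch below makes that concrete for a tiny tabular case: if P(s' | s, a) = sum_k phi(s, a)[k] * mu[k](s'), then the next-state occupancy is a combination of the rows mu[k]. All numbers and names are illustrative, and the sketch additionally assumes that each phi(s, a) lies on the probability simplex and each mu[k] is a distribution, which is a simplification rather than a requirement of low-rank MDPs.

```python
def mixture_weights(d_sa, phi, rank):
    """w[k] = sum over (s, a) of d(s, a) * phi(s, a)[k]."""
    return [sum(p * phi[sa][k] for sa, p in d_sa.items()) for k in range(rank)]

def next_state_occupancy(d_sa, phi, mu):
    """Next-state distribution written as a combination of the density features."""
    w = mixture_weights(d_sa, phi, len(mu))
    num_states = len(mu[0])
    return [sum(w[k] * mu[k][s2] for k in range(len(mu))) for s2 in range(num_states)]

def transition_row(sa, phi, mu):
    """P(. | s, a) = sum_k phi(s, a)[k] * mu[k](.), the low-rank factorization."""
    return [sum(phi[sa][k] * mu[k][s2] for k in range(len(mu)))
            for s2 in range(len(mu[0]))]

# Rank-2 example with 3 states and 2 actions (illustrative values)
mu = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]]
phi = {(0, 0): [1.0, 0.0], (0, 1): [0.5, 0.5], (1, 0): [0.2, 0.8],
       (1, 1): [0.0, 1.0], (2, 0): [0.9, 0.1], (2, 1): [0.3, 0.7]}
d_sa = {sa: 1.0 / len(phi) for sa in phi}  # uniform state-action occupancy

occupancy = next_state_occupancy(d_sa, phi, mu)
# Same quantity computed directly from the full transition rows
direct = [sum(d_sa[sa] * transition_row(sa, phi, mu)[s2] for sa in phi)
          for s2 in range(3)]
```

The point of the comparison is that `occupancy` needs only `rank` numbers (the weights `w`) on top of the density features, regardless of how many state-action pairs there are.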

# ジャマー耐性周波数とパワーアロケーションのための深層強化学習の一般化

Generalization of Deep Reinforcement Learning for Jammer-Resilient Frequency and Power Allocation ( http://arxiv.org/abs/2302.02250v1 )

ライセンス: Link先を確認

Swatantra Kafle, Jithin Jagannath, Zackary Kane, Noor Biswas, Prem Sagar Vasanth Kumar, Anu Jagannath

(参考訳) 我々は,深層強化学習モデルの一般化能力を強調しつつ,結合周波数と電力配分の問題に取り組む。既存の手法の多くは、事前決定された無線ネットワークシナリオの強化学習ベースのワイヤレス問題を解決する。訓練されたエージェントのパフォーマンスはネットワークに非常に特有であり、異なるネットワーク運用シナリオ(例えば、サイズ、周辺、移動性など)で使用されると劣化する傾向がある。本稿では,分散マルチエージェント環境におけるデプロイモデルの推論において,より高度な一般化機能を実現するためのトレーニング強化手法を提案する。これらの結果から,従来は見つからなかった異なるサイズとアーキテクチャの無線ネットワーク上で,提案手法のトレーニングと推論性能が向上したことを示す。さらに重要なことは、実用的な影響を証明するために、組込みソフトウェア定義無線にエンドツーエンドのソリューションを実装し、オーバー・ザ・エア評価を用いて検証したことである。

We tackle the problem of joint frequency and power allocation while emphasizing the generalization capability of a deep reinforcement learning model. Most of the existing methods solve reinforcement learning-based wireless problems for a specific pre-determined wireless network scenario. The performance of a trained agent tends to be very specific to the network and deteriorates when used in a different network operating scenario (e.g., different in size, neighborhood, and mobility, among others). We demonstrate our approach to enhance training to enable a higher generalization capability during inference of the deployed model in a distributed multi-agent setting in a hostile jamming environment. We show the improved training and inference performance of the proposed methods when tested on previously unseen simulated wireless networks of different sizes and architectures. More importantly, to prove practical impact, the end-to-end solution was implemented on the embedded software-defined radio and validated using over-the-air evaluation.

翻訳日:2023-02-07 19:19:57 公開日:2023-02-04

# 視覚コレクション拡張のための自己教師付きマルチビューディスタングル

Self-supervised Multi-view Disentanglement for Expansion of Visual Collections ( http://arxiv.org/abs/2302.02249v1 )

ライセンス: Link先を確認

Nihal Jain, Praneetha Vaddamanu, Paridhi Maheshwari, Vishwa Vinay, Kuldeep Kulkarni

(参考訳) 画像検索エンジンは、クエリ画像に関連する画像の検索を可能にする。本研究では,類似画像に対するクエリが画像の集合から導出されるような設定について検討する。視覚的検索では、類似度の測定は複数の軸、あるいはスタイルや色などのビューに沿って行われる。我々は、特徴抽出器のセットへのアクセスを想定し、それぞれが特定のビューの表現を計算する。本研究の目的は,複数視点から計算した表現上の類似性を効果的に結合した検索アルゴリズムを設計することである。そこで本研究では,視間重なりを最小限に抑えるために,画像の絡み合った視点特異的表現を抽出する自己教師あり学習法を提案する。これによって、ビュー上の分散としてコレクションの意図を計算することができることを示す。クエリコレクションの意図にマッチする候補拡張画像を優先順位付けすることにより,効果的な検索を行う方法を示す。最後に,本稿で提示した手法を用いて,複数のコレクションを合成して検索することにより,画像検索のための新たな検索機構を提案する。

Image search engines enable the retrieval of images relevant to a query image. In this work, we consider the setting where a query for similar images is derived from a collection of images. For visual search, the similarity measurements may be made along multiple axes, or views, such as style and color. We assume access to a set of feature extractors, each of which computes representations for a specific view. Our objective is to design a retrieval algorithm that effectively combines similarities computed over representations from multiple views. To this end, we propose a self-supervised learning method for extracting disentangled view-specific representations for images such that the inter-view overlap is minimized. We show how this allows us to compute the intent of a collection as a distribution over views. We show how effective retrieval can be performed by prioritizing candidate expansion images that match the intent of a query collection. Finally, we present a new querying mechanism for image search enabled by composing multiple collections and perform retrieval under this setting using the techniques presented in this paper.

翻訳日:2023-02-07 19:19:43 公開日:2023-02-04

# 二元分類におけるラベル保護のためのganベース連合学習

GAN-based federated learning for label protection in binary classification ( http://arxiv.org/abs/2302.02245v1 )

ライセンス: Link先を確認

Yujin Han, Leying Guan

(参考訳) 新たな技術として、垂直連合学習は異なるデータソースと連携して、データ交換なしで機械学習モデルを共同訓練する。しかし、フェデレーション学習は複雑な暗号アルゴリズムとセキュアな計算プロトコルによるモデリングにおいて計算コストが高く非効率である。分割学習はこれらの課題を回避する代替ソリューションを提供する。しかし、バニラ分割学習は依然としてプライバシー漏洩に悩まされている。本稿では,GAFM(Generative Adversarial Federated Model)を提案する。GAN(Generative Adversarial Network)とバニラ分割学習フレームワークを統合し,バイナリ分類タスクにおける勾配からのラベル漏洩を防止する。この提案をMarvell、Max Norm、SplitNNといった既存モデルと比較し、GAFMは分類精度とラベルのプライバシー保護のトレードオフに関して大きな改善を示している。また,GAFMがベースラインよりも改善できる理由をヒューリスティックに正当化し,SplitNNと比較して勾配摂動によるラベル保護が可能であることを示す。

As an emerging technique, vertical federated learning collaborates with different data sources to jointly train a machine learning model without data exchange. However, federated learning is computationally expensive and inefficient in modeling due to complex encryption algorithms and secure computation protocols. Split learning offers an alternative solution to circumvent these challenges. Despite this, vanilla split learning still suffers from privacy leakage. Here, we propose the Generative Adversarial Federated Model (GAFM), which integrates the vanilla split learning framework with the Generative Adversarial Network (GAN) for protection against label leakage from gradients in binary classification tasks. We compare our proposal to existing models, including Marvell, Max Norm, and SplitNN, on three publicly available datasets, where GAFM shows significant improvement regarding the trade-off between classification accuracy and label privacy protection. We also provide heuristic justification for why GAFM can improve over baselines and demonstrate that GAFM offers label protection through gradient perturbation compared to SplitNN.

翻訳日:2023-02-07 19:19:30 公開日:2023-02-04

# 等角化半教師付きランダム森林の分類と異常検出

Conformalized semi-supervised random forest for classification and abnormality detection ( http://arxiv.org/abs/2302.02237v1 )

ライセンス: Link先を確認

Yujin Han, Mingwenchan Xu, Leying Guan

(参考訳) 従来の分類器は、トレーニングとテストサンプルが同じ分布から生成されるという前提の下でラベルを推論する。この仮定は、医療診断やネットワークアタック検出などの安全クリティカルな応用において問題となる可能性がある。本稿では,トレーニングデータとテストデータが異なる分布を持つ場合のマルチクラス分類問題について考察する。本研究では,整合型半教師付きランダムフォレスト(CSForest)を提案する。これは,設定値の予測を$C(x)$で構成し,正しいクラスラベルを所望の確率で含むとともに,効率よく外れ値を検出する。本提案手法は,提案手法の強みを示すために,合成例と実データアプリケーションの両方において,他の最先端手法と比較する。

Traditional classifiers infer labels under the premise that the training and test samples are generated from the same distribution. This assumption can be problematic for safety-critical applications such as medical diagnosis and network attack detection. In this paper, we consider the multi-class classification problem when the training data and the test data may have different distributions. We propose conformalized semi-supervised random forest (CSForest), which constructs set-valued predictions $C(x)$ to include the correct class label with desired probability while detecting outliers efficiently. We compare the proposed method to other state-of-the-art methods in both a synthetic example and a real data application to demonstrate the strength of our proposal.

翻訳日:2023-02-07 19:19:11 公開日:2023-02-04
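
The set-valued prediction $C(x)$ mentioned in the abstract above can be illustrated with standard split conformal prediction. This is a generic sketch and not CSForest itself: it assumes class probabilities from any already-trained classifier, uses one minus the true-class probability as the nonconformity score, and the names `conformal_threshold` and `prediction_set` are hypothetical. Treating an empty set as a flag for a possible outlier is one common convention, assumed here for illustration.

```python
import math

def conformal_threshold(cal_probs, cal_labels, alpha):
    """Score threshold from held-out calibration data for 1 - alpha coverage."""
    # Nonconformity score: one minus the probability assigned to the true label
    scores = sorted(1.0 - probs[label] for probs, label in zip(cal_probs, cal_labels))
    n = len(scores)
    rank = math.ceil((n + 1) * (1.0 - alpha))
    if rank > n:
        # Too few calibration points: every label must be kept
        return float("inf")
    return scores[rank - 1]

def prediction_set(probs, threshold):
    """All labels whose nonconformity score does not exceed the threshold."""
    return [label for label, p in enumerate(probs) if 1.0 - p <= threshold]

# Illustrative calibration data: probability of the true class (label 0)
true_class_probs = (0.9, 0.8, 0.7, 0.6, 0.5, 0.95, 0.85, 0.75, 0.65)
cal_probs = [[p, 1.0 - p] for p in true_class_probs]
cal_labels = [0] * len(cal_probs)

threshold = conformal_threshold(cal_probs, cal_labels, alpha=0.25)
confident = prediction_set([0.7, 0.25, 0.05], threshold)  # one label clearly fits
ambiguous = prediction_set([0.34, 0.33, 0.33], threshold)  # no label fits well
```

The coverage guarantee of this generic recipe relies on calibration and test points being exchangeable; the abstract's setting, where training and test distributions may differ, is exactly where a method such as CSForest goes beyond this sketch.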

# 二元系ボース-アインシュタイン凝縮体のポラロン

Polarons in Binary Bose-Einstein Condensates ( http://arxiv.org/abs/2206.13738v3 )

ライセンス: Link先を確認

Ning Liu and Z. C. Tu

(参考訳) 不純物とボース・アインシュタイン凝縮の集団励起はボース・ポーラロンの出現に繋がる。本稿では,Lee-Low-Pines変分アプローチの枠組みにおいて,二成分ボース・アインシュタイン凝縮物に浸漬した単一不純物の性質について検討する。 2種類の効果的な不純物-フォノン相互作用を持つ有効Fröhlich Hamiltonianを導出する。低フォノンモードと結合した不純物の挙動は相分離に対する安定性条件によって制約される。等質量および不等質量の2成分浴におけるボースポーラロンのエネルギー,有効質量,およびフォノン数を明示的に解析した。例えば、ポラロンエネルギーの1つの分枝は、下分枝が減少している間に、種間散乱長の単調に増加する関数である。特に, 低いフォノンモードと結合した不純物の分岐は, 相分離近傍で劇的な変化を示す。不等質量ボソンの場合、2つの枝が種間質量の置換によってつながっていることが分かる。以上の結果は,多成分のボース浴におけるポーラロンの挙動を基礎的に理解する。

Impurities coupled with collective excitations of Bose-Einstein condensates lead to the emergence of Bose polarons. In this paper, we investigate the properties of a single impurity immersed in binary Bose-Einstein condensates in the framework of the Lee-Low-Pines variational approach. We derive an effective Fröhlich Hamiltonian with two kinds of effective impurity-phonon interactions. The behavior of the impurity coupled with the lower phonon mode is constrained by the stability condition against phase separation. We show explicit analytical results of the energy, effective mass, and phonon number for the Bose polaron in interacting binary baths with equal mass and unequal mass of species. For the equal-mass boson bath, we find the opposite behaviors of two branches in terms of the scattering length between two species, e.g., one branch of polaron energy is a monotonically increasing function of the interspecific scattering length while the lower branch is decreasing. Especially, the branch of impurities coupled with the lower phonon modes exhibits a dramatic change in the vicinity of phase separation. In the case of unequal-mass bosons, we find two branches are connected by the permutation of interspecific mass. The above results provide a fundamental understanding of the behaviors of polarons in Bose baths with multiple components.

翻訳日:2023-02-07 12:55:03 公開日:2023-02-04

# 画像検索における不確実性定量化のためのベイズ計量学習

Bayesian Metric Learning for Uncertainty Quantification in Image Retrieval ( http://arxiv.org/abs/2302.01332v2 )

ライセンス: Link先を確認

Frederik Warburg, Marco Miani, Silas Brack, Soren Hauberg

(参考訳) 計量学習のための最初のベイズエンコーダを提案する。従来の研究のようにニューラル・アモーティゼーションに頼るのではなく、Laplace Approximationでネットワーク重みの分布を学習する。まず、対照的な損失が有効なログポストであることを示す。次に、正の確定ヘッシアンを保証する3つの方法を提案する。最後に,一般化ガウスニュートン近似の新たな分解法を提案する。実験の結果,Laplacian Metric Learner (LAM) はよく較正された不確かさを推定し,分布外のサンプルを確実に検出し,最先端の予測性能が得られることがわかった。

We propose the first Bayesian encoder for metric learning. Rather than relying on neural amortization as done in prior works, we learn a distribution over the network weights with the Laplace Approximation. We actualize this by first proving that the contrastive loss is a valid log-posterior. We then propose three methods that ensure a positive definite Hessian. Lastly, we present a novel decomposition of the Generalized Gauss-Newton approximation. Empirically, we show that our Laplacian Metric Learner (LAM) estimates well-calibrated uncertainties, reliably detects out-of-distribution examples, and yields state-of-the-art predictive performance.

翻訳日:2023-02-07 12:48:08 公開日:2023-02-04

PDF登録状況（公開日: 20230204）