Fugu-MT: arXiv Paper Translations

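This site provides Japanese translations of arXiv papers that are 30 pages or fewer and released under a Creative Commons license (CC 0, CC BY, CC BY-SA). For papers whose body is not under a CC license, or that are too long, only the metadata is translated. (arXiv metadata is CC 0.) The translations are licensed under CC BY-SA 4.0. Translation is performed with Fugu-Machine Translator.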

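The operator of this site does not guarantee the quality of this site (including all information and translations) and accepts no responsibility for any consequences arising from the use of this site (including all information and translations).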

Papers with a publication date of 20230425.

Title	Authors	Abstract	Publication date / translation date
# ロボット群における人間のフィードバックの進化と創発的行動の発見 Leveraging Human Feedback to Evolve and Discover Novel Emergent Behaviors in Robot Swarms ( http://arxiv.org/abs/2305.16148v1 ) ライセンス: Link先を確認	Connor Mattson, Daniel S. Brown	(参考訳) ロボット群は、しばしば観察が興味深い創発的な行動を示すが、エージェントの能力のセットの下でどのような群れの行動が現れるかを予測することは困難である。我々は、人間の入力を効果的に活用し、特定のマルチエージェントシステムから出現しうる集団行動の分類を、人間が事前に興味や可能な行動を知ることなく、自動的に発見することを目指している。提案手法は,自己教師付き学習とHuman-in-the-loopクエリを用いて,Swarm集団行動に対する類似性空間を学習することにより,ユーザの好みに適応する。学習した類似度指標と新規検索とクラスタリングを組み合わせることで,Swarm動作の空間を探索し,分類する。また,創発的行動につながる可能性のあるロボットコントローラを優先することで,創発的検索の効率を向上させる汎用ヒューリスティックも提案する。提案手法は,2つのロボット能力モデルを用いてシミュレーションを行い,先行研究よりもより豊かな創発的行動のセットを一貫して発見することを示す。コード、ビデオ、データセットはhttps://sites.google.com/view/evolving-novel-swarmsで入手できる。 Robot swarms often exhibit emergent behaviors that are fascinating to observe; however, it is often difficult to predict what swarm behaviors can emerge under a given set of agent capabilities. We seek to efficiently leverage human input to automatically discover a taxonomy of collective behaviors that can emerge from a particular multi-agent system, without requiring the human to know beforehand what behaviors are interesting or even possible. Our proposed approach adapts to user preferences by learning a similarity space over swarm collective behaviors using self-supervised learning and human-in-the-loop queries. We combine our learned similarity metric with novelty search and clustering to explore and categorize the space of possible swarm behaviors. We also propose several general-purpose heuristics that improve the efficiency of our novelty search by prioritizing robot controllers that are likely to lead to interesting emergent behaviors. We test our approach in simulation on two robot capability models and show that our methods consistently discover a richer set of emergent behaviors than prior work. Code, videos, and datasets are available at https://sites.google.com/view/evolving-novel-swarms.	翻訳日:2023-05-28 04:30:59 公開日:2023-04-25
# NUANCE:ネットワーク通信環境における近距離超音波攻撃 NUANCE: Near Ultrasound Attack On Networked Communication Environments ( http://arxiv.org/abs/2305.10358v1 ) ライセンス: Link先を確認	Forrest McKee and David Noever	(参考訳) 本研究では,近距離超音波トロイの木馬を用いて,amazon alexa音声サービスにおける一次不聴音攻撃ベクトルを調査し,攻撃面の特徴と不聴音音声コマンド発行の実際的意義について検討した。この研究は、各攻撃ベクトルを、エンタープライズ、モバイル、産業制御システム(ICS)フレームワークをカバーするMITRE ATT&CK行列から戦術またはテクニックにマッピングする。この実験では50台のウルトラソニックオーディオを生成して調査し、攻撃の有効性を評価し、未処理のコマンドが100%成功し、処理された音声が全体の成功率58%に達した。この体系的なアプローチは、事前に調整されていない攻撃面を刺激し、各ATT&CK識別器とテストされた防御手法を組み合わせながら、包括的検知と攻撃設計を確保する。本研究の主目的は、SUSBAM(Single Upper Sideband Amplitude Modulation)を用いて、聴覚音源からほぼ音声を生成することであり、音声コマンドを人間の聴覚以外の周波数域に変換することである。サイドバンドを小さくすることで、16-22kHzから6kHzの最小出力を達成できる。研究は、1つのデバイスが同時に複数のアクションやデバイスをトリガーする1対多の攻撃面を調査した。さらに、この研究は可逆性や復調性を示し、潜在的な警告手法と音声ステガノグラフィのような秘密メッセージを埋め込む可能性を示唆している。 This study investigates a primary inaudible attack vector on Amazon Alexa voice services using near ultrasound trojans and focuses on characterizing the attack surface and examining the practical implications of issuing inaudible voice commands. The research maps each attack vector to a tactic or technique from the MITRE ATT&CK matrix, covering enterprise, mobile, and Industrial Control System (ICS) frameworks. The experiment involved generating and surveying fifty near-ultrasonic audios to assess the attacks' effectiveness, with unprocessed commands having a 100% success rate and processed ones achieving a 58% overall success rate. This systematic approach stimulates previously unaddressed attack surfaces, ensuring comprehensive detection and attack design while pairing each ATT&CK Identifier with a tested defensive method, providing attack and defense tactics for prompt-response options. The main findings reveal that the attack method employs Single Upper Sideband Amplitude Modulation (SUSBAM) to generate near-ultrasonic audio from audible sources, transforming spoken commands into a frequency range beyond human-adult hearing. 
By eliminating the lower sideband, the design achieves a 6 kHz minimum from 16-22 kHz while remaining inaudible after transformation. The research investigates the one-to-many attack surface where a single device simultaneously triggers multiple actions or devices. Additionally, the study demonstrates the reversibility or demodulation of the inaudible signal, suggesting potential alerting methods and the possibility of embedding secret messages like audio steganography.	翻訳日:2023-05-21 10:34:40 公開日:2023-04-25
# 固有値解に対する逆二次湯川プラス逆二乗ポテンシャルによる位相効果 Topological Effects With Inverse Quadratic Yukawa Plus Inverse Square Potential on Eigenvalue Solutions ( http://arxiv.org/abs/2305.04823v1 ) ライセンス: Link先を確認	Faizuddin Ahmed	(参考訳) 本研究では,非相対論的シュレーディンガー波動方程式を,点状大域モノポール(PGM)の背景に相互作用ポテンシャルを持つ量子流束場の影響下で検討する。実際、逆二次湯川プラス逆2乗ポテンシャルを考え、遠心項におけるグリーン・アルドリッチ近似スキームを用いた動径方程式を導出する。パラメトリックなNikiforov-Uvarov法を用いて近似固有値解を決定し,解析する。その後、指数ポテンシャルの級数展開法を用いて同じポテンシャルを用いて動径波動方程式を導出し、解析的に解いた。エネルギー固有値は、平坦な空間結果と比較して点状の大域モノポールの位相的欠陥によってシフトすることを示す。加えて、エネルギー固有値はアハルノフ・ボーム効果の類似性を示す量子束場に依存することが分かる。 In this analysis, we study the non-relativistic Schrödinger wave equation under the influence of a quantum flux field with an interaction potential in the background of a point-like global monopole (PGM). In fact, we consider an inverse quadratic Yukawa plus inverse square potential and derive the radial equation employing the Greene-Aldrich approximation scheme in the centrifugal term. We determine the approximate eigenvalue solution using the parametric Nikiforov-Uvarov method and analyze the result. Afterwards, we derive the radial wave equation using the same potential employing a power series expansion method in the exponential potential and solve it analytically. We show that the energy eigenvalues are shifted by the topological defects of a point-like global monopole compared to the flat space result. In addition, we see that the energy eigenvalues depend on the quantum flux field, which shows an analogue of the Aharonov-Bohm effect.	翻訳日:2023-05-14 21:08:06 公開日:2023-04-25
# dartboardsによる量子コンピューティング Quantum Computing with dartboards ( http://arxiv.org/abs/2305.06153v1 ) ライセンス: Link先を確認	Ishaan Ganti, Srinivasan S. Iyengar	(参考訳) ダーツゲーム用に構築されたルールを用いて,量子コンピューティングを物理的に魅力的かつエレガントに表現する。ダーツボードは量子力学における状態空間を表すために使用され、ダーツを投げる行為は測定の概念や量子力学における波動関数の崩壊とよく似ていることが示されている。アナロジーは任意の次元のダーツボードを用いた任意の次元空間で構成され、そのような任意の空間に対して不確かさの ``visual'' の記述も与える。最後に、量子ビットと量子コンピューティングアルゴリズムの接続は、量子アルゴリズムとdart-throwコンペティションの類似性を構築する可能性を開く。 We present a physically appealing and elegant picture for quantum computing using rules constructed for a game of darts. A dartboard is used to represent the state space in quantum mechanics and the act of throwing the dart is shown to have close similarities to the concept of measurement, or collapse of the wavefunction in quantum mechanics. The analogy is constructed in arbitrary dimensional spaces, that is, using arbitrary dimensional dartboards, and for such arbitrary spaces this also provides us a ``visual'' description of uncertainty. Finally, connections to qubits and quantum computing algorithms are also made, opening the possibility to construct analogies between quantum algorithms and coupled dart-throw competitions.	翻訳日:2023-05-14 20:57:29 公開日:2023-04-25
# UAVのパーチのためのマルチマーカーを用いた視覚的目標位置推定 Vision-based Target Pose Estimation with Multiple Markers for the Perching of UAVs ( http://arxiv.org/abs/2304.14838v1 ) ライセンス: Link先を確認	Truong-Dong Do, Nguyen Xuan-Mung and Sung-Kyung Hong	(参考訳) 自律型ナノ航空機は、その効率性と操縦性のために、監視および監視活動でますます人気を集めている。目標地点に到達すると、ドローンは任務の間、活動し続ける必要はない。機体はそのような状況下でパーチしてモーターを停止させ、エネルギーを節約し、また、好ましくない飛行条件下で静止位置を維持することができる。パーチング目標推定フェーズでは,マーカを備えた視覚カメラの定常的かつ高精度化が大きな課題である。大きなマーカーを使用すると、遠くから素早く検出できますが、ドローンが近づくと、カメラの視界からすぐに消えてしまいます。本稿では,上記の問題に対処するために,複数のマーカーを用いた視覚的ターゲットポーズ推定手法を提案する。まず, 遠距離と近距離の両方で検出能力を向上させるため, 大きなマーカーの内側に小型のマーカーを配置したパーチングターゲットの設計を行った。第2に、単眼カメラを用いて検出されたマーカーから飛行車両の相対的なポーズを算出する。次にカルマンフィルタを適用し、特に予期せぬ理由により測定データが欠落している場合に、より安定して信頼性の高いポーズ推定を行う。最後に,複数マーカーからのポーズデータをマージするアルゴリズムを導入した。その後、ポーズは位置制御装置に送られ、ドローンとマーカーの中央を調整し、ターゲットのパーチに操る。実験の結果,本手法の有効性と実現可能性が示された。ドローンは直径25mmの丸みを帯びた磁石を取り付けた状態でマーカーの中央にパーチすることができる。 Autonomous Nano Aerial Vehicles have become increasingly popular in surveillance and monitoring operations due to their efficiency and maneuverability. Once a target location has been reached, drones do not have to remain active during the mission. It is possible for the vehicle to perch and stop its motors in such situations to conserve energy, as well as maintain a static position in unfavorable flying conditions. In the perching target estimation phase, the steadiness and accuracy of a visual camera with markers are a significant challenge. A large marker can be detected quickly from afar, but as the drone approaches, the marker quickly falls out of the camera's view. In this paper, a vision-based target pose estimation method using multiple markers is proposed to deal with the above-mentioned problems. First, a perching target with a small marker inside a larger one is designed to improve detection capability at wide and close ranges. Second, the relative poses of the flying vehicle are calculated from detected markers using a monocular camera.
Next, a Kalman filter is applied to provide a more stable and reliable pose estimation, especially when the measurement data is missing due to unexpected reasons. Finally, we introduce an algorithm for merging the pose data from multiple markers. The poses are then sent to the position controller to align the drone and the marker's center and steer it to perch on the target. The experimental results demonstrated the effectiveness and feasibility of the adopted approach. The drone can perch successfully onto the center of the markers with the attached 25 mm-diameter rounded magnet.	翻訳日:2023-05-07 16:14:34 公開日:2023-04-25
# 金融マーケティングのためのマルチタスク学習によるターゲット間の依存性のモデル化 Curriculum Modeling the Dependence among Targets with Multi-task Learning for Financial Marketing ( http://arxiv.org/abs/2305.01514v1 ) ライセンス: Link先を確認	Yunpeng Weng, Xing Tang, Liang Chen, Xiuqiang He	(参考訳) 様々な実世界のアプリケーションに対するマルチタスク学習は通常、論理的逐次依存を伴うタスクを伴う。例えば、オンラインマーケティングでは、impression → click → conversion のカスケード動作パターンは、通常マルチタスク方式で複数のタスクとしてモデル化される。これらの手法は、タスクシーケンスとともに正のフィードバックが疎になるにつれて、長いパスシーケンシャルなタスクに対するデータスパース性の問題を緩和する。しかし、下流タスクではエラーの蓄積と負の転送が深刻な問題となる。特に、トレーニングの初期段階では、以前のタスクのパラメータの最適化はまだ収束しておらず、ダウンストリームタスクに転送される情報は否定的である。本稿では,複数の逐次的依存タスク学習のための新しい事前情報マージ(PIM)モジュールを用いて,タスク間の論理的依存を明示的にモデル化する事前情報マージモデル(PIMM)を提案する。具体的には、PIMは、トレーニング中に下流タスクに転送するためのソフトサンプリング戦略を用いて、真のラベル情報または先行タスク予測をランダムに選択する。難易度の高いカリキュラムパラダイムに従って,サンプリング確率を動的に調整することで,下流タスクがトレーニングとともに効果的な情報を取得することを保証する。公開データセットと製品データセットのオフライン実験結果は、PIMMが最先端のベースラインを上回っていることを確認する。さらに,大規模なFinTechプラットフォームにPIMMをデプロイし,オンライン実験によりPIMMの有効性を実証した。 Multi-task learning for various real-world applications usually involves tasks with logical sequential dependence. For example, in online marketing, the cascade behavior pattern of impression → click → conversion is usually modeled as multiple tasks in a multi-task manner, where the sequential dependence between tasks is simply connected with an explicitly defined function or implicitly transferred information in current works. These methods alleviate the data sparsity problem for long-path sequential tasks as the positive feedback becomes sparser along with the task sequence. However, error accumulation and negative transfer become severe problems for downstream tasks. In particular, at the beginning of training, the parameters of earlier tasks have not yet converged, and thus the information transferred to downstream tasks is negative.
In this paper, we propose a prior information merged model (PIMM), which explicitly models the logical dependence among tasks with a novel prior information merged (PIM) module for multiple sequential dependence task learning in a curriculum manner. Specifically, the PIM randomly selects the true label information or the prior task prediction with a soft sampling strategy to transfer to the downstream task during training. Following an easy-to-difficult curriculum paradigm, we dynamically adjust the sampling probability to ensure that the downstream task receives effective information as training progresses. The offline experimental results on both public and product datasets verify that PIMM outperforms state-of-the-art baselines. Moreover, we deploy the PIMM in a large-scale FinTech platform, and the online experiments also demonstrate the effectiveness of PIMM.	翻訳日:2023-05-07 16:03:46 公開日:2023-04-25
# popsim:都市資源の公平配分のための個人レベル人口シミュレータ PopSim: An Individual-level Population Simulator for Equitable Allocation of City Resources ( http://arxiv.org/abs/2305.02204v1 ) ライセンス: Link先を確認	Khanh Duy Nguyen, Nima Shahbazi and Abolfazl Asudeh	(参考訳) 人種に基づく歴史的体系的な排除戦術は、特定の人口集団の人々が特定の都市部に集結することを強制した。このような分離の倫理的側面とは別に、これらの政策は都市内の公共交通、医療、教育などの都市資源の配分に影響を及ぼす。これらの問題に対処するための最初のステップは、公平なリソース割り当ての状態を評価する監査を行うことである。しかし、プライバシーや機密性の懸念から、人口統計情報を含む個人レベルのデータは公開できない。人口統計データを活用することで、人口統計情報を用いた半合成個人レベルの人口データを生成するシステムであるPopSimを導入する。 PopSimを使って、シカゴ市のために複数のベンチマークデータセットを生成し、それらを検証するために広範な統計的評価を行う。都市資源の公平な配分を監査するためのシステムの適用例を示すいくつかのケーススタディで,我々はさらにデータセットを活用した。 Historical systematic exclusionary tactics based on race have forced people of certain demographic groups to congregate in specific urban areas. Aside from the ethical aspects of such segregation, these policies have implications for the allocation of urban resources including public transportation, healthcare, and education within the cities. The initial step towards addressing these issues involves conducting an audit to assess the status of equitable resource allocation. However, due to privacy and confidentiality concerns, individual-level data containing demographic information cannot be made publicly available. By leveraging publicly available aggregated demographic statistics data, we introduce PopSim, a system for generating semi-synthetic individual-level population data with demographic information. We use PopSim to generate multiple benchmark datasets for the city of Chicago and conduct extensive statistical evaluations to validate those. We further use our datasets for several case studies that showcase the application of our system for auditing equitable allocation of city resources.	翻訳日:2023-05-07 15:54:51 公開日:2023-04-25
# TCN-LSTMとマルチタスク学習モデルによる車線変更意図認識と運転状況予測への統一的アプローチ A Unified Approach to Lane Change Intention Recognition and Driving Status Prediction through TCN-LSTM and Multi-Task Learning Models ( http://arxiv.org/abs/2304.13732v1 ) ライセンス: Link先を確認	Renteng Yuan, Mohamed Abdel-Aty, Xin Gu, Ou Zheng, Qiaojun Xiang	(参考訳) Lane Change (LC) は、連続的で複雑な操作プロセスである。 LCプロセスの正確な検出と予測は、交通参加者が周囲の環境をよりよく理解し、LCの潜在的な安全上の危険を認識し、交通安全を改善するのに役立つ。本稿では,LC意図認識(LC-IR)モデルとLC状態予測(LC-SP)モデルを開発した。長い短期記憶ユニット(TCN-LSTM)を持つ新しいアンサンブル時間畳み込みネットワークが最初に提案され、シーケンシャルデータにおける長距離依存関係をキャプチャする。次に、3つのマルチタスクモデル(MTL-LSTM, MTL-TCN, MTL-TCN-LSTM)を開発し、出力インジケータの内在的関係を捉える。さらに,LC意図認識・運転状態予測(LC-IR-SP)のための統合モデリングフレームワークを開発した。提案モデルの性能を検証するため,CitySimデータセットから1023台の車両軌跡を抽出した。ピアソン係数は関連する指標を決定するために用いられる。その結果,150フレームを入力長として用いたTCN-LSTMモデルは,LC意図分類において精度96.67%でTCNおよびLSTMモデルを上回り,各クラスに対してよりバランスの取れた結果が得られた。提案された3つのマルチタスク学習モデルは、対応するシングルタスクモデルと比較して、平均絶対誤差(MAE)と二乗平均平方根誤差(RMSE)がそれぞれ平均24.24%、22.86%減少し、性能が大幅に向上した。開発したLC-IR-SPモデルは,車線変更行動の識別,リアルタイム交通競合指数の算出,車両制御戦略の改善に,自動運転車に有望な応用を期待できる。 Lane change (LC) is a continuous and complex operation process. Accurately detecting and predicting LC processes can help traffic participants better understand their surrounding environment, recognize potential LC safety hazards, and improve traffic safety. The present paper focuses on LC processes, developing an LC intention recognition (LC-IR) model and an LC status prediction (LC-SP) model. A novel ensemble temporal convolutional network with Long Short-Term Memory units (TCN-LSTM) is first proposed to capture long-range dependencies in sequential data. Then, three multi-task models (MTL-LSTM, MTL-TCN, MTL-TCN-LSTM) are developed to capture the intrinsic relationship among output indicators. Furthermore, a unified modeling framework for LC intention recognition and driving status prediction (LC-IR-SP) is developed. To validate the performance of the proposed models, a total of 1023 vehicle trajectories are extracted from the CitySim dataset.
The Pearson coefficient is employed to determine the related indicators. The results indicate that, using 150 frames as input length, the TCN-LSTM model with 96.67% accuracy outperforms TCN and LSTM models in LC intention classification and provides more balanced results for each class. The three proposed multi-task learning models provide markedly increased performance compared to the corresponding single-task models, with an average reduction of 24.24% and 22.86% in the Mean Absolute Error (MAE) and Root Mean Square Error (RMSE), respectively. The developed LC-IR-SP model has promising applications for autonomous vehicles to identify lane change behaviors, calculate a real-time traffic conflict index, and improve vehicle control strategies.	翻訳日:2023-04-28 15:40:03 公開日:2023-04-25
# 名前の由来は? CADファイルのユーザ指定名を用いた言語モデルにおけるアセンブリー部分意味的知識の評価 What's in a Name? Evaluating Assembly-Part Semantic Knowledge in Language Models through User-Provided Names in CAD Files ( http://arxiv.org/abs/2304.14275v1 ) ライセンス: Link先を確認	Peter Meltzer, Joseph G. Lambourne, Daniele Grandi	(参考訳) 集合における部分的および部分的関係に関する意味的知識は、設計リポジトリの検索からエンジニアリング的知識ベースの構築まで、様々なタスクに有用である。本稿では,設計者がCAD(Computer Aided Design)ソフトウェアで使用する自然言語名は,そのような知識の貴重な情報源であり,Large Language Models(LLM)には,このデータを扱う上で有用なドメイン固有情報や,他のCADやエンジニアリング関連のタスクが含まれていることを提案する。特に、自然言語部分、特徴、文書名の大きなコーパスを抽出し、これを用いて、事前学習された言語モデルが、前例のない3つの自己教師型タスクにおいて、多数のベンチマークを上回り得ることを定量的に示す。さらに,テキストデータコーパスの微調整により全タスクのパフォーマンスが向上し,これまで無視されてきたテキストデータの価値が証明された。また,テキストデータのみを用いた LLM の利用に対する重要な制限も指摘し,本研究はマルチモーダルテキスト幾何学モデルへのさらなる取り組みに強い動機を与える。この分野でのさらなる作業を支援するために、私たちはすべてのデータとコードを公開しています。 Semantic knowledge of part-part and part-whole relationships in assemblies is useful for a variety of tasks from searching design repositories to the construction of engineering knowledge bases. In this work we propose that the natural language names designers use in Computer Aided Design (CAD) software are a valuable source of such knowledge, and that Large Language Models (LLMs) contain useful domain-specific information for working with this data as well as other CAD and engineering-related tasks. In particular we extract and clean a large corpus of natural language part, feature and document names and use this to quantitatively demonstrate that a pre-trained language model can outperform numerous benchmarks on three self-supervised tasks, without ever having seen this data before. Moreover, we show that fine-tuning on the text data corpus further boosts the performance on all tasks, thus demonstrating the value of the text data which until now has been largely ignored. We also identify key limitations to using LLMs with text data alone, and our findings provide a strong motivation for further work into multi-modal text-geometry models. 
To aid and encourage further work in this area we make all our data and code publicly available.	翻訳日:2023-04-28 13:02:19 公開日:2023-04-25
# グラフニューラルネットワークはノード分類に役立つか--ノード識別性に関するホモフィリー原理の検討 When Do Graph Neural Networks Help with Node Classification: Investigating the Homophily Principle on Node Distinguishability ( http://arxiv.org/abs/2304.14274v1 ) ライセンス: Link先を確認	Sitao Luan, Chenqing Hua, Minkai Xu, Qincheng Lu, Jiaqi Zhu, Xiao-Wen Chang, Jie Fu, Jure Leskovec, Doina Precup	(参考訳) 同じラベルを持つノードが接続される可能性が高く、ノード分類(nc)タスクにおいてニューラルネットワーク(nns)よりもグラフニューラルネットワーク(gnns)のパフォーマンスが優れている主な理由は、ホモフィリー原理(homophily principle)であると考えられている。近年, ホモフィリー原理が破られたとしても, 同一クラスのノードが類似した近傍パターンを共有する限り, GNNの優位性は維持され, ホモフィリーの有効性を疑問視する理論的な結果が開発されている。しかし、この議論はクラス内ノード識別可能性(ND)のみを考慮し、クラス間NDを無視し、ホモフィリーの効果を研究するには不十分である。本論では,ND の理想的状況はクラス間 ND よりもクラス内 ND が小さいことである,と論じる。この考え方を定式化し, ホモフィリーの理解を深めるために, CSBM-H (Contextual Stochastic Block Model for Homophily) を提案し, 確率ベイズ誤差 (Probabilistic Bayes Error, PBE) と期待負負のKL偏差 (presented Negative KL-divergence,ENKL) という2つの指標を定義し, NDを定量化する。結果を可視化し、詳細な分析を行う。実験により,GNNの優越性は,KPM (Kernel Performance Metric) の定義に基づくホモフィリーレベルにかかわらず,クラス内NDとクラス間NDの両方に密接に関係していることが確認された。 KPMは、新しい非線形機能ベースのメトリクスであり、合成および実世界のデータセット上でのGNNのアドバンテージとデメリットを明らかにする上で、既存のホモフィリメトリックよりも効果的であることがテストされている。 Homophily principle, i.e. nodes with the same labels are more likely to be connected, was believed to be the main reason for the performance superiority of Graph Neural Networks (GNNs) over Neural Networks (NNs) on Node Classification (NC) tasks. Recently, people have developed theoretical results arguing that, even though the homophily principle is broken, the advantage of GNNs can still hold as long as nodes from the same class share similar neighborhood patterns, which questions the validity of homophily. However, this argument only considers intra-class Node Distinguishability (ND) and ignores inter-class ND, which is insufficient to study the effect of homophily. In this paper, we first demonstrate the aforementioned insufficiency with examples and argue that an ideal situation for ND is to have smaller intra-class ND than inter-class ND. 
To formulate this idea and have a better understanding of homophily, we propose Contextual Stochastic Block Model for Homophily (CSBM-H) and define two metrics, Probabilistic Bayes Error (PBE) and Expected Negative KL-divergence (ENKL), to quantify ND, through which we can also find how intra- and inter-class ND influence ND together. We visualize the results and give detailed analysis. Through experiments, we verified that the superiority of GNNs is indeed closely related to both intra- and inter-class ND regardless of homophily levels, based on which we define Kernel Performance Metric (KPM). KPM is a new non-linear, feature-based metric, which is tested to be more effective than the existing homophily metrics on revealing the advantage and disadvantage of GNNs on synthetic and real-world datasets.	翻訳日:2023-04-28 13:01:57 公開日:2023-04-25
# 子どもにAI教育を施すための監査フレームワーク An Audit Framework for Adopting AI-Nudging on Children ( http://arxiv.org/abs/2304.14338v1 ) ライセンス: Link先を確認	Marianna Ganapini and Enrico Panai	(参考訳) これはAIナッジのための監査フレームワークである。文献で議論されるような静的なニュジングとは違って,ここでは大量のデータを使用してパーソナライズされたダイナミックなフィードバックとインターフェースを提供するニュジングのタイプに注目します。私たちはこれをAIナッジと呼んでいる(Lanzing, 2019, pp. 549; Yeung, 2017)。ここで概説した監査の最終的な目標は、監査の勧告、要件、提案(言い換えれば、監査の基準)に従えば、ナッジを使用するAIシステムが道徳的慣性や中立性のレベルを維持することを保証することである。意図しないネガティブな結果が発生した場合、監査は、実施可能なリスク軽減メカニズムを示唆する。意図しないポジティブな結果の場合、いくつかの強化メカニズムが示唆される。 IBM-Notre Dame Tech Ethics Labがスポンサー This is an audit framework for AI-nudging. Unlike the static form of nudging usually discussed in the literature, we focus here on a type of nudging that uses large amounts of data to provide personalized, dynamic feedback and interfaces. We call this AI-nudging (Lanzing, 2019, p. 549; Yeung, 2017). The ultimate goal of the audit outlined here is to ensure that an AI system that uses nudges will maintain a level of moral inertia and neutrality by complying with the recommendations, requirements, or suggestions of the audit (in other words, the criteria of the audit). In the case of unintended negative consequences, the audit suggests risk mitigation mechanisms that can be put in place. In the case of unintended positive consequences, it suggests some reinforcement mechanisms. Sponsored by the IBM-Notre Dame Tech Ethics Lab	翻訳日:2023-04-28 12:32:28 公開日:2023-04-25
# チャットボットの教育への脅威を超えて考える--文字とコーディングプロセスの可視化 Thinking beyond chatbots' threat to education: Visualizations to elucidate the writing and coding process ( http://arxiv.org/abs/2304.14342v1 ) ライセンス: Link先を確認	Badri Adhikari	(参考訳) 言語教育と学習のための教育実践の展望は、主に結果駆動のアプローチを中心にしている。最近の大規模言語モデルのアクセシビリティは、これらのアプローチを徹底的に妨げている。この混乱を考慮し、言語教育と学習プラクティスを変革する上で、言語学習が人間の知性の発展において重要な役割を担っていることに注意する必要がある。ライティングとコンピュータプログラミングは、教育システムにとって不可欠な2つのスキルです。何とどのように書くかが思考を形作り、自己指向学習の道筋を定めている。ほとんどの教育者は、‘プロセス’と‘プロダクト’はどちらも重要かつ不可分であることを理解しているが、ほとんどの教育環境では、学習者の形成過程に対する建設的なフィードバックを提供することは困難である。例えば、学習者が入力したコードが実行されるかどうかをコンピュータプログラミングで評価するのは簡単である。しかし、学習者の創造的プロセスを評価し、プロセスに対して有意義なフィードバックを提供するのは難しい。教育(および学習)におけるこの長年の課題に対処するため、本研究では、学習者の執筆やプログラミングプロセスの本質的で教えられた能力を要約する、新しい可視化ツールセットを提案する。これらの対話型プロセス可視化(PV)は、学習者に洞察力、権限、パーソナライズされたプロセス指向のフィードバックを提供する。ツールボックスは、教育者や学習者がテストする準備ができており、www.processfeedback.orgで公開されている。学習者のプロセス - 自己、仲間、教育者から - に対するフィードバックを提供することに重点を置くことで、学習者の自己指向学習やメタ認知といった高次スキル獲得能力が向上する。 The landscape of educational practices for teaching and learning languages has been predominantly centered around outcome-driven approaches. The recent accessibility of large language models has thoroughly disrupted these approaches. As we transform our language teaching and learning practices to account for this disruption, it is important to note that language learning plays a pivotal role in developing human intelligence. Writing and computer programming are two essential skills integral to our education systems. What and how we write shapes our thinking and sets us on the path of self-directed learning. While most educators understand that `process' and `product' are both important and inseparable, in most educational settings, providing constructive feedback on a learner's formative process is challenging. For instance, it is straightforward in computer programming to assess whether a learner-submitted code runs. 
However, evaluating the learner's creative process and providing meaningful feedback on the process can be challenging. To address this long-standing issue in education (and learning), this work presents a new set of visualization tools to summarize the inherent and taught capabilities of a learner's writing or programming process. These interactive Process Visualizations (PVs) provide insightful, empowering, and personalized process-oriented feedback to the learners. The toolbox is ready to be tested by educators and learners and is publicly available at www.processfeedback.org. Focusing on providing feedback on a learner's process--from self, peers, and educators--will facilitate learners' ability to acquire higher-order skills such as self-directed learning and metacognition.	翻訳日:2023-04-28 12:21:56 公開日:2023-04-25
# 量子クエリー通信シミュレーションにおける対称性の役割 The Role of Symmetry in Quantum Query-to-Communication Simulation ( http://arxiv.org/abs/2012.05233v2 ) ライセンス: Link先を確認	Sourav Chakraborty, Arkadev Chattopadhyay, Peter H{\o}yer, Nikhil S. Mande, Manaswi Paraashar, Ronald de Wolf	(参考訳) Buhrman, Cleve and Wigderson (STOC'98) は、すべてのブール関数 f : {-1,1}^n to {-1,1} と G in {AND_2, XOR_2} に対して、合成関数 f o G の有界エラー量子通信複雑性は O(Q(f) log n) と等しいことを示した。これは、Alice が f に対して最適な量子クエリアルゴリズムを実行し、各クエリを実装するために、O(log n) qubit の丸い通信を用いて実現している。これは古典的な設定とは対照的であり、R^{cc}(f o G) が少なくとも 2R(f) であることは容易に示され、R^{cc} と R はそれぞれ有界エラー通信とクエリ複雑性を表す。量子設定におけるO(log n)オーバーヘッドはいくつかの関数に対して必要であり、したがってBCWシミュレーションは厳密であることを示す。ここでは、我々の研究に先立ち、すべての f に対して Q^{cc}(f o G) = O(Q(f)) の可能性と {AND_2, XOR_2} におけるすべての G が除外されていないことに注意する。より具体的には、以下に示す。 - 対数 n のオーバーヘッドは、f が対称であるときに not が要求されることを示し、Aaronson と Ambainis の結果を集合交叉関数に対して一般化する(Theory of Computing'05)。 -上記のことを証明するため、fがor関数であるときに結果を証明できる雑音振幅増幅の効率的な分散バージョンを設計。 - 上記の最初の結果から、BCWシミュレーションにおける対数 n のオーバーヘッドは、f が推移的であっても回避できるかどうかを問うことができるが、これは対称性の弱い概念である。量子通信プロトコルが任意に1/2に近い誤差確率を許容しても、ある推移関数に対して、ログ n のオーバーヘッドが依然として必要であることを示すことで、強い負の答えを与える。また、bcwシミュレーションにおいて、有界エラー通信モデルにおいて、log n のオーバーヘッドを必要とする関数を構築するための一般的なレシピも提供します。 Buhrman, Cleve and Wigderson (STOC'98) showed that for every Boolean function f : {-1,1}^n to {-1,1} and G in {AND_2, XOR_2}, the bounded-error quantum communication complexity of the composed function f o G equals O(Q(f) log n), where Q(f) denotes the bounded-error quantum query complexity of f. This is achieved by Alice running the optimal quantum query algorithm for f, using a round of O(log n) qubits of communication to implement each query. This is in contrast with the classical setting, where it is easy to show that R^{cc}(f o G) is at most 2R(f), where R^{cc} and R denote bounded-error communication and query complexity, respectively. We show that the O(log n) overhead is required for some functions in the quantum setting, and thus the BCW simulation is tight. 
We note here that prior to our work, the possibility of Q^{cc}(f o G) = O(Q(f)), for all f and all G in {AND_2, XOR_2}, had not been ruled out. More specifically, we show the following. - We show that the log n overhead is not required when f is symmetric, generalizing a result of Aaronson and Ambainis for the Set-Disjointness function (Theory of Computing'05). - In order to prove the above, we design an efficient distributed version of noisy amplitude amplification that allows us to prove the result when f is the OR function. - In view of our first result above, one may ask whether the log n overhead in the BCW simulation can be avoided even when f is transitive, which is a weaker notion of symmetry. We give a strong negative answer by showing that the log n overhead is still necessary for some transitive functions even when we allow the quantum communication protocol an error probability that can be arbitrarily close to 1/2. - We also give, among other things, a general recipe to construct functions for which the log n overhead is required in the BCW simulation in the bounded-error communication model.	翻訳日:2023-04-27 18:53:47 公開日:2023-04-25
# 機械学習を用いた救急部トリアージ中の敗血症検出 Detection of sepsis during emergency department triage using machine learning ( http://arxiv.org/abs/2204.07657v5 ) ライセンス: Link先を確認	Oleksandr Ivanov, Karin Molander, Robert Dunne, Stephen Liu, Deena Brecher, Kevin Masek, Erica Lewis, Lisa Wolf, Debbie Travers, Deb Delaney, Kyla Montgomery, Christian Reilly	(参考訳) 敗血症は臓器機能不全を伴う生命を脅かす疾患であり、世界でも主要な死因である。敗血症の治療が数時間遅れても死亡率が上昇する。緊急部トリアージ中の敗血症の早期発見は、実験室分析、抗生物質投与、その他の敗血症治療プロトコルの早期開始を可能にする。本研究の目的は、標準敗血症スクリーニングアルゴリズム(感染源を含むSIRS)のEDトリアージにおける敗血症検出性能と、EHRトリアージデータに基づいて訓練された機械学習アルゴリズムを比較することである。 16病院のトリアージデータを用いた機械学習モデル(KATE Sepsis)を開発した。 KATE Sepsisと標準スクリーニングは、成人の医療記録512,949件を振り返って評価した。 KATE Sepsis の AUC は 0.9423 (0.9401 - 0.9441) であり、感度は 71.09% (70.12% - 71.98%)、特異度は 94.81% (94.75% - 94.87%) である。標準スクリーニングでは AUC は 0.6826 (0.6774 - 0.6878)、感度は 40.8% (39.71% - 41.86%)、特異度は 95.72% (95.68% - 95.78%) である。 KATE Sepsisモデルは、77.67% (75.78% - 79.42%) の重症敗血症検出感度、86.95% (84.2% - 88.81%) の敗血症性ショック検出感度を示す。標準スクリーニングプロトコルは、重症敗血症の検出感度が43.06% (41% - 45.87%)、敗血症性ショック検出感度が40% (36.55% - 43.26%)であることを示している。今後の研究は、KATE Sepsisが抗生物質投与、再入院率、罹患率、死亡率に与える前向きな影響に焦点を当てるべきである。 Sepsis is a life-threatening condition with organ dysfunction and is a leading cause of death and critical illness worldwide. Even a few hours of delay in the treatment of sepsis results in increased mortality. Early detection of sepsis during emergency department triage would allow early initiation of lab analysis, antibiotic administration, and other sepsis treatment protocols. The purpose of this study was to compare sepsis detection performance at ED triage (prior to the use of laboratory diagnostics) of the standard sepsis screening algorithm (SIRS with source of infection) and a machine learning algorithm trained on EHR triage data. A machine learning model (KATE Sepsis) was developed using patient encounters with triage data from 16 participating hospitals.
KATE Sepsis and standard screening were retrospectively evaluated on the adult population of 512,949 medical records. KATE Sepsis demonstrates an AUC of 0.9423 (0.9401 - 0.9441) with sensitivity of 71.09% (70.12% - 71.98%) and specificity of 94.81% (94.75% - 94.87%). Standard screening demonstrates an AUC of 0.6826 (0.6774 - 0.6878) with sensitivity of 40.8% (39.71% - 41.86%) and specificity of 95.72% (95.68% - 95.78%). The KATE Sepsis model trained to detect sepsis demonstrates 77.67% (75.78% - 79.42%) sensitivity in detecting severe sepsis and 86.95% (84.2% - 88.81%) sensitivity in detecting septic shock. The standard screening protocol demonstrates 43.06% (41% - 45.87%) sensitivity in detecting severe sepsis and 40% (36.55% - 43.26%) sensitivity in detecting septic shock. Future research should focus on the prospective impact of KATE Sepsis on administration of antibiotics, readmission rate, morbidity and mortality.	翻訳日:2023-04-27 18:47:07 公開日:2023-04-25
# 新型コロナウイルス感染拡大に伴うメンタルヘルスのパンデミック : 公衆メンタルヘルスの窓口としてのソーシャルメディア Mental Health Pandemic during the COVID-19 Outbreak: Social Media as a Window to Public Mental Health ( http://arxiv.org/abs/2203.00237v4 ) ライセンス: Link先を確認	Michelle Bak, Chungyi Chiu, Jessie Chin	(参考訳) ロックダウンやソーシャルディスタンシングなどの新型コロナウイルス(covid-19)パンデミックの予防対策が強化され、若者の社会的孤立(社会的ニーズと社会的環境の規定の相違)に対する認識が著しく高まった。社会的孤立は、抑うつ症状の危険因子である状況的孤独(環境変化から生じる孤独)と密接に関連している。以前の研究は、脆弱な若者がredditのようなオンラインソーシャルプラットフォームから支援を求める可能性が高いことを示唆していた。そこで本研究は、新型コロナウイルス(covid-19)の流行による孤独感サブredditにおけるうつ病関連対話の同定と分析を目的としている。本研究は,ロジスティック回帰と話題モデルを用いて,パンデミック前後の孤独度に関する抑うつ関連議論を分類・検討した。その結果、パンデミックの期間中に課題が報告されたうつ病に関する議論(メンタルヘルス、社会的相互作用、家族、感情など)の量が大幅に増加した。また, 抑うつに関する議論から, デート(プレパンデミック)からオンライン交流やコミュニティ(パンデミック)への転換がみられ, パンデミックにおけるオンラインソーシャルサポートの必要性や表現の高まりが示唆された。現在の調査結果は、ソーシャルメディアが公衆のメンタルヘルスを監視する窓口になる可能性を示している。今後の研究は,危機時の監視システム設計に影響を及ぼす現在のアプローチを臨床的に検証する。 Intensified preventive measures during the COVID-19 pandemic, such as lockdown and social distancing, heavily increased the perception of social isolation (i.e., a discrepancy between one's social needs and the provisions of the social environment) among young adults. Social isolation is closely associated with situational loneliness (i.e., loneliness emerging from environmental change), a risk factor for depressive symptoms. Prior research suggested vulnerable young adults are likely to seek support from an online social platform such as Reddit, a perceived comfortable environment for lonely individuals to seek mental health help through anonymous communication with a broad social network. Therefore, this study aims to identify and analyze depression-related dialogues on loneliness subreddits during the COVID-19 outbreak, with the implications on depression-related infoveillance during the pandemic. Our study utilized logistic regression and topic modeling to classify and examine depression-related discussions on loneliness subreddits before and during the pandemic. 
Our results showed significant increases in the volume of depression-related discussions (i.e., topics related to mental health, social interaction, family, and emotion) where challenges were reported during the pandemic. We also found a switch in dominant topics emerging from depression-related discussions on loneliness subreddits, from dating (prepandemic) to online interaction and community (pandemic), suggesting the increased expressions or need of online social support during the pandemic. The current findings suggest the potential of social media to serve as a window for monitoring public mental health. Our future study will clinically validate the current approach, which has implications for designing a surveillance system during the crisis.	翻訳日:2023-04-27 18:45:32 公開日:2023-04-25
# 手術映像理解のための概念グラフニューラルネットワーク Concept Graph Neural Networks for Surgical Video Understanding ( http://arxiv.org/abs/2202.13402v2 ) ライセンス: Link先を確認	Yutong Ban, Jennifer A. Eckhoff, Thomas M. Ward, Daniel A. Hashimoto, Ozanan R. Meireles, Daniela Rus, Guy Rosman	(参考訳) 私たちは世界の知識と理解を常に統合し、見るものに対する私たちの解釈を強化します。この能力は、AI強化手術など、複数のエンティティや概念を推論するアプリケーションドメインにおいて不可欠である。本稿では,概念知識を時間的概念グラフネットワークを介して時間分析タスクに統合する新しい手法を提案する。提案するネットワークでは,大域的知識グラフが手術例の時間的分析に組み込まれ,データに適用される概念や関係の意味を学習する。本研究は,安全の重要視の検証や,パークランドグレーティングスケールの推定などの作業において,手術映像データから得られた結果を示す。その結果,本手法は複雑なベンチマークの認識と検出を改善し,他の解析的応用も可能となった。 We constantly integrate our knowledge and understanding of the world to enhance our interpretation of what we see. This ability is crucial in application domains which entail reasoning about multiple entities and concepts, such as AI-augmented surgery. In this paper, we propose a novel way of integrating conceptual knowledge into temporal analysis tasks via temporal concept graph networks. In the proposed networks, a global knowledge graph is incorporated into the temporal analysis of surgical instances, learning the meaning of concepts and relations as they apply to the data. We demonstrate our results in surgical video data for tasks such as verification of critical view of safety, as well as estimation of Parkland grading scale. The results show that our method improves the recognition and detection of complex benchmarks as well as enables other analytic applications of interest.	翻訳日:2023-04-27 18:44:58 公開日:2023-04-25
# 高速で高精度な圧縮圧縮ビデオ品質向上のためのビットストリームメタデータの活用 Leveraging Bitstream Metadata for Fast, Accurate, Generalized Compressed Video Quality Enhancement ( http://arxiv.org/abs/2202.00011v2 ) ライセンス: Link先を確認	Max Ehrlich, Jon Barker, Namitha Padmanabhan, Larry Davis, Andrew Tao, Bryan Catanzaro, Abhinav Shrivastava	(参考訳) ビデオ圧縮は、ソーシャルメディアからビデオ会議まで、現代のインターネットを支える技術の中心的な特徴である。ビデオ圧縮は成熟を続けていますが、多くの圧縮設定では品質の低下が顕著です。これらの設定は、帯域制限や不安定な接続による効率的な動画伝送に重要な応用をもたらす。本研究では,ビデオビットストリームに埋め込まれた構造と動作情報を活用する圧縮ビデオに詳細を復元する深層学習アーキテクチャを開発した。その結果,従来の圧縮補正法と比較して復元精度が向上し,高スループットを実現しつつ,近年のディープラーニングビデオ圧縮法と比較した場合の競合性が示された。さらに、ビットストリームで容易に利用できる量子化データに対して、我々のモデルを条件付けする。これにより、1つのモデルでさまざまな圧縮品質の設定を処理でき、事前作業で複数のモデルが必要になります。 Video compression is a central feature of the modern internet powering technologies from social media to video conferencing. While video compression continues to mature, for many compression settings, quality loss is still noticeable. These settings nevertheless have important applications to the efficient transmission of videos over bandwidth constrained or otherwise unstable connections. In this work, we develop a deep learning architecture capable of restoring detail to compressed videos which leverages the underlying structure and motion information embedded in the video bitstream. We show that this improves restoration accuracy compared to prior compression correction methods and is competitive when compared with recent deep-learning-based video compression methods on rate-distortion while achieving higher throughput. Furthermore, we condition our model on quantization data which is readily available in the bitstream. This allows our single model to handle a variety of different compression quality settings which required an ensemble of models in prior work.	翻訳日:2023-04-27 18:44:24 公開日:2023-04-25
# モデル変換によるフレキシブル微分可能最適化 Flexible Differentiable Optimization via Model Transformations ( http://arxiv.org/abs/2206.06135v2 ) ライセンス: Link先を確認	Akshay Sharma and Mathieu Besançon and Joaquim Dias Garcia and Benoît Legat	(参考訳) DiffOpt.jlは、目的関数および/または制約に存在する任意のパラメータに関して、最適化問題の解を通じて微分するJuliaライブラリである。このライブラリはMathOptInterface上に構築されており、ソルバーの豊富なエコシステムを活用し、JuMPのようなモデリング言語とうまく連携している。 DiffOptは前方微分モードと逆微分モードの両方を提供し、ハイパーパラメータ最適化からバックプロパゲーションや感度分析まで、エンドツーエンドの微分可能プログラミングで制約付き最適化を橋渡しすることができる。 DiffOpt は二次計画と錐計画の標準形式を微分するための2つの既知のルールに基づいている。しかし、モデル変換を通じて微分できる機能のおかげで、ユーザはこれらの形式に限定されず、これらの標準形式に再定式化できる任意のモデルのパラメータに関して微分することができる。これは特に、アフィン錐制約と凸2次制約または目的関数を混合するプログラムを含む。 We introduce DiffOpt.jl, a Julia library to differentiate through the solution of optimization problems with respect to arbitrary parameters present in the objective and/or constraints. The library builds upon MathOptInterface, thus leveraging the rich ecosystem of solvers and composing well with modeling languages like JuMP. DiffOpt offers both forward and reverse differentiation modes, enabling multiple use cases from hyperparameter optimization to backpropagation and sensitivity analysis, bridging constrained optimization with end-to-end differentiable programming. DiffOpt is built on two known rules for differentiating quadratic programming and conic programming standard forms. However, thanks to its ability to differentiate through model transformations, the user is not limited to these forms and can differentiate with respect to the parameters of any model that can be reformulated into these standard forms. This notably includes programs mixing affine conic constraints and convex quadratic constraints or objective function.	翻訳日:2023-04-27 18:36:39 公開日:2023-04-25
# オープンアクセスパブリッシングと関連する要因は何か? springer natureのケーススタディ Which Factors are associated with Open Access Publishing? A Springer Nature Case Study ( http://arxiv.org/abs/2208.08221v4 ) ライセンス: Link先を確認	Fakhri Momeni, Stefan Dietze, Philipp Mayr, Kristin Biesenbender and Isabella Peters	(参考訳) Open Access (OA)は、記事へのアクセスを容易にする。しかし、著者や資金提供者は、OAの出版に金銭的支援を受けていない著者がOAの記事の引用に関わらないよう、出版費用を支払わなければならないことが多い。 OAは、出版システムにおける既存の不平等を克服するよりも、さらに悪化させる可能性がある。そこで,Springer Natureに掲載された522,411の論文を調査した。相関分析と回帰分析を用いて、異なる所得水準の国に属する著者間の関係、出版モデルの選択、論文の引用効果について述べる。機械学習の分類手法は,出版モデルの予測における特徴の重要性を検討するのに役立った。以上の結果から, APC ウェイバーの著者はゴールドOA 誌に他よりも多く掲載している。対照的に、APC割引を受ける著者は、OA出版物の中で最も低い割合であり、この割引が著者にゴールドOA雑誌に掲載する動機を不十分にしていると仮定する。 oaオプションはハイブリッドジャーナルでは避けられているが,gold-oaジャーナルではジャーナルランクと出版モデルとの間に強い相関関係がみられた。また,OA出版の収入レベル,年長性,経験が,ハイブリッド雑誌におけるOA出版の予測因子であることも示唆した。 Open Access (OA) facilitates access to articles. But, authors or funders often must pay the publishing costs preventing authors who do not receive financial support from participating in OA publishing and citation advantage for OA articles. OA may exacerbate existing inequalities in the publication system rather than overcome them. To investigate this, we studied 522,411 articles published by Springer Nature. Employing correlation and regression analyses, we describe the relationship between authors affiliated with countries from different income levels, their choice of publishing model, and the citation impact of their papers. A machine learning classification method helped us to explore the importance of different features in predicting the publishing model. The results show that authors eligible for APC waivers publish more in gold-OA journals than others. In contrast, authors eligible for an APC discount have the lowest ratio of OA publications, leading to the assumption that this discount insufficiently motivates authors to publish in gold-OA journals. 
We found a strong correlation between the journal rank and the publishing model in gold-OA journals, whereas the OA option is mostly avoided in hybrid journals. Also, results show that the countries' income level, seniority, and experience with OA publications are the most predictive factors for OA publishing in hybrid journals.	翻訳日:2023-04-27 18:27:58 公開日:2023-04-25
# BSMS-GNNを用いたメッシュ型物理シミュレーションの効率的学習 Efficient Learning of Mesh-Based Physical Simulation with BSMS-GNN ( http://arxiv.org/abs/2210.02573v3 ) ライセンス: Link先を確認	Yadi Cao, Menglei Chai, Minchen Li, Chenfanfu Jiang	(参考訳) フラットなグラフニューラルネットワーク(GNN)とメッセージパッシング(MP)の積み重ねによる大規模メッシュ上での物理シミュレーションの学習は,ノード数に対する計算量のスケーリングや過度な平滑化のために難しい。物理シミュレーションのための GNN に multi-scale 構造を導入することに対するコミュニティの関心が高まっている。しかしながら、現在の最先端の手法は、粗いメッシュの労働集約的な描画に依存するか、空間的近接に基づいて粗いレベルを構築するかによって制限されており、後者は形状の境界をまたぐ誤ったエッジを生じうる。 二部グラフ判定に触発されて,上記の制限に取り組むために,新たなプーリング戦略である bi-stride を提案する。bi-strideは、粗いメッシュの手動描画を必要とせず、空間的近接による誤ったエッジを避けつつ、幅優先探索(BFS)の1つおきのフロンティアでノードをプールする。さらに、レベル毎に1回のMPで済むスキームと、補間によるパラメータを持たないプーリングおよびアンプーリングを可能にし、U-Netsに似た構造で計算コストを大幅に削減する。実験の結果,提案するフレームワークである BSMS-GNN は,代表的な物理シミュレーションにおいて精度と計算効率の両面で既存の手法を大きく上回った。 Learning the physical simulation on large-scale meshes with flat Graph Neural Networks (GNNs) and stacking Message Passings (MPs) is challenging due to the scaling complexity w.r.t. the number of nodes and over-smoothing. There has been growing interest in the community to introduce multi-scale structures to GNNs for physical simulation. However, current state-of-the-art methods are limited by their reliance on the labor-intensive drawing of coarser meshes or building coarser levels based on spatial proximity, which can introduce wrong edges across geometry boundaries. Inspired by the bipartite graph determination, we propose a novel pooling strategy, bi-stride, to tackle the aforementioned limitations. Bi-stride pools nodes on every other frontier of the breadth-first search (BFS), without the need for the manual drawing of coarser meshes and avoiding the wrong edges by spatial proximity. Additionally, it enables a one-MP scheme per level and non-parametrized pooling and unpooling by interpolations, resembling U-Nets, which significantly reduces computational costs.
Experiments show that the proposed framework, BSMS-GNN, significantly outperforms existing methods in terms of both accuracy and computational efficiency in representative physical simulations.	翻訳日:2023-04-27 18:16:55 公開日:2023-04-25
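The abstract above describes bi-stride pooling as keeping the nodes on every other frontier of a breadth-first search. The following is a minimal sketch of that idea on a plain adjacency-list graph, not the authors' implementation: the function names, the choice of seed node, and the rule used to reconnect the kept nodes (linking any two kept nodes that are at most two hops apart in the original graph) are assumptions made for illustration.

```python
from collections import deque


def bfs_depths(adjacency, seed):
    """Return {node: BFS depth} for every node reachable from `seed`."""
    depth = {seed: 0}
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in depth:
                depth[neighbor] = depth[node] + 1
                queue.append(neighbor)
    return depth


def bi_stride_pool(adjacency, seed):
    """Keep nodes on even BFS frontiers and reconnect them.

    Assumption for this sketch: two kept nodes are linked in the coarse
    graph if they are adjacent or share a neighbor in the original graph.
    """
    depth = bfs_depths(adjacency, seed)
    kept = sorted(node for node, d in depth.items() if d % 2 == 0)
    kept_set = set(kept)
    coarse = {node: set() for node in kept}
    for node in kept:
        for mid in adjacency[node]:
            if mid in kept_set:          # direct edge between kept nodes
                coarse[node].add(mid)
            for far in adjacency[mid]:   # two-hop edge through a dropped node
                if far != node and far in kept_set:
                    coarse[node].add(far)
    return kept, {node: sorted(links) for node, links in coarse.items()}


# A 5-node path graph 0-1-2-3-4, pooled from seed node 0.
path_graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
kept_nodes, coarse_graph = bi_stride_pool(path_graph, seed=0)
print(kept_nodes)     # [0, 2, 4]
print(coarse_graph)   # {0: [2], 2: [0, 4], 4: [2]}
```

Because the coarse level is derived purely from graph connectivity, no spatial-proximity search is involved, which is the property the abstract credits with avoiding wrong edges across geometry boundaries.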
# CRONOS:Wi-Fi CSIを用いたデバイスフリーNLoS人間プレゼンス検出のためのカラー化とコントラスト学習 CRONOS: Colorization and Contrastive Learning for Device-Free NLoS Human Presence Detection using Wi-Fi CSI ( http://arxiv.org/abs/2211.10354v2 ) ライセンス: Link先を確認	Chia-Che Hsieh, An-Hung Hsiao, Li-Hsiang Shen, Kai-Ten Feng	(参考訳) 近年、広く普及するスマートサービスやアプリケーションに対する需要は急速に増加している。センサーやカメラによるデバイスなしの人間検出は広く採用されているが、プライバシーの問題や、動きのない人の誤検知が伴っている。これらの欠点に対処するため、商用Wi-Fiデバイスから取得したチャネル状態情報(CSI)は、正確な検出のための豊富な信号機能を提供する。しかしながら、既存のシステムは、非視線(NLoS)の下での不正確な分類と、部屋の隅に立っているときのような固定的なシナリオに悩まされている。本研究では,動的な再帰プロット(rps)を生成するcronos(colorization and contrastive learning enhanced nlos human presence detection)と呼ばれるシステムを提案する。また、教師付きコントラスト学習を取り入れて実質的な表現を抽出し、コンサルテーション損失を定式化し、動的ケースと定常ケースの代表的な距離を区別する。さらに,rssとカラーコードcsi比のどちらを利用するかを決定するために,自己切り替え型静的特徴拡張分類器(s3fec)を提案する。包括的実験の結果,cronosは,機械学習や非学習型手法,オープン文学における非csi型機能などを適用した既存システムよりも優れていた。 CRONOSは、空白、移動性、視線(LoS)、NLoSシナリオにおいて、最も高い存在検出精度を達成する。 In recent years, the demand for pervasive smart services and applications has increased rapidly. Device-free human detection through sensors or cameras has been widely adopted, but it comes with privacy issues as well as misdetection for motionless people. To address these drawbacks, channel state information (CSI) captured from commercialized Wi-Fi devices provides rich signal features for accurate detection. However, existing systems suffer from inaccurate classification under a non-line-of-sight (NLoS) and stationary scenario, such as when a person is standing still in a room corner. In this work, we propose a system called CRONOS (Colorization and Contrastive Learning Enhanced NLoS Human Presence Detection), which generates dynamic recurrence plots (RPs) and color-coded CSI ratios to distinguish mobile people from vacancy in a room, respectively. We also incorporate supervised contrastive learning to retrieve substantial representations, where consultation loss is formulated to differentiate the representative distances between dynamic and stationary cases. 
Furthermore, we propose a self-switched static feature enhanced classifier (S3FEC) to determine the utilization of either RPs or color-coded CSI ratios. Our comprehensive experimental results show that CRONOS outperforms existing systems that apply machine learning, non-learning based methods, as well as non-CSI based features in open literature. CRONOS achieves the highest presence detection accuracy in vacancy, mobility, line-of-sight (LoS), and NLoS scenarios.	翻訳日:2023-04-27 18:09:42 公開日:2023-04-25
# 宇宙デコヒーレンス : 原始パワースペクトルと非ガウス性 Cosmic decoherence: primordial power spectra and non-Gaussianities ( http://arxiv.org/abs/2211.07598v2 ) ライセンス: Link先を確認	Aoumeur Daddi Hammou, Nicola Bartolo	(参考訳) 量子デコヒーレンスがインフレーション宇宙論的摂動に与える影響について検討する。このプロセスは、インフレーションのメカニズムの量子的性質が、インフレーションの変動の量子-古典的遷移の長年の問題と関連していることを示す特定の観察的なサインを印字するかもしれない。いくつかの研究は、原始変動の統計的性質に対する量子デコヒーレンスの影響を調査している。特に、宇宙デコヒーレンスが標準のスローロールインフレーションによって予測される曲率パワースペクトルの補正につながることが示されている。同様に、非ゼロ曲率トリスペクトラムは宇宙デコヒーレンスによって純粋に誘導されることが示されているが、驚くべきことにデコヒーレンスはバイスペクトルを発生しないようである。さらに, ポインターオブザーバブルの一般化形式を採用し, 非消滅曲率双スペクトルをデコヒーレンスが引き起こすことを示し, 具体的な具体的な物理プロセスを提供することにより, 解析をさらに発展させる。原始双スペクトルに関する現在の制約は、環境-システム相互作用の強さに上限を置くことができる。完全な一般性において、デコヒーレンス誘起双スペクトルはスケール依存であり、スケール独立となるパワースペクトルに対応する補正を課す。このような宇宙スケールへのスケール依存は、インフレーション中に起こる量子デコヒーレンス過程の顕著なインプリントを表しているかもしれない。また,宇宙デコヒーレンスが環境の種類とは無関係にスケール独立な補正を誘導する過程を理解するための基準を提供する。最後に,宇宙デコヒーレンスがテンソル摂動に及ぼす影響を考察し,デコヒーレンス補正したテンソル-スカラー摂動比を導出する。特定の場合、デコヒーレンスは標準テンソルパワースペクトルに青い傾いた補正を誘導する。 We study the effect of quantum decoherence on the inflationary cosmological perturbations. This process might imprint specific observational signatures revealing the quantum nature of the inflationary mechanism being related to the longstanding issue of the quantum-to-classical transition of inflationary fluctuations. Several works have investigated the effect of quantum decoherence on the statistical properties of primordial fluctuations. In particular, it has been shown that cosmic decoherence leads to corrections to the curvature power spectrum predicted by standard slow-roll inflation. Equally interesting, a non zero curvature trispectrum has been shown to be purely induced by cosmic decoherence, but surprisingly, decoherence seems not to generate any bispectrum. We further develop such an analysis by adopting a generalized form of the pointer observable, showing that decoherence does induce a non vanishing curvature bispectrum and providing a specific underlying concrete physical process. 
Present constraints on primordial bispectra allow one to put an upper bound on the strength of the environment-system interaction. In full generality, the decoherence-induced bispectrum can be scale-dependent provided one imposes the corresponding correction to the power spectrum to be scale-independent. Such scale dependence on the largest cosmological scales might represent a distinctive imprint of the quantum decoherence process taking place during inflation. We also provide a criterion that allows one to understand when cosmic decoherence induces scale-independent corrections, independently of the type of environment considered. As a final result, we study the effect of cosmic decoherence on tensor perturbations and we derive the decoherence-corrected tensor-to-scalar perturbation ratio. In specific cases, decoherence induces a blue-tilted correction to the standard tensor power spectrum.	翻訳日:2023-04-27 18:08:56 公開日:2023-04-25
# FingerFlex:ECoG信号から指の軌道を推定する FingerFlex: Inferring Finger Trajectories from ECoG signals ( http://arxiv.org/abs/2211.01960v2 ) ライセンス: Link先を確認	Vladislav Lomtev, Alexander Kovalev, Alexey Timchenko	(参考訳) 運動脳コンピュータインタフェース(BCI)の開発は、ニューラルネットワークの時系列復号アルゴリズムに大きく依存している。ディープラーニングアーキテクチャの最近の進歩により、データ内の高次依存性を近似する自動機能選択が可能になった。本稿では,脳波(ECoG)データに対する指の動き回帰に適応した畳み込みエンコーダデコーダアーキテクチャであるFingerFlexモデルについて述べる。実測軌道と予測軌道の相関係数が最大0.74であるBCIコンペティションIVデータセット4で最先端の性能が達成された。提案手法は,完全機能型高精度皮質運動脳-コンピュータインタフェースを開発する機会を提供する。 Motor brain-computer interface (BCI) development relies critically on neural time series decoding algorithms. Recent advances in deep learning architectures allow for automatic feature selection to approximate higher-order dependencies in data. This article presents the FingerFlex model - a convolutional encoder-decoder architecture adapted for finger movement regression on electrocorticographic (ECoG) brain data. State-of-the-art performance was achieved on a publicly available BCI competition IV dataset 4 with a correlation coefficient between true and predicted trajectories up to 0.74. The presented method provides the opportunity for developing fully-functional high-precision cortical motor brain-computer interfaces.	翻訳日:2023-04-27 18:06:53 公開日:2023-04-25
# BTS:時間変化CSIによる屋内二室状態検出のための半監督学習における教師の2倍の学習 BTS: Bifold Teacher-Student in Semi-Supervised Learning for Indoor Two-Room Presence Detection Under Time-Varying CSI ( http://arxiv.org/abs/2212.10802v2 ) ライセンス: Link先を確認	Li-Hsiang Shen, Kai-Jui Chen, An-Hung Hsiao, Kai-Ten Feng	(参考訳) 近年,教師付き学習(SL)とチャネル状態情報(CSI)に基づく屋内人間の存在検知が注目されている。しかし、csiの空間情報に依存する既存の研究は、予測精度を低下させる物体移動、大気要因、機械の再起動などの環境変化に影響を受けやすい。さらに、SLベースの手法では、モデルの再トレーニングに時間を要する。したがって、半教師付き学習方式(SSL)を用いて、継続的に監視されるモデルライフサイクルを設計することが不可欠である。本稿では,SSLとラベル付けされていないデータセットを併用した存在検出システムに対して,BTS学習手法を提案する。提案する教師学習ネットワークは,ラベル付きcsiとラベル付きcsiから空間的・時間的特徴をインテリジェントに学習する。さらに、強化されたペナル化損失関数はエントロピーと距離の計測を利用して、漂流したデータ、すなわち時間変化の影響を受け、元の分布から変化した新しいデータセットの特徴を区別する。実験の結果,BTSシステムはラベルのないデータでモデルを再訓練した後,漸近的精度を保っていることがわかった。さらに、ラベルのないBTSは、SLベースの手法の漸近性能を達成しつつ、最大検出精度で既存のSSLベースのモデルより優れている。 In recent years, indoor human presence detection based on supervised learning (SL) and channel state information (CSI) has attracted much attention. However, the existing studies that rely on spatial information of CSI are susceptible to environmental changes, such as object movement, atmospheric factors, and machine rebooting, which degrade prediction accuracy. Moreover, SL-based methods require time-consuming labeling for retraining models. Therefore, it is imperative to design a continuously monitored model life-cycle using a semi-supervised learning (SSL) based scheme. In this paper, we conceive a bifold teacher-student (BTS) learning approach for presence detection systems that combines SSL by utilizing partially labeled and unlabeled datasets. The proposed primal-dual teacher-student network intelligently learns spatial and temporal features from labeled and unlabeled CSI. Additionally, the enhanced penalized loss function leverages entropy and distance measures to distinguish drifted data, i.e., features of new datasets affected by time-varying effects and altered from the original distribution. 
The experimental results demonstrate that the proposed BTS system sustains asymptotic accuracy after retraining the model with unlabeled data. Furthermore, the label-free BTS outperforms existing SSL-based models in terms of the highest detection accuracy while achieving the asymptotic performance of SL-based methods.	翻訳日:2023-04-27 17:59:35 公開日:2023-04-25
# 機械学習における公正性と構成の理解に向けて Towards Understanding Fairness and its Composition in Ensemble Machine Learning ( http://arxiv.org/abs/2212.04593v2 ) ライセンス: Link先を確認	Usman Gohar, Sumon Biswas, Hridesh Rajan	(参考訳) 機械学習(ML)ソフトウェアは現代社会において広く採用されており、人種、性別、年齢などに基づく少数派グループに公正な影響が報告されている。近年,MLモデルのアルゴリズムバイアスを計測・緩和する手法が提案されている。既存のアプローチでは、単一分類器ベースのMLモデルに重点を置いている。しかし、現実のMLモデルは複数の独立した学習者(例えばランダムフォレスト)で構成され、フェアネスは非自明な方法で構成される。アンサンブルの公平さはどのように構成されますか。アンサンブルの究極の公平性に対する学習者の公平性の影響はどのようなものか? 公平な学習者は不公平なアンサンブルを生み出すことができるか? さらに、ハイパーパラメータがMLモデルの公平性に影響を与えることが研究によって示されている。アンサンブルハイパーパラメータは、学習者が異なるカテゴリのアンサンブルでどのように結合されるかに影響するため、より複雑である。アンサンブルハイパーパラメータがフェアネスに与える影響を理解することは、プログラマがフェアアンサンブルを設計するのに役立つ。今日では、これらを異なるアンサンブルアルゴリズムについて完全には理解していない。本稿では,バッキング,ブースティング,積み重ね,投票など,現実世界で人気のあるアンサンブルを包括的に研究する。我々は,4つの人気フェアネスデータセットを用いて,Kaggleから収集した168アンサンブルモデルのベンチマークを開発した。私たちはフェアネスの構成を理解するために既存のフェアネスメトリクスを使用します。その結果,アンサンブルは緩和技術を用いることなく,より公平に設計できることがわかった。また,フェアネス構成とデータ特性との相互作用を識別し,フェアアンサンブル設計を導く。最後に、我々のベンチマークはフェアアンサンブルのさらなる研究に活用できる。私たちの知る限りでは、これはまだ文献で提示されていないアンサンブルにおける公正な構成に関する最初のかつ最大の研究の1つである。 Machine Learning (ML) software has been widely adopted in modern society, with reported fairness implications for minority groups based on race, sex, age, etc. Many recent works have proposed methods to measure and mitigate algorithmic bias in ML models. The existing approaches focus on single classifier-based ML models. However, real-world ML models are often composed of multiple independent or dependent learners in an ensemble (e.g., Random Forest), where the fairness composes in a non-trivial way. How does fairness compose in ensembles? What are the fairness impacts of the learners on the ultimate fairness of the ensemble? Can fair learners result in an unfair ensemble? Furthermore, studies have shown that hyperparameters influence the fairness of ML models. Ensemble hyperparameters are more complex since they affect how learners are combined in different categories of ensembles. 
Understanding the impact of ensemble hyperparameters on fairness will help programmers design fair ensembles. Today, we do not understand these fully for different ensemble algorithms. In this paper, we comprehensively study popular real-world ensembles: bagging, boosting, stacking and voting. We have developed a benchmark of 168 ensemble models collected from Kaggle on four popular fairness datasets. We use existing fairness metrics to understand the composition of fairness. Our results show that ensembles can be designed to be fairer without using mitigation techniques. We also identify the interplay between fairness composition and data characteristics to guide fair ensemble design. Finally, our benchmark can be leveraged for further research on fair ensembles. To the best of our knowledge, this is one of the first and largest studies on fairness composition in ensembles yet presented in the literature.	翻訳日:2023-04-27 17:58:45 公開日:2023-04-25
# ニューラルフーリエフィルタバンク Neural Fourier Filter Bank ( http://arxiv.org/abs/2212.01735v2 ) ライセンス: Link先を確認	Zhijie Wu and Yuhe Jin and Kwang Moo Yi	(参考訳) 本稿では, 効率的かつ高精細な再構成手法を提案する。ウェーブレットに触発されて、信号を空間的にも周波数的にも分解するニューラルフィールドを学習する。空間分解のための最近のグリッドベースのパラダイムに従っているが、既存の研究とは異なり、フーリエ特徴エンコーディングを通じて各グリッドに特定の周波数が格納されるよう促している。次に、正弦波活性化を持つ多層パーセプトロンを適用し、これらフーリエエンコードされた特徴を適切な層に入力することで、高周波数成分を低周波成分の上に順次蓄積し、それらを合計して最終的な出力を形成する。本手法は,2次元画像フィッティング,3次元形状再構成,ニューラル放射場など,複数のタスクにおけるモデルのコンパクトさと収束速度に関して最先端技術よりも優れていることを示す。私たちのコードはhttps://github.com/ubc-vision/NFFBで利用可能です。 We present a novel method to provide efficient and highly detailed reconstructions. Inspired by wavelets, we learn a neural field that decomposes the signal both spatially and frequency-wise. We follow the recent grid-based paradigm for spatial decomposition, but unlike existing work, encourage specific frequencies to be stored in each grid via Fourier feature encodings. We then apply a multi-layer perceptron with sine activations, taking these Fourier encoded features in at appropriate layers so that higher-frequency components are accumulated on top of lower-frequency components sequentially, which we sum up to form the final output. We demonstrate that our method outperforms the state of the art regarding model compactness and convergence speed on multiple tasks: 2D image fitting, 3D shape reconstruction, and neural radiance fields. Our code is available at https://github.com/ubc-vision/NFFB.	翻訳日:2023-04-27 17:58:05 公開日:2023-04-25
# ランク顕在化QRアルゴリズムを用いた貯水池計算のための時間シフト選択 Time-shift selection for reservoir computing using a rank-revealing QR algorithm ( http://arxiv.org/abs/2211.17095v3 ) ライセンス: Link先を確認	Joseph D. Hart and Francesco Sorrentino and Thomas L. Carroll	(参考訳) 出力層のみをトレーニングするリカレントニューラルネットワークパラダイムであるReservoir Computingは、非線形システムの予測や制御といったタスクにおいて、顕著なパフォーマンスを示している。近年,貯水池で発生した信号に時間シフトを加えることで,性能が向上することが実証された。本研究では,ランク顕在化QRアルゴリズムを用いて貯水池行列のランクを最大化することで時間シフトを選択する手法を提案する。この技術はタスク依存ではなく、システムのモデルを必要としないため、アナログハードウェア貯水池コンピュータに直接適用することができる。我々は,光電子発振器に基づくものと,$\tanh$活性化関数を持つ従来のリカレントネットワークという2種類のリザーバコンピュータで時間シフト選択手法を示す。この手法は,ランダムな時間シフト選択よりも,ほぼすべてのケースにおいて精度が向上することを見出した。 Reservoir computing, a recurrent neural network paradigm in which only the output layer is trained, has demonstrated remarkable performance on tasks such as prediction and control of nonlinear systems. Recently, it was demonstrated that adding time-shifts to the signals generated by a reservoir can provide large improvements in performance accuracy. In this work, we present a technique to choose the time-shifts by maximizing the rank of the reservoir matrix using a rank-revealing QR algorithm. This technique, which is not task dependent, does not require a model of the system, and therefore is directly applicable to analog hardware reservoir computers. We demonstrate our time-shift selection technique on two types of reservoir computer: one based on an opto-electronic oscillator and the traditional recurrent network with a $\tanh$ activation function. We find that our technique provides improved accuracy over random time-shift selection in essentially all cases.	翻訳日:2023-04-27 17:57:28 公開日:2023-04-25
# ボトル内の言語:解釈可能な画像分類のための言語モデルガイド型概念ボトルネック Language in a Bottle: Language Model Guided Concept Bottlenecks for Interpretable Image Classification ( http://arxiv.org/abs/2211.11158v2 ) ライセンス: Link先を確認	Yue Yang, Artemis Panagopoulou, Shenghao Zhou, Daniel Jin, Chris Callison-Burch, Mark Yatskar	(参考訳) 概念ボトルネックモデル(cbm)は本質的に解釈可能なモデルであり、モデル決定を人間の可読概念に分解する。これにより、モデルが失敗した理由を簡単に理解できるようになる。 CBMは手動で指定した概念を必要とし、しばしばブラックボックスの能力に劣る。まず,ブラックボックスモデルと同様の精度を手作業で指定することなく,高性能なcbmを構築する方法を示す。当社のアプローチであるlanguage guided bottlenecks(labo)は、言語モデルgpt-3を活用して、可能なボトルネックの大きな空間を定義します。問題領域が与えられた場合、LaBoはGPT-3を使用してカテゴリに関する事実文を生成し、候補概念を形成する。 laboは、識別的かつ多様な情報の選択を促進する新しいサブモジュラーユーティリティを通じて、可能なボトルネックを効率的に検索する。最終的に、GPT-3の知覚概念は、CLIPを使用して画像に整列してボトルネック層を形成することができる。実験により、LaBoは視覚認識にとって重要な概念の非常に効果的な事前であることが示された。 11の多様なデータセットによる評価では、LaBoボトルネックは数ショットの分類で優れており、1ショットでのブラックボックス線形プローブよりも11.7%正確で、より多くのデータに匹敵する。全体として、LaBoはブラックボックスアプローチよりも、本質的に解釈可能なモデルが、同じような、あるいはより良いパフォーマンスで広く適用可能であることを示した。 Concept Bottleneck Models (CBM) are inherently interpretable models that factor model decisions into human-readable concepts. They allow people to easily understand why a model is failing, a critical feature for high-stakes applications. CBMs require manually specified concepts and often under-perform their black box counterparts, preventing their broad adoption. We address these shortcomings and are first to show how to construct high-performance CBMs without manual specification of similar accuracy to black box models. Our approach, Language Guided Bottlenecks (LaBo), leverages a language model, GPT-3, to define a large space of possible bottlenecks. Given a problem domain, LaBo uses GPT-3 to produce factual sentences about categories to form candidate concepts. LaBo efficiently searches possible bottlenecks through a novel submodular utility that promotes the selection of discriminative and diverse information. Ultimately, GPT-3's sentential concepts can be aligned to images using CLIP, to form a bottleneck layer. 
Experiments demonstrate that LaBo is a highly effective prior for concepts important to visual recognition. In the evaluation with 11 diverse datasets, LaBo bottlenecks excel at few-shot classification: they are 11.7% more accurate than black box linear probes at 1 shot and comparable with more data. Overall, LaBo demonstrates that inherently interpretable models can be widely applied at similar, or better, performance than black box approaches.	翻訳日:2023-04-27 17:56:38 公開日:2023-04-25
# 魚眼画像の空間的統合に基づく人物再同定 Spatio-Visual Fusion-Based Person Re-Identification for Overhead Fisheye Images ( http://arxiv.org/abs/2212.11477v2 ) ライセンス: Link先を確認	Mertcan Cokbas, Prakash Ishwar, Janusz Konrad	(参考訳) パーソナライズ再識別(prid)は、様々なシーンをサイドマウントの直線レンズカメラで監視する典型的な監視シナリオで徹底的に研究されている。これまで魚眼カメラを頭上に搭載する手法は提案されておらず、性能に乏しい。この性能ギャップを解消するために,魚眼PRIDのための多機能フレームワークを提案する。魚眼PRIDデータセットであるFRIDAを用いた各種特徴組合せのためのフレームワークの性能評価を行った。提案手法は,近年の外観に基づくディープラーニング手法を約18%,位置ベース手法を約3%,マッチング精度を約3%向上させた。また,提案するpridフレームワークを,屋内の大規模密集した空間で数える人々に適用する可能性を示す。 Person re-identification (PRID) has been thoroughly researched in typical surveillance scenarios where various scenes are monitored by side-mounted, rectilinear-lens cameras. To date, few methods have been proposed for fisheye cameras mounted overhead and their performance is lacking. In order to close this performance gap, we propose a multi-feature framework for fisheye PRID where we combine deep-learning, color-based and location-based features by means of novel feature fusion. We evaluate the performance of our framework for various feature combinations on FRIDA, a public fisheye PRID dataset. The results demonstrate that our multi-feature approach outperforms recent appearance-based deep-learning methods by almost 18% points and location-based methods by almost 3% points in matching accuracy. We also demonstrate the potential application of the proposed PRID framework to people counting in large, crowded indoor spaces.	翻訳日:2023-04-27 17:47:34 公開日:2023-04-25
# サブガウス分布の高速, サンプル効率, アフィン不変プライベート平均と共分散推定 Fast, Sample-Efficient, Affine-Invariant Private Mean and Covariance Estimation for Subgaussian Distributions ( http://arxiv.org/abs/2301.12250v2 ) ライセンス: Link先を確認	Gavin Brown, Samuel B. Hopkins and Adam Smith	(参考訳) ほぼ最適なサンプル複雑性を持つ、高次元の共分散を考慮した平均推定のための高速な差分プライベートアルゴリズムを提案する。これまでこの保証を達成するのは指数時間推定器のみであった。未知の平均 $\mu$ と共分散 $\Sigma$ を持つ(サブ)ガウス分布から $n$ 個のサンプルが与えられると、我々の $(\varepsilon,\delta)$-差分プライベート推定器は、$n \gtrsim \tfrac d {\alpha^2} + \tfrac{d \sqrt{\log 1/\delta}}{\alpha \varepsilon}+\frac{d\log 1/\delta}{\varepsilon}$ である限り $\|\mu - \tilde{\mu}\|_{\Sigma} \leq \alpha$ を満たす $\tilde{\mu}$ を生成する。 マハラノビス誤差 $\|\mu - \hat{\mu}\|_{\Sigma}$ は、$\Sigma$ を基準として $\hat \mu$ と $\mu$ の間の距離を測定し、サンプル平均の誤差を特徴付ける。我々のアルゴリズムは時間 $\tilde{O}(nd^{\omega - 1} + nd/\varepsilon)$ で動き、$\omega < 2.38$ は行列乗算指数である。 Brown, Gaboardi, Smith, Ullman, Zakynthinou (2021) の指数時間アプローチを適用し,安定平均と共分散推定サブルーチンの効率的な変種を与え,サンプルの複雑さを上述のほぼ最適な境界まで改善した。安定共分散推定器は制限のないサブガウス分布のプライベート共分散推定に変換できる。 $n\gtrsim d^{3/2}$ サンプルでは、スペクトルノルムで推定が正確である。これは $n= o(d^2)$ サンプルを用いた最初のそのようなアルゴリズムであり、Alabi ら (2022) が提起した未解決問題に答えるものである。 $n\gtrsim d^2$ サンプルでは、この推定はフロベニウスノルムで正確である。これにより、全変動(TV)距離における制限のないガウス分布のプライベート学習のための高速でほぼ最適なアルゴリズムが導かれる。 Duchi, Haque, Kuditipudi (2023) も同様の結果を独立かつ同時期に得ている。 We present a fast, differentially private algorithm for high-dimensional covariance-aware mean estimation with nearly optimal sample complexity. Only exponential-time estimators were previously known to achieve this guarantee. Given $n$ samples from a (sub-)Gaussian distribution with unknown mean $\mu$ and covariance $\Sigma$, our $(\varepsilon,\delta)$-differentially private estimator produces $\tilde{\mu}$ such that $\|\mu - \tilde{\mu}\|_{\Sigma} \leq \alpha$ as long as $n \gtrsim \tfrac d {\alpha^2} + \tfrac{d \sqrt{\log 1/\delta}}{\alpha \varepsilon}+\frac{d\log 1/\delta}{\varepsilon}$. The Mahalanobis error metric $\|\mu - \hat{\mu}\|_{\Sigma}$ measures the distance between $\hat \mu$ and $\mu$ relative to $\Sigma$; it characterizes the error of the sample mean.
Our algorithm runs in time $\tilde{O}(nd^{\omega - 1} + nd/\varepsilon)$, where $\omega < 2.38$ is the matrix multiplication exponent. We adapt an exponential-time approach of Brown, Gaboardi, Smith, Ullman, and Zakynthinou (2021), giving efficient variants of stable mean and covariance estimation subroutines that also improve the sample complexity to the nearly optimal bound above. Our stable covariance estimator can be turned to private covariance estimation for unrestricted subgaussian distributions. With $n\gtrsim d^{3/2}$ samples, our estimate is accurate in spectral norm. This is the first such algorithm using $n= o(d^2)$ samples, answering an open question posed by Alabi et al. (2022). With $n\gtrsim d^2$ samples, our estimate is accurate in Frobenius norm. This leads to a fast, nearly optimal algorithm for private learning of unrestricted Gaussian distributions in TV distance. Duchi, Haque, and Kuditipudi (2023) obtained similar results independently and concurrently.	翻訳日:2023-04-27 17:40:07 公開日:2023-04-25
# 非線型対流拡散吸着系の効率的なハイブリッドモデリングと吸着モデル発見--系統的科学的機械学習アプローチ Efficient hybrid modeling and sorption model discovery for non-linear advection-diffusion-sorption systems: A systematic scientific machine learning approach ( http://arxiv.org/abs/2303.13555v3 ) ライセンス: Link先を確認	Vinicius V. Santana, Erbet Costa, Carine M. Rebello, Ana Mafalda Ribeiro, Chris Rackauckas, Idelfonso B. R. Nogueira	(参考訳) 本研究では,非線型対流拡散吸着系における効率的なハイブリッドモデルの作成と吸着取り込みモデル発見のための機械学習手法を提案する。これは、勾配に基づく最適化器、随伴感度解析、JITコンパイルベクタージャコビアン積を用いて、空間離散化と適応積分器を組み合わせたこれらの複雑なシステムを効果的に訓練する方法を示す。ニューラルネットワークの欠落する機能を特定するためにスパースとシンボリックレグレッションが用いられた。提案手法のロバスト性は, 固定層吸着のノイズ破砕曲線観測のシリカ内データセット上で試験され, 良好なハイブリッドモデルが得られた。本研究は, 偏差とシンボリック回帰を用いて吸収吸収速度論を再構成し, 同定多項式を用いたブレークスルー曲線を精度良く予測し, 吸着運動法則構造の発見のためのフレームワークの可能性を強調した。 This study presents a systematic machine learning approach for creating efficient hybrid models and discovering sorption uptake models in non-linear advection-diffusion-sorption systems. It demonstrates an effective method to train these complex systems using gradient based optimizers, adjoint sensitivity analysis, and JIT-compiled vector Jacobian products, combined with spatial discretization and adaptive integrators. Sparse and symbolic regression were employed to identify missing functions in the artificial neural network. The robustness of the proposed method was tested on an in-silico data set of noisy breakthrough curve observations of fixed-bed adsorption, resulting in a well-fitted hybrid model. The study successfully reconstructed sorption uptake kinetics using sparse and symbolic regression, and accurately predicted breakthrough curves using identified polynomials, highlighting the potential of the proposed framework for discovering sorption kinetic law structures.	翻訳日:2023-04-27 17:31:47 公開日:2023-04-25
# 規制市場:AIガバナンスの未来 Regulatory Markets: The Future of AI Governance ( http://arxiv.org/abs/2304.04914v4 ) ライセンス: Link先を確認	Gillian K. Hadfield, Jack Clark	(参考訳) 人工知能を適切に規制することは、ますます緊急性を増す政策課題である。立法府や規制当局は、公共の要求を法的要件へと最適に落とし込むために必要な専門知識を欠いている。産業界の自主規制に過度に依存すると、AIシステムの生産者と利用者に民主的要求への説明責任を負わせることができない。そこで、政府が規制対象に対して民間の規制機関から規制サービスを購入するよう求める「規制市場」が提案されている。AI規制に対するこのアプローチは、指揮統制型規制と自主規制の両方の限界を克服する可能性がある。規制市場により、政府はAI規制の政策上の優先順位を定めつつ、政策立案者が掲げる目的を最もよく達成する規制手法の開拓を市場の力と産業界の研究開発に委ねることができる。 Appropriately regulating artificial intelligence is an increasingly urgent policy challenge. Legislatures and regulators lack the specialized knowledge required to best translate public demands into legal requirements. Overreliance on industry self-regulation fails to hold producers and users of AI systems accountable to democratic demands. Regulatory markets, in which governments require the targets of regulation to purchase regulatory services from a private regulator, are proposed. This approach to AI regulation could overcome the limitations of both command-and-control regulation and self-regulation. Regulatory markets could enable governments to establish policy priorities for the regulation of AI, whilst relying on market forces and industry R&D efforts to pioneer the methods of regulation that best achieve policymakers' stated objectives.	翻訳日:2023-04-27 17:21:54 公開日:2023-04-25
# 木構造Parzen推定器:アルゴリズム成分の理解と実験性能向上のための役割 Tree-structured Parzen estimator: Understanding its algorithm components and their roles for better empirical performance ( http://arxiv.org/abs/2304.11127v2 ) ライセンス: Link先を確認	Shuhei Watanabe	(参考訳) 多くの領域における最近の進歩は、より複雑な実験設計を必要とする。このような複雑な実験は、しばしばパラメータチューニングを必要とする多くのパラメータを持つ。ベイズ最適化手法であるTPE(Tree-structured Parzen estimator)は,最近のパラメータチューニングフレームワークで広く利用されている。その人気にもかかわらず、制御パラメータとアルゴリズム直観の役割については議論されていない。本チュートリアルでは,多種多様なベンチマークを用いて,各制御パラメータの役割とハイパーパラメータ最適化への影響を明らかにする。アブレーション研究から得られた推奨設定とベースライン手法を比較し,提案設定がTPEの性能を向上させることを示す。 tpeの実装はhttps://github.com/nabenabe0928/tpe/tree/single-optで利用可能です。 Recent advances in many domains require more and more complicated experiment design. Such complicated experiments often have many parameters, which necessitate parameter tuning. Tree-structured Parzen estimator (TPE), a Bayesian optimization method, is widely used in recent parameter tuning frameworks. Despite its popularity, the roles of each control parameter and the algorithm intuition have not been discussed so far. In this tutorial, we will identify the roles of each control parameter and their impacts on hyperparameter optimization using a diverse set of benchmarks. We compare our recommended setting drawn from the ablation study with baseline methods and demonstrate that our recommended setting improves the performance of TPE. Our TPE implementation is available at https://github.com/nabenabe0928/tpe/tree/single-opt.	翻訳日:2023-04-27 17:02:23 公開日:2023-04-25
# 医用画像解析のためのSegment Anything Model: 実験的検討 Segment Anything Model for Medical Image Analysis: an Experimental Study ( http://arxiv.org/abs/2304.10517v2 ) ライセンス: Link先を確認	Maciej A. Mazurowski, Haoyu Dong, Hanxue Gu, Jichen Yang, Nicholas Konz, Yixin Zhang	(参考訳) 医用画像のセグメンテーションモデルの学習は、データアノテーションの入手可能性が限られ、取得コストも高いため、依然として困難である。Segment Anything Model (SAM)は、主に自然画像を対象とした10億以上のアノテーションで学習された基盤モデルであり、ユーザが指定した関心対象を対話的にセグメント化することを目的としている。自然画像での印象的な性能にもかかわらず、医用画像領域に移行した際にモデルがどのような影響を受けるかは不明である。本稿では,様々なモダリティと解剖学的部位にわたる11の医用画像データセットを用いて,SAMの医用画像セグメンテーション能力を広範に評価する。実験では,対話的セグメンテーションをシミュレートする標準的な手法を用いて点プロンプトを生成した。実験の結果,単一プロンプトに基づくSAMの性能は、IoUで評価して脊椎MRIデータセットの0.1135から股関節X線データセットの0.8650まで、タスクやデータセットによって大きく異なることがわかった。輪郭が明瞭な対象と曖昧さのないプロンプトを伴うタスクでは性能が高く、腫瘍のセグメンテーションなど他の多くのシナリオでは性能が低い傾向がある。複数のプロンプトを与えた場合、性能の向上は全体としてはわずかであるが、対象が連続していないデータセットでは向上幅がより大きい。RITMとの追加比較では、プロンプトが1つの場合はSAMがはるかに優れた性能を示したが、プロンプト数が多い場合は両手法の性能は同程度であった。結論として、SAMはゼロショット学習の設定において一部のデータセットで印象的な性能を示す一方、他の複数のデータセットでは性能が低いか中程度にとどまる。SAMはモデルとしても学習パラダイムとしても医用画像領域に影響を与える可能性があるが、この領域に適応させる適切な方法を見出すには広範な研究が必要である。 Training segmentation models for medical images continues to be challenging due to the limited availability and acquisition expense of data annotations. Segment Anything Model (SAM) is a foundation model trained on over 1 billion annotations, predominantly for natural images, that is intended to be able to segment the user-defined object of interest in an interactive manner. Despite its impressive performance on natural images, it is unclear how the model is affected when shifting to medical image domains. Here, we perform an extensive evaluation of SAM's ability to segment medical images on a collection of 11 medical imaging datasets from various modalities and anatomies. In our experiments, we generated point prompts using a standard method that simulates interactive segmentation. 
Experimental results show that SAM's performance based on single prompts highly varies depending on the task and the dataset, i.e., from 0.1135 for a spine MRI dataset to 0.8650 for a hip x-ray dataset, evaluated by IoU. Performance appears to be high for tasks including well-circumscribed objects with unambiguous prompts and poorer in many other scenarios such as segmentation of tumors. When multiple prompts are provided, performance improves only slightly overall, but more so for datasets where the object is not contiguous. An additional comparison to RITM showed a much better performance of SAM for one prompt but a similar performance of the two methods for a larger number of prompts. We conclude that SAM shows impressive performance for some datasets given the zero-shot learning setup but poor to moderate performance for multiple other datasets. While SAM as a model and as a learning paradigm might be impactful in the medical imaging domain, extensive research is needed to identify the proper ways of adapting it in this domain.	翻訳日:2023-04-27 17:01:48 公開日:2023-04-25
# ロボット脳としてのLLM : エゴセントリック記憶と制御の統合 LLM as A Robotic Brain: Unifying Egocentric Memory and Control ( http://arxiv.org/abs/2304.09349v2 ) ライセンス: Link先を確認	Jinjie Mai, Jun Chen, Bing Li, Guocheng Qian, Mohamed Elhoseiny, Bernard Ghanem	(参考訳) embodied aiは、物理的または仮想の体型(つまりロボット)を持ち、環境と動的に相互作用できるインテリジェントなシステムの研究と開発に焦点を当てている。メモリと制御は、具体化されたシステムの2つの重要な部分であり、通常、それぞれをモデル化するために別々のフレームワークが必要です。本稿では,ロボット脳として大規模言語モデルを用いて自己中心記憶と制御を統一する,llm-brainと呼ばれる新しい汎用フレームワークを提案する。 LLM-Brainフレームワークは、ゼロショット学習アプローチを利用して、ロボットタスクのための複数のマルチモーダル言語モデルを統合する。 LLM-Brain内の全てのコンポーネントは、認識、計画、制御、記憶を含む閉ループ多ラウンド対話において自然言語を用いて通信する。システムのコアは、エゴセントリックメモリを維持し、ロボットを制御するための具体化されたllmである。 LLM-Brainは,アクティブ探索と具体的質問応答という,下流の2つの課題を調べることで実証する。アクティブな探索タスクでは、ロボットは限られた数のアクションで未知の環境を広範囲に探索する必要がある。一方、具体的質問応答タスクでは、ロボットが事前探索中に得られた観察に基づいて質問に答える必要がある。 Embodied AI focuses on the study and development of intelligent systems that possess a physical or virtual embodiment (i.e. robots) and are able to dynamically interact with their environment. Memory and control are the two essential parts of an embodied system and usually require separate frameworks to model each of them. In this paper, we propose a novel and generalizable framework called LLM-Brain: using Large-scale Language Model as a robotic brain to unify egocentric memory and control. The LLM-Brain framework integrates multiple multimodal language models for robotic tasks, utilizing a zero-shot learning approach. All components within LLM-Brain communicate using natural language in closed-loop multi-round dialogues that encompass perception, planning, control, and memory. The core of the system is an embodied LLM to maintain egocentric memory and control the robot. We demonstrate LLM-Brain by examining two downstream tasks: active exploration and embodied question answering. The active exploration tasks require the robot to extensively explore an unknown environment within a limited number of actions. 
Meanwhile, the embodied question answering tasks necessitate that the robot answers questions based on observations acquired during prior explorations.	翻訳日:2023-04-27 16:59:59 公開日:2023-04-25
# 強化学習に基づく制御器に対するモデル抽出攻撃 Model Extraction Attacks Against Reinforcement Learning Based Controllers ( http://arxiv.org/abs/2304.13090v1 ) ライセンス: Link先を確認	Momina Sajid, Yanning Shen, Yasser Shoukry	(参考訳) 本稿では,攻撃者がシステムのフィードバックコントローラを推定(あるいは抽出)しようとするサイバー物理システムにおけるモデル抽出攻撃の問題を紹介する。コントローラの抽出(または推定)は、システムの将来の制御アクションを予測し、それに応じて攻撃を計画できるため、攻撃者に対して一致しないエッジを提供する。したがって、攻撃者がそのような攻撃を行う能力を理解することが重要である。本稿では,Reinforcement Learning (RL)アルゴリズムを用いてディープニューラルネットワーク(DNN)コントローラをトレーニングし,確率的システムを制御する際の設定に焦点を当てる。我々は、そのような未知のDNNコントローラを推定することを目的とした攻撃者の役割を担い、二相アルゴリズムを提案する。オフラインフェーズとも呼ばれる第1フェーズでは、攻撃者はRL-リワード関数とシステムダイナミクスに関するサイドチャネル情報を使用して、未知のDNNの候補推定セットを特定する。オンラインフェーズとも呼ばれる第2フェーズでは、攻撃者は未知のDNNの行動を観察し、これらの観察を使用して最終的なポリシー推定のセットをショートリスト化する。未知のDNNと推定したDNNの誤差を理論的に解析する。また,提案アルゴリズムの有効性を示す数値的な結果も提供する。 We introduce the problem of model-extraction attacks in cyber-physical systems in which an attacker attempts to estimate (or extract) the feedback controller of the system. Extracting (or estimating) the controller provides an unmatched edge to attackers since it allows them to predict the future control actions of the system and plan their attack accordingly. Hence, it is important to understand the ability of the attackers to perform such an attack. In this paper, we focus on the setting when a Deep Neural Network (DNN) controller is trained using Reinforcement Learning (RL) algorithms and is used to control a stochastic system. We play the role of the attacker that aims to estimate such an unknown DNN controller, and we propose a two-phase algorithm. In the first phase, also called the offline phase, the attacker uses side-channel information about the RL-reward function and the system dynamics to identify a set of candidate estimates of the unknown DNN. In the second phase, also called the online phase, the attacker observes the behavior of the unknown DNN and uses these observations to shortlist the set of final policy estimates. 
We provide theoretical analysis of the error between the unknown DNN and the estimated one. We also provide numerical results showing the effectiveness of the proposed algorithm.	翻訳日:2023-04-27 16:54:33 公開日:2023-04-25
# 目的:自己監督目標が視覚トランスフォーマー表現に与える影響を理解すること Objectives Matter: Understanding the Impact of Self-Supervised Objectives on Vision Transformer Representations ( http://arxiv.org/abs/2304.13089v1 ) ライセンス: Link先を確認	Shashank Shekhar, Florian Bordes, Pascal Vincent, Ari Morcos	(参考訳) 共同学習(例: simclr, moco, dino)と再構成学習(例: beit, simmim, mae)は視覚トランスフォーマーの自己教師付き学習のための2つの主要なパラダイムであるが、それらは転送性能において大きく異なる。本稿では,これらの目的が学習表現の構造と伝達性に与える影響を分析することにより,これらの違いを説明することを目的とする。分析の結果,リコンストラクションに基づく学習機能は,共同インベディングに基づく学習機能とは大きく異なっており,類似した目的を持ったモデルでは,アーキテクチャ全体でも類似した機能を学習できることが判明した。これらの違いはネットワークの初期に発生し、主に注目層と正規化層によって引き起こされる。異なる目的が異なる情報分布と学習表現の不変性を駆動するため,ジョイントエンベディング特徴は分類のためのより良い線形プローブ移動をもたらすことがわかった。これらの違いは、機能に空間的特異性を必要とする下流タスクの転送性能の逆の傾向を説明する。最後に, 微調整による再構成表現が, より優れた伝達を可能にすること, 微調整による情報再構成が, 事前訓練された関節埋め込みモデルとよりよく似たものになることを示す。 Joint-embedding based learning (e.g., SimCLR, MoCo, DINO) and reconstruction-based learning (e.g., BEiT, SimMIM, MAE) are the two leading paradigms for self-supervised learning of vision transformers, but they differ substantially in their transfer performance. Here, we aim to explain these differences by analyzing the impact of these objectives on the structure and transferability of the learned representations. Our analysis reveals that reconstruction-based learning features are significantly dissimilar to joint-embedding based learning features and that models trained with similar objectives learn similar features even across architectures. These differences arise early in the network and are primarily driven by attention and normalization layers. We find that joint-embedding features yield better linear probe transfer for classification because the different objectives drive different distributions of information and invariances in the learned representation. These differences explain opposite trends in transfer performance for downstream tasks that require spatial specificity in features. 
Finally, we address how fine-tuning changes reconstructive representations to enable better transfer, showing that fine-tuning re-organizes the information to be more similar to pre-trained joint embedding models.	翻訳日:2023-04-27 16:54:11 公開日:2023-04-25
# 新興技術の組織的ガバナンス - 医療におけるAI導入 Organizational Governance of Emerging Technologies: AI Adoption in Healthcare ( http://arxiv.org/abs/2304.13081v1 ) ライセンス: Link先を確認	Jee Young Kim, William Boag, Freya Gulamali, Alifia Hasan, Henry David Jeffry Hogg, Mark Lifson, Deirdre Mulligan, Manesh Patel, Inioluwa Deborah Raji, Ajai Sehgal, Keo Shaw, Danny Tobey, Alexandra Valladares, David Vidal, Suresh Balu, Mark Sendak	(参考訳) 民間および公共セクターの構造と規範は、新しい技術が実際にどのように使われているかを洗練している。医療分野では、AIの採用が急増しているにもかかわらず、その利用と統合を取り巻く組織ガバナンスはしばしば理解されていない。この研究でHealth AI Partnership(HAIP)が目指すのは、医療設定におけるAIシステムの適切な組織的ガバナンスの要件をより適切に定義し、ヘルスシステムリーダを支援して、AIの採用に関するより詳細な決定を行うことだ。この理解に向けて、私たちはまず、医療におけるAI採用の標準をどのように設計して、簡単かつ効率的に使用できるかを特定する。次に、特定医療システムにおけるAI技術の実践的導入に関わる、正確な決定ポイントを図示する。実際に、米国の主要医療機関のリーダーと関連する分野の重要情報提供者との複数組織的なコラボレーションを通じて、これを達成します。コンサルタントのIDEO.orgを使って、医療やAI倫理の専門家とユーザビリティテストのセッションを行うことができた。ユーザビリティ分析では、組織リーダが技術導入にアプローチする方法に合わせて、モックの重要な決定ポイントを中心に構成されたプロトタイプが明らかになった。同時に,医療関連分野の専門家89人と半構造化インタビューを行った。修正された基盤理論アプローチを使用して、AI導入ライフサイクルを通じて8つの重要な決定ポイントと包括的な手順を特定できた。これは、米国の医療システムによるAI導入に関わる、現在のガバナンス構造とプロセスに関する、最も詳細な定性的な分析の1つである。これらの発見が、医療における新興テクノロジーの安全で効果的で責任ある採用を促進する能力を構築するための将来の取り組みを知らせてくれることを期待している。 Private and public sector structures and norms refine how emerging technology is used in practice. In healthcare, despite a proliferation of AI adoption, the organizational governance surrounding its use and integration is often poorly understood. What the Health AI Partnership (HAIP) aims to do in this research is to better define the requirements for adequate organizational governance of AI systems in healthcare settings and support health system leaders to make more informed decisions around AI adoption. To work towards this understanding, we first identify how the standards for the AI adoption in healthcare may be designed to be used easily and efficiently. Then, we map out the precise decision points involved in the practical institutional adoption of AI technology within specific health systems. 
Practically, we achieve this through a multi-organizational collaboration with leaders from major health systems across the United States and key informants from related fields. Working with the consultancy IDEO.org, we were able to conduct usability-testing sessions with healthcare and AI ethics professionals. Usability analysis revealed a prototype structured around mock key decision points that align with how organizational leaders approach technology adoption. Concurrently, we conducted semi-structured interviews with 89 professionals in healthcare and other relevant fields. Using a modified grounded theory approach, we were able to identify 8 key decision points and comprehensive procedures throughout the AI adoption lifecycle. This is one of the most detailed qualitative analyses to date of the current governance structures and processes involved in AI adoption by health systems in the United States. We hope these findings can inform future efforts to build capabilities to promote the safe, effective, and responsible adoption of emerging technologies in healthcare.	翻訳日:2023-04-27 16:53:46 公開日:2023-04-25
# iMixer:階層型のHopfieldネットワークは、可逆的で暗黙的で反復的なMLP-Mixerを意味する iMixer: hierarchical Hopfield network implies an invertible, implicit and iterative MLP-Mixer ( http://arxiv.org/abs/2304.13061v1 ) ライセンス: Link先を確認	Toshihiro Ota, Masato Taki	(参考訳) ここ数年、コンピュータビジョンにおけるトランスフォーマーの成功は、MLP-Mixerのようなトランスフォーマーと競合する多くの代替モデルの発見を刺激してきた。弱い誘導バイアスにもかかわらず、これらのモデルはよく研究された畳み込みニューラルネットワークに匹敵する性能を達成した。最近のホップフィールドネットワークの研究は、あるエネルギーベースの連想メモリモデルとトランスフォーマーまたはMLPミクサーの対応を示唆し、トランスフォーマー型アーキテクチャの設計の理論的背景に光を当てた。本稿では,最近導入された階層型ホップフィールドネットワークへの対応を一般化し,新しいMLP-Mixerモデルの一般化であるiMixerを求める。通常のフィードフォワードニューラルネットワークとは異なり、iMixerは出力側から入力側へ前進するMLP層を含んでいる。モジュールを可逆的,暗黙的,反復的混合モジュールの例として特徴づける。画像分類タスクの様々なデータセットを用いてモデル性能を評価し,ベースラインのバニラMLP-Mixerと比較して,iMixerが合理的に改善できることを確認した。その結果、ホップフィールドネットワークとミキサーモデルとの対応は、より広い種類のトランスフォーマーライクなアーキテクチャ設計を理解するための原則であることが示唆された。 In the last few years, the success of Transformers in computer vision has stimulated the discovery of many alternative models that compete with Transformers, such as the MLP-Mixer. Despite their weak induced bias, these models have achieved performance comparable to well-studied convolutional neural networks. Recent studies on modern Hopfield networks suggest the correspondence between certain energy-based associative memory models and Transformers or MLP-Mixer, and shed some light on the theoretical background of the Transformer-type architectures design. In this paper we generalize the correspondence to the recently introduced hierarchical Hopfield network, and find iMixer, a novel generalization of MLP-Mixer model. Unlike ordinary feedforward neural networks, iMixer involves MLP layers that propagate forward from the output side to the input side. We characterize the module as an example of invertible, implicit, and iterative mixing module. We evaluate the model performance with various datasets on image classification tasks, and find that iMixer reasonably achieves the improvement compared to the baseline vanilla MLP-Mixer. 
The results imply that the correspondence between the Hopfield networks and the Mixer models serves as a principle for understanding a broader class of Transformer-like architecture designs.	翻訳日:2023-04-27 16:53:23 公開日:2023-04-25
# just構造に関する事前学習 : トランスファー学習による言語帰納的バイアスの理解 Pretrain on just structure: Understanding linguistic inductive biases using transfer learning ( http://arxiv.org/abs/2304.13060v1 ) ライセンス: Link先を確認	Isabel Papadimitriou and Dan Jurafsky	(参考訳) 人間とトランスフォーマーの両方の言語モデルは、明示的な構造的監督なしに言語を学べる。この学習を可能にする帰納的学習バイアスは何か? 本研究では,人工構造データへの事前学習による構造バイアスを伴う言語モデルの提案と,英語の微調整による評価により,異なる帰納的学習バイアスの効果について検討する。実験的なセットアップにより、言語モデルの帰納バイアスを積極的に制御できるようになります。実験では,3種類の帰納バイアスの比較成功について検討した。 1)帰納的階層的処理のための帰納的バイアス 2)文脈自由文法でモデル化できない制約のないトークン分岐依存性に対する帰納的バイアス 3) zipfian power-law vocabulary distribution に対する帰納的バイアス。複雑なトークン-トークン間の相互作用が最高の帰納バイアスを形成し、非文脈自由の場合ではこれが最強であることを示す。また、Zipf の語彙分布は文法構造とは独立に優れた帰納的バイアスを形成することを示す。本研究は,人間では実行できない制御型言語学習実験を行うトランスフォーマーモデルの能力を活用して,人間と機械の両方で言語学習を促進する構造に関する仮説を提示する。 Both humans and transformer language models are able to learn language without explicit structural supervision. What inductive learning biases make this learning possible? In this study, we examine the effect of different inductive learning biases by predisposing language models with structural biases through pretraining on artificial structured data, and then evaluating by fine-tuning on English. Our experimental setup gives us the ability to actively control the inductive bias of language models. With our experiments, we investigate the comparative success of three types of inductive bias: 1) an inductive bias for recursive, hierarchical processing 2) an inductive bias for unrestricted token-token dependencies that can't be modeled by context-free grammars, and 3) an inductive bias for a Zipfian power-law vocabulary distribution. We show that complex token-token interactions form the best inductive biases, and that this is strongest in the non-context-free case. We also show that a Zipfian vocabulary distribution forms a good inductive bias independently from grammatical structure. 
Our study leverages the capabilities of transformer models to run controlled language learning experiments that are not possible to run in humans, and surfaces hypotheses about the structures that facilitate language learning in both humans and machines.	翻訳日:2023-04-27 16:53:00 公開日:2023-04-25
# 摂動散乱におけるエントロピー成長について On Entropy Growth in Perturbative Scattering ( http://arxiv.org/abs/2304.13052v1 ) ライセンス: Link先を確認	Clifford Cheung, Temple He, Allic Sivaramakrishnan	(参考訳) 熱力学第2法則に着想を得て,二部系における積状態の動的なユニタリ発展によって生じる部分系エントロピーの変化を考察する。摂動的相互作用の主要次数において、部分系が等確率な状態の統計的混合として初期化されている限り、部分系の量子$n$-Tsallisエントロピーは決して減少しない、すなわち$\Delta S_n \geq 0$であることを証明する。これは任意の相互作用の選択と、相補的な部分系の任意の初期化に対して成り立つ。初期状態に関するこの条件が破られる場合、部分系エントロピーを減少させる($\Delta S_n < 0$)「マクスウェルの悪魔」過程を常に明示的に構成できる。注目すべきことに、粒子散乱の場合、$n$-Tsallisエントロピーに対応する回路図は、現代の散乱振幅プログラムに現れるオンシェル図と同じであり、$\Delta S_n \geq 0$は断面積の非負性と密接に関係している。 Inspired by the second law of thermodynamics, we study the change in subsystem entropy generated by dynamical unitary evolution of a product state in a bipartite system. Working at leading order in perturbative interactions, we prove that the quantum $n$-Tsallis entropy of a subsystem never decreases, $\Delta S_n \geq 0$, provided that the subsystem is initialized as a statistical mixture of states of equal probability. This is true for any choice of interactions and any initialization of the complementary subsystem. When this condition on the initial state is violated, it is always possible to explicitly construct a "Maxwell's demon" process that decreases the subsystem entropy, $\Delta S_n < 0$. Remarkably, for the case of particle scattering, the circuit diagrams corresponding to $n$-Tsallis entropy are the same as the on-shell diagrams that have appeared in the modern scattering amplitudes program, and $\Delta S_n \geq 0$ is intimately related to the nonnegativity of cross-sections.	翻訳日:2023-04-27 16:52:40 公開日:2023-04-25
# GULP: SMS通知と機械学習画像処理を備えたソーラーパワーのスマートごみ分別ビン GULP: Solar-Powered Smart Garbage Segregation Bins with SMS Notification and Machine Learning Image Processing ( http://arxiv.org/abs/2304.13040v1 ) ライセンス: Link先を確認	Jerome B. Sigongan, Hamer P. Sinodlay, Shahida Xerxy P. Cuizon, Joanna S. Redondo, Maricel G. Macapulay, Charlene O. Bulahan-Undag and Kenn Migan Vincent C. Gumonan	(参考訳) 本研究は, 固形廃棄物をそれぞれの容器に分別するスマートビンを構築することを目的とする。あわせて、廃棄物管理プロセスをエンドユーザーにとってより興味深いものにすること、スマートビンの回収が必要になったときに作業員へ通知すること、再生可能な太陽エネルギー源を利用して環境にやさしいスマートビンを推進することを目指す。研究者らは、チームが作業負荷をうまく管理し、割り当てられた予算内に収めながら最高品質の製品を作ることができるため、アジャイル開発アプローチを採用した。6つの基本フェーズは、計画、設計、開発、テスト、リリース、フィードバックである。ISO/IEC 25010に基づく総合的な品質テストは肯定的な結果となった。全体の平均は4.55で、これは「優秀」と解釈される。さらに、このアプリケーションは太陽エネルギー源だけで独立して動作できる。ユーザは、その興味深い仕組みを通じて廃棄物処理の全過程を楽しむことができた。以上の結果から, ごみの量が最大に達したときに圧縮してより多くのごみを収容できるよう、圧縮機の追加が推奨される。同時に複数のごみを判別するアルゴリズムも推奨される。ソーラートラッカーをソーラーパネルと組み合わせることで、スマートビンが得る再生可能エネルギーを増やすことができる。 This study aims to build a smart bin that segregates solid waste into its respective bins. It also aims to make the waste management process more interesting for end-users, to notify utility staff when the smart bin needs to be unloaded, and to encourage environment-friendly operation by powering the smart bin from a renewable solar energy source. The researchers employed an Agile development approach because it enables teams to manage their workloads successfully and create the highest-quality product while staying within their allocated budget. The six fundamental phases are planning, design, development, test, release, and feedback. Overall quality testing through the ISO/IEC 25010 evaluation yielded a positive outcome. The overall average was 4.55, which is verbally interpreted as excellent. Additionally, the application can run independently on its solar energy source. Users were able to enjoy the whole process of waste disposal through its interesting mechanisms. 
Based on the findings, a compressor is recommended to compress the trash when the trash level reaches its maximum, making room for more garbage. An algorithm that can identify multiple pieces of garbage at a time is also recommended. Adding a solar tracker coupled with the solar panel would help produce more renewable energy for the smart bin.	翻訳日:2023-04-27 16:52:19 公開日:2023-04-25
# Raspberry Piのディープラーニングモデル最適化 Optimizing Deep Learning Models For Raspberry Pi ( http://arxiv.org/abs/2304.13039v1 ) ライセンス: Link先を確認	Salem Ameen and Kangaranmulle Siriwardana and Theo Theodoridis	(参考訳) ディープラーニングモデルは、コンピュータビジョン、自然言語処理、音声認識など、幅広いアプリケーションで広く普及しています。しかし、これらのモデルは通常、大量の計算リソースを必要とするため、raspberry piのような低消費電力デバイスでの実行は困難である。この課題に対処する1つのアプローチは、プルーニング技術を使用してディープラーニングモデルのサイズを減らすことだ。プルーニングは、重要でない重みと接続をモデルから取り除き、より小さく、より効率的なモデルをもたらす。プルーニングはトレーニング中またはモデルがトレーニングされた後に行うことができる。もう1つのアプローチは、特にraspberry piアーキテクチャのためにディープラーニングモデルを最適化することです。これには、モデルのアーキテクチャとパラメータを最適化して、CPUやGPUなどのRaspberry Piのハードウェア機能を活用することが含まれる。さらに、モデルに必要な計算量を最小化することで、エネルギー効率に最適化することができる。 raspberry pi用のディープラーニングモデルのプルーニングと最適化は、低消費電力デバイスの計算とエネルギーの制約を克服する上で有効であり、幅広いデバイスでディープラーニングモデルを実行できる。以下の節では、これらのアプローチをさらに詳細に検討し、Raspberry Piのディープラーニングモデルを最適化する効果について論じる。 Deep learning models have become increasingly popular for a wide range of applications, including computer vision, natural language processing, and speech recognition. However, these models typically require large amounts of computational resources, making them challenging to run on low-power devices such as the Raspberry Pi. One approach to addressing this challenge is to use pruning techniques to reduce the size of the deep learning models. Pruning involves removing unimportant weights and connections from the model, resulting in a smaller and more efficient model. Pruning can be done during training or after the model has been trained. Another approach is to optimize the deep learning models specifically for the Raspberry Pi architecture. This can include optimizing the model's architecture and parameters to take advantage of the Raspberry Pi's hardware capabilities, such as its CPU and GPU. Additionally, the model can be optimized for energy efficiency by minimizing the amount of computation required. 
Pruning and optimizing deep learning models for the Raspberry Pi can help overcome the computational and energy constraints of low-power devices, making it possible to run deep learning models on a wider range of devices. In the following sections, we will explore these approaches in more detail and discuss their effectiveness for optimizing deep learning models for the Raspberry Pi.	翻訳日:2023-04-27 16:51:59 公開日:2023-04-25
# 拡散確率モデルに基づく高精度・高自由度メタ表面逆設計 Diffusion Probabilistic Model Based Accurate and High-Degree-of-Freedom Metasurface Inverse Design ( http://arxiv.org/abs/2304.13038v1 ) ライセンス: Link先を確認	Zezhou Zhang, Chuanchuan Yang, Yifeng Qin, Hao Feng, Jiqiang Feng, Hongbin Li	(参考訳) 従来のメタ原子設計は、全波シミュレーションを用いた研究者の事前知識と試行錯誤検索に重きを置き、結果として時間の消費と非効率なプロセスを生み出す。進化アルゴリズムやトポロジカル最適化といった最適化アルゴリズムに基づく逆設計法がメタマテリアルの設計に導入されている。しかし、これらのアルゴリズムはいずれも多目的タスクを満足するほど一般的ではない。近年, メタマテリアルの逆設計にGAN(Generative Adversarial Networks)で表される深層学習法が適用されており, Sパラメータ要求に基づいて, 直接的に自由度の高いメタ原子を生成することができる。しかし、gansの敵対的な訓練プロセスはネットワークを不安定にさせ、高いモデリングコストをもたらす。本稿では拡散確率理論に基づく新しいメタマテリアル逆設計法を提案する。元の構造をガウス分布に変換するマルコフ過程を学習することにより、ガウス分布から徐々にノイズを除去し、Sパラメータ条件を満たす新しい高次自由度メタ原子を生成することができる。モデル収束速度, 生成精度, 品質の観点から, 提案手法はGANの代表的な手法よりも優れていることが実証された。 Conventional meta-atom designs rely heavily on researchers' prior knowledge and trial-and-error searches using full-wave simulations, resulting in time-consuming and inefficient processes. Inverse design methods based on optimization algorithms, such as evolutionary algorithms, and topological optimizations, have been introduced to design metamaterials. However, none of these algorithms are general enough to fulfill multi-objective tasks. Recently, deep learning methods represented by Generative Adversarial Networks (GANs) have been applied to inverse design of metamaterials, which can directly generate high-degree-of-freedom meta-atoms based on S-parameter requirements. However, the adversarial training process of GANs makes the network unstable and results in high modeling costs. This paper proposes a novel metamaterial inverse design method based on the diffusion probability theory. 
By learning the Markov process that transforms the original structure into a Gaussian distribution, the proposed method can gradually remove the noise starting from the Gaussian distribution and generate new high-degree-of-freedom meta-atoms that meet S-parameter conditions, which avoids the model instability introduced by the adversarial training process of GANs and ensures more accurate and high-quality generation results. Experiments have proven that our method is superior to representative methods of GANs in terms of model convergence speed, generation accuracy, and quality.	翻訳日:2023-04-27 16:51:40 公開日:2023-04-25
# veml:大規模高次元データのためのエンドツーエンド機械学習ライフサイクル VeML: An End-to-End Machine Learning Lifecycle for Large-scale and High-dimensional Data ( http://arxiv.org/abs/2304.13037v1 ) ライセンス: Link先を確認	Van-Duc Le	(参考訳) エンドツーエンドの機械学習(ML)ライフサイクルは、データ準備やMLモデル設計からモデルトレーニング、そして推論のためのトレーニングされたモデルのデプロイに至るまで、多くの反復プロセスで構成されている。 ML問題のためのエンドツーエンドライフサイクルを構築する場合、多くのMLパイプラインを設計して実行し、多数のライフサイクルバージョンを生成する必要がある。そこで本稿では,エンドツーエンドMLライフサイクル専用のバージョン管理システムであるVeMLを紹介する。我々のシステムは、他のシステムが解決していないいくつかの重要な問題に取り組む。まず、特に大規模かつ高次元のデータセットにおいて、MLライフサイクルを構築するための高コストに対処する。我々は、システム内で管理されている類似データセットのライフサイクルを、新しいトレーニングデータに転送することで、この問題を解決する。大規模・高次元データの類似性を効率的に計算するためのコアセットに基づくアルゴリズムを設計する。もうひとつの重要な問題は、トレーニングデータとML寿命中のテストデータの違いによるモデルの精度低下であり、リカバリにつながる。このシステムは、テストデータからラベル付きデータを取得し、新しいデータバージョンのmlライフサイクルを再構築することなく、このミスマッチを検出するのに役立ちます。本研究は,運転画像と時空間センサデータを用いた実世界の大規模データセット実験を行い,有望な結果を示す。 An end-to-end machine learning (ML) lifecycle consists of many iterative processes, from data preparation and ML model design to model training and then deploying the trained model for inference. When building an end-to-end lifecycle for an ML problem, many ML pipelines must be designed and executed that produce a huge number of lifecycle versions. Therefore, this paper introduces VeML, a Version management system dedicated to end-to-end ML Lifecycle. Our system tackles several crucial problems that other systems have not solved. First, we address the high cost of building an ML lifecycle, especially for large-scale and high-dimensional dataset. We solve this problem by proposing to transfer the lifecycle of similar datasets managed in our system to the new training data. We design an algorithm based on the core set to compute similarity for large-scale, high-dimensional data efficiently. Another critical issue is the model accuracy degradation by the difference between training data and testing data during the ML lifetime, which leads to lifecycle rebuild. 
Our system helps to detect this mismatch without requiring labels for the testing data, and to rebuild the ML lifecycle for a new data version. To demonstrate our contributions, we conduct experiments on real-world, large-scale datasets of driving images and spatiotemporal sensor data and show promising results.	翻訳日:2023-04-27 16:51:17 公開日:2023-04-25
# 離散化すべき時:離散化連続問題におけるアルゴリズムの性能解析 When to be Discrete: Analyzing Algorithm Performance on Discretized Continuous Problems ( http://arxiv.org/abs/2304.13117v1 ) ライセンス: Link先を確認	Andr\'e Thomaser and Jacob de Nobel and Diederick Vermetten and Furong Ye and Thomas B\"ack and Anna V. Kononova	(参考訳) 最適化問題の領域は、その最も重要な特徴の1つと見なされる。特に、連続最適化と離散最適化の区別は、かなり影響が大きい。これに基づいて、最適化アルゴリズム、解析方法等が特定される。しかし実際には,真に連続した問題はありません。これが問題の計算限界やより具体的な性質によって引き起こされるかどうかに関わらず、ほとんどの変数は有限分解能を持つ。本研究では,連続変数の解法の概念を用いて,問題と連続領域を区別する。この解像度が連続最適化アルゴリズムの性能に与える影響について検討する。整数空間へのマッピングを通じて、これら連続最適化器と全く同じ問題に対する離散アルゴリズムを比較することができる。問題に離散化を追加すると、標準$(\mu_W, \lambda)$-CMA-ESが失敗することを示す。 The domain of an optimization problem is seen as one of its most important characteristics. In particular, the distinction between continuous and discrete optimization is rather impactful. Based on this, the optimizing algorithm, analyzing method, and more are specified. However, in practice, no problem is ever truly continuous. Whether this is caused by computing limits or more tangible properties of the problem, most variables have a finite resolution. In this work, we use the notion of the resolution of continuous variables to discretize problems from the continuous domain. We explore how the resolution impacts the performance of continuous optimization algorithms. Through a mapping to integer space, we are able to compare these continuous optimizers to discrete algorithms on the exact same problems. We show that the standard $(\mu_W, \lambda)$-CMA-ES fails when discretization is added to the problem.	翻訳日:2023-04-27 16:44:29 公開日:2023-04-25
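The idea of giving a continuous variable a finite resolution and then mapping it to integer space can be illustrated with a short sketch. The abstract does not specify the discretization scheme, so the uniform grid, the function names (`to_index`, `from_index`, `discretized`) and the sphere test function below are assumptions made for illustration only.

```python
def sphere(x):
    """Simple continuous test objective (illustrative choice)."""
    return sum(v * v for v in x)


def to_index(x, lower, upper, resolution):
    """Map each coordinate to the index of the nearest grid point.

    The interval [lower, upper] is covered by `resolution` evenly spaced
    values; coordinates outside the interval are clipped first.
    """
    step = (upper - lower) / (resolution - 1)
    return [round((min(max(v, lower), upper) - lower) / step) for v in x]


def from_index(idx, lower, upper, resolution):
    """Map integer grid indices back to continuous coordinates."""
    step = (upper - lower) / (resolution - 1)
    return [lower + k * step for k in idx]


def discretized(f, lower, upper, resolution):
    """Wrap f so that it only ever sees grid-aligned inputs."""
    def wrapped(x):
        idx = to_index(x, lower, upper, resolution)
        return f(from_index(idx, lower, upper, resolution))
    return wrapped
```

A continuous optimizer can be run on `discretized(sphere, ...)`, while a discrete optimizer can search the index vectors directly and evaluate `sphere(from_index(...))`, so both see exactly the same problem.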
# avface: 視聴覚4次元顔再建に向けて AVFace: Towards Detailed Audio-Visual 4D Face Reconstruction ( http://arxiv.org/abs/2304.13115v1 ) ライセンス: Link先を確認	Aggelina Chatziagapi, Dimitris Samaras	(参考訳) 本研究では,モノクロ映像からの4次元顔再構成問題に対するマルチモーダル・ソリューションを提案する。 2次元画像からの3次元顔の再構成は、深さのあいまいさによる制約の少ない問題である。最先端の手法は、単一の画像やビデオからの視覚情報を活用してこの問題を解決しようとするが、3dメッシュアニメーションのアプローチはオーディオに依存している。しかし、ほとんどのケース(例えばAR/VRアプリケーション)では、ビデオには視覚情報と音声情報の両方が含まれている。本研究では,任意の話者の4次元顔と唇の動きを,訓練に3次元的真実を必要とせず正確に再構成するAVFaceを提案する。粗いステージは、3次元の変形可能なモデルのフレームあたりのパラメータを推定し、続いて唇の精製を行い、さらに細かいステージは顔の幾何学的詳細を復元する。トランスフォーマティブ・モジュールによってキャプチャされた時間的音声と映像情報により,どちらのモダリティも不十分な場合(顔のオクルージョンなど)ではロバストな手法である。大規模定性的・定量的評価は,本手法が現状よりも優れていることを示す。 In this work, we present a multimodal solution to the problem of 4D face reconstruction from monocular videos. 3D face reconstruction from 2D images is an under-constrained problem due to the ambiguity of depth. State-of-the-art methods try to solve this problem by leveraging visual information from a single image or video, whereas 3D mesh animation approaches rely more on audio. However, in most cases (e.g. AR/VR applications), videos include both visual and speech information. We propose AVFace that incorporates both modalities and accurately reconstructs the 4D facial and lip motion of any speaker, without requiring any 3D ground truth for training. A coarse stage estimates the per-frame parameters of a 3D morphable model, followed by a lip refinement, and then a fine stage recovers facial geometric details. Due to the temporal audio and video information captured by transformer-based modules, our method is robust in cases when either modality is insufficient (e.g. face occlusions). Extensive qualitative and quantitative evaluation demonstrates the superiority of our method over the current state-of-the-art.	翻訳日:2023-04-27 16:44:19 公開日:2023-04-25
# BO-ICP:ベイズ最適化に基づく反復閉点の初期化 BO-ICP: Initialization of Iterative Closest Point Based on Bayesian Optimization ( http://arxiv.org/abs/2304.13114v1 ) ライセンス: Link先を確認	Harel Biggie, Andrew Beathard, Christoffer Heckman	(参考訳) イテレーティブ・クローズト・ポイント(ICP)のような点群登録のための典型的なアルゴリズムは、2つの点群間で良好な初期変換推定を必要とする。この開始条件を選択するための最先端の手法は、確率的サンプリングやブランチやバウンドなどの大域的最適化技術に依存している。本研究では,ベイズ最適化に基づく重要な初期ICP変換の探索手法を提案する。提案手法は,オフラインマップ構築などの実行環境において,高速な結果の検索と改善を行うアルゴリズムの汎用性を強調した3つの異なる構成を提供する。実験は一般的なデータセット上で行われ、同様の計算時間を与えると最先端の手法よりも優れた結果が得られることを示す。さらに、ICPベースのメソッドの開始点である初期変換の選択にのみ焦点をあてているため、ICPの他の改善とも互換性がある。 Typical algorithms for point cloud registration such as Iterative Closest Point (ICP) require a favorable initial transform estimate between two point clouds in order to perform a successful registration. State-of-the-art methods for choosing this starting condition rely on stochastic sampling or global optimization techniques such as branch and bound. In this work, we present a new method based on Bayesian optimization for finding the critical initial ICP transform. We provide three different configurations for our method which highlights the versatility of the algorithm to both find rapid results and refine them in situations where more runtime is available such as offline map building. Experiments are run on popular data sets and we show that our approach outperforms state-of-the-art methods when given similar computation time. Furthermore, it is compatible with other improvements to ICP, as it focuses solely on the selection of an initial transform, a starting point for all ICP-based methods.	翻訳日:2023-04-27 16:44:00 公開日:2023-04-25
# 限定CSIを用いたTHzビーム探索のための深部強化学習 Federated Deep Reinforcement Learning for THz-Beam Search with Limited CSI ( http://arxiv.org/abs/2304.13109v1 ) ライセンス: Link先を確認	Po-Chun Hsu, Li-Hsiang Shen, Chun-Hung Liu, and Kai-Ten Feng	(参考訳) 超広帯域でのテラヘルツ(THz)通信は、次世代無線ネットワークにおける高データレートの厳密な要求を実現するための有望な技術であるが、その高度な伝搬減衰は、実際にの実装を著しく妨げている。 thz信号の重大伝搬減衰を効果的に克服するために、大規模アンテナアレイのビーム方向を見つけることは、圧迫の必要である。本稿では,携帯電話ネットワーク上でエッジサーバが協調する複数の基地局(BS)のTHzビーム探索を高速に行うためのFDRL(Federated Deep reinforcement Learning)を提案する。全てのBSはDDPG(Deep Deterministic Policy gradient)ベースのDRLを実行し、限られたチャネル状態情報(CSI)を持つTHzビームフォーミングポリシーを得る。彼らは、細胞間干渉を軽減するために、隠された情報でDDPGモデルを更新する。我々は,THz CSIとDDPGの隠れニューロンの採用により,セルネットワークのスループットが向上できることを実証した。また、部分モデル更新によるFDRLは、フルモデル更新によるFDRLと同じ性能をほぼ達成できることを示し、部分モデルアップロードによるエッジサーバとBS間の通信負荷を低減する効果的な手段を示す。さらに、提案したFDRLは、従来の非学習ベースおよび既存の非FDRLベンチマーク最適化手法よりも優れている。 Terahertz (THz) communication with ultra-wide available spectrum is a promising technique that can achieve the stringent requirement of high data rate in the next-generation wireless networks, yet its severe propagation attenuation significantly hinders its implementation in practice. Finding beam directions for a large-scale antenna array to effectively overcome severe propagation attenuation of THz signals is a pressing need. This paper proposes a novel approach of federated deep reinforcement learning (FDRL) to swiftly perform THz-beam search for multiple base stations (BSs) coordinated by an edge server in a cellular network. All the BSs conduct deep deterministic policy gradient (DDPG)-based DRL to obtain THz beamforming policy with limited channel state information (CSI). They update their DDPG models with hidden information in order to mitigate inter-cell interference. We demonstrate that the cell network can achieve higher throughput as more THz CSI and hidden neurons of DDPG are adopted. 
We also show that FDRL with partial model updates achieves nearly the same performance as FDRL with full model updates, which indicates that partial model uploading is an effective means of reducing the communication load between the edge server and the BSs. Moreover, the proposed FDRL outperforms conventional non-learning-based and existing non-FDRL benchmark optimization methods.	翻訳日:2023-04-27 16:43:46 公開日:2023-04-25
# WiFi CSIを用いたデバイスレスマルチルーム人間プレゼンス検出のための時間選択型RNN Time-Selective RNN for Device-Free Multi-Room Human Presence Detection Using WiFi CSI ( http://arxiv.org/abs/2304.13107v1 ) ライセンス: Link先を確認	Fang-Yu Chu, Li-Hsiang Shen, An-Hung Hsiao, Kai-Ten Feng	(参考訳) 人間の存在検出は、ホームオートメーション、セキュリティ、医療など、さまざまなアプリケーションにとって重要な技術である。カメラベースのシステムは伝統的にこの目的で使われてきたが、プライバシーの懸念が高まる。この問題に対処するため、最近の研究では、商用WiFiアクセスポイント(AP)から抽出し、詳細なチャネル特性を提供するチャネル状態情報(CSI)アプローチについて検討している。本稿では,tcd-fern(time-selective conditional dual feature extract recurrent network)を用いたマルチルームシナリオのためのデバイスフリーな人間存在検出システムを提案する。本システムは、動的かつ静的なデータ前処理技術を用いて、現在の人間の特徴を条件付きで有意な時間的特徴を捉え、人の移動や空間的特徴を抽出し、視線遮断(LoS)経路とノンブロッキングケースを区別するように設計されている。部屋分割による特徴減衰問題を緩和するため,投票方式を採用した。提案するTD-FERNシステムは,コモディティなWiFi APの少ないマルチルームシナリオに対して,人間の存在検出を実現することができることを示すため,評価および実時間実験を行った。 Human presence detection is a crucial technology for various applications, including home automation, security, and healthcare. While camera-based systems have traditionally been used for this purpose, they raise privacy concerns. To address this issue, recent research has explored the use of channel state information (CSI) approaches that can be extracted from commercial WiFi access points (APs) and provide detailed channel characteristics. In this thesis, we propose a device-free human presence detection system for multi-room scenarios using a time-selective conditional dual feature extract recurrent Network (TCD-FERN). Our system is designed to capture significant time features with the condition on current human features using a dynamic and static (DaS) data preprocessing technique to extract moving and spatial features of people and differentiate between line-of-sight (LoS) path blocking and non-blocking cases. To mitigate the feature attenuation problem caused by room partitions, we employ a voting scheme. 
We conduct evaluation and real-time experiments to demonstrate that our proposed TCD-FERN system can achieve human presence detection for multi-room scenarios using fewer commodity WiFi APs.	翻訳日:2023-04-27 16:43:27 公開日:2023-04-25
# 屋内Wi-Fiを用いたデバイス不要な壁面位置検出のための注意深度学習 Attention-Enhanced Deep Learning for Device-Free Through-the-Wall Presence Detection Using Indoor WiFi System ( http://arxiv.org/abs/2304.13105v1 ) ライセンス: Link先を確認	Li-Hsiang Shen, Kuan-I Lu, An-Hung Hsiao and Kai-Ten Feng	(参考訳) 屋内環境における人的存在の正確な検出は,エネルギー管理やセキュリティなど,様々な用途において重要である。本稿では,WiFi信号のチャネル状態情報(CSI)を用いた人間の存在検知システムを提案する。本システムでは,CSIデータから情報サブキャリアを自動選択するためのアテンション・エンハンスド・ディープ・ラーニング(ALPD)と,CSIにおける時間的依存を捉えるための双方向長短期記憶(LSTM)ネットワークを利用する。さらに、静的な状態における人間の存在検出の精度を向上させるために静的な特徴を利用する。提案するALPDシステムは,CSIデータセットを収集するための一対のWiFiアクセスポイント(AP)をデプロイすることで評価し,さらにいくつかのベンチマークと比較した。その結果,alpdシステムは,特に干渉の有無において,精度の点でベンチマークを上回っていることがわかった。さらに、双方向送信データは、安定性と精度の向上、およびトレーニング用データ収集のコスト削減の訓練に有用である。提案するALPDシステムは,WiFi CSI信号を用いた人的存在検出において有望な結果を示す。 Accurate detection of human presence in indoor environments is important for various applications, such as energy management and security. In this paper, we propose a novel system for human presence detection using the channel state information (CSI) of WiFi signals. Our system named attention-enhanced deep learning for presence detection (ALPD) employs an attention mechanism to automatically select informative subcarriers from the CSI data and a bidirectional long short-term memory (LSTM) network to capture temporal dependencies in CSI. Additionally, we utilize a static feature to improve the accuracy of human presence detection in static states. We evaluate the proposed ALPD system by deploying a pair of WiFi access points (APs) for collecting CSI dataset, which is further compared with several benchmarks. The results demonstrate that our ALPD system outperforms the benchmarks in terms of accuracy, especially in the presence of interference. Moreover, bidirectional transmission data is beneficial to training improving stability and accuracy, as well as reducing the costs of data collection for training. Overall, our proposed ALPD system shows promising results for human presence detection using WiFi CSI signals.	
翻訳日:2023-04-27 16:43:06 公開日:2023-04-25
# LSTMによるマイクログリッドの騒音発生に対するロバスト性予測 LSTM-based Load Forecasting Robustness Against Noise Injection Attack in Microgrid ( http://arxiv.org/abs/2304.13104v1 ) ライセンス: Link先を確認	Amirhossein Nazeri and Pierluigi Pisu	(参考訳) 本稿では,マイクログリッドにおける負荷予測のためのノイズ注入攻撃に対するLSTMニューラルネットワークの堅牢性について検討する。 LSTMモデルの性能は、異なるSNRを持つブラックボックスガウスノイズアタックの下で検討する。攻撃者はLSTMモデルの入力データのみにアクセスすると仮定される。その結果,ノイズアタックはLSTMモデルの性能に影響を及ぼすことがわかった。負荷予測は、健全な予測では平均絶対誤差(MAE)が0.047MWであり、SNR=6dBのガウスノイズ挿入では0.097MWとなる。 LSTMモデルをノイズアタックに対して堅牢化するために、モデル入力に最適カットオフ周波数の低域フィルタを適用し、ノイズアタックを除去する。フィルタは、SNRが低い場合にはより良く、小さなノイズに対しては期待できない。 In this paper, we investigate the robustness of an LSTM neural network against noise injection attacks for electric load forecasting in an ideal microgrid. The performance of the LSTM model is investigated under a black-box Gaussian noise attack with different SNRs. It is assumed that attackers only have access to the input data of the LSTM model. The results show that the noise attack affects the performance of the LSTM model. The mean absolute error (MAE) of the load prediction is 0.047 MW for a healthy prediction, while this value increases up to 0.097 MW for a Gaussian noise insertion with SNR = 6 dB. To robustify the LSTM model against noise attack, a low-pass filter with optimal cut-off frequency is applied at the model's input to remove the noise attack. The filter performs better in case of noise with lower SNR and is less promising for small noises.	翻訳日:2023-04-27 16:42:43 公開日:2023-04-25
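The attack and defence described in this abstract can be sketched on the input signal alone. The snippet below injects Gaussian noise at a chosen SNR and applies a centred moving average as a crude stand-in for a low-pass filter. The LSTM forecaster itself is omitted, the abstract does not say which filter design or cut-off the authors use, and the MAE here is measured on the input series rather than on forecasts, so all names and choices are illustrative assumptions.

```python
import math
import random


def mean_power(signal):
    """Average power (mean of squares) of a sequence."""
    return sum(v * v for v in signal) / len(signal)


def add_gaussian_noise(signal, snr_db, rng):
    """Return signal plus zero-mean Gaussian noise at the given SNR in dB."""
    noise_power = mean_power(signal) / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [v + rng.gauss(0.0, sigma) for v in signal]


def mean_absolute_error(a, b):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def moving_average(signal, window):
    """Centred moving average; a simple low-pass filter (assumed stand-in)."""
    half = window // 2
    out = []
    for i in range(len(signal)):
        lo = max(0, i - half)
        hi = min(len(signal), i + half + 1)
        out.append(sum(signal[lo:hi]) / (hi - lo))
    return out
```

Lower SNR values mean stronger noise, which is the regime where the abstract reports that filtering helps most.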
# HyMo:新しいマルチモードハイブリッドモデルによるスマートコントラクトの脆弱性検出 HyMo: Vulnerability Detection in Smart Contracts using a Novel Multi-Modal Hybrid Model ( http://arxiv.org/abs/2304.13103v1 ) ライセンス: Link先を確認	Mohammad Khodadadi, Jafar Tahmoresnezhad (1) ((1) Department of IT & Computer Engineering, Urmia University of Technology, Or\=um\=iyeh, Iran)	(参考訳) ブロックチェーン技術が急速に進歩し、金融、ヘルスケア、保険、ゲームなど、多くの業界でスマートコントラクトが一般的なツールになりつつある。スマートコントラクトの数は増えており、同時にスマートコントラクトのセキュリティは、スマートコントラクトの脆弱性によって引き起こされる金銭的損失により、かなりの注目を集めている。既存の分析技術は、多くのスマートコントラクトセキュリティの欠陥を識別できるが、専門家によって確立された厳格な基準に頼りすぎており、スマートコントラクトの複雑さが高まるにつれて検出プロセスがはるかに時間がかかる。本稿では,HyMoをマルチモーダルハイブリッド深層学習モデルとして提案し,多モード性を考慮した各種入力表現と,BiGRU深層学習技術を用いて各単語を文字のn-gramとして表現するFastText単語埋め込みを,スマートコントラクトの脆弱性検出において高精度な2つのGRUからなるシーケンス処理モデルとして提案する。このモデルは、さまざまなディープラーニングモデルを使用して機能を収集し、スマートコントラクトの脆弱性を特定する。 scrawldのような現在公開されているデータセットに関する一連の研究を通じて、当社のハイブリッドhymoモデルはスマートコントラクト脆弱性検出性能に優れています。したがってHyMoは、他のアプローチに対するスマートコントラクトの脆弱性をよりよく検出する。 With blockchain technology rapidly progress, the smart contracts have become a common tool in a number of industries including finance, healthcare, insurance and gaming. The number of smart contracts has multiplied, and at the same time, the security of smart contracts has drawn considerable attention due to the monetary losses brought on by smart contract vulnerabilities. Existing analysis techniques are capable of identifying a large number of smart contract security flaws, but they rely too much on rigid criteria established by specialists, where the detection process takes much longer as the complexity of the smart contract rises. 
In this paper, we propose HyMo as a multi-modal hybrid deep learning model, which intelligently considers various input representations to consider multimodality and FastText word embedding technique, which represents each word as an n-gram of characters with BiGRU deep learning technique, as a sequence processing model that consists of two GRUs to achieve higher accuracy in smart contract vulnerability detection. The model gathers features using various deep learning models to identify the smart contract vulnerabilities. Through a series of studies on the currently publicly accessible dataset such as ScrawlD, we show that our hybrid HyMo model has excellent smart contract vulnerability detection performance. Therefore, HyMo performs better detection of smart contract vulnerabilities against other approaches.	翻訳日:2023-04-27 16:42:20 公開日:2023-04-25
# サーロゲート勾配で学習したスパイクニューラルネットワークの表現について Uncovering the Representation of Spiking Neural Networks Trained with Surrogate Gradient ( http://arxiv.org/abs/2304.13098v1 ) ライセンス: Link先を確認	Yuhang Li, Youngeun Kim, Hyoungseob Park, Priyadarshini Panda	(参考訳) スパイキングニューラルネットワーク(snn)は、その生体適合性とエネルギー効率のために次世代ニューラルネットワークの候補として認識される。近年、SNNは、代理勾配トレーニングを用いて画像認識タスクにおいて、ほぼ最先端のパフォーマンスを達成できることを示した。サーロゲート勾配で訓練されたsnsは、従来のニューラルネットワーク(anns)とは異なる表現を学んでいるのだろうか? SNNの時間次元は独特な表現力を提供するか? 本稿では,中心核アライメント(cka)を用いて,snsとann間の表現類似性解析を行うことにより,これらの質問に答える。まず、幅と深さの両方を含むネットワークの空間次元を分析する。さらに, 残差接続の解析により, SNN は周期パターンを学習し, SNN の表現を ANN 様に修正することを示した。さらに, 時間次元がSNN表現に与える影響についても検討し, より深い層が時間次元に沿ってより動的に作用することを示した。また、イベントストリームデータや敵攻撃などの入力データの影響についても検討する。我々の研究は、SNNにおける表現の新しい発見のホストを明らかにする。この研究が将来の研究に刺激を与え、SNNの表現力を完全に理解することを願っている。コードはhttps://github.com/Intelligent-Computing-Lab-Yale/SNNCKAで公開されている。 Spiking Neural Networks (SNNs) are recognized as the candidate for the next-generation neural networks due to their bio-plausibility and energy efficiency. Recently, researchers have demonstrated that SNNs are able to achieve nearly state-of-the-art performance in image recognition tasks using surrogate gradient training. However, some essential questions exist pertaining to SNNs that are little studied: Do SNNs trained with surrogate gradient learn different representations from traditional Artificial Neural Networks (ANNs)? Does the time dimension in SNNs provide unique representation power? In this paper, we aim to answer these questions by conducting a representation similarity analysis between SNNs and ANNs using Centered Kernel Alignment (CKA). We start by analyzing the spatial dimension of the networks, including both the width and the depth. Furthermore, our analysis of residual connections shows that SNNs learn a periodic pattern, which rectifies the representations in SNNs to be ANN-like. 
We additionally investigate the effect of the time dimension on SNN representation, finding that deeper layers encourage more dynamics along the time dimension. We also investigate the impact of input data such as event-stream data and adversarial attacks. Our work uncovers a host of new findings of representations in SNNs. We hope this work will inspire future research to fully comprehend the representation power of SNNs. Code is released at https://github.com/Intelligent-Computing-Lab-Yale/SNNCKA.	翻訳日:2023-04-27 16:41:43 公開日:2023-04-25
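For readers unfamiliar with Centered Kernel Alignment, the linear variant compares two activation matrices (rows are examples, columns are features) as CKA(X, Y) = ||Yc^T Xc||_F^2 / (||Xc^T Xc||_F ||Yc^T Yc||_F), where Xc and Yc are column-centred. The sketch below is a minimal pure-Python version of that formula. Whether the authors use the linear kernel, and how SNN activations are aggregated over time steps, is not stated in the abstract, so treat this as an assumption-laden illustration and not as their implementation.

```python
import math


def center_columns(m):
    """Subtract the column mean from every entry of a row-major matrix."""
    n, d = len(m), len(m[0])
    means = [sum(row[j] for row in m) / n for j in range(d)]
    return [[row[j] - means[j] for j in range(d)] for row in m]


def cross_gram_sq_norm(a, b):
    """Squared Frobenius norm of a^T b for row-major matrices a and b."""
    total = 0.0
    for i in range(len(a[0])):
        for j in range(len(b[0])):
            s = sum(a[k][i] * b[k][j] for k in range(len(a)))
            total += s * s
    return total


def linear_cka(x, y):
    """Linear CKA between two activation matrices with matching rows."""
    xc, yc = center_columns(x), center_columns(y)
    numerator = cross_gram_sq_norm(xc, yc)
    denominator = math.sqrt(
        cross_gram_sq_norm(xc, xc) * cross_gram_sq_norm(yc, yc)
    )
    return numerator / denominator
```

Linear CKA is invariant to orthogonal transformations and isotropic scaling of the features, which is why it is a common choice for comparing layers of different networks.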
# ビデオ品質評価モデルをビット深度にロバストにする Making Video Quality Assessment Models Robust to Bit Depth ( http://arxiv.org/abs/2304.13092v1 ) ライセンス: Link先を確認	Joshua P. Ebenezer, Zaixi Shang, Yongjun Wu, Hai Wei, Sriram Sethuraman and Alan C. Bovik	(参考訳) 標準ダイナミックレンジ (sdr) ビデオ用に設計されたビデオ品質評価 (vqa) アルゴリズムに含まれるhdrmax機能と呼ばれる新しい機能セットを導入することで、これらのアルゴリズムによって不適切に説明される高ダイナミックレンジ (hdr) ビデオの歪みに感応する。これらの特徴はHDRに特有ではなく、SDRコンテンツ上でのVQAモデルの等価性予測性能を増大させるが、特にHDRに有効である。 hdrmaxの特徴は、自然ビデオ統計(nvs)モデルから引き出された強力な前処理を、映像の最も明るく暗い部分に影響を与える可測性を高め、既存のvqaモデルによって説明されない歪みを捉えることである。提案手法の有効性の実証として,現状のVQAモデルでは10ビットのHDRデータベースでは性能が低かったが,HDRMAXと10ビットの歪みビデオでテストした場合のHDRMAX機能の導入により性能が大幅に向上したことを示す。 We introduce a novel feature set, which we call HDRMAX features, that when included into Video Quality Assessment (VQA) algorithms designed for Standard Dynamic Range (SDR) videos, sensitizes them to distortions of High Dynamic Range (HDR) videos that are inadequately accounted for by these algorithms. While these features are not specific to HDR, and also augment the equality prediction performances of VQA models on SDR content, they are especially effective on HDR. HDRMAX features modify powerful priors drawn from Natural Video Statistics (NVS) models by enhancing their measurability where they visually impact the brightest and darkest local portions of videos, thereby capturing distortions that are often poorly accounted for by existing VQA models. As a demonstration of the efficacy of our approach, we show that, while current state-of-the-art VQA models perform poorly on 10-bit HDR databases, their performances are greatly improved by the inclusion of HDRMAX features when tested on HDR and 10-bit distorted videos.	翻訳日:2023-04-27 16:41:07 公開日:2023-04-25
# 非一様超グラフ確率ブロックモデルの厳密な回復 Exact recovery for the non-uniform Hypergraph Stochastic Block Model ( http://arxiv.org/abs/2304.13139v1 ) ライセンス: Link先を確認	Ioana Dumitriu, Haixiao Wang	(参考訳) 非一様ハイパーグラフ確率ブロックモデル(hsbm)の下でのランダムハイパーグラフにおけるコミュニティ検出問題を考える。特に、K$クラスを持つモデルと対称二項モデル(K=2$)を考える。ここでの重要なポイントは、すべての均一な層から情報を集約することで、各層が単独では不可能に見える場合であっても、正確な回復が得られることである。しきい値以上の正確な回復を達成する2つの効率的なアルゴリズムが提供される。我々のアルゴリズムの理論的解析は、非一様ランダムハイパーグラフに対する隣接行列の濃度と正規化に依存しており、これは独立な関心を持つ可能性がある。またパラメータ知識と推定に関するオープンな問題にも対処する。 Consider the community detection problem in random hypergraphs under the non-uniform hypergraph stochastic block model (HSBM), where each hyperedge appears independently with some given probability depending only on the labels of its vertices. We establish, for the first time in the literature, a sharp threshold for exact recovery under this non-uniform case, subject to minor constraints; in particular, we consider the model with $K$ classes as well as the symmetric binary model ($K=2$). One crucial point here is that by aggregating information from all the uniform layers, we may obtain exact recovery even in cases when this may appear impossible if each layer were considered alone. Two efficient algorithms that successfully achieve exact recovery above the threshold are provided. The theoretical analysis of our algorithms relies on the concentration and regularization of the adjacency matrix for non-uniform random hypergraphs, which could be of independent interest. We also address some open problems regarding parameter knowledge and estimation.	翻訳日:2023-04-27 16:35:18 公開日:2023-04-25
# 決定時間計画のための更新等価フレームワーク The Update Equivalence Framework for Decision-Time Planning ( http://arxiv.org/abs/2304.13138v1 ) ライセンス: Link先を確認	Samuel Sokota, Gabriele Farina, David J. Wu, Hengyuan Hu, Kevin A. Wang, J. Zico Kolter, Noam Brown	(参考訳) 実行直前にポリシーを修正(あるいは構築)するプロセス – 決定時間計画(decisive-time planning)と呼ばれる – は、チェスやゴーといった完璧な情報設定で超人的なパフォーマンスを達成する上でキーとなる。最近の作業では、意思決定時間の計画をより一般的な不完全な情報設定に拡張し、ポーカーにおける超人的なパフォーマンスに繋がった。しかし,これらの手法では,非公開情報の量が多い場合には,そのサイズが急速に大きくなるサブゲームを考える必要がある。本稿では,サブゲームではなく,更新等価性の概念に基づく,意思決定時計画のための代替フレームワークを提案する。このフレームワークでは、決定時間計画アルゴリズムが同期学習アルゴリズムの更新をシミュレートする。この枠組みにより,公的な情報に依存しない意思決定時間計画手法を新たに導入し,非公的な情報量の多い設定において,健全かつ効果的な意思決定計画への扉を開くことができる。実験では、このファミリーのメンバーは、ハナビの最先端のアプローチと同等または優れた結果を生成し、3x3のAbrupt Dark HexとPhantom Tic-Tac-Toeのパフォーマンスを改善した。 The process of revising (or constructing) a policy immediately prior to execution -- known as decision-time planning -- is key to achieving superhuman performance in perfect-information settings like chess and Go. A recent line of work has extended decision-time planning to more general imperfect-information settings, leading to superhuman performance in poker. However, these methods requires considering subgames whose sizes grow quickly in the amount of non-public information, making them unhelpful when the amount of non-public information is large. Motivated by this issue, we introduce an alternative framework for decision-time planning that is not based on subgames but rather on the notion of update equivalence. In this framework, decision-time planning algorithms simulate updates of synchronous learning algorithms. This framework enables us to introduce a new family of principled decision-time planning algorithms that do not rely on public information, opening the door to sound and effective decision-time planning in settings with large amounts of non-public information. 
In experiments, members of this family produce comparable or superior results compared to state-of-the-art approaches in Hanabi and improve performance in 3x3 Abrupt Dark Hex and Phantom Tic-Tac-Toe.	翻訳日:2023-04-27 16:35:02 公開日:2023-04-25
# MEDNC:COVID-19診断のためのマルチアンサンブルディープニューラルネットワーク MEDNC: Multi-ensemble deep neural network for COVID-19 diagnosis ( http://arxiv.org/abs/2304.13135v1 ) ライセンス: Link先を確認	Lin Yang, Shuihua Wang, Yudong Zhang	(参考訳) 2019年の新型コロナウイルス(COVID-19)は3年間世界中に広がったが、多くの地域の医療施設はまだ不十分だ。リスクの高い患者を特定し、限られた医療資源の使用を最大化するために、急速な新型コロナウイルスの診断が必要である。そこで本研究では,CT画像を用いたCOVID-19自動予測・診断のための深層学習フレームワーク MEDNC を提案する。当社のモデルは、2セットのCOVID-19データを使用してトレーニングされました。そして、トランスファーラーニングのインスピレーションを得て作られた。その結果、MEDNCは新型コロナウイルスの感染の検出を大幅に強化し、それぞれ98.79%と99.82%の精度に達した。我々は脳腫瘍と血液細胞データセットを用いてMEDNCを試験し、我々のモデルが幅広い問題に適用可能であることを示した。その結果,提案モデルでそれぞれ99.39%,99.28%の精度が得られた。この新型コロナウイルス(COVID-19)認識ツールは、医療資源の最適化と、ウイルスのスクリーニング時に臨床医の負担軽減に役立つ。 Coronavirus disease 2019 (COVID-19) has spread all over the world for three years, but medical facilities in many areas are still inadequate. There is a need for rapid COVID-19 diagnosis to identify high-risk patients and maximize the use of limited medical resources. Motivated by this fact, we proposed the deep learning framework MEDNC for automatic prediction and diagnosis of COVID-19 using computed tomography (CT) images. Our model was trained using two publicly available sets of COVID-19 data and was built with inspiration from transfer learning. Results indicated that MEDNC greatly enhanced the detection of COVID-19 infections, reaching accuracies of 98.79% and 99.82%, respectively. We tested MEDNC on a brain tumor and a blood cell dataset to show that our model applies to a wide range of problems. The outcomes demonstrated that our proposed model attained accuracies of 99.39% and 99.28%, respectively. This COVID-19 recognition tool could help optimize healthcare resources and reduce clinicians' workload when screening for the virus.	翻訳日:2023-04-27 16:34:36 公開日:2023-04-25
# LAST: JAXにおけるスケーラブルな格子ベースの音声モデリング LAST: Scalable Lattice-Based Speech Modelling in JAX ( http://arxiv.org/abs/2304.13134v1 ) ライセンス: Link先を確認	Ke Wu, Ehsan Variani, Tom Bagby, Michael Riley	(参考訳) JAX で LAttice ベースの Speech Transducer ライブラリ LAST を紹介する。柔軟性、使いやすさ、スケーラビリティに重点を置いて、lastは、発話全体に対する認識格子のような大きなwfsaにスケールする \&推論のトレーニングに必要な微分可能重み付き有限状態オートマトン(wfsa)アルゴリズムを実装している。これらのWFSAアルゴリズムは文献でよく知られているが、現代のアーキテクチャのパフォーマンス特性や、自動微分におけるニュアンスから新たな課題が生じる。本稿では、これらの課題に対処するためにLASTで使用される一般的なテクニックのスイートを説明し、TPUv3とV100 GPUのベンチマークでその効果を実証する。 We introduce LAST, a LAttice-based Speech Transducer library in JAX. With an emphasis on flexibility, ease-of-use, and scalability, LAST implements differentiable weighted finite state automaton (WFSA) algorithms needed for training \& inference that scale to a large WFSA such as a recognition lattice over the entire utterance. Despite these WFSA algorithms being well-known in the literature, new challenges arise from performance characteristics of modern architectures, and from nuances in automatic differentiation. We describe a suite of generally applicable techniques employed in LAST to address these challenges, and demonstrate their effectiveness with benchmarks on TPUv3 and V100 GPU.	翻訳日:2023-04-27 16:34:20 公開日:2023-04-25
# 有向連鎖生成型逆ネットワーク Directed Chain Generative Adversarial Networks ( http://arxiv.org/abs/2304.13131v1 ) ライセンス: Link先を確認	Ming Min, Ruimeng Hu, Tomoyuki Ichiba	(参考訳) 実世界のデータは、コミュニティにおける意見のばらつきを記述するデータ、ニューロンのインタースパイク間隔分布、振動子自然周波数などのマルチモーダルな分散が可能である。マルチモーダル分散実世界のデータ生成は,GAN(Generative Adversarial Network)の課題となっている。例えば、無限次元GANとして扱われるニューラル確率微分方程式(Neural SDEs)は、主に単調時系列データを生成することに成功している。本稿では,方向連鎖型sdesのドリフトと拡散係数に分布制約のある時系列データセット(方向連鎖または入力の近傍プロセスと呼ばれる)を挿入する,方向連鎖gans (dc-gans) という新しい時系列生成器を提案する。 dc-gansは近隣プロセスと同じ分布の新しい時系列を生成することができ、近傍プロセスはマルチモーダルな分散時系列を学習し生成するための重要なステップを提供する。提案するdc-ganは,社会科学と計算神経科学の2つの確率モデルと,株価とエネルギー消費に関する実世界データセットを含む4つのデータセットで検討された。我々の知る限り、DC-GANは、マルチモーダル時系列データを生成し、分布、データ類似性、予測能力に関して、常に最先端のベンチマークを上回ります。 Real-world data can be multimodal distributed, e.g., data describing the opinion divergence in a community, the interspike interval distribution of neurons, and the oscillators natural frequencies. Generating multimodal distributed real-world data has become a challenge to existing generative adversarial networks (GANs). For example, neural stochastic differential equations (Neural SDEs), treated as infinite-dimensional GANs, have demonstrated successful performance mainly in generating unimodal time series data. In this paper, we propose a novel time series generator, named directed chain GANs (DC-GANs), which inserts a time series dataset (called a neighborhood process of the directed chain or input) into the drift and diffusion coefficients of the directed chain SDEs with distributional constraints. DC-GANs can generate new time series of the same distribution as the neighborhood process, and the neighborhood process will provide the key step in learning and generating multimodal distributed time series. The proposed DC-GANs are examined on four datasets, including two stochastic models from social sciences and computational neuroscience, and two real-world datasets on stock prices and energy consumption. 
To the best of our knowledge, DC-GANs are the first approach that can generate multimodal time series data, and they consistently outperform state-of-the-art benchmarks with respect to measures of distribution, data similarity, and predictive ability.	翻訳日:2023-04-27 16:34:07 公開日:2023-04-25
# 接地型マルチモーダルプリトレーニングのための名前付きエンティティリッチキャプションのハイパーニミゼーション Hypernymization of named entity-rich captions for grounding-based multi-modal pretraining ( http://arxiv.org/abs/2304.13130v1 ) ライセンス: Link先を確認	Giacomo Nebbia, Adriana Kovashka	(参考訳) 名前付きエンティティは、画像に自然に付随するテキスト、特にニュースやwikipediaの記事のようなドメインにおいてユビキタスである。これまでの研究で、wikipediaで事前トレーニングされ、名前付きエンティティフリーのベンチマークデータセットで評価された画像テキスト検索モデルの低パフォーマンスの理由として、名前付きエンティティが挙げられてきた。滅多に言及されないため、名前付きエンティティはモデル化が難しい場合がある。画像内の名前付きエンティティとオブジェクトの間のリンクは、モデルによって見逃されるかもしれないが、オブジェクトがより一般的な用語で言及された場合ではない。本研究では,複数モーダルモデルの事前学習やオープン語彙検出の微調整のための名前付きエンティティを扱う方法として,ハイパニミゼーションについて検討する。ハイパーnymizationを行うには,(1)概念の包括的オントロジーに依存する‘manual’パイプライン,(2)言語モデルを学習してハイパーnymizationを行う‘learned’アプローチの2つの方法を提案する。ウィキペディアやThe New York Timesのデータに関する実験を行っている。ハイパーnym化後の関心対象の事前学習性能の向上を報告し,特にトレーニング中に見ないクラスにおいて,オープンボキャブラリー検出におけるハイパーnym化の期待を示す。 Named entities are ubiquitous in text that naturally accompanies images, especially in domains such as news or Wikipedia articles. In previous work, named entities have been identified as a likely reason for low performance of image-text retrieval models pretrained on Wikipedia and evaluated on named entities-free benchmark datasets. Because they are rarely mentioned, named entities could be challenging to model. They also represent missed learning opportunities for self-supervised models: the link between named entity and object in the image may be missed by the model, but it would not be if the object were mentioned using a more common term. In this work, we investigate hypernymization as a way to deal with named entities for pretraining grounding-based multi-modal models and for fine-tuning on open-vocabulary detection. We propose two ways to perform hypernymization: (1) a ``manual'' pipeline relying on a comprehensive ontology of concepts, and (2) a ``learned'' approach where we train a language model to learn to perform hypernymization. We run experiments on data from Wikipedia and from The New York Times. 
We report improved pretraining performance on objects of interest following hypernymization, and we show the promise of hypernymization on open-vocabulary detection, specifically on classes not seen during training.	翻訳日:2023-04-27 16:33:41 公開日:2023-04-25
# 軌道角運動量を持つ光の方位バックフロー Azimuthal backflow in light carrying orbital angular momentum ( http://arxiv.org/abs/2304.13124v1 ) ライセンス: Link先を確認	Bohnishikha Ghosh, Anat Daniel, Bernard Gorzkowski, Radek Lapkiewicz	(参考訳) M.V.ベリーの研究(J. Phys. A: Math. Theor. 43, 415302 (2010))は、量子力学のバックフローと波動の超振動の対応を強調した。スーパーオシレーション(Superoscillations)は、スーパーポジションの局所的な振動が最も速いフーリエ成分よりも速い状況を指す。この概念は、光波の逆線形運動量の逆流を示すために使われてきた。本研究では、負の軌道角運動量しか持たない古典光の干渉を調べ、そのような干渉、正の局所軌道角運動量の暗い縁で観測する。この発見は光-物質相互作用の研究に影響を及ぼし、量子逆流を2次元で観測する過程を示す。 M.V. Berry's work [J. Phys. A: Math. Theor. 43, 415302 (2010)] highlighted the correspondence between backflow in quantum mechanics and superoscillations in waves. Superoscillations refer to situations where the local oscillation of a superposition is faster than its fastest Fourier component. This concept has been used to demonstrate backflow in transverse linear momentum for optical waves. In this work, we examine the interference of classical light carrying only negative orbital angular momentum and observe in the dark fringes of such an interference, positive local orbital angular momentum. This finding may have implications for the studies of light-matter interaction and represents a step towards observing quantum backflow in two dimensions.	翻訳日:2023-04-27 16:33:18 公開日:2023-04-25
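The effect described here can be checked numerically on a toy azimuthal field. For psi(phi) = sum of a_l exp(i l phi), the local orbital angular momentum per photon (in units of hbar) is Im(psi'(phi) / psi(phi)). The sketch below superposes two modes with l = -1 and l = -2; the specific modes and the amplitude ratio 0.8 are illustrative assumptions and are not taken from the paper's experiment.

```python
import cmath
import math


def field(phi, modes):
    """Azimuthal field: sum of amplitude * exp(i * l * phi) over (l, amplitude)."""
    return sum(a * cmath.exp(1j * l * phi) for l, a in modes)


def field_derivative(phi, modes):
    """Derivative of the field with respect to the azimuthal angle phi."""
    return sum(1j * l * a * cmath.exp(1j * l * phi) for l, a in modes)


def local_oam(phi, modes):
    """Local OAM in units of hbar: Im(psi' / psi)."""
    return (field_derivative(phi, modes) / field(phi, modes)).imag


def intensity(phi, modes):
    """Local intensity |psi|^2."""
    return abs(field(phi, modes)) ** 2


# Two components, both carrying negative OAM (assumed toy amplitudes).
MODES = [(-1, 1.0), (-2, 0.8)]
```

At `phi = math.pi` the two components interfere destructively, the intensity is low, and `local_oam(math.pi, MODES)` is positive even though every component has negative OAM. At `phi = 0` the interference is constructive and the local value stays negative.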
# 外部ゲージ場結合量子力学:ゲージ選択、ハイゼンベルク代数表現とゲージ不変性、特にランダウ問題 External gauge field coupled quantum dynamics: gauge choices, Heisenberg algebra representations and gauge invariance in general, and the Landau problem in particular ( http://arxiv.org/abs/2304.13122v1 ) ライセンス: Link先を確認	Jan Govaerts (CP3, Univ. cath. Louvain, UCLouvain, Louvain-la-Neuve, Belgium)	(参考訳) 古典的運動方程式は不変のままであるが、作用が加法的な全微分あるいは発散項によって再定義されるとき(機械系の場合)、そのような変換は系の正準位相空間の定式化に非自明な結果をもたらす。これはさらに真実であり、特に量子系で使われているハイゼンベルク代数のユニタリ構成空間表現における誘導変換を含む、正準量子化力学にとってさらに微妙な方法である。背景ゲージ場と結合すると、そのような考察は、その古典的な外部背景ゲージ場のゲージ変換に対するシステムの量子力学の結果を適切に理解するために重要となるが、そのような変換の下では、系の自由度、抽象量子状態、量子力学は確実に厳密に不変である。一般的な文脈におけるこれらの異なる点の詳細な解析の後、これらは量子ランダウ問題の場合、古典的な背景磁気ベクトルポテンシャルを持つ量子ランダウ問題において、最も一般的なパラメトリッドゲージ選択をここで実施する。後者の議論は、量子系の磁気ベクトルポテンシャルに対するゲージ選択の状況に関する文献におけるいくつかの難解な言明を明らかにすることを目的としている。ランダウ問題における大域的時空対称性とそのゲージ不変なネーター電荷の役割も強調される。 Even though its classical equations of motion are then left invariant, when an action is redefined by an additive total derivative or divergence term (in time, in the case of a mechanical system) such a transformation induces nontrivial consequences for the system's canonical phase space formulation. This is even more true and then in more subtle ways for the canonically quantised dynamics, with in particular an induced transformation in the unitary configuration space representation of the Heisenberg algebra being used for the quantum system. When coupled to a background gauge field, such considerations become crucial for a proper understanding of the consequences for the system's quantum dynamics of gauge transformations of that classical external background gauge field, while under such transformations the system's degrees of freedom, abstract quantum states and quantum dynamics are certainly strictly invariant. 
After a detailed analysis of these different points in a general context, these are then illustrated specifically in the case of the quantum Landau problem with its classical external background magnetic vector potential for which the most general possible parametrised gauge choice is implemented herein. The latter discussion aims as well to clarify some perplexing statements in the literature regarding the status of gauge choices to be made for the magnetic vector potential for that quantum system. The role of the global space-time symmetries of the Landau problem and their gauge invariant Noether charges is then also emphasized.	翻訳日:2023-04-27 16:33:02 公開日:2023-04-25
# 機械学習によるイベントバイイベントドップラー補正を用いた高速・高温エキゾチック同位体の精密分光 Precision Spectroscopy of Fast, Hot Exotic Isotopes Using Machine Learning Assisted Event-by-Event Doppler Correction ( http://arxiv.org/abs/2304.13120v1 ) ライセンス: Link先を確認	Silviu-Marian Udrescu, Diego Alejandro Torres, Ronald Fernando Garcia Ruiz	(参考訳) 本研究では,高速エキゾチック同位体の高感度・高精度レーザー分光法を提案する。電場内を走行する原子の段階的共振イオン化を誘導し、その後イオン及び対応する電子を検出することにより、得られた粒子の時間及び位置感応測定を行うことができる。混合密度ネットワーク(MDN)を用いて、この情報を利用して個々の原子の初期エネルギーを予測し、観測された遷移周波数のドップラー補正を事象ごとに適用することができる。提案手法の数値シミュレーションを行い, 極高温($> 10^8$ K)で生成され, エネルギー広がりが最大10 keVで速度分布が非一様なイオンビームに対しても, kHzレベルの不確かさが得られることを示した。高エネルギービームに対して直接飛行中の分光を行えることは、高温で汚染の多い環境で少量しか生成されない、寿命がミリ秒以下の短寿命同位体を、冷却技術を必要とせずに研究する特別な機会を提供する。このような種は、核構造、天体物理学、新しい物理探索に顕著な関心を持っている。 We propose an experimental scheme for performing sensitive, high-precision laser spectroscopy studies on fast exotic isotopes. By inducing a step-wise resonant ionization of the atoms travelling inside an electric field and subsequently detecting the ion and the corresponding electron, time- and position-sensitive measurements of the resulting particles can be performed. Using a Mixture Density Network (MDN), we can leverage this information to predict the initial energy of individual atoms and thus apply a Doppler correction of the observed transition frequencies on an event-by-event basis. We conduct numerical simulations of the proposed experimental scheme and show that kHz-level uncertainties can be achieved for ion beams produced at extreme temperatures ($> 10^8$ K), with energy spreads as large as 10 keV and non-uniform velocity distributions. The ability to perform in-flight spectroscopy, directly on highly energetic beams, offers unique opportunities to study short-lived isotopes with lifetimes in the millisecond range and below, produced in low quantities, in hot and highly contaminated environments, without the need for cooling techniques. Such species are of marked interest for nuclear structure, astrophysics, and new physics searches.	
翻訳日:2023-04-27 16:32:35 公開日:2023-04-25
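The event-by-event Doppler correction described in the entry above can be sketched as follows. This is only an illustration under stated assumptions: the laser is taken to be exactly collinear (or anti-collinear) with the atom's velocity, the per-event kinetic energy is simply passed in as a number (in the paper it would come from the Mixture Density Network prediction), and the function name `rest_frame_frequency` and the constant `AMU_EV` are hypothetical names chosen here.

```python
import math

# Atomic mass unit expressed in eV/c^2 (approximate value, assumption of this sketch).
AMU_EV = 931.494e6


def rest_frame_frequency(nu_lab_hz, kinetic_energy_ev, mass_amu, collinear=True):
    """Laser frequency seen in the atom's rest frame for one event.

    Uses the relativistic Doppler formula nu' = nu * gamma * (1 -/+ beta),
    with the minus sign when laser and atom travel in the same direction.
    """
    gamma = 1.0 + kinetic_energy_ev / (mass_amu * AMU_EV)
    beta = math.sqrt(1.0 - 1.0 / gamma**2)
    sign = -1.0 if collinear else 1.0
    return nu_lab_hz * gamma * (1.0 + sign * beta)
```

Applying this per event with an individually estimated energy, instead of once with the mean beam energy, is what removes the broadening caused by a large energy spread.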
# 光学系における非線形チャネル補償へのTransformerの応用 Application of Transformers for Nonlinear Channel Compensation in Optical Systems ( http://arxiv.org/abs/2304.13119v1 ) ライセンス: Link先を確認	Behnam Behinaein Hamgini, Hossein Najafi, Ali Bakhshali, and Zhuhong Zhang	(参考訳) 本稿では,トランスフォーマーを用いたコヒーレント長距離伝送のための非線形チャネル等化手法を提案する。シンボル列にまたがるメモリに直接注意を向ける(attend)能力により,並列化構造でトランスフォーマーを効果的に使用できることを示す。本稿では,非線形等化のためのトランスフォーマーのエンコーダ部分を実装し,その性能を多種多様なハイパーパラメータで解析する。各繰り返しでシンボルのブロックを処理し、エンコーダの出力のサブセットを慎重に選択することにより、効率的な非線形補償を実現することができる。また,非線形摂動理論に触発された物理情報に基づくマスクを用いて,トランスフォーマー非線形等化の計算複雑性を低減する手法を提案する。 In this paper, we introduce a new nonlinear channel equalization method for coherent long-haul transmission based on Transformers. We show that due to their capability to attend directly to the memory across a sequence of symbols, Transformers can be used effectively with a parallelized structure. We present an implementation of the encoder part of the Transformer for nonlinear equalization and analyze its performance over a wide range of different hyper-parameters. It is shown that by processing blocks of symbols at each iteration and carefully selecting subsets of the encoder's output to be processed together, an efficient nonlinear compensation can be achieved. We also propose the use of a physics-informed mask inspired by nonlinear perturbation theory for reducing the computational complexity of Transformer nonlinear equalization.	翻訳日:2023-04-27 16:32:12 公開日:2023-04-25
# オートエンコーダを用いたSMAPパッシブラジオメータの電波干渉低減 Autoencoder-based Radio Frequency Interference Mitigation For SMAP Passive Radiometer ( http://arxiv.org/abs/2304.13158v1 ) ライセンス: Link先を確認	Ali Owfi, Fatemeh Afghah	(参考訳) 1400-1427 MHzの保護周波数帯域で運用される受動型の衛星搭載放射計は、地上の発生源からの電波干渉(RFI)に直面している。無線デバイスの成長と新しい技術の出現により、このスペクトルを他の技術と共有することは、これらの放射計により多くのRFIをもたらす可能性がある。このバンドは5G以降にとって理想的な中間帯域周波数であり、高い容量と良好なカバレッジを提供する。 SMAP (Soil Moisture Active Passive) における現在のRFI検出・緩和技術は、汚染されたデータを正しく検出して破棄またはフィルタリングすることに依存しており、特に重度のRFIケースにおいて貴重な情報が失われる原因となる。本稿では, 受動受信機側で受信した汚染信号から, 共存しうる地上ユーザ(例えば5G基地局)によって引き起こされる支配的RFIを除去し, 貴重な情報を保存して汚染データの破棄を防ぐ, オートエンコーダに基づくRFI緩和手法を提案する。 Passive space-borne radiometers operating in the 1400-1427 MHz protected frequency band face radio frequency interference (RFI) from terrestrial sources. With the growth of wireless devices and the appearance of new technologies, the possibility of sharing this spectrum with other technologies would introduce more RFI to these radiometers. This band could be an ideal mid-band frequency for 5G and Beyond, as it offers high capacity and good coverage. Current RFI detection and mitigation techniques at SMAP (Soil Moisture Active Passive) depend on correctly detecting and discarding or filtering the contaminated data leading to the loss of valuable information, especially in severe RFI cases. In this paper, we propose an autoencoder-based RFI mitigation method to remove the dominant RFI caused by potential coexistent terrestrial users (e.g., 5G base stations) from the received contaminated signal at the passive receiver side, potentially preserving valuable information and preventing the contaminated data from being discarded.	翻訳日:2023-04-27 16:25:16 公開日:2023-04-25
# HDR-ChipQA:高ダイナミックレンジ映像の非参照品質評価 HDR-ChipQA: No-Reference Quality Assessment on High Dynamic Range Videos ( http://arxiv.org/abs/2304.13156v1 ) ライセンス: Link先を確認	Joshua P. Ebenezer, Zaixi Shang, Yongjun Wu, Hai Wei, Sriram Sethuraman and Alan C. Bovik	(参考訳) 我々は,HDR-ChipQAと呼ぶハイダイナミックレンジ(HDR)ビデオのスタンドアウト性能を実現するノン参照ビデオ品質モデルとアルゴリズムを提案する。 HDRビデオは、標準ダイナミックレンジ(SDR)ビデオよりも幅広い輝度、詳細、色を表現している。大規模ビデオネットワークにおけるHDRの採用の増加により、HDRコンテンツの歪みを考慮に入れたビデオ品質評価(VQA)アルゴリズムの必要性が高まっている。特に、標準的なVQAモデルは、ダイナミックレンジの極端における顕著な歪みを捉えることができない。局所的な拡張的非線形性は、局所的なluma範囲の上端と下端で発生する歪みを強調する新しいアプローチを導入し、別々の経路に沿って計算される付加的な品質認識特徴の定義を可能にした。これらの特徴はHDR特有のものではなく、SDR映像のVQAも改善されている。この前処理ステップは,HDRコンテンツの品質予測に用いる際に,歪みに敏感な自然ビデオ統計(NVS)機能のパワーを著しく高めている。同様の方法で、同じ非線形処理ステップを用いて、新しい広域色特徴を別々に計算する。当社のモデルがSDR VQAアルゴリズムを,公開されている唯一の包括的なHDRデータベースで大幅に上回っていると同時に,SDRコンテンツの最先端のパフォーマンスも達成していることが分かりました。 We present a no-reference video quality model and algorithm that delivers standout performance for High Dynamic Range (HDR) videos, which we call HDR-ChipQA. HDR videos represent wider ranges of luminances, details, and colors than Standard Dynamic Range (SDR) videos. The growing adoption of HDR in massively scaled video networks has driven the need for video quality assessment (VQA) algorithms that better account for distortions on HDR content. In particular, standard VQA models may fail to capture conspicuous distortions at the extreme ends of the dynamic range, because the features that drive them may be dominated by distortions that pervade the mid-ranges of the signal. We introduce a new approach whereby a local expansive nonlinearity emphasizes distortions occurring at the higher and lower ends of the local luma range, allowing for the definition of additional quality-aware features that are computed along a separate path. These features are not HDR-specific, and also improve VQA on SDR video contents, albeit to a reduced degree. 
We show that this preprocessing step significantly boosts the power of distortion-sensitive natural video statistics (NVS) features when used to predict the quality of HDR content. In a similar manner, we separately compute novel wide-gamut color features using the same nonlinear processing steps. We have found that our model significantly outperforms SDR VQA algorithms on the only publicly available, comprehensive HDR database, while also attaining state-of-the-art performance on SDR content.	翻訳日:2023-04-27 16:24:57 公開日:2023-04-25
# LumiGAN:3D顔の無条件生成 LumiGAN: Unconditional Generation of Relightable 3D Human Faces ( http://arxiv.org/abs/2304.13153v1 ) ライセンス: Link先を確認	Boyang Deng, Yifan Wang, Gordon Wetzstein	(参考訳) 非構造化2次元画像データによる3次元顔の教師なし学習は活発な研究領域である。近年の作品は印象的なフォトリアリズムを達成しているが、通常は照明の制御が欠如しており、生成された資産が新しい環境に配備されることを防いでいる。そこで本稿では,3次元顔用無条件生成逆ネットワーク(gan)lumiganと,推定時に新たな照明下での照明を可能にする物理ベースの照明モジュールを提案する。以前の研究とは異なり、LumiGANは自己監督的な方法で学習された効率的な可視性定式化を用いて、現実的な影効果を生み出すことができる。 LumiGANは、表面の正常、拡散アルベド、地上の真実データのない特異なスズなど、可照性のある顔の物理的特性を生成する。再現性に加えて, 最新技術である3D GANと比較して幾何生成が著しく向上し, 既存の3D GANよりもフォトリアリズムが優れていた。 Unsupervised learning of 3D human faces from unstructured 2D image data is an active research area. While recent works have achieved an impressive level of photorealism, they commonly lack control of lighting, which prevents the generated assets from being deployed in novel environments. To this end, we introduce LumiGAN, an unconditional Generative Adversarial Network (GAN) for 3D human faces with a physically based lighting module that enables relighting under novel illumination at inference time. Unlike prior work, LumiGAN can create realistic shadow effects using an efficient visibility formulation that is learned in a self-supervised manner. LumiGAN generates plausible physical properties for relightable faces, including surface normals, diffuse albedo, and specular tint without any ground truth data. In addition to relightability, we demonstrate significantly improved geometry generation compared to state-of-the-art non-relightable 3D GANs and notably better photorealism than existing relightable GANs.	翻訳日:2023-04-27 16:24:34 公開日:2023-04-25
# ロールドロップ:単一パラメータによる観測ノイズの計算 Roll-Drop: accounting for observation noise with a single parameter ( http://arxiv.org/abs/2304.13150v1 ) ライセンス: Link先を確認	Luigi Campanaro and Daniele De Martini and Siddhant Gangapurwala and Wolfgang Merkt and Ioannis Havoutis	(参考訳) 本稿では,シミュレーション中にドロップアウトすることで,各状態の分布を明示的にモデル化することなく,デプロイ時の観測ノイズを考慮し,drl (deep-reinforcement learning) におけるsim-to-real の簡易な戦略を提案する。 drlは、ロボットを高度にダイナミックでフィードバックベースの操作に制御するための有望なアプローチであり、正確なシミュレーターは、望ましい振る舞いを学ぶために安価で豊富なデータを提供するために不可欠である。それでもシミュレートされたデータはノイズがなく、一般的にはノイズの影響を受けやすい実機への展開に挑戦する分布シフトを示す。標準的な解決策は、後者をモデル化し、トレーニング中に注入することであり、完全なシステム識別が必要であるが、ロールドロップは単一のパラメータのみをチューニングすることで、センサノイズに対する堅牢性を高める。観測では,最大25%のノイズを注入した場合の80%の成功率を示し,ベースラインの2倍の堅牢性を示した。シミュレーションで訓練したコントローラをUnitree A1プラットフォーム上に展開し,この物理系におけるロバスト性の向上を評価した。 This paper proposes a simple strategy for sim-to-real in Deep-Reinforcement Learning (DRL) -- called Roll-Drop -- that uses dropout during simulation to account for observation noise during deployment without explicitly modelling its distribution for each state. DRL is a promising approach to control robots for highly dynamic and feedback-based manoeuvres, and accurate simulators are crucial to providing cheap and abundant data to learn the desired behaviour. Nevertheless, the simulated data are noiseless and generally show a distributional shift that challenges the deployment on real machines where sensor readings are affected by noise. The standard solution is modelling the latter and injecting it during training; while this requires a thorough system identification, Roll-Drop enhances the robustness to sensor noise by tuning only a single parameter. We demonstrate an 80% success rate when up to 25% noise is injected in the observations, with twice higher robustness than the baselines. We deploy the controller trained in simulation on a Unitree A1 platform and assess this improved robustness on the physical system.	翻訳日:2023-04-27 16:24:15 公開日:2023-04-25
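The core idea of the Roll-Drop entry above, dropout applied to observations during simulated rollouts with a single tunable parameter, can be sketched as below. This is a minimal illustration and not the authors' implementation: the function name `roll_drop` is hypothetical, and whether the original method rescales the surviving entries or applies dropout inside the policy network is not stated in the abstract.

```python
import numpy as np


def roll_drop(observation, drop_prob, rng):
    """Zero each observation entry independently with probability drop_prob.

    drop_prob is the single parameter to tune; no per-sensor noise model is needed.
    """
    observation = np.asarray(observation, dtype=float)
    # rng.random draws from [0, 1), so an entry is kept with probability 1 - drop_prob.
    keep_mask = rng.random(observation.shape) >= drop_prob
    return observation * keep_mask
```

During training, each simulated observation would pass through `roll_drop` before reaching the policy, for example `obs = roll_drop(obs, 0.1, np.random.default_rng(0))`, while deployment uses the raw sensor readings.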
# 仮想アシスタントのための音声情報クエリのモデル化 : オープン問題,課題,機会 Modeling Spoken Information Queries for Virtual Assistants: Open Problems, Challenges and Opportunities ( http://arxiv.org/abs/2304.13149v1 ) ライセンス: Link先を確認	Christophe Van Gysel	(参考訳) バーチャルアシスタントは音声による情報検索プラットフォームとしてますます重要になりつつある。本稿では,仮想アシスタントのための音声情報クエリのモデル化に関するオープン問題と課題と,仮想アシスタント音声認識の品質向上のために情報検索手法と研究が適用できる機会の一覧について論じる。問合せドメイン分類,知識グラフ,ユーザインタラクションデータ,および問合せパーソナライゼーションが,音声情報ドメインクエリの正確な認識向上にどのように役立つかを論じる。最後に,音声認識における現状の問題点と課題について概説する。 Virtual assistants are becoming increasingly important speech-driven Information Retrieval platforms that assist users with various tasks. We discuss open problems and challenges with respect to modeling spoken information queries for virtual assistants, and list opportunities where Information Retrieval methods and research can be applied to improve the quality of virtual assistant speech recognition. We discuss how query domain classification, knowledge graphs and user interaction data, and query personalization can be helpful to improve the accurate recognition of spoken information domain queries. Finally, we also provide a brief overview of current problems and challenges in speech recognition.	翻訳日:2023-04-27 16:23:53 公開日:2023-04-25
# MBIB - 最初のメディアバイアス識別ベンチマークタスクとデータセットコレクション Introducing MBIB -- the first Media Bias Identification Benchmark Task and Dataset Collection ( http://arxiv.org/abs/2304.13148v1 ) ライセンス: Link先を確認	Martin Wessel, Tomáš Horych, Terry Ruas, Akiko Aizawa, Bela Gipp and Timo Spinde	(参考訳) メディアバイアス検出は複雑なマルチタスク問題であるが、これまでこれらの評価タスクをグループ化する統一ベンチマークは存在しない。メディアバイアス識別ベンチマーク(MBIB)は,様々なタイプのメディアバイアス(言語,認知,政治など)を共通の枠組みの下でグループ化し,予測検出手法の一般化を検証するための総合ベンチマークである。 115のデータセットをレビューした後、9つのタスクを選択し、メディアバイアス検出技術を評価するための22の関連するデータセットを慎重に提案する。我々は,最先端トランスフォーマー技術(T5,BARTなど)を用いてMBIBを評価する。我々の結果は、ヘイトスピーチ、人種的偏見、性別的偏見は検出しやすいが、モデルが認知や政治的偏見といった特定のバイアスタイプを扱うのに苦労していることを示唆している。しかし,本研究の結果から,他の手法よりも優れた性能を発揮する技術はないことがわかった。また,メディアバイアスの個々のタスクに対する研究関心と資源配分の不均一な分布も見いだした。統一ベンチマークは、より堅牢なシステムの開発を奨励し、メディアバイアス検出評価の現在のパラダイムを、複数のメディアバイアスタイプを同時に取り組むソリューションへとシフトさせる。 Although media bias detection is a complex multi-task problem, there is, to date, no unified benchmark grouping these evaluation tasks. We introduce the Media Bias Identification Benchmark (MBIB), a comprehensive benchmark that groups different types of media bias (e.g., linguistic, cognitive, political) under a common framework to test how prospective detection techniques generalize. After reviewing 115 datasets, we select nine tasks and carefully propose 22 associated datasets for evaluating media bias detection techniques. We evaluate MBIB using state-of-the-art Transformer techniques (e.g., T5, BART). Our results suggest that while hate speech, racial bias, and gender bias are easier to detect, models struggle to handle certain bias types, e.g., cognitive and political bias. However, our results show that no single technique can outperform all the others significantly. We also find an uneven distribution of research interest and resource allocation to the individual tasks in media bias. 
A unified benchmark encourages the development of more robust systems and shifts the current paradigm in media bias detection evaluation towards solutions that tackle not one but multiple media bias types simultaneously.	翻訳日:2023-04-27 16:23:43 公開日:2023-04-25
# 時間スケール間の一貫性から自己教師付きマルチオブジェクトトラッキング Self-Supervised Multi-Object Tracking From Consistency Across Timescales ( http://arxiv.org/abs/2304.13147v1 ) ライセンス: Link先を確認	Christopher Lang, Alexander Braun, Lars Schillingmann, Abhinav Valada	(参考訳) 自己監視されたマルチオブジェクトトラッカーは、世界中の膨大な生データを活用できる可能性がある。しかし、彼らは監督対象に比べて再同定の精度が低い。この欠損は、自己監督対象を単一のフレームまたはフレームペアに制限することに起因すると仮定する。このような設計は、一貫した再識別機能を学ぶのに十分な視覚的外観の変化を欠いている。そこで本研究では,短期・長期にわたる一貫したアソシエーションスコアを強制することにより,フレーム列上の再同定特徴を学習する学習目標を提案する。 BDD100KとMOT17ベンチマークの大規模な評価では、学習したReID機能は、他の自己管理手法と比較してIDスイッチを著しく減らし、自己管理されたマルチオブジェクトトラッキングのための技術の新たな状態を設定し、BDD100kベンチマークの教師付きメソッドと同等に実行しました。 Self-supervised multi-object trackers have the potential to leverage the vast amounts of raw data recorded worldwide. However, they still fall short in re-identification accuracy compared to their supervised counterparts. We hypothesize that this deficiency results from restricting self-supervised objectives to single frames or frame pairs. Such designs lack sufficient visual appearance variations during training to learn consistent re-identification features. Therefore, we propose a training objective that learns re-identification features over a sequence of frames by enforcing consistent association scores across short and long timescales. Extensive evaluations on the BDD100K and MOT17 benchmarks demonstrate that our learned ReID features significantly reduce ID switches compared to other self-supervised methods, setting the new state of the art for self-supervised multi-object tracking and even performing on par with supervised methods on the BDD100k benchmark.	翻訳日:2023-04-27 16:23:23 公開日:2023-04-25
# t細胞受容体タンパク質配列とスパースコード : 癌分類への新しいアプローチ T Cell Receptor Protein Sequences and Sparse Coding: A Novel Approach to Cancer Classification ( http://arxiv.org/abs/2304.13145v1 ) ライセンス: Link先を確認	Zahra Tayebi, Sarwan Ali, Prakash Chourasia, Taslim Murad and Murray Patterson	(参考訳) 癌は、制御不能な細胞増殖と増殖を特徴とする複雑な疾患である。 T細胞受容体(TCR)は、適応免疫系に必須のタンパク質であり、抗原の特異的認識は、がんを含む疾患に対する免疫応答において重要な役割を果たす。 TCRの多様性と特異性は、がん細胞をターゲットにするのに理想的であり、シークエンシング技術の最近の進歩は、TCRレパートリーの包括的なプロファイリングを可能にしている。これにより、強力な抗がん活性を持つTCRの発見とTCRベースの免疫療法の開発につながった。本研究では,癌分類を対象とするTCRタンパク質配列のマルチクラス分類におけるスパース符号の利用について検討した。スパースコーディングは、一連の情報的特徴を持つデータの表現を可能にし、アミノ酸間の複雑な関係を捉え、低次元の方法で見逃される可能性のあるシーケンス内の微妙なパターンを識別できる機械学習の一般的なテクニックである。まず、TCRシーケンスからk-merを計算し、次にスパース符号化を適用してデータの本質的な特徴を捉える。最終埋め込みの予測性能を向上させるため,各種類のがん特性に関するドメイン知識を統合する。次に,教師付き解析のためにtcr系列の埋め込みについて,異なる機械学習(線形および非線形)分類器を訓練する。提案手法は,TCRシーケンスのベンチマークデータセットへの埋め込みにより,予測性能においてベースラインを著しく上回り,99.8\%の精度を実現する。本研究は癌研究や他の関連分野におけるTCRタンパク質配列の解析におけるスパースコーディングの可能性を明らかにするものである。 Cancer is a complex disease characterized by uncontrolled cell growth and proliferation. T cell receptors (TCRs) are essential proteins for the adaptive immune system, and their specific recognition of antigens plays a crucial role in the immune response against diseases, including cancer. The diversity and specificity of TCRs make them ideal for targeting cancer cells, and recent advancements in sequencing technologies have enabled the comprehensive profiling of TCR repertoires. This has led to the discovery of TCRs with potent anti-cancer activity and the development of TCR-based immunotherapies. In this study, we investigate the use of sparse coding for the multi-class classification of TCR protein sequences with cancer categories as target labels. Sparse coding is a popular technique in machine learning that enables the representation of data with a set of informative features and can capture complex relationships between amino acids and identify subtle patterns in the sequence that might be missed by low-dimensional methods. 
We first compute the k-mers from the TCR sequences and then apply sparse coding to capture the essential features of the data. To improve the predictive performance of the final embeddings, we integrate domain knowledge regarding different types of cancer properties. We then train different machine learning (linear and non-linear) classifiers on the embeddings of TCR sequences for the purpose of supervised analysis. Our proposed embedding method on a benchmark dataset of TCR sequences significantly outperforms the baselines in terms of predictive performance, achieving an accuracy of 99.8%. Our study highlights the potential of sparse coding for the analysis of TCR protein sequences in cancer research and other related fields.	翻訳日:2023-04-27 16:23:08 公開日:2023-04-25
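The first step of the pipeline in the entry above, computing k-mers from a protein sequence, can be sketched as follows. The helper names `kmers` and `kmer_counts`, the choice `k=3`, and the example sequence are assumptions made here for illustration; the sparse coding and classification stages are not shown.

```python
from collections import Counter


def kmers(sequence, k=3):
    """Return all overlapping substrings of length k, in order of appearance."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def kmer_counts(sequence, k=3):
    """Count how often each k-mer occurs in the sequence."""
    return Counter(kmers(sequence, k))
```

A sequence of length n yields n - k + 1 overlapping k-mers (none if the sequence is shorter than k), and the resulting counts are the kind of input a sparse encoding step could operate on.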
# 時空間データの自己監督時間解析 Self-Supervised Temporal Analysis of Spatiotemporal Data ( http://arxiv.org/abs/2304.13143v1 ) ライセンス: Link先を確認	Yi Cao and Swetava Ganguli and Vipul Pandey	(参考訳) 地形活動の時間的パターンと土地利用の種類には相関関係がある。移動活動時系列に基づいて景観を階層化する新しい自己監督手法を提案する。まず、時系列信号は周波数領域に変換され、時間系列で観察される周期時間パターンを保存する収縮型オートエンコーダによりタスク非依存の時間埋め込みに圧縮される。ピクセルワイズ埋め込みは、深いセマンティックセグメンテーションを用いた下流空間タスクのタスクベースマルチモーダルモデリングに使用できるイメージライクなチャネルに変換される。実験により,時間的埋め込みは時系列データの意味的に意味のある表現であり,住宅地や商業地域を分類するといった様々なタスクに有効であることが示された。 There exists a correlation between geospatial activity temporal patterns and type of land use. A novel self-supervised approach is proposed to stratify landscape based on mobility activity time series. First, the time series signal is transformed to the frequency domain and then compressed into task-agnostic temporal embeddings by a contractive autoencoder, which preserves cyclic temporal patterns observed in time series. The pixel-wise embeddings are converted to image-like channels that can be used for task-based, multimodal modeling of downstream geospatial tasks using deep semantic segmentation. Experiments show that temporal embeddings are semantically meaningful representations of time series data and are effective across different tasks such as classifying residential area and commercial areas.	翻訳日:2023-04-27 16:22:42 公開日:2023-04-25
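The first stage in the entry above, moving an activity time series into the frequency domain so that cyclic patterns become explicit, can be sketched with a real-valued FFT. The function name `frequency_features` and the normalisation are assumptions of this sketch; the contractive autoencoder that compresses these features into embeddings is not shown.

```python
import numpy as np


def frequency_features(series):
    """Magnitude spectrum of a mean-removed time series, scaled by its length."""
    series = np.asarray(series, dtype=float)
    # Remove the mean so the zero-frequency bin does not dominate the spectrum.
    spectrum = np.fft.rfft(series - series.mean())
    return np.abs(spectrum) / len(series)
```

For hourly data covering one week (168 samples), a purely daily rhythm concentrates its energy in bin 7, since seven full cycles fit in the window.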
# カシミール-ポルダー相互作用とチャーン-シモン境界層 Casimir-Polder interaction with Chern-Simons boundary layers ( http://arxiv.org/abs/2304.13186v1 ) ライセンス: Link先を確認	Valery N. Marachevsky and Arseny A. Sidelnikov	(参考訳) グリーン関数散乱法は、異なる媒体間の平面境界からの反射後の電磁偏光の混合を考慮し、チャーン・サイモンズ平面境界層を持つ系におけるカシミール・ポルダーポテンシャルの導出に適用するために一般化される。誘電体ハーフスペース上のチャーン・サイモンズ平面境界層の存在下で、異方性原子のカシミール・ポルダーポテンシャルを導出する手法が最初に適用された。そして、チャーン・サイモンズ平面平行境界層を持つ2つの誘電体半空間間の異方性原子のカシミール・ポルダーポテンシャルの一般結果を得る。真空中におけるチャーン・サイモンズ平面平行層間の異方性原子のカシミール・ポルダーポテンシャルは特殊関数によって表される。 2つのChern-Simons平面平行層と、その基底状態にある中性原子の系において、新しいP-odd三体真空効果を発見し、解析した。電界とqed双極子相互作用を持つ中性原子を用いた実験において、チャーン・シモン層の180度回転によって生じるp-odd三体真空効果が顕著に検証できる。 Green functions scattering method is generalized to consider mixing of electromagnetic polarizations after reflection from the plane boundary between different media and applied to derivation of the Casimir-Polder potential in systems with Chern-Simons plane boundary layers. The method is first applied to derive the Casimir-Polder potential of an anisotropic atom in the presence of a Chern-Simons plane boundary layer on a dielectric half-space. Then a general result for the Casimir-Polder potential of an anisotropic atom between two dielectric half-spaces with Chern-Simons plane parallel boundary layers is derived. The Casimir-Polder potential of an anisotropic atom between two Chern-Simons plane parallel layers in vacuum is expressed through special functions. Novel P-odd three-body vacuum effects are discovered and analyzed in the system of two Chern-Simons plane parallel layers and a neutral atom in its ground state between the layers. Remarkably, P-odd three-body vacuum effects arising due to 180 degree rotation of one of the Chern-Simons layers can be verified in experiments with neutral atoms having QED dipole interaction with an electromagnetic field.	翻訳日:2023-04-27 16:16:35 公開日:2023-04-25
# 画像テキストモデルのためのサンプル特異的デバイアス Sample-Specific Debiasing for Better Image-Text Models ( http://arxiv.org/abs/2304.13181v1 ) ライセンス: Link先を確認	Peiqi Wang, Yingcheng Liu, Ching-Yun Ko, William M. Wells, Seth Berkowitz, Steven Horng, Polina Golland	(参考訳) 画像テキストデータに基づく自己教師付き表現学習は、画像分類、視覚的接地、相互モーダル検索などの重要な医療応用を促進する。 1つの一般的なアプローチは、意味論的に類似した(正)と異種(負)のデータポイントの対を対比することである。トレーニングデータセットから一様に負のサンプルを描画すると、偽の陰性、すなわち同じクラスに属する異種として扱われるサンプルが生じる。医療データでは、基礎となるクラス分布は不均一であり、偽陰性は高い変動率で起こることを意味する。学習表現の品質を向上させるために,偽陰性を補正する新しい手法を開発した。提案手法は, 標本特異的なクラス確率を推定して用いる, デバイアス対照学習の変種と見なすことができる。目的関数の理論的解析を行い、画像とペア画像テキストのデータセットに対して提案したアプローチを示す。実験はサンプル特異的デバイアスの実証的利点を示す。 Self-supervised representation learning on image-text data facilitates crucial medical applications, such as image classification, visual grounding, and cross-modal retrieval. One common approach involves contrasting semantically similar (positive) and dissimilar (negative) pairs of data points. Drawing negative samples uniformly from the training data set introduces false negatives, i.e., samples that are treated as dissimilar but belong to the same class. In healthcare data, the underlying class distribution is nonuniform, implying that false negatives occur at a highly variable rate. To improve the quality of learned representations, we develop a novel approach that corrects for false negatives. Our method can be viewed as a variant of debiased contrastive learning that uses estimated sample-specific class probabilities. We provide theoretical analysis of the objective function and demonstrate the proposed approach on both image and paired image-text data sets. Our experiments demonstrate empirical advantages of sample-specific debiasing.	翻訳日:2023-04-27 16:16:18 公開日:2023-04-25
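The false-negative correction described in the entry above can be illustrated with one common form of a debiased contrastive loss, written for a single anchor. This is a sketch under assumptions, not the objective from the paper: the abstract does not give the exact formula, the function name `debiased_contrastive_loss` and its parameters are hypothetical, and the "sample-specific" aspect is represented only by passing a different `tau_plus` for each anchor.

```python
import numpy as np


def debiased_contrastive_loss(pos_sim, neg_sims, tau_plus, temperature=0.5):
    """Debiased InfoNCE-style loss for one anchor.

    pos_sim:   similarity between the anchor and its positive, in [-1, 1]
    neg_sims:  similarities between the anchor and N sampled negatives
    tau_plus:  estimated probability that a sampled negative is a false negative
               (same class as the anchor); sample-specific in this sketch
    """
    neg_sims = np.asarray(neg_sims, dtype=float)
    n = neg_sims.size
    pos = np.exp(pos_sim / temperature)
    neg = np.exp(neg_sims / temperature)
    # Subtract the expected false-negative contribution from the negative term.
    g = (neg.mean() - tau_plus * pos) / (1.0 - tau_plus)
    # Clamp at the smallest value the term can take for similarities in [-1, 1].
    g = max(g, np.exp(-1.0 / temperature))
    return float(-np.log(pos / (pos + n * g)))
```

With `tau_plus = 0` the expression reduces to the usual contrastive loss, and a larger `tau_plus` discounts the negative term more strongly for anchors whose class is common in the data.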
# Sebis at SemEval-2023 Task 7: A Joint System for Natural Language Inference and Evidence Retrieval from Clinical Trial Reports Sebis at SemEval-2023 Task 7: A Joint System for Natural Language Inference and Evidence Retrieval from Clinical Trial Reports ( http://arxiv.org/abs/2304.13180v1 ) ライセンス: Link先を確認	Juraj Vladika, Florian Matthes	(参考訳) 毎日生成される臨床試験報告の数が増えるにつれて、証拠に基づく医療勧告を知らせる新たな発見に追随することは難しくなってきている。このプロセスを自動化し、医療専門家を支援するため、NLPソリューションが開発されている。これは、エビデンス検索と臨床試験データからの自然言語推論の2つのタスクのためのnlpシステムの開発を目標としたsemeval-2023タスク7の動機となった。本稿では,2つのシステムについて述べる。 1つは2つのタスクを個別にモデル化するパイプラインシステムであり、2つ目は2つのタスクを共有表現とマルチタスク学習アプローチで同時に学習するジョイントシステムである。最終的なシステムは、その出力をアンサンブルシステムに結合する。モデルを形式化し,その特性と課題を提示し,得られた結果の分析を行う。 With the increasing number of clinical trial reports generated every day, it is becoming hard to keep up with novel discoveries that inform evidence-based healthcare recommendations. To help automate this process and assist medical experts, NLP solutions are being developed. This motivated the SemEval-2023 Task 7, where the goal was to develop an NLP system for two tasks: evidence retrieval and natural language inference from clinical trial data. In this paper, we describe our two developed systems. The first one is a pipeline system that models the two tasks separately, while the second one is a joint system that learns the two tasks simultaneously with a shared representation and a multi-task learning approach. The final system combines their outputs in an ensemble system. We formalize the models, present their characteristics and challenges, and provide an analysis of achieved results.	翻訳日:2023-04-27 16:16:04 公開日:2023-04-25
# 電力制約深層学習によるロバスト非線形フィードバック符号化 Robust Non-Linear Feedback Coding via Power-Constrained Deep Learning ( http://arxiv.org/abs/2304.13178v1 ) ライセンス: Link先を確認	Junghoon Kim, Taejoon Kim, David Love, Christopher Brinton	(参考訳) フィードバック可能な通信のためのコードの設計は、長年のオープンな問題であった。非線形な深層学習に基づく符号化方式に関する最近の研究は、線形符号よりも通信信頼性が大幅に向上しているが、チャネル上での前方およびフィードバックノイズの存在に対して脆弱である。本稿では,チャネルノイズに対するロバスト性を大幅に向上させる非線形フィードバック符号のファミリーを開発した。私たちのオートエンコーダベースのアーキテクチャは、ビットの連続ブロックに基づいてコードを学習するように設計されており、ノイズの多いチャネル上でのエンコーダとデコーダの物理的分離を克服するために、ビット単位での処理よりもノイズの少ないアドバンテージを得ます。さらに,ハードウェア制約を学習最適化に明示的に組み込むため,エンコーダの電力制御層を開発し,平均電力制約が漸近的に満たされることを示す。数値実験により,本手法は実効的なフォワードノイズやフィードバックノイズよりも広いマージンでフィードバック符号よりも優れており,非線形符号の挙動に関する情報理論的洞察を提供する。さらに, 長いブロック長条件下では, フィードバックノイズが高くなると, 標準誤り訂正符号がフィードバック符号より好まれることがわかった。 The design of codes for feedback-enabled communications has been a long-standing open problem. Recent research on non-linear, deep learning-based coding schemes have demonstrated significant improvements in communication reliability over linear codes, but are still vulnerable to the presence of forward and feedback noise over the channel. In this paper, we develop a new family of non-linear feedback codes that greatly enhance robustness to channel noise. Our autoencoder-based architecture is designed to learn codes based on consecutive blocks of bits, which obtains de-noising advantages over bit-by-bit processing to help overcome the physical separation between the encoder and decoder over a noisy channel. Moreover, we develop a power control layer at the encoder to explicitly incorporate hardware constraints into the learning optimization, and prove that the resulting average power constraint is satisfied asymptotically. Numerical experiments demonstrate that our scheme outperforms state-of-the-art feedback codes by wide margins over practical forward and feedback noise regimes, and provide information-theoretic insights on the behavior of our non-linear codes. 
Moreover, we observe that, in a long blocklength regime, canonical error correction codes are still preferable to feedback codes when the feedback noise becomes high.	翻訳日:2023-04-27 16:15:46 公開日:2023-04-25
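The role of a power control layer at the encoder, as mentioned in the entry above, can be illustrated by rescaling a batch of transmit symbols so that its empirical average power matches a budget. This is a simplified stand-in and not the layer from the paper: the function name `normalize_power`, the batch-level scaling rule, and the small numerical guard are assumptions of this sketch.

```python
import numpy as np


def normalize_power(symbols, power=1.0):
    """Scale symbols so that the mean squared value over the batch equals `power`."""
    symbols = np.asarray(symbols, dtype=float)
    mean_square = np.mean(symbols**2)
    # Guard against division by zero for an all-zero batch.
    scale = np.sqrt(power / max(mean_square, 1e-12))
    return symbols * scale
```

Because the scale is computed from batch statistics, the constraint holds on average over the batch rather than for every individual symbol, which is in the spirit of an average power constraint.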
# 金融強化学習のための動的データセットと市場環境 Dynamic Datasets and Market Environments for Financial Reinforcement Learning ( http://arxiv.org/abs/2304.13174v1 ) ライセンス: Link先を確認	Xiao-Yang Liu, Ziyi Xia, Hongyang Yang, Jiechao Gao, Daochen Zha, Ming Zhu, Christina Dan Wang, Zhaoran Wang, Jian Guo	(参考訳) 金融市場は、動的データセットのユニークな特徴から、深層強化学習にとって特に困難な場である。金融強化学習(FinRL)エージェントを訓練するための高品質な市場環境の構築は、財務データの信号対雑音比の低さ、過去のデータの生存バイアス、モデルオーバーフィッティングなどの大きな要因により困難である。本稿では,実世界の市場からジム型の市場環境へ動的データセットを処理し,ai4financeコミュニティによって積極的に維持されているデータ中心かつオープンアクセス可能なライブラリであるfinrl-metaを提案する。まず、dataopsパラダイムに従って、自動データキュレーションパイプラインを通じて、数百のマーケット環境を提供します。第二に、我々は自家製の事例を提供し、人気のある研究論文を、ユーザーが新しい取引戦略を設計するための足場として再現する。また、ライブラリをクラウドプラットフォームにデプロイすることで、ユーザは自身の結果を視覚化し、コミュニティによるコンペティションを通じて相対的なパフォーマンスを評価することができます。第3に、急速に成長するコミュニティにサービスを提供するために、カリキュラムとドキュメントウェブサイトにまとめられた数十のJupyter/Pythonデモを提供しています。データキュレーションパイプラインのオープンソースコードはhttps://github.com/AI4Finance-Foundation/FinRL-Metaで公開されている。 The financial market is a particularly challenging playground for deep reinforcement learning due to its unique feature of dynamic datasets. Building high-quality market environments for training financial reinforcement learning (FinRL) agents is difficult due to major factors such as the low signal-to-noise ratio of financial data, survivorship bias of historical data, and model overfitting. In this paper, we present FinRL-Meta, a data-centric and openly accessible library that processes dynamic datasets from real-world markets into gym-style market environments and has been actively maintained by the AI4Finance community. First, following a DataOps paradigm, we provide hundreds of market environments through an automatic data curation pipeline. Second, we provide homegrown examples and reproduce popular research papers as stepping stones for users to design new trading strategies. We also deploy the library on cloud platforms so that users can visualize their own results and assess the relative performance via community-wise competitions. 
Third, we provide dozens of Jupyter/Python demos organized into a curriculum and a documentation website to serve the rapidly growing community. The open-source codes for the data curation pipeline are available at https://github.com/AI4Finance-Foundation/FinRL-Meta	翻訳日:2023-04-27 16:15:24 公開日:2023-04-25
# SAFE: Shard Graphsを使った機械アンラーニング SAFE: Machine Unlearning With Shard Graphs ( http://arxiv.org/abs/2304.13169v1 ) ライセンス: Link先を確認	Yonatan Dukler, Benjamin Bowman, Alessandro Achille, Aditya Golatkar, Ashwin Swaminathan, Stefano Soatto	(参考訳) 本稿では,学習済みモデルからトレーニングサンプルの影響を除去するための期待コストを最小化しつつ,多様なデータ集合に大規模モデルを適応させる手法であるSynergy Aware Forgetting Ensemble (SAFE)を提案する。このプロセスは選択的忘却またはアンラーニングとしても知られ、データセットをシャードに分割し、それぞれに完全に独立したモデルをトレーニングし、結果のモデルをアンサンブルすることで実行されることが多い。シャード数の増加は、忘却の期待コストを減少させるが、同時に推論コストを増加させ、独立したモデルトレーニング中にサンプル間の相乗的情報が失われるため、モデルの最終的な精度を低下させる。個々のシャードを独立したものとして扱うのではなく、SAFEはシャードグラフの概念を導入する。これは訓練中に他のシャードから限られた情報を取り込むことを可能にし、忘却の期待コストのわずかな増加と引き換えに精度を著しく向上させ、かつ忘却後には残留する影響を完全に除去する。 SAFEは軽量なアダプタシステムを使用し、ほとんどの計算を再利用しながらトレーニングすることができる。これにより、SAFEは現在の最先端の方法よりも1桁小さなシャードでトレーニングでき(つまり、忘却のコストを削減でき)、同時に、細粒度のコンピュータビジョンデータセットで実証的に示すように、高い精度を維持することができる。 We present Synergy Aware Forgetting Ensemble (SAFE), a method to adapt large models on a diverse collection of data while minimizing the expected cost to remove the influence of training samples from the trained model. This process, also known as selective forgetting or unlearning, is often conducted by partitioning a dataset into shards, training fully independent models on each, then ensembling the resulting models. Increasing the number of shards reduces the expected cost to forget but at the same time it increases inference cost and reduces the final accuracy of the model since synergistic information between samples is lost during the independent model training. Rather than treating each shard as independent, SAFE introduces the notion of a shard graph, which allows incorporating limited information from other shards during training, trading a modest increase in expected forgetting cost for a significant increase in accuracy, all while still attaining complete removal of residual influence after forgetting. SAFE uses a lightweight system of adapters which can be trained while reusing most of the computations. 
This allows SAFE to be trained on shards an order-of-magnitude smaller than current state-of-the-art methods (thus reducing the forgetting costs) while also maintaining high accuracy, as we demonstrate empirically on fine-grained computer vision datasets.	翻訳日:2023-04-27 16:15:04 公開日:2023-04-25
# 進化アルゴリズムを用いた正定値非パラメトリック回帰と共分散関数推定への応用 Positive definite nonparametric regression using an evolutionary algorithm with application to covariance function estimation ( http://arxiv.org/abs/2304.13168v1 ) ライセンス: Link先を確認	Myeongjong Kang	(参考訳) 正定性制約を考慮した新しい非パラメトリック回帰フレームワークを提案する。定常過程の共分散関数を推定するための高度にモジュラーなアプローチを提供する。提案手法は, 正定性, 等方性, 単調性を推定器に課すことができ, クロスバリデーションを用いてそのハイパーパラメータを決定することができる。カーネルベースの分布サロゲートの積分変換を用いて推定器を定義する。次に,分布推定アルゴリズムの変種である反復密度推定進化アルゴリズムを用いて推定器を当てはめる。また,点参照データの共分散関数を推定できるよう手法を拡張する。代替手法と比較して,本手法は長距離依存性についてより信頼性の高い推定を与える。本手法の有効性と性能を示すために,いくつかの数値的研究を行った。また,Spatial Interpolation Comparison 97プロジェクトの降水データを用いて,本手法を例示する。 We propose a novel nonparametric regression framework subject to the positive definiteness constraint. It offers a highly modular approach for estimating covariance functions of stationary processes. Our method can impose positive definiteness, as well as isotropy and monotonicity, on the estimators, and its hyperparameters can be decided using cross validation. We define our estimators by taking integral transforms of kernel-based distribution surrogates. We then use the iterated density estimation evolutionary algorithm, a variant of estimation of distribution algorithms, to fit the estimators. We also extend our method to estimate covariance functions for point-referenced data. Compared to alternative approaches, our method provides more reliable estimates for long-range dependence. Several numerical studies are performed to demonstrate the efficacy and performance of our method. Also, we illustrate our method using precipitation data from the Spatial Interpolation Comparison 97 project.	翻訳日:2023-04-27 16:14:38 公開日:2023-04-25
# LEMaRT:画像調和のためのラベル効率の良いマスク付き領域変換 LEMaRT: Label-Efficient Masked Region Transform for Image Harmonization ( http://arxiv.org/abs/2304.13166v1 ) ライセンス: Link先を確認	Sheng Liu, Cong Phuoc Huynh, Cong Chen, Maxim Arap, Raffay Hamid	(参考訳) 本稿では,大規模なアノテーションなし画像データセットを活用可能な画像調和のための,単純かつ効果的な自己教師付き事前学習手法を提案する。この目標を達成するために、まず、Label-Efficient Masked Region Transform (LEMaRT)パイプラインでオンラインで事前トレーニングデータを生成する。画像が与えられると、LEMaRTは前景マスクを生成し、その後、生成されたマスクによって指定された領域のデフォーカスブラー、コントラスト、彩度などの様々な視覚特性を摂動させる一連の変換を適用する。そして,摂動画像から元の画像を復元することで画像調和モデルを事前学習する。次に,Swin Transformer[27]を局所的・大域的自己注意機構の組み合わせで改修した画像調和モデルSwinIHを導入する。 LEMaRTを用いたSwinIHの事前トレーニングは、画像調和において新たな最先端性能を達成し、かつラベル効率が良い。すなわち、既存の方法よりも微調整に必要なアノテーション付きデータが少ない。特に、iHarmony4データセット[8]では、SwinIHは、トレーニングデータのわずか50%で微調整された場合に従来の最先端であるSCS-Co[16]を0.4dB、フルトレーニングデータセットでトレーニングされた場合には1.0dB上回る。 We present a simple yet effective self-supervised pre-training method for image harmonization which can leverage large-scale unannotated image datasets. To achieve this goal, we first generate pre-training data online with our Label-Efficient Masked Region Transform (LEMaRT) pipeline. Given an image, LEMaRT generates a foreground mask and then applies a set of transformations to perturb various visual attributes, e.g., defocus blur, contrast, saturation, of the region specified by the generated mask. We then pre-train image harmonization models by recovering the original image from the perturbed image. Secondly, we introduce an image harmonization model, namely SwinIH, by retrofitting the Swin Transformer [27] with a combination of local and global self-attention mechanisms. Pre-training SwinIH with LEMaRT results in a new state of the art for image harmonization, while being label-efficient, i.e., consuming less annotated data for fine-tuning than existing methods. 
Notably, on iHarmony4 dataset [8], SwinIH outperforms the state of the art, i.e., SCS-Co [16] by a margin of 0.4 dB when it is fine-tuned on only 50% of the training data, and by 1.0 dB when it is trained on the full training dataset.	翻訳日:2023-04-27 16:14:21 公開日:2023-04-25
# 計算最適な転移学習に向けて Towards Compute-Optimal Transfer Learning ( http://arxiv.org/abs/2304.13164v1 ) ライセンス: Link先を確認	Massimo Caccia, Alexandre Galashov, Arthur Douillard, Amal Rannen-Triki, Dushyant Rao, Michela Paganini, Laurent Charlin, Marc'Aurelio Ranzato, Razvan Pascanu	(参考訳) 転移学習の分野は、様々な下流タスクに強い適応性を示す大規模な事前訓練モデルの導入によって、大きな変化を遂げている。しかし、これらのモデルを微調整または使用するための高い計算およびメモリ要求は、それらが広く使われるのを妨げる可能性がある。本研究では,計算効率と漸近的性能(計算量が無限大に近づくときに学習アルゴリズムが達成する性能と定義する)をトレードオフする,単純かつ効果的な方法を提案する。具体的には、事前訓練されたモデルのゼロショット構造化プルーニングにより、性能低下を最小限に抑えつつ計算効率を向上させることができると論じる。提案手法は,様々な転移シナリオを提供するNevis'22連続学習ベンチマークを用いて評価する。その結果, 事前訓練モデルの畳み込みフィルタをプルーニングすることで, 低計算量の領域では20%以上の性能向上が得られることがわかった。 The field of transfer learning is undergoing a significant shift with the introduction of large pretrained models which have demonstrated strong adaptability to a variety of downstream tasks. However, the high computational and memory requirements to finetune or use these models can be a hindrance to their widespread use. In this study, we present a solution to this issue by proposing a simple yet effective way to trade computational efficiency for asymptotic performance, which we define as the performance a learning algorithm achieves as compute tends to infinity. Specifically, we argue that zero-shot structured pruning of pretrained models allows them to increase compute efficiency with minimal reduction in performance. We evaluate our method on the Nevis'22 continual learning benchmark that offers a diverse set of transfer scenarios. Our results show that pruning convolutional filters of pretrained models can lead to more than 20% performance improvement in low computational regimes.	翻訳日:2023-04-27 16:13:53 公開日:2023-04-25
# HDRかSDRか? スケール・圧縮映像の主観的・客観的研究 HDR or SDR? A Subjective and Objective Study of Scaled and Compressed Videos ( http://arxiv.org/abs/2304.13162v1 ) ライセンス: Link先を確認	Joshua P. Ebenezer, Zaixi Shang, Yixu Chen, Yongjun Wu, Hai Wei, Sriram Sethuraman, Alan C. Bovik	(参考訳) 本研究では,HDR(High Dynamic Range)とSDR(Standard Dynamic Range)の人間の知覚品質判定を大規模に検討し,3種類の表示装置で観察した。 hdrビデオは、sdrビデオよりも、より広い色域、コントラスト、明るい白と暗い黒を表示することができる。従来の予測では、HDR品質はSDR品質よりも優れているが、HDRとSDRの主観的嗜好はディスプレイ装置や解像度のスケーリングやビットレートに大きく依存している。そこで本研究では,OLED,QLED,LCDテレビで356本のビデオを見た67名のボランティアから,品質評価を23,000件以上収集した。例えば、スケーリング、圧縮、およびSDR対HDRに関する決定を通知するために、これらのシナリオの下でビデオの品質を測定することは興味があるので、新しいデータベース上で、よく知られたフル参照およびノン参照ビデオ品質モデルを試した。この問題の進展に向けて,従来の測定値よりも精度良く,古典的およびビット深部に敏感な歪み統計量を用いる,hdrpatchmaxと呼ばれる新しい非参照モデルを開発した。 We conducted a large-scale study of human perceptual quality judgments of High Dynamic Range (HDR) and Standard Dynamic Range (SDR) videos subjected to scaling and compression levels and viewed on three different display devices. HDR videos are able to present wider color gamuts, better contrasts, and brighter whites and darker blacks than SDR videos. While conventional expectations are that HDR quality is better than SDR quality, we have found subject preference of HDR versus SDR depends heavily on the display device, as well as on resolution scaling and bitrate. To study this question, we collected more than 23,000 quality ratings from 67 volunteers who watched 356 videos on OLED, QLED, and LCD televisions. Since it is of interest to be able to measure the quality of videos under these scenarios, e.g. to inform decisions regarding scaling, compression, and SDR vs HDR, we tested several well-known full-reference and no-reference video quality models on the new database. Towards advancing progress on this problem, we also developed a novel no-reference model called HDRPatchMAX, that uses both classical and bit-depth sensitive distortion statistics more accurately than existing metrics.	翻訳日:2023-04-27 16:13:37 公開日:2023-04-25
# 適応量子ダイナミクスにおける連続対称性の破れ Continuous symmetry breaking in adaptive quantum dynamics ( http://arxiv.org/abs/2304.13198v1 ) ライセンス: Link先を確認	Jacob Hauser, Yaodong Li, Sagar Vijay, Matthew P. A. Fisher	(参考訳) 量子多体系を操るためにユニタリ演算、測定、フィードバックが用いられる適応量子回路は、新しい動的定常状態を生成するエキサイティングな機会を提供する。我々は,ユニタリ演算,測定,局所ユニタリフィードバックを秩序化の駆動に用いる,連続対称性を持つ適応量子ダイナミクスを導入する。この設定では、対称性の破れた秩序を持つ純粋な定常状態が見出され、これはギャップのない局所ハミルトニアンの基底状態である。この定常状態への接近過程の動的性質について検討する。この定常状態の秩序は摂動に対して脆弱であり、連続対称性を保つ摂動に対してさえ脆弱であることがわかった。 Adaptive quantum circuits, in which unitary operations, measurements, and feedback are used to steer quantum many-body systems, provide an exciting opportunity to generate new dynamical steady states. We introduce an adaptive quantum dynamics with continuous symmetry where unitary operations, measurements, and local unitary feedback are used to drive ordering. In this setting, we find a pure steady state hosting symmetry-breaking order, which is the ground state of a gapless, local Hamiltonian. We explore the dynamical properties of the approach to this steady state. We find that this steady-state order is fragile to perturbations, even those that respect the continuous symmetry.	翻訳日:2023-04-27 16:05:43 公開日:2023-04-25
# Connector 0.5: グラフ表現学習のための統一フレームワーク Connector 0.5: A unified framework for graph representation learning ( http://arxiv.org/abs/2304.13195v1 ) ライセンス: Link先を確認	Thanh Sang Nguyen, Jooho Lee, Van Thuy Hoang, O-Joun Lee	(参考訳) グラフ表現学習モデルは、グラフ構造とその特徴を潜在空間内の低次元ベクトルとして表現することを目的としており、ノード分類やリンク予測などの下流タスクに役立つ。強力なグラフデータモデリング能力により、埋め込みを学習し、研究者が実験を容易に行えるようにするための様々なグラフ埋め込みモデルやライブラリが提案されている。本稿では,浅いモデルから最先端モデルまで,様々なグラフ埋め込みモデルをカバーする新しいグラフ表現フレームワークであるConnectorを提案する。まず, 同種グラフ, 符号付きグラフ, 異種グラフ, 知識グラフなど, 構造的関係が異なる様々な種類のグラフを構築することで, グラフ生成を考える。次に,浅いグラフ埋め込みモデルから深いグラフ埋め込みモデルまで,様々なグラフ表現学習モデルを紹介する。最後に、グラフの構造関係を表現するために、ディープグラフ埋め込みモデルを提供する効率的なオープンソースフレームワークを構築する計画である。フレームワークはhttps://github.com/NSLab-CUK/Connectorから入手できる。 Graph representation learning models aim to represent the graph structure and its features as low-dimensional vectors in a latent space, which can benefit various downstream tasks, such as node classification and link prediction. Due to their powerful graph data modelling capabilities, various graph embedding models and libraries have been proposed to learn embeddings and make it easier for researchers to conduct experiments. In this paper, we introduce Connector, a novel graph representation framework covering various graph embedding models, ranging from shallow to state-of-the-art models. First, we consider graph generation by constructing various types of graphs with different structural relations, including homogeneous, signed, heterogeneous, and knowledge graphs. Second, we introduce various graph representation learning models, ranging from shallow to deep graph embedding models. Finally, we plan to build an efficient open-source framework that can provide deep graph embedding models to represent structural relations in graphs. The framework is available at https://github.com/NSLab-CUK/Connector.	翻訳日:2023-04-27 16:05:31 公開日:2023-04-25
# 視覚に基づく触覚センシングと信頼度校正ニューラルネットワークによる大腸癌ポリープ分類の信頼性向上に向けて Towards Reliable Colorectal Cancer Polyps Classification via Vision Based Tactile Sensing and Confidence-Calibrated Neural Networks ( http://arxiv.org/abs/2304.13192v1 ) ライセンス: Link先を確認	Siddhartha Kapuria, Tarunraj G. Mohanraj, Nethra Venkatayogi, Ozdemir Can Kara, Yuki Hirata, Patrick Minot, Ariel Kapusta, Naruhiko Ikoma, and Farshid Alambeigi	(参考訳) 本研究では,既存の人工知能を用いた大腸癌 (CRC) ポリープ分類手法の過信した出力に対処するために,信頼度校正された残差ニューラルネットワークを提案する。新規の視覚に基づく触覚センサ(VS-TS)システムと独自のCRCポリープファントムを用いて,慎重さを要するCRCポリープ診断におけるモデル性能を捉えるには,正解率(accuracy)や適合率(precision)などの従来の指標では不十分であることを示す。そこで本研究では,残差ニューラルネットワーク分類器を開発し,CRCポリープ分類におけるその過信した出力を温度スケーリングという後処理手法で解決する。提案手法を評価するために,得られたVS-TSのテクスチャ画像にノイズとぼかしを導入し,信頼性図や他の統計指標を用いて非理想的な入力に対するモデルの信頼性をテストする。 In this study, toward addressing the over-confident outputs of existing artificial intelligence-based colorectal cancer (CRC) polyp classification techniques, we propose a confidence-calibrated residual neural network. Utilizing a novel vision-based tactile sensing (VS-TS) system and unique CRC polyp phantoms, we demonstrate that traditional metrics such as accuracy and precision are not sufficient to encapsulate model performance for handling a sensitive CRC polyp diagnosis. To this end, we develop a residual neural network classifier and address its over-confident outputs for CRC polyps classification via the post-processing method of temperature scaling. To evaluate the proposed method, we introduce noise and blur to the obtained textural images of the VS-TS and test the model's reliability for non-ideal inputs through reliability diagrams and other statistical metrics.	翻訳日:2023-04-27 16:05:15 公開日:2023-04-25
# メンタルヘルスのための説明可能で安全な会話エージェントに向けて: サーベイ Towards Explainable and Safe Conversational Agents for Mental Health: A Survey ( http://arxiv.org/abs/2304.13191v1 ) ライセンス: Link先を確認	Surjodeep Sarkar, Manas Gaur, L. Chen, Muskan Garg, Biplav Srivastava, Bhaktee Dongaonkar	(参考訳) バーチャルメンタルヘルスアシスタント(VMHA)は、毎年6000万件のプライマリケア受診と600万件の救急室(ER)受診を抱える、過負荷のグローバル医療システムをサポートするために、継続的に進歩している。これらのシステムは、臨床心理学者、精神科医、人工知能(AI)研究者によって、認知行動療法(CBT)のために構築されている。現在、VMHAの役割は情報を通じて感情的な支援を提供することであり、患者との内省的な会話を発展させることにはあまり重点が置かれていない。フォローアップ質問をしたり、十分な情報に基づく応答を提供したりできる責任あるVMHAを構築するには、より包括的で安全かつ説明可能なアプローチが必要である。この調査は、メンタルヘルスにおける既存の会話エージェントの体系的な批判的レビューと、文脈知識、データセット、臨床決定支援における新たな役割に関するVMHAの改善への新たな洞察を提供する。また、説明可能性、安全性、健全な信頼性によってVMHAのユーザエクスペリエンスを豊かにする新たな方向性も提供する。最後に,能動的なコミュニケーションにおいてVMHAと患者との信頼関係を構築するため,現在の文献を超えたVMHAの評価指標と実践的考察を提供する。 Virtual Mental Health Assistants (VMHAs) are seeing continual advancements to support the overburdened global healthcare system that gets 60 million primary care visits and 6 million Emergency Room (ER) visits annually. These systems are built by clinical psychologists, psychiatrists, and Artificial Intelligence (AI) researchers for Cognitive Behavioral Therapy (CBT). At present, the role of VMHAs is to provide emotional support through information, focusing less on developing a reflective conversation with the patient. A more comprehensive, safe and explainable approach is required to build responsible VMHAs to ask follow-up questions or provide a well-informed response. This survey offers a systematic critical review of the existing conversational agents in mental health, followed by new insights into the improvements of VMHAs with contextual knowledge, datasets, and their emerging role in clinical decision support. We also provide new directions toward enriching the user experience of VMHAs with explainability, safety, and wholesome trustworthiness. 
Finally, we provide evaluation metrics and practical considerations for VMHAs beyond the current literature to build trust between VMHAs and patients in active communications.	翻訳日:2023-04-27 16:04:58 公開日:2023-04-25
# 光力が生み出す固有の利得を持つ超放射2準位レーザー A superradiant two-level laser with intrinsic light force generated gain ( http://arxiv.org/abs/2304.13190v1 ) ライセンス: Link先を確認	Anna Bychek, Helmut Ritsch	(参考訳) 能動周波数標準としての超放射レーザーの実装は、標準的な受動光時計と比較して、短期安定性と熱的・機械的揺らぎに対する堅牢性の向上をもたらすと予測されている。しかし、最近の顕著な進歩にもかかわらず、光共振器内の活性原子の連続的なロード、冷却、ポンピングを必要とするため、連続波の超放射レーザー発振の実験的実現は依然として未解決の課題である。本稿では, 単一モードキャビティ内に閉じ込められた冷却原子気体のバイクロマチックコヒーレントポンピングを通じて2準位原子の状態に作用する光力を用いて, 連続的な利得を生み出す新しいシナリオを提案する。原子メーザーのセットアップと同様に、調整された状態依存力を用いて、励起状態の原子を原子と共振器の結合が強い領域に集めて集中させ、基底状態の原子は排除する。十分大きな原子アンサンブルの数値シミュレーションを容易にするために, 2次のキュムラント展開を用い, 共振器軸に沿った光学的勾配力を誘導する位置依存の光シフトを受ける半古典的な点粒子近似で原子運動を記述する。集団的な超放射発光に必要なポンプレーザー強度とデチューニングの最小条件について検討する。ドップラー冷却と利得誘起加熱のバランスをとることで、裸の原子周波数に近い連続狭帯域レーザー動作のパラメータ領域を同定する。 The implementation of a superradiant laser as an active frequency standard is predicted to provide better short-term stability and robustness to thermal and mechanical fluctuations when compared to standard passive optical clocks. However, despite significant recent progress, the experimental realization of continuous wave superradiant lasing still remains an open challenge as it requires continuous loading, cooling, and pumping of active atoms within an optical resonator. Here we propose a new scenario for creating continuous gain by using optical forces acting on the states of a two-level atom via bichromatic coherent pumping of a cold atomic gas trapped inside a single-mode cavity. Analogous to atomic maser setups, tailored state-dependent forces are used to gather and concentrate excited state atoms in regions of strong atom-cavity coupling while ground-state atoms are repelled. To facilitate numerical simulations of a sufficiently large atomic ensemble, we rely on a second-order cumulant expansion and describe the atomic motion in a semi-classical point-particle approximation subject to position-dependent light shifts which induce optical gradient forces along the cavity axis. 
We study minimal conditions on pump laser intensities and detunings required for collective superradiant emission. Balancing Doppler cooling and gain-induced heating we identify a parameter regime of a continuous narrow-band laser operation close to the bare atomic frequency.	翻訳日:2023-04-27 16:04:38 公開日:2023-04-25
# Ocean Worlds Life Surveyorにおける顕微鏡バイオシグネチャー検出のためのオンボード科学機器自律性 Onboard Science Instrument Autonomy for the Detection of Microscopy Biosignatures on the Ocean Worlds Life Surveyor ( http://arxiv.org/abs/2304.13189v1 ) ライセンス: Link先を確認	Mark Wronkiewicz, Jake Lee, Lukas Mandrake, Jack Lightholder, Gary Doran, Steffen Mauceri, Taewoo Kim, Nathan Oborny, Thomas Schibler, Jay Nadeau, James K. Wallace, Eshaan Moorjani, Chris Lindensmith	(参考訳) 地球外生命の探索は、文明レベルの意味を持つ重要な科学的取り組みである。太陽系の氷の衛星は、その液体の海が微小な生命の生息地になる可能性があるため、探査のターゲットとして有望である。しかし、生命の正確な定義の欠如は、検出戦略の定式化に根本的な課題をもたらす。曖昧さのない検出の可能性を高めるために、補完的な機器群は複数の独立したバイオシグネチャー(例えば、組成、運動性/行動、可視構造)をサンプリングする必要がある。このような機器群は、エンケラドゥスやエウロパのような遠く離れた海洋世界から送信可能な量の1万倍もの生データを生成し得る。この帯域制限に対処するため、オンボード科学機器自律性 (Onboard Science Instrument Autonomy, OSIA) は、科学的成果を最大化するために観測機器データを評価、要約、優先順位付けできる飛行システムの新興分野である。ジェット推進研究所のOcean Worlds Life Surveyor (OWLS) の試作機器スイートの一部として開発された2つのOSIA実装について述べる。第1はデジタルホログラフィック顕微鏡の動画から生命に似た動きを識別し、第2は自家蛍光と染料誘起蛍光によって細胞構造と組成を識別する。実ミッションへの導入(infusion)の障壁を下げるため、火星ヘリコプター「Ingenuity」で利用可能なものと同様の、フライト相当の要求と計算上の制約を採用した。シミュレーションおよび実験室データを用いてOSIAの性能評価を行い,高塩分のモノ湖(Mono Lake)の惑星アナログサイトで実地試験を行った。本研究は,バイオシグネチャー検出を可能にするOSIAの可能性を示すとともに,外部太陽系の探査を目的とした将来のミッション概念に対する洞察と教訓を提供する。 The quest to find extraterrestrial life is a critical scientific endeavor with civilization-level implications. Icy moons in our solar system are promising targets for exploration because their liquid oceans make them potential habitats for microscopic life. However, the lack of a precise definition of life poses a fundamental challenge to formulating detection strategies. To increase the chances of unambiguous detection, a suite of complementary instruments must sample multiple independent biosignatures (e.g., composition, motility/behavior, and visible structure). Such an instrument suite could generate 10,000x more raw data than is possible to transmit from distant ocean worlds like Enceladus or Europa. 
To address this bandwidth limitation, Onboard Science Instrument Autonomy (OSIA) is an emerging discipline of flight systems capable of evaluating, summarizing, and prioritizing observational instrument data to maximize science return. We describe two OSIA implementations developed as part of the Ocean Worlds Life Surveyor (OWLS) prototype instrument suite at the Jet Propulsion Laboratory. The first identifies life-like motion in digital holographic microscopy videos, and the second identifies cellular structure and composition via innate and dye-induced fluorescence. Flight-like requirements and computational constraints were used to lower barriers to infusion, similar to those available on the Mars helicopter, "Ingenuity." We evaluated the OSIA's performance using simulated and laboratory data and conducted a live field test at the hypersaline Mono Lake planetary analog site. Our study demonstrates the potential of OSIA for enabling biosignature detection and provides insights and lessons learned for future mission concepts aimed at exploring the outer solar system.	翻訳日:2023-04-27 16:04:17 公開日:2023-04-25
# TABLET: 表データのための指示から学ぶ TABLET: Learning From Instructions For Tabular Data ( http://arxiv.org/abs/2304.13188v1 ) ライセンス: Link先を確認	Dylan Slack and Sameer Singh	(参考訳) 高品質なデータを取得することは、表形式データの予測のための機械学習(ML)モデルをトレーニングする上で、特に医療や金融のようなプライバシーに敏感でコストの高い領域では、しばしば重要な課題である。大規模言語モデル(LLM)への自然言語命令の提供は代替ソリューションを提供する。しかし,表予測問題を解くうえで命令がLLMの知識をどの程度効果的に活用できるかは明らかでない。このギャップに対処するために、TABLETを紹介する。TABLETは20の多様な表形式データセットのベンチマークであり、言い回し、粒度、専門性の異なる命令が注釈付けされている。さらに、TABLETには命令のロジックと命令に対する構造化された変更が含まれている。文脈内(in-context)命令は、TABLETにおいてFlan-T5 11bのゼロショットF1性能を平均44%、ChatGPTでは13%向上させる。また,本ベンチマークにおける表予測にLLMを用いた場合の制限について,命令忠実性の評価により検討する。 LLMは命令を無視することが多く、例を与えても特定のインスタンスを正しく予測できないことが多い。 TABLET を用いた解析では,命令が LLM のパフォーマンスを補助する一方で,表データの命令から学習するには新たな機能が必要であることが示された。 Acquiring high-quality data is often a significant challenge in training machine learning (ML) models for tabular prediction, particularly in privacy-sensitive and costly domains like medicine and finance. Providing natural language instructions to large language models (LLMs) offers an alternative solution. However, it is unclear how effectively instructions leverage the knowledge in LLMs for solving tabular prediction problems. To address this gap, we introduce TABLET, a benchmark of 20 diverse tabular datasets annotated with instructions that vary in their phrasing, granularity, and technicality. Additionally, TABLET includes the instructions' logic and structured modifications to the instructions. We find in-context instructions increase zero-shot F1 performance for Flan-T5 11b by 44% on average and 13% for ChatGPT on TABLET. Also, we explore the limitations of using LLMs for tabular prediction in our benchmark by evaluating instruction faithfulness. We find LLMs often ignore instructions and fail to predict specific instances correctly, even with examples. Our analysis on TABLET shows that, while instructions help LLM performance, learning from instructions for tabular data requires new capabilities.	翻訳日:2023-04-27 16:03:47 公開日:2023-04-25
# AI支援コーディング: GPT-4による実験 AI-assisted coding: Experiments with GPT-4 ( http://arxiv.org/abs/2304.13187v1 ) ライセンス: Link先を確認	Russell A Poldrack, Thomas Lu, and Gašper Beguš	(参考訳) 大規模言語モデルに基づく人工知能(AI)ツールは、いくつかのコンピュータプログラミングタスクにおいて人間レベルのパフォーマンスを達成している。 GPT-4を用いてコンピュータコードを生成する実験をいくつか報告する。これらの実験は、現在の世代のツールを使用したAIコード生成が強力であるにも関わらず、正確なパフォーマンスを保証するためには、人間による検証がかなり必要であることを実証している。また,既存のコードに対する GPT-4 のリファクタリングは,コード品質の確立した指標に沿ってコードを大幅に改善できることを示すとともに,GPT-4 がかなりのカバレッジでテストを生成できるものの,その多くは対応するコードに適用すると失敗することを示した。これらの結果は、AIコーディングツールは非常に強力であるが、結果の妥当性と正確性を保証するためには、まだ人間を必要とすることを示唆している。 Artificial intelligence (AI) tools based on large language models have achieved human-level performance on some computer programming tasks. We report several experiments using GPT-4 to generate computer code. These experiments demonstrate that AI code generation using the current generation of tools, while powerful, requires substantial human validation to ensure accurate performance. We also demonstrate that GPT-4 refactoring of existing code can significantly improve that code along several established metrics for code quality, and we show that GPT-4 can generate tests with substantial coverage, but that many of the tests fail when applied to the associated code. These findings suggest that while AI coding tools are very powerful, they still require humans in the loop to ensure validity and accuracy of the results.	翻訳日:2023-04-27 16:03:26 公開日:2023-04-25
# KBody:汎用的でロバスト,かつ整列した単眼全身推定を目指して KBody: Towards general, robust, and aligned monocular whole-body estimation ( http://arxiv.org/abs/2304.11542v2 ) ライセンス: Link先を確認	Nikolaos Zioulis and James F. O'Brien	(参考訳) KBodyは、低次元のボディモデルを画像に適合させる方法である。予測と最適化のアプローチに従い、体のパラメータを解くために使用される制約について、データ駆動モデルの推定に依存する。高品質な対応の重要性を認識し、"仮想関節"を活用してフィッティング性能を改善し、ポーズパラメータと形状パラメータの最適化を分離し、非対称距離場を統合してポーズと形状を捉える能力と画素アライメントのバランスをとる。また, 生成モデルインバージョンが強力な外観の事前知識(prior)を提供し, これが部分的な人物画像の補完に利用でき, 汎用的かつロバストな単眼ボディフィッティングのビルディングブロックとしても利用できることを示す。プロジェクトページ: https://klothed.github.io/KBody KBody is a method for fitting a low-dimensional body model to an image. It follows a predict-and-optimize approach, relying on data-driven model estimates for the constraints that will be used to solve for the body's parameters. Acknowledging the importance of high quality correspondences, it leverages "virtual joints" to improve fitting performance, disentangles the optimization between the pose and shape parameters, and integrates asymmetric distance fields to strike a balance in terms of pose and shape capturing capacity, as well as pixel alignment. We also show that generative model inversion offers a strong appearance prior that can be used to complete partial human images and used as a building block for generalized and robust monocular body fitting. Project page: https://klothed.github.io/KBody.	翻訳日:2023-04-27 10:53:10 公開日:2023-04-25
# マイクロ波磁気力学における単一光子冷却 Single-photon cooling in microwave magneto-mechanics ( http://arxiv.org/abs/1912.05489v2 ) ライセンス: Link先を確認	D. Zoepfl, M. L. Juan, C. M. F. Schneider, G. Kirchmair	(参考訳) 光子を機械的運動に結合するキャビティオプトメカニクスは、基本的な量子限界付近で機械的運動を制御するツールを提供する。単一光子強結合に到達すれば、機械共振器を非ガウス量子状態に準備することが可能になる。質量の大きい機械共振器をこのような状態に準備することは、量子力学の境界をテストする上で特に興味深い。しかし、質量の大きい装置で通常達成される光機械的結合は小さいため、この目標は依然として困難である。ここではマイクロ波空洞に機械共振器を磁気的に結合する新しい手法を示す。 $g_0/2 \pi \sim 3$ kHzの単一光子結合を測定し、これは現在のマイクロ波光機械システムよりも1桁大きい。この結合において、我々は$C_0 \gtrsim 10$という大きな単一光子協同性を測定する。これは単一光子強結合に達するための重要なステップである。このような強い相互作用により、マイクロ波空洞内の2光子未満で、質量の大きい機械共振器を定常状態のフォノン数の3分の1まで冷却することができる。量子基礎のテスト以外にも、我々のアプローチは量子センサーやマイクロ波-光トランスデューサにも適している。 Cavity optomechanics, where photons are coupled to mechanical motion, provides the tools to control mechanical motion near the fundamental quantum limits. Reaching single-photon strong coupling would allow preparing the mechanical resonator in non-Gaussian quantum states. Preparing massive mechanical resonators in such states is of particular interest for testing the boundaries of quantum mechanics. This goal remains however challenging due to the small optomechanical couplings usually achieved with massive devices. Here we demonstrate a novel approach where a mechanical resonator is magnetically coupled to a microwave cavity. We measure a single-photon coupling of $g_0/2 \pi \sim 3$ kHz, an improvement of one order of magnitude over current microwave optomechanical systems. At this coupling we measure a large single-photon cooperativity with $C_0 \gtrsim 10$, an important step to reach single-photon strong coupling. Such a strong interaction allows us to cool the massive mechanical resonator to a third of its steady state phonon population with less than two photons in the microwave cavity. Beyond tests for quantum foundations, our approach is also well suited as a quantum sensor or a microwave to optical transducer.	翻訳日:2023-04-27 04:19:05 公開日:2023-04-25
# Quantikzパッケージのチュートリアル Tutorial on the Quantikz Package ( http://arxiv.org/abs/1809.03842v6 ) ライセンス: Link先を確認	Alastair Kay	(参考訳) このチュートリアルでは、量子回路図のタイプセットのためのQuantikz LaTeXパッケージを紹介(およびドキュメントソース経由で提供)する。このパッケージはtikzを活用することで、回路オプションの制御性を高めている。優れたQCircuitパッケージに慣れている人は、記法の多くを認識するだろうが、少し進化している(願わくば簡略化されている!)。 This tutorial introduces (and provides, via the document source) the Quantikz LaTeX package for typesetting quantum circuit diagrams. This takes advantage of tikz to give greater control over the circuit options. Those familiar with the excellent QCircuit package will recognise much of the notation, although it has evolved a bit (hopefully simplified!).	翻訳日:2023-04-27 04:18:46 公開日:2023-04-25
# リレー付きIoTネットワークにおける実用的なAoIスケジューリング A Practical AoI Scheduler in IoT Networks with Relays ( http://arxiv.org/abs/2203.04227v3 ) ライセンス: Link先を確認	Biplav Choudhury, Prasenjit Karmakar, Vijay K. Shah, Jeffrey H. Reed	(参考訳) IoT(Internet of Things)ネットワークは、自律コンピューティング、通信、デバイス間のコラボレーションがさまざまなタスクを達成するために人気になるにつれて、広く普及している。リレーは通信範囲の拡大や消費電力の最小化など多くのメリットを提供するため、IoTネットワークにおけるリレーの利用はネットワークのデプロイをさらに容易にする。このような2ホップのリレー付きIoTネットワークに対する既存文献の従来型AoIスケジューラは、一定で変化しないチャネル条件と既知の(通常はgenerate-at-willの)パケット生成パターンを前提として設計されているため、限界がある。ディープ強化学習(DRL)アルゴリズムは、リレー付き2ホップIoTネットワークにおけるAoIスケジューリングのために研究されているが、ネットワークが大きくなるにつれて行動空間が指数関数的に増加するため、小規模IoTネットワークにのみ適用可能である。これらの制限は、IoTネットワークデプロイメントにおけるAoIスケジューラの実用的利用を妨げる。本稿では、上記の制限に対処するリレー付き2ホップIoTネットワークのための実用的なAoIスケジューラを提案する。提案するスケジューラは,線形の行動空間を維持する,新たな投票機構に基づく近接方策最適化(v-PPO)アルゴリズムを使用しており,大規模IoTネットワークにも良好にスケールする。提案されたv-PPOベースのAoIスケジューラは、変化するネットワーク条件によく適応し、未知のトラフィック生成パターンにも対応するため、実世界のIoTデプロイメントにおいて実用的である。シミュレーションの結果,提案したv-PPOベースのAoIスケジューラは,検討したすべての実用的シナリオにおいて,DQNベースのAoIスケジューラ,MAF-MAD(Maximal Age First-Maximal Age Difference),MAF(Maximal Age First),ラウンドロビンなど,MLおよび従来の(非ML)AoIスケジューラよりも優れていた。 Internet of Things (IoT) networks have become ubiquitous as autonomous computing, communication and collaboration among devices become popular for accomplishing various tasks. The use of relays in IoT networks further makes it convenient to deploy IoT networks as relays provide a host of benefits, like increasing the communication range and minimizing power consumption. Traditional AoI schedulers in the existing literature for such two-hop relayed IoT networks are limited because they are designed assuming constant/non-changing channel conditions and known (usually, generate-at-will) packet generation patterns. Deep reinforcement learning (DRL) algorithms have been investigated for AoI scheduling in two-hop IoT networks with relays; however, they are only applicable to small-scale IoT networks due to the exponential growth of the action space as the networks become large. 
These limitations discourage the practical utilization of AoI schedulers for IoT network deployments. This paper presents a practical AoI scheduler for two-hop IoT networks with relays that addresses the above limitations. The proposed scheduler utilizes a novel voting-mechanism-based proximal policy optimization (v-PPO) algorithm that maintains a linear action space, enabling it to scale well with larger IoT networks. The proposed v-PPO based AoI scheduler adapts well to changing network conditions and accounts for unknown traffic generation patterns, making it practical for real-world IoT deployments. Simulation results show that the proposed v-PPO based AoI scheduler outperforms both ML and traditional (non-ML) AoI schedulers, such as the Deep Q Network (DQN)-based AoI scheduler, Maximal Age First-Maximal Age Difference (MAF-MAD), Maximal Age First (MAF), and round-robin, in all considered practical scenarios.	翻訳日:2023-04-27 04:16:12 公開日:2023-04-25
# 古典的レート理論におけるキャビティ誘起分岐 Cavity-induced bifurcation in classical rate theory ( http://arxiv.org/abs/2202.12182v3 ) ライセンス: Link先を確認	Kalle S. U. Kansanen and Tero T. Heikkilä	(参考訳) 双安定系のアンサンブルと共振器場との結合が、このアンサンブルの集合確率的挙動にどのように影響するかを示す。特に、キャビティはシステム間の効果的な相互作用を提供し、準安定状態間の遷移率をパラメトリック的に調節する。我々はキャビティがシステム数に線形に依存する臨界温度で集合相転移を引き起こすことを予測した。これは双安定系の定常状態が分岐する自発的対称性の破れとして現れる。遷移速度は相転移とは無関係に低下するが, 系とキャビティの結合の符号が交互に変わる場合(無秩序な双極子アンサンブルに対応する)には, この速度変化は消失する。この結果は、キャビティの存在が化学反応に影響を与えることが示唆されているポラリトン化学において特に関係している。 We show how coupling an ensemble of bistable systems to a common cavity field affects the collective stochastic behavior of this ensemble. In particular, the cavity provides an effective interaction between the systems, and parametrically modifies the transition rates between the metastable states. We predict that the cavity induces a collective phase transition at a critical temperature which depends linearly on the number of systems. It shows up as a spontaneous symmetry breaking where the stationary states of the bistable system bifurcate. We observe that the transition rates slow down independently of the phase transition, but the rate modification vanishes for alternating signs of the system-cavity couplings, corresponding to a disordered ensemble of dipoles. Our results are of particular relevance in polaritonic chemistry where the presence of a cavity has been suggested to affect chemical reactions.	翻訳日:2023-04-27 04:15:34 公開日:2023-04-25
# 形式理論学習システムにおける単純気泡問題 A Simplicity Bubble Problem in Formal-Theoretic Learning Systems ( http://arxiv.org/abs/2112.12275v2 ) ライセンス: Link先を確認	Felipe S. Abrah\~ao, Hector Zenil, Fabio Porto, Michael Winter, Klaus Wehmuth, Itala M. L. D'Ottaviano	(参考訳) 新しいデータを予測するために大規模なデータセットをマイニングする場合、統計機械学習の背後にある原則の限界は、ビッグデータの崩壊だけでなく、データ生成プロセスがアルゴリズムの複雑さの低さに偏っているという従来の仮定にも深刻な課題をもたらす。有限データセット生成器における単純さに対するアルゴリズム的情報バイアスを仮定しても、機械学習に対する現在のアプローチ(ディープラーニングや、トップダウンaiと統計的機械学習のあらゆる形式的ハイブリッドを含む)は、十分大きなデータセットによって、常に、自然に、あるいは人工的に、欺くことができる。特に、全ての学習アルゴリズム(形式理論にアクセスできるか否かに関わらず)に対して、予測不可能な十進法のアルゴリズム確率が、他の大きなデータセットのアルゴリズム確率の上限(学習アルゴリズムにのみ依存する乗算定数まで)であるような十分に大きなデータセットサイズが存在することを実証する。言い換えれば、非常に大きく複雑なデータセットは、学習アルゴリズムを他の特定の非知覚データセットと同様に‘simplicity bubble’’に認識することができる。これらの決定データセットは、学習アルゴリズムによって影響される予測が、学習アルゴリズムによってグローバルなものと見なされるにもかかわらず、低アルゴリズム-複雑度局所最適解に向かって収束しながら、高アルゴリズム-複雑度グローバルな最適解から予測不可能に分岐することを保証する。アルゴリズム情報理論と計算可能性理論の持つ本質的な力に基づく、より強力な機械学習へと、統計的な機械学習から脱却し、この偽りの現象を回避するために、満たすべき枠組みと追加の経験的条件について議論する。 When mining large datasets in order to predict new data, limitations of the principles behind statistical machine learning pose a serious challenge not only to the Big Data deluge, but also to the traditional assumptions that data generating processes are biased toward low algorithmic complexity. Even when one assumes an underlying algorithmic-informational bias toward simplicity in finite dataset generators, we show that current approaches to machine learning (including deep learning, or any formal-theoretic hybrid mix of top-down AI and statistical machine learning approaches), can always be deceived, naturally or artificially, by sufficiently large datasets. 
In particular, we demonstrate that, for every learning algorithm (with or without access to a formal theory), there is a sufficiently large dataset size above which the algorithmic probability of an unpredictable deceiver is an upper bound (up to a multiplicative constant that only depends on the learning algorithm) for the algorithmic probability of any other larger dataset. In other words, very large and complex datasets can deceive learning algorithms into a ``simplicity bubble'' as likely as any other particular non-deceiving dataset. These deceiving datasets guarantee that any prediction effected by the learning algorithm will unpredictably diverge from the high-algorithmic-complexity globally optimal solution while converging toward the low-algorithmic-complexity locally optimal solution, although the latter is deemed a global one by the learning algorithm. We discuss the framework and additional empirical conditions to be met in order to circumvent this deceptive phenomenon, moving away from statistical machine learning towards a stronger type of machine learning based on, and motivated by, the intrinsic power of algorithmic information theory and computability theory.	翻訳日:2023-04-27 04:15:21 公開日:2023-04-25
# データとデバイスの不均一性を考慮した半分散フェデレーションエッジ学習 Semi-Decentralized Federated Edge Learning with Data and Device Heterogeneity ( http://arxiv.org/abs/2112.10313v3 ) ライセンス: Link先を確認	Yuchang Sun and Jiawei Shao and Yuyi Mao and Jessie Hui Wang and Jun Zhang	(参考訳) feel(federated edge learning)は、ネットワークエッジに分散データを効果的に組み込んでディープラーニングモデルをトレーニングするための、プライバシ保護パラダイムとして注目されている。それでも、単一エッジサーバのカバー範囲が限られると、未参加のクライアントノードが不足し、学習性能が損なわれる可能性がある。本稿では,複数のエッジサーバを用いて多数のクライアントノードを協調的に調整する,半分散型フェデレーションエッジ学習(SD-FEEL)の新たなフレームワークについて検討する。効率的なモデル共有のためにエッジサーバ間の低レイテンシ通信を利用することで、SD-FEELは従来のフェデレート学習に比べてはるかにレイテンシの低いトレーニングデータを組み込むことができる。 SD-FEELのトレーニングアルゴリズムについて,ローカルモデル更新,クラスタ内モデルアグリゲーション,クラスタ間モデルアグリゲーションの3つのステップで詳述する。このアルゴリズムの収束は、非独立かつ同一分散(非iid)データで証明され、鍵パラメータがトレーニング効率に与える影響を明らかにし、実用的な設計ガイドラインを提供するのに役立つ。一方、エッジデバイスの不均一性はストラグラー効果を引き起こし、SD-FEELの収束速度を低下させる可能性がある。そこで本研究では,SD-FEELの安定化を意識したアグリゲーションスキームを用いた非同期トレーニングアルゴリズムを提案する。シミュレーションの結果,SD-FEELのための提案アルゴリズムの有効性と効率を実証し,解析結果を裏付ける。 Federated edge learning (FEEL) has attracted much attention as a privacy-preserving paradigm to effectively incorporate the distributed data at the network edge for training deep learning models. Nevertheless, the limited coverage of a single edge server results in an insufficient number of participated client nodes, which may impair the learning performance. In this paper, we investigate a novel framework of FEEL, namely semi-decentralized federated edge learning (SD-FEEL), where multiple edge servers are employed to collectively coordinate a large number of client nodes. By exploiting the low-latency communication among edge servers for efficient model sharing, SD-FEEL can incorporate more training data, while enjoying much lower latency compared with conventional federated learning. We detail the training algorithm for SD-FEEL with three main steps, including local model update, intra-cluster, and inter-cluster model aggregations. 
The convergence of this algorithm is proved on non-independent and identically distributed (non-IID) data, which also helps to reveal the effects of key parameters on the training efficiency and provides practical design guidelines. Meanwhile, the heterogeneity of edge devices may cause the straggler effect and deteriorate the convergence speed of SD-FEEL. To resolve this issue, we propose an asynchronous training algorithm with a staleness-aware aggregation scheme for SD-FEEL, of which, the convergence performance is also analyzed. The simulation results demonstrate the effectiveness and efficiency of the proposed algorithms for SD-FEEL and corroborate our analysis.	翻訳日:2023-04-27 04:14:52 公開日:2023-04-25
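The three training steps listed in the abstract (local model update, intra-cluster aggregation, inter-cluster aggregation) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' algorithm: models are plain lists of floats, the local update is a single gradient step, intra-cluster averaging is weighted by local data size, and inter-cluster mixing is a uniform average over neighboring edge servers. All function names are hypothetical.

```python
def weighted_average(models, weights):
    """Average equal-length parameter vectors with the given non-negative weights."""
    total = float(sum(weights))
    dim = len(models[0])
    return [sum(w * m[k] for m, w in zip(models, weights)) / total for k in range(dim)]


def local_update(model, gradient, lr=0.1):
    """Step 1: one plain gradient-descent step on a client."""
    return [p - lr * g for p, g in zip(model, gradient)]


def intra_cluster_aggregate(client_models, data_sizes):
    """Step 2: an edge server averages its clients, weighted by local data size."""
    return weighted_average(client_models, data_sizes)


def inter_cluster_aggregate(server_models, neighbors):
    """Step 3: each edge server averages its model with those of its neighbors.

    `neighbors[i]` lists the indices of the servers adjacent to server i.
    Uniform mixing is an assumption made for this sketch.
    """
    mixed = []
    for i, model in enumerate(server_models):
        group = [model] + [server_models[j] for j in neighbors[i]]
        mixed.append(weighted_average(group, [1.0] * len(group)))
    return mixed
```

For example, two clusters holding client models `[1.0]` and `[3.0]` (data sizes 1 and 3) and `[5.0]` (data size 2) would first be reduced to one model per edge server, and the two servers would then exchange and average those models.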
# HDR-NeRF:高ダイナミックレンジニューラル放射場 HDR-NeRF: High Dynamic Range Neural Radiance Fields ( http://arxiv.org/abs/2111.14451v4 ) ライセンス: Link先を確認	Xin Huang, Qi Zhang, Ying Feng, Hongdong Li, Xuan Wang, Qing Wang	(参考訳) 我々は、低ダイナミックレンジ(LDR)ビューのセットからHDR放射界を異なる露出で復元するために、HDR-NeRF(High Dynamic Range Neural Radiance Fields)を提案する。 HDR-NeRFを用いて、異なる露出下で、新しいHDRビューと新しいLDRビューの両方を生成することができる。この方法の鍵は物理イメージングの過程をモデル化することであり、シーンポイントの放射能が2つの暗黙的な機能を持つldr画像の画素値(放射能場とトーンマッパー)に変換されることを示す。放射場はシーンラディアンス(値が0から+infty)を符号化し、対応する光の起源と光方向を与えることにより、光の密度と放射を出力する。トーンマッパーは、カメラセンサに照射された光が画素値になるマッピング過程をモデル化する。放射光と対応する露光時間とをトーンマッパーに供給することにより、光の色を予測する。我々は、古典的なボリュームレンダリング技術を用いて出力放射率、色、密度をHDRおよびLDR画像に投影し、入力されたLDR画像のみを監督する。提案手法を評価するために,新しい前方向きHDRデータセットを収集する。合成および実世界のシーンにおける実験結果は, 合成ビューの露光を正確に制御できるだけでなく, ダイナミックレンジの描画も可能であることを確認した。 We present High Dynamic Range Neural Radiance Fields (HDR-NeRF) to recover an HDR radiance field from a set of low dynamic range (LDR) views with different exposures. Using the HDR-NeRF, we are able to generate both novel HDR views and novel LDR views under different exposures. The key to our method is to model the physical imaging process, which dictates that the radiance of a scene point transforms to a pixel value in the LDR image with two implicit functions: a radiance field and a tone mapper. The radiance field encodes the scene radiance (values vary from 0 to +infty), which outputs the density and radiance of a ray by giving corresponding ray origin and ray direction. The tone mapper models the mapping process that a ray hitting on the camera sensor becomes a pixel value. The color of the ray is predicted by feeding the radiance and the corresponding exposure time into the tone mapper. We use the classic volume rendering technique to project the output radiance, colors, and densities into HDR and LDR images, while only the input LDR images are used as the supervision. We collect a new forward-facing HDR dataset to evaluate the proposed method. 
Experimental results on synthetic and real-world scenes validate that our method can not only accurately control the exposures of synthesized views but also render views with a high dynamic range.	翻訳日:2023-04-27 04:14:26 公開日:2023-04-25
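The imaging model described in the abstract, in which scene radiance and exposure time pass through a tone mapper to give an LDR pixel value, can be illustrated with a toy forward model. In HDR-NeRF the tone mapper is a learned implicit function; the fixed gamma curve with clipping below is only a stand-in chosen for this sketch, and the names `tone_map` and `ldr_pixel` are hypothetical.

```python
def tone_map(exposure, gamma=2.2):
    """Stand-in camera response: clip to [0, 1], then apply a gamma curve."""
    clipped = min(max(exposure, 0.0), 1.0)
    return clipped ** (1.0 / gamma)


def ldr_pixel(radiance, exposure_time, gamma=2.2):
    """Map scene radiance in [0, +inf) to an LDR pixel value in [0, 1]."""
    # Sensor exposure is radiance integrated over the exposure time.
    return tone_map(radiance * exposure_time, gamma)
```

With this toy response, a bright scene point saturates at a long exposure but stays below the clipping level at a short one, which is why views taken at different exposures jointly constrain the underlying HDR radiance.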
# HUMAP:階層的一様多様体近似と投影 HUMAP: Hierarchical Uniform Manifold Approximation and Projection ( http://arxiv.org/abs/2106.07718v3 ) ライセンス: Link先を確認	Wilson E. Marcílio-Jr and Danilo M. Eler and Fernando V. Paulovich and Rafael M. Martins	(参考訳) 次元減少(DR)技術は、高次元空間におけるパターンの理解を支援する。これらの手法は、しばしば散乱プロットによって表現され、様々な科学領域で採用され、クラスターとデータサンプル間の類似性分析を容易にする。多くの粒度を含むデータセットや、分析が情報視覚化マントラに従う場合、階層的なDRテクニックは、前もって主要な構造と需要の詳細を示すので、最も適したアプローチである。しかし、現在の階層型DR技術は、階層レベルのプロジェクションメンタルマップを保存せず、ほとんどのデータタイプに適さないため、文学的な問題に完全に対処することができない。 HUMAPは、局所的・大域的構造と階層的探索を通してのメンタルマップの保存に柔軟に設計された、新しい階層的次元削減技術である。本手法の優位性を示す実証的な証拠を,現在の階層的アプローチと比較し,その強みを示す2つのケーススタディを示す。 Dimensionality reduction (DR) techniques help analysts understand patterns in high-dimensional spaces. These techniques, often represented by scatter plots, are employed in diverse science domains and facilitate similarity analysis among clusters and data samples. For datasets containing many granularities or when analysis follows the information visualization mantra, hierarchical DR techniques are the most suitable approach since they present major structures beforehand and details on demand. However, current hierarchical DR techniques are not fully capable of addressing literature problems because they do not preserve the projection mental map across hierarchical levels or are not suitable for most data types. This work presents HUMAP, a novel hierarchical dimensionality reduction technique designed to be flexible in preserving local and global structures and the mental map throughout hierarchical exploration. We provide empirical evidence of our technique's superiority compared with current hierarchical approaches and show two case studies to demonstrate its strengths.	翻訳日:2023-04-27 04:13:40 公開日:2023-04-25
# ネットワークにおける階層的コミュニティ構造 Hierarchical community structure in networks ( http://arxiv.org/abs/2009.07196v2 ) ライセンス: Link先を確認	Michael T. Schaub, Jiaze Li and Leto Peel	(参考訳) モジュラーおよび階層的なコミュニティ構造は、現実世界の複雑なシステムに広く浸透している。これらの構造を検知し研究するために、多くの努力が費やされた。モジュラー構造の検出における重要な理論的進歩は、確率的生成モデルを用いてコミュニティ構造を形式的に定義することで検出可能性の基本的な限界を特定することである。階層型コミュニティ構造の検出は、コミュニティ検出から受け継いだものと並行して、さらなる課題をもたらす。本稿では,これまで同等の厳密な注目を受けてこなかった,ネットワークにおける階層的コミュニティ構造に関する理論的研究について述べる。我々は以下の疑問に答える。 1)コミュニティの階層をどのように定義すべきか。 2)ネットワークに階層構造の十分な証拠があるかどうかをどうやって判断するか。そして 3)階層構造を効率的に検出するにはどうすればよいか。確率的外的同値分割の概念と確率的ブロックモデルのような確率的モデルとの関係に基づいて階層構造の定義を導入することにより,これらの疑問にアプローチする。階層構造の検出に関わる課題を列挙し,階層構造のスペクトル特性を調べることにより,効率的かつ原理的に検出する手法を提案する。 Modular and hierarchical community structures are pervasive in real-world complex systems. A great deal of effort has gone into trying to detect and study these structures. Important theoretical advances in the detection of modular structure have included identifying fundamental limits of detectability by formally defining community structure using probabilistic generative models. Detecting hierarchical community structure introduces additional challenges alongside those inherited from community detection. Here we present a theoretical study on hierarchical community structure in networks, which has thus far not received the same rigorous attention. We address the following questions: 1) How should we define a hierarchy of communities? 2) How do we determine if there is sufficient evidence of a hierarchical structure in a network? and 3) How can we detect hierarchical structure efficiently? We approach these questions by introducing a definition of hierarchy based on the concept of stochastic externally equitable partitions and their relation to probabilistic models, such as the popular stochastic block model. We enumerate the challenges involved in detecting hierarchies and, by studying the spectral properties of hierarchical structure, present an efficient and principled method for detecting them.	翻訳日:2023-04-27 04:12:38 公開日:2023-04-25
# 浮動小数点演算における健全なランダム化平滑化 Sound Randomized Smoothing in Floating-Point Arithmetics ( http://arxiv.org/abs/2207.07209v2 ) ライセンス: Link先を確認	Václav Voráček and Matthias Hein	(参考訳) ランダム化平滑化は無限精度のもとでは健全である。しかし,浮動小数点精度が有限の場合,ランダム化平滑化はもはや健全ではないことを示す。距離 $0.8$ に敵対的サンプルが存在するにもかかわらず,ランダム化平滑化がある点の周囲で半径 $1.26$ を認証してしまう単純な例を示し,さらにこの例を拡張して CIFAR10 に対する偽の証明書を与える。ランダム化平滑化の暗黙の仮定について議論し、平滑化バージョンが一般的に認証されている汎用画像分類モデルには適用されないことを示した。そこで本研究では,浮動小数点精度を用いる場合のランダム化平滑化に対する健全な手法を提案する。この手法は本質的に同等の速度で動作し,これまでに検証した標準的な分類器については,標準的だが健全でない手法の証明書と一致する。唯一の前提は、公正なコインにアクセスできるということです。 Randomized smoothing is sound when using infinite precision. However, we show that randomized smoothing is no longer sound for limited floating-point precision. We present a simple example where randomized smoothing certifies a radius of $1.26$ around a point, even though there is an adversarial example at distance $0.8$, and extend this example further to provide false certificates for CIFAR10. We discuss the implicit assumptions of randomized smoothing and show that they do not apply to generic image classification models whose smoothed versions are commonly certified. In order to overcome this problem, we propose a sound approach to randomized smoothing when using floating-point precision with essentially equal speed and matching the certificates of the standard, unsound practice for standard classifiers tested so far. Our only assumption is that we have access to a fair coin.	翻訳日:2023-04-27 04:06:44 公開日:2023-04-25
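To make the notion of a certified radius concrete, the sketch below computes the standard L2 certificate from the randomized-smoothing literature, R = σ·Φ⁻¹(p_A), for Gaussian noise of scale σ and a lower bound p_A on the top-class probability. The last function shows a generic floating-point effect (a small addend absorbed by a large one). Both parts are illustrative assumptions: neither is the paper's counterexample nor its proposed sound procedure, and the function names are hypothetical.

```python
from statistics import NormalDist


def certified_radius(p_a_lower, sigma):
    """L2 radius sigma * Phi^{-1}(p_a_lower); returns 0.0 when no certificate holds.

    `p_a_lower` must be strictly below 1.0, since the inverse normal CDF
    is unbounded at 1.
    """
    if p_a_lower <= 0.5:
        return 0.0
    return sigma * NormalDist().inv_cdf(p_a_lower)


def perturbation_is_absorbed(value, noise):
    """Return True when adding `noise` leaves `value` unchanged in float64."""
    return value + noise == value
```

The certificate assumes the noise actually added follows the ideal Gaussian; `perturbation_is_absorbed(1e16, 1.0)` evaluating to true is a reminder that finite-precision addition does not always behave like real-number addition.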
# 分散分配回帰のための非線形十分次元削減 Nonlinear Sufficient Dimension Reduction for Distribution-on-Distribution Regression ( http://arxiv.org/abs/2207.04613v2 ) ライセンス: Link先を確認	Qi Zhang, Bing Li, and Lingzhou Xue	(参考訳) 距離空間の構成員としてモデル化された予測値と応答値の両方が分布データである場合の非線形十分次元減少に対する新しいアプローチを提案する。我々の重要なステップは、距離空間上に普遍的なカーネル(cc-ユニバーサル)を構築することであり、その結果、十分な次元の減少を決定する条件独立性を特徴付けるのに十分リッチな予測器と応答のためのカーネルヒルベルト空間を再現する。一変量分布ではワッサーシュタイン距離を用いて普遍核を構築するが、多変量分布ではスライスされたワッサーシュタイン距離を利用する。スライスされたワッサーシュタイン距離は、計量空間がワッサーシュタイン空間に類似した位相的性質を持つことを保証するとともに、重要な計算上の利点を提供する。合成データに基づく数値計算の結果,本手法は競合する手法よりも優れていた。この方法は、出生率、死亡率データ、カルガリー温度データを含むいくつかのデータセットにも適用される。 We introduce a new approach to nonlinear sufficient dimension reduction in cases where both the predictor and the response are distributional data, modeled as members of a metric space. Our key step is to build universal kernels (cc-universal) on the metric spaces, which results in reproducing kernel Hilbert spaces for the predictor and response that are rich enough to characterize the conditional independence that determines sufficient dimension reduction. For univariate distributions, we construct the universal kernel using the Wasserstein distance, while for multivariate distributions, we resort to the sliced Wasserstein distance. The sliced Wasserstein distance ensures that the metric space possesses similar topological properties to the Wasserstein space while also offering significant computation benefits. Numerical results based on synthetic data show that our method outperforms possible competing methods. The method is also applied to several data sets, including fertility and mortality data and Calgary temperature data.	翻訳日:2023-04-27 04:06:29 公開日:2023-04-25
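The two distances the abstract relies on can be sketched for empirical samples. For univariate samples of equal size, the 2-Wasserstein distance reduces to comparing sorted values; the sliced variant averages that one-dimensional distance over random projection directions. This is a Monte Carlo illustration with assumed equal sample sizes, not the authors' estimator, and the function names are hypothetical.

```python
import math
import random


def wasserstein2_1d(xs, ys):
    """2-Wasserstein distance between two equal-size 1-D empirical samples."""
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    pairs = zip(sorted(xs), sorted(ys))
    return math.sqrt(sum((x - y) ** 2 for x, y in pairs) / len(xs))


def sliced_wasserstein2(xs, ys, num_projections=200, seed=0):
    """Monte Carlo sliced 2-Wasserstein distance between equal-size point clouds."""
    rng = random.Random(seed)
    dim = len(xs[0])
    total = 0.0
    for _ in range(num_projections):
        # Draw a direction uniformly on the unit sphere via a normalized Gaussian.
        direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(d * d for d in direction))
        direction = [d / norm for d in direction]
        # Project both clouds onto the direction and compare in one dimension.
        proj_x = [sum(d * v for d, v in zip(direction, x)) for x in xs]
        proj_y = [sum(d * v for d, v in zip(direction, y)) for y in ys]
        total += wasserstein2_1d(proj_x, proj_y) ** 2
    return math.sqrt(total / num_projections)
```

Each slice costs only a sort, which reflects the computational advantage the abstract attributes to the sliced distance for multivariate distributions.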
# BRExIt: エキスパートイテレーションにおける対戦相手モデリングについて BRExIt: On Opponent Modelling in Expert Iteration ( http://arxiv.org/abs/2206.00113v2 ) ライセンス: Link先を確認	Daniel Hernandez, Hendrik Baier, Michael Kaisers	(参考訳) 最良応答方策を見つけることは、ゲーム理論とマルチエージェント学習における中心的な目的である。現代の集団ベースのトレーニング手法では、強化学習アルゴリズムを最良応答オラクルとして用い、候補となる対戦相手(主に以前に学習した方策)に対するプレイを改善する。本稿では,最先端の学習アルゴリズムであるエキスパートイテレーション(ExIt)に対戦相手モデルを組み込むことにより,ゲームにおける学習を加速するベストレスポンスエキスパートイテレーション(BRExIt)を提案する。BRExItの目的は、(1)対戦相手の方策を補助タスクとして予測する方策ヘッドにより、見習い(apprentice)の特徴形成を改善すること、(2)計画時の相手の手を、与えられた、あるいは学習した対戦相手モデルに偏らせ、最良応答をより良く近似する見習いの学習ターゲットを生成することである。 BRExItのアルゴリズム的変種と固定テストエージェントの集合に対する実証的アブレーションでは、BRExItがExItよりも優れたポリシーを学習しているという統計的証拠を提供する。 Finding a best response policy is a central objective in game theory and multi-agent learning, with modern population-based training approaches employing reinforcement learning algorithms as best-response oracles to improve play against candidate opponents (typically previously learnt policies). We propose Best Response Expert Iteration (BRExIt), which accelerates learning in games by incorporating opponent models into the state-of-the-art learning algorithm Expert Iteration (ExIt). BRExIt aims to (1) improve feature shaping in the apprentice, with a policy head predicting opponent policies as an auxiliary task, and (2) bias opponent moves in planning towards the given or learnt opponent model, to generate apprentice targets that better approximate a best response. In an empirical ablation on BRExIt's algorithmic variants against a set of fixed test agents, we provide statistical evidence that BRExIt learns better performing policies than ExIt.	翻訳日:2023-04-27 04:04:59 公開日:2023-04-25
# 可変密度雑音によるサブサンプリングを用いた自己教師型MR画像再構成のための理論的枠組み A theoretical framework for self-supervised MR image reconstruction using sub-sampling via variable density Noisier2Noise ( http://arxiv.org/abs/2205.10278v4 ) ライセンス: Link先を確認	Charles Millard, Mark Chiew	(参考訳) 近年,サブサンプルMRI(Magnetic Resonance Imaging)データの再構成にニューラルネットワークの統計的モデリング機能を活用することに注目が集まっている。提案手法は, 代表的な完全サンプルデータセットの存在を前提として, 完全教師付きトレーニングを用いる。しかし、多くのアプリケーションでは、完全なサンプルトレーニングデータは利用できず、取得には非常に実用的でない可能性がある。したがって、訓練にサブサンプリングデータのみを使用する自己教師あり手法の開発と理解が極めて望ましい。この研究は、当初自己教師付き認知タスクのために構築されたNoisier2Noiseフレームワークを、可変密度サブサンプルMRIデータに拡張した。提案手法であるdata undersampling (ssdu) による自己教師付き学習の性能を解析的に説明するために,noisier2noiseフレームワークを用いた。さらに、理論的発展の結果として生じるSSDUの2つの修正を提案する。まず、サンプリングセットを分割して、サブセットが元のサンプリングマスクと同じタイプの分布を持つようにすることを提案する。次に, サンプル密度と分割密度を補償する損失重み付けを提案する。 fastMRIデータセットでは,これらの変化によりSSDUの画像復元精度が向上し,パーティショニングパラメータの堅牢性が向上した。 In recent years, there has been attention on leveraging the statistical modeling capabilities of neural networks for reconstructing sub-sampled Magnetic Resonance Imaging (MRI) data. Most proposed methods assume the existence of a representative fully-sampled dataset and use fully-supervised training. However, for many applications, fully sampled training data is not available, and may be highly impractical to acquire. The development and understanding of self-supervised methods, which use only sub-sampled data for training, are therefore highly desirable. This work extends the Noisier2Noise framework, which was originally constructed for self-supervised denoising tasks, to variable density sub-sampled MRI data. We use the Noisier2Noise framework to analytically explain the performance of Self-Supervised Learning via Data Undersampling (SSDU), a recently proposed method that performs well in practice but until now lacked theoretical justification. Further, we propose two modifications of SSDU that arise as a consequence of the theoretical developments. 
Firstly, we propose partitioning the sampling set so that the subsets have the same type of distribution as the original sampling mask. Secondly, we propose a loss weighting that compensates for the sampling and partitioning densities. On the fastMRI dataset we show that these changes significantly improve SSDU's image restoration quality and robustness to the partitioning parameters.	翻訳日:2023-04-27 04:04:39 公開日:2023-04-25
# 隠れた量子メモリ:誰かが見た時にメモリは存在するか? Hidden Quantum Memory: Is Memory There When Somebody Looks? ( http://arxiv.org/abs/2204.08298v4 ) ライセンス: Link先を確認	Philip Taranto and Thomas J. Elliott and Simon Milz	(参考訳) 古典物理学では、メモリレス力学とマルコフ統計は同じである。これは量子力学には当てはまらない、なぜなら量子測定は侵入的だからである。ここでは、測定の侵襲性を超えて、古典的過程と量子的過程の新たな違い、すなわち隠れた量子メモリの可能性を導く。古典的過程のマルコフ統計は常にメモリレスな力学モデルで再現できるが、量子力学ではそうではないことを主結果として示す。まず、以前の測定が行われたか否かによって発現が左右される量子非マルコフ性の例を示す。これはメモリレス力学では起こり得ない現象である。次に、どのように探査してもマルコフ的であるにもかかわらず、メモリレスな量子力学とは両立しない統計を示すことで、この結果を強化する。このようにして、量子過程を探査して得られるマルコフ統計でありながら、その生成には本質的にメモリを必要とするものが存在することを立証する。 In classical physics, memoryless dynamics and Markovian statistics are one and the same. This is not true for quantum dynamics, first and foremost because quantum measurements are invasive. Going beyond measurement invasiveness, here we derive a novel distinction between classical and quantum processes, namely the possibility of hidden quantum memory. While Markovian statistics of classical processes can always be reproduced by a memoryless dynamical model, our main result shows that this is not true in quantum mechanics: We first provide an example of quantum non-Markovianity whose manifestation depends on whether or not a previous measurement is performed -- an impossible phenomenon for memoryless dynamics; we then strengthen this result by demonstrating statistics that are Markovian independent of how they are probed, but are nonetheless still incompatible with memoryless quantum dynamics. Thus, we establish the existence of Markovian statistics gathered by probing a quantum process that nevertheless fundamentally require memory for their creation.	翻訳日:2023-04-27 04:04:17 公開日:2023-04-25
# スピン回路量子力学を用いたJaynes-Cummings Ladderの提案 Probing the Jaynes-Cummings Ladder with Spin Circuit Quantum Electrodynamics ( http://arxiv.org/abs/2203.05668v2 ) ライセンス: Link先を確認	Tobias Bonsen (1), Patrick Harvey-Collard (1), Maximilian Russ (1), Jurgen Dijkema (1), Amir Sammak (2), Giordano Scappucci, Lieven M. K. Vandersypen (1) ((1) QuTech and Kavli Institute of Nanoscience, Delft University of Technology, (2) QuTech and Netherlands Organization for Applied Scientific Research (TNO))	(参考訳) 電子スピンを用いた回路量子力学(スピン回路QED)のJaynes-Cummingsはしごにおける励起状態間の遷移を報告する。本稿では,最近の実験研究における説明できない特徴がこのような遷移に対応することを示し,これらの効果を含む入力出力フレームワークを提案する。新しい実験では、まず以前の観測を再現し、プローブパワーを増大させ、2トーン分光を用いて励起状態遷移と多光子遷移の両方を明らかにする。このJaynes-Cummingsのはしごを探査する能力は、カップリング対デコヒーレンス比の改善によって実現され、量子現象を研究するための興味深いプラットフォームとしてスピン回路QEDの成熟度が増加することを示す。 We report observations of transitions between excited states in the Jaynes-Cummings ladder of circuit quantum electrodynamics with electron spins (spin circuit QED). We show that unexplained features in recent experimental work correspond to such transitions and present an input-output framework that includes these effects. In new experiments, we first reproduce previous observations and then reveal both excited-state transitions and multiphoton transitions by increasing the probe power and using two-tone spectroscopy. This ability to probe the Jaynes-Cummings ladder is enabled by improvements in the coupling-to-decoherence ratio, and shows an increase in the maturity of spin circuit QED as an interesting platform for studying quantum phenomena.	翻訳日:2023-04-27 04:03:39 公開日:2023-04-25
# 回転操作のトリガと制御のための深層強化学習による小型空中ロボットの逆着陸 Inverted Landing in a Small Aerial Robot via Deep Reinforcement Learning for Triggering and Control of Rotational Maneuvers ( http://arxiv.org/abs/2209.11043v2 ) ライセンス: Link先を確認	Bryan Habas, Jack W. Langelaan, Bo Cheng	(参考訳) 高速で堅牢な逆着陸は、特に船上でのセンシングと計算に完全に依存しながら、空中ロボットにとって難しい偉業である。それにもかかわらず、この偉業はコウモリ、ハエ、ミツバチなどの生物学的チラシによって定期的に行われる。これまでの研究では、一連の視覚手がかりと運動行動との直接的な因果関係を特定し、この挑戦的なエアロバティックな操作を小型の空中ロボットで信頼できる実行を可能にした。本研究では、まずDeep Reinforcement Learningと物理シミュレーションを用いて、任意のアプローチ条件から始まる頑健な逆着陸のための一般的な最適制御ポリシーを得る。この最適化された制御ポリシーは、システムの観測空間から回転操作のトリガーと制御を含む運動指令行動空間への計算効率のよいマッピングを提供する。これは、大きさや方向によって異なる幅広い接近飛行速度でシステムを訓練することで達成された。次に,シミュレーションにおけるロボットの慣性パラメータを変化させ,ドメインランダム化による学習方針のsim-to-real転送と実験的検証を行った。実験により, 着地堅牢性を大幅に向上させるいくつかの要因と, 逆着陸成功を決定づける主要なメカニズムを同定した。本研究で開発された学習フレームワークは, 騒音センサデータの利用, 様々な方向の面への着地, 動的に動く面への着地など, より困難な課題を解決するために一般化されることを期待している。 Inverted landing in a rapid and robust manner is a challenging feat for aerial robots, especially while depending entirely on onboard sensing and computation. In spite of this, this feat is routinely performed by biological fliers such as bats, flies, and bees. Our previous work has identified a direct causal connection between a series of onboard visual cues and kinematic actions that allow for reliable execution of this challenging aerobatic maneuver in small aerial robots. In this work, we first utilized Deep Reinforcement Learning and a physics-based simulation to obtain a general, optimal control policy for robust inverted landing starting from any arbitrary approach condition. This optimized control policy provides a computationally-efficient mapping from the system's observational space to its motor command action space, including both triggering and control of rotational maneuvers. This was done by training the system over a large range of approach flight velocities that varied with magnitude and direction. 
Next, we performed a sim-to-real transfer and experimental validation of the learned policy via domain randomization, by varying the robot's inertial parameters in the simulation. Through experimental trials, we identified several dominant factors which greatly improved landing robustness and the primary mechanisms that determined inverted landing success. We expect the learning framework developed in this study can be generalized to solve more challenging tasks, such as utilizing noisy onboard sensory data, landing on surfaces of various orientations, or landing on dynamically-moving surfaces.	翻訳日:2023-04-27 03:56:18 公開日:2023-04-25
# 最適化によるビット割り当て Bit Allocation using Optimization ( http://arxiv.org/abs/2209.09422v4 ) ライセンス: Link先を確認	Tongda Xu, Han Gao, Chenjian Gao, Yuanyuan Wang, Dailan He, Jinyong Pi, Jixiang Luo, Ziyu Zhu, Mao Ye, Hongwei Qin, Yan Wang, Jingjing Liu, Ya-Qin Zhang	(参考訳) 本稿では,ニューラルビデオ圧縮(NVC)におけるビット割り当ての問題について考察する。まず,NVCにおけるビット割り当てと半補正変分推論(SAVI)の基本的な関係を明らかにする。具体的には、GoP(Group-of-Picture)レベルの確率を持つSAVIは、正確なレート \および品質依存モデルを持つピクセルレベルのビット割り当てと等価であることを示す。この等価性に基づいて、SAVIを用いたビット割り当ての新しいパラダイムを確立する。従来のビット割当法とは異なり、この手法は経験モデルを必要としないため最適である。さらに, 勾配上昇を用いたオリジナルのSAVIは, 単一レベル潜水剤にのみ適用するため, 勾配上昇による逆伝播を再帰的に適用することにより, SAVIをNVCなどのマルチレベルに拡張する。最後に,実用的な実装のためのトラクタブル近似を提案する。提案手法は,ビット割り当てのR-D性能の実証的バウンダリとして機能し,性能超過が速度を符号化するシナリオに適用できる。実験結果から、現在の最先端ビット割り当てアルゴリズムは、我々のものと比較して改善するために、$\approx 0.5$ dB PSNRの空間を持つことがわかった。コードは \url{https://github.com/tongdaxu/bit-allocation-using-optimization} で利用可能である。 In this paper, we consider the problem of bit allocation in Neural Video Compression (NVC). First, we reveal a fundamental relationship between bit allocation in NVC and Semi-Amortized Variational Inference (SAVI). Specifically, we show that SAVI with GoP (Group-of-Picture)-level likelihood is equivalent to pixel-level bit allocation with precise rate \& quality dependency model. Based on this equivalence, we establish a new paradigm of bit allocation using SAVI. Different from previous bit allocation methods, our approach requires no empirical model and is thus optimal. Moreover, as the original SAVI using gradient ascent only applies to single-level latent, we extend the SAVI to multi-level such as NVC by recursively applying back-propagating through gradient ascent. Finally, we propose a tractable approximation for practical implementation. Our method can be applied to scenarios where performance outweights encoding speed, and serves as an empirical bound on the R-D performance of bit allocation. 
Experimental results show that current state-of-the-art bit allocation algorithms still have room to improve by $\approx 0.5$ dB PSNR compared with ours. Code is available at https://github.com/tongdaxu/Bit-Allocation-Using-Optimization.	翻訳日:2023-04-27 03:55:52 公開日:2023-04-25
# 回折データの深層ニューラルネットワークによる弱信号抽出 Weak-signal extraction enabled by deep-neural-network denoising of diffraction data ( http://arxiv.org/abs/2209.09247v2 ) ライセンス: Link先を確認	Jens Oppliger, Michael M. Denner, Julia Küspert, Ruggero Frison, Qisi Wang, Alexander Morawietz, Oleh Ivashko, Ann-Christin Dippel, Martin von Zimmermann, Izabela Biało, Leonardo Martinelli, Benoît Fauqué, Jaewon Choi, Mirian Garcia-Fernandez, Kejin Zhou, Niels B. Christensen, Tohru Kurosawa, Naoki Momono, Migaku Oda, Fabian D. Natterer, Mark H. Fischer, Titus Neupert, Johan Chang	(参考訳) ノイズの除去やキャンセルは、画像や音響に広く応用されている。日常の応用では、デノイジングには、根本的真実に反する生成的側面を含むこともある。しかし、科学的応用については、真理を正確に再現する必要がある。本稿では,弱い信号が定量的な精度で現れるように,深い畳み込みニューラルネットワークを用いてデータをデノイズする方法を示す。特に結晶材料のX線回折について検討する。本研究では,ノイズの多いデータでは有意でない電荷秩序に起因する弱信号が,デノイズ後のデータでは可視かつ正確になることを示す。この成功は、測定された低ノイズデータと高ノイズデータのペアによるディープニューラルネットワークの教師付きトレーニングによって実現される。このようにして、ニューラルネットワークはノイズの統計的特性について学習する。人工雑音は, 定量的に正確な結果が得られないことを示す。提案手法は,難解な取得問題に適用可能なノイズフィルタリングの実践的戦略を示すものである。 Removal or cancellation of noise has widespread applications for imaging and acoustics. In everyday applications, denoising may even include generative aspects which are unfaithful to the ground truth. For scientific applications, however, denoising must reproduce the ground truth accurately. Here, we show how data can be denoised via a deep convolutional neural network such that weak signals appear with quantitative accuracy. In particular, we study X-ray diffraction on crystalline materials. We demonstrate that weak signals stemming from charge ordering, insignificant in the noisy data, become visible and accurate in the denoised data. This success is enabled by supervised training of a deep neural network with pairs of measured low- and high-noise data. This way, the neural network learns about the statistical properties of the noise. We demonstrate that using artificial noise does not yield such quantitatively accurate results. 
Our approach thus illustrates a practical strategy for noise filtering that can be applied to challenging acquisition problems.	翻訳日:2023-04-27 03:55:33 公開日:2023-04-25
# 半間接離散対数問題に対する部分指数量子アルゴリズム A Subexponential Quantum Algorithm for the Semidirect Discrete Logarithm Problem ( http://arxiv.org/abs/2209.02814v4 ) ライセンス: Link先を確認	Christopher Battarbee, Delaram Kahrobaei, Ludovic Perret, and Siamak F. Shahandashti	(参考訳) グループベースの暗号は、量子後暗号における比較的未発見の家系であり、いわゆるセミダイレクト離散対数問題(Semidirect Discrete Logarithm Problem, SDLP)は最も中心的な問題の一つである。しかし、SDLPの複雑さと、特に量子敵に対するセキュリティに関して、よりよく知られた硬さ問題との関係はよく理解されておらず、この分野の研究者にとって重要なオープンな問題であった。本稿では,sdlpのセキュリティ解析を初めて実施する。特に、SDLPとグループアクションの間には、量子部分指数アルゴリズムを適用することが知られているコンテキストがある。したがって、SDLPを解くための部分指数量子アルゴリズムを構築することができ、SDLPの複雑さと既知の計算問題との関係を分類することができる。 Group-based cryptography is a relatively unexplored family in post-quantum cryptography, and the so-called Semidirect Discrete Logarithm Problem (SDLP) is one of its most central problems. However, the complexity of SDLP and its relationship to more well-known hardness problems, particularly with respect to its security against quantum adversaries, has not been well understood and was a significant open problem for researchers in this area. In this paper we give the first dedicated security analysis of SDLP. In particular, we provide a connection between SDLP and group actions, a context in which quantum subexponential algorithms are known to apply. We are therefore able to construct a subexponential quantum algorithm for solving SDLP, thereby classifying the complexity of SDLP and its relation to known computational problems.	翻訳日:2023-04-27 03:54:43 公開日:2023-04-25
# ロバスト音響誘導画像マニピュレーション Robust Sound-Guided Image Manipulation ( http://arxiv.org/abs/2208.14114v3 ) ライセンス: Link先を確認	Seung Hyun Lee, Gyeongrok Oh, Wonmin Byeon, Sang Ho Yoon, Jinkyu Kim, Sangpil Kim	(参考訳) 最近の成功は、例えば、晴れた日に風景シーンが、テキスト入力「レイニング」によって駆動される雨の日に同じシーンに操作されるように、テキストプロンプトで画像を操作できることを示唆している。これらのアプローチはしばしば、マルチモーダル(テキストとイメージ)埋め込み空間を利用するStyleCLIPベースのイメージジェネレータを利用する。しかし,このようなテキスト入力は,降雨時の豪雨と雷雨の区別など,リッチなセマンティック・キューの提供と合成においてしばしばボトルネックとなる。この問題に対処するために、テキストよりも多様な意味的手がかり(生き生きとした感情や自然界のダイナミックな表現)を伝達できるため、画像操作において顕著な優位性を持つ追加のモダリティ、音の活用を提唱する。本稿では,まず画像とテキストの組込み空間を音で拡張し,例えば雨音など,音声入力に基づいて画像を操作するための直接潜在最適化手法を提案する。当社の音響誘導画像操作手法は,最先端のテキストや音声誘導画像操作手法よりも,意味的かつ視覚的に正確な操作結果が得られることを示す。ダウンストリームタスク評価では,学習した画像-テキスト-音声統合埋め込み空間が音響入力を効果的に符号化することを示す。 Recent successes suggest that an image can be manipulated by a text prompt, e.g., a landscape scene on a sunny day is manipulated into the same scene on a rainy day driven by a text input "raining". These approaches often utilize a StyleCLIP-based image generator, which leverages multi-modal (text and image) embedding space. However, we observe that such text inputs are often bottlenecked in providing and synthesizing rich semantic cues, e.g., differentiating heavy rain from rain with thunderstorms. To address this issue, we advocate leveraging an additional modality, sound, which has notable advantages in image manipulation as it can convey more diverse semantic cues (vivid emotions or dynamic expressions of the natural world) than texts. In this paper, we propose a novel approach that first extends the image-text joint embedding space with sound and applies a direct latent optimization method to manipulate a given image based on audio input, e.g., the sound of rain. Our extensive experiments show that our sound-guided image manipulation approach produces semantically and visually more plausible manipulation results than the state-of-the-art text and sound-guided image manipulation methods, which are further confirmed by our human evaluations. 
Our downstream task evaluations also show that our learned image-text-sound joint embedding space effectively encodes sound inputs.	翻訳日:2023-04-27 03:54:30 公開日:2023-04-25
# 抽象的な会議要約:調査 Abstractive Meeting Summarization: A Survey ( http://arxiv.org/abs/2208.04163v2 ) ライセンス: Link先を確認	Virgile Rennard, Guokan Shang, Julie Hunter, Michalis Vazirgiannis	(参考訳) 会話の最も重要なポイントを確実に特定し、まとめることができるシステムは、ビジネス会議から医療相談、カスタマーサービスコールに至るまで、さまざまな現実世界のコンテキストにおいて有用である。ディープラーニングの最近の進歩、特にエンコーダ-デコーダアーキテクチャの発明は、言語生成システムを大幅に改善し、多人数会話に特に適した要約の形式である抽象的要約(abstractive summarization)の改善への扉を開く。本稿では,抽象的な会議要約タスクがもたらす課題の概要と,この問題に対処するために用いられてきたデータセット,モデル,評価指標について概説する。 A system that could reliably identify and sum up the most important points of a conversation would be valuable in a wide variety of real-world contexts, from business meetings to medical consultations to customer service calls. Recent advances in deep learning, and especially the invention of encoder-decoder architectures, have significantly improved language generation systems, opening the door to improved forms of abstractive summarization, a form of summarization particularly well-suited for multi-party conversation. In this paper, we provide an overview of the challenges raised by the task of abstractive meeting summarization and of the data sets, models and evaluation metrics that have been used to tackle the problems.	翻訳日:2023-04-27 03:54:03 公開日:2023-04-25
# 自由空間における非平衡超放射相転移 A non-equilibrium superradiant phase transition in free space ( http://arxiv.org/abs/2207.10361v2 ) ライセンス: Link先を確認	Giovanni Ferioli, Antoine Glicenstein, Igor Ferrier-Barbut, and Antoine Browaeys	(参考訳) 散逸、外部駆動、相互作用が競合し、駆動なしでは存在しない非平衡相を生じさせる系のクラスが存在する。ここでは、位相遷移は対称性を破ることなく起こりうるが、局所的な順序パラメータでは、平衡における位相遷移のランダウ理論とは対照的である。最も単純な散逸量子系の一つは、光場によって駆動される原子遷移立方体の波長よりも小さい体積で囲まれた2レベル原子である。原子の駆動場への集団結合と協調的崩壊の競合は、全ての原子双極子が位相ロックされている相と超ラジカル自発的放出によって制御される相の間の遷移に繋がるべきである。ここでは,自由空間におけるレーザー冷却原子の鉛筆型雲を主軸に沿って光学的に励起し,予測した位相を観察することにより,このモデルを実現する。我々の実証は、自由空間超放射光レーザーの取得や、新しいタイプの時間結晶の観測の観点から有望である。 A class of systems exists in which dissipation, external drive and interactions compete and give rise to non equilibrium phases that would not exist without the drive. There, phase transitions could occur without the breaking of any symmetry, yet with a local order parameter, in contrast with the Landau theory of phase transitions at equilibrium. One of the simplest driven dissipative quantum systems consists of two-level atoms enclosed in a volume smaller than the wavelength of the atomic transition cubed, driven by a light field. The competition between collective coupling of the atoms to the driving field and their cooperative decay should lead to a transition between a phase where all the atomic dipoles are phaselocked and a phase governed by superradiant spontaneous emission. Here, we realize this model using a pencil-shaped cloud of laser cooled atoms in free space, optically excited along its main axis, and observe the predicted phases. Our demonstration is promising in view of obtaining free-space superradiant lasers or to observe new types of time crystals.	翻訳日:2023-04-27 03:53:53 公開日:2023-04-25
# Rokhsar-Kivelson-sign波動関数の絡み合い複雑性 Entanglement complexity of the Rokhsar-Kivelson-sign wavefunctions ( http://arxiv.org/abs/2211.01428v3 ) ライセンス: Link先を確認	Stefano Piemontese, Tommaso Roscilde, Alioscia Hamma	(参考訳) 本稿では,1つのパラメータによって絡み合いの度合いが制御される,模範状態であるロクサー・キベルソン符号波動関数(Rokhsar-Kivelson-sign wavefunctions)の絡み合い複雑性の遷移について検討する。この状態群は、エントロピーの体積則スケーリングを示す相と、エンタングルメントのサブ拡張スケーリングを持つ相の間の遷移を特徴とすることが知られており、乱れた量子ハミルトンの多体局所化遷移を想起させる[Physical Review B 92, 214204 (2015)]。我々は、ロクサー・キヴェルソン符号波動関数の特異点とその絡み込み複雑性を、量子情報理論のいくつかのツールを用いて研究する: 忠実度計量、絡み合いスペクトル統計、絡み合いエントロピーゆらぎ、安定化器Rényiエントロピー、および非絡みアルゴリズムの性能。体積則フェーズ全体を通して、状態は普遍的絡み合いスペクトル統計量を持つ。しかし、全てのメトリクスがパラメータ自身から独立になる制御パラメータの小さな値に「超ユニバーサル」の規則が現れる; 絡み合いエントロピーと安定化器 Rényi エントロピーは理論的な最大値に近づく; 絡み合いのゆらぎはランダムな普遍回路の出力状態のようにゼロにスケールし、解離アルゴリズムは本質的にゼロ効率を持つ。これら全ての指標は、一貫して複雑な絡み合いのパターンを示す。一方、サブボリューム法相では、絡み合いスペクトル統計はもはや普遍的ではなく、絡み合いの変動はより大きく、非ユニバーサルスケーリングを示し、非絡み合いアルゴリズムの効率は有限となる。モデル波動関数に基づき, エンタングルメントスケーリング特性とエンタングルメント複雑性特性の類似の組み合わせが, 高エネルギーハミルトニアンの固有状態に見られることが示唆された。 In this paper we study the transitions of entanglement complexity in an exemplary family of states - the Rokhsar-Kivelson-sign wavefunctions - whose degree of entanglement is controlled by a single parameter. This family of states is known to feature a transition between a phase exhibiting volume-law scaling of entanglement entropy and a phase with sub-extensive scaling of entanglement, reminiscent of the many-body-localization transition of disordered quantum Hamiltonians [Physical Review B 92, 214204 (2015)]. We study the singularities of the Rokhsar-Kivelson-sign wavefunctions and their entanglement complexity across the transition using several tools from quantum information theory: fidelity metric; entanglement spectrum statistics; entanglement entropy fluctuations; stabilizer Rényi entropy; and the performance of a disentangling algorithm. Across the whole volume-law phase the states feature universal entanglement spectrum statistics. 
Yet a "super-universal" regime appears for small values of the control parameter in which all metrics become independent of the parameter itself; the entanglement entropy as well as the stabilizer Rényi entropy appear to approach their theoretical maximum; the entanglement fluctuations scale to zero as in output states of random universal circuits, and the disentangling algorithm has essentially null efficiency. All these indicators consistently reveal a complex pattern of entanglement. In the sub-volume-law phase, on the other hand, the entanglement spectrum statistics are no longer universal, entanglement fluctuations are larger and exhibit non-universal scaling, and the efficiency of the disentangling algorithm becomes finite. Our results, based on model wavefunctions, suggest that a similar combination of entanglement scaling properties and of entanglement complexity features may be found in high-energy Hamiltonian eigenstates.	翻訳日:2023-04-27 03:47:22 公開日:2023-04-25
# 半金属および半伝導性グラフェン-hBN多層膜 Semimetallic and semiconducting graphene-hBN multilayers with parallel or reverse stacking ( http://arxiv.org/abs/2210.16393v2 ) ライセンス: Link先を確認	Xi Chen, Klaus Zollner, Christian Moulsdale, Vladimir I. Fal'ko, Angelika Knothe	(参考訳) 異なる対称性を有する交互グラフェンおよびhbn層の3次元層状結晶を理論的に検討した。グラフェン層間のホッピングパラメータによって、これらの合成3D材料は、半金属、ギャップ、またはワイル半金属相を特徴付けることができる。その結果, 個々の2次元材料から積み重ねられた3次元結晶は, 構成成分とは異なる創発性を有する合成材料クラスであることがわかった。 We theoretically investigate 3D layered crystals of alternating graphene and hBN layers with different symmetries. Depending on the hopping parameters between the graphene layers, we find that these synthetic 3D materials can feature semimetallic, gapped, or Weyl semimetal phases. Our results demonstrate that 3D crystals stacked from individual 2D materials represent a synthetic materials class with emergent properties different from their constituents.	翻訳日:2023-04-27 03:46:41 公開日:2023-04-25
# Masked Autoencodersは調音学習者である Masked Autoencoders Are Articulatory Learners ( http://arxiv.org/abs/2210.15195v2 ) ライセンス: Link先を確認	Ahmed Adel Attia, Carol Espy-Wilson	(参考訳) 調音録音は声道に沿った異なる調音器の位置と動きを追跡し、音声生成の研究や調音ベースの音声合成装置や音声インバージョンシステムといった音声技術の開発に広く用いられている。ウィスコンシン大学x線マイクロビーム(xrmb)データセットは、音声録音と同期した調音記録を提供する様々なデータセットの1つである。 xrmbの調音録音では、マイクロビームで追跡できる多数の調音器にペレットが配置されている。しかし、録音のかなりの部分は誤トラックされており、これまでは使用不可能であった。本研究では,マスキングオートエンコーダを用いて,xrmbデータセットの話者47名中41名を対象に,誤追跡された調音録音を正確に再構成する深層学習手法を提案する。従来使用できなかった3.4時間のうち3.28時間程度を収集し,8つの調音器のうち3つが誤追跡された場合でも,実感に合致した調音軌跡を再現することができる。 Articulatory recordings track the positions and motion of different articulators along the vocal tract and are widely used to study speech production and to develop speech technologies such as articulatory-based speech synthesizers and speech inversion systems. The University of Wisconsin X-Ray microbeam (XRMB) dataset is one of various datasets that provide articulatory recordings synced with audio recordings. The XRMB articulatory recordings employ pellets placed on a number of articulators which can be tracked by the microbeam. However, a significant portion of the articulatory recordings are mistracked, and have been so far unusable. In this work, we present a deep learning based approach using Masked Autoencoders to accurately reconstruct the mistracked articulatory recordings for 41 out of 47 speakers of the XRMB dataset. Our model is able to reconstruct articulatory trajectories that closely match ground truth, even when three out of eight articulators are mistracked, and retrieve 3.28 out of 3.4 hours of previously unusable recordings.	翻訳日:2023-04-27 03:46:32 公開日:2023-04-25
# 知識強化関係抽出データセット Knowledge-Enhanced Relation Extraction Dataset ( http://arxiv.org/abs/2210.11231v3 ) ライセンス: Link先を確認	Yucong Lin, Hongming Xiao, Jiani Liu, Zichao Lin, Keming Lu, Feifei Wang, Wei Wei	(参考訳) 近年,補助知識グラフを利用した知識強化手法が,従来のテキストベースアプローチを超越した関係抽出に現れている。しかし、我々の知る限り、現在、知識強化関係抽出のための証拠文と知識グラフの両方を含む公開データセットは存在しない。このギャップに対処するために、知識強化関係抽出データセット(KERED)を導入する。 KEREDは各文に関係事実を付加し、エンティティリンクを通じてエンティティの知識コンテキストを提供する。得られたデータセットを用いて,2つのタスク設定(文レベルとバッグレベル)で,同時代の関係抽出手法を比較した。実験の結果,keredが提供する知識グラフは,知識エンハンスド関係抽出法をサポートできることがわかった。我々は,kered が知識グラフを用いた良質な関係抽出データセットを提供し,知識拡張関係抽出手法の性能評価を行っていると考えている。データセットは以下の通りである。 https://figshare.com/projects/KERED/134459 Recently, knowledge-enhanced methods leveraging auxiliary knowledge graphs have emerged in relation extraction, surpassing traditional text-based approaches. However, to the best of our knowledge, there is currently no public dataset available that encompasses both evidence sentences and knowledge graphs for knowledge-enhanced relation extraction. To address this gap, we introduce the Knowledge-Enhanced Relation Extraction Dataset (KERED). KERED annotates each sentence with a relational fact, and it provides knowledge context for entities through entity linking. Using our curated dataset, we compared contemporary relation extraction methods under two prevalent task settings: sentence-level and bag-level. The experimental result shows the knowledge graphs provided by KERED can support knowledge-enhanced relation extraction methods. We believe that KERED offers high-quality relation extraction datasets with corresponding knowledge graphs for evaluating the performance of knowledge-enhanced relation extraction methods. Our dataset is available at: https://figshare.com/projects/KERED/134459	翻訳日:2023-04-27 03:46:16 公開日:2023-04-25
# 調和振動子の非ガウス状態に対するダイナミクスに基づく絡み合い証人 Dynamics-based entanglement witnesses for non-Gaussian states of harmonic oscillators ( http://arxiv.org/abs/2210.10357v3 ) ライセンス: Link先を確認	Pooja Jayachandran, Lin Htoo Zaw, Valerio Scarani	(参考訳) 連続変数系の絡み合い証人の族について紹介する。これはテスト時の結合調和発振器の動力学が結合調和振動子であるという唯一の仮定に依存する。絡み合いは、通常のモードの1つにおけるtsirelson nonclassicality testから推測され、他のモードの状態について何も知らない。各ラウンドにおいて、プロトコルは1つの座標(例えば位置)の符号のみを数回にわたって測定する必要がある。この動的ベースの絡み合いの証人は、不確実性の関係よりもベルの不等式に似ている:特に古典理論の偽陽性は認めない。我々の基準は非ガウス状態を検出するが、それらのいくつかは他の基準では見落としている。 We introduce a family of entanglement witnesses for continuous variable systems, which rely on the sole assumption that their dynamics is that of coupled harmonic oscillators at the time of the test. Entanglement is inferred from the Tsirelson nonclassicality test on one of the normal modes, without any knowledge about the state of the other mode. In each round, the protocol requires measuring only the sign of one coordinate (e.g. position) at one among several times. This dynamic-based entanglement witness is more akin to a Bell inequality than to an uncertainty relation: in particular, it does not admit false positives from classical theory. Our criterion detects non-Gaussian states, some of which are missed by other criteria.	翻訳日:2023-04-27 03:46:01 公開日:2023-04-25
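The abstract above says the protocol measures only the sign of one coordinate at one among several times, and that classical theory gives no false positives. A minimal sketch of the classical side follows, assuming the three-time version of the Tsirelson precession test (probing at 0, T/3 and 2T/3 of the period T); the number of probing times, the 2/3 bound and all function names are assumptions of this sketch and do not appear in the abstract. For a classical oscillator the three positions sum to zero, so at most two of them can be positive and the score cannot exceed 2/3.

```python
import math
import random


def positive_sign_count(amplitude, phase, period=1.0):
    """Count at how many of the three probing times x(t) > 0.

    Classical trajectory: x(t) = amplitude * cos(omega * t + phase).
    The probing times 0, T/3, 2T/3 are an assumption of this sketch.
    """
    omega = 2.0 * math.pi / period
    times = [0.0, period / 3.0, 2.0 * period / 3.0]
    return sum(1 for t in times if amplitude * math.cos(omega * t + phase) > 0.0)


def classical_score(samples=10000, seed=0):
    """Average probability of a positive sign over random classical states."""
    rng = random.Random(seed)
    total = 0
    for _ in range(samples):
        amplitude = rng.uniform(0.1, 5.0)
        phase = rng.uniform(0.0, 2.0 * math.pi)
        total += positive_sign_count(amplitude, phase)
    return total / (3.0 * samples)


if __name__ == "__main__":
    # Any classical ensemble stays at or below 2/3; a score above that
    # would have to come from a nonclassical state.
    print(classical_score())
```

The sketch only illustrates why a classical ensemble cannot exceed the bound; how the witness infers entanglement from a violation on one normal mode is described in the paper itself.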
# オープンソースソフトウェア開発者のためのコードレコメンデーション Code Recommendation for Open Source Software Developers ( http://arxiv.org/abs/2210.08332v3 ) ライセンス: Link先を確認	Yiqiao Jin, Yunsheng Bai, Yanqiao Zhu, Yizhou Sun, Wei Wang	(参考訳) オープンソースソフトウェア(OSS)は、技術基盤の根幹を形成し、数百万人の人材を惹きつけている。特に、OSS開発者に適切な開発タスクを推奨するために、開発者の関心事とプロジェクトコードのセマンティックな特徴の両方を考慮するのは困難で重要なことです。本稿では,開発者のインタラクション履歴,ソースコードの意味的特徴,プロジェクトの階層的ファイル構造を考慮に入れて,今後の貢献行動を予測することを目的とした,新しいコード推薦問題を提案する。システム内の複数のパーティ間の複雑な相互作用を考慮し,オープンソースソフトウェア開発者のための新しいグラフベースのコードレコメンデーションフレームワークであるCODERを提案する。コーダーは、異種グラフを介して、ミクロなユーザ・コード間インタラクションとマクロなユーザ・プロジェクト間インタラクションを共同でモデル化し、さらに、プロジェクト階層を反映したファイル構造グラフの集約を通じて、2つのレベルの情報を橋渡しする。さらに,信頼性の高いベンチマークの欠如により,将来研究を促進するために3つの大規模データセットを構築した。大規模実験の結果,CODERフレームワークはプロジェクト内,クロスプロジェクト,コールドスタートレコメンデーションなど,様々な実験条件下で優れた性能を発揮することがわかった。この作業が受け入れられ次第、データ検索のためのすべてのデータセット、コード、ユーティリティをリリースします。 Open Source Software (OSS) is forming the spines of technology infrastructures, attracting millions of talents to contribute. Notably, it is challenging and critical to consider both the developers' interests and the semantic features of the project code to recommend appropriate development tasks to OSS developers. In this paper, we formulate the novel problem of code recommendation, whose purpose is to predict the future contribution behaviors of developers given their interaction history, the semantic features of source code, and the hierarchical file structures of projects. Considering the complex interactions among multiple parties within the system, we propose CODER, a novel graph-based code recommendation framework for open source software developers. CODER jointly models microscopic user-code interactions and macroscopic user-project interactions via a heterogeneous graph and further bridges the two levels of information through aggregation on file-structure graphs that reflect the project hierarchy. Moreover, due to the lack of reliable benchmarks, we construct three large-scale datasets to facilitate future research in this direction. 
Extensive experiments show that our CODER framework achieves superior performance under various experimental settings, including intra-project, cross-project, and cold-start recommendation. We will release all the datasets, code, and utilities for data retrieval upon the acceptance of this work.	翻訳日:2023-04-27 03:45:49 公開日:2023-04-25
# 適応バイアス量子近似最適化アルゴリズムによるSAT問題の解法 Solution of SAT Problems with the Adaptive-Bias Quantum Approximate Optimization Algorithm ( http://arxiv.org/abs/2210.02822v3 ) ライセンス: Link先を確認	Yunlong Yu, Chenfeng Cao, Xiang-Bin Wang, Nic Shannon, and Robert Joynt	(参考訳) 量子近似最適化アルゴリズム(QAOA)は、短期量子デバイスにおける古典的な組合せ最適化問題を解くための有望な方法である。 QAOA を 3-SAT および Max-3-SAT 問題に使用する場合、量子コストは、節密度が変化するにつれて、それぞれ容易にハードなパターンまたは簡単なハードなパターンを示す。ハードリージョン問題で必要とされる量子リソースは、現在のNISQデバイスには及ばない。本稿では,最大14変数の数値シミュレーションにより,適応バイアスQAOA(ab-QAOA)が3SAT問題のハード領域とMax-3-SAT問題のハード領域の性能を大幅に向上させることを示す。同様の精度では、ab-QAOAは平均で10変数の3SAT問題に対して3レベルを必要とする。 10変数のMax-3-SAT問題では、数値は7レベルと62レベルである。この改良は、進化の過程でより標的にされ、より限定された絡み合いから生じる。本稿では,ab-QAOAでは局所場を用いて進化を導くため,古典最適化は必須ではないことを示す。これにより,Ab-QAOAに比べて量子ゲートが著しく少ないハードリージョン3SATとMax-3-SATの問題を効果的に解くことができる最適化フリーなAb-QAOAを提案する。我々の研究は、NISQデバイスにおける最適化問題に対する量子アドバンテージを実現するための道を開いた。 The quantum approximate optimization algorithm (QAOA) is a promising method for solving certain classical combinatorial optimization problems on near-term quantum devices. When employing the QAOA to 3-SAT and Max-3-SAT problems, the quantum cost exhibits an easy-hard-easy or easy-hard pattern respectively as the clause density is changed. The quantum resources needed in the hard-region problems are out of reach for current NISQ devices. We show by numerical simulations with up to 14 variables and analytical arguments that the adaptive-bias QAOA (ab-QAOA) greatly improves performance in the hard region of the 3-SAT problems and hard region of the Max-3-SAT problems. For similar accuracy, on average, ab-QAOA needs 3 levels for 10-variable 3-SAT problems as compared to 22 for QAOA. For 10-variable Max-3-SAT problems, the numbers are 7 levels and 62 levels. The improvement comes from a more targeted and more limited generation of entanglement during the evolution. We demonstrate that classical optimization is not strictly necessary in the ab-QAOA since local fields are used to guide the evolution. 
This leads us to propose an optimization-free ab-QAOA that can solve the hard-region 3-SAT and Max-3-SAT problems effectively with significantly fewer quantum gates as compared to the original ab-QAOA. Our work paves the way for realizing quantum advantages for optimization problems on NISQ devices.	翻訳日:2023-04-27 03:45:27 公開日:2023-04-25
# 材料工学における人工知能: 材料工学におけるAIの応用に関するレビュー Artificial Intelligence in Material Engineering: A review on applications of AI in Material Engineering ( http://arxiv.org/abs/2209.11234v2 ) ライセンス: Link先を確認	Lipichanda Goswami, Manoj Deka and Mohendra Roy	(参考訳) 物質科学と工学(MSE)における人工知能(AI)の役割は、AI技術の進歩とともにますます重要になりつつある。高性能コンピューティングの開発により、大きなパラメータを持つディープラーニング(DL)モデルをテストすることが可能となり、特性予測において密度汎関数理論(DFT)のような従来の計算手法の限界を克服する機会となった。機械学習(ML)ベースの手法は、DFTベースの手法よりも高速で正確である。さらに, 生成逆数ネットワーク(GAN)は, 結晶構造情報を使わずに無機材料の化学組成の生成を促進する。これらの開発は材料工学(ME)と研究に大きな影響を与えた。ここでは、MEにおけるAIの最新開発についてレビューする。まず, 材料加工, 構造と材料特性の研究, 各種面における材料性能の測定など, MEの重要領域におけるAIの開発について論じる。次に、グラフニューラルネットワーク、生成モデル、学習の伝達など、MSEにおけるAIの重要な方法とその利用について論じる。既存の分析機器からの結果を分析するためのAIの利用についても論じる。最後に、MEにおけるAIのアドバンテージ、デメリット、将来について論じる。 The role of artificial intelligence (AI) in material science and engineering (MSE) is becoming increasingly important as AI technology advances. The development of high-performance computing has made it possible to test deep learning (DL) models with significant parameters, providing an opportunity to overcome the limitation of traditional computational methods, such as density functional theory (DFT), in property prediction. Machine learning (ML)-based methods are faster and more accurate than DFT-based methods. Furthermore, the generative adversarial networks (GANs) have facilitated the generation of chemical compositions of inorganic materials without using crystal structure information. These developments have significantly impacted material engineering (ME) and research. Some of the latest developments in AI in ME herein are reviewed. First, the development of AI in the critical areas of ME, such as in material processing, the study of structure and material property, and measuring the performance of materials in various aspects, is discussed. Then, the significant methods of AI and their uses in MSE, such as graph neural network, generative models, transfer of learning, etc. are discussed. 
The use of AI to analyze the results from existing analytical instruments is also discussed. Finally, AI's advantages, disadvantages, and future in ME are discussed.	翻訳日:2023-04-27 03:44:46 公開日:2023-04-25
# KGML-xDTD: 薬物治療予測とメカニズム記述のための知識グラフベースの機械学習フレームワーク KGML-xDTD: A Knowledge Graph-based Machine Learning Framework for Drug Treatment Prediction and Mechanism Description ( http://arxiv.org/abs/2212.01384v2 ) ライセンス: Link先を確認	Chunyu Ma, Zhihan Zhou, Han Liu, David Koslicki	(参考訳) 背景: 計算薬の再利用は、既存の薬物や化合物の新しい治療目標や疾患(指標)を特定することを目的とした、コストと時間効率のよいアプローチである。従来の湿式薬物発見法と比較して、投資が安く、研究サイクルが短いため、特に発病や孤児病にとって重要である。しかし、薬物と標的疾患との間の行動のメカニズム(moas)はほとんど不明であり、このことは依然として、臨床現場で広く採用される薬物再導入法の主要な障害となっている。結果: 本研究では, 薬物処理疾患の予測を行う知識グラフベースの機械学習フレームワークであるKGML-xDTDを提案する。薬物/化合物と疾患の間の治療の確率を予測するだけでなく、知識グラフ(KG)経路に基づくテスト可能な行動機構(MOAs)を介して生物学的にそれらを説明する2モジュールフレームワークである。グラフベース強化学習(GRL)パスの中間指導として,知識と公開に基づく情報を活用し,生物学的に意味のある「実証経路」を抽出する。包括的実験とケーススタディ分析により, 提案手法は, ヒトのmoa経路の薬物再導入と再認識の予測の両方において, 最先端のパフォーマンスを達成できることが示された。結論: KGML-xDTDは、予測結果と既存の生物学的知識と出版物の組み合わせを活用して、KGパスによる薬物再投薬予測を説明できる最初のモデルフレームワークである。我々は,「ブラックボックス」の懸念を効果的に軽減し,予測された経路に基づく説明に基づく薬物再資源化の予測信頼度を高め,新興疾患に対する薬物発見のプロセスをさらに促進できると考えている。 Background: Computational drug repurposing is a cost- and time-efficient approach that aims to identify new therapeutic targets or diseases (indications) of existing drugs/compounds. It is especially critical for emerging and/or orphan diseases due to its cheaper investment and shorter research cycle compared with traditional wet-lab drug discovery approaches. However, the underlying mechanisms of action (MOAs) between repurposed drugs and their target diseases remain largely unknown, which is still a main obstacle for computational drug repurposing methods to be widely adopted in clinical settings. Results: In this work, we propose KGML-xDTD: a Knowledge Graph-based Machine Learning framework for explainably predicting Drugs Treating Diseases. It is a two-module framework that not only predicts the treatment probabilities between drugs/compounds and diseases but also biologically explains them via knowledge graph (KG) path-based, testable mechanisms of action (MOAs). 
We leverage knowledge-and-publication based information to extract biologically meaningful "demonstration paths" as the intermediate guidance in the Graph-based Reinforcement Learning (GRL) path-finding process. Comprehensive experiments and case study analyses show that the proposed framework can achieve state-of-the-art performance in both predictions of drug repurposing and recapitulation of human-curated drug MOA paths. Conclusions: KGML-xDTD is the first model framework that can offer KG-path explanations for drug repurposing predictions by leveraging the combination of prediction outcomes and existing biological knowledge and publications. We believe it can effectively reduce "black-box" concerns and increase prediction confidence for drug repurposing based on predicted path-based explanations, and further accelerate the process of drug discovery for emerging diseases.	翻訳日:2023-04-27 03:38:25 公開日:2023-04-25
# GREAD: グラフニューラル反応拡散ネットワーク GREAD: Graph Neural Reaction-Diffusion Networks ( http://arxiv.org/abs/2211.14208v2 ) ライセンス: Link先を確認	Jeongwhan Choi, Seoyoung Hong, Noseong Park, Sung-Bae Cho	(参考訳) グラフニューラルネットワーク(GNN)は、ディープラーニングに関する最も人気のある研究トピックの1つである。 GNN法は通常、グラフ信号処理理論に基づいて設計されている。特に、拡散方程式はGNNのコア処理層の設計に広く用いられており、悪名高い過密問題に対して必然的に脆弱である。最近、いくつかの論文が拡散方程式とともに反応方程式に注意を払っている。しかし、それらはすべて限定的な反応方程式である。そこで本研究では,我々が設計した1つの特殊反応方程式に加えて,一般的な反応方程式をすべて考慮した反応拡散式に基づくgnn法を提案する。本論文は,反応拡散式に基づくgnnに関する最も包括的な研究の1つである。 9つのデータセットと28のベースラインを用いた実験では、GREADと呼ばれる手法がほとんどのケースで優れています。さらなる合成データ実験により、オーバースムーシング問題を緩和し、様々なホモフィリー率でうまく機能することが示された。 Graph neural networks (GNNs) are one of the most popular research topics for deep learning. GNN methods typically have been designed on top of the graph signal processing theory. In particular, diffusion equations have been widely used for designing the core processing layer of GNNs, and therefore they are inevitably vulnerable to the notorious oversmoothing problem. Recently, a couple of papers paid attention to reaction equations in conjunctions with diffusion equations. However, they all consider limited forms of reaction equations. To this end, we present a reaction-diffusion equation-based GNN method that considers all popular types of reaction equations in addition to one special reaction equation designed by us. To our knowledge, our paper is one of the most comprehensive studies on reaction-diffusion equation-based GNNs. In our experiments with 9 datasets and 28 baselines, our method, called GREAD, outperforms them in a majority of cases. Further synthetic data experiments show that it mitigates the oversmoothing problem and works well for various homophily rates.	翻訳日:2023-04-27 03:37:58 公開日:2023-04-25
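The oversmoothing claim in the abstract above can be illustrated with a toy example. The sketch below is a minimal illustration under stated assumptions, not GREAD's layer: it runs explicit Euler steps of dx/dt = -Lx + beta (x0 - x) on a 4-node path graph, where the source-style reaction term beta (x0 - x), the step size and all names are hypothetical choices made for this sketch. With beta = 0 (pure diffusion) the node features collapse to a common value, while a nonzero reaction term keeps them apart.

```python
def laplacian_apply(adjacency, x):
    """Return L x for the combinatorial graph Laplacian L = D - A."""
    n = len(x)
    return [sum(adjacency[i][j] * (x[i] - x[j]) for j in range(n)) for i in range(n)]


def evolve(adjacency, x0, steps, tau=0.1, reaction_strength=0.0):
    """Explicit Euler integration of dx/dt = -L x + beta * (x0 - x).

    reaction_strength (beta) = 0 gives pure diffusion; the reaction term used
    here is a hypothetical example, not one of the paper's reaction equations.
    """
    x = list(x0)
    for _ in range(steps):
        lx = laplacian_apply(adjacency, x)
        x = [
            xi + tau * (-lxi + reaction_strength * (x0i - xi))
            for xi, lxi, x0i in zip(x, lx, x0)
        ]
    return x


def spread(x):
    """Largest difference between node features (0 means fully smoothed)."""
    return max(x) - min(x)


# 4-node path graph 0 - 1 - 2 - 3 with scalar node features.
PATH = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
FEATURES = [1.0, 0.0, 0.0, -1.0]

if __name__ == "__main__":
    print(spread(evolve(PATH, FEATURES, steps=500)))
    print(spread(evolve(PATH, FEATURES, steps=500, reaction_strength=1.0)))
```

The step size 0.1 is small enough for the explicit scheme to be stable on this graph; larger graphs or step sizes would need a different integrator or a smaller step.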
# エゴセントリック行動予測のためのインタラクションビジュアルトランスフォーマ Interaction Visual Transformer for Egocentric Action Anticipation ( http://arxiv.org/abs/2211.14154v3 ) ライセンス: Link先を確認	Debaditya Roy, Ramanathan Rajendiran and Basura Fernando	(参考訳) ヒトと物体の相互作用は最も重要な視覚的手がかりの1つであり、人間と物体の相互作用をエゴセントリックな行動予測のために表現する方法を提案する。本稿では,アクションの実行による物体と人間の手の外観の変化を計算し,その変化を利用して映像表現を洗練することにより,インタラクションをモデル化するトランスフォーマーを提案する。具体的には,空間クロスアテンション(sca)を用いて手と物体の相互作用をモデル化し,さらに軌道クロスアテンションを用いた文脈情報から環境改良されたインタラクショントークンを得る。これらのトークンを用いて,行動予測のためのインタラクション中心のビデオ表現を構築する。本稿では,EPICKTICHENS100(EK100)とEGTEA Gaze+を用いて,最先端のアクション予測性能を実現するモデルInAViTを述べる。 InAViTは、オブジェクト中心のビデオ表現を含む他のビジュアルトランスフォーマーベースの手法より優れている。 EK100評価サーバでは、InAViTは公開リーダーボード上で(提出時点で)最高パフォーマンスの手法であり、平均5回のリコールで2番目に良いモデルよりも3.3%上回っている。 Human-object interaction is one of the most important visual cues and we propose a novel way to represent human-object interactions for egocentric action anticipation. We propose a novel transformer variant to model interactions by computing the change in the appearance of objects and human hands due to the execution of the actions and use those changes to refine the video representation. Specifically, we model interactions between hands and objects using Spatial Cross-Attention (SCA) and further infuse contextual information using Trajectory Cross-Attention to obtain environment-refined interaction tokens. Using these tokens, we construct an interaction-centric video representation for action anticipation. We term our model InAViT which achieves state-of-the-art action anticipation performance on large-scale egocentric datasets EPICKTICHENS100 (EK100) and EGTEA Gaze+. InAViT outperforms other visual transformer-based methods including object-centric video representation. On the EK100 evaluation server, InAViT is the top-performing method on the public leaderboard (at the time of submission) where it outperforms the second-best model by 3.3% on mean-top5 recall.	翻訳日:2023-04-27 03:37:45 公開日:2023-04-25
# イベントカメラのためのデータ駆動型特徴追跡 Data-driven Feature Tracking for Event Cameras ( http://arxiv.org/abs/2211.12826v3 ) ライセンス: Link先を確認	Nico Messikommer, Carter Fang, Mathias Gehrig, Davide Scaramuzza	(参考訳) 高時間分解能、動きのぼかしに対するレジリエンスの増大、そして非常に少ない出力のため、イベントカメラは挑戦的なシナリオであっても低レイテンシで低帯域幅の特徴追跡に最適であることが示されている。既存のイベントカメラの特徴追跡手法は手作りか第一原理から派生しているが、広範なパラメータチューニングが必要であり、ノイズに敏感であり、非モデル化効果のために異なるシナリオに一般化しない。これらの欠陥に対処するために、グレースケールフレームで検出された特徴を追跡するために、低レイテンシイベントを活用するイベントカメラ用の最初のデータ駆動機能トラッカーを導入する。特徴トラック間で情報を共有する新しいフレームアテンションモジュールにより,ロバストな性能を実現する。合成データから実データへのゼロショットを直接転送することで、データ駆動トラッカーは、相対的特徴年齢における既存のアプローチを最大120%上回り、低レイテンシを実現する。この性能ギャップはさらに130%増加し、トラッカーを新たな自己超越戦略で実データに適用する。 Because of their high temporal resolution, increased resilience to motion blur, and very sparse output, event cameras have been shown to be ideal for low-latency and low-bandwidth feature tracking, even in challenging scenarios. Existing feature tracking methods for event cameras are either handcrafted or derived from first principles but require extensive parameter tuning, are sensitive to noise, and do not generalize to different scenarios due to unmodeled effects. To tackle these deficiencies, we introduce the first data-driven feature tracker for event cameras, which leverages low-latency events to track features detected in a grayscale frame. We achieve robust performance via a novel frame attention module, which shares information across feature tracks. By directly transferring zero-shot from synthetic to real data, our data-driven tracker outperforms existing approaches in relative feature age by up to 120% while also achieving the lowest latency. This performance gap is further increased to 130% by adapting our tracker to real data with a novel self-supervision strategy.	翻訳日:2023-04-27 03:37:01 公開日:2023-04-25
# 雑音極大絡み状態を持つ完全量子非局所ゲームの決定可能性 Decidability of fully quantum nonlocal games with noisy maximally entangled states ( http://arxiv.org/abs/2211.10613v4 ) ライセンス: Link先を確認	Minglong Qin, Penghui Yao	(参考訳) 本稿では、雑音の多い最大絡み合った状態を持つ完全量子非局所ゲームの決定可能性について考察する。完全量子非ローカルゲームは非ローカルゲームの一般化であり、質問と回答の両方が量子的であり、審判はプレイヤーから量子的回答を受けた後にゲームに勝つかどうかを決定するためにバイナリ POVM 測定を行う。完全量子非局所ゲームの量子値 (quantum value) は、プレイヤーがゲームに勝つ確率の上限(supremum)であり、その上限はプレイヤー間で共有されうるすべての絡み合った状態と、プレイヤーが行うすべての有効な量子演算にわたってとられる。先駆的な研究 $\mathrm{MIP}^*=\mathrm{RE}$ は、完全非局所ゲームの量子値を近似することは決定不可能であることを意味する。これは、プレイヤーが最大に絡み合った状態を共有することしか許されていない場合でも継続される。本稿では,共有最大絡み合った状態がノイズである場合について検討する。我々は、プレイヤーが量子値に任意に近い確率で完全量子非局所ゲームに勝つために、ノイズの多い最大絡み合い状態のコピーに計算可能な上限が存在することを証明する。これは、これらのゲームの量子値の近似が決定可能であることを意味する。したがって、完全量子非局所ゲームにおける量子値の近似の難しさは共有状態のノイズに対して強固ではない。本稿では,協調分布の非対話的シミュレーションを決定可能とする枠組みを構築し,非局所ゲームに対する類似結果を一般化する。フーリエ解析の理論を超作用素の空間に拡張し、不変原理や超作用素の次元還元を含むいくつかの重要な結果を証明する。これらの結果は、それ自体が興味深いものであり、さらなる応用があると考えられている。 This paper considers the decidability of fully quantum nonlocal games with noisy maximally entangled states. Fully quantum nonlocal games are a generalization of nonlocal games, where both questions and answers are quantum and the referee performs a binary POVM measurement to decide whether they win the game after receiving the quantum answers from the players. The quantum value of a fully quantum nonlocal game is the supremum of the probability that they win the game, where the supremum is taken over all the possible entangled states shared between the players and all the valid quantum operations performed by the players. The seminal work $\mathrm{MIP}^*=\mathrm{RE}$ implies that it is undecidable to approximate the quantum value of a fully nonlocal game. This still holds even if the players are only allowed to share (arbitrarily many copies of) maximally entangled states. This paper investigates the case that the shared maximally entangled states are noisy. 
We prove that there is a computable upper bound on the copies of noisy maximally entangled states for the players to win a fully quantum nonlocal game with a probability arbitrarily close to the quantum value. This implies that it is decidable to approximate the quantum values of these games. Hence, the hardness of approximating the quantum value of a fully quantum nonlocal game is not robust against the noise in the shared states. This paper is built on the framework for the decidability of non-interactive simulations of joint distributions and generalizes the analogous result for nonlocal games. We extend the theory of Fourier analysis to the space of super-operators and prove several key results including an invariance principle and a dimension reduction for super-operators. These results are interesting in their own right and are believed to have further applications.	翻訳日:2023-04-27 03:36:42 公開日:2023-04-25
# 量子化学習のための部分スクラッチオフロッキーチケットの爆発 Exploiting the Partly Scratch-off Lottery Ticket for Quantization-Aware Training ( http://arxiv.org/abs/2211.08544v3 ) ライセンス: Link先を確認	Yunshan Zhong, Mingbao Lin, Yuxin Zhang, Gongrui Nan, Fei Chao, Rongrong Ji	(参考訳) 量子化アウェアトレーニング(qat)は、量子化ネットワークのパフォーマンスを保ちながら広く普及している。現代のQATでは、全ての量子化重量がトレーニングプロセス全体に対して更新される。本稿では,我々が観察した興味深い現象をもとに,この経験に挑戦する。具体的には、量子化された重みの大部分が、いくつかのトレーニング期間を経て最適な量子化レベルに達します。この単純で価値の高い観測は、無意味な更新を避けるために、残りのトレーニング期間でこれらの重みの勾配計算をゼロにするきっかけとなりました。このチケットを効果的に見つけるために、フル精度のチケットと量子化レベルの距離が制御可能な閾値よりも小さい場合、重量を凍結する「抽選チケットスクラッカー」(LTS)と呼ばれるヒューリスティック手法を開発した。驚いたことに、提案されたLTSは一般的に、50%-70%の重量更新と25%-35%のFLOPを後方パスから排除するが、それでも比較したベースラインと同等またはそれ以上のパフォーマンスを達成している。例えば、LTSはベースラインと比較して2ビットのMobileNetV2を5.05%改善し、重量更新の46%と後方パスの23%のFLOPを排除した。コードは url{https://github.com/zysxmu/LTS} にある。 Quantization-aware training (QAT) receives extensive popularity as it well retains the performance of quantized networks. In QAT, the contemporary experience is that all quantized weights are updated for an entire training process. In this paper, this experience is challenged based on an interesting phenomenon we observed. Specifically, a large portion of quantized weights reaches the optimal quantization level after a few training epochs, which we refer to as the partly scratch-off lottery ticket. This straightforward-yet-valuable observation naturally inspires us to zero out gradient calculations of these weights in the remaining training period to avoid meaningless updating. To effectively find the ticket, we develop a heuristic method, dubbed lottery ticket scratcher (LTS), which freezes a weight once the distance between the full-precision one and its quantization level is smaller than a controllable threshold. Surprisingly, the proposed LTS typically eliminates 50%-70% weight updating and 25%-35% FLOPs of the backward pass, while still resulting on par with or even better performance than the compared baseline. 
For example, compared with the baseline, LTS improves 2-bit MobileNetV2 by 5.05%, eliminating 46% of weight updates and 23% of the FLOPs of the backward pass. Code is at https://github.com/zysxmu/LTS.	翻訳日:2023-04-27 03:36:13 公開日:2023-04-25
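The freezing rule described in the abstract above can be sketched in a few lines. This is a minimal sketch, assuming a symmetric uniform quantizer and plain SGD; the quantizer, the choice to keep a weight frozen permanently once it has been frozen, and all function names are assumptions of this sketch and are not taken from the authors' code.

```python
def nearest_level(weight, step, num_bits):
    """Nearest level of a symmetric uniform quantizer (assumed for this sketch)."""
    qmax = 2 ** (num_bits - 1) - 1
    qmin = -(2 ** (num_bits - 1))
    index = max(qmin, min(qmax, round(weight / step)))
    return index * step


def update_frozen(weights, frozen, step, num_bits, threshold):
    """Freeze a weight once it lies within `threshold` of its quantization level.

    Already-frozen weights stay frozen (an assumption of this sketch).
    """
    return [
        was_frozen or abs(w - nearest_level(w, step, num_bits)) < threshold
        for w, was_frozen in zip(weights, frozen)
    ]


def sgd_step(weights, grads, frozen, lr):
    """Apply an SGD update only to weights that are not frozen."""
    return [w if f else w - lr * g for w, g, f in zip(weights, grads, frozen)]


if __name__ == "__main__":
    weights = [0.49, 0.26, -0.98]
    frozen = update_frozen(weights, [False] * 3, step=0.5, num_bits=2, threshold=0.05)
    print(frozen)
    print(sgd_step(weights, [1.0, 1.0, 1.0], frozen, lr=0.1))
```

In a real training loop the saving would come from skipping the gradient computation for frozen weights altogether, rather than computing the gradients and discarding them as this toy update does.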
# 量子スピン系のギャップをブートストラップする Bootstrapping the gap in quantum spin systems ( http://arxiv.org/abs/2211.03819v2 ) ライセンス: Link先を確認	Colin Oscar Nancarrow, Yuan Xin	(参考訳) 本研究では,共形場理論(CFT)のセットアップを密接に反映した量子力学問題に対する新しいブートストラップ法について報告する。運動方程式を用いて、行列要素の共形ブロック展開のアナログを開発し、それらの値に境界を置くために交叉対称性を課す。本手法は,局所ハミルトニアンを持つ任意の量子力学系に適用可能であり,非調和振動子モデルと (1+1)-次元横場イジングモデル(TFIM)を用いて実験を行う。非調和振動子モデルについて、少数の交叉方程式がスペクトルと行列要素の正確な解を与えることを示した。 TFIM に対して、ハミルトン方程式、翻訳不変性、大域対称性選択規則は熱力学極限における TFIM のギャップと行列要素に厳密な境界を課すことを示す。境界は、交差方程式のより大きな系を考えると改善され、より有限体積の解を除外する。本手法は、ハミルトニアンから無限格子の低エネルギースペクトルを厳密かつ近似なしで探究する方法を提供する。 In this work we report on a new bootstrap method for quantum mechanical problems that closely mirrors the setup from conformal field theory (CFT). We use the equations of motion to develop an analogue of the conformal block expansion for matrix elements and impose crossing symmetry in order to place bounds on their values. The method can be applied to any quantum mechanical system with a local Hamiltonian, and we test it on an anharmonic oscillator model as well as the (1+1)-dimensional transverse field Ising model (TFIM). For the anharmonic oscillator model we show that a small number of crossing equations provides an accurate solution to the spectrum and matrix elements. For the TFIM we show that the Hamiltonian equations of motion, translational invariance and global symmetry selection rules imposes a rigorous bound on the gap and the matrix elements of TFIM in the thermodynamic limit. The bound improves as we consider larger systems of crossing equations, ruling out more finite-volume solutions. Our method provides a way to probe the low energy spectrum of an infinite lattice from the Hamiltonian rigorously and without approximation.	翻訳日:2023-04-27 03:35:33 公開日:2023-04-25
# 点雲の所望距離関係へのユークリッド空間の計量化 Metricizing the Euclidean Space towards Desired Distance Relations in Point Clouds ( http://arxiv.org/abs/2211.03674v2 ) ライセンス: Link先を確認	Stefan Rass, Sandra König, Shahzad Ahmad, Maksim Goman	(参考訳) ユークリッド空間 $\mathbb{R}^\ell$ ($\ell>1$) の点の集合が与えられると、それらの点の間の対距離は、その空間的位置と、$\mathbb{R}^\ell$ に与える計量 $d$ によって決定される。したがって、2つの点の間の距離 $d(\mathbf x,\mathbf y)=\delta$ は、$\mathbf x$ と $\mathbf y$ と $d$ の選択によって固定される。我々は、値 $\delta$ と点 $\mathbf x,\mathbf y$ を固定する関連する問題を研究し、所望距離 $\delta$ を計算する位相計量 $d$ が存在するかどうかを問う。この問題は、$\mathbb{R}^\ell$ 内の最大 $O(\sqrt\ell)$ 個の点間の所望の対距離を同時に与える計量を構築することで解けることを示す。次に、$\varepsilon$-半計量 $\tilde{d}$ の概念を導入して主結果を定式化する: 任意の $\varepsilon>0$、任意の $m\geq 1$、任意の $m$ 個の点 $\mathbf y_1,\ldots,\mathbf y_m\in\mathbb{R}^\ell$、および任意に選んだ値の集合 $\{\delta_{ij}\geq 0: 1\leq i<j\leq m\}$ に対して、$\tilde{d}(\mathbf y_i,\mathbf y_j)=\delta_{ij}$ を満たす $\varepsilon$-半計量 $\tilde{d}:\mathbb{R}^\ell\times \mathbb{R}^\ell\to\mathbb{R}$ が存在する。すなわち、ユークリッドノルムや他のノルムが誘導する位相にかかわらず、所望の距離が実現される。本稿では,この結果を教師なし学習アルゴリズム,具体的には $k$-Means と密度ベース (DBSCAN) のクラスタリングアルゴリズムへの攻撃に用いて示す。これらには人工知能における多様な応用があり、以下に示すように、外部から提供される距離測度で実行させることで、クラスタアルゴリズムが事前に決定され、従って操作可能な結果を生成することができる。このことはクラスタリングアルゴリズムの結果が、特定の距離関数を使用するための標準化された固定された規定がない限り、一般的には信頼できないことを示している。 Given a set of points in the Euclidean space $\mathbb{R}^\ell$ with $\ell>1$, the pairwise distances between the points are determined by their spatial location and the metric $d$ that we endow $\mathbb{R}^\ell$ with. Hence, the distance $d(\mathbf x,\mathbf y)=\delta$ between two points is fixed by the choice of $\mathbf x$ and $\mathbf y$ and $d$. 
We study the related problem of fixing the value $\delta$, and the points $\mathbf x,\mathbf y$, and ask if there is a topological metric $d$ that computes the desired distance $\delta$. We demonstrate this problem to be solvable by constructing a metric to simultaneously give desired pairwise distances between up to $O(\sqrt\ell)$ many points in $\mathbb{R}^\ell$. We then introduce the notion of an $\varepsilon$-semimetric $\tilde{d}$ to formulate our main result: for all $\varepsilon>0$, for all $m\geq 1$, for any choice of $m$ points $\mathbf y_1,\ldots,\mathbf y_m\in\mathbb{R}^\ell$, and all chosen sets of values $\{\delta_{ij}\geq 0: 1\leq i<j\leq m\}$, there exists an $\varepsilon$-semimetric $\tilde{d}:\mathbb{R}^\ell\times \mathbb{R}^\ell\to\mathbb{R}$ such that $\tilde{d}(\mathbf y_i,\mathbf y_j)=\delta_{ij}$, i.e., the desired distances are accomplished, irrespective of the topology that the Euclidean or other norms would induce. We showcase our results by using them to attack unsupervised learning algorithms, specifically $k$-Means and density-based (DBSCAN) clustering algorithms. These have manifold applications in artificial intelligence, and letting them run with externally provided distance measures constructed in the way shown here can make clustering algorithms produce results that are pre-determined and hence malleable. This demonstrates that the results of clustering algorithms may not generally be trustworthy, unless there is a standardized and fixed prescription to use a specific distance function.	翻訳日:2023-04-27 03:35:10 公開日:2023-04-25
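The abstract above argues that a clustering result depends on whoever supplies the distance function. The following is a minimal sketch of that idea only, not the paper's construction: the lookup-table override, the helper names (`make_overridden_distance`, `assign`), and the toy points are all assumptions made for illustration, and the overridden function makes no attempt to satisfy the metric or $\varepsilon$-semimetric axioms.

```python
import math


def euclidean(x, y):
    """Ordinary Euclidean distance between two points given as tuples."""
    return math.dist(x, y)


def make_overridden_distance(base, overrides):
    """Wrap `base` so that selected pairs report externally chosen values.

    Hypothetical helper: a lookup table stands in for a crafted distance.
    """
    table = {}
    for (a, b), value in overrides.items():
        table[(a, b)] = value
        table[(b, a)] = value  # keep the override symmetric

    def distance(x, y):
        return table.get((x, y), base(x, y))

    return distance


def assign(points, centers, distance):
    """Nearest-center assignment, the step k-Means repeats at every iteration."""
    return [
        min(range(len(centers)), key=lambda k: distance(p, centers[k]))
        for p in points
    ]


points = [(0.0, 0.0), (1.0, 0.0), (9.0, 0.0), (10.0, 0.0)]
centers = [(0.5, 0.0), (9.5, 0.0)]

honest = assign(points, centers, euclidean)

# Declare the second point to be "close" to the far center.
rigged_distance = make_overridden_distance(
    euclidean, {((1.0, 0.0), (9.5, 0.0)): 0.1}
)
rigged = assign(points, centers, rigged_distance)

print(honest)
print(rigged)
```

With the Euclidean distance the second point joins the first cluster; with the overridden distance it is assigned to the second one, although no coordinate changed.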
# 貧乏者の品質推定:参照のない参照ベースのmtメトリクスの予測 Poor Man's Quality Estimation: Predicting Reference-Based MT Metrics Without the Reference ( http://arxiv.org/abs/2301.09008v3 ) ライセンス: Link先を確認	Vilém Zouhar, Shehzaad Dhuliawala, Wangchunshu Zhou, Nico Daheim, Tom Kocmi, Yuchen Eleanor Jiang, Mrinmaya Sachan	(参考訳) 機械翻訳品質推定(QE)は、参照を見ることなく翻訳仮説の人間の判断を予測する。事前訓練された言語モデルに基づく最先端のQEシステムは、人間の判断と顕著な相関を達成しているが、それらは計算的に重く、作成に時間がかかる人間のアノテーションを必要とする。これらの制約に対処するために、基準を使わずに自動測定値を予測する計量推定(ME)の問題を定義する。基準にアクセスしなくても、我々のモデルは自動メトリクス(BLEUは $\rho$=60%、他のメトリクスは $\rho$=51%)を文レベルで推定できることを示す。自動メトリクスは人間の判断と相関するため、QEモデルの事前トレーニングにMEタスクを利用することができます。 QEタスクの場合、TERでの事前トレーニング ($\rho$=23%) は、スクラッチからのトレーニング ($\rho$=20%) より優れている。 Machine translation quality estimation (QE) predicts human judgements of a translation hypothesis without seeing the reference. State-of-the-art QE systems based on pretrained language models have been achieving remarkable correlations with human judgements yet they are computationally heavy and require human annotations, which are slow and expensive to create. To address these limitations, we define the problem of metric estimation (ME) where one predicts the automated metric scores also without the reference. We show that even without access to the reference, our model can estimate automated metrics ($\rho$=60% for BLEU, $\rho$=51% for other metrics) at the sentence-level. Because automated metrics correlate with human judgements, we can leverage the ME task for pre-training a QE model. For the QE task, we find that pre-training on TER is better ($\rho$=23%) than training from scratch ($\rho$=20%).	翻訳日:2023-04-27 03:28:03 公開日:2023-04-25
# 量子コンピュータへのブートストラップ埋め込み Bootstrap Embedding on a Quantum Computer ( http://arxiv.org/abs/2301.01457v2 ) ライセンス: Link先を確認	Yuan Liu, Oinam R. Meitei, Zachary E. Chin, Arkopal Dutt, Max Tao, Isaac L. Chuang, Troy Van Voorhis	(参考訳) 量子コンピュータの実装に適するように分子ブートストラップ埋め込みを拡張した。これにより、全体システムのフラグメントを管理する複合ラグランジアンに対する最適化問題として、大きな分子の電子構造問題の解法が実現され、フラグメント解は量子コンピュータの能力を利用することができる。量子SWAPテストや量子振幅増幅を含む最先端の量子サブルーチンを用いることで、古典的アルゴリズムよりも2次的なスピードアップが原理的に得られることを示す。量子計算の活用により、アルゴリズムは1-RDMに制限されることなく、わずかな追加計算コストでフラグメント境界の完全な密度行列を一致させることができる。現在の量子コンピュータは小さいが、量子ブートストラップの埋め込みは量子フラグメントマッチングを通じてそのような小さなマシンを利用するための潜在的に一般化可能な戦略を提供する。 We extend molecular bootstrap embedding to make it appropriate for implementation on a quantum computer. This enables solution of the electronic structure problem of a large molecule as an optimization problem for a composite Lagrangian governing fragments of the total system, in such a way that fragment solutions can harness the capabilities of quantum computers. By employing state-of-the-art quantum subroutines including the quantum SWAP test and quantum amplitude amplification, we show how a quadratic speedup can be obtained over the classical algorithm, in principle. Utilization of quantum computation also allows the algorithm to match -- at little additional computational cost -- full density matrices at fragment boundaries, instead of being limited to 1-RDMs. Current quantum computers are small, but quantum bootstrap embedding provides a potentially generalizable strategy for harnessing such small machines through quantum fragment matching.	翻訳日:2023-04-27 03:27:37 公開日:2023-04-25
# アンダーサンプルデータからの非視線イメージングのための曲率正規化 Curvature regularization for Non-line-of-sight Imaging from Under-sampled Data ( http://arxiv.org/abs/2301.00406v2 ) ライセンス: Link先を確認	Rui Ding, Juntian Ye, Qifeng Gao, Feihu Xu, Yuping Duan	(参考訳) 非視線画像(NLOS)は、複数の回折反射の後に光で符号化された光子時間情報を用いて、視線で測定されたデータから3次元の隠れたシーンを再構築することを目的としている。サンプリング済みの走査データは、高速な撮像を容易にすることができる。しかし, 結果として生じる復元問題は, ノイズや歪みにより劣化する可能性が高く, 深刻な逆問題となる。本稿では,曲率正規化に基づく2つの新しいnlos再構成モデル,すなわち,オブジェクト領域曲率正規化モデルと,デュアル(信号およびオブジェクト)領域曲率正規化モデルを提案する。 gpu実装によりさらに加速されるバックトラックステップ化規則(backtracking stepsize rule)を伴う乗算器の交互方向法(admm)に基づいて高速数値最適化アルゴリズムを開発した。提案したアルゴリズムは, 合成データセットと実データセットの両方で評価し, 特に圧縮センシング環境で, 最先端性能を実現する。私たちのコードとデータは、https://github.com/Duanlab123/CurvNLOSで利用可能です。 Non-line-of-sight (NLOS) imaging aims to reconstruct the three-dimensional hidden scenes from the data measured in the line-of-sight, which uses photon time-of-flight information encoded in light after multiple diffuse reflections. The under-sampled scanning data can facilitate fast imaging. However, the resulting reconstruction problem becomes a serious ill-posed inverse problem, the solution of which is of high possibility to be degraded due to noises and distortions. In this paper, we propose two novel NLOS reconstruction models based on curvature regularization, i.e., the object-domain curvature regularization model and the dual (i.e., signal and object)-domain curvature regularization model. Fast numerical optimization algorithms are developed relying on the alternating direction method of multipliers (ADMM) with the backtracking stepsize rule, which are further accelerated by GPU implementation. We evaluate the proposed algorithms on both synthetic and real datasets, which achieve state-of-the-art performance, especially in the compressed sensing setting. All our codes and data are available at https://github.com/Duanlab123/CurvNLOS.	翻訳日:2023-04-27 03:27:21 公開日:2023-04-25
# housecat6d -- 現実的なシナリオで家庭用オブジェクトを使った大規模マルチモーダルカテゴリレベル6dオブジェクトポーズデータセット HouseCat6D -- A Large-Scale Multi-Modal Category Level 6D Object Pose Dataset with Household Objects in Realistic Scenarios ( http://arxiv.org/abs/2212.10428v3 ) ライセンス: Link先を確認	HyunJun Jung, Shun-Cheng Wu, Patrick Ruhkamp, Hannah Schieber, Pengyuan Wang, Giulia Rizzoli, Hongcheng Zhao, Sven Damian Meier, Daniel Roth, Nassir Navab, Benjamin Busam	(参考訳) オブジェクトの6Dポーズを推定することは、主要な3Dコンピュータビジョン問題である。インスタンスレベルのアプローチによる有望な結果から、研究責任者はより実用的なアプリケーションシナリオのためのカテゴリレベルのポーズ推定にも取り組んでいる。しかし、よく確立されたインスタンスレベルのポーズデータセットとは異なり、利用可能なカテゴリレベルのデータセットはアノテーションの品質やポーズ量に欠ける。新しいカテゴリーレベルの6DポーズデータセットHouseCat6Dを提案する。 1)ポラリメトリックRGBと深さ(RGBD+P)の多モード性 2)2つのフォトメトリックに挑戦するカテゴリを含む10の家庭用オブジェクトカテゴリの高度に多様な194のオブジェクト。 3) エラー範囲がわずか1.35mmから1.74mmの高品質ポーズアノテーション 4)広い視点と隠蔽を有する41の大規模シーン。 5)全シーンにおけるチェッカーボードのない環境 6) 同時に高密度6Dパラレルジャウグリップを付加した。さらに,最先端カテゴリレベルのポーズ推定ネットワークのベンチマーク結果も提供する。 Estimating the 6D pose of objects is a major 3D computer vision problem. Since the promising outcomes from instance-level approaches, research heads also move towards category-level pose estimation for more practical application scenarios. However, unlike well-established instance-level pose datasets, available category-level datasets lack annotation quality and provided pose quantity. We propose the new category-level 6D pose dataset HouseCat6D featuring 1) Multi-modality of Polarimetric RGB and Depth (RGBD+P), 2) Highly diverse 194 objects of 10 household object categories including 2 photometrically challenging categories, 3) High-quality pose annotation with an error range of only 1.35 mm to 1.74 mm, 4) 41 large-scale scenes with extensive viewpoint coverage and occlusions, 5) Checkerboard-free environment throughout the entire scene, and 6) Additionally annotated dense 6D parallel-jaw grasps. Furthermore, we also provide benchmark results of state-of-the-art category-level pose estimation networks.	翻訳日:2023-04-27 03:27:01 公開日:2023-04-25
# InferEM:共感的対話生成のための話者意図の推測 InferEM: Inferring the Speaker's Intention for Empathetic Dialogue Generation ( http://arxiv.org/abs/2212.06373v6 ) ライセンス: Link先を確認	Guoqing Lv, Jiang Li, Xiaoping Wang, Zhigang Zeng	(参考訳) 共感応答生成に対する現在のアプローチは、一般的に対話履歴全体をエンコードし、出力をデコーダに入れてフレンドリーなフィードバックを生成する。これらの手法は文脈情報のモデル化に焦点をあてるが、話者の直接の意図を捉えることは無視する。我々は,対話の最後の発声が話者の意図を実証的に伝えることを主張する。そこで本研究では,共感応答生成のための新しいモデルInferEMを提案する。我々は,最後の発話を別々に符号化し,多面的注意に基づく意図融合モジュールを通して対話全体と融合し,話者の意図を捉える。さらに,先行した発話を用いて最後の発話を予測し,人間の心理をシミュレートし,対話者が事前に何を話すのかを推測する。発話予測と応答生成の最適化率のバランスをとるために,InferEMのためのマルチタスク学習戦略を設計する。実験の結果,inferemの共感性発現改善における可能性と妥当性が示された。 Current approaches to empathetic response generation typically encode the entire dialogue history directly and put the output into a decoder to generate friendly feedback. These methods focus on modelling contextual information but neglect capturing the direct intention of the speaker. We argue that the last utterance in the dialogue empirically conveys the intention of the speaker. Consequently, we propose a novel model named InferEM for empathetic response generation. We separately encode the last utterance and fuse it with the entire dialogue through the multi-head attention based intention fusion module to capture the speaker's intention. Besides, we utilize previous utterances to predict the last utterance, which simulates human's psychology to guess what the interlocutor may speak in advance. To balance the optimizing rates of the utterance prediction and response generation, a multi-task learning strategy is designed for InferEM. Experimental results demonstrate the plausibility and validity of InferEM in improving empathetic expression.	翻訳日:2023-04-27 03:26:30 公開日:2023-04-25
# HACA3:マルチサイトMR画像調和のための統一的アプローチ HACA3: A Unified Approach for Multi-site MR Image Harmonization ( http://arxiv.org/abs/2212.06065v2 ) ライセンス: Link先を確認	Lianrui Zuo, Yihao Liu, Yuan Xue, Blake E. Dewey, Samuel W. Remedios, Savannah P. Hays, Murat Bilgel, Ellen M. Mowry, Scott D. Newsome, Peter A. Calabresi, Susan M. Resnick, Jerry L. Prince, Aaron Carass	(参考訳) 標準化の欠如は磁気共鳴(MR)イメージングにおいて顕著な問題である。これはしばしば、ハードウェアと取得パラメータの違いのため、取得した画像に望ましくないコントラスト変動を引き起こす。近年,非所望のコントラスト変動を補うため,画像合成に基づくMRハーモニゼーションが提案されている。既存の方法の成功にもかかわらず、私たちは3つの大きな改善ができると主張している。第一に、既存のほとんどの手法は、同一対象のマルチコントラストMR画像が同じ解剖学を共有するという仮定に基づいて構築されている。この仮定は、異なるMRコントラストが異なる解剖学的特徴の強調に特化しているため、疑わしい。第二に、これらの方法は訓練のために固定されたMRコントラスト(例えば、T1強調画像とT2強調画像の両方)を必要とし、適用性を制限する。最後に、既存の手法は一般的にイメージングアーティファクトに敏感である。本稿では,これらの3つの問題に対処するための新しいアプローチである,注意に基づくコントラスト,解剖,アーティファクト認識(HACA3)を提案する。 HACA3は、MRコントラスト間の固有の解剖学的差異を説明する解剖学的融合モジュールを組み込んでいる。さらに、HACA3はイメージングアーティファクトにも堅牢であり、MRコントラストの任意のセットにトレーニングおよび適用することができる。 HACA3は、フィールド強度、スキャナープラットフォーム、取得プロトコルの異なる21のサイトから取得した多様なMRデータセット上で開発・評価されている。実験により、HACA3は複数の画像品質指標の下で最先端のパフォーマンスを達成することが示された。また,白質病変の分節化や縦断的体積分析を含む下流課題に対するhaca3の適用性と汎用性を示す。 The lack of standardization is a prominent issue in magnetic resonance (MR) imaging. This often causes undesired contrast variations in the acquired images due to differences in hardware and acquisition parameters. In recent years, image synthesis-based MR harmonization with disentanglement has been proposed to compensate for the undesired contrast variations. Despite the success of existing methods, we argue that three major improvements can be made. First, most existing methods are built upon the assumption that multi-contrast MR images of the same subject share the same anatomy. This assumption is questionable, since different MR contrasts are specialized to highlight different anatomical features. Second, these methods often require a fixed set of MR contrasts for training (e.g., both T1-weighted and T2-weighted images), limiting their applicability. 
Lastly, existing methods are generally sensitive to imaging artifacts. In this paper, we present Harmonization with Attention-based Contrast, Anatomy, and Artifact Awareness (HACA3), a novel approach to address these three issues. HACA3 incorporates an anatomy fusion module that accounts for the inherent anatomical differences between MR contrasts. Furthermore, HACA3 is also robust to imaging artifacts and can be trained and applied to any set of MR contrasts. HACA3 is developed and evaluated on diverse MR datasets acquired from 21 sites with varying field strengths, scanner platforms, and acquisition protocols. Experiments show that HACA3 achieves state-of-the-art performance under multiple image quality metrics. We also demonstrate the applicability and versatility of HACA3 on downstream tasks including white matter lesion segmentation and longitudinal volumetric analyses.	翻訳日:2023-04-27 03:26:14 公開日:2023-04-25
# 量子カオスと時間の矢印 Quantum chaos and the arrow of time ( http://arxiv.org/abs/2212.03914v4 ) ライセンス: Link先を確認	Nilakash Sorokhaibam	(参考訳) 私たちの周りの世界は明らかに時間の矢を持っている。古典的熱力学は、美しい統計学的解釈を持つ熱力学の第2法則の形で時間的矢印を与える。しかし、時空の矢印の量子的起源の明確な写真は今のところ不足している。ここでは、量子カオス系において時間矢印が生じることを示す。カオス的でもある孤立量子系の場合、エントロピーの変化は、系が全般的に摂動しているときに非負であることを示す。物理系は一般に高度に相互作用し、カオスシステムの良い例である。我々は,システムの摂動時のエネルギー変化を追跡することで,この結果を示す。非常に微調整された摂動を用いて、エントロピーを下げることができる。しかし、摂動を微調整するには、システムの高精度なエネルギー準位を測定する必要がある。これは古典的熱力学におけるマクスウェルのデーモン問題とそのその後の解法を想起させる。 The world around us distinctly possesses an arrow of time. Classical thermodynamics provides an arrow of time in the form of the second law of thermodynamics which also has a beautiful statistical interpretation. But a clear picture of the quantum origin of the arrow of time has been lacking so far. Here we show that an arrow of time arises in quantum chaotic systems. We show that, for an isolated quantum system which is also chaotic, the change in entropy is non-negative when the system is generically perturbed. Physical systems are, in general, highly interacting and are good examples of chaotic systems. We show our result by keeping track of the change in energy when the system is perturbed. Using an extremely fine-tuned perturbation, we can still lower the entropy. But fine-tuning the perturbation requires measurement of highly precise energy levels of the system. This is reminiscent of the Maxwell's demon problem in classical thermodynamics and its subsequent resolution.	翻訳日:2023-04-27 03:25:49 公開日:2023-04-25
# 長距離2次リンドブラディアンにおける絡み合いと局在 Entanglement and localization in long-range quadratic Lindbladians ( http://arxiv.org/abs/2303.07070v2 ) ライセンス: Link先を確認	Alejandro Cros Carrillo de Albornoz, Dominic C. Rose and Arijeet Pal	(参考訳) アンダーソン局在の存在は、無秩序系における古典波と量子波のコヒーレンスを示すものと考えられている。環境への結合が著しく抑制されるが排除されない凝縮物や低温原子系では、局在のシグネチャが観察されている。本研究では,開量子系を記述するランダム・リンドブラッド力学における局在現象を考察する。浴槽の局所的なアンサンブルに結合した非相互作用性スピンレスフェルミオンの1次元連鎖モデルを提案する。各サイトにリンクされた浴槽との相互作用を媒介するジャンプ演算子は、指数$p$のパワーローテールを有する。系の定常状態は,コヒーレントホッピングの有無で安定な$p$をチューニングすることにより,局所的絡み合い相転移が進行することを示す。開系の量子軌道における絡み合い遷移とは異なり、この遷移はリンドブレディアンの平均定常状態密度行列によって表される。局所化相の定常状態は、局所的な人口不均衡の不均一性によって特徴づけられる一方、ジャンプ演算子は影響する部位の一定の参加率を示す。我々の研究は、オープン量子システムにおける局在物理学の新たな実現を提供する。 Existence of Anderson localization is considered a manifestation of coherence of classical and quantum waves in disordered systems. Signatures of localization have been observed in condensed matter and cold atomic systems where the coupling to the environment can be significantly suppressed but not eliminated. In this work we explore the phenomena of localization in random Lindbladian dynamics describing open quantum systems. We propose a model of one-dimensional chain of non-interacting, spinless fermions coupled to a local ensemble of baths. The jump operator mediating the interaction with the bath linked to each site has a power-law tail with an exponent $p$. We show that the steady state of the system undergoes a localization entanglement phase transition by tuning $p$ which remains stable in the presence of coherent hopping. Unlike the entanglement transition in the quantum trajectories of open systems, this transition is exhibited by the averaged steady state density matrix of the Lindbladian. The steady state in the localized phase is characterised by a heterogeneity in local population imbalance, while the jump operators exhibit a constant participation ratio of the sites they affect. Our work provides a novel realisation of localization physics in open quantum systems.	翻訳日:2023-04-27 03:18:51 公開日:2023-04-25
# システム環境量子モデルの完全ダイナミクスに対する複素離散化近似 Complex Discretization approximation for the full dynamics of system-environment quantum models ( http://arxiv.org/abs/2303.06584v3 ) ライセンス: Link先を確認	H. T. Cui, Y. A. Yan, M. Qin, and X. X. Yi	(参考訳) 連続体における環境のオープンダイナミクスをシミュレートするために用いられる離散化近似法は、しばしば再発に悩まされ、効率が低下する。本稿では,複素ガウス二次数を用いた複素平面における離散化近似法の一般化を提案する。結果として得られる効果的ハミルトニアンは、系の散逸ダイナミクスのために非エルミート的である。提案手法は2つの可解モデル,すなわち,一般化 Aubry-André-Harper モデルにおけるdephasing モデルと単一励起開力学に適用される。その結果, 複雑な離散モードの発生が再現性を著しく低減し, 両モデルにおける開放力学の効率的かつ正確なシミュレーションを可能にした。 The discretization approximation method used to simulate the open dynamics of the environment in continuum often suffers from recurrence, which results in inefficiency. To address this issue, this paper proposes a generalization of the discretization approximation method in the complex plane using complex Gauss quadratures. The resulting effective Hamiltonian is non-Hermitian due to the dissipative dynamics of the system. The proposed method is applied to two solvable models, namely the dephasing model and the single-excitation open dynamics in the generalized Aubry-André-Harper model. The results demonstrate that the occurrence of complex discrete modes in the environment can significantly reduce recurrence, thereby enabling the efficient and accurate simulation of open dynamics in both models.	翻訳日:2023-04-27 03:18:33 公開日:2023-04-25
# 高温微細化と背景抑制によるきめ細かい視覚分類 Fine-grained Visual Classification with High-temperature Refinement and Background Suppression ( http://arxiv.org/abs/2303.06442v2 ) ライセンス: Link先を確認	Po-Yung Chou, Yu-Yung Kao, Cheng-Hung Lin	(参考訳) 細粒度の視覚的分類は、カテゴリ間の高い類似性と1つのカテゴリ内のデータ間の相違により難しい課題である。これらの課題に対処するため、従来の戦略では、カテゴリ間の微妙な相違点のローカライズと、それらにおける差別的特徴の集中に重点を置いてきた。しかし、背景には、分類に不必要であるか、あるいは有害であるかをモデルに伝える重要な情報もあり、微妙な特徴に強く依存するモデルは、グローバルな特徴や文脈的な情報を見落としてしまう可能性がある。本稿では,2つのモジュール,すなわち高温リファインメントモジュールと背景抑圧モジュールから構成される「高温リファインメント」と「背景抑圧」という,識別特性の抽出と背景雑音の抑制を行う新しいネットワークを提案する。高温改良モジュールは、異なるスケールで特徴マップを精製し、多様な特徴の学習を改善することにより、適切な特徴スケールを学習することを可能にする。そして、背景抑圧モジュールは、まず、分類信頼度スコアを用いて、特徴マップを前景と背景に分割し、識別的特徴を高めながら、低信頼領域の特徴値を抑制する。 CUB-200-2011 と NABirds のベンチマークにおいて, HERBS は様々なスケールの特徴を効果的に融合し, 背景雑音, 識別的特徴を微粒化のための適切なスケールで抑制し, CUB-200-2011 と NABirds のベンチマークにおける最先端性能を 93% を超える精度で達成した。このように、HERBSは、きめ細かい視覚分類タスクの性能を向上させるための有望なソリューションを提供する。コード:https://github.com/chou141253/FGVC-HERBS Fine-grained visual classification is a challenging task due to the high similarity between categories and distinct differences among data within one single category. To address the challenges, previous strategies have focused on localizing subtle discrepancies between categories and enhancing the discriminative features in them. However, the background also provides important information that can tell the model which features are unnecessary or even harmful for classification, and models that rely too heavily on subtle features may overlook global features and contextual information. In this paper, we propose a novel network called "High-temperaturE Refinement and Background Suppression" (HERBS), which consists of two modules, namely, the high-temperature refinement module and the background suppression module, for extracting discriminative features and suppressing background noise, respectively. 
The high-temperature refinement module allows the model to learn the appropriate feature scales by refining the feature maps at different scales and improving the learning of diverse features. The background suppression module first splits the feature map into foreground and background using classification confidence scores, then suppresses feature values in low-confidence areas while enhancing discriminative features. The experimental results show that the proposed HERBS effectively fuses features of varying scales, suppresses background noise, and extracts discriminative features at appropriate scales for fine-grained visual classification. The proposed method achieves state-of-the-art performance on the CUB-200-2011 and NABirds benchmarks, surpassing 93% accuracy on both datasets. Thus, HERBS presents a promising solution for improving the performance of fine-grained visual classification tasks. Code: https://github.com/chou141253/FGVC-HERBS	翻訳日:2023-04-27 03:18:19 公開日:2023-04-25
# シグモノイドネットワークのための複合最適化アルゴリズム Composite Optimization Algorithms for Sigmoid Networks ( http://arxiv.org/abs/2303.00589v2 ) ライセンス: Link先を確認	Huixiong Chen, Qi Ye	(参考訳) 本稿では,合成最適化アルゴリズムを用いてシグモイドネットワークを解く。我々は,sgmoidネットワークを凸合成最適化に等価に転送し,線形近位アルゴリズムと乗算器の交互方向法に基づく合成最適化アルゴリズムを提案する。弱鋭極小と正則性条件の仮定の下では、非凸問題や非滑らか問題の場合であっても、アルゴリズムは対象関数のグローバル最適解に収束することが保証される。さらに、収束結果をトレーニングデータの量に直接関連付けることができ、シグモノイドネットワークのサイズを設定するための一般的なガイドを提供する。フランクの関数フィッティングと手書き数字認識に関する数値実験により,提案アルゴリズムは良好かつ堅牢に機能することを示した。 In this paper, we use composite optimization algorithms to solve sigmoid networks. We equivalently transfer the sigmoid networks to a convex composite optimization and propose the composite optimization algorithms based on the linearized proximal algorithms and the alternating direction method of multipliers. Under the assumptions of the weak sharp minima and the regularity condition, the algorithm is guaranteed to converge to a globally optimal solution of the objective function even in the case of non-convex and non-smooth problems. Furthermore, the convergence results can be directly related to the amount of training data and provide a general guide for setting the size of sigmoid networks. Numerical experiments on Franke's function fitting and handwritten digit recognition show that the proposed algorithms perform satisfactorily and robustly.	翻訳日:2023-04-27 03:17:33 公開日:2023-04-25
# 創発的因果性と意識の基礎 Emergent Causality & the Foundation of Consciousness ( http://arxiv.org/abs/2302.03189v3 ) ライセンス: Link先を確認	Michael Timothy Bennett	(参考訳) 対話的な環境で正確な推論を行うためには、エージェントは、イベントの受動的観察と、それらを引き起こすための介入を混同してはならない。 $do$ オペレータは介入を形式化し、その効果を判断します。しかし、対話的な環境では、介入の明示的な表現を前提とせず、最大精度の推論を行う汎用知能の最適数学的形式主義が存在する。我々はそのような形式主義を一つ検討する。我々は$do$演算子がない場合、介入は変数で表現できることを示した。変数は抽象化であり、事前に介入を明示的に表現する必要があるのは、そのような抽象化を前提にしているからである。上記の形式主義は、これを避けるため、初期条件は、誘導を通じて関連する因果的介入の表現が現れる。これらの創発的抽象化は、自己と他のオブジェクトの表現として機能し、それらのオブジェクトの介入が目標の満足度に影響を与えると判断される。このことは、自己のアイデンティティや意図、他者のそれ、他者から認識された自己のそれなどについて、どのように推論しうるかを説明するものだ、と我々は主張する。狭義では、それは気づいているとはどういうことかを記述し、意識の側面の機械的な説明である。 To make accurate inferences in an interactive setting, an agent must not confuse passive observation of events with having intervened to cause them. The $do$ operator formalises interventions so that we may reason about their effect. Yet there exist Pareto optimal mathematical formalisms of general intelligence in an interactive setting which, presupposing no explicit representation of intervention, make maximally accurate inferences. We examine one such formalism. We show that in the absence of a $do$ operator, an intervention can be represented by a variable. We then argue that variables are abstractions, and that the need to explicitly represent interventions in advance arises only because we presuppose these sorts of abstractions. The aforementioned formalism avoids this and so, initial conditions permitting, representations of relevant causal interventions will emerge through induction. These emergent abstractions function as representations of one's self and of any other object, inasmuch as the interventions of those objects impact the satisfaction of goals. We argue that this explains how one might reason about one's own identity and intent, those of others, of one's own as perceived by others and so on. In a narrow sense this describes what it is to be aware, and is a mechanistic explanation of aspects of consciousness.	翻訳日:2023-04-27 03:16:45 公開日:2023-04-25
# トポロジカル非エルミート皮膚効果 Topological Non-Hermitian skin effect ( http://arxiv.org/abs/2302.03057v2 ) ライセンス: Link先を確認	Rijia Lin, Tommy Tai, Linhu Li, Ching Hua Lee	(参考訳) 本稿では,非エルミート皮膚効果(NHSE)の最近の進展,特にトポロジーとの豊かな相互作用について概説する。レビューは、修正されたバルク境界対応、より高次元のnhseとバンドトポロジーの相乗的およびハイブリダイゼーション、スペクトル巻線トポロジーやスペクトルグラフトポロジーのような複素エネルギー平面上の関連するトポロジーに関する教育的紹介から始まります。その後、非エルミート臨界性、動的NHSE現象、および従来の線形非相互作用結晶格子、特に量子多体相互作用との相互作用を超えたNHSEの顕在化など、新たなトピックが導入される。最後に、NHSEの最近の実演と実験的提案について調査する。 This article reviews recent developments in the non-Hermitian skin effect (NHSE), particularly on its rich interplay with topology. The review starts off with a pedagogical introduction on the modified bulk-boundary correspondence, the synergy and hybridization of NHSE and band topology in higher dimensions, as well as, the associated topology on the complex energy plane such as spectral winding topology and spectral graph topology. Following which, emerging topics are introduced such as non-Hermitian criticality, dynamical NHSE phenomena, and the manifestation of NHSE beyond the traditional linear non-interacting crystal lattices, particularly its interplay with quantum many-body interactions. Finally, we survey the recent demonstrations and experimental proposals of NHSE.	翻訳日:2023-04-27 03:16:25 公開日:2023-04-25
# Listen2Scene:インタラクティブな素材を意識したバイノーラルサウンドプロパゲーション Listen2Scene: Interactive material-aware binaural sound propagation for reconstructed 3D scenes ( http://arxiv.org/abs/2302.02809v2 ) ライセンス: Link先を確認	Anton Ratnarajah, Dinesh Manocha	(参考訳) 本稿では、仮想現実(vr)および拡張現実(ar)アプリケーションのためのエンドツーエンドバイノーラルオーディオレンダリングアプローチ(listen2scene)を提案する。実環境の3次元モデルに対する音響効果を生成するニューラルネットを用いたバイノーラル音響伝搬法を提案する。クリーンオーディオやドライオーディオは、生成された音響効果と畳み込み、実際の環境に対応するオーディオをレンダリングすることができる。本稿では,3次元シーンの材料情報とトポロジー情報の両方を用いて,シーン潜在ベクトルを生成するグラフニューラルネットワークを提案する。さらに,現場潜伏ベクトルから音響効果を生成するために,条件付き生成対向ネットワーク(CGAN)を用いる。我々のネットワークは、再構成された3Dメッシュモデルでホールや他のアーティファクトを処理できる。空間音響効果を組み込むために,ジェネレータネットワークに効率的なコスト関数を提案する。ソースとリスナーの位置を考えると、学習に基づくバイノーラル音伝搬アプローチは、nvidia geforce rtx 2080 ti gpu上で0.1ミリ秒で音響効果を生成し、複数のソースを容易に処理できる。本研究では,インタラクティブな幾何音響伝搬アルゴリズムを用いて,バイノーラル音響効果を用いたアプローチの精度を評価し,実際の音響効果を捉えた。また, 従来の学習に基づく音声伝搬アルゴリズムを用いた音声に比べて, 提案手法により得られた音声が, より妥当であることが確認された。 We present an end-to-end binaural audio rendering approach (Listen2Scene) for virtual reality (VR) and augmented reality (AR) applications. We propose a novel neural-network-based binaural sound propagation method to generate acoustic effects for 3D models of real environments. Any clean audio or dry audio can be convolved with the generated acoustic effects to render audio corresponding to the real environment. We propose a graph neural network that uses both the material and the topology information of the 3D scenes and generates a scene latent vector. Moreover, we use a conditional generative adversarial network (CGAN) to generate acoustic effects from the scene latent vector. Our network is able to handle holes or other artifacts in the reconstructed 3D mesh model. We present an efficient cost function to the generator network to incorporate spatial audio effects. 
Given the source and the listener position, our learning-based binaural sound propagation approach can generate an acoustic effect in 0.1 milliseconds on an NVIDIA GeForce RTX 2080 Ti GPU and can easily handle multiple sources. We have evaluated the accuracy of our approach with binaural acoustic effects generated using an interactive geometric sound propagation algorithm and captured real acoustic effects. We also performed a perceptual evaluation and observed that the audio rendered by our approach is more plausible as compared to audio rendered using prior learning-based sound propagation algorithms.	翻訳日:2023-04-27 03:16:11 公開日:2023-04-25
# RobCaps: アフィン変換と敵攻撃に対するカプセルネットワークのロバスト性の評価 RobCaps: Evaluating the Robustness of Capsule Networks against Affine Transformations and Adversarial Attacks ( http://arxiv.org/abs/2304.03973v2 ) ライセンス: Link先を確認	Alberto Marchisio and Antonio De Marco and Alessio Colucci and Maurizio Martina and Muhammad Shafique	(参考訳) Capsule Networks(CapsNets)は、画像分類タスクのための複数のオブジェクト間のポーズ関係を階層的に保存することができる。安全性クリティカルなアプリケーションにCapsNetをデプロイする際のもうひとつの重要な要因は、入力変換や悪意のある敵攻撃に対する堅牢性である。本稿では,従来の畳み込みニューラルネットワーク(cnns)と比較して,capsnetのロバスト性に影響する要因を体系的に分析し,評価する。包括的な比較のために、MNIST, GTSRB, CIFAR10データセットの2つのCapsNetモデルと2つのCNNモデル、およびこれらのデータセットのアフィン変換バージョンをテストする。詳細な分析により,これらのアーキテクチャの特性がロバスト性の向上と制約に寄与することを示す。全体として、CapsNetsは、同じ数のパラメータを持つ従来のCNNと比較して、敵の例やアフィン変換に対する堅牢性を改善する。同様の結論はCapsNetsとCNNのより深いバージョンに導出されている。さらに,この結果から,動的ルーティングがcapsnetsの堅牢性向上に大きく寄与しないことが判明した。実際、主な一般化の貢献はカプセルによる階層的特徴学習によるものである。 Capsule Networks (CapsNets) are able to hierarchically preserve the pose relationships between multiple objects for image classification tasks. Other than achieving high accuracy, another relevant factor in deploying CapsNets in safety-critical applications is the robustness against input transformations and malicious adversarial attacks. In this paper, we systematically analyze and evaluate different factors affecting the robustness of CapsNets, compared to traditional Convolutional Neural Networks (CNNs). Towards a comprehensive comparison, we test two CapsNet models and two CNN models on the MNIST, GTSRB, and CIFAR10 datasets, as well as on the affine-transformed versions of such datasets. With a thorough analysis, we show which properties of these architectures better contribute to increasing the robustness and their limitations. Overall, CapsNets achieve better robustness against adversarial examples and affine transformations, compared to a traditional CNN with a similar number of parameters. Similar conclusions have been derived for deeper versions of CapsNets and CNNs. 
Moreover, our results reveal a key finding: dynamic routing does not contribute much to improving the CapsNets' robustness. Indeed, the main generalization contribution is due to the hierarchical feature learning through capsules.	翻訳日:2023-04-27 03:10:25 公開日:2023-04-25
# ロバスト音声翻訳のための選択的データ拡張 Selective Data Augmentation for Robust Speech Translation ( http://arxiv.org/abs/2304.03169v2 ) ライセンス: Link先を確認	Rajul Acharya, Ashish Panda, Sunil Kumar Kopparapu	(参考訳) 音声翻訳(ST)システムは、ある言語の音声を別の言語のテキストに翻訳する。エンドツーエンドSTシステム(e2e-ST)は、待ち時間と計算コストの削減により性能が向上したため、カスケードシステムよりも人気を博している。e2e-STシステムは資源集約的ではあるが、カスケードシステムとは異なり、音声のパラ言語的および非言語的特徴を保持できる固有の能力を持っている。本稿では,英語-ヒンディー語(en-hi)STにe2eアーキテクチャを用いることを提案する。2つの不完全な機械翻訳(MT)サービスを用いて,Libri-transの英語テキストをヒンディー語テキストに翻訳する。各サービスが並列STデータを生成するためのMTデータを個別に提供する一方で,頑健なSTを支援するため,ノイズの多いMTデータのデータ拡張戦略を提案する。本論文の主な貢献は,このデータ拡張戦略の提案である。その結果,MTデータの力任せの拡張よりも良好なST(BLEUスコア)が得られることを示す。提案手法により,BLEUスコアで1.59の絶対的な改善を観察した。 Speech translation (ST) systems translate speech in one language to text in another language. End-to-end ST systems (e2e-ST) have gained popularity over cascade systems because of their enhanced performance due to reduced latency and computational cost. Though resource intensive, e2e-ST systems have the inherent ability to retain paralinguistic and non-linguistic characteristics of the speech, unlike cascade systems. In this paper, we propose to use an e2e architecture for English-Hindi (en-hi) ST. We use two imperfect machine translation (MT) services to translate Libri-trans en text into hi text. While each service gives MT data individually to generate parallel ST data, we propose a data augmentation strategy of noisy MT data to aid robust ST. The main contribution of this paper is the proposal of a data augmentation strategy. We show that this results in better ST (BLEU score) compared to brute force augmentation of MT data. We observed an absolute improvement of 1.59 BLEU score with our approach.	翻訳日:2023-04-27 03:10:03 公開日:2023-04-25
# 法律文書ページの文脈対応分類 Context-Aware Classification of Legal Document Pages ( http://arxiv.org/abs/2304.02787v2 ) ライセンス: Link先を確認	Pavlos Fragkogiannis, Martina Forster, Grace E. Lee, Dell Zhang	(参考訳) 法律文書(PDFフォーマットなど)などの専門文書の処理、索引付け、検索を必要とする多くのビジネスアプリケーションにとって、任意の文書のページを、事前に対応するタイプに分類することが不可欠である。文書画像分類の分野における既存の研究のほとんどは、単ページ文書にフォーカスするか、文書内の複数のページを独立して扱うかのどちらかである。近年,文書ページ分類の強化のために隣接するページの文脈情報を活用する手法が提案されているが,入力長の制約により,大規模な事前学習言語モデルでは利用できないことが多い。本稿では,上記の限界を克服する単純かつ効果的なアプローチを提案する。具体的には、bertのような事前学習されたトランスフォーマーモデルをコンテキスト認識ページ分類に使用できる、以前のページに関するシーケンシャルな情報を含む追加のトークンで入力を強化する。英語とポルトガル語の2つの法定データセットを用いた実験により,提案手法は,非帰納的設定と他の文脈対応ベースラインと比較して,文書ページ分類の性能を著しく向上することが示された。 For many business applications that require the processing, indexing, and retrieval of professional documents such as legal briefs (in PDF format etc.), it is often essential to classify the pages of any given document into their corresponding types beforehand. Most existing studies in the field of document image classification either focus on single-page documents or treat multiple pages in a document independently. Although in recent years a few techniques have been proposed to exploit the context information from neighboring pages to enhance document page classification, they typically cannot be utilized with large pre-trained language models due to the constraint on input length. In this paper, we present a simple but effective approach that overcomes the above limitation. Specifically, we enhance the input with extra tokens carrying sequential information about previous pages - introducing recurrence - which enables the usage of pre-trained Transformer models like BERT for context-aware page classification. Our experiments conducted on two legal datasets in English and Portuguese respectively show that the proposed approach can significantly improve the performance of document page classification compared to the non-recurrent setup as well as the other context-aware baselines.	翻訳日:2023-04-27 03:09:48 公開日:2023-04-25
# 任意のキュートリット(3準位)状態の直観的可視化法 An Intuitive Visualisation Method for Arbitrary Qutrit (Three Level) States ( http://arxiv.org/abs/2304.01741v2 ) ライセンス: Link先を確認	Max Z. Festenstein	(参考訳) 視覚的な手法は、あらゆるレベルの理解において量子力学の理解と解釈に非常に有用である。例えば、ブロッホ球は、2準位量子ビット系の量子ダイナミクスを可視化するための重要かつ広く用いられるツールである。本研究では,3準位状態を完全に記述するために必要な8自由度をすべて含みながら,直感的に解釈できる,ブロッホ球に類似したキュートリットの「オクタント」可視化手法を提案する。このフレームワークを使用して、典型的な3準位過程のセットをモデル化し、記述し、表示する。 Visual methods are of great utility in understanding and interpreting quantum mechanics at all levels of understanding. The Bloch sphere, for example, is an invaluable and widely used tool for visualising quantum dynamics of a two level qubit system. In this work we present an 'octant' visualisation method for qutrits bearing similarity to the Bloch sphere, that encompasses all eight degrees of freedom necessary to fully describe a three level state whilst remaining intuitive to interpret. Using this framework, a set of typical three level processes are modelled, described and displayed.	翻訳日:2023-04-27 03:09:31 公開日:2023-04-25
# 良い教師なしでのプライバシー保護連合蒸留のための選択的知識共有 Selective Knowledge Sharing for Privacy-Preserving Federated Distillation without A Good Teacher ( http://arxiv.org/abs/2304.01731v3 ) ライセンス: Link先を確認	Jiawei Shao, Fangzhao Wu, Jun Zhang	(参考訳) フェデレーション学習は、ローカルデータを公開せずにプライバシーを保護した協調学習を可能にする有望な手法であるが、ホワイトボックス攻撃に弱く、異種クライアントへの適応に苦慮している。教師モデルから生徒モデルへ知識を移す効果的な技術である知識蒸留に基づく連合蒸留(FD)は、プライバシー保証を強化し、モデルの不均一性に対処する代替パラダイムとして登場した。それでも、ローカルなデータ分布のばらつきと、よく訓練された教師モデルの欠如によって課題が生じ、誤解を招く曖昧な知識共有につながり、モデル性能が著しく低下する。この問題に対処するため,本稿では,Selective-FDと呼ばれるFDのための選択的知識共有機構を提案する。クライアント側セレクタとサーバ側セレクタを含み、それぞれローカル予測とアンサンブル予測から知識を正確に識別する。理論的洞察に裏付けられた実証研究は、このアプローチがFDフレームワークの汎化能力を高め、ベースライン手法を一貫して上回ることを示している。 While federated learning is promising for privacy-preserving collaborative learning without revealing local data, it remains vulnerable to white-box attacks and struggles to adapt to heterogeneous clients. Federated distillation (FD), built upon knowledge distillation, an effective technique for transferring knowledge from a teacher model to student models, emerges as an alternative paradigm that provides enhanced privacy guarantees and addresses model heterogeneity. Nevertheless, challenges arise due to variations in local data distributions and the absence of a well-trained teacher model, which leads to misleading and ambiguous knowledge sharing that significantly degrades model performance. To address these issues, this paper proposes a selective knowledge sharing mechanism for FD, termed Selective-FD. It includes client-side selectors and a server-side selector to accurately and precisely identify knowledge from local and ensemble predictions, respectively. Empirical studies, backed by theoretical insights, demonstrate that our approach enhances the generalization capabilities of the FD framework and consistently outperforms baseline methods.	翻訳日:2023-04-27 03:09:20 公開日:2023-04-25
# RPTQ:大規模言語モデルのためのリオーダーベースポストトレーニング量子化 RPTQ: Reorder-based Post-training Quantization for Large Language Models ( http://arxiv.org/abs/2304.01089v3 ) ライセンス: Link先を確認	Zhihang Yuan, Lin Niu, Jiawei Liu, Wenyu Liu, Xinggang Wang, Yuzhang Shang, Guangyu Sun, Qiang Wu, Jiaxiang Wu, Bingzhe Wu	(参考訳) 大規模言語モデル(LLM)は様々なタスクにおいて優れた性能を示しているが、その巨大なモデルサイズのためにデプロイが困難である。本稿では,LLMの量子化における主な課題は,外れ値の問題だけでなく,チャネル間のアクティベーション範囲の違いに起因することを明らかにする。LLMのアクティベーションの量子化の問題に対処する,新しいリオーダーベースの量子化手法であるRPTQを提案する。 RPTQはアクティベーション中のチャネルを並べ替え、クラスタ単位で量子化することで、チャネル間の範囲差の影響を低減する。さらに,明示的な並べ替えを回避することで,ストレージと計算のオーバーヘッドを削減する。このアプローチを実装することで,LLMのアクティベーションを初めて3ビットまで下げるという大きなブレークスルーを達成した。 Large-scale language models (LLMs) have demonstrated outstanding performance on various tasks, but their deployment poses challenges due to their enormous model size. In this paper, we identify that the main challenge in quantizing LLMs stems from the different activation ranges between the channels, rather than just the issue of outliers. We propose a novel reorder-based quantization approach, RPTQ, that addresses the issue of quantizing the activations of LLMs. RPTQ rearranges the channels in the activations and then quantizes them in clusters, thereby reducing the impact of the range difference between channels. In addition, we reduce the storage and computation overhead by avoiding explicit reordering. By implementing this approach, we achieved a significant breakthrough by pushing LLM models to 3-bit activation for the first time.	翻訳日:2023-04-27 03:09:01 公開日:2023-04-25
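The reorder-then-quantize idea in the RPTQ abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names are hypothetical, and sorting channels by their (min, max) range and slicing the order into equal-size groups is an assumption standing in for whatever clustering the paper actually uses.

```python
# Sketch only; not the authors' RPTQ code. Sorting by (min, max) stands in
# for whatever clustering the paper actually uses (an assumption).

def channel_ranges(acts):
    """Per-channel (min, max) over all rows of a [tokens x channels] matrix."""
    return [(min(col), max(col)) for col in zip(*acts)]


def cluster_channels(ranges, n_clusters):
    """Reorder channels by range, then cut the order into contiguous clusters."""
    order = sorted(range(len(ranges)), key=lambda c: ranges[c])
    size = -(-len(order) // n_clusters)  # ceiling division
    return [order[i:i + size] for i in range(0, len(order), size)]


def quantize_by_cluster(acts, clusters, ranges, bits):
    """Uniform asymmetric fake-quantization with one scale per cluster."""
    out = [list(row) for row in acts]
    levels = 2 ** bits - 1
    for channels in clusters:
        lo = min(ranges[c][0] for c in channels)
        hi = max(ranges[c][1] for c in channels)
        scale = (hi - lo) / levels or 1.0  # guard against a constant cluster
        for row in out:
            for c in channels:
                row[c] = lo + round((row[c] - lo) / scale) * scale
    return out


# Toy activations: channels 0 and 2 are small, channels 1 and 3 are large.
acts = [
    [-1.0, -100.0, -1.2, -90.0],
    [0.3, 40.0, 0.9, 110.0],
    [1.0, 100.0, 0.1, 5.0],
]
ranges = channel_ranges(acts)
clusters = cluster_channels(ranges, n_clusters=2)
deq = quantize_by_cluster(acts, clusters, ranges, bits=3)
print(clusters)
```

In this toy case the two wide-range channels end up in one cluster and the two narrow-range channels in the other, so the narrow channels get a step size matched to their own range instead of one dictated by the widest channel.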
# $n$階Schrödinger方程式における束縛状態の量子化条件 Quantization Condition of the Bound States in $n$th-order Schrödinger equations ( http://arxiv.org/abs/2304.00914v2 ) ライセンス: Link先を確認	Xiong Fan	(参考訳) 方程式 $e^{-i\pi n/2}\nabla_x^{n}\Psi=[E-\Delta(x)]\Psi$ のポテンシャル井戸内の束縛状態に対する一般的な近似量子化則 $\int_{L_{E}}^{R_{E}}k_0\,dx=(N+\frac{1}{2})\pi$ を証明する。ここで $k_0=(E-\Delta)^{1/n}$,$N\in\mathbb{N}_{0}$,$n$ は偶数の自然数であり,$L_{E}$ と $R_{E}$ は古典的禁止領域と許容領域の境界点である。唯一の仮説は、指数的に成長するすべての成分が無視可能であることであり、これは狭くない井戸に対して適切である。Schrödinger方程式やBogoliubov-de Gennes方程式を含む応用について論じる。 We will prove a general approximate quantization rule $\int_{L_{E}}^{R_{E}}k_0\,dx=(N+\frac{1}{2})\pi$ for the bound states in the potential well of the equations $e^{-i\pi n/2}\nabla_x^{n}\Psi=[E-\Delta(x)]\Psi$, where $k_0=(E-\Delta)^{1/n}$ with $N\in\mathbb{N}_{0}$, $n$ is an even natural number, and $L_{E}$ and $R_{E}$ are the boundary points between the classically forbidden regions and the allowed region. The only hypothesis is that all exponentially growing components are negligible, which is appropriate for wells that are not narrow. Applications including the Schrödinger equation and the Bogoliubov-de Gennes equation will be discussed.	翻訳日:2023-04-27 03:08:47 公開日:2023-04-25
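As a sanity check of the quantization rule stated in this abstract, the familiar case $n=2$ can be worked by hand. The choice of a harmonic well $\Delta(x)=x^2$ is an illustrative assumption, not an example taken from the paper.

```latex
% n = 2: e^{-i\pi} = -1, so the equation reduces to the ordinary
% Schrödinger equation (in units where \hbar^2/2m = 1).
-\Psi''(x) + x^{2}\,\Psi(x) = E\,\Psi(x), \qquad
k_0 = \sqrt{E - x^{2}}, \qquad L_E = -\sqrt{E}, \quad R_E = \sqrt{E}.

% The integral is the area of a half disc of radius \sqrt{E}.
\int_{-\sqrt{E}}^{\sqrt{E}} \sqrt{E - x^{2}}\; dx = \frac{\pi E}{2}
  = \Bigl(N + \tfrac{1}{2}\Bigr)\pi
\quad\Longrightarrow\quad E_N = 2N + 1, \qquad N \in \mathbb{N}_{0}.
```

For this particular well the rule reproduces the exact harmonic-oscillator spectrum $E_N = 2N+1$, which is the usual WKB result; for general wells the rule is only approximate, as the abstract says.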
# AutoKary2022: 染色体インスタンスセグメンテーションのための大規模アノテーション付きデータセット AutoKary2022: A Large-Scale Densely Annotated Dataset for Chromosome Instance Segmentation ( http://arxiv.org/abs/2303.15839v3 ) ライセンス: Link先を確認	Dan You, Pengcheng Xia, Qiuzhu Chen, Minghui Wu, Suncheng Xiang, Jun Wang	(参考訳) 染色体異常 (karyotype analysis) の診断には, 異相細胞顕微鏡画像からの染色体インスタンスの自動分割が重要である。しかし、高い注釈付きデータセットの欠如や染色体の複雑な形態、例えば、密度分布、任意の方向、幅広い長さがあるため、依然として困難な課題である。この領域の開発を容易にするために、我々は、50人の患者から612の顕微鏡画像に27,000以上の染色体インスタンスを含むautokary2022という、大規模な密注釈付きデータセットを手作業で構築する。具体的には、各インスタンスにポリゴンマスクとクラスラベルをアノテートして、正確な染色体の検出とセグメンテーションを支援する。その上で,本データセットの代表的な手法を体系的に検討し,多くの興味深い知見を得た。このデータセットが医学的理解に向けて研究を進めることを願っている。データセットは、https://github.com/wangjuncongyu/chromosome-instance-segmentation-datasetで利用できる。 Automated chromosome instance segmentation from metaphase cell microscopic images is critical for the diagnosis of chromosomal disorders (i.e., karyotype analysis). However, it is still a challenging task due to lacking of densely annotated datasets and the complicated morphologies of chromosomes, e.g., dense distribution, arbitrary orientations, and wide range of lengths. To facilitate the development of this area, we take a big step forward and manually construct a large-scale densely annotated dataset named AutoKary2022, which contains over 27,000 chromosome instances in 612 microscopic images from 50 patients. Specifically, each instance is annotated with a polygonal mask and a class label to assist in precise chromosome detection and segmentation. On top of it, we systematically investigate representative methods on this dataset and obtain a number of interesting findings, which helps us have a deeper understanding of the fundamental problems in chromosome instance segmentation. We hope this dataset could advance research towards medical understanding. The dataset can be available at: https://github.com/wangjuncongyu/chromosome-instance-segmentation-dataset.	翻訳日:2023-04-27 03:08:10 公開日:2023-04-25
# Align and Attend: Dual Contrastive Lossesを用いたマルチモーダル要約 Align and Attend: Multimodal Summarization with Dual Contrastive Losses ( http://arxiv.org/abs/2303.07284v2 ) ライセンス: Link先を確認	Bo He, Jun Wang, Jielin Qiu, Trung Bui, Abhinav Shrivastava, Zhaowen Wang	(参考訳) マルチモーダル要約の目標は、異なるモダリティから最も重要な情報を抽出して要約を形成することである。単一モーダル要約とは異なり、マルチモーダル要約タスクはクロスモーダル情報を明示的に活用し、より信頼性が高く高品質な要約を生成する。しかし、既存の手法では、異なるモダリティ間の時間的対応を活用できず、異なるサンプル間の固有の相関を無視する。そこで本研究では,マルチモーダル入力を効果的にアライメントし,注意を向けることができる統一マルチモーダルトランスフォーマーモデルであるA2Summ(Align and Attend Multimodal Summarization)を提案する。さらに,サンプル間相関とサンプル内相関の両方をモデル化する2つの新しいコントラスト損失を提案する。 2つの標準ビデオ要約データセット(TVSumとSumMe)と2つのマルチモーダル要約データセット(Daily MailとCNN)に対する大規模な実験は、A2Summの優位性を示し、すべてのデータセットで最先端のパフォーマンスを達成する。さらに,ライブストリームビデオと書き起こしテキスト,注釈付き要約を含む大規模マルチモーダル要約データセットBLiSSを収集した。私たちのコードとデータセットは、https://boheumd.github.io/A2Summ/ で公開されています。 The goal of multimodal summarization is to extract the most important information from different modalities to form summaries. Unlike unimodal summarization, the multimodal summarization task explicitly leverages cross-modal information to help generate more reliable and high-quality summaries. However, existing methods fail to leverage the temporal correspondence between different modalities and ignore the intrinsic correlation between different samples. To address this issue, we introduce Align and Attend Multimodal Summarization (A2Summ), a unified multimodal transformer-based model which can effectively align and attend the multimodal input. In addition, we propose two novel contrastive losses to model both inter-sample and intra-sample correlations. Extensive experiments on two standard video summarization datasets (TVSum and SumMe) and two multimodal summarization datasets (Daily Mail and CNN) demonstrate the superiority of A2Summ, achieving state-of-the-art performances on all datasets. Moreover, we collected a large-scale multimodal summarization dataset BLiSS, which contains livestream videos and transcribed texts with annotated summaries.
Our code and dataset are publicly available at https://boheumd.github.io/A2Summ/.	翻訳日:2023-04-27 03:07:21 公開日:2023-04-25
# 量子メモリのデバイス非依存認証に向けて Towards the device-independent certification of a quantum memory ( http://arxiv.org/abs/2304.10408v2 ) ライセンス: Link先を確認	Pavel Sekatski, Jean-Daniel Bancal, Marie Ioannou, Mikael Afzelius, Nicolas Brunner	(参考訳) 量子記憶は将来の量子通信ネットワークの主要な要素の1つである。そのため、彼らの認証は重要な課題である。ここでは,量子記憶の効率的な認証手法を提案する。ソースや測定装置の事前特徴化が不要なデバイス非依存的なアプローチを考えることで,量子記憶のためのロバストな自己テスト手法を開発した。次に、最近の固体アンサンブル量子メモリ実験において、0.87の忠実性を確認し、緩和されたシナリオでこの技術の実際的妥当性を示す。より一般的に,本手法は量子チャネルを実装した任意のデバイスの特徴付けに適用される。 Quantum memories represent one of the main ingredients of future quantum communication networks. Their certification is therefore a key challenge. Here we develop efficient certification methods for quantum memories. Considering a device-independent approach, where no a priori characterisation of sources or measurement devices is required, we develop a robust self-testing method for quantum memories. We then illustrate the practical relevance of our technique in a relaxed scenario by certifying a fidelity of 0.87 in a recent solid-state ensemble quantum memory experiment. More generally, our methods apply for the characterisation of any device implementing a qubit identity quantum channel.	翻訳日:2023-04-27 03:00:05 公開日:2023-04-25
# task loss-guided lpメトリックによるオブジェクト検出におけるトレーニング後の量子化の改善 Improving Post-Training Quantization on Object Detection with Task Loss-Guided Lp Metric ( http://arxiv.org/abs/2304.09785v2 ) ライセンス: Link先を確認	Lin Niu, Jiawei Liu, Zhihang Yuan, Dawei Yang, Xinggang Wang, Wenyu Liu	(参考訳) オブジェクト検出ネットワークの効率的な推論は、エッジデバイスにおいて大きな課題である。完全精度モデルを直接低ビット幅に変換するPTQ(Post-Training Quantization)は、モデル推論の複雑さを減らすための効果的で便利なアプローチである。しかし、オブジェクト検出などの複雑なタスクに適用すると、かなり精度が低下する。 PTQは量子化パラメータを異なるメトリクスで最適化し、量子化の摂動を最小化する。量子化前後の特徴写像のp-ノルム距離 Lp は摂動を評価する計量として広く用いられている。対象検出ネットワークの特殊性について,lpメトリックのパラメータpが量子化性能に大きく影響することを示す。固定ハイパーパラメータpは最適量子化性能を達成できないことを示す。この問題を軽減するため,我々は,オブジェクト検出のタスク損失を表す object detection output loss (odol) を用いて,異なるレイヤを定量化するための異なる p 値を割り当てるフレームワーク detptq を提案する。 DetPTQは最適な量子化パラメータを選択するためにODOLベースの適応Lpメトリックを使用する。実験の結果,DetPTQは2次元と3次元の両方の物体検出器において,最先端のPTQ法よりも優れていた。例えば、RetinaNet-ResNet18上では、31.1/31.7(量子化/フル精度)のmAPを4ビットの重みと4ビットの活性化で達成する。 Efficient inference for object detection networks is a major challenge on edge devices. Post-Training Quantization (PTQ), which transforms a full-precision model into low bit-width directly, is an effective and convenient approach to reduce model inference complexity. But it suffers severe accuracy drop when applied to complex tasks such as object detection. PTQ optimizes the quantization parameters by different metrics to minimize the perturbation of quantization. The p-norm distance of feature maps before and after quantization, Lp, is widely used as the metric to evaluate perturbation. For the specialty of object detection network, we observe that the parameter p in Lp metric will significantly influence its quantization performance. We indicate that using a fixed hyper-parameter p does not achieve optimal quantization performance. To mitigate this problem, we propose a framework, DetPTQ, to assign different p values for quantizing different layers using an Object Detection Output Loss (ODOL), which represents the task loss of object detection. 
DetPTQ employs the ODOL-based adaptive Lp metric to select the optimal quantization parameters. Experiments show that our DetPTQ outperforms the state-of-the-art PTQ methods by a significant margin on both 2D and 3D object detectors. For example, we achieve 31.1/31.7(quantization/full-precision) mAP on RetinaNet-ResNet18 with 4-bit weight and 4-bit activation.	翻訳日:2023-04-27 02:59:38 公開日:2023-04-25
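The DetPTQ abstract's observation that the exponent p of the Lp metric changes which quantization parameters look best can be seen in a toy setting. The sketch below is not DetPTQ and does not use any detection loss: the data, the two candidate scales, and the function names are all assumptions chosen to make the effect visible.

```python
# Illustrative sketch; the scale grid and the toy data are assumptions.

def quantize(xs, scale, bits=4):
    """Symmetric uniform quantization followed by dequantization."""
    qmax = 2 ** (bits - 1) - 1
    return [max(-qmax - 1, min(qmax, round(x / scale))) * scale for x in xs]


def lp_distance(a, b, p):
    """Sum of |a_i - b_i|^p (the p-th root is omitted; the argmin is unchanged)."""
    return sum(abs(x - y) ** p for x, y in zip(a, b))


def best_scale(xs, scales, p, bits=4):
    """Pick the candidate scale with the smallest Lp perturbation."""
    return min(scales, key=lambda s: lp_distance(xs, quantize(xs, s, bits), p))


# Many small feature values plus a single large outlier.
features = [0.1, 0.2, 0.3, 0.4] * 10 + [8.0]
candidates = [0.1, 1.2]
choice_p1 = best_scale(features, candidates, p=1)
choice_p2 = best_scale(features, candidates, p=2)
print(choice_p1, choice_p2)
```

With p=1 the fine scale wins because it keeps the many small values accurate and tolerates clipping the outlier; with p=2 the coarse scale wins because the clipped outlier dominates the error. A per-layer choice of p, as the abstract describes, is one way to arbitrate between these two behaviors.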
# CB-Conformer: バイアス付き単語認識のためのコンテキストバイアスConformer CB-Conformer: Contextual biasing Conformer for biased word recognition ( http://arxiv.org/abs/2304.09607v2 ) ライセンス: Link先を確認	Yaoxun Xu and Baiji Liu and Qiaochu Huang and Xingchen Song and Zhiyong Wu and Shiyin Kang and Helen Meng	(参考訳) ソース領域とターゲット領域のミスマッチにより、バイアス付き単語情報をうまく利用して、ターゲット領域における自動音声認識モデルの性能を向上させる方法が、ホットな研究テーマとなっている。以前のアプローチでは、固定された外部言語モデルでデコードするか、サイズの大きいバイアスモジュールを導入しており、適応性の低さと推論の遅さを招いていた。本研究では,通常のConformerにコンテキストバイアスモジュールと自己適応型言語モデルを導入してバイアス付き単語認識を改善するCB-Conformerを提案する。コンテキストバイアスモジュールは、オーディオフラグメントとコンテキスト情報を組み合わせたもので、パラメータ数はオリジナルのConformerのわずか0.2%である。自己適応型言語モデル(Self-Adaptive Language Model)は、リコールと精度に基づいてバイアス付き単語の内部重みを修正し、バイアス付き単語により焦点を合わせ、標準の固定言語モデルよりも自動音声認識モデルとの統合を成功させる。さらに,WenetSpeechに基づくオープンソースの中国語(Mandarin)バイアス付き単語データセットを構築し,公開する。実験の結果,提案手法はベースのConformerと比較して,文字誤り率を15.34%削減し,バイアス付き単語のリコールを14.13%,バイアス付き単語のF1スコアを6.80%向上させた。 Due to the mismatch between the source and target domains, how to better utilize the biased word information to improve the performance of the automatic speech recognition model in the target domain becomes a hot research topic. Previous approaches either decode with a fixed external language model or introduce a sizeable biasing module, which leads to poor adaptability and slow inference. In this work, we propose CB-Conformer to improve biased word recognition by introducing the Contextual Biasing Module and the Self-Adaptive Language Model to vanilla Conformer. The Contextual Biasing Module combines audio fragments and contextual information, with only 0.2% model parameters of the original Conformer. The Self-Adaptive Language Model modifies the internal weights of biased words based on their recall and precision, resulting in a greater focus on biased words and more successful integration with the automatic speech recognition model than the standard fixed language model. In addition, we construct and release an open-source Mandarin biased-word dataset based on WenetSpeech.
Experiments indicate that our proposed method brings a 15.34% character error rate reduction, a 14.13% biased word recall increase, and a 6.80% biased word F1-score increase compared with the base Conformer.	翻訳日:2023-04-27 02:59:18 公開日:2023-04-25
# 単語から音楽へ:シンボリック音楽生成におけるサブワードトークン化手法の研究 From Words to Music: A Study of Subword Tokenization Techniques in Symbolic Music Generation ( http://arxiv.org/abs/2304.08953v2 ) ライセンス: Link先を確認	Adarsh Kumar and Pedro Sarmento	(参考訳) サブワードのトークン化は、トランスフォーマーベースのモデルでテキストベースの自然言語処理(nlp)タスクで広く成功している。シンボリック音楽研究においてトランスフォーマーモデルがますます普及するにつれて、シンボリック音楽領域におけるサブワードトークン化の有効性を検討することが重要である。本稿では,シンボリック音楽生成におけるバイトペア符号化(bpe)などのサブワードトークン化手法と,その全体的な構造への影響について検討する。実験は、シングルトラックメロディのみ、シングル楽器付きマルチトラック、マルチトラックとマルチストラクチャの3種類のMIDIデータセットに基づいている。サブワードのトークン化をポスト・ミュージックのトークン化スキームに適用し,同時に長曲の生成を可能にし,構造指標 (si) やピッチクラスエントロピーなどの客観的指標を用いて,生成された楽曲全体の構造を改善する。また,bpeとunigramという2つのサブワードトークン化手法を比較し,両手法が一貫した改善をもたらすことを確認した。本研究は,サブワードのトークン化が記号的音楽生成に有望な手法であることを示唆し,特にマルチトラック曲などの複雑なデータを含む場合において,楽曲構成に広範な影響を及ぼす可能性があることを示唆する。 Subword tokenization has been widely successful in text-based natural language processing (NLP) tasks with Transformer-based models. As Transformer models become increasingly popular in symbolic music-related studies, it is imperative to investigate the efficacy of subword tokenization in the symbolic music domain. In this paper, we explore subword tokenization techniques, such as byte-pair encoding (BPE), in symbolic music generation and its impact on the overall structure of generated songs. Our experiments are based on three types of MIDI datasets: single track-melody only, multi-track with a single instrument, and multi-track and multi-instrument. We apply subword tokenization on post-musical tokenization schemes and find that it enables the generation of longer songs at the same time and improves the overall structure of the generated music in terms of objective metrics like structure indicator (SI), Pitch Class Entropy, etc. We also compare two subword tokenization methods, BPE and Unigram, and observe that both methods lead to consistent improvements. 
Our study suggests that subword tokenization is a promising technique for symbolic music generation and may have broader implications for music composition, particularly in cases involving complex data such as multi-track songs.	翻訳日:2023-04-27 02:58:55 公開日:2023-04-25
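To make the idea of byte-pair encoding over music tokens concrete, here is a minimal sketch of BPE merge learning on a symbolic sequence. The token names (`NOTE_60`, `DUR_4`, and so on) and the `+` joining convention are hypothetical; the paper's tokenization schemes and tooling may differ.

```python
from collections import Counter

# Minimal BPE sketch over event tokens; token names are hypothetical.


def most_frequent_pair(seqs):
    """Return the most common adjacent token pair, or None if there is none."""
    pairs = Counter()
    for seq in seqs:
        pairs.update(zip(seq, seq[1:]))
    return pairs.most_common(1)[0][0] if pairs else None


def merge_pair(seq, pair):
    """Replace every non-overlapping occurrence of `pair` with one merged token."""
    merged, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            merged.append(seq[i] + "+" + seq[i + 1])
            i += 2
        else:
            merged.append(seq[i])
            i += 1
    return merged


def learn_bpe(seqs, n_merges):
    """Greedily learn up to `n_merges` merges and return them with the new sequences."""
    merges = []
    for _ in range(n_merges):
        pair = most_frequent_pair(seqs)
        if pair is None:
            break
        merges.append(pair)
        seqs = [merge_pair(s, pair) for s in seqs]
    return merges, seqs


song = ["NOTE_60", "DUR_4", "NOTE_64", "DUR_4",
        "NOTE_60", "DUR_4", "NOTE_67", "DUR_2"]
merges, encoded = learn_bpe([song], n_merges=1)
print(merges, encoded[0])
```

After one merge the eight-token melody shrinks to six tokens, which is the mechanism by which a fixed token budget can cover a longer stretch of music.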
# DiffFit: 簡単なパラメータ効率の良い微調整による大拡散モデルの解錠性 DiffFit: Unlocking Transferability of Large Diffusion Models via Simple Parameter-Efficient Fine-Tuning ( http://arxiv.org/abs/2304.06648v3 ) ライセンス: Link先を確認	Enze Xie, Lewei Yao, Han Shi, Zhili Liu, Daquan Zhou, Zhaoqiang Liu, Jiawei Li, Zhenguo Li	(参考訳) 拡散モデルは高品質な画像の生成に非常に有効であることが証明されている。しかし、大規模な事前学習拡散モデルを新しい領域に適用することは、現実世界のアプリケーションにとって重要な課題である。本稿では,新しい領域への高速適応を可能にする大規模事前学習拡散モデルを微調整するパラメータ効率の高い手法であるdifffitを提案する。 DiffFitは、特定のレイヤでバイアス項と新たに追加されたスケーリング要素のみを微調整するが、トレーニングのスピードアップとモデルストレージコストの削減をもたらす、恥ずかしいほど単純である。完全な微調整と比較すると、DiffFitは2$\times$トレーニングスピードアップを実現しており、全体のモデルパラメータの約0.12\%を格納する必要がある。高速適応におけるスケーリング因子の有効性を正当化する直観的理論解析が提案されている。下流の8つのデータセットでは、DiffFitはより効率的でありながら、完全な微調整よりも優れた、あるいは競争的なパフォーマンスを達成する。注目すべきは、DiffFitが最小のコストを加えることで、訓練済みの低解像度生成モデルを高解像度に適応できることである。拡散ベースの手法の中で、DiffFitはImageNet 512$\times$512ベンチマークで3.02の最先端FIDを新たに設定し、公開前のImageNet 256$\times$256チェックポイントから25エポックだけを微調整した。 Diffusion models have proven to be highly effective in generating high-quality images. However, adapting large pre-trained diffusion models to new domains remains an open challenge, which is critical for real-world applications. This paper proposes DiffFit, a parameter-efficient strategy to fine-tune large pre-trained diffusion models that enable fast adaptation to new domains. DiffFit is embarrassingly simple that only fine-tunes the bias term and newly-added scaling factors in specific layers, yet resulting in significant training speed-up and reduced model storage costs. Compared with full fine-tuning, DiffFit achieves 2$\times$ training speed-up and only needs to store approximately 0.12\% of the total model parameters. Intuitive theoretical analysis has been provided to justify the efficacy of scaling factors on fast adaptation. On 8 downstream datasets, DiffFit achieves superior or competitive performances compared to the full fine-tuning while being more efficient. 
Remarkably, we show that DiffFit can adapt a pre-trained low-resolution generative model to a high-resolution one by adding minimal cost. Among diffusion-based methods, DiffFit sets a new state-of-the-art FID of 3.02 on ImageNet 512$\times$512 benchmark by fine-tuning only 25 epochs from a public pre-trained ImageNet 256$\times$256 checkpoint while being 30$\times$ more training efficient than the closest competitor.	翻訳日:2023-04-27 02:57:59 公開日:2023-04-25
# SwiftTron: 量子化トランスフォーマーのための効率的なハードウェアアクセラレータ SwiftTron: An Efficient Hardware Accelerator for Quantized Transformers ( http://arxiv.org/abs/2304.03986v2 ) ライセンス: Link先を確認	Alberto Marchisio and Davide Dura and Maurizio Capra and Maurizio Martina and Guido Masera and Muhammad Shafique	(参考訳) Transformerの計算集約的な演算は、リソースに制約のあるEdgeAI / tinyMLデバイスへのデプロイにおいて大きな課題となる。確立されたニューラルネットワーク圧縮技術として、量子化はハードウェアの計算資源とメモリ資源を削減する。特に、固定小数点量子化は、基礎となるハードウェアの加算器や乗算器のような軽量ブロックを使った計算を容易にするために望ましい。しかし、既存の汎用ハードウェアや汎用AIアクセラレータ、あるいは浮動小数点ユニットを備えたトランスフォーマー専用アーキテクチャに完全量子化トランスフォーマーをデプロイすることは、実現不可能かつ/または非効率である可能性がある。そこで我々は,量子化トランスフォーマー用に設計された,効率的な専用ハードウェアアクセラレータSwiftTronを提案する。 SwiftTronは、さまざまなタイプのTransformer演算(Attention、Softmax、GELU、Layer Normalizationなど)の実行をサポートし、正しい計算を行うためにさまざまなスケーリング係数を考慮する。 ASIC設計フローを用いて,完全なSwiftTronアーキテクチャを65 nm CMOS技術で合成する。我々のアクセラレータはRoBERTa-baseモデルを1.83 nsで実行し、33.64 mWの電力を消費し、面積は273 mm^2である。再現性を容易にするため、SwiftTronアーキテクチャのRTLはhttps://github.com/albertomarchisio/SwiftTronでリリースされています。 Transformers' compute-intensive operations pose enormous challenges for their deployment in resource-constrained EdgeAI / tinyML devices. As an established neural network compression technique, quantization reduces the hardware computational and memory resources. In particular, fixed-point quantization is desirable to ease the computations using lightweight blocks, like adders and multipliers, of the underlying hardware. However, deploying fully-quantized Transformers on existing general-purpose hardware, generic AI accelerators, or specialized architectures for Transformers with floating-point units might be infeasible and/or inefficient. Towards this, we propose SwiftTron, an efficient specialized hardware accelerator designed for Quantized Transformers. SwiftTron supports the execution of different types of Transformers' operations (like Attention, Softmax, GELU, and Layer Normalization) and accounts for diverse scaling factors to perform correct computations.
We synthesize the complete SwiftTron architecture in a 65 nm CMOS technology with the ASIC design flow. Our accelerator executes the RoBERTa-base model in 1.83 ns, while consuming 33.64 mW power, and occupying an area of 273 mm^2. To ease reproducibility, the RTL of our SwiftTron architecture is released at https://github.com/albertomarchisio/SwiftTron.	翻訳日:2023-04-27 02:57:12 公開日:2023-04-25
# MAP-MRF問題の緩和の求解: 組合せ的なin-face Frank-Wolfe方向 Solving relaxations of MAP-MRF problems: Combinatorial in-face Frank-Wolfe directions ( http://arxiv.org/abs/2010.09567v5 ) ライセンス: Link先を確認	Vladimir Kolmogorov	(参考訳) MAP-MRF推論問題のLP緩和を解く問題,特に最近提案された手法(Swoboda, Kolmogorov 2019; Kolmogorov, Pock 2021)について考察する。この手法は重要な計算サブルーチンとして、Frank-Wolfe(FW)法の変種を用いて、組合せポリトープ上の滑らかな凸関数を最小化する。本稿では,(Freund et al. 2017)において異なる文脈で導入されたin-face Frank-Wolfe方向に基づく,このサブルーチンの効率的な実装を提案する。より一般的には、in-face FW方向を可能にする組合せ部分問題の抽象データ構造を定義し、木構造MAP-MRF推論部分問題への特殊化を記述する。実験の結果,本手法は一部の問題クラスに対して現状で最先端のLPソルバであることが示された。私たちのコードはhttps://pub.ist.ac.at/~vnk/papers/IN-FACE-FW.htmlで利用可能です。 We consider the problem of solving LP relaxations of MAP-MRF inference problems, and in particular the method proposed recently in (Swoboda, Kolmogorov 2019; Kolmogorov, Pock 2021). As a key computational subroutine, it uses a variant of the Frank-Wolfe (FW) method to minimize a smooth convex function over a combinatorial polytope. We propose an efficient implementation of this subroutine based on in-face Frank-Wolfe directions, introduced in (Freund et al. 2017) in a different context. More generally, we define an abstract data structure for a combinatorial subproblem that enables in-face FW directions, and describe its specialization for tree-structured MAP-MRF inference subproblems. Experimental results indicate that the resulting method is the current state-of-the-art LP solver for some classes of problems. Our code is available at https://pub.ist.ac.at/~vnk/papers/IN-FACE-FW.html.	翻訳日:2023-04-27 01:12:36 公開日:2023-04-25
# 適度に監督された学習:定義、枠組み、一般性 Moderately Supervised Learning: Definition, Framework and Generality ( http://arxiv.org/abs/2008.11945v4 ) ライセンス: Link先を確認	Yongquan Yang	(参考訳) 教師付き学習は多くの人工知能(AI)アプリケーションで顕著な成功を収めた。現在の文献では、トレーニングデータセットに用意されたラベルの特性を参照することにより、教師あり学習(SL)と弱教師あり学習(WSL)に分類される。 SLは、トレーニングデータセットが理想的な(完全で正確な)ラベルで割り当てられている状況、WSLはトレーニングデータセットが非理想的(不完全、不正確な、不正確な)ラベルで割り当てられている状況に関する。しかし、SLタスクに対する様々なソリューションは、与えられたラベルが必ずしも習得しやすいとは限らないことを示しており、与えられたラベルから学習が容易なターゲットへの変換は最終SLソリューションの性能に大きな影響を及ぼす可能性がある。 SLの定義は、与えられたラベルから簡単に学習できるターゲットへの変換の性質を考慮せずに、特定のSLタスクの適切なソリューションを構築する上で重要ないくつかの詳細を隠蔽する。したがって、AIアプリケーション分野のエンジニアには、これらの詳細を体系的に明らかにすることが望ましい。本稿では、SLの分類を拡大し、与えられたラベルが理想である状況に関するサブタイプの中等教育学習(MSL)を調査することにより、この目標を達成することを試みるが、アノテーションの単純さにより、与えられたラベルを学習しやすいターゲットに変換するには、注意深い設計が必要である。定義, フレームワーク, 一般性の観点から, MSL を概念化し, MSL タスクを体系的に解析するための基本的基礎を提供する。その間、mslの概念化と数学者のビジョンの関係を明らかにするとともに、この論文は、数学者のビジョンから解決すべき問題を見るためのaiアプリケーションエンジニアのためのチュートリアルを確立する。 Learning with supervision has achieved remarkable success in numerous artificial intelligence (AI) applications. In the current literature, by referring to the properties of the labels prepared for the training dataset, learning with supervision is categorized as supervised learning (SL) and weakly supervised learning (WSL). SL concerns the situation where the training data set is assigned with ideal (complete, exact and accurate) labels, while WSL concerns the situation where the training data set is assigned with non-ideal (incomplete, inexact or inaccurate) labels. However, various solutions for SL tasks have shown that the given labels are not always easy to learn, and the transformation from the given labels to easy-to-learn targets can significantly affect the performance of the final SL solutions. Without considering the properties of the transformation from the given labels to easy-to-learn targets, the definition of SL conceals some details that can be critical to building the appropriate solutions for specific SL tasks. 
Thus, for engineers in the AI application field, it is desirable to reveal these details systematically. This article attempts to achieve this goal by expanding the categorization of SL and investigating the sub-type moderately supervised learning (MSL), which concerns the situation where the given labels are ideal but, due to the simplicity of the annotation, careful designs are required to transform the given labels into easy-to-learn targets. From the perspectives of the definition, framework and generality, we conceptualize MSL to present a complete fundamental basis for systematically analysing MSL tasks. Meanwhile, by revealing the relation between the conceptualization of MSL and the mathematicians' vision, this paper also serves as a tutorial for AI application engineers on viewing a problem to be solved from the mathematicians' vision.	翻訳日:2023-04-27 01:12:16 公開日:2023-04-25
# マルチエンベディングインタラクションの観点からの知識グラフ埋め込み手法の解析 Analyzing Knowledge Graph Embedding Methods from a Multi-Embedding Interaction Perspective ( http://arxiv.org/abs/1903.11406v4 ) ライセンス: Link先を確認	Hung Nghiep Tran, Atsuhiro Takasu	(参考訳) 知識グラフは知識を表現するための一般的なフォーマットであり、セマンティック検索エンジン、質問応答システム、レコメンデーターシステムに多くの応用がある。実世界の知識グラフは通常不完全であるため、この問題を解決するためにCanonical decomposition/Parallel factorization (CP)、DistMult、ComplExなどの知識グラフ埋め込み法が提案されている。これらの手法は、実体と関係を意味空間への埋め込みベクトルとして表現し、それらの間の関係を予測する。埋め込みベクトル自体は、リッチなセマンティック情報を含み、データ分析のような他のアプリケーションで使用することができる。しかし、これらのモデルと埋め込みベクトル自体のメカニズムは大きく異なり、それらの理解と比較が困難である。このような理解の欠如を考えると、特にCPのような2つのロールベースの埋め込みベクトルを持つ複雑なモデルや、複雑な値の埋め込みベクトルを持つ最先端のComplExモデルに対して、それらを用いることは効果的または正しくない。本稿では,これらのモデルの統合と一般化のための新しいアプローチとして,マルチエンベディングインタラクション機構を提案する。理論的にこのメカニズムを導出し、実験的な分析と比較を行う。また,四元数代数に基づく新しいマルチエンベディングモデルを提案し,人気のあるベンチマークを用いて有望な結果が得られることを示す。ソースコードはgithubのhttps://github.com/tranhungnghiep/analyzekgeで入手できる。 Knowledge graph is a popular format for representing knowledge, with many applications to semantic search engines, question-answering systems, and recommender systems. Real-world knowledge graphs are usually incomplete, so knowledge graph embedding methods, such as Canonical decomposition/Parallel factorization (CP), DistMult, and ComplEx, have been proposed to address this issue. These methods represent entities and relations as embedding vectors in semantic space and predict the links between them. The embedding vectors themselves contain rich semantic information and can be used in other applications such as data analysis. However, mechanisms in these models and the embedding vectors themselves vary greatly, making it difficult to understand and compare them. Given this lack of understanding, we risk using them ineffectively or incorrectly, particularly for complicated models, such as CP, with two role-based embedding vectors, or the state-of-the-art ComplEx model, with complex-valued embedding vectors. 
In this paper, we propose a multi-embedding interaction mechanism as a new approach to uniting and generalizing these models. We derive them theoretically via this mechanism and provide empirical analyses and comparisons between them. We also propose a new multi-embedding model based on quaternion algebra and show that it achieves promising results using popular benchmarks. Source code is available on GitHub at https://github.com/tranhungnghiep/AnalyzeKGE.	翻訳日:2023-04-27 01:11:47 公開日:2023-04-25
# アンサンブルサンプリング Ensemble Sampling ( http://arxiv.org/abs/1705.07347v4 ) ライセンス: Link先を確認	Xiuyuan Lu, Benjamin Van Roy	(参考訳) トンプソンサンプリングは、幅広いオンライン決定問題に対して効果的なヒューリスティックとして現れた。その基本的な形式では、アルゴリズムはモデル上の後方分布から計算とサンプリングを必要とし、単純な特別な場合のみ扱いやすい。本稿では,ニューラルネットワークのような複雑なモデルに直面した場合でもトラクタビリティを維持しつつ,トンプソンサンプリングを近似するアンサンブルサンプリングを開発する。アンサンブルサンプリングは、トンプソンサンプリングが実現可能なアプリケーションの範囲を劇的に拡大する。我々は、このアプローチを支持する理論的基盤を確立し、さらなる洞察を提供する計算結果を示す。 Thompson sampling has emerged as an effective heuristic for a broad range of online decision problems. In its basic form, the algorithm requires computing and sampling from a posterior distribution over models, which is tractable only for simple special cases. This paper develops ensemble sampling, which aims to approximate Thompson sampling while maintaining tractability even in the face of complex models such as neural networks. Ensemble sampling dramatically expands on the range of applications for which Thompson sampling is viable. We establish a theoretical basis that supports the approach and present computational results that offer further insight.	翻訳日:2023-04-27 01:11:23 公開日:2023-04-25
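The idea in the abstract above can be sketched for the simplest case, a K-armed Gaussian bandit. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `ensemble_sampling`, the arm means, the ensemble size, and the prior and noise scales are all hypothetical choices made for the example. Each ensemble member starts from its own prior draw and is updated on independently perturbed rewards; at every step one member is picked uniformly at random and the agent acts greedily with respect to it.

```python
import random


def ensemble_sampling(true_means, steps=2000, n_models=30,
                      prior_sd=2.0, noise_sd=1.0, seed=0):
    """Ensemble sampling on a K-armed Gaussian bandit (illustrative sketch)."""
    rng = random.Random(seed)
    k = len(true_means)
    # Each ensemble member starts from its own draw from the N(0, prior_sd^2) prior.
    prior_draws = [[rng.gauss(0.0, prior_sd) for _ in range(k)]
                   for _ in range(n_models)]
    # Per-model, per-arm running sum of independently perturbed rewards.
    perturbed_sums = [[0.0] * k for _ in range(n_models)]
    counts = [0] * k

    def estimate(m, a):
        # Conjugate Gaussian update applied to model m's perturbed data.
        precision = 1.0 / prior_sd ** 2 + counts[a] / noise_sd ** 2
        weighted = (prior_draws[m][a] / prior_sd ** 2
                    + perturbed_sums[m][a] / noise_sd ** 2)
        return weighted / precision

    for _ in range(steps):
        m = rng.randrange(n_models)  # sample one model uniformly
        arm = max(range(k), key=lambda a: estimate(m, a))  # act greedily on it
        reward = rng.gauss(true_means[arm], noise_sd)
        # Every model sees the reward plus its own independent perturbation.
        for j in range(n_models):
            perturbed_sums[j][arm] += reward + rng.gauss(0.0, noise_sd)
        counts[arm] += 1
    return counts


counts = ensemble_sampling([0.0, 0.5, 2.0])
print(counts)  # number of pulls per arm
```

The per-step cost is one update per ensemble member, with no posterior sampling; in this toy conjugate setting exact Thompson sampling would also be cheap, so the sketch only shows the mechanics.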
# ガウス過程回帰における最大確率推定は不適切である Maximum Likelihood Estimation in Gaussian Process Regression is Ill-Posed ( http://arxiv.org/abs/2203.09179v3 ) ライセンス: Link先を確認	Toni Karvonen and Chris J. Oates	(参考訳) ガウス過程の回帰は、機械学習と統計学の無数の学術的および工業的応用を基盤としており、最大推定値は、共分散カーネルの適切なパラメータを選択するために日常的に使用される。しかしながら、回帰モデルの予測がデータの小さな摂動に影響を受けないような場合、最大帰納推定が適切に設定される状況を確立することは、まだ未解決の問題である。本稿は,データ内の予測分布がヘリンガー距離に関してリプシッツではない場合に,最大確率推定器が適切に適用されないシナリオを明らかにする。これらの障害ケースは、最大確率で長さスケールパラメータを推定する定常共分散関数を持つガウス過程において、ノイズのないデータ設定で発生する。最大推定の失敗はガウス過程の民俗学の一部ではあるが、これらの厳密な理論的な結果がこの種の最初のものと思われる。これらの負の結果は、ガウス過程モデルのトレーニングに最大推定値を用いた場合、ケース・バイ・ケース・ケース・バイ・ケースにおいて、適切性を評価する必要があることを示唆している。 Gaussian process regression underpins countless academic and industrial applications of machine learning and statistics, with maximum likelihood estimation routinely used to select appropriate parameters for the covariance kernel. However, it remains an open problem to establish the circumstances in which maximum likelihood estimation is well-posed, that is, when the predictions of the regression model are insensitive to small perturbations of the data. This article identifies scenarios where the maximum likelihood estimator fails to be well-posed, in that the predictive distributions are not Lipschitz in the data with respect to the Hellinger distance. These failure cases occur in the noiseless data setting, for any Gaussian process with a stationary covariance function whose lengthscale parameter is estimated using maximum likelihood. Although the failure of maximum likelihood estimation is part of Gaussian process folklore, these rigorous theoretical results appear to be the first of their kind. The implication of these negative results is that well-posedness may need to be assessed post-hoc, on a case-by-case basis, when maximum likelihood estimation is used to train a Gaussian process model.	翻訳日:2023-04-27 00:21:40 公開日:2023-04-25
# 強化学習におけるスキル伝達の事前, 階層, 情報非対称性 Priors, Hierarchy, and Information Asymmetry for Skill Transfer in Reinforcement Learning ( http://arxiv.org/abs/2201.08115v2 ) ライセンス: Link先を確認	Sasha Salter, Kristian Hartikainen, Walter Goodwin, Ingmar Posner	(参考訳) 過去の経験から行動を発見し、それらを新しいタスクに移す能力は、現実世界でサンプル効率よく行動するインテリジェントエージェントの目印である。具体化された強化学習者を同じ能力で獲得することは、ロボット工学への展開を成功させる上で重要である。階層的およびKL規則化された強化学習は、ここでは個別に約束するが、おそらくハイブリッドアプローチはそれぞれの利点を組み合わせることができるだろう。これらの分野の鍵となるのは、学習するスキルをバイアスするために、アーキテクチャモジュール間で情報非対称性を使用することである。非対称性の選択は転送可能性に大きな影響を及ぼすが、既存の方法は主にドメインに依存しない、潜在的に最適でない方法での直観に基づく。本稿では,情報非対称性によって制御された逐次的タスク間のスキルの重要表現性と伝達可能性のトレードオフを理論的かつ実証的に示す。この知見を生かして,階層的kl正規化手法である表現可能・伝達可能スキル(apes)に対する注意的優先事項について紹介する。既存のアプローチとは異なり、APESはデータ駆動の領域依存的な方法で、表現性-伝達可能性定理に基づいて非対称性の選択を自動化する。ロボットブロックの積み重ねなど、様々なレベルの外挿と疎結合の複雑な転写領域に対する実験は、APESが以前の手法を大幅に上回って、正しい非対称選択の臨界度を示す。 The ability to discover behaviours from past experience and transfer them to new tasks is a hallmark of intelligent agents acting sample-efficiently in the real world. Equipping embodied reinforcement learners with the same ability may be crucial for their successful deployment in robotics. While hierarchical and KL-regularized reinforcement learning individually hold promise here, arguably a hybrid approach could combine their respective benefits. Key to these fields is the use of information asymmetry across architectural modules to bias which skills are learnt. While asymmetry choice has a large influence on transferability, existing methods base their choice primarily on intuition in a domain-independent, potentially sub-optimal, manner. In this paper, we theoretically and empirically show the crucial expressivity-transferability trade-off of skills across sequential tasks, controlled by information asymmetry. Given this insight, we introduce Attentive Priors for Expressive and Transferable Skills (APES), a hierarchical KL-regularized method, heavily benefiting from both priors and hierarchy. 
Unlike existing approaches, APES automates the choice of asymmetry by learning it in a data-driven, domain-dependent way based on our expressivity-transferability theorems. Experiments over complex transfer domains with varying levels of extrapolation and sparsity, such as robot block stacking, demonstrate the criticality of the correct asymmetric choice, with APES drastically outperforming previous methods.	翻訳日:2023-04-27 00:20:42 公開日:2023-04-25
# 適応重み付きマルチビュークラスタリング Adaptive Weighted Multi-View Clustering ( http://arxiv.org/abs/2110.13240v2 ) ライセンス: Link先を確認	Shuo Shuo Liu and Lin Lin	(参考訳) マルチビューデータの学習は機械学習研究において新たな問題であり、非負行列分解(NMF)は複数のビューから情報を統合するための一般的な次元性還元法である。これらの見解はしばしばコンセンサスだけでなく補完的な情報を提供する。しかし、多くのマルチビューnmfアルゴリズムは、各ビューに等しい重みを割り当てたり、ラインサーチを通じて経験的に重みを調整したりする。本稿では,重み付きマルチビューNMF(WM-NMF)アルゴリズムを提案する。特に,各視点の情報内容の定量化のために,視点固有の重みと観測固有の再構築重みの両方を学ぶことを目的とした。導入された重み付けスキームは不要なビューの悪影響を緩和し、より小さいビューとより大きなビューを割り当てることで重要なビューのポジティブな効果を増大させることができる。提案手法の有効性と利点は,既存のアルゴリズムと比較して,クラスタリング性能の向上とノイズデータへの対処について検証した。 Learning multi-view data is an emerging problem in machine learning research, and nonnegative matrix factorization (NMF) is a popular dimensionality-reduction method for integrating information from multiple views. These views often provide not only consensus but also complementary information. However, most multi-view NMF algorithms assign equal weight to each view or tune the weight via line search empirically, which can be infeasible without any prior knowledge of the views or computationally expensive. In this paper, we propose a weighted multi-view NMF (WM-NMF) algorithm. In particular, we aim to address the critical technical gap, which is to learn both view-specific weight and observation-specific reconstruction weight to quantify each view's information content. The introduced weighting scheme can alleviate unnecessary views' adverse effects and enlarge the positive effects of the important views by assigning smaller and larger weights, respectively. Experimental results confirm the effectiveness and advantages of the proposed algorithm in terms of achieving better clustering performance and dealing with the noisy data compared to the existing algorithms.	翻訳日:2023-04-27 00:20:21 公開日:2023-04-25
# エージェントにマップの仕方を教える:マルチオブジェクトナビゲーションのための空間推論 Teaching Agents how to Map: Spatial Reasoning for Multi-Object Navigation ( http://arxiv.org/abs/2107.06011v4 ) ライセンス: Link先を確認	Pierre Marza, Laetitia Matignon, Olivier Simonin, Christian Wolf	(参考訳) 視覚ナビゲーションの文脈では,エージェントがその観測履歴を考慮した場所で活用し,既知の目標を効率的に達成するためには,新たな環境をマップする能力が必要である。この能力は、エージェントが空間的関係や規則性を知覚し、対象の特性を発見できる空間的推論と関連付けられる。最近の研究は、ディープニューラルネットワークによってパラメータ化され、強化学習(RL)でトレーニングされた学習可能なポリシーを導入している。古典的なRLセットアップでは、報酬のみから、空間的にマッピングと推論の能力がエンドツーエンドで学習される。そこで本研究では,目標達成目標達成のために訓練されたエージェントにおける空間認識能力の出現を指向した補助的タスクの形で補足的監視を導入する。与えられた位置におけるエージェントと到達目標の間の空間的関係を定量化する指標を推定する学習は、多目的ナビゲーション設定において高い正の影響を及ぼすことを示す。提案手法は,環境の明示的あるいは暗黙的な表現を構築する,異なるベースラインエージェントの性能を著しく向上させる。提案された補助的損失で訓練された文献の学習ベースのエージェントは、CVPR 2021 Embodied AI Workshopの一部であるMulti-Object Navigation Challengeに優勝した。 In the context of visual navigation, the capacity to map a novel environment is necessary for an agent to exploit its observation history in the considered place and efficiently reach known goals. This ability can be associated with spatial reasoning, where an agent is able to perceive spatial relationships and regularities, and discover object characteristics. Recent work introduces learnable policies parametrized by deep neural networks and trained with Reinforcement Learning (RL). In classical RL setups, the capacity to map and reason spatially is learned end-to-end, from reward alone. In this setting, we introduce supplementary supervision in the form of auxiliary tasks designed to favor the emergence of spatial perception capabilities in agents trained for a goal-reaching downstream objective. We show that learning to estimate metrics quantifying the spatial relationships between an agent at a given location and a goal to reach has a high positive impact in Multi-Object Navigation settings. 
Our method significantly improves the performance of different baseline agents that build either an explicit or an implicit representation of the environment, even matching the performance of incomparable oracle agents taking ground-truth maps as input. A learning-based agent from the literature trained with the proposed auxiliary losses was the winning entry to the Multi-Object Navigation Challenge, part of the CVPR 2021 Embodied AI Workshop.	翻訳日:2023-04-27 00:20:03 公開日:2023-04-25
# Fed-FSNet:ファジィ合成ネットワークによる非I.I.D.フェデレーション学習の緩和 Fed-FSNet: Mitigating Non-I.I.D. Federated Learning via Fuzzy Synthesizing Network ( http://arxiv.org/abs/2208.12044v2 ) ライセンス: Link先を確認	Jingcai Guo, Song Guo, Jie Zhang, Ziming Liu	(参考訳) フェデレーテッド・ラーニング(FL)は、最近、有望なプライバシー保護分散機械学習フレームワークとして登場した。エッジデバイス上でローカルに分散トレーニングを実行し、クラウドサーバに生のデータ共有を集中せずにグローバルモデルに集約することで、共有グローバルモデルを共同学習することを目指している。しかし、エッジデバイス間の大きなローカルデータ不均一性(Non-I.D.データ)のため、FLはローカルデータセットによりシフトした勾配を生成できるグローバルモデルを容易に得ることができ、それによってモデルの性能が低下したり、トレーニング中に非収束に苦しむことさえできる。本稿では,Fed-FSNet(Fed-FSNet)と呼ばれる新しいFLトレーニングフレームワークを提案する。具体的には、クラウドサーバにエッジに依存しない隠れモデルを保持し、グローバルモデルの方向対応インバージョンを推定する。隠れたモデルは、グローバルモデルのみに条件付きI.I.D.データサンプル(サンプル特徴)をファジィに合成し、エッジデバイスで共有することで、FLトレーニングを高速でよりよく収束させる。さらに、この合成プロセスは、ローカルモデルのパラメータや更新情報へのアクセスや、個々のローカルモデル出力の分析を伴わないため、FLのプライバシを保証できる。いくつかのFLベンチマークによる実験結果から,本手法は非I.D.問題を大幅に軽減し,他の代表手法よりも優れた性能が得られることが示された。 Federated learning (FL) has emerged as a promising privacy-preserving distributed machine learning framework recently. It aims at collaboratively learning a shared global model by performing distributed training locally on edge devices and aggregating local models into a global one without centralized raw data sharing in the cloud server. However, due to the large local data heterogeneities (Non-I.I.D. data) across edge devices, the FL may easily obtain a global model that can produce more shifted gradients on local datasets, thereby degrading the model performance or even suffering from the non-convergence during training. In this paper, we propose a novel FL training framework, dubbed Fed-FSNet, using a properly designed Fuzzy Synthesizing Network (FSNet) to mitigate the Non-I.I.D. FL at-the-source. Concretely, we maintain an edge-agnostic hidden model in the cloud server to estimate a less-accurate while direction-aware inversion of the global model. The hidden model can then fuzzily synthesize several mimic I.I.D. 
data samples (sample features) conditioned only on the global model, which can be shared by edge devices to help the FL training converge faster and better. Moreover, since the synthesizing process involves neither access to the parameters/updates of local models nor analysis of individual local model outputs, our framework can still ensure the privacy of FL. Experimental results on several FL benchmarks demonstrate that our method can significantly mitigate the Non-I.I.D. issue and obtain better performance than other representative methods.	翻訳日:2023-04-27 00:12:21 公開日:2023-04-25
# 事前知識を用いた多目的パラメータ最適化のための効率的なユーティリティ関数学習 Efficient Utility Function Learning for Multi-Objective Parameter Optimization with Prior Knowledge ( http://arxiv.org/abs/2208.10300v2 ) ライセンス: Link先を確認	Farha A. Khan, Jörg P. Dietrich, Christian Wirth	(参考訳) マルチオブジェクト最適化における現在の最先端は、与えられたユーティリティ関数を仮定し、インタラクティブにユーティリティ関数を学習するか、または完全なParetoフロントを決定しようとする。しかしながら、実世界の問題における結果誘発は、しばしば暗黙的かつ明示的な専門家の知識に基づいているため、ユーティリティ関数の定義が困難である。これを軽減するため、好み学習によって専門家の知識を用いて、オフラインでユーティリティ関数を学習する。他の作品とは対照的に、結果の選好(pairwise)だけでなく、ユーティリティ関数空間に関する粗い情報も使用しています。これにより、特に非常に少ない結果を使用する場合、ユーティリティ関数の推定を改善することができる。さらに,ユーティリティ関数学習タスクにおける不確かさをモデル化し,最適化チェーン全体を通して伝達する。ユーティリティ関数を学習する手法は,高品質な結果をもたらす一方で,専門家の関与を繰り返す必要をなくす。本稿では,提案手法のサンプル効率と品質向上を4つの領域で示し,特にサーロゲートユーティリティ関数が真のエキスパートユーティリティ関数を正確に捉えることができない場合について述べる。また, 良好な結果を得るには, 誘導不確実性を検討し, 実世界領域で一般的な問題であるバイアスドサンプルの効果を分析することが重要であることを示した。 The current state-of-the-art in multi-objective optimization assumes either a given utility function, learns a utility function interactively or tries to determine the complete Pareto front, requiring a post elicitation of the preferred result. However, result elicitation in real world problems is often based on implicit and explicit expert knowledge, making it difficult to define a utility function, whereas interactive learning or post elicitation requires repeated and expensive expert involvement. To mitigate this, we learn a utility function offline, using expert knowledge by means of preference learning. In contrast to other works, we do not only use (pairwise) result preferences, but also coarse information about the utility function space. This enables us to improve the utility function estimate, especially when using very few results. Additionally, we model the occurring uncertainties in the utility function learning task and propagate them through the whole optimization chain. Our method to learn a utility function eliminates the need of repeated expert involvement while still leading to high-quality results. 
We show the sample efficiency and quality gains of the proposed method in 4 domains, especially in cases where the surrogate utility function is not able to exactly capture the true expert utility function. We also show that to obtain good results, it is important to consider the induced uncertainties and analyze the effect of biased samples, which is a common problem in real world domains.	翻訳日:2023-04-27 00:11:53 公開日:2023-04-25
# ディープニューラルネットワークを用いた等方関数予測 Isoform Function Prediction Using a Deep Neural Network ( http://arxiv.org/abs/2208.03325v3 ) ライセンス: Link先を確認	Sara Ghazanfari, Ali Rasteh, Seyed Abolfazl Motahari, Mahdieh Soleymani Baghshah	(参考訳) アイソフォームは、オルタナティブスプライシングと呼ばれる現象において同じ遺伝子部位から生成されるmRNAである。ヒトマルチエクソン遺伝子の95%以上が代替スプライシングを受けていることが研究で示されている。 mRNA配列にはほとんど変化はないが、細胞機能や調節に系統的な影響を及ぼす可能性がある。遺伝子のアイソフォームは異なる、あるいは対照的な機能を持っていると広く報告されている。多くの研究は、代替スプライシングが人間の健康と病気に重要な役割を果たすことを示した。幅広い遺伝子機能研究にもかかわらず、アイソフォームの機能についてはほとんど情報がない。近年,遺伝子機能と遺伝子発現プロファイルを用いてアイソフォーム関数を予測するために,複数インスタンス学習に基づく計算手法が提案されている。しかし、ラベル付きトレーニングデータがないため、それらのパフォーマンスは望ましいものではない。さらに、条件ランダム場(CRF)のような確率モデルを用いてアイソフォームの関係をモデル化している。本研究は, アイソフォーム配列, 発現プロファイル, 遺伝子オントロジーグラフなどのデータと貴重な情報を全て利用し, ディープニューラルネットワークに基づく包括的モデルを提案する。 UniProt Gene Ontology (GO)データベースは、遺伝子機能の標準参照として使用される。 NCBI RefSeqデータベースは遺伝子およびアイソフォーム配列の抽出に使用され、NCBI SRAデータベースは発現プロファイルデータに使用される。予測精度の測定には、曲線下の受信機動作特性領域(roc auc)や曲線下の精度リコール(pr auc)などの指標を用いる。 Isoforms are mRNAs produced from the same gene site in the phenomenon called Alternative Splicing. Studies have shown that more than 95% of human multi-exon genes have undergone alternative splicing. Although there are few changes in mRNA sequence, they may have a systematic effect on cell function and regulation. It is widely reported that isoforms of a gene have distinct or even contrasting functions. Most studies have shown that alternative splicing plays a significant role in human health and disease. Despite the wide range of gene function studies, there is little information about isoforms' functionalities. Recently, some computational methods based on Multiple Instance Learning have been proposed to predict isoform function using gene function and gene expression profile. However, their performance is not desirable due to the lack of labeled training data. In addition, probabilistic models such as Conditional Random Field (CRF) have been used to model the relation between isoforms. 
This project uses all available data and information, such as isoform sequences, expression profiles, and gene ontology graphs, and proposes a comprehensive model based on deep neural networks. The UniProt Gene Ontology (GO) database is used as a standard reference for gene functions. The NCBI RefSeq database is used for extracting gene and isoform sequences, and the NCBI SRA database is used for expression profile data. Metrics such as the area under the receiver operating characteristic curve (ROC AUC) and the area under the precision-recall curve (PR AUC) are used to measure the prediction accuracy.	翻訳日:2023-04-27 00:11:30 公開日:2023-04-25
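The two evaluation metrics named in the abstract above can be computed from first principles. The sketch below is a generic illustration, not the authors' evaluation code: the function names and the toy labels and scores are hypothetical, and average precision is used as one common estimator of the area under the precision-recall curve (other interpolation schemes give slightly different values).

```python
def roc_auc(labels, scores):
    """Probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Step-wise area under the precision-recall curve (average precision)."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision at each recall step
    if hits == 0:
        raise ValueError("need at least one positive example")
    return total / hits


# Hypothetical toy data: 1 = the isoform has the function, 0 = it does not.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(labels, scores)
ap = average_precision(labels, scores)
print(auc, ap)
```

Function prediction is typically highly imbalanced (few positives per GO term), which is the usual reason for reporting a precision-recall summary alongside ROC AUC.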
# イベントレベルの視覚的質問応答に対するクロスモーダル因果関係推論 Cross-Modal Causal Relational Reasoning for Event-Level Visual Question Answering ( http://arxiv.org/abs/2207.12647v7 ) ライセンス: Link先を確認	Yang Liu, Guanbin Li, Liang Lin	(参考訳) 既存の視覚的質問応答手法は、しばしばクロスモーダルなスプリアス相関や、ビデオにまたがる事象の時間性、因果性、ダイナミクスを捉えるのに失敗するイベントレベルの推論プロセスを単純化してしまう。本稿では,イベントレベルの視覚的質問応答のタスクに対処するため,クロスモーダル因果関係推論のためのフレームワークを提案する。特に、視覚的および言語的モダリティにまたがる因果構造を発見するために、一連の因果的介入操作が導入された。私たちのフレームワークは、Cross-Modal Causal RelatIonal Reasoning (CMCIR)と呼ばれ、3つのモジュールを含んでいる。一正面的及び裏的因果的介入による視覚的及び言語的スプリアス相関を共同的に区別する因果性認識視覚言語的推論(cvlr)モジュール二視覚的・言語的意味論のきめ細かい相互作用を捉えるための時空間変換器(STT)モジュール三グローバル意味認識視覚言語表現を適応的に学習するための視覚言語機能融合(vlff)モジュール 4つのイベントレベルのデータセットに対する大規模な実験は、視覚言語学的因果構造を発見し、堅牢なイベントレベルの視覚的質問応答を実現する上で、CMCIRの優位性を示している。データセット、コード、モデルはhttps://github.com/HCPLab-SYSU/CMCIRで公開されている。 Existing visual question answering methods often suffer from cross-modal spurious correlations and oversimplified event-level reasoning processes that fail to capture event temporality, causality, and dynamics spanning over the video. In this work, to address the task of event-level visual question answering, we propose a framework for cross-modal causal relational reasoning. In particular, a set of causal intervention operations is introduced to discover the underlying causal structures across visual and linguistic modalities. Our framework, named Cross-Modal Causal RelatIonal Reasoning (CMCIR), involves three modules: i) Causality-aware Visual-Linguistic Reasoning (CVLR) module for collaboratively disentangling the visual and linguistic spurious correlations via front-door and back-door causal interventions; ii) Spatial-Temporal Transformer (STT) module for capturing the fine-grained interactions between visual and linguistic semantics; iii) Visual-Linguistic Feature Fusion (VLFF) module for learning the global semantic-aware visual-linguistic representations adaptively. 
Extensive experiments on four event-level datasets demonstrate the superiority of our CMCIR in discovering visual-linguistic causal structures and achieving robust event-level visual question answering. The datasets, code, and models are available at https://github.com/HCPLab-SYSU/CMCIR.	翻訳日:2023-04-27 00:11:07 公開日:2023-04-25
# バイオメトリックブレンダー:生体特徴空間を模倣する超高次元多クラス合成データジェネレータ BiometricBlender: Ultra-high dimensional, multi-class synthetic data generator to imitate biometric feature space ( http://arxiv.org/abs/2206.10747v2 ) ライセンス: Link先を確認	Marcell Stippinger, Dávid Hanák, Marcell T. Kurbucz, Gergely Hanczár, Olivér M. Törteli, Zoltán Somogyvári	(参考訳) 自由に利用可能な(実物または合成物)高次元または超高次元のマルチクラスデータセットの欠如は、特徴スクリーニングの研究、特にバイオメトリックスの分野では、このようなデータセットの使用が一般的である。本稿では,超高次元多クラス合成データ生成器であるbiometricblenderと呼ばれるpythonパッケージについて報告する。データ生成プロセスにおいて、ブレンドされた特徴の全体的な有用性と相互関係をユーザによって制御することができ、合成特徴空間は実際のバイオメトリックデータセットの重要な特性を模倣することができる。 The lack of freely available (real-life or synthetic) high or ultra-high dimensional, multi-class datasets may hamper the rapidly growing research on feature screening, especially in the field of biometrics, where the usage of such datasets is common. This paper reports a Python package called BiometricBlender, which is an ultra-high dimensional, multi-class synthetic data generator to benchmark a wide range of feature screening methods. During the data generation process, the overall usefulness and the intercorrelations of blended features can be controlled by the user, thus the synthetic feature space is able to imitate the key properties of a real biometric dataset.	翻訳日:2023-04-27 00:10:43 公開日:2023-04-25
# Twitter会話スレッドのヘイトインテンシティ予測 Predicting Hate Intensity of Twitter Conversation Threads ( http://arxiv.org/abs/2206.08406v3 ) ライセンス: Link先を確認	Qing Meng and Tharun Suresh, Roy Ka-Wei Lee, Tanmoy Chakraborty	(参考訳) ツイートは、オンラインのソーシャルメディアにおける最も簡潔なコミュニケーション形態であり、一つのツイートが会話の会話を作り、破壊する可能性を秘めている。オンラインヘイトスピーチはかつてないほどアクセスしやすく、その拡散を抑制することは、ソーシャルメディア企業やユーザーにとって、コンジェニアルコミュニケーションにとって最も重要である。最近の少数の研究は、ツイートスレッド/コンテキストに関わらず、個々のツイートを分類することに重点を置いている。ヘイトスピーチを抑制する古典的なアプローチの1つは、ヘイトスピーチの投稿後にリアクティブ戦略を採用することである。ポストのファクト戦略は、ヘイトスピーチを自力で扇動する可能性を示さない微妙なポストを無視する結果となり、ポストの回答で続く議論に終止符を打つ可能性がある。本稿では,将来,ツイートが応答チェーンを通じてもたらす憎悪の強さを予測することを目的としたDRAGNET++を提案する。ツイートスレッドのセマンティックな構造と伝播構造を利用して、続く各ツイートにおけるヘイト強度の低下につながるコンテキスト情報を最大化する。反人種差別には、米国の政治や新型コロナウイルス(covid-19)背景における人種差別的発言に関するソーシャルメディア談話の返信ツイート、新型コロナウイルス(covid-19)のパンデミック中の4000万ツイートのデータセット、新型コロナウイルス(covid-19)のパンデミック時の反asian行動に基づくtwitterデータセットが含まれる。キュレートされたデータセットはすべて、ツイートスレッドの構造グラフ情報で構成されている。 DRAGNET++は最先端のすべてのベースラインを大幅に上回ることを示す。これは、Person相関係数の11%のマージンで最高のベースラインを上回り、他の2つのデータセットで同様のパフォーマンスを持つ反ラチズムデータセットのRMSEでは25%低下する。 Tweets are the most concise form of communication in online social media, wherein a single tweet has the potential to make or break the discourse of the conversation. Online hate speech is more accessible than ever, and stifling its propagation is of utmost importance for social media companies and users for congenial communication. Most of the research barring a recent few has focused on classifying an individual tweet regardless of the tweet thread/context leading up to that point. One of the classical approaches to curb hate speech is to adopt a reactive strategy after the hate speech postage. The ex-post facto strategy results in neglecting subtle posts that do not show the potential to instigate hate speech on their own but may portend in the subsequent discussion ensuing in the post's replies. In this paper, we propose DRAGNET++, which aims to predict the intensity of hatred that a tweet can bring in through its reply chain in the future. 
It uses the semantic and propagating structure of the tweet threads to maximize the contextual information leading up to and the fall of hate intensity at each subsequent tweet. We explore three publicly available Twitter datasets -- Anti-Racism contains the reply tweets of a collection of social media discourse on racist remarks against the background of US politics and COVID-19; Anti-Social presents a dataset of 40 million tweets amidst the COVID-19 pandemic on anti-social behaviours; and Anti-Asian presents Twitter datasets collated based on anti-Asian behaviours during the COVID-19 pandemic. All the curated datasets consist of structural graph information of the tweet threads. We show that DRAGNET++ outperforms all the state-of-the-art baselines significantly. It beats the best baseline by an 11% margin on the Pearson correlation coefficient and by a 25% decrease in RMSE on the Anti-Racism dataset, with similar performance on the other two datasets.	翻訳日:2023-04-27 00:10:31 公開日:2023-04-25
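The Pearson correlation coefficient and RMSE reported above measure different things: the first asks whether predicted and true hate-intensity profiles move together, the second how far apart they are in absolute terms. The sketch below shows both on hypothetical numbers; the function names and the toy intensity values are assumptions for illustration and are not taken from the paper.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def rmse(predicted, target):
    """Root mean squared error between predictions and targets."""
    squared = [(p - t) ** 2 for p, t in zip(predicted, target)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical hate-intensity profile along a reply chain, and a prediction
# that has the right shape but twice the scale.
true_intensity = [1.0, 2.0, 3.0]
predicted_intensity = [2.0, 4.0, 6.0]
r = pearson(true_intensity, predicted_intensity)
err = rmse(predicted_intensity, true_intensity)
print(r, err)
```

Here the correlation is perfect while the RMSE is large, which is why a model is usually judged on both: a higher correlation and a lower error are independent improvements.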
# 自己教師付き視覚前訓練のためのマスク周波数モデリング Masked Frequency Modeling for Self-Supervised Visual Pre-Training ( http://arxiv.org/abs/2206.07706v2 ) ライセンス: Link先を確認	Jiahao Xie, Wei Li, Xiaohang Zhan, Ziwei Liu, Yew Soon Ong, Chen Change Loy	(参考訳) MFM(Masked Frequency Modeling)は、視覚モデルの自己教師付き事前学習のための統合周波数領域に基づくアプローチである。本稿では,空間領域の入力埋め込みにマスクトークンをランダムに挿入する代わりに,その視点を周波数領域にシフトする。具体的には、まずMFMが入力画像の周波数成分の一部をマスクし、周波数スペクトルの欠落周波数を予測する。我々の重要な洞察は、周波数領域におけるマスキング成分の予測は、空間領域におけるマスキングパッチの予測よりも、空間領域におけるマスキングパターンを明らかにすることがより理想的なことである。その結果,マスク・アンド・予測戦略の適切な構成では,高周波数成分の構造情報と低周波数成分間の低レベル統計の両方が優れた表現の学習に有用であることが示唆された。 MFMは初めて、ViTとCNNの両方で、単純な非シームフレームワークが、以下のものを使って意味のある表現を学習できることを示した。 (i)余分なデータ (ii)余分なモデル (iii)マスクトークン。画像分類と意味セグメンテーションの実験結果およびいくつかのロバスト性ベンチマークは、最近のマスク画像モデリングアプローチと比較して、mfmの競争力と高度なロバスト性を示している。さらに,従来の画像復元作業の有効性を,統合周波数の観点から総合的に検討し,MFM手法との興味深い関係を明らかにする。 We present Masked Frequency Modeling (MFM), a unified frequency-domain-based approach for self-supervised pre-training of visual models. Instead of randomly inserting mask tokens to the input embeddings in the spatial domain, in this paper, we shift the perspective to the frequency domain. Specifically, MFM first masks out a portion of frequency components of the input image and then predicts the missing frequencies on the frequency spectrum. Our key insight is that predicting masked components in the frequency domain is more ideal to reveal underlying image patterns rather than predicting masked patches in the spatial domain, due to the heavy spatial redundancy. Our findings suggest that with the right configuration of mask-and-predict strategy, both the structural information within high-frequency components and the low-level statistics among low-frequency counterparts are useful in learning good representations. For the first time, MFM demonstrates that, for both ViT and CNN, a simple non-Siamese framework can learn meaningful representations even using none of the following: (i) extra data, (ii) extra model, (iii) mask token. 
Experimental results on image classification and semantic segmentation, as well as several robustness benchmarks, show the competitive performance and advanced robustness of MFM compared with recent masked image modeling approaches. Furthermore, we also comprehensively investigate the effectiveness of classical image restoration tasks for representation learning from a unified frequency perspective and reveal their intriguing relations with our MFM approach.	翻訳日:2023-04-27 00:09:55 公開日:2023-04-25
# 分散多重ネットワークモデルにおけるスパース部分空間クラスタリング Sparse Subspace Clustering in Diverse Multiplex Network Model ( http://arxiv.org/abs/2206.07602v2 ) ライセンス: Link先を確認	Majid Noroozi and Marianna Pensky	(参考訳) 本論文は,Pensky と Wang (2021) で導入された多元的多重化(DIMPLE)ネットワークモデルについて考察する。さらに、すべての層を同じコミュニティ構造を持つグループに分割することができるが、同じグループの層はブロック接続確率の異なる行列を持つかもしれない。 DIMPLEモデルは、すべての層で同じコミュニティ構造を持つ多層ネットワークを研究する複数の論文と、同じグループの層がブロック接続確率の同じ行列を持つMixture Multilayer Stochastic Block Model (MMLSBM)を一般化する。Pensky と Wang (2021) は隣接テンソルのプロキシにスペクトルクラスタリングを適用したが、本論文は同一のコミュニティ構造を持つ層群を識別するためにスパース部分空間クラスタリング (SSC) を用いる。穏やかな条件下では、後者は層間クラスタリングに強い一貫性をもたらす。さらに、SSC は Pensky と Wang (2021) の方法論よりもはるかに大きなネットワークを扱うことができ、並列コンピューティングの応用に完全に適している。 The paper considers the DIverse MultiPLEx (DIMPLE) network model, introduced in Pensky and Wang (2021), where all layers of the network have the same collection of nodes and are equipped with the Stochastic Block Models. In addition, all layers can be partitioned into groups with the same community structures, although the layers in the same group may have different matrices of block connection probabilities. The DIMPLE model generalizes a multitude of papers that study multilayer networks with the same community structures in all layers, as well as the Mixture Multilayer Stochastic Block Model (MMLSBM), where the layers in the same group have identical matrices of block connection probabilities. While Pensky and Wang (2021) applied spectral clustering to the proxy of the adjacency tensor, the present paper uses Sparse Subspace Clustering (SSC) for identifying groups of layers with identical community structures. Under mild conditions, the latter leads to strongly consistent between-layer clustering. In addition, SSC can handle much larger networks than the methodology of Pensky and Wang (2021), and is well suited to parallel computing.	翻訳日:2023-04-27 00:09:33 公開日:2023-04-25
# nash平衡としての対称一般化固有値問題 The Symmetric Generalized Eigenvalue Problem as a Nash Equilibrium ( http://arxiv.org/abs/2206.04993v2 ) ライセンス: Link先を確認	Ian Gemp, Charlie Chen, Brian McWilliams	(参考訳) 対称一般化固有値問題(SGEP)は、数値線型代数の基本概念である。標準相関解析、独立成分分析、部分最小二乗法、線形判別分析、主成分など、多くの古典的機械学習問題の解を捉えている。それにもかかわらず、ほとんどの一般的なソルバは、ストリーミングデータセット(ミニバッチ)を扱う場合、制限的に高価であり、研究は、特定の問題インスタンスに対する効率的なソリューションを見つけることに集中している。本研究では,nash 平衡が一般化固有ベクトルの集合であるトップ-$k$ sgep のゲーム理論的定式化を考案する。また,Nashへの漸近収束を保証した並列化可能なアルゴリズムを提案する。現在の最先端のメソッドは、1回のイテレーションで$o(d^2k)$ランタイムの複雑さを必要とします。私たちは、この並列アプローチを$O(dk)$ランタイムの複雑さを達成する方法を示します。実験の結果,ニューラルネットワークのアクティベーションの大規模解析を含む様々なsgep問題に対して,このアルゴリズムが解決できることを実証する。 The symmetric generalized eigenvalue problem (SGEP) is a fundamental concept in numerical linear algebra. It captures the solution of many classical machine learning problems such as canonical correlation analysis, independent components analysis, partial least squares, linear discriminant analysis, principal components and others. Despite this, most general solvers are prohibitively expensive when dealing with streaming data sets (i.e., minibatches) and research has instead concentrated on finding efficient solutions to specific problem instances. In this work, we develop a game-theoretic formulation of the top-$k$ SGEP whose Nash equilibrium is the set of generalized eigenvectors. We also present a parallelizable algorithm with guaranteed asymptotic convergence to the Nash. Current state-of-the-art methods require $O(d^2k)$ runtime complexity per iteration which is prohibitively expensive when the number of dimensions ($d$) is large. We show how to modify this parallel approach to achieve $O(dk)$ runtime complexity. Empirically we demonstrate that this resulting algorithm is able to solve a variety of SGEP problem instances including a large-scale analysis of neural network activations.	翻訳日:2023-04-27 00:09:13 公開日:2023-04-25
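For reference, the standard statement of the problem, which the abstract above does not spell out: given a symmetric matrix $A$ and a symmetric positive definite matrix $B$, both $d \times d$, the top-$k$ SGEP asks for the $k$ largest generalized eigenvalues and their $B$-orthonormal eigenvectors. The symbols $A$, $B$, $v_i$, and $\lambda_i$ are notation chosen here for illustration and may differ from the paper's.

$$
A v_i = \lambda_i B v_i, \qquad v_i^\top B v_j = \delta_{ij}, \qquad \lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_k ,
$$

$$
v_1 \in \arg\max_{v \neq 0} \frac{v^\top A v}{v^\top B v} .
$$

With $B = I$ this reduces to the ordinary symmetric eigenvalue problem that underlies principal component analysis; the other methods listed in the abstract correspond to different choices of $A$ and $B$.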
# Targeted Adaptive Design ( http://arxiv.org/abs/2205.14208v2 ) License: see link	Carlo Graziani and Marieme Ngom	Modern advanced manufacturing and advanced materials design often require searches of relatively high-dimensional process control parameter spaces for settings that result in optimal structure, property, and performance parameters. The mapping from the former to the latter must be determined from noisy experiments or from expensive simulations. We abstract this problem to a mathematical framework in which an unknown function from a control space to a design space must be ascertained by means of expensive noisy measurements, which locate optimal control settings generating desired design features within specified tolerances, with quantified uncertainty. We describe targeted adaptive design (TAD), a new algorithm that performs this sampling task efficiently. TAD creates a Gaussian process surrogate model of the unknown mapping at each iterative stage, proposing a new batch of control settings to sample experimentally and optimizing the updated log-predictive likelihood of the target design. TAD either stops upon locating a solution with uncertainties that fit inside the tolerance box or uses a measure of expected future information to determine that the search space has been exhausted with no solution. TAD thus embodies the exploration-exploitation tension in a manner that recalls, but is essentially different from, Bayesian optimization and optimal experimental design.	Translated: 2023-04-27 00:08:56 Published: 2023-04-25
# Domain-Adaptive Self-Supervised Pre-Training for Face & Body Detection in Drawings ( http://arxiv.org/abs/2211.10641v2 ) License: see link	Barış Batuhan Topal, Deniz Yuret, Tevfik Metin Sezgin	Drawings are powerful means of pictorial abstraction and communication. Understanding diverse forms of drawings, including digital arts, cartoons, and comics, has been a major problem of interest for the computer vision and computer graphics communities. Although there are large amounts of digitized drawings from comic books and cartoons, they contain vast stylistic variations, which necessitate expensive manual labeling for training domain-specific recognizers. In this work, we show how self-supervised learning, based on a teacher-student network with a modified student network update design, can be used to build face and body detectors. Our setup allows exploiting large amounts of unlabeled data from the target domain when labels are provided for only a small subset of it. We further demonstrate that style transfer can be incorporated into our learning pipeline to bootstrap detectors using a vast amount of out-of-domain labeled images from natural images (i.e., images from the real world). Our combined architecture yields detectors with state-of-the-art (SOTA) and near-SOTA performance using minimal annotation effort. Our code can be accessed from https://github.com/barisbatuhan/DASS_Detector.	Translated: 2023-04-27 00:02:57 Published: 2023-04-25
# MMD-B-Fair: Learning Fair Representations with Statistical Testing ( http://arxiv.org/abs/2211.07907v3 ) License: see link	Namrata Deka and Danica J. Sutherland	We introduce a method, MMD-B-Fair, to learn fair representations of data via kernel two-sample testing. We find neural features of our data where a maximum mean discrepancy (MMD) test cannot distinguish between representations of different sensitive groups, while preserving information about the target attributes. Minimizing the power of an MMD test is more difficult than maximizing it (as done in previous work), because the test threshold's complex behavior cannot be simply ignored. Our method exploits the simple asymptotics of block testing schemes to efficiently find fair representations without requiring the complex adversarial optimization or generative modelling schemes widely used by existing work on fair representation learning. We evaluate our approach on various datasets, showing its ability to "hide" information about sensitive attributes, and its effectiveness in downstream transfer tasks.	Translated: 2023-04-27 00:02:38 Published: 2023-04-25
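The abstract above is built around the maximum mean discrepancy (MMD) two-sample statistic. As a hedged illustration, the sketch below computes the standard unbiased estimate of squared MMD with a Gaussian kernel on one-dimensional samples. It shows only the plain statistic, not the block test or the representation learning in MMD-B-Fair. The bandwidth, sample sizes, and synthetic "groups" are assumptions.

```python
import math
import random


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel on scalars."""
    return math.exp(-((x - y) ** 2) / (2.0 * bandwidth ** 2))


def mmd2_unbiased(xs, ys, bandwidth=1.0):
    """Unbiased estimate of squared MMD between two 1-D samples."""
    m, n = len(xs), len(ys)
    # Within-sample terms exclude the diagonal (i == j) to stay unbiased.
    k_xx = sum(gaussian_kernel(xs[i], xs[j], bandwidth)
               for i in range(m) for j in range(m) if i != j) / (m * (m - 1))
    k_yy = sum(gaussian_kernel(ys[i], ys[j], bandwidth)
               for i in range(n) for j in range(n) if i != j) / (n * (n - 1))
    k_xy = sum(gaussian_kernel(x, y, bandwidth)
               for x in xs for y in ys) / (m * n)
    return k_xx + k_yy - 2.0 * k_xy


# Synthetic stand-ins for representations of two sensitive groups (assumed data).
rng = random.Random(0)
group_a = [rng.gauss(0.0, 1.0) for _ in range(200)]
group_b_same = [rng.gauss(0.0, 1.0) for _ in range(200)]
group_b_shifted = [rng.gauss(2.0, 1.0) for _ in range(200)]

mmd_same = mmd2_unbiased(group_a, group_b_same)        # close to zero
mmd_shifted = mmd2_unbiased(group_a, group_b_shifted)  # clearly positive
```

In the fairness setting described above, a representation is considered fair when the statistic between groups stays small enough that the test cannot reject, which corresponds to the near-zero case here.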
# Safe Latent Diffusion: Mitigating Inappropriate Degeneration in Diffusion Models ( http://arxiv.org/abs/2211.05105v3 ) License: see link	Patrick Schramowski, Manuel Brack, Björn Deiseroth, Kristian Kersting	Text-conditioned image generation models have recently achieved astonishing results in image quality and text alignment and are consequently employed in a fast-growing number of applications. Since they are highly data-driven, relying on billion-sized datasets randomly scraped from the internet, they also suffer, as we demonstrate, from degenerated and biased human behavior. In turn, they may even reinforce such biases. To help combat these undesired side effects, we present safe latent diffusion (SLD). Specifically, to measure the inappropriate degeneration due to unfiltered and imbalanced training sets, we establish a novel image generation test bed, inappropriate image prompts (I2P), containing dedicated, real-world image-to-text prompts covering concepts such as nudity and violence. As our exhaustive empirical evaluation demonstrates, the introduced SLD removes and suppresses inappropriate image parts during the diffusion process, with no additional training required and no adverse effect on overall image quality or text alignment.	Translated: 2023-04-27 00:02:18 Published: 2023-04-25
# StructDiffusion: Language-Guided Creation of Physically-Valid Structures using Unseen Objects ( http://arxiv.org/abs/2211.04604v2 ) License: see link	Weiyu Liu, Yilun Du, Tucker Hermans, Sonia Chernova, Chris Paxton	Robots operating in human environments must be able to rearrange objects into semantically-meaningful configurations, even if these objects are previously unseen. In this work, we focus on the problem of building physically-valid structures without step-by-step instructions. We propose StructDiffusion, which combines a diffusion model and an object-centric transformer to construct structures given partial-view point clouds and high-level language goals, such as "set the table". Our method can perform multiple challenging language-conditioned multi-step 3D planning tasks using one model. StructDiffusion improves the success rate of assembling physically-valid structures out of unseen objects by 16% on average over an existing multi-modal transformer model trained on specific structures. We show experiments on held-out objects both in simulation and on real-world rearrangement tasks. Importantly, we show how integrating both a diffusion model and a collision-discriminator model allows for improved generalization over other methods when rearranging previously-unseen objects. For videos and additional results, see our website: https://structdiffusion.github.io/.	Translated: 2023-04-27 00:01:58 Published: 2023-04-25
# Data Level Lottery Ticket Hypothesis for Vision Transformers ( http://arxiv.org/abs/2211.01484v2 ) License: see link	Xuan Shen, Zhenglun Kong, Minghai Qin, Peiyan Dong, Geng Yuan, Xin Meng, Hao Tang, Xiaolong Ma, Yanzhi Wang	The conventional lottery ticket hypothesis (LTH) claims that there exists a sparse subnetwork within a dense neural network, together with a proper random initialization, called the winning ticket, such that it can be trained from scratch to perform almost as well as its dense counterpart. Meanwhile, the LTH has scarcely been evaluated for vision transformers (ViTs). In this paper, we first show that the conventional winning ticket is hard to find at the weight level of ViTs by existing methods. Then, inspired by the input dependence of ViTs, we generalize the LTH for ViTs to input data consisting of image patches. That is, there exists a subset of input image patches such that a ViT can be trained from scratch by using only this subset of patches and achieve similar accuracy to the ViTs trained by using all image patches. We call this subset of input patches the winning tickets, which represent a significant amount of information in the input data. We use a ticket selector to generate the winning tickets based on the informativeness of patches for various types of ViT, including DeiT, LV-ViT, and Swin Transformers. The experiments show that there is a clear difference between the performance of models trained with winning tickets and randomly selected subsets, which verifies our proposed theory. We elaborate on the analogical similarity between our proposed Data-LTH-ViTs and the conventional LTH to further verify the integrity of our theory. The code is provided in the supplementary material.	Translated: 2023-04-27 00:01:40 Published: 2023-04-25
# Imputation of missing values in multi-view data ( http://arxiv.org/abs/2210.14484v2 ) License: see link	Wouter van Loon, Marjolein Fokkema, Mark de Rooij	Data for which a set of objects is described by multiple distinct feature sets (called views) is known as multi-view data. When missing values occur in multi-view data, all features in a view are likely to be missing simultaneously. This leads to very large quantities of missing data which, especially when combined with high dimensionality, make the application of conditional imputation methods computationally infeasible. We introduce a new imputation method based on the existing stacked penalized logistic regression (StaPLR) algorithm for multi-view learning. It performs imputation in a dimension-reduced space to address computational challenges inherent to the multi-view context. We compare the performance of the new imputation method with several existing imputation algorithms in simulated data sets. The results show that the new imputation method leads to competitive results at a much lower computational cost, and makes the use of advanced imputation algorithms such as missForest and predictive mean matching possible in settings where they would otherwise be computationally infeasible.	Translated: 2023-04-27 00:01:15 Published: 2023-04-25
# Sequential Attention for Feature Selection ( http://arxiv.org/abs/2209.14881v3 ) License: see link	Taisuke Yasuda, MohammadHossein Bateni, Lin Chen, Matthew Fahrbach, Gang Fu, Vahab Mirrokni	Feature selection is the problem of selecting a subset of features for a machine learning model that maximizes model quality subject to a budget constraint. For neural networks, prior methods, including those based on $\ell_1$ regularization, attention, and other techniques, typically select the entire feature subset in one evaluation round, ignoring the residual value of features during selection, i.e., the marginal contribution of a feature given that other features have already been selected. We propose a feature selection algorithm called Sequential Attention that achieves state-of-the-art empirical results for neural networks. This algorithm is based on an efficient one-pass implementation of greedy forward selection and uses attention weights at each step as a proxy for feature importance. We give theoretical insights into our algorithm for linear regression by showing that an adaptation to this setting is equivalent to the classical Orthogonal Matching Pursuit (OMP) algorithm, and thus inherits all of its provable guarantees. Our theoretical and empirical analyses offer new explanations of the effectiveness of attention and its connections to overparameterization, which may be of independent interest.	Translated: 2023-04-27 00:00:47 Published: 2023-04-25
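The abstract above relates Sequential Attention, in the linear-regression setting, to classical Orthogonal Matching Pursuit (OMP). The following is a minimal sketch of textbook OMP as greedy forward selection, which makes the idea of "marginal contribution given already selected features" concrete. It is not the Sequential Attention algorithm (there are no attention weights here), and the function name and toy data are assumptions.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def orthogonal_matching_pursuit(columns, y, k):
    """Greedily select k feature columns to explain y (classical OMP).

    columns: list of nonzero feature vectors, each a list of floats.
    Returns the indices of the selected features in selection order.
    """
    residual = list(y)
    basis = []      # orthonormal basis for the span of the selected columns
    selected = []
    for _ in range(k):
        # Score every unselected feature by its normalized correlation with
        # the current residual, i.e. what it adds beyond the features chosen so far.
        best, best_score = None, -1.0
        for j, col in enumerate(columns):
            if j in selected:
                continue
            score = abs(dot(col, residual)) / dot(col, col) ** 0.5
            if score > best_score:
                best, best_score = j, score
        selected.append(best)
        # Gram-Schmidt: orthogonalize the chosen column against the basis.
        q = list(columns[best])
        for b in basis:
            coeff = dot(q, b)
            q = [qi - coeff * bi for qi, bi in zip(q, b)]
        norm = dot(q, q) ** 0.5
        if norm > 1e-12:
            q = [qi / norm for qi in q]
            basis.append(q)
            # Projecting out the new direction is equivalent to refitting
            # least squares on all selected features.
            coeff = dot(residual, q)
            residual = [ri - coeff * qi for ri, qi in zip(residual, q)]
    return selected


# Toy data (assumed): four samples, four candidate features.
features = [
    [1.0, 0.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],  # partly redundant with feature 0
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]
target = [3.0, 0.0, 1.0, 0.0]
chosen = orthogonal_matching_pursuit(features, target, k=2)
```

After feature 0 is picked, the partly redundant feature 1 no longer correlates with the residual, so the second pick goes to feature 2. A one-shot ranking by raw correlation would have placed feature 1 second.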
# Augmenting Interpretable Models with LLMs during Training ( http://arxiv.org/abs/2209.11799v3 ) License: see link	Chandan Singh, Armin Askari, Rich Caruana, Jianfeng Gao	Recent large language models (LLMs) have demonstrated remarkable prediction performance for a growing array of tasks. However, their proliferation into high-stakes domains (e.g. medicine) and compute-limited settings has created a burgeoning need for interpretability and efficiency. We address this need by proposing Augmented Interpretable Models (Aug-imodels), a framework for leveraging the knowledge learned by LLMs to build extremely efficient and interpretable models. Aug-imodels use LLMs during fitting but not during inference, allowing complete transparency and often a speed/memory improvement of greater than 1,000x for inference compared to LLMs. We explore two instantiations of Aug-imodels in natural-language processing: (i) Aug-GAM, which augments a generalized additive model with decoupled embeddings from an LLM and (ii) Aug-Tree, which augments a decision tree with LLM feature expansions. Across a variety of text-classification datasets, both outperform their non-augmented counterparts. Aug-GAM can even outperform much larger models (e.g. a 6-billion parameter GPT-J model), despite having 10,000x fewer parameters and being fully transparent. We further explore Aug-imodels in a natural-language fMRI study, where they generate interesting interpretations from scientific data. All code for using Aug-imodels and reproducing results is made available on GitHub.	Translated: 2023-04-27 00:00:26 Published: 2023-04-25
# Model Predictive Robustness of Signal Temporal Logic Predicates ( http://arxiv.org/abs/2209.07881v2 ) License: see link	Yuanfei Lin, Haoxuan Li, Matthias Althoff	The robustness of signal temporal logic not only assesses whether a signal adheres to a specification but also provides a measure of how much a formula is fulfilled or violated. The calculation of robustness is based on evaluating the robustness of underlying predicates. However, the robustness of predicates is usually defined in a model-free way, i.e., without including the system dynamics. Moreover, it is often nontrivial to define the robustness of complicated predicates precisely. To address these issues, we propose a notion of model predictive robustness, which provides a more systematic way of evaluating robustness compared to previous approaches by considering model-based predictions. In particular, we use Gaussian process regression to learn the robustness based on precomputed predictions so that robustness values can be efficiently computed online. We evaluate our approach for the use case of autonomous driving with predicates used in formalized traffic rules on a recorded dataset, which highlights the advantage of our approach compared to traditional approaches in terms of expressiveness. By incorporating our robustness definitions into a trajectory planner, autonomous vehicles obey traffic rules more robustly than human drivers in the dataset.	Translated: 2023-04-27 00:00:03 Published: 2023-04-25
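For contrast with the model predictive robustness proposed in the abstract above, here is a minimal sketch of the conventional, model-free robustness semantics it refers to: a predicate `x > c` scores `x(t) - c`, "always" takes the minimum over the horizon, and "eventually" takes the maximum. The function names, the distance signal, and the 10 m threshold are assumptions chosen for illustration.

```python
def predicate_robustness(signal, threshold):
    """Robustness of the predicate `x > threshold` at every time step."""
    return [x - threshold for x in signal]


def always(robustness_trace):
    """Robustness of G(phi): the worst case over the horizon."""
    return min(robustness_trace)


def eventually(robustness_trace):
    """Robustness of F(phi): the best case over the horizon."""
    return max(robustness_trace)


# Hypothetical example: distance in metres to a lead vehicle over five time
# steps, checked against an assumed safe distance of 10 m.
distance = [14.0, 12.5, 11.0, 9.5, 13.0]
trace = predicate_robustness(distance, 10.0)
rho_always = always(trace)          # negative: "always safe" is violated
rho_eventually = eventually(trace)  # positive: "eventually safe" holds
```

The sign tells whether the formula holds and the magnitude tells by how much. Nothing in this computation uses the vehicle dynamics, which is the limitation of model-free predicate robustness that the abstract points out.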
# Masked Autoencoding Does Not Help Natural Language Supervision at Scale ( http://arxiv.org/abs/2301.07836v3 ) License: see link	Floris Weers, Vaishaal Shankar, Angelos Katharopoulos, Yinfei Yang, Tom Gunter	Self-supervision and natural language supervision have emerged as two exciting ways to train general-purpose image encoders that excel at a variety of downstream tasks. Recent works such as M3AE and SLIP have suggested that these approaches can be effectively combined, but most notably their results use small pre-training datasets (<50M samples) and do not effectively reflect the large-scale regime (>100M examples) that is commonly used for these approaches. Here we investigate whether a similar approach can be effective when trained with a much larger amount of data. We find that a combination of two state-of-the-art approaches, masked autoencoders (MAE) and contrastive language-image pre-training (CLIP), provides a benefit over CLIP when trained on a corpus of 11.3M image-text pairs, but little to no benefit (as evaluated on a suite of common vision tasks) over CLIP when trained on a large corpus of 1.4B images. Our work provides some much-needed clarity into the effectiveness (or lack thereof) of self-supervision for large-scale image-text training.	Translated: 2023-04-26 23:53:48 Published: 2023-04-25
# Are Deep Neural Networks SMARTer than Second Graders? ( http://arxiv.org/abs/2212.09993v3 ) License: see link	Anoop Cherian, Kuan-Chuan Peng, Suhas Lohit, Kevin A. Smith, Joshua B. Tenenbaum	Recent times have witnessed an increasing number of applications of deep neural networks towards solving tasks that require superior cognitive abilities, e.g., playing Go, generating art, ChatGPT, etc. Such dramatic progress raises the question: how generalizable are neural networks in solving problems that demand broad skills? To answer this question, we propose SMART: a Simple Multimodal Algorithmic Reasoning Task and the associated SMART-101 dataset, for evaluating the abstraction, deduction, and generalization abilities of neural networks in solving visuo-linguistic puzzles designed specifically for children in the 6-8 age group. Our dataset consists of 101 unique puzzles; each puzzle comprises a picture and a question, and their solution needs a mix of several elementary skills, including arithmetic, algebra, and spatial reasoning, among others. To scale our dataset towards training deep neural networks, we programmatically generate entirely new instances for each puzzle, while retaining their solution algorithm. To benchmark performance on SMART-101, we propose a vision and language meta-learning model using varied state-of-the-art backbones. Our experiments reveal that while powerful deep models offer reasonable performance on puzzles in a supervised setting, they are not better than random accuracy when analyzed for generalization. We also evaluate the recent ChatGPT and other large language models on a part of SMART-101 and find that while these models show convincing reasoning abilities, the answers are often incorrect.	Translated: 2023-04-26 23:53:22 Published: 2023-04-25
# Invariant Lipschitz Bandits: A Side Observation Approach ( http://arxiv.org/abs/2212.07524v2 ) License: see link	Nam Phuong Tran, Long Tran-Thanh	Symmetry arises in many optimization and decision-making problems, and has attracted considerable attention from the optimization community: by utilizing the existence of such symmetries, the process of searching for optimal solutions can be improved significantly. Despite its success in (offline) optimization, the utilization of symmetries has not been well examined within online optimization settings, especially in the bandit literature. As such, in this paper we study the invariant Lipschitz bandit setting, a subclass of the Lipschitz bandits where the reward function and the set of arms are preserved under a group of transformations. We introduce an algorithm named UniformMesh-N, which naturally integrates side observations using group orbits into the UniformMesh algorithm (Kleinberg, 2005), which uniformly discretizes the set of arms. Using the side-observation approach, we prove an improved regret upper bound, which depends on the cardinality of the group, given that the group is finite. We also prove a matching regret lower bound for the invariant Lipschitz bandit class (up to logarithmic factors). We hope that our work will ignite further investigation of symmetry in bandit theory and sequential decision-making theory in general.	Translated: 2023-04-26 23:52:56 Published: 2023-04-25
# GPViT: A High Resolution Non-Hierarchical Vision Transformer with Group Propagation ( http://arxiv.org/abs/2212.06795v2 ) License: see link	Chenhongyi Yang, Jiarui Xu, Shalini De Mello, Elliot J. Crowley, Xiaolong Wang	We present the Group Propagation Vision Transformer (GPViT): a novel non-hierarchical (i.e. non-pyramidal) transformer model designed for general visual recognition with high-resolution features. High-resolution features (or tokens) are a natural fit for tasks that involve perceiving fine-grained details such as detection and segmentation, but exchanging global information between these features is expensive in memory and computation because of the way self-attention scales. We provide a highly efficient alternative, the Group Propagation Block (GP Block), to exchange global information. In each GP Block, features are first grouped together by a fixed number of learnable group tokens; we then perform Group Propagation, where global information is exchanged between the grouped features; finally, global information in the updated grouped features is returned to the image features through a transformer decoder. We evaluate GPViT on a variety of visual recognition tasks including image classification, semantic segmentation, object detection, and instance segmentation. Our method achieves significant performance gains over previous works across all tasks, especially on tasks that require high-resolution outputs. For example, our GPViT-L3 outperforms Swin Transformer-B by 2.0 mIoU on ADE20K semantic segmentation with only half as many parameters. Project page: chenhongyiyang.com/projects/GPViT/GPViT	Translated: 2023-04-26 23:52:32 Published: 2023-04-25
# VISEM-Tracking, a human spermatozoa tracking dataset ( http://arxiv.org/abs/2212.02842v3 ) License: see link	Vajira Thambawita, Steven A. Hicks, Andrea M. Storås, Thu Nguyen, Jorunn M. Andersen, Oliwia Witczak, Trine B. Haugen, Hugo L. Hammer, Pål Halvorsen, Michael A. Riegler	A manual assessment of sperm motility requires microscopy observation, which is challenging due to the fast-moving spermatozoa in the field of view. To obtain correct results, manual evaluation requires extensive training. Therefore, computer-assisted sperm analysis (CASA) has become increasingly used in clinics. Despite this, more data is needed to train supervised machine learning approaches in order to improve accuracy and reliability in the assessment of sperm motility and kinematics. In this regard, we provide a dataset called VISEM-Tracking with 20 video recordings of 30 seconds (comprising 29,196 frames) of wet sperm preparations with manually annotated bounding-box coordinates and a set of sperm characteristics analyzed by experts in the domain. In addition to the annotated data, we provide unlabeled video clips for easy-to-use access and analysis of the data via methods such as self- or unsupervised learning. As part of this paper, we present baseline sperm detection performances using the YOLOv5 deep learning (DL) model trained on the VISEM-Tracking dataset. As a result, we show that the dataset can be used to train complex DL models to analyze spermatozoa.	Translated: 2023-04-26 23:52:06 Published: 2023-04-25
# Relation-Aware Language-Graph Transformer for Question Answering ( http://arxiv.org/abs/2212.00975v2 ) License: see link	Jinyoung Park, Hyeong Kyu Choi, Juyeon Ko, Hyeonjin Park, Ji-Hoon Kim, Jisu Jeong, Kyungmin Kim, Hyunwoo J. Kim	Question Answering (QA) is a task that entails reasoning over natural language contexts, and many relevant works augment language models (LMs) with graph neural networks (GNNs) to encode the Knowledge Graph (KG) information. However, most existing GNN-based modules for QA do not take advantage of the rich relational information of KGs and depend on limited information interaction between the LM and the KG. To address these issues, we propose Question Answering Transformer (QAT), which is designed to jointly reason over language and graphs with respect to entity relations in a unified manner. Specifically, QAT constructs Meta-Path tokens, which learn relation-centric embeddings based on diverse structural and semantic relations. Then, our Relation-Aware Self-Attention module comprehensively integrates different modalities via the Cross-Modal Relative Position Bias, which guides information exchange between relevant entities of different modalities. We validate the effectiveness of QAT on commonsense question answering datasets like CommonsenseQA and OpenBookQA, and on a medical question answering dataset, MedQA-USMLE. On all the datasets, our method achieves state-of-the-art performance. Our code is available at http://github.com/mlvlab/QAT.	Translated: 2023-04-26 23:51:47 Published: 2023-04-25
# 頂点間の相互作用をモデル化するグラフニューラルネットワークの能力について On the Ability of Graph Neural Networks to Model Interactions Between Vertices ( http://arxiv.org/abs/2211.16494v4 ) ライセンス: Link先を確認	Noam Razin, Tom Verbin, Nadav Cohen	(参考訳) グラフニューラルネットワーク(GNN)は、グラフの頂点として表されるエンティティ間の複雑な相互作用をモデル化するために広く使われている。近年のGNNの表現力を理論的に分析する試みにもかかわらず、相互作用をモデル化する能力の形式的特徴は欠如している。現在の論文は、このギャップに対処することを目的としている。分離ランクと呼ばれる確立された尺度による相互作用の形式化強度は、与えられた頂点の部分集合とその補集合の間の相互作用をモデル化する特定のGNNの能力を定量化する。この結果から, 相互作用をモデル化する能力は, 分割の境界から得られるウォーク数によって定義されるグラフ理論特性であるウォーク指数によって決定されることがわかった。一般的なgnnアーキテクチャを用いた実験はこの発見を裏付ける。本理論の実用的応用として,入力エッジの除去時にGNNが相互作用をモデル化する能力を保持するWIS(Walk Index Sparsification)というエッジスペーシフィケーションアルゴリズムを設計する。 wisは単純で計算効率が良く,本実験では誘導予測の精度で代替手法を著しく上回っている。より広義には、モデリング可能な相互作用を理論的に分析することで、GNNを改善する可能性を示している。 Graph neural networks (GNNs) are widely used for modeling complex interactions between entities represented as vertices of a graph. Despite recent efforts to theoretically analyze the expressive power of GNNs, a formal characterization of their ability to model interactions is lacking. The current paper aims to address this gap. Formalizing strength of interactions through an established measure known as separation rank, we quantify the ability of certain GNNs to model interaction between a given subset of vertices and its complement, i.e. between the sides of a given partition of input vertices. Our results reveal that the ability to model interaction is primarily determined by the partition's walk index -- a graph-theoretical characteristic defined by the number of walks originating from the boundary of the partition. Experiments with common GNN architectures corroborate this finding. As a practical application of our theory, we design an edge sparsification algorithm named Walk Index Sparsification (WIS), which preserves the ability of a GNN to model interactions when input edges are removed. 
WIS is simple, computationally efficient, and in our experiments has markedly outperformed alternative methods in terms of induced prediction accuracy. More broadly, it showcases the potential of improving GNNs by theoretically analyzing the interactions they can model.	翻訳日:2023-04-26 23:50:25 公開日:2023-04-25
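To make the notion of a walk index more concrete, the following Python sketch counts the walks of a fixed length that start from the boundary of a vertex partition. It is only an illustration under stated assumptions: the boundary is taken here to be the set of vertices with at least one edge crossing the partition, and the helper names `boundary_vertices` and `count_walks_from` are hypothetical, not taken from the paper or its code.

```python
from typing import Dict, List, Set

Graph = Dict[int, List[int]]  # undirected graph stored as an adjacency list


def boundary_vertices(graph: Graph, side: Set[int]) -> Set[int]:
    """Vertices (on either side) with at least one edge crossing the partition.

    Assumption: this is the meaning of "boundary" used for this illustration.
    """
    boundary: Set[int] = set()
    for u, neighbours in graph.items():
        for v in neighbours:
            if (u in side) != (v in side):
                boundary.add(u)
                boundary.add(v)
    return boundary


def count_walks_from(graph: Graph, sources: Set[int], length: int) -> int:
    """Number of walks with `length` edges that start at a vertex in `sources`."""
    # walks[v] = number of walks of the current length that start in `sources` and end at v
    walks = {v: (1 if v in sources else 0) for v in graph}
    for _ in range(length):
        extended = {v: 0 for v in graph}
        for u, count in walks.items():
            for v in graph[u]:
                extended[v] += count
        walks = extended
    return sum(walks.values())


# Path graph 0 - 1 - 2 - 3, partitioned into {0, 1} and {2, 3}.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
boundary = boundary_vertices(path, {0, 1})
walk_count = count_walks_from(path, boundary, 2)
```

In this toy graph the only crossing edge is 1 - 2, so the walks are counted from vertices 1 and 2. Under the assumption above, removing an edge that lowers this count would be expected to weaken the modelled interaction across the partition more than removing one that does not.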
# プライバシ・イン・プラクティス:X線画像におけるプライベート新型コロナウイルス検出(拡張版) Privacy in Practice: Private COVID-19 Detection in X-Ray Images (Extended Version) ( http://arxiv.org/abs/2211.11434v3 ) ライセンス: Link先を確認	Lucas Lange, Maja Schneider, Peter Christen, Erhard Rahm	(参考訳) 機械学習(ML)は、大量の画像の迅速なスクリーニングを可能にすることで、新型コロナウイルスなどのパンデミックに対抗するのに役立つ。患者のプライバシを維持しながらデータ分析を行うため,差分プライバシー(DP)を満たすMLモデルを作成する。新型コロナウイルス(COVID-19)のプライベートモデルを探索する以前の研究は、部分的には小さなデータセットに基づいており、より弱いか不明確なプライバシー保証を提供し、実用的なプライバシーを調査していない。これらのオープンギャップに対処するための改善を提案する。我々は、固有の階級不均衡を考慮し、ユーティリティとプライバシのトレードオフをより広範囲に、より厳格なプライバシー予算よりも評価する。我々の評価は、ブラックボックスメンバーシップ推論攻撃(MIA)による実践的プライバシを実証的に推定することで支持される。導入されたdpは,miasによる漏洩脅威の抑制に役立ち,この仮説をcovid-19分類タスクで最初に検証する実践的な分析を行う。以上の結果から,MIAの課題依存的実践的脅威によって,必要なプライバシーレベルが異なる可能性が示唆された。以上の結果から, DP保証の増加に伴い, 経験的プライバシー漏洩はわずかに改善し, DPがMIA防衛に限られた影響を及ぼす可能性が示唆された。本研究は, 実用プライバシトレードオフの改善の可能性を明らかにし, 実用プライバシのチューニングにおいて, 経験的攻撃特異的プライバシ推定が重要な役割を果たすと考えている。 Machine learning (ML) can help fight pandemics like COVID-19 by enabling rapid screening of large volumes of images. To perform data analysis while maintaining patient privacy, we create ML models that satisfy Differential Privacy (DP). Previous works exploring private COVID-19 models are in part based on small datasets, provide weaker or unclear privacy guarantees, and do not investigate practical privacy. We suggest improvements to address these open gaps. We account for inherent class imbalances and evaluate the utility-privacy trade-off more extensively and over stricter privacy budgets. Our evaluation is supported by empirically estimating practical privacy through black-box Membership Inference Attacks (MIAs). The introduced DP should help limit leakage threats posed by MIAs, and our practical analysis is the first to test this hypothesis on the COVID-19 classification task. Our results indicate that needed privacy levels might differ based on the task-dependent practical threat from MIAs. 
The results further suggest that with increasing DP guarantees, empirical privacy leakage only improves marginally, and DP therefore appears to have a limited impact on practical MIA defense. Our findings identify possibilities for better utility-privacy trade-offs, and we believe that empirical attack-specific privacy estimation can play a vital role in tuning for practical privacy.	翻訳日:2023-04-26 23:50:04 公開日:2023-04-25
# chatgptは優れたnlgエバブリエーターか? 予備的研究 Is ChatGPT a Good NLG Evaluator? A Preliminary Study ( http://arxiv.org/abs/2303.04048v2 ) ライセンス: Link先を確認	Jiaan Wang, Yunlong Liang, Fandong Meng, Zengkui Sun, Haoxiang Shi, Zhixu Li, Jinan Xu, Jianfeng Qu, Jie Zhou	(参考訳) 近年、ChatGPTの出現は、計算言語学コミュニティから広く注目を集めている。多くの先行研究により、ChatGPTは自動評価指標を用いて様々なNLPタスクにおいて顕著な性能を発揮することが示されている。しかし、ChatGPTが評価指標として機能する能力はまだ未定である。自然言語生成モデル(NLG)の質を評価することは困難な作業であり、NLGの指標は人間の判断と相関が低いことで悪名高いことから、ChatGPTは優れたNLG評価指標であるのだろうか。本稿では,その信頼性を NLG 測定値として示すため,ChatGPT の予備メタ評価を行う。より詳しくは、ChatGPTを人間評価器とみなし、タスク固有(例えば、要約)とアスペクト固有(例えば、関連)の指示を与えて、ChatGPTにNLGモデルの生成された結果を評価する。我々は5つのNLGメタ評価データセット(要約、ストーリー生成、データ・トゥ・テキストタスクを含む)について実験を行った。実験の結果,ChatGPTは従来の自動測定値と比較すると,ほとんどの場合,人間の判断と最先端あるいは競合的な相関が得られた。さらに,ChatGPT評価器の有効性は,メタ評価データセットの作成方法の影響を受けている可能性が示唆された。参照に大きく依存して生成されるメタ評価データセットに対して、ChatGPT評価器は効果を失う可能性がある。我々の予備研究は、汎用的な信頼性NLGメトリックの出現を促すことを願っている。 Recently, the emergence of ChatGPT has attracted wide attention from the computational linguistics community. Many prior studies have shown that ChatGPT achieves remarkable performance on various NLP tasks in terms of automatic evaluation metrics. However, the ability of ChatGPT to serve as an evaluation metric is still underexplored. Considering assessing the quality of natural language generation (NLG) models is an arduous task and NLG metrics notoriously show their poor correlation with human judgments, we wonder whether ChatGPT is a good NLG evaluation metric. In this report, we provide a preliminary meta-evaluation on ChatGPT to show its reliability as an NLG metric. In detail, we regard ChatGPT as a human evaluator and give task-specific (e.g., summarization) and aspect-specific (e.g., relevance) instruction to prompt ChatGPT to evaluate the generated results of NLG models. We conduct experiments on five NLG meta-evaluation datasets (including summarization, story generation and data-to-text tasks). 
Experimental results show that, compared with previous automatic metrics, ChatGPT achieves state-of-the-art or competitive correlation with human judgments in most cases. In addition, we find that the effectiveness of the ChatGPT evaluator might be influenced by how the meta-evaluation datasets were created. For meta-evaluation datasets whose creation depends heavily on the references, and which are therefore biased, the ChatGPT evaluator might lose its effectiveness. We hope our preliminary study will prompt the emergence of a general-purpose, reliable NLG metric.	翻訳日:2023-04-26 23:44:10 公開日:2023-04-25
# 変分オートエンコーダを用いた安全監視型自律ロボットのラストマイル配送における有効探索空間の学習 Using a Variational Autoencoder to Learn Valid Search Spaces of Safely Monitored Autonomous Robots for Last-Mile Delivery ( http://arxiv.org/abs/2303.03211v2 ) ライセンス: Link先を確認	Peter J. Bentley, Soo Ling Lim, Paolo Arcaini, Fuyuki Ishikawa	(参考訳) 顧客に商品を届けるための自律ロボットの利用は、信頼性と持続可能なサービスを提供するためのエキサイティングな新しい方法だ。しかし、現実の世界では、自律ロボットは安全のために人間の監督を必要とする。我々は、自律ロボットのタイミングを最適化して配達を最大化する現実的な問題に取り組み、安全に監視できるように、同時に走るロボットが多すぎないことを保証する。我々は,最近のハイブリッド機械学習最適化手法であるCOIL (constrained optimization in learned latent space) を用いて,この問題のバリエーションを探索するためのベースライン遺伝的アルゴリズムと比較した。また,COILの高速化と効率向上のための新しい手法についても検討した。テストされた全ての問題に対して,適切な数のロボットが同時に動作するような有効な解はCOILでのみ見つかることを示す。また,COILが潜在表現を学習した場合には,GAよりも10%高速に最適化できることが示され,毎日の配達要求をロボットに割り当てるロボットの再最適化において,同時に走るロボットの安全数を確保できる。 The use of autonomous robots for delivery of goods to customers is an exciting new way to provide a reliable and sustainable service. However, in the real world, autonomous robots still require human supervision for safety reasons. We tackle the real-world problem of optimizing autonomous robot timings to maximize deliveries, while ensuring that there are never too many robots running simultaneously so that they can be monitored safely. We assess the use of a recent hybrid machine-learning-optimization approach, COIL (constrained optimization in learned latent space), and compare it with a baseline genetic algorithm (GA) for the purposes of exploring variations of this problem. We also investigate new methods for improving the speed and efficiency of COIL. We show that only COIL can find valid solutions where appropriate numbers of robots run simultaneously for all problem variations tested. We also show that when COIL has learned its latent representation, it can optimize 10% faster than the GA, making it a good choice for daily re-optimization of robots where delivery requests for each day are allocated to robots while maintaining safe numbers of robots running at once.	翻訳日:2023-04-26 23:43:45 公開日:2023-04-25
# 日立 at SemEval-2023 Task 3: オンラインニュースにおけるジャンル・フレーミング検出のための言語横断マルチタスク戦略の探求 Hitachi at SemEval-2023 Task 3: Exploring Cross-lingual Multi-task Strategies for Genre and Framing Detection in Online News ( http://arxiv.org/abs/2303.01794v2 ) ライセンス: Link先を確認	Yuta Koreeda, Ken-ichi Yokote, Hiroaki Ozaki, Atsuki Yamaguchi, Masaya Tsunokake, Yasuhiro Sogawa	(参考訳) 本稿では、日立チームによるSemEval-2023 Task 3「多言語環境におけるオンラインニュースにおけるジャンル、フレーミング、説得技術の検出」への参加について述べる。タスクのマルチリンガル・マルチタスク特性と低リソース設定に基づき,事前学習された言語モデルの訓練のための異なるクロスリンガル・マルチタスク戦略を検討した。広範な実験を通して、(a)クロスリンガル/マルチタスク学習と (b)外部のバランスの取れたデータセットの収集が、ジャンル検出とフレーミング検出に役立つことがわかった。結果からアンサンブルモデルを構築し,イタリア語およびロシア語のジャンル分類サブタスクにおいて最高のマクロ平均F1スコアを達成した。 This paper explains the participation of team Hitachi in SemEval-2023 Task 3, "Detecting the genre, the framing, and the persuasion techniques in online news in a multi-lingual setup." Based on the multilingual, multi-task nature of the task and the low-resource setting, we investigated different cross-lingual and multi-task strategies for training the pretrained language models. Through extensive experiments, we found that (a) cross-lingual/multi-task training and (b) collecting an external balanced dataset can benefit genre and framing detection. We constructed ensemble models from the results and achieved the highest macro-averaged F1 scores in the Italian and Russian genre categorization subtasks.	翻訳日:2023-04-26 23:43:27 公開日:2023-04-25
# モーメントベース正定値部分多様体最適化の簡易化とディープラーニングへの応用 Simplifying Momentum-based Positive-definite Submanifold Optimization with Applications to Deep Learning ( http://arxiv.org/abs/2302.09738v3 ) ライセンス: Link先を確認	Wu Lin, Valentin Duruisseaux, Melvin Leok, Frank Nielsen, Mohammad Emtiyaz Khan, Mark Schmidt	(参考訳) 運動量を伴うリーマン部分多様体の最適化は、しばしば難しい微分方程式の解法や近似を必要とするため、計算的に困難である。我々は、アフィン不変距離を持つ構造化対称正定値行列のクラスに対するそのような最適化アルゴリズムを単純化する。我々は、計量を保存し、問題をユークリッド非制約問題に動的に自明化するリーマン正規座標の一般化版を提案する。提案手法は,構造化共分散の既存手法の説明と単純化に利用し,行列逆数のない大規模NNの学習に有効な2次最適化器を開発した。 Riemannian submanifold optimization with momentum is computationally challenging because ensuring iterates remain on the submanifold often requires solving or approximating difficult differential equations. We simplify such optimization algorithms for a class of structured symmetric positive-definite matrices with the affine invariant metric. We propose a generalized version of the Riemannian normal coordinates which preserves the metric and dynamically trivializes the problem into a Euclidean unconstrained problem. We use our approach to explain and simplify existing approaches for structured covariances and develop efficient second-order optimizers for training large-scale NNs without matrix inverses.	翻訳日:2023-04-26 23:43:14 公開日:2023-04-25
# ゲーミフィケーションはmHealthアプリケーションにおける自己申告の負担を軽減するか? スマートウォッチデータからの機械学習による認知負荷推定の実現可能性の検討 Can gamification reduce the burden of self-reporting in mHealth applications? A feasibility study using machine learning from smartwatch data to estimate cognitive load ( http://arxiv.org/abs/2302.03616v2 ) ライセンス: Link先を確認	Michal K. Grzeszczyk, Paulina Adamczyk, Sylwia Marek, Ryszard Pręcikowski, Maciej Kuś, M. Patrycja Lelujko, Rosmary Blanco, Tomasz Trzciński, Arkadiusz Sitek, Maciej Malawski, Aneta Lisowska	(参考訳) デジタル治療の有効性は、患者にアプリケーションを通じて自身の状態を自己報告するよう要求することで測定できるが、これは負担が大きく、離脱を引き起こす可能性がある。我々は,ゲーミフィケーションが自己報告に与える影響を調査する。本研究のアプローチは,光電式容積脈波(PPG)信号の解析を通じて認知負荷(CL)を評価するシステムの構築である。 11人の参加者のデータを用いて機械学習モデルを訓練し、CLを検出する。その後、ゲーミフィケーション版と従来版の2つのバージョンの調査を作成する。調査の回答中に他の参加者(13人)が経験したCLを推定した。 CL検出器の性能は,ストレス検出タスクの事前学習により向上できることがわかった。 13人中10人に対して、パーソナライズされたCL検出器は0.7以上のF1スコアを達成できる。 CLでは,ゲーミフィケーション版と非ゲーミフィケーション版の違いは認められなかったが,参加者はゲーミフィケーション版を好んだ。 The effectiveness of digital treatments can be measured by requiring patients to self-report their state through applications; however, this can be overwhelming and cause disengagement. We conduct a study to explore the impact of gamification on self-reporting. Our approach involves the creation of a system to assess cognitive load (CL) through the analysis of photoplethysmography (PPG) signals. The data from 11 participants is utilized to train a machine learning model to detect CL. Subsequently, we create two versions of surveys: a gamified and a traditional one. We estimate the CL experienced by other participants (13) while completing surveys. We find that CL detector performance can be enhanced via pre-training on stress detection tasks. For 10 out of 13 participants, a personalized CL detector can achieve an F1 score above 0.7. We find no difference between the gamified and non-gamified surveys in terms of CL, but participants prefer the gamified version.	翻訳日:2023-04-26 23:43:02 公開日:2023-04-25
# ricci流下における学習離散化ニューラルネットワーク Learning Discretized Neural Networks under Ricci Flow ( http://arxiv.org/abs/2302.03390v3 ) ライセンス: Link先を確認	Jun Chen, Hanwen Chen, Mengmeng Wang, Guang Dai, Ivor W. Tsang, Yong Liu	(参考訳) 本稿では,非微分的離散関数による無限勾配あるいはゼロ勾配に苦しむ低精度重みとアクティベーションからなる離散化ニューラルネットワーク(dnn)について検討する。この場合、ほとんどのトレーニングベースのDNNは、勾配w.r.t.離散値の近似に標準のSTE(Straight-Through Estimator)を使用する。しかし、STEは近似勾配の摂動により勾配ミスマッチの問題を引き起こす。この問題に対処するため、本論文ではこのミスマッチを双対性理論のレンズを通してリーマン多様体の計量摂動と見なすことができる。さらに,情報幾何学に基づいて,DNNに対する線形近傍ユークリッド多様体(LNE)を構築し,摂動に対処する。計量に偏微分方程式、すなわちリッチフローを導入することで、LNE計量の動的安定性と収束を$L^2$-norm摂動で証明する。収束率が分数である以前の摂動理論とは異なり、リッチフロー下の計量摂動はlne多様体において指数関数的に減衰することができる。各種データセットに対する実験結果から,本手法はDNNに対して,他の代表的なトレーニングベース手法よりも優れた,より安定した性能を示すことが示された。 In this paper, we consider Discretized Neural Networks (DNNs) consisting of low-precision weights and activations, which suffer from either infinite or zero gradients due to the non-differentiable discrete function in the training process. In this case, most training-based DNNs employ the standard Straight-Through Estimator (STE) to approximate the gradient w.r.t. discrete values. However, the STE gives rise to the problem of gradient mismatch, due to the perturbations of the approximated gradient. To address this problem, this paper reveals that this mismatch can be viewed as a metric perturbation in a Riemannian manifold through the lens of duality theory. Further, on the basis of the information geometry, we construct the Linearly Nearly Euclidean (LNE) manifold for DNNs as a background to deal with perturbations. By introducing a partial differential equation on metrics, i.e., the Ricci flow, we prove the dynamical stability and convergence of the LNE metric with the $L^2$-norm perturbation. Unlike the previous perturbation theory whose convergence rate is the fractional powers, the metric perturbation under the Ricci flow can be exponentially decayed in the LNE manifold. 
The experimental results on various datasets demonstrate that our method achieves better and more stable performance for DNNs than other representative training-based methods.	翻訳日:2023-04-26 23:42:47 公開日:2023-04-25
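The Straight-Through Estimator (STE) mentioned in the abstract can be illustrated with a single weight: the forward pass uses a discretized value, while the backward pass treats the discretization as the identity. This is a generic sketch of the plain STE, not the paper's Ricci-flow method. The quadratic loss, the learning rate, and the function names `quantize` and `ste_grad` are assumptions made for the example.

```python
def quantize(w: float) -> float:
    """Binarize a weight to -1 or +1 (its true derivative is zero almost everywhere)."""
    return 1.0 if w >= 0 else -1.0


def ste_grad(w: float, x: float, target: float) -> float:
    """Gradient of 0.5 * (quantize(w) * x - target)**2 w.r.t. w under the STE."""
    prediction = quantize(w) * x
    dloss_dq = (prediction - target) * x  # exact gradient w.r.t. the quantized weight
    dq_dw = 1.0  # straight-through approximation of the quantizer's derivative
    return dloss_dq * dq_dw


w = -0.3   # latent full-precision weight kept during training
lr = 0.1   # assumed learning rate
for _ in range(3):
    w -= lr * ste_grad(w, x=1.0, target=1.0)
```

The latent weight moves from -0.3 across zero, after which the quantized weight matches the target and the STE gradient vanishes. The mismatch discussed in the abstract comes from the fact that `dq_dw = 1.0` is an approximation rather than the true derivative.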
# コンフォーマルコスト制御による高速オンライン価値最大化予測セット Fast Online Value-Maximizing Prediction Sets with Conformal Cost Control ( http://arxiv.org/abs/2302.00839v3 ) ライセンス: Link先を確認	Zhen Lin, Shubhendu Trivedi, Cao Xiao, Jimeng Sun	(参考訳) 実世界のマルチラベル予測問題の多くは、下流の使用によって引き起こされる特定の要件を満たさなければならない集合値予測を伴う。我々は、このような要件を個別に$\textit{value}$と$\textit{cost}$をエンコードし、互いに競合する典型的なシナリオに焦点を当てる。例えば、病院はスマート診断システムによって、重篤で、しばしば共死的な病気(その価値)をできるだけ多く捉え、誤った予測(コスト)を厳格にコントロールすることを期待しているかもしれない。このようなシナリオのコストを制御しながら、価値を最大化するために、FavMacと呼ばれる一般的なパイプラインを提案する。 FavMacは、ほとんどすべてのマルチラベル分類器と組み合わせて、コスト管理における分布のない理論的保証を提供する。さらに、従来の作業とは異なり、慎重に設計されたオンライン更新メカニズムを通じて、現実世界の大規模アプリケーションを扱うことができる。 FavMacは、厳格なコスト管理を維持しつつ、いくつかの変種やベースラインよりも高い価値を提供する。私たちのコードはhttps://github.com/zlin7/FavMacで利用可能です。 Many real-world multi-label prediction problems involve set-valued predictions that must satisfy specific requirements dictated by downstream usage. We focus on a typical scenario where such requirements, separately encoding $\textit{value}$ and $\textit{cost}$, compete with each other. For instance, a hospital might expect a smart diagnosis system to capture as many severe, often co-morbid, diseases as possible (the value), while maintaining strict control over incorrect predictions (the cost). We present a general pipeline, dubbed as FavMac, to maximize the value while controlling the cost in such scenarios. FavMac can be combined with almost any multi-label classifier, affording distribution-free theoretical guarantees on cost control. Moreover, unlike prior works, it can handle real-world large-scale applications via a carefully designed online update mechanism, which is of independent interest. Our methodological and theoretical contributions are supported by experiments on several healthcare tasks and synthetic datasets - FavMac furnishes higher value compared with several variants and baselines while maintaining strict cost control. Our code is available at https://github.com/zlin7/FavMac	翻訳日:2023-04-26 23:42:25 公開日:2023-04-25
# 仮説の最適選択は最も弱く、最短ではない The Optimal Choice of Hypothesis Is the Weakest, Not the Shortest ( http://arxiv.org/abs/2301.12987v3 ) ライセンス: Link先を確認	Michael Timothy Bennett	(参考訳) もし$A$と$B$が$A \subset B$であるような集合であれば、一般化は$B$を構成するのに十分な仮説の$A$からの推論として理解することができる。 $A$から任意の数の仮説を推測できるが、それらのいくつかだけが$B$に一般化できる。どれが一般化しそうなのか、どうしてわかるのか? 一つの戦略は最も短いものを選び、情報を圧縮する能力と一般化する能力(知能のプロキシ)を同等にすることである。我々は、エンアクティブ認知の数学的形式論の文脈でこれを調べる。圧縮は性能を最大化するのに必要でも十分でもないことを示す(仮説の一般化の確率の観点から測る)。弱さと呼ばれる、長さや単純さに関係のないプロキシを定式化する。タスクが一様に分布している場合、すべてのタスクにおいて弱さの最大化と少なくとも同等の性能を示し、かつ少なくとも1つのタスクで厳密に上回るプロキシの選択肢は存在しないことを示す。 2進算術の文脈における最大弱さと最小記述長を比較する実験では、前者は後者の$1.1$倍から$5$倍の割合で一般化した。これは弱さがはるかに優れたプロキシであることを示し、DeepMindのApperception Engineが効果的に一般化できる理由を説明する。 If $A$ and $B$ are sets such that $A \subset B$, generalisation may be understood as the inference from $A$ of a hypothesis sufficient to construct $B$. One might infer any number of hypotheses from $A$, yet only some of those may generalise to $B$. How can one know which are likely to generalise? One strategy is to choose the shortest, equating the ability to compress information with the ability to generalise (a proxy for intelligence). We examine this in the context of a mathematical formalism of enactive cognition. We show that compression is neither necessary nor sufficient to maximise performance (measured in terms of the probability of a hypothesis generalising). We formulate a proxy unrelated to length or simplicity, called weakness. We show that if tasks are uniformly distributed, then there is no choice of proxy that performs at least as well as weakness maximisation in all tasks while performing strictly better in at least one. In experiments comparing maximum weakness and minimum description length in the context of binary arithmetic, the former generalised at between $1.1$ and $5$ times the rate of the latter. We argue this demonstrates that weakness is a far better proxy, and explains why DeepMind's Apperception Engine is able to generalise effectively.	翻訳日:2023-04-26 23:42:05 公開日:2023-04-25
# 分子結晶構造サンプリングのための剛体流 Rigid body flows for sampling molecular crystal structures ( http://arxiv.org/abs/2301.11355v2 ) ライセンス: Link先を確認	Jonas Köhler, Michele Invernizzi, Pim de Haan, Frank Noé	(参考訳) 正規化フロー(NF)は、高い柔軟性と表現力を持つ複雑な分布をモデル化する能力によって近年人気を集めている強力な生成モデルである。本研究では,結晶中の分子などの3次元空間における複数の物体の位置と向きをモデル化するために調整された新しい正規化フローを導入する。我々のアプローチは2つの重要なアイデアに基づいている。第一に、単位四元数の群上の滑らかで表現的な流れを定義し、剛体の連続的な回転運動を捉えること、第二に、単位四元数の二重被覆性を用いて回転群の適切な密度を定義することである。これにより,本モデルは,標準的な尤度ベースの手法や、熱力学的対象密度に対する変分推論を用いてトレーニングすることができる。外部場における四面体系の多モード密度と,TIP4P-Ew水モデルにおける氷XI相という2つの分子例に対してボルツマン発生器を訓練し,本手法を評価した。我々の流れは分子の内部自由度に作用する流れと組み合わせることができ、多くの相互作用する分子の分布のモデリングへの重要なステップとなる。 Normalizing flows (NF) are a class of powerful generative models that have gained popularity in recent years due to their ability to model complex distributions with high flexibility and expressiveness. In this work, we introduce a new type of normalizing flow that is tailored for modeling positions and orientations of multiple objects in three-dimensional space, such as molecules in a crystal. Our approach is based on two key ideas: first, we define smooth and expressive flows on the group of unit quaternions, which allows us to capture the continuous rotational motion of rigid bodies; second, we use the double cover property of unit quaternions to define a proper density on the rotation group. This ensures that our model can be trained using standard likelihood-based methods or variational inference with respect to a thermodynamic target density. We evaluate the method by training Boltzmann generators for two molecular examples, namely the multi-modal density of a tetrahedral system in an external field and the ice XI phase in the TIP4P-Ew water model. Our flows can be combined with flows operating on the internal degrees of freedom of molecules, and constitute an important step towards the modeling of distributions of many interacting molecules.	翻訳日:2023-04-26 23:41:43 公開日:2023-04-25
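The double cover property mentioned in the abstract can be checked numerically: a unit quaternion q and its negation -q describe the same rotation. The sketch below uses the standard quaternion-to-rotation-matrix formula. The function name `rotation_matrix` and the example rotation are illustrative choices, not part of the authors' code.

```python
import math
from typing import List, Tuple

Quaternion = Tuple[float, float, float, float]  # components ordered as (w, x, y, z)


def rotation_matrix(q: Quaternion) -> List[List[float]]:
    """Standard 3x3 rotation matrix of a unit quaternion (w, x, y, z)."""
    w, x, y, z = q
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]


# Rotation by 90 degrees about the z-axis.
half_angle = math.pi / 4
q = (math.cos(half_angle), 0.0, 0.0, math.sin(half_angle))
minus_q = (-q[0], -q[1], -q[2], -q[3])

r_q = rotation_matrix(q)
r_minus_q = rotation_matrix(minus_q)

# Largest entry-wise difference between the two matrices.
max_difference = max(
    abs(a - b) for row_a, row_b in zip(r_q, r_minus_q) for a, b in zip(row_a, row_b)
)
```

Every entry of the matrix is quadratic in the quaternion components, so flipping the sign of q leaves the matrix unchanged. This is why a density defined on unit quaternions has to treat q and -q as the same point before it can serve as a density on rotations.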
# rangevit:自動運転における3次元意味セグメンテーションのための視覚トランスフォーマ RangeViT: Towards Vision Transformers for 3D Semantic Segmentation in Autonomous Driving ( http://arxiv.org/abs/2301.10222v2 ) ライセンス: Link先を確認	Angelika Ando, Spyros Gidaris, Andrei Bursuc, Gilles Puy, Alexandre Boulch, Renaud Marlet	(参考訳) 外部LiDAR点雲のキャスティングセマンティックセマンティックセグメンテーションは、例えばレンジプロジェクションによる2次元問題として、効果的で一般的なアプローチである。これらのプロジェクションベースの手法は、通常は高速計算の恩恵を受け、他のポイントクラウド表現を使用する技術と組み合わせると、最先端の結果が得られる。今日、投影ベースの手法は2d cnnを利用するが、コンピュータビジョンの最近の進歩により、視覚トランスフォーマー(vits)は多くの画像ベースのベンチマークで最先端の結果を得た。本研究では,3次元セマンティックセグメンテーションのプロジェクションに基づく手法が,ViTの最近の改良の恩恵を受けるかどうかを問う。私たちは正に答えるが、それらと3つの主要な材料を組み合わせることでのみ答える。 (a)ViTはトレーニングが難しいことで知られており、強力な表現を学ぶために多くのトレーニングデータが必要です。 RGBイメージと同じバックボーンアーキテクチャを保存することで、ポイントクラウドよりもはるかに安価でアノテート可能な大規模なイメージコレクションの長いトレーニングから知識を活用できます。大規模な画像データセット上で、トレーニング済みのViTで最高の結果を得る。 b) 古典的な線形埋込み層に対して, 適合した畳み込み茎を置換することにより, ViTsの誘導バイアスの欠如を補う。 c)畳み込みデコーダと畳み込みステムからのスキップ接続により,畳み込みステムの低レベルだが細粒度の特徴とvitエンコーダの高レベルだが粗い予測を組み合わせることにより,画素単位での予測を洗練する。これらの材料を用いて,本手法はRangeViTと呼ばれ,nuScenes や SemanticKITTI の既存のプロジェクションベース手法よりも優れていることを示す。コードはhttps://github.com/valeoai/rangevitで入手できる。 Casting semantic segmentation of outdoor LiDAR point clouds as a 2D problem, e.g., via range projection, is an effective and popular approach. These projection-based methods usually benefit from fast computations and, when combined with techniques which use other point cloud representations, achieve state-of-the-art results. Today, projection-based methods leverage 2D CNNs but recent advances in computer vision show that vision transformers (ViTs) have achieved state-of-the-art results in many image-based benchmarks. In this work, we question if projection-based methods for 3D semantic segmentation can benefit from these latest improvements on ViTs. We answer positively but only after combining them with three key ingredients: (a) ViTs are notoriously hard to train and require a lot of training data to learn powerful representations. 
By preserving the same backbone architecture as for RGB images, we can exploit the knowledge from long training on large image collections that are much cheaper to acquire and annotate than point clouds. We reach our best results with ViTs pre-trained on large image datasets. (b) We compensate for ViTs' lack of inductive bias by substituting a tailored convolutional stem for the classical linear embedding layer. (c) We refine pixel-wise predictions with a convolutional decoder and a skip connection from the convolutional stem to combine low-level but fine-grained features of the convolutional stem with the high-level but coarse predictions of the ViT encoder. With these ingredients, we show that our method, called RangeViT, outperforms existing projection-based methods on nuScenes and SemanticKITTI. The code is available at https://github.com/valeoai/rangevit.	翻訳日:2023-04-26 23:41:25 公開日:2023-04-25
# チャットボットにおける生成的安全性を目指して Learn What NOT to Learn: Towards Generative Safety in Chatbots ( http://arxiv.org/abs/2304.11220v2 ) ライセンス: Link先を確認	Leila Khalatbari, Yejin Bang, Dan Su, Willy Chung, Saeed Ghadimi, Hossein Sameti, Pascale Fung	(参考訳) 生成的かつオープンドメインな会話モデルは、Webベースのソーシャルデータで訓練されているため、特に安全でないコンテンツを生成する可能性がある。この問題を軽減する以前のアプローチには、会話の流れを乱す、有害な入力コンテキストを認識できないような一般化を制限する、安全性のために対話の品質を犠牲にするといった欠点がある。本稿では,正と負の両方のトレーニング信号から学習することで一般化を促進するために,対照的な損失を生かした「LOT(Learn NOT to)」という新しいフレームワークを提案する。本手法は,従来学習されてきた安全で安全でない言語分布から,正負の信号を自動的に得るという点で,標準のコントラスト学習フレームワークと異なる。 LOTフレームワークは、会話の流れを保ちながら、安全でない部分空間から安全な部分空間へ世代を誘導するために分岐を利用する。提案手法は, 復号時の記憶効率と時間効率が向上し, 関与性と流動性を維持しつつ毒性を効果的に低減する。実験の結果,LOTは基準モデルに比べて4倍から6倍のエンゲージネスとフラエンシを達成し,毒性を最大4倍に低下させることがわかった。我々の発見は人間の評価によってさらに裏付けられている。 Conversational models that are generative and open-domain are particularly susceptible to generating unsafe content since they are trained on web-based social data. Prior approaches to mitigating this issue have drawbacks, such as disrupting the flow of conversation, limited generalization to unseen toxic input contexts, and sacrificing the quality of the dialogue for the sake of safety. In this paper, we present a novel framework, named "LOT" (Learn NOT to), that employs a contrastive loss to enhance generalization by learning from both positive and negative training signals. Our approach differs from the standard contrastive learning framework in that it automatically obtains positive and negative signals from the safe and unsafe language distributions that have been learned beforehand. The LOT framework utilizes divergence to steer the generations away from the unsafe subspace and towards the safe subspace while sustaining the flow of conversation. Our approach is memory and time-efficient during decoding and effectively reduces toxicity while preserving engagingness and fluency. 
Empirical results indicate that LOT reduces toxicity by up to four-fold while achieving four to six-fold higher rates of engagingness and fluency compared to baseline models. Our findings are further corroborated by human evaluation.	翻訳日:2023-04-26 23:36:04 公開日:2023-04-25
# グラニュラ・ボール・コンピューティング : 効率的で堅牢で解釈可能な適応型多粒度表現と計算法 Granular-ball computing: an efficient, robust, and interpretable adaptive multi-granularity representation and computation method ( http://arxiv.org/abs/2304.11171v2 ) ライセンス: Link先を確認	Shuyin Xia, Guoyin Wang, Xinbo Gao	(参考訳) 人間の認知には「大規模ファースト」認知機構があり、適応的な多粒性記述能力を有する。これにより、効率、堅牢性、解釈可能性などの計算特性が得られる。既存の人工知能学習手法の多くは、特定の多粒度特徴を持つが、「大規模ファースト」認知機構と完全に一致していない。マルチグラニュラー性粒球計算は近年開発された重要なモデル手法である。この方法は、異なる大きさの粒状球を用いてサンプル空間を適応的に表現し、粒状球に基づいて学習することができる。粒度が粗い「粒度」の数はサンプル点数より小さいため、粒度計算はより効率的であり、粒度が粗い粒度の特徴は細かい試料点の影響を受けにくく、より堅牢になり、粒度の多粒度構造はトポロジカルな構造と粗い粒度記述を生成でき、自然な解釈性を提供する。グラニュラ・ボール・コンピューティングは人工知能の様々な分野に効果的に拡張され、グラニュラ・ボール分類器、グラニュラ・ボール・クラスタリング法、グラニュラ・ボール・ニューラルネットワーク、グラニュラ・ボール・ラフ・セット、グラニュラ・ボールの進化計算などの理論的手法を開発し、効率、ノイズの堅牢性、既存手法の解釈可能性を大幅に向上させた。優れたイノベーション、実用性、そして開発の可能性を持っている。本稿では,これらの手法を体系的に紹介し,グラニュラーボールコンピューティングが現在直面している主な問題を解析し,グラニュラーボールコンピューティングの主要なシナリオについて論じるとともに,将来の研究者がこの理論を改善するための参照と提案を提供する。 Human cognition has a ``large-scale first'' cognitive mechanism, therefore possesses adaptive multi-granularity description capabilities. This results in computational characteristics such as efficiency, robustness, and interpretability. Although most existing artificial intelligence learning methods have certain multi-granularity features, they do not fully align with the ``large-scale first'' cognitive mechanism. Multi-granularity granular-ball computing is an important model method developed in recent years. This method can use granular-balls of different sizes to adaptively represent and cover the sample space, and perform learning based on granular-balls. 
Since the number of coarse-grained granular-balls is smaller than the number of sample points, granular-ball computing is more efficient; the coarse-grained characteristics of granular-balls are less likely to be affected by fine-grained sample points, making them more robust; the multi-granularity structure of granular-balls can produce topological structures and coarse-grained descriptions, providing natural interpretability. Granular-ball computing has now been effectively extended to various fields of artificial intelligence, developing theoretical methods such as granular-ball classifiers, granular-ball clustering methods, granular-ball neural networks, granular-ball rough sets, and granular-ball evolutionary computation, significantly improving the efficiency, noise robustness, and interpretability of existing methods. The approach is innovative and practical, and has good potential for further development. This article provides a systematic introduction to these methods and analyzes the main problems currently faced by granular-ball computing. It also discusses the primary applicable scenarios for granular-ball computing and offers references and suggestions for future researchers to improve this theory.	翻訳日:2023-04-26 23:35:45 公開日:2023-04-25
# 量子輸送における多体コヒーレンス Many-Body Coherence in Quantum Transport ( http://arxiv.org/abs/2304.11151v2 ) ライセンス: Link先を確認	Ching-Chi Hang, Liang-Yan Hsu	(参考訳) 本研究では,多体系における電子輸送を制御するために,量子コヒーレンスを利用する概念を提案する。ハバード作用素に基づくオープン量子システム手法を組み合わせることで,多体コヒーレンスが有名なクーロン階段を取り除き,強い負の差動抵抗を引き起こすことを示した。この機構を解明するため、ゼロ電子-フォノンカップリング限界における電流-コヒーレンス関係を解析的に導出する。さらに,ゲートフィールドを組み込むことで,コヒーレンス制御トランジスタ構築の可能性を示す。この開発は、多体コヒーレンスに基づく量子電子デバイス探索のための新しい方向を開く。 In this study, we propose the concept of harnessing quantum coherence to control electron transport in a many-body system. Combining an open quantum system technique based on Hubbard operators, we show that many-body coherence can eliminate the well-known Coulomb staircase and cause strong negative differential resistance. To explore the mechanism, we analytically derive the current-coherence relationship in the zero electron-phonon coupling limit. Furthermore, by incorporating a gate field, we demonstrate the possibility of constructing a coherence-controlled transistor. This development opens up a new direction for exploring quantum electronic devices based on many-body coherence.	翻訳日:2023-04-26 23:34:35 公開日:2023-04-25
# 欠落データに基づく交通信号制御のための強化学習手法 Reinforcement Learning Approaches for Traffic Signal Control under Missing Data ( http://arxiv.org/abs/2304.10722v2 ) ライセンス: Link先を確認	Hao Mei, Junxian Li, Bin Shi, Hua Wei	(参考訳) 信号制御タスクにおける強化学習(RL)手法の出現は,従来のルールベース手法よりも優れた性能を実現している。ほとんどのRLアプローチでは、エージェントが長期的な報酬に最適なアクションを決定するために環境を観察する必要がある。しかし、現実の都市では、センサの欠如により交通状態の観察が欠如することがあるため、既存のRL法を道路網に適用できず、観測が欠如している。本研究では,道路網の交差点の一部にセンサを装着せず,その周辺を直接観測することなく,実環境において交通信号を制御することを目的とする。我々の知る限りでは、実世界の交通信号制御問題に対処するためにRL法を最初に利用した人物である。具体的には,第1に適応制御を実現するために交通状態をインプットし,第2に適応制御とRLエージェントのトレーニングを可能にするために,状態と報酬の両方をインプットする。本手法は,合成と実世界の道路網トラフィックの両方について広範な実験を行い,従来の手法よりも優れており,異なる欠落率で一貫した性能を示す。また,データの欠落がモデルの性能に与える影響についてもさらなる調査を行う。 The emergence of reinforcement learning (RL) methods in traffic signal control tasks has achieved better performance than conventional rule-based approaches. Most RL approaches require the observation of the environment for the agent to decide which action is optimal for a long-term reward. However, in real-world urban scenarios, missing observation of traffic states may frequently occur due to the lack of sensors, which makes existing RL methods inapplicable on road networks with missing observation. In this work, we aim to control the traffic signals in a real-world setting, where some of the intersections in the road network are not installed with sensors and thus with no direct observations around them. To the best of our knowledge, we are the first to use RL methods to tackle the traffic signal control problem in this real-world setting. Specifically, we propose two solutions: the first one imputes the traffic states to enable adaptive control, and the second one imputes both states and rewards to enable adaptive control and the training of RL agents. Through extensive experiments on both synthetic and real-world road network traffic, we reveal that our method outperforms conventional approaches and performs consistently with different missing rates. 
We also provide further investigations on how missing data influences the performance of our model.	翻訳日:2023-04-26 23:34:23 公開日:2023-04-25
# 大規模機械学習におけるアダム不安定性の理論 A Theory on Adam Instability in Large-Scale Machine Learning ( http://arxiv.org/abs/2304.09871v2 ) ライセンス: Link先を確認	Igor Molybog, Peter Albert, Moya Chen, Zachary DeVito, David Esiobu, Naman Goyal, Punit Singh Koura, Sharan Narang, Andrew Poulton, Ruan Silva, Binh Tang, Diana Liskovich, Puxin Xu, Yuchen Zhang, Melanie Kambadur, Stephen Roller, Susan Zhang	(参考訳) 本稿では,大規模言語モデルの訓練において,これまで説明されていなかった発散行動の理論について述べる。我々は、この現象はAdamと呼ばれるトレーニングに使用される支配的最適化アルゴリズムのアーティファクトであると主張する。我々は、Adamが、パラメータ更新ベクトルが比較的大きなノルムを持ち、トレーニング損失のランドスケープにおける降下方向と本質的に無相関である状態に入りうることを観測し、これが発散を引き起こす。このアーティファクトは、大規模な言語モデルトレーニングの典型的な設定である大きなバッチサイズを持つディープモデルのトレーニングにおいて、より観察される可能性が高い。この理論を論じるために、我々は70億、300億、650億、および5460億パラメータという異なるスケールの言語モデルのトレーニング実行からの観察結果を示す。 We present a theory for the previously unexplained divergent behavior noticed in the training of large language models. We argue that the phenomenon is an artifact of the dominant optimization algorithm used for training, called Adam. We observe that Adam can enter a state in which the parameter update vector has a relatively large norm and is essentially uncorrelated with the direction of descent on the training loss landscape, leading to divergence. This artifact is more likely to be observed in the training of a deep model with a large batch size, which is the typical setting of large-scale language model training. To argue the theory, we present observations from the training runs of the language models of different scales: 7 billion, 30 billion, 65 billion, and 546 billion parameters.	翻訳日:2023-04-26 23:33:47 公開日:2023-04-25
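For reference, the standard Adam update that the abstract refers to can be sketched in a few lines of Python. This is the textbook rule, not the authors' training code, and the hyperparameter values are common defaults assumed for illustration. The sketch also computes the two quantities the abstract talks about, the norm of the update vector and its cosine with the descent direction, using the hypothetical helpers `norm` and `cosine`.

```python
import math
from typing import List, Tuple


def adam_step(
    grad: List[float],
    m: List[float],
    v: List[float],
    t: int,
    lr: float = 1e-3,
    beta1: float = 0.9,
    beta2: float = 0.999,
    eps: float = 1e-8,
) -> Tuple[List[float], List[float], List[float]]:
    """One textbook Adam step at iteration t >= 1; returns (update, new_m, new_v)."""
    new_m = [beta1 * mi + (1 - beta1) * g for mi, g in zip(m, grad)]
    new_v = [beta2 * vi + (1 - beta2) * g * g for vi, g in zip(v, grad)]
    # Bias-corrected moment estimates.
    m_hat = [mi / (1 - beta1 ** t) for mi in new_m]
    v_hat = [vi / (1 - beta2 ** t) for vi in new_v]
    update = [-lr * mh / (math.sqrt(vh) + eps) for mh, vh in zip(m_hat, v_hat)]
    return update, new_m, new_v


def norm(x: List[float]) -> float:
    return math.sqrt(sum(xi * xi for xi in x))


def cosine(a: List[float], b: List[float]) -> float:
    return sum(ai * bi for ai, bi in zip(a, b)) / (norm(a) * norm(b))


grad = [0.5, -2.0, 0.01]
update, m, v = adam_step(grad, [0.0, 0.0, 0.0], [0.0, 0.0, 0.0], t=1)
update_norm = norm(update)
alignment = cosine(update, [-g for g in grad])  # 1.0 would mean exact steepest descent
```

On the very first step the bias-corrected update is close to `lr` times the sign of each gradient component, so its norm scales with the number of parameters rather than with the gradient magnitude. Tracking `update_norm` and `alignment` over training is one simple way to look for the state described in the abstract, although the paper's own diagnostics may differ.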
# 中国語オープンインストラクションジェネラリスト:予備リリース Chinese Open Instruction Generalist: A Preliminary Release ( http://arxiv.org/abs/2304.07987v4 ) ライセンス: Link先を確認	Ge Zhang, Yemin Shi, Ruibo Liu, Ruibin Yuan, Yizhi Li, Siwei Dong, Yu Shu, Zhaoqun Li, Zekun Wang, Chenghua Lin, Wenhao Huang, Jie Fu	(参考訳) インストラクションチューニングは、汎用言語モデルを構築するための重要な技術として広く認識されており、InstructGPT~\citep{ouyang2022training} と ChatGPT\footnote{\url{https://chat.openai.com/}} のリリースで研究者や一般の注目を集めている。英語指向の大規模言語モデル (LLM) は目覚ましい進歩を遂げているが, 英語をベースとした基盤LLMが, 適切に設計されたインストラクションチューニングによって英語タスクと同様に多言語タスクでも機能するか, また, チューニングに必要なコーパスをどのように構築できるかは, いまだ十分に検討されていない。このギャップを解消するために,4つのサブタスクの固有の特徴に適応した様々な手法による中国語命令データセット作成の試みとして,本プロジェクトを提案する。我々は、高い品質を保証するために手作業でチェックされた約20万の中国語命令チューニングサンプルを収集した。また、既存の英語と中国語の命令コーパスを要約し、新たに構築された中国語の命令コーパスの潜在的な応用を簡潔に述べる。得られた \textbf{C}hinese \textbf{O}pen \textbf{I}nstruction \textbf{G}eneralist (\textbf{COIG}) corpora は Huggingface\footnote{\url{https://huggingface.co/datasets/BAAI/COIG}} と Github\footnote{\url{https://github.com/BAAI-Zlab/COIG}} で利用可能で、継続的に更新される。 Instruction tuning is widely recognized as a key technique for building generalist language models, which has attracted the attention of researchers and the public with the release of InstructGPT~\citep{ouyang2022training} and ChatGPT\footnote{\url{https://chat.openai.com/}}. Despite impressive progress in English-oriented large-scale language models (LLMs), it is still under-explored whether English-based foundation LLMs can perform similarly on multilingual tasks compared to English tasks with well-designed instruction tuning and how we can construct the corpora needed for the tuning. To remedy this gap, we propose the project as an attempt to create a Chinese instruction dataset by various methods adapted to the intrinsic characteristics of 4 sub-tasks. We collect around 200k Chinese instruction tuning samples, which have been manually checked to guarantee high quality. We also summarize the existing English and Chinese instruction corpora and briefly describe some potential applications of the newly constructed Chinese instruction corpora. 
The resulting \textbf{C}hinese \textbf{O}pen \textbf{I}nstruction \textbf{G}eneralist (\textbf{COIG}) corpora are available in Huggingface\footnote{\url{https://huggingface.co/datasets/BAAI/COIG}} and Github\footnote{\url{https://github.com/BAAI-Zlab/COIG}}, and will be continuously updated.	翻訳日:2023-04-26 23:33:09 公開日:2023-04-25
# 大規模言語モデルに関する調査 A Survey of Large Language Models ( http://arxiv.org/abs/2303.18223v7 ) ライセンス: Link先を確認	Wayne Xin Zhao, Kun Zhou, Junyi Li, Tianyi Tang, Xiaolei Wang, Yupeng Hou, Yingqian Min, Beichen Zhang, Junjie Zhang, Zican Dong, Yifan Du, Chen Yang, Yushuo Chen, Zhipeng Chen, Jinhao Jiang, Ruiyang Ren, Yifan Li, Xinyu Tang, Zikang Liu, Peiyu Liu, Jian-Yun Nie and Ji-Rong Wen	(参考訳) 言語は基本的に、文法規則によって支配される人間の表現の複雑な複雑な体系である。言語を理解・把握するための有能なaiアルゴリズムを開発することは大きな課題となる。主要なアプローチとして、言語モデリングは過去20年間、言語理解と生成のために広く研究され、統計的言語モデルから神経言語モデルへと進化してきた。近年,大規模コーパス上でのトランスフォーマモデルによる事前学習言語モデル (plms) が提案されている。モデルスケーリングがパフォーマンス改善につながることを研究者は発見しているので、モデルサイズをさらに大きくすることで、スケーリング効果をさらに研究している。興味深いことに、パラメータスケールが一定のレベルを超えると、これらの拡張言語モデルは大幅な性能向上を達成するだけでなく、小規模な言語モデルには存在しない特別な能力を示す。パラメータスケールの違いを識別するために、研究コミュニティは、大きなサイズのplmに対して、大言語モデル(llm)という用語を生み出した。近年、LLMの研究は学術と産業の両方で大きく進歩しており、ChatGPTの立ち上げが目覚ましい進歩であり、社会から広く注目を集めている。 LLMの技術的な進化は、AIアルゴリズムの開発と使用方法に革命をもたらすような、AIコミュニティ全体に重要な影響を与えています。本稿では, LLMの最近の進歩について, 背景, 重要な発見, 主流技術を紹介して概観する。特に,事前トレーニング,適応チューニング,利用,キャパシティ評価という,llmの主な4つの側面に注目した。さらに,llm開発のための利用可能なリソースを要約するとともに,今後の課題についても論じる。 Language is essentially a complex, intricate system of human expressions governed by grammatical rules. It poses a significant challenge to develop capable AI algorithms for comprehending and grasping a language. As a major approach, language modeling has been widely studied for language understanding and generation in the past two decades, evolving from statistical language models to neural language models. Recently, pre-trained language models (PLMs) have been proposed by pre-training Transformer models over large-scale corpora, showing strong capabilities in solving various NLP tasks. Since researchers have found that model scaling can lead to performance improvement, they further study the scaling effect by increasing the model size to an even larger size. 
Interestingly, when the parameter scale exceeds a certain level, these enlarged language models not only achieve a significant performance improvement but also show some special abilities that are not present in small-scale language models. To discriminate the difference in parameter scale, the research community has coined the term large language models (LLM) for the PLMs of significant size. Recently, the research on LLMs has been largely advanced by both academia and industry, and a remarkable progress is the launch of ChatGPT, which has attracted widespread attention from society. The technical evolution of LLMs has been making an important impact on the entire AI community, which would revolutionize the way how we develop and use AI algorithms. In this survey, we review the recent advances of LLMs by introducing the background, key findings, and mainstream techniques. In particular, we focus on four major aspects of LLMs, namely pre-training, adaptation tuning, utilization, and capacity evaluation. Besides, we also summarize the available resources for developing LLMs and discuss the remaining issues for future directions.	翻訳日:2023-04-26 23:32:25 公開日:2023-04-25
# カラム・ロー・エンタングル型画素合成による高効率スケール不変生成器 Efficient Scale-Invariant Generator with Column-Row Entangled Pixel Synthesis ( http://arxiv.org/abs/2303.14157v3 ) ライセンス: Link先を確認	Thuan Hoang Nguyen, Thanh Van Le, Anh Tran	(参考訳) 任意のスケールの画像合成は、2K解像度を超える場合も含め、任意のスケールで写真リアルな画像を合成する、効率的でスケーラブルなソリューションを提供する。しかし、既存のGANベースのソリューションは畳み込みと階層アーキテクチャに過度に依存するため、出力解像度をスケールする際、不整合と$``$texture sticking$"$問題が発生する。別の観点では、INRベースのジェネレータは設計によってスケール同変であるが、その巨大なメモリフットプリントと遅い推論は、大規模またはリアルタイムシステムでこれらのネットワークを採用することを妨げている。本研究では,空間的畳み込みや粗から細への設計を使わずに,効率的かつスケール同変な新しい生成モデルである$\textbf{C}$olumn-$\textbf{R}$ow $\textbf{E}$ntangled $\textbf{P}$ixel $\textbf{S}$ynthesis ($\textbf{CREPS}$)を提案する。メモリフットプリントを節約し、システムをスケーラブルにするために、レイヤ毎の特徴マップを$``$thick$"$な列エンコーディングと行エンコーディングに分解する、新しいbi-line表現を採用しました。 FFHQ、LSUN-Church、MetFaces、Flickr-Sceneryといったさまざまなデータセットの実験では、CREPSが適切なトレーニングと推論速度で任意の解像度でスケール一貫性とエイリアスのない画像を合成する能力を確認している。コードはhttps://github.com/VinAIResearch/CREPSから入手できる。 Any-scale image synthesis offers an efficient and scalable solution to synthesize photo-realistic images at any scale, even going beyond 2K resolution. However, existing GAN-based solutions depend excessively on convolutions and a hierarchical architecture, which introduce inconsistency and the $``$texture sticking$"$ issue when scaling the output resolution. From another perspective, INR-based generators are scale-equivariant by design, but their huge memory footprint and slow inference hinder these networks from being adopted in large-scale or real-time systems. In this work, we propose $\textbf{C}$olumn-$\textbf{R}$ow $\textbf{E}$ntangled $\textbf{P}$ixel $\textbf{S}$ynthesis ($\textbf{CREPS}$), a new generative model that is both efficient and scale-equivariant without using any spatial convolutions or coarse-to-fine design. To save memory footprint and make the system scalable, we employ a novel bi-line representation that decomposes layer-wise feature maps into separate $``$thick$"$ column and row encodings. 
Experiments on various datasets, including FFHQ, LSUN-Church, MetFaces, and Flickr-Scenery, confirm CREPS' ability to synthesize scale-consistent and alias-free images at any arbitrary resolution with proper training and inference speed. Code is available at https://github.com/VinAIResearch/CREPS.	翻訳日:2023-04-26 23:31:56 公開日:2023-04-25
# プライバシーラベルの概要とプライバシーポリシーとの互換性 The Overview of Privacy Labels and their Compatibility with Privacy Policies ( http://arxiv.org/abs/2303.08213v2 ) ライセンス: Link先を確認	Rishabh Khandelwal, Asmit Nayak, Paul Chung and Kassem Fawaz	(参考訳) プライバシー栄養ラベルは、長く読みにくいプライバシーポリシーを読むことなく、アプリの重要なデータプラクティスを理解する方法を提供する。最近、ios(apple)とandroid(google)のアプリ配布プラットフォームは、アプリ開発者にデータ収集、データ共有、セキュリティプラクティスなどのプライバシプラクティスを強調するプライバシー栄養ラベルを満たさなければならないという義務を課している。これらのプライバシラベルには、各データタイプに関連するデータタイプや目的など、アプリのデータプラクティスに関する非常に詳細な情報が含まれている。これにより、アプリケーションのデータプラクティスを大規模に理解するための、ユニークなヴァンテージポイントが得られます。 Privacy nutrition labels provide a way to understand an app's key data practices without reading the long and hard-to-read privacy policies. Recently, the app distribution platforms for iOS(Apple) and Android(Google) have implemented mandates requiring app developers to fill privacy nutrition labels highlighting their privacy practices such as data collection, data sharing, and security practices. These privacy labels contain very fine-grained information about the apps' data practices such as the data types and purposes associated with each data type. This provides us with a unique vantage point from which we can understand apps' data practices at scale.	翻訳日:2023-04-26 23:31:18 公開日:2023-04-25
# 融合型グラフ状態生成のグラフ理論的最適化 Graph-theoretical optimization of fusion-based graph state generation ( http://arxiv.org/abs/2304.11988v2 ) ライセンス: Link先を確認	Seok-Hyung Lee and Hyunseok Jeong	(参考訳) グラフ状態は、測定ベースの量子コンピューティングや量子リピータなど、様々な量子情報処理タスクのための汎用的なリソースである。タイプII融合ゲートは、小さなグラフ状態を組み合わせることで全光学的なグラフ状態の生成を可能にするが、その非決定論的性質は大きなグラフ状態の効率的な生成を妨げる。本稿では,Python パッケージ OptGraphState とともに,任意のグラフ状態の融合ベースの生成を効果的に最適化するグラフ理論戦略を提案する。我々の戦略は、対象のグラフ状態を単純化し、融合ネットワークを構築し、融合の順序を決定する3つの段階からなる。提案手法を用いることで,ランダムグラフとよく知られたグラフの資源オーバーヘッドを評価する。われわれの戦略とソフトウェアは、フォトニックグラフ状態を用いた実験可能なスキームの開発と評価を支援することを期待している。 Graph states are versatile resources for various quantum information processing tasks, including measurement-based quantum computing and quantum repeaters. Although the type-II fusion gate enables all-optical generation of graph states by combining small graph states, its non-deterministic nature hinders the efficient generation of large graph states. In this work, we present a graph-theoretical strategy to effectively optimize fusion-based generation of any given graph state, along with a Python package OptGraphState. Our strategy comprises three stages: simplifying the target graph state, building a fusion network, and determining the order of fusions. Utilizing this proposed method, we evaluate the resource overheads of random graphs and various well-known graphs. We expect that our strategy and software will assist researchers in developing and assessing experimentally viable schemes that use photonic graph states.	翻訳日:2023-04-26 23:25:50 公開日:2023-04-25
# 説明可能なAIにおける異文化倫理の実践に向けて Towards a Praxis for Intercultural Ethics in Explainable AI ( http://arxiv.org/abs/2304.11861v2 ) ライセンス: Link先を確認	Chinasa T. Okolo	(参考訳) 説明可能なAI(XAI)は、機械学習モデルがどのように機能し、予測を生成するかを理解するのに役立つというアイデアで、しばしば推奨される。それでも、これらのメリットのほとんどは、マシンラーニング開発者など、専門的なドメイン知識を持つ人たちに限られています。最近の研究は、AIを説明可能なものにすることは、特にグローバル・サウスの低リソース領域において、現実の文脈でAIをより便利にするための実行可能な方法である、と論じている。 AIは国境を越えたが、限られた作業は説明可能なAIの概念を「大国」に民主化することに集中しており、文化的、社会的に異なる領域のユーザーのニーズを満たす新しいアプローチを探求し開発する余地が残っている。本稿では,文化間倫理アプローチの概念について紹介する。文化的ニュアンスがテクノロジの採用と利用にどのように影響するか、aiのような技術的概念がいかに説明されるかを妨げる要因、そしてxaiの開発における文化間倫理アプローチの統合がユーザ理解を改善し、これらの手法の効率的な利用を促進するかを検討する。 Explainable AI (XAI) is often promoted with the idea of helping users understand how machine learning models function and produce predictions. Still, most of these benefits are reserved for those with specialized domain knowledge, such as machine learning developers. Recent research has argued that making AI explainable can be a viable way of making AI more useful in real-world contexts, especially within low-resource domains in the Global South. While AI has transcended borders, a limited amount of work focuses on democratizing the concept of explainable AI to the "majority world", leaving much room to explore and develop new approaches within this space that cater to the distinct needs of users within culturally and socially-diverse regions. This article introduces the concept of an intercultural ethics approach to AI explainability. It examines how cultural nuances impact the adoption and use of technology, the factors that impede how technical concepts such as AI are explained, and how integrating an intercultural ethics approach in the development of XAI can improve user understanding and facilitate efficient usage of these methods.	翻訳日:2023-04-26 23:25:37 公開日:2023-04-25
# Gen-NeRF:アルゴリズム・ハードウエア共同設計による効率的で一般化可能なニューラルラジアンス場 Gen-NeRF: Efficient and Generalizable Neural Radiance Fields via Algorithm-Hardware Co-Design ( http://arxiv.org/abs/2304.11842v2 ) ライセンス: Link先を確認	Yonggan Fu, Zhifan Ye, Jiayi Yuan, Shunyao Zhang, Sixu Li, Haoran You, Yingyan Lin	(参考訳) 新しいビュー合成は、様々な拡張現実および仮想現実(AR/VR)アプリケーションにおいて没入型体験を可能にするために不可欠な機能であり、そのクロスシーンの一般化能力により、一般化可能なニューラルレイディアンス場(NeRF)が人気を博している。それらの約束にもかかわらず、一般化可能なNeRFの実際のデバイス展開は、シーン機能を取得するために大量のメモリアクセスを必要とするため、その禁止的な複雑さによってボトルネックになり、レイマーチングプロセスはメモリバウンドになる。この目的のために,提案するGen-NeRFは,リアルタイムに一般化可能なNeRFを初めて実現可能な,一般化可能なNeRFアクセラレーション専用のアルゴリズムハードウェアの共同設計フレームワークである。アルゴリズム側では、gen-nerfは3dシーンの異なる領域がレンダリングされたピクセルに異なる貢献をするという事実を利用して、粗く効果的なサンプリング戦略を統合する。ハードウェア面では、Gen-NeRFは、そのエピポーラ幾何学的関係を利用して、異なる光線間でのデータ再利用機会を最大化するアクセラレーターマイクロアーキテクチャを強調している。さらに、Gen-NeRFアクセラレータは、ポイント・ツー・ハードウエアマッピング時のデータの局所性を向上するカスタマイズされたデータフローと、メモリバンク競合を最小限に抑える最適化されたシーン特徴記憶戦略を備えている。提案するGen-NeRFフレームワークがリアルタイムかつ一般化可能な新規ビュー合成に有効であることを示す。 Novel view synthesis is an essential functionality for enabling immersive experiences in various Augmented- and Virtual-Reality (AR/VR) applications, for which generalizable Neural Radiance Fields (NeRFs) have gained increasing popularity thanks to their cross-scene generalization capability. Despite their promise, the real-device deployment of generalizable NeRFs is bottlenecked by their prohibitive complexity due to the required massive memory accesses to acquire scene features, causing their ray marching process to be memory-bounded. To this end, we propose Gen-NeRF, an algorithm-hardware co-design framework dedicated to generalizable NeRF acceleration, which for the first time enables real-time generalizable NeRFs. On the algorithm side, Gen-NeRF integrates a coarse-then-focus sampling strategy, leveraging the fact that different regions of a 3D scene contribute differently to the rendered pixel, to enable sparse yet effective sampling. 
On the hardware side, Gen-NeRF highlights an accelerator micro-architecture to maximize the data reuse opportunities among different rays by making use of their epipolar geometric relationship. Furthermore, our Gen-NeRF accelerator features a customized dataflow to enhance data locality during point-to-hardware mapping and an optimized scene feature storage strategy to minimize memory bank conflicts. Extensive experiments validate the effectiveness of our proposed Gen-NeRF framework in enabling real-time and generalizable novel view synthesis.	翻訳日:2023-04-26 23:25:18 公開日:2023-04-25
# 階層拡散オートエンコーダと異方性画像操作 Hierarchical Diffusion Autoencoders and Disentangled Image Manipulation ( http://arxiv.org/abs/2304.11829v2 ) ライセンス: Link先を確認	Zeyu Lu, Chengyue Wu, Xinyuan Chen, Yaohui Wang, Lei Bai, Yu Qiao, Xihui Liu	(参考訳) 拡散モデルは画像合成のための印象的な視覚品質を達成している。しかし、拡散モデルの潜在空間を解釈し、操作する方法は広く研究されていない。以前の作業の拡散オートエンコーダは、セマンティック表現をセマンティックな潜在コードにエンコードする。これらの制限を緩和するために,拡散モデルの潜在空間に対して,細粒度と低レベルの特徴階層を利用する階層型拡散オートエンコーダ(HDAE)を提案する。 HDAEの階層的潜在空間は本質的に異なる抽象的な意味論のレベルを符号化し、より包括的な意味表現を提供する。さらに,不連続画像操作のための切断特徴に基づくアプローチを提案する。提案手法の有効性を,画像再構成,スタイル混合,制御可能な補間,ディテール保存・アンタングル画像操作,マルチモーダル・セマンティック画像合成に応用して検証した。 Diffusion models have attained impressive visual quality for image synthesis. However, how to interpret and manipulate the latent space of diffusion models has not been extensively explored. Prior work diffusion autoencoders encode the semantic representations into a semantic latent code, which fails to reflect the rich information of details and the intrinsic feature hierarchy. To mitigate those limitations, we propose Hierarchical Diffusion Autoencoders (HDAE) that exploit the fine-grained-to-abstract and lowlevel-to-high-level feature hierarchy for the latent space of diffusion models. The hierarchical latent space of HDAE inherently encodes different abstract levels of semantics and provides more comprehensive semantic representations. In addition, we propose a truncated-feature-based approach for disentangled image manipulation. We demonstrate the effectiveness of our proposed approach with extensive experiments and applications on image reconstruction, style mixing, controllable interpolation, detail-preserving and disentangled image manipulation, and multi-modal semantic image synthesis.	翻訳日:2023-04-26 23:24:51 公開日:2023-04-25
# 部分閉塞に対するロバストなアプローチは Now You See Me: Robust approach to Partial Occlusions ( http://arxiv.org/abs/2304.11779v2 ) ライセンス: Link先を確認	Karthick Prasad Gunasekaran, Nikita Jaiman	(参考訳) オブジェクトの排除はコンピュータビジョンにおいて不可欠である問題の1つである。畳み込みニューラルネットワークス(CNN)は、正規画像分類のための様々な手法を提供するが、部分閉塞画像の分類には効果がないことが証明されている。部分閉塞(partial occlusion)は、オブジェクトが他のオブジェクト/スペースによって部分的に閉塞されるシナリオである。この問題が解決されると、さまざまなシナリオを促進する大きな可能性を秘めます。特に私たちは、自動運転のシナリオとその影響に関心を持っています。自動運転車の研究は、この10年でもっともホットな話題の1つであり、運転標識や人や物体を異なる角度で隠蔽する状況が数多くある。犯罪の処理、様々なグループの所得水準の予測など、交通データのビデオ分析にさらに拡張できる状況において、その重要さを考えると、多くの面で活用される可能性がある。本稿では,Stanford Car Datasetを応用し,さまざまなサイズと性質のオクルージョンを付加することで,私たち独自の合成データセットを導入する。作成したデータセットでは,VGG-19,ResNet 50/101,GoogleNet,DenseNet 121などのアートCNNモデルのさまざまな状態を用いて総合解析を行った。さらに,これらをスクラッチから微調整し,データセットにトレーニングすることにより,これらのモデルの性能に及ぼす咬合比率と性質の変化の影響を深く研究し,異なるシナリオでトレーニングした場合,すなわち,オクルード画像と未オクルード画像を用いたトレーニング時のパフォーマンスが,部分的オクルージョンに対してより頑健なものになるかについて検討した。 Occlusions of objects is one of the indispensable problems in Computer vision. While Convolutional Neural Net-works (CNNs) provide various state of the art approaches for regular image classification, they however, prove to be not as effective for the classification of images with partial occlusions. Partial occlusion is scenario where an object is occluded partially by some other object/space. This problem when solved,holds tremendous potential to facilitate various scenarios. We in particular are interested in autonomous driving scenario and its implications in the same. Autonomous vehicle research is one of the hot topics of this decade, there are ample situations of partial occlusions of a driving sign or a person or other objects at different angles. Considering its prime importance in situations which can be further extended to video analytics of traffic data to handle crimes, anticipate income levels of various groups etc.,this holds the potential to be exploited in many ways. 
In this paper, we introduce our own synthetically created dataset, built by adding occlusions of various sizes and natures to the Stanford Car Dataset. On this dataset, we conduct a comprehensive analysis using various state-of-the-art CNN models such as VGG-19, ResNet 50/101, GoogleNet, and DenseNet 121. We further study in depth the effect of varying occlusion proportions and nature on the performance of these models, both by fine-tuning them and by training them from scratch on the dataset, and examine how they are likely to perform when trained in different scenarios, i.e., performance when training with occluded images and unoccluded images, which model is more robust to partial occlusions, and so on.	翻訳日:2023-04-26 23:24:32 公開日:2023-04-25
# NAIST-SIC対応英語・日本語同時翻訳コーパス NAIST-SIC-Aligned: Automatically-Aligned English-Japanese Simultaneous Interpretation Corpus ( http://arxiv.org/abs/2304.11766v2 ) ライセンス: Link先を確認	Jinming Zhao, Yuka Ko, Kosuke Doi, Ryo Fukuda, Katsuhito Sudoh, Satoshi Nakamura	(参考訳) 同時解釈(si)データが同時機械翻訳(simt)にどのように影響するかは疑問である。大規模なトレーニングコーパスがないため、研究は限られている。本稿では,自動アライメントされた英日siデータセットであるnaist-sic-alignedを導入することで,このギャップを埋めることを目的とする。非整合コーパスNAIST-SIC から,コーパスを並列化してモデルトレーニングに適した2段階アライメント手法を提案する。第1段階は、ソース文とターゲット文の多対多マッピングを行う粗いアライメントであり、第2段階は、アライメントペアの品質を向上させるために、イントラ・インター・センテンスフィルタリングを行う細粒度のアライメントである。コーパスの品質を確保するため、各ステップは定量的または質的に検証されている。これは文献における最初のオープンソースの大規模並列SIデータセットである。評価目的の小さなテストセットも手作業でキュレートしました。 SIコーパスの構築とSiMTの研究が進むことを願っている。データは \url{https://github.com/mingzi151/ahc-si} にある。 It remains a question that how simultaneous interpretation (SI) data affects simultaneous machine translation (SiMT). Research has been limited due to the lack of a large-scale training corpus. In this work, we aim to fill in the gap by introducing NAIST-SIC-Aligned, which is an automatically-aligned parallel English-Japanese SI dataset. Starting with a non-aligned corpus NAIST-SIC, we propose a two-stage alignment approach to make the corpus parallel and thus suitable for model training. The first stage is coarse alignment where we perform a many-to-many mapping between source and target sentences, and the second stage is fine-grained alignment where we perform intra- and inter-sentence filtering to improve the quality of aligned pairs. To ensure the quality of the corpus, each step has been validated either quantitatively or qualitatively. This is the first open-sourced large-scale parallel SI dataset in the literature. We also manually curated a small test set for evaluation purposes. We hope our work advances research on SI corpora construction and SiMT. Please find our data at \url{https://github.com/mingzi151/AHC-SI}.	翻訳日:2023-04-26 23:24:04 公開日:2023-04-25
# 倫理的・哲学的原理による信頼できる医療人工知能の確立 Ensuring Trustworthy Medical Artificial Intelligencethrough Ethical and Philosophical Principles ( http://arxiv.org/abs/2304.11530v2 ) ライセンス: Link先を確認	Debesh Jha, Ashish Rauniyar, Abhiskek Srivastava, Desta Haileselassie Hagos, Nikhil Kumar Tomar, Vanshali Sharma, Elif Keles, Zheyuan Zhang, Ugur Demir, Ahmet Topcu, Anis Yazidi, Jan Erik H{\aa}akeg{\aa}rd, and Ulas Bagci	(参考訳) 人工知能(AI)手法は、医療専門家や患者の経験を高めることで、多くの医療に革命をもたらす可能性がある。 aiベースのコンピュータ支援診断ツールは、臨床専門家のレベルに匹敵する能力や性能を発揮できれば、非常に有益である。その結果、先進的な医療サービスは発展途上国では手頃な価格で提供でき、専門医の欠如の問題にも対処できる。 AIベースのツールは、患者の治療の時間、リソース、全体的なコストを節約できる。さらに、人間とは対照的に、AIは大量の入力からデータの複雑な関係を明らかにし、医学における新たなエビデンスベースの知識へと導くことができる。しかし、医療におけるAIの統合は、バイアス、透明性、自律性、責任、説明責任など、いくつかの倫理的および哲学的な懸念を提起する。本稿では、AIを用いた医療画像分析の最近の進歩、既存の標準、および臨床現場におけるAIの応用のための倫理的問題やベストプラクティスを理解することの重要性を強調する。我々は、AIの技術的および倫理的課題と、病院や公共機関にAIを配置することの意味について取り上げる。また、倫理的課題、データ不足、人種的バイアス、透明性の欠如、アルゴリズム的バイアスに対処するための重要な手段と手法についても論じる。最後に、私たちは、医療アプリケーションにおけるAIに関連する倫理的課題に対処するための推奨事項と今後の方向性を提供し、このワークフローをより効率的に、正確で、アクセス可能で、透明で、世界中の患者に信頼できるものにするために、AIを臨床環境にデプロイすることを目的としています。 Artificial intelligence (AI) methods have great potential to revolutionize numerous medical care by enhancing the experience of medical experts and patients. AI based computer-assisted diagnosis tools can have a tremendous benefit if they can outperform or perform similarly to the level of a clinical expert. As a result, advanced healthcare services can be affordable in developing nations, and the problem of a lack of expert medical practitioners can be addressed. AI based tools can save time, resources, and overall cost for patient treatment. Furthermore, in contrast to humans, AI can uncover complex relations in the data from a large set of inputs and even lead to new evidence-based knowledge in medicine. 
However, integrating AI in healthcare raises several ethical and philosophical concerns, such as bias, transparency, autonomy, responsibility and accountability, which must be addressed before integrating such tools into clinical settings. In this article, we emphasize recent advances in AI-assisted medical image analysis, existing standards, and the significance of comprehending ethical issues and best practices for the applications of AI in clinical settings. We cover the technical and ethical challenges of AI and the implications of deploying AI in hospitals and public organizations. We also discuss promising key measures and techniques to address the ethical challenges, data scarcity, racial bias, lack of transparency, and algorithmic bias. Finally, we provide our recommendation and future directions for addressing the ethical challenges associated with AI in healthcare applications, with the goal of deploying AI into the clinical settings to make the workflow more efficient, accurate, accessible, transparent, and reliable for the patient worldwide.	翻訳日:2023-04-26 23:23:46 公開日:2023-04-25
# プロンプティングによる大規模言語モデルの心の理論性能の向上 Boosting Theory-of-Mind Performance in Large Language Models via Prompting ( http://arxiv.org/abs/2304.11490v2 ) ライセンス: Link先を確認	Shima Rahimi Moghaddam, Christopher J. Honey	(参考訳) 大規模言語モデル(LLM)は2023年に多くのタスクで優れているが、複雑な推論では依然として課題に直面している。エージェントの信念、目標、精神状態を理解することを必要とする心の理論(Theory-of-Mind, ToM)タスクは、人間が関わる常識的推論に不可欠であり、この分野におけるLLMのパフォーマンスを高めることが重要である。本研究では, GPT-4 と 3 つの GPT-3.5 変種 (Davinci-2, Davinci-3, GPT-3.5-Turbo) のToM 性能を測定し, ToM理解の改善における文脈内学習の有効性を検討した。2ショットの思考連鎖(chain of thought)推論とステップバイステップ思考指示を特徴とするプロンプトを評価した。人間のフィードバックからの強化学習(RLHF)で訓練したLLM(Davinci-2を除く全てのモデル)は、文脈内学習によりToMの精度を向上させた。 GPT-4はゼロショット設定で最高の性能を示し、80%近いToM精度に達したが、それでもテストセットにおける人間の87%の精度には届かなかった。しかし、文脈内学習のプロンプトを与えられた場合、全てのRLHF学習LLMは80%を超えるToM精度を達成し、GPT-4は100%に達した。これらの結果は、適切なプロンプトがLLMのToM推論を向上させることを示し、LLMの認知能力の文脈依存性を強調している。 Large language models (LLMs) excel in many tasks in 2023, but they still face challenges in complex reasoning. Theory-of-mind (ToM) tasks, which require understanding agents' beliefs, goals, and mental states, are essential for common-sense reasoning involving humans, making it crucial to enhance LLM performance in this area. This study measures the ToM performance of GPT-4 and three GPT-3.5 variants (Davinci-2, Davinci-3, GPT-3.5-Turbo), and investigates the effectiveness of in-context learning in improving their ToM comprehension. We evaluated prompts featuring two-shot chain of thought reasoning and step-by-step thinking instructions. We found that LLMs trained with Reinforcement Learning from Human Feedback (RLHF) (all models excluding Davinci-2) improved their ToM accuracy via in-context learning. GPT-4 performed best in zero-shot settings, reaching nearly 80% ToM accuracy, but still fell short of the 87% human accuracy on the test set. However, when supplied with prompts for in-context learning, all RLHF-trained LLMs exceeded 80% ToM accuracy, with GPT-4 reaching 100%. 
These results demonstrate that appropriate prompting enhances LLM ToM reasoning, and they underscore the context-dependent nature of LLM cognitive capacities.	翻訳日:2023-04-26 23:23:17 公開日:2023-04-25
# 積分近似の改良による拡散型サンプリングプロセスの高速化について On Accelerating Diffusion-Based Sampling Process via Improved Integration Approximation ( http://arxiv.org/abs/2304.11328v2 ) ライセンス: Link先を確認	Guoqiang Zhang, Niwa Kenta, W. Bastiaan Kleijn	(参考訳) 1つの一般的な拡散に基づくサンプリング戦略は、逆常微分方程式(ODE)を効果的に解こうとするものである。得られたODEソルバの係数は、ODE定式化、逆離散時間ステップ、および使用されるODE法により予め決定される。本稿では,改良された積分近似(IIA)により,特定の係数を最適化することにより,人気のあるODEベースのサンプリングプロセスの高速化を検討する。各逆時間ステップにおいて、選択された係数に対して平均二乗誤差(MSE)関数を最小化する。 MSEは、元のODEソルバを一連の微細な時間ステップに適用し、原理的には次の拡散隠れ状態を予測するためのより正確な積分近似を与える。事前学習された拡散モデルが与えられた場合、特定の数の神経機能評価(nfes)のためのiaaの手順は、サンプルのバッチで1回だけ行う必要がある。選択された係数に対する最小MSE (MMSE) による最適解は、後に復元され再利用され、サンプリングプロセスが高速化される。 EDMおよびDDIMの広範囲にわたる実験により、IIA法はNFEの数が小さい場合に顕著な性能向上をもたらすことが示された。 One popular diffusion-based sampling strategy attempts to solve the reverse ordinary differential equations (ODEs) effectively. The coefficients of the obtained ODE solvers are pre-determined by the ODE formulation, the reverse discrete timesteps, and the employed ODE methods. In this paper, we consider accelerating several popular ODE-based sampling processes by optimizing certain coefficients via improved integration approximation (IIA). At each reverse timestep, we propose to minimize a mean squared error (MSE) function with respect to certain selected coefficients. The MSE is constructed by applying the original ODE solver for a set of fine-grained timesteps which in principle provides a more accurate integration approximation in predicting the next diffusion hidden state. Given a pre-trained diffusion model, the procedure for IIA for a particular number of neural functional evaluations (NFEs) only needs to be conducted once over a batch of samples. The obtained optimal solutions for those selected coefficients via minimum MSE (MMSE) can be restored and reused later on to accelerate the sampling process. Extensive experiments on EDM and DDIM show the IIA technique leads to significant performance gain when the numbers of NFEs are small.	
翻訳日:2023-04-26 23:22:51 公開日:2023-04-25
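The idea summarized in the abstract above, choosing a solver coefficient by minimizing a mean squared error against a more finely discretized solution over a batch of samples, can be illustrated on a toy problem. The sketch below is an assumption-laden illustration and not the paper's IIA procedure: it uses a scalar ODE instead of a diffusion model, a single scalar coefficient `c`, and plain Euler steps. All function names are hypothetical.

```python
def fine_grained_target(x, h, n_sub, f):
    """Integrate dx/dt = f(x) over one step of size h with n_sub small Euler steps."""
    dt = h / n_sub
    for _ in range(n_sub):
        x = x + dt * f(x)
    return x


def fit_step_coefficient(batch, h, n_sub, f):
    """Least-squares coefficient c for the coarse update x + c * h * f(x).

    Minimizes sum_i (x_i + c * h * f(x_i) - target_i)^2 over the batch,
    where target_i comes from the fine-grained solver. For a scalar c the
    minimizer has the closed form num / den computed below.
    """
    num = 0.0
    den = 0.0
    for x in batch:
        d = h * f(x)                                   # coarse Euler increment
        r = fine_grained_target(x, h, n_sub, f) - x    # fine-grained increment
        num += d * r
        den += d * d
    # Fall back to the plain Euler coefficient if every increment is zero.
    return num / den if den > 0.0 else 1.0


def coarse_step(x, h, c, f):
    """One coarse step using the fitted coefficient."""
    return x + c * h * f(x)


if __name__ == "__main__":
    decay = lambda x: -x          # toy dynamics, assumed for illustration
    batch = [0.5, 1.0, 2.0]
    c = fit_step_coefficient(batch, h=0.5, n_sub=100, f=decay)
    print("fitted coefficient:", c)
    print("coarse step from 1.0:", coarse_step(1.0, 0.5, c, decay))
```

As in the abstract, the fit is done once over a batch and the resulting coefficient is then reused for every later coarse step. In this linear toy case the fitted coefficient reproduces the fine-grained result exactly, which would not be expected for a real diffusion sampler.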
# UBC-DLNLP at SemEval-2023 Task 12: 転移学習がアフリカ言語の感情分析に及ぼす影響 UBC-DLNLP at SemEval-2023 Task 12: Impact of Transfer Learning on African Sentiment Analysis ( http://arxiv.org/abs/2304.11256v2 ) ライセンス: Link先を確認	Gagan Bhatia, Ife Adebara, AbdelRahim Elmadany, Muhammad Abdul-Mageed	(参考訳) 我々はSemEval 2023 AfriSenti-SemEval共有タスクへの我々の貢献について述べ、そこでは14の異なるアフリカの言語における感情分析のタスクに取り組む。完全教師付き設定(サブタスクAとB)の下で単言語モデルと多言語モデルの両方を開発する。また、ゼロショット設定(サブタスクC)のモデルも開発する。私たちのアプローチでは、6つの言語モデルを使って転移学習を実験し、一部のモデルのさらなる事前学習と最終的な微調整段階も含みます。最も性能の良いモデルは、開発データではF1スコア70.36、テストデータではF1スコア66.13を達成した。当然のことながら、複数の言語にわたる感情分析のための転移学習と微調整技術の有効性を示した。我々のアプローチは、異なる言語やドメインにおける他の感情分析タスクに適用できる。 We describe our contribution to the SemEval 2023 AfriSenti-SemEval shared task, where we tackle the task of sentiment analysis in 14 different African languages. We develop both monolingual and multilingual models under a fully supervised setting (subtasks A and B). We also develop models for the zero-shot setting (subtask C). Our approach involves experimenting with transfer learning using six language models, including further pretraining of some of these models as well as a final finetuning stage. Our best performing models achieve an F1-score of 70.36 on development data and an F1-score of 66.13 on test data. Unsurprisingly, our results demonstrate the effectiveness of transfer learning and fine-tuning techniques for sentiment analysis across multiple languages. Our approach can be applied to other sentiment analysis tasks in different languages and domains.	翻訳日:2023-04-26 23:22:29 公開日:2023-04-25
# 3次元物体検出のための完全スパース融合 Fully Sparse Fusion for 3D Object Detection ( http://arxiv.org/abs/2304.12310v2 ) ライセンス: Link先を確認	Yingyan Li, Lue Fan, Yang Liu, Zehao Huang, Yuntao Chen, Naiyan Wang, Zhaoxiang Zhang and Tieniu Tan	(参考訳) 現在一般的なマルチモーダル3d検出手法は、通常高密度バードズ・アイビュー(bev)特徴マップを使用するlidarベースの検出器上に構築されている。しかし、このようなBEV特徴マップのコストは検出範囲に2次的であるため、長距離検出には適さない。完全にスパースなアーキテクチャは、長距離知覚において非常に効率的であるため注目されている。本稿では,新たに出現するフルスパースアーキテクチャにおいて,画像のモダリティを効果的に活用する方法を検討する。特にインスタンスクエリを利用することで,十分に研究された2dインスタンスセグメンテーションをlidar側に統合し,完全なスパース検出器内の3dインスタンスセグメンテーション部分と並列化する。この設計は,完全スパース特性を維持しつつ,2次元と3次元の両面に均一なクエリベースの融合フレームワークを実現する。広範な実験では、広く使われているnuscenesデータセットとlong-range argoverse 2データセットの最先端の結果が示されている。特に、長距離LiDAR認識設定における提案手法の推論速度は、他の最先端マルチモーダル3D検出方法よりも2.7$\times$である。コードは \url{https://github.com/BraveGroup/FullySparseFusion} でリリースされる。 Currently prevalent multimodal 3D detection methods are built upon LiDAR-based detectors that usually use dense Bird's-Eye-View (BEV) feature maps. However, the cost of such BEV feature maps is quadratic to the detection range, making it not suitable for long-range detection. Fully sparse architecture is gaining attention as they are highly efficient in long-range perception. In this paper, we study how to effectively leverage image modality in the emerging fully sparse architecture. Particularly, utilizing instance queries, our framework integrates the well-studied 2D instance segmentation into the LiDAR side, which is parallel to the 3D instance segmentation part in the fully sparse detector. This design achieves a uniform query-based fusion framework in both the 2D and 3D sides while maintaining the fully sparse characteristic. Extensive experiments showcase state-of-the-art results on the widely used nuScenes dataset and the long-range Argoverse 2 dataset. Notably, the inference speed of the proposed method under the long-range LiDAR perception setting is 2.7 $\times$ faster than that of other state-of-the-art multimodal 3D detection methods. 
Code will be released at \url{https://github.com/BraveGroup/FullySparseFusion}.	翻訳日:2023-04-26 23:14:01 公開日:2023-04-25
# 3次元偏波空間モードを持つ高次元量子鍵分布の逆設計による可変ベクトルビームデコーダ Tunable vector beam decoder by inverse design for high-dimensional quantum key distribution with 3D polarized spatial modes ( http://arxiv.org/abs/2304.12296v2 ) ライセンス: Link先を確認	Eileen Otte (1), Alexander D. White (2), Nicholas A. Güsken (1), Jelena Vučković (2), Mark L. Brongersma (1) ((1) Geballe Laboratory for Advance Materials, Stanford University, Stanford, CA, USA, (2) E. L. Ginzton Laboratory, Stanford University, Stanford, CA, USA)	(参考訳) 光の空間モードは、量子鍵分布(QKD)において次元を増やし、それによってセキュリティと情報容量を高める手段として非常に魅力的になっている。これまでは横電界成分のみが検討されてきたが、縦偏光成分は無視されている。本稿では,波長可変なオンチップベクトルビームデコーダ(VBD)を実装することで、電界振動の3つの空間次元すべてをQKDに取り入れる手法を提案する。この逆設計デバイスは、高次元(HD)QKDのための3次元偏光の相互不偏基底状態の「準備」と「測定」を開拓し、多機能オンチップフォトニクスプラットフォームにおける空間モードとHD QKDの統合の道を開く。 Spatial modes of light have become highly attractive to increase the dimension and, thereby, security and information capacity in quantum key distribution (QKD). So far, only transverse electric field components have been considered, while longitudinal polarization components have remained neglected. Here, we present an approach to include all three spatial dimensions of electric field oscillation in QKD by implementing our tunable, on-a-chip vector beam decoder (VBD). This inversely designed device pioneers the "preparation" and "measurement" of three-dimensionally polarized mutually unbiased basis states for high-dimensional (HD) QKD and paves the way for the integration of HD QKD with spatial modes in multifunctional on-a-chip photonics platforms.	翻訳日:2023-04-26 23:13:41 公開日:2023-04-25
# USA-Net: ロボットメモリのための統一意味表現とアフォーダンス表現 USA-Net: Unified Semantic and Affordance Representations for Robot Memory ( http://arxiv.org/abs/2304.12164v2 ) ライセンス: Link先を確認	Benjamin Bolte, Austin Wang, Jimmy Yang, Mustafa Mukadam, Mrinal Kalakrishnan, Chris Paxton	(参考訳) ロボットが「シンクの上の茶色のキャビネットを開けに行く」といったオープンエンドの指示に従うためには、シーンの幾何学と環境の意味の両方を理解する必要がある。ロボットシステムは、しばしばこれらを別々のパイプラインを通して処理し、時には非常に異なる表現空間を使用するが、2つの目的が相反する場合には準最適となりうる。本研究では,シーンのセマンティクスと空間的アフォーダンスの両方を微分可能な地図にエンコードする世界表現を構築するための簡易な方法であるUSA-Netを提案する。これにより、オープンエンド語彙を用いて指定されたシーンの場所をナビゲートできる勾配ベースのプランナーを構築することができる。私たちは、このプランナーを使って、勾配情報を利用しない同等のグリッドベースのプランナーのパスよりも、5-10%短く、かつCLIP埋め込み空間でゴールクエリに10-30%近いトラジェクトリを一貫して生成します。私たちの知る限り、これは1つの暗黙のマップで意味論とアフォーダンスの両方を最適化する最初のエンドツーエンドの微分可能なプランナーです。コードとビジュアルは、私たちのウェブサイトで利用可能です。 In order for robots to follow open-ended instructions like "go open the brown cabinet over the sink", they require an understanding of both the scene geometry and the semantics of their environment. Robotic systems often handle these through separate pipelines, sometimes using very different representation spaces, which can be suboptimal when the two objectives conflict. In this work, we present USA-Net, a simple method for constructing a world representation that encodes both the semantics and spatial affordances of a scene in a differentiable map. This allows us to build a gradient-based planner which can navigate to locations in the scene specified using open-ended vocabulary. We use this planner to consistently generate trajectories which are both 5-10% shorter and 10-30% closer to our goal query in CLIP embedding space than paths from comparable grid-based planners which don't leverage gradient information. To our knowledge, this is the first end-to-end differentiable planner that optimizes for both semantics and affordance in a single implicit map. Code and visuals are available at our website: https://usa.bolte.cc/	翻訳日:2023-04-26 23:13:23 公開日:2023-04-25
# 重要ノードのブリッジネス同定によるスキップグラムに基づくノード埋め込みのポストホック説明の生成 Generating Post-hoc Explanations for Skip-gram-based Node Embeddings by Identifying Important Nodes with Bridgeness ( http://arxiv.org/abs/2304.12036v2 ) ライセンス: Link先を確認	Hogun Park and Jennifer Neville	(参考訳) ネットワーク内のノード表現学習は、ネットワーク固有の特性と構造を保持しながら、連続ベクトル空間内の関係情報を符号化する重要な機械学習技術である。近年,Skip-gramモデルからDeepWalk,LINE,struc2vec,PTE,UserItem2vec,RWJBGなどの教師なしノード埋め込み手法が登場し,既存のリレーショナルモデルよりもノード分類やリンク予測などの下流タスクで性能が向上している。しかし, 埋め込みに適用可能な説明手法や理論研究が欠如していることから, Skip-gramベースの埋め込みに関するポストホックな説明は難しい問題である。本稿では,Skip-gramをベースとした埋め込みのグローバルな説明は,スペクトルクラスタを意識した局所摂動下でのブリッジネスの計算によって得られることを示す。さらに, 学習グラフ埋め込みベクトルに関するトップq大域的説明をより効率的に行うために, GRAPH-wGD と呼ぶ新しい勾配に基づく説明法を提案する。実験により, GRAPH-wGD を用いたスコアによるノードのランク付けは, 真のブリッジネススコアと高い相関性を示した。また, GRAPH-wGD が選択したトップqノードレベルの説明は,5つの実世界のグラフを用いて,近年の代替案で選択されたノードと比較して,より重要度が高く,摂動時にクラスラベルの予測値が大きく変化する。 Node representation learning in a network is an important machine learning technique for encoding relational information in a continuous vector space while preserving the inherent properties and structures of the network. Recently, unsupervised node embedding methods such as DeepWalk, LINE, struc2vec, PTE, UserItem2vec, and RWJBG have emerged from the Skip-gram model and achieve better performance in several downstream tasks such as node classification and link prediction than the existing relational models. However, providing post-hoc explanations of Skip-gram-based embeddings remains a challenging problem because of the lack of explanation methods and theoretical studies applicable for embeddings. In this paper, we first show that global explanations to the Skip-gram-based embeddings can be found by computing bridgeness under a spectral cluster-aware local perturbation. Moreover, a novel gradient-based explanation method, which we call GRAPH-wGD, is proposed that allows the top-q global explanations about learned graph embedding vectors to be found more efficiently. 
Experiments show that the ranking of nodes by scores using GRAPH-wGD is highly correlated with true bridgeness scores. We also observe that the top-q node-level explanations selected by GRAPH-wGD have higher importance scores and produce more changes in class label prediction when perturbed, compared with the nodes selected by recent alternatives, using five real-world graphs.	翻訳日:2023-04-26 23:13:06 公開日:2023-04-25
# 2次元 $\pm J$ Ising モデルの非平衡臨界ダイナミクス Nonequilibrium critical dynamics of the bi-dimensional $\pm J$ Ising model ( http://arxiv.org/abs/2304.11997v2 ) ライセンス: Link先を確認	Ramgopal Agrawal, Leticia F. Cugliandolo, and Marco Picco	(参考訳) $\pm J$ Ising モデルは単純なフラストレーションのスピンモデルであり、交換結合は独立に確率$p$の離散値 $-J$ と確率$1-p$の $+J$ を取る。量子誤り訂正符号との接続により特に魅力的である。本稿では,二次元$\pm J$ Isingモデルの非平衡臨界挙動を,異なる初期条件から常磁性強磁性(PF)遷移線上の臨界点$T_c(p)$へのクエンチ後について,特に多臨界西森点(NP)の上,下,およびNP上で検討する。動的臨界指数 $z_c$ は、NP の上下へのクエンチに対して非普遍的挙動を示すように見えるが、これは NP の反発的固定点による漸近前特徴として同定される。一方、NPに直接クエンチすると、このダイナミクスは、$z_c \simeq 6.02(6)$で漸近状態に達する。また、臨界ダイナミクス中に(同符号のスピンからなる)幾何学的なスピンクラスターを考える。 PFライン上の各普遍性クラスは、対応するパラメータ $\kappa$ を持つ確率ローナー進化(SLE)によって一意に特徴付けられる。さらに, 常磁性相からの臨界クエンチに対しては, フラストレーションによらず, 大きな長さスケールにおいて創発的な臨界パーコレーショントポロジーを示す。 The $\pm J$ Ising model is a simple frustrated spin model, where the exchange couplings independently take the discrete value $-J$ with probability $p$ and $+J$ with probability $1-p$. It is especially appealing due to its connection to quantum error correcting codes. Here, we investigate the nonequilibrium critical behavior of the bi-dimensional $\pm J$ Ising model, after a quench from different initial conditions to a critical point $T_c(p)$ on the paramagnetic-ferromagnetic (PF) transition line, especially, above, below and at the multicritical Nishimori point (NP). The dynamical critical exponent $z_c$ seems to exhibit non-universal behavior for quenches above and below the NP, which is identified as a pre-asymptotic feature due to the repulsive fixed point at the NP. Whereas, for a quench directly to the NP, the dynamics reaches the asymptotic regime with $z_c \simeq 6.02(6)$. We also consider the geometrical spin clusters (of like spin signs) during the critical dynamics. Each universality class on the PF line is uniquely characterized by the stochastic Loewner evolution (SLE) with corresponding parameter $\kappa$. 
Moreover, for the critical quenches from the paramagnetic phase, the model, irrespective of the frustration, exhibits an emergent critical percolation topology at the large length scales.	翻訳日:2023-04-26 23:12:37 公開日:2023-04-25
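The coupling rule stated in the $\pm J$ Ising abstract above (each bond independently takes $-J$ with probability $p$ and $+J$ with probability $1-p$) can be illustrated with a minimal sketch. The function names `sample_couplings` and `antiferro_fraction`, the bond count, and the seed are assumptions made for illustration only; this is not the authors' simulation code.

```python
import random


def sample_couplings(n_bonds, p, J=1.0, seed=None):
    """Draw n_bonds independent couplings: -J with probability p, +J with probability 1 - p."""
    rng = random.Random(seed)
    # rng.random() is uniform on [0, 1), so the comparison is true with probability p.
    return [-J if rng.random() < p else J for _ in range(n_bonds)]


def antiferro_fraction(couplings):
    """Fraction of bonds that came out negative (an empirical estimate of p)."""
    return sum(1 for c in couplings if c < 0) / len(couplings)


# Hypothetical example: 10,000 bonds with p = 0.1.
couplings = sample_couplings(10_000, p=0.1, seed=0)
fraction = antiferro_fraction(couplings)
```

For a large number of bonds, `fraction` should be close to the chosen `p`; setting `p = 0` recovers the pure ferromagnet with all couplings equal to `+J`.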
# MMC:テキスト記述を用いた画像のマルチモーダルカラー化 MMC: Multi-Modal Colorization of Images using Textual Descriptions ( http://arxiv.org/abs/2304.11993v2 ) ライセンス: Link先を確認	Subhankar Ghosh, Saumik Bhattacharya, Prasun Roy, Umapada Pal, and Michael Blumenstein	(参考訳) 異なる色でさまざまなオブジェクトを扱うことは、画像のカラー化技術にとって大きな課題である。したがって、複雑な現実世界のシーンでは、既存のカラー化アルゴリズムは色の一貫性を保たないことが多い。本研究では,カラー化されるグレースケール画像とともに,補助条件としてテキスト記述を統合することにより,カラー化プロセスの忠実性を向上させる。そこで我々は,2つの入力(grayscale imageと各エンコードされたテキスト記述)を取り込んで,関連する色成分の予測を試みるディープネットワークを提案する。また、画像内の各オブジェクトを予測し、それぞれの記述で色付けし、それぞれの属性を色化プロセスに組み込む。その後、融合モデルがすべての画像オブジェクト(セグメント)を融合して最終的な色付け画像を生成する。各テキスト記述には画像に存在するオブジェクトの色情報が含まれているため、テキストエンコーディングは予測された色の全体的な品質を改善するのに役立つ。提案手法は,LPIPS,PSNR,SSIMの指標を用いて,既存のカラー化手法よりも優れた性能を示す。 Handling various objects with different colors is a significant challenge for image colorization techniques. Thus, for complex real-world scenes, the existing image colorization algorithms often fail to maintain color consistency. In this work, we attempt to integrate textual descriptions as an auxiliary condition, along with the grayscale image that is to be colorized, to improve the fidelity of the colorization process. To do so, we have proposed a deep network that takes two inputs (grayscale image and the respective encoded text description) and tries to predict the relevant color components. Also, we have predicted each object in the image and have colorized them with their individual description to incorporate their specific attributes in the colorization process. After that, a fusion model fuses all the image objects (segments) to generate the final colorized image. As the respective textual descriptions contain color information of the objects present in the image, text encoding helps to improve the overall quality of predicted colors. In terms of performance, the proposed method outperforms existing colorization techniques in terms of LPIPS, PSNR and SSIM metrics.	翻訳日:2023-04-26 23:12:12 公開日:2023-04-25
# クディット・クリフォード階層におけるW状態回路のスケーリング Scaling W state circuits in the qudit Clifford hierarchy ( http://arxiv.org/abs/2304.12504v1 ) ライセンス: Link先を確認	Lia Yeh	(参考訳) 我々は$\sqrt[d]{Z}$ gate と呼ばれる新しいqudit gateを識別する。これはクリフォード階層の $d^{\text{th}}$ における任意の奇数素数次元 $d$ に対する qutrit $t$ ゲートの別の一般化である。このゲートはフォールトトレラントに実現可能であり、ある予想が成立するならば、qudit $\{ \|0\rangle , \|1\rangle \}$ subspace においてclifford+$\sqrt[d]{z}$ gate set, $d$-qubit $w$ states を決定論的に構成する。立方体の場合、決定論的かつフォールトトレラントな構成は、qubit $W$ サイズ3、T$ カウント3、6、パワー3に対して与えられる。さらに、これらの構成を適用して、$W$状態サイズを任意のサイズに再帰的にスケールし、$O(N)$ gate countと$O(\text{log }N)$ depthにします。これは任意のサイズ qubit $W$ state に対してより決定論的であり、任意の素数 $d$-dimensional qudit $W$ state に対して、サイズは$d$である。これらの目的のために、任意の素数のクディット次元における \|0\rangle $- controlled pauli $x$ ゲートと制御された hadamard ゲートの構成を考案する。これらの分解はクリフォード+$T$ for $d > 3$で正確な合成が知られていないが、独立な興味を持つ。 We identify a novel qudit gate which we call the $\sqrt[d]{Z}$ gate. This is an alternate generalization of the qutrit $T$ gate to any odd prime dimension $d$, in the $d^{\text{th}}$ level of the Clifford hierarchy. Using this gate which is efficiently realizable fault-tolerantly should a certain conjecture hold, we deterministically construct in the Clifford+$\sqrt[d]{Z}$ gate set, $d$-qubit $W$ states in the qudit $\{ \|0\rangle , \|1\rangle \}$ subspace. For qutrits, this gives deterministic and fault-tolerant constructions for the qubit $W$ state of sizes three with $T$ count 3, six, and powers of three. Furthermore, we adapt these constructions to recursively scale the $W$ state size to arbitrary size $N$, in $O(N)$ gate count and $O(\text{log }N)$ depth. This is moreover deterministic for any size qubit $W$ state, and for any prime $d$-dimensional qudit $W$ state, size a power of $d$. For these purposes, we devise constructions of the $ \|0\rangle $-controlled Pauli $X$ gate and the controlled Hadamard gate in any prime qudit dimension. 
These decompositions, for which exact synthesis is unknown in Clifford+$T$ for $d > 3$, may be of independent interest.	翻訳日:2023-04-26 22:29:14 公開日:2023-04-25
# CNN支援ステガノグラフィー-確立されたステガノグラフィー技術による機械学習の統合 CNN-Assisted Steganography -- Integrating Machine Learning with Established Steganographic Techniques ( http://arxiv.org/abs/2304.12503v1 ) ライセンス: Link先を確認	Andrew Havard, Theodore Manikas, Eric C. Larson, Mitchell A. Thornton	(参考訳) ステグアナリシスによってステゴメディアの発見にレジリエンスを増すことによりステガノグラフィを改善する方法を提案する。本手法は,steganographic assistant convolutional neural network (sa-cnn) の導入により,steganographic approachのクラスを強化する。従来の研究では、ステゴイメージングに適用されるステガナライザーとしてトレーニングされたニューラルネットワークを使用して、ステゴイメージ内に隠された情報の存在を発見することに成功した。以上の結果から, ステガナリザーは, ステゴイメージ発生時にSA-CNNを併用した場合, 効果が低いことが明らかとなった。我々はまた、連続的な空間ではなく、より小さく離散的な空間内でsa-cnnの可能な全てのアウトプットを表現する利点とデメリットを探求する。我々のSA-CNNは、情報を埋め込むカバーメディアの特性に基づいて、ある種のパラメトリックステガノグラフィーアルゴリズムをカスタマイズすることを可能にする。したがって、sa-cnnは、カバーメディアの特定のインスタンスごとにコアステガノグラフィーアルゴリズムを特に構成できるという意味で適応的である。 S-UNIWARD を用いたSA-CNN の使用, 使用の有無の両面での実験結果が得られた。次に、SA-CNNと非対応のステガナライザーであるYedroudj-Netに対して、両方のステガナライザーを合成し、その結果を比較した。ニューラルネットワークと手作りアルゴリズムの統合に対するこのアプローチは、ステガノグラフアルゴリズムの信頼性と適応性を増大させると考えている。 We propose a method to improve steganography by increasing the resilience of stego-media to discovery through steganalysis. Our approach enhances a class of steganographic approaches through the inclusion of a steganographic assistant convolutional neural network (SA-CNN). Previous research showed success in discovering the presence of hidden information within stego-images using trained neural networks as steganalyzers that are applied to stego-images. Our results show that such steganalyzers are less effective when SA-CNN is employed during the generation of a stego-image. We also explore the advantages and disadvantages of representing all the possible outputs of our SA-CNN within a smaller, discrete space, rather than a continuous space. Our SA-CNN enables certain classes of parametric steganographic algorithms to be customized based on characteristics of the cover media in which information is to be embedded. 
Thus, SA-CNN is adaptive in the sense that it enables the core steganographic algorithm to be especially configured for each particular instance of cover media. Experimental results are provided that employ a recent steganographic technique, S-UNIWARD, both with and without the use of SA-CNN. We then apply both sets of stego-images, those produced with and without SA-CNN, to an example steganalyzer, Yedroudj-Net, and we compare the results. We believe that this approach for the integration of neural networks with hand-crafted algorithms increases the reliability and adaptability of steganographic algorithms.	翻訳日:2023-04-26 22:28:40 公開日:2023-04-25
# デジタル双生児のための因果意味コミュニケーション : 一般化可能な模倣学習アプローチ Causal Semantic Communication for Digital Twins: A Generalizable Imitation Learning Approach ( http://arxiv.org/abs/2304.12502v1 ) ライセンス: Link先を確認	Christo Kurisummoottil Thomas, Walid Saad, Yong Xiao	(参考訳) デジタルツイン(dt)は、コミュニケーション(例えば6g)、コンピューティング(例えばエッジコンピューティング)、人工知能(ai)技術と共に、物理的世界の仮想表現を活用して、多くの接続されたインテリジェンスサービスを可能にする。ディジタルツイン(DT)に基づく大量のネットワークデータを扱うために、無線システムは、因果推論などのAI技術を活用して、厳密な通信制約下での情報意思決定を容易にするために意味コミュニケーション(SC)のパラダイムを利用することができる。本稿では,DTベースの無線システムに対して,因果意味通信(CSC)と呼ばれる新しいフレームワークを提案する。 CSCシステムは、DTを用いた最適なネットワーク制御ポリシーにアクセス可能な送信機が、帯域制限された無線チャネル上でSCを使用して、最適な制御アクションを実行するための知識を改善する方法を教える、模倣学習(IL)問題として提起される。ソースデータの因果構造は、エンド・ツー・エンド因果推論(deep end-to-end causal inference)の枠組みから新たなアプローチを用いて抽出され、因果的に不変である意味表現の作成を可能にする。受信機のCSCデコーダは、高いセマンティック信頼性を確保しつつセマンティック情報を抽出して推定するように設計されている。受信制御ポリシ、セマンティックデコーダ、因果推論は、変分推論フレームワーク内の二段階最適化問題として定式化される。この問題は、生成aiの世界モデルにインスパイアされたネットワーク状態モデルと呼ばれる新しい概念を用いて解決され、データ生成につながる環境ダイナミクスを忠実に表現する。シミュレーションの結果,提案したCSCシステムは,より優れたセマンティック信頼性とより少ないセマンティック表現を実現することにより,最先端のSCシステムよりも優れていた。 A digital twin (DT) leverages a virtual representation of the physical world, along with communication (e.g., 6G), computing (e.g., edge computing), and artificial intelligence (AI) technologies to enable many connected intelligence services. In order to handle the large amounts of network data based on digital twins (DTs), wireless systems can exploit the paradigm of semantic communication (SC) for facilitating informed decision-making under strict communication constraints by utilizing AI techniques such as causal reasoning. In this paper, a novel framework called causal semantic communication (CSC) is proposed for DT-based wireless systems. The CSC system is posed as an imitation learning (IL) problem, where the transmitter, with access to optimal network control policies using a DT, teaches the receiver using SC over a bandwidth limited wireless channel how to improve its knowledge to perform optimal control actions. 
The causal structure in the source data is extracted using novel approaches from the framework of deep end-to-end causal inference, thereby enabling the creation of a semantic representation that is causally invariant, which in turn helps generalize the learned knowledge of the system to unseen scenarios. The CSC decoder at the receiver is designed to extract and estimate semantic information while ensuring high semantic reliability. The receiver control policies, semantic decoder, and causal inference are formulated as a bi-level optimization problem within a variational inference framework. This problem is solved using a novel concept called network state models, inspired from world models in generative AI, that faithfully represents the environment dynamics leading to data generation. Simulation results demonstrate that the proposed CSC system outperforms state-of-the-art SC systems by achieving better semantic reliability and reduced semantic representation.	翻訳日:2023-04-26 22:28:15 公開日:2023-04-25
# 量子ニューラルネットワークとテンソルネットワークを用いた断面ストックリターン予測 The cross-sectional stock return predictions via quantum neural network and tensor network ( http://arxiv.org/abs/2304.12501v1 ) ライセンス: Link先を確認	Nozomu Kobayashi, Yoshiyuki Suimon, Koichi Miyamoto, Kosuke Mitarai	(参考訳) 本稿では,量子および量子に触発された機械学習アルゴリズムのストックリターン予測への応用について検討する。具体的には,ノイズの多い中間スケール量子コンピュータに適したアルゴリズムであるquantum neural networkと,量子に触発された機械学習アルゴリズムであるtensor networkの性能を,線形回帰やニューラルネットワークなどの古典モデルと比較して評価する。それらの能力を評価するため、予測に基づいてポートフォリオを構築し、投資実績を測定する。日本の株式市場における実証研究によれば、テンソルネットワークモデルは、線形およびニューラルネットワークモデルを含む従来のベンチマークモデルよりも優れた性能を達成している。量子ニューラルネットワークモデルは、全期間では古典的ニューラルネットワークモデルよりも低いリスク調整過剰リターンにとどまるが、量子ニューラルネットワークとテンソルネットワークモデルの両方が、最新の市場環境において優れたパフォーマンスを示し、入力特徴間の非線形性を捉える能力を示唆している。 In this paper we investigate the application of quantum and quantum-inspired machine learning algorithms to stock return predictions. Specifically, we evaluate performance of quantum neural network, an algorithm suited for noisy intermediate-scale quantum computers, and tensor network, a quantum-inspired machine learning algorithm, against classical models such as linear regression and neural networks. To evaluate their abilities, we construct portfolios based on their predictions and measure investment performances. The empirical study on the Japanese stock market shows the tensor network model achieves superior performance compared to classical benchmark models, including linear and neural network models. Though the quantum neural network model attains a lower risk-adjusted excess return than the classical neural network models over the whole period, both the quantum neural network and tensor network models have superior performances in the latest market environment, which suggests the models' capability to capture non-linearity between input features.	翻訳日:2023-04-26 22:27:46 公開日:2023-04-25
# パッチ拡散:拡散モデルの高速化とデータ効率の向上 Patch Diffusion: Faster and More Data-Efficient Training of Diffusion Models ( http://arxiv.org/abs/2304.12526v1 ) ライセンス: Link先を確認	Zhendong Wang, Yifan Jiang, Huangjie Zheng, Peihao Wang, Pengcheng He, Zhangyang Wang, Weizhu Chen, Mingyuan Zhou	(参考訳) 拡散モデルは強力ですが、トレーニングには多くの時間とデータが必要です。汎用的なパッチ指向トレーニングフレームワークであるパッチ拡散(Patch Diffusion)を提案し,データ効率を改善しながらトレーニング時間を大幅に削減し,より広範なユーザへの拡散モデルトレーニングの民主化を支援する。私たちのイノベーションの核心は、パッチレベルの新しい条件スコア関数で、元のイメージのパッチ位置を追加の座標チャネルとして含み、一方、パッチサイズはトレーニング中にランダム化され、多様化され、複数のスケールでクロスリージョン依存関係をエンコードする。本手法によるサンプリングは元の拡散モデルと同じくらい簡単である。 Patch Diffusionを通じて、同等またはより良い生成品質を維持しながら、$\ge 2\times$高速なトレーニングを実現することができます。一方、パッチ拡散は比較的小さなデータセット(例えば、ゼロからトレーニングするのにわずか5,000枚の画像)で訓練された拡散モデルの性能を改善する。我々はCelebA-64$\times$64で1.77、AFHQv2-Wild-64$\times$64で1.93という最先端のFIDスコアを達成する。コードとトレーニング済みのモデルを近々共有する予定です。 Diffusion models are powerful, but they require a lot of time and data to train. We propose Patch Diffusion, a generic patch-wise training framework, to significantly reduce the training time costs while improving data efficiency, which thus helps democratize diffusion model training to broader users. At the core of our innovations is a new conditional score function at the patch level, where the patch location in the original image is included as additional coordinate channels, while the patch size is randomized and diversified throughout training to encode the cross-region dependency at multiple scales. Sampling with our method is as easy as in the original diffusion model. Through Patch Diffusion, we could achieve $\mathbf{\ge 2\times}$ faster training, while maintaining comparable or better generation quality. Patch Diffusion meanwhile improves the performance of diffusion models trained on relatively small datasets, $e.g.$, as few as 5,000 images to train from scratch. We achieve state-of-the-art FID scores 1.77 on CelebA-64$\times$64 and 1.93 on AFHQv2-Wild-64$\times$64. We will share our code and pre-trained models soon.	翻訳日:2023-04-26 22:19:09 公開日:2023-04-25
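The two ingredients named in the Patch Diffusion abstract above (a randomized patch size, and the patch location supplied as extra coordinate channels) can be sketched as follows. This is only an illustration under stated assumptions: the function name `random_patch_with_coords`, the single-channel list-of-lists image, the candidate patch sizes, and the normalization of coordinates to [-1, 1] are all hypothetical choices, not details taken from the paper or its code.

```python
import random


def random_patch_with_coords(image, patch_sizes, rng):
    """Crop a random square patch and return it with per-pixel coordinate channels.

    image: H x W list of lists (one channel), with H, W >= 2.
    Returns (pixels, xs, ys); xs and ys hold each pixel's position in the
    ORIGINAL image, normalized to [-1, 1] (an assumed convention).
    """
    h, w = len(image), len(image[0])
    # Randomize the patch size among candidates that fit inside the image.
    size = rng.choice([s for s in patch_sizes if s <= min(h, w)])
    top = rng.randint(0, h - size)    # randint is inclusive on both ends
    left = rng.randint(0, w - size)
    rows = range(top, top + size)
    cols = range(left, left + size)
    pixels = [[image[r][c] for c in cols] for r in rows]
    xs = [[2 * c / (w - 1) - 1 for c in cols] for _ in rows]
    ys = [[2 * r / (h - 1) - 1 for _ in cols] for r in rows]
    return pixels, xs, ys


# Hypothetical 8 x 8 image whose pixel value encodes its own position.
image = [[r * 8 + c for c in range(8)] for r in range(8)]
pixels, xs, ys = random_patch_with_coords(image, [2, 4, 8], random.Random(0))
```

In a training setup of this kind, `pixels`, `xs`, and `ys` would be stacked as input channels so that a network seeing only the patch still knows where in the full image it came from.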
# CIMLA:微分因果関係の推論のための解釈可能なAI CIMLA: Interpretable AI for inference of differential causal networks ( http://arxiv.org/abs/2304.12523v1 ) ライセンス: Link先を確認	Payam Dibaeinia, Saurabh Sinha	(参考訳) 高次元データからの因果関係の発見は、バイオインフォマティクスにおける大きな問題である。機械学習と特徴帰属モデルは、この文脈で大きな期待を示してきたが、因果解釈は欠如している。本稿では,ある変数が他の変数に与える影響を反映した因果量を,ある仮定の下で推定する特徴帰属モデルを示す。我々はこの知見を利用して、因果関係の条件依存的な変化を発見するための新しいツールCIMLAを実装した。 CIMLAを用いて生物学的条件間の遺伝子制御ネットワークの差異を同定し,近年注目されている。シミュレーションデータセットの広範なベンチマークを用いて、CIMLAは変数の分離に頑健であり、先行手法よりも精度が高いことを示す。最後に、我々はCIMLAを用いて、アルツハイマー病(AD)の患者から収集した単細胞RNA-seqデータセットを分析し、ADのいくつかの潜在的な調節因子を発見した。 The discovery of causal relationships from high-dimensional data is a major open problem in bioinformatics. Machine learning and feature attribution models have shown great promise in this context but lack causal interpretation. Here, we show that a popular feature attribution model estimates a causal quantity reflecting the influence of one variable on another, under certain assumptions. We leverage this insight to implement a new tool, CIMLA, for discovering condition-dependent changes in causal relationships. We then use CIMLA to identify differences in gene regulatory networks between biological conditions, a problem that has received great attention in recent years. Using extensive benchmarking on simulated data sets, we show that CIMLA is more robust to confounding variables and is more accurate than leading methods. Finally, we employ CIMLA to analyze a previously published single-cell RNA-seq data set collected from subjects with and without Alzheimer's disease (AD), discovering several potential regulators of AD.	翻訳日:2023-04-26 22:18:47 公開日:2023-04-25
# ロバストな位相検索のための適応的停止条件を持つ新しい不正確近位線形アルゴリズム A New Inexact Proximal Linear Algorithm with Adaptive Stopping Criteria for Robust Phase Retrieval ( http://arxiv.org/abs/2304.12522v1 ) ライセンス: Link先を確認	Zhong Zheng, Shiqian Ma, and Lingzhou Xue	(参考訳) 本稿では,非平滑かつ非凸最適化問題であるロバスト位相探索問題を考察する。サブプロブレムを不正確に解く新しい不正確な近位線形アルゴリズムを提案する。我々の貢献はサブプロブレムに対する2つの適応的停止基準である。提案手法の収束挙動を解析した。合成データと実データの両方について実験を行い,本手法が従来の近位線形アルゴリズムや劣勾配法よりも効率的であることを実証した。 This paper considers the robust phase retrieval problem, which can be cast as a nonsmooth and nonconvex optimization problem. We propose a new inexact proximal linear algorithm with the subproblem being solved inexactly. Our contributions are two adaptive stopping criteria for the subproblem. The convergence behavior of the proposed methods is analyzed. Through experiments on both synthetic and real datasets, we demonstrate that our methods are much more efficient than existing methods, such as the original proximal linear algorithm and the subgradient method.	翻訳日:2023-04-26 22:18:27 公開日:2023-04-25
# hint-aug: ファウンデーションビジョントランスフォーマーからのヒントをブーストされたマイナショットパラメーター効率のチューニングへ Hint-Aug: Drawing Hints from Foundation Vision Transformers Towards Boosted Few-Shot Parameter-Efficient Tuning ( http://arxiv.org/abs/2304.12520v1 ) ライセンス: Link先を確認	Zhongzhi Yu, Shang Wu, Yonggan Fu, Shunyao Zhang, Yingyan (Celine) Lin	(参考訳) 下流タスクにおけるファンデーション・ビジョン・トランスフォーマー(FViT)のチューニング需要が増大しているにもかかわらず、データ制限シナリオ(例:数ショットチューニング)下でのFViTのポテンシャルを完全に解放することは、FViTsのデータハングリーの性質のため、依然として課題である。一般的なデータ拡張技術はこの文脈では、わずかなチューニングデータに含まれる機能に制限があるため、不足している。事前学習されたFViT自身は、広く使われているパラメータ効率のチューニングで完全に保存されている大規模事前学習データから、非常に代表的な特徴をすでに習得している。そこで我々は、これらの学習機能を活用してチューニングデータを増強することで、FViTチューニングの有効性を高めることができると仮定した。そこで,本研究では,事前学習したfvitsの学習機能を用いて,サンプルの過剰に適合した部分の強化を行い,少数音調律におけるfvitの強化を目的とした,ヒントベースデータ拡張(hint-aug)というフレームワークを提案する。特に、Hint-Augは、2つの重要なイネーブルを統合している: 1) ファンデーションViTの過信パッチを検出するための注意深い過剰適合検知器(AOD)、(2) コンフュージョンベースの特徴注入(CFI)モジュールは、事前訓練されたFViTから上記AODが検出した過信パッチを注入し、チューニング中の特徴の多様性を高める。 5つのデータセットと3つのパラメータ効率のチューニング技術に関する大規模な実験とアブレーション研究は、Hint-Augの有効性を一貫して検証している。例えば、Petデータセットでは、Hint-AugはSOTAデータ拡張メソッドよりも50%少ないトレーニングデータで2.22%高い精度を達成する。 Despite the growing demand for tuning foundation vision transformers (FViTs) on downstream tasks, fully unleashing FViTs' potential under data-limited scenarios (e.g., few-shot tuning) remains a challenge due to FViTs' data-hungry nature. Common data augmentation techniques fall short in this context due to the limited features contained in the few-shot tuning data. To tackle this challenge, we first identify an opportunity for FViTs in few-shot tuning: pretrained FViTs themselves have already learned highly representative features from large-scale pretraining data, which are fully preserved during widely used parameter-efficient tuning. We thus hypothesize that leveraging those learned features to augment the tuning data can boost the effectiveness of few-shot FViT tuning. 
To this end, we propose a framework called Hint-based Data Augmentation (Hint-Aug), which aims to boost FViT in few-shot tuning by augmenting the over-fitted parts of tuning samples with the learned features of pretrained FViTs. Specifically, Hint-Aug integrates two key enablers: (1) an Attentive Over-fitting Detector (AOD) to detect over-confident patches of foundation ViTs for potentially alleviating their over-fitting on the few-shot tuning data and (2) a Confusion-based Feature Infusion (CFI) module to infuse easy-to-confuse features from the pretrained FViTs with the over-confident patches detected by the above AOD in order to enhance the feature diversity during tuning. Extensive experiments and ablation studies on five datasets and three parameter-efficient tuning techniques consistently validate Hint-Aug's effectiveness: 0.04% ~ 32.91% higher accuracy over the state-of-the-art (SOTA) data augmentation method under various low-shot settings. For example, on the Pet dataset, Hint-Aug achieves a 2.22% higher accuracy with 50% less training data over SOTA data augmentation methods.	翻訳日:2023-04-26 22:18:19 公開日:2023-04-25
# RenderDiffusion: 画像生成としてのテキスト生成 RenderDiffusion: Text Generation as Image Generation ( http://arxiv.org/abs/2304.12519v1 ) ライセンス: Link先を確認	Junyi Li, Wayne Xin Zhao, Jian-Yun Nie, Ji-Rong Wen	(参考訳) 拡散モデルはテキスト生成の新しい生成パラダイムとなっている。本稿では,テキストの離散的でカテゴリカルな性質を考慮し,テキスト誘導画像生成によるテキスト生成のための新しい拡散手法であるRenderDiffusionを提案する。私たちのキーとなるアイデアは、ターゲットのテキストを視覚言語コンテンツを含むグリフ画像(glyph image)としてレンダリングすることです。このように、条件付きテキスト生成をグリフ画像生成タスクとしてキャストすることができ、離散的なテキストに連続拡散モデルを適用するのは自然である。特に,入力テキストで条件付けされた高忠実度グリフ画像を生成するために,カスケードされたアーキテクチャ(ベースと超解像拡散モデル)を利用する。さらに,生成されたグリフ画像から視覚言語コンテンツを最終的なテキストに変換するために,テキスト接地モジュールを設計した。 4つの条件付きテキスト生成タスクと2種類のメトリクス(品質と多様性)に対する実験では、事前訓練された言語モデルを含むいくつかのベースラインよりも同等またはそれ以上の結果が得られる。また,最近の拡散モデルと比較して大きな改善がみられた。 Diffusion models have become a new generative paradigm for text generation. Considering the discrete categorical nature of text, in this paper, we propose RenderDiffusion, a novel diffusion approach for text generation via text-guided image generation. Our key idea is to render the target text as a glyph image containing visual language content. In this way, conditional text generation can be cast as a glyph image generation task, and it is then natural to apply continuous diffusion models to discrete texts. Specifically, we utilize a cascaded architecture (i.e., a base and a super-resolution diffusion model) to generate high-fidelity glyph images, conditioned on the input text. Furthermore, we design a text grounding module to transform and refine the visual language content from generated glyph images into the final texts. In experiments over four conditional text generation tasks and two classes of metrics (i.e., quality and diversity), RenderDiffusion can achieve comparable or even better results than several baselines, including pretrained language models. Our model also makes significant improvements compared to the recent diffusion model.	翻訳日:2023-04-26 22:17:38 公開日:2023-04-25
# IMUPoser:電話・時計・イヤホンにおけるIMUを用いたフルボディポーズ推定 IMUPoser: Full-Body Pose Estimation using IMUs in Phones, Watches, and Earbuds ( http://arxiv.org/abs/2304.12518v1 ) ライセンス: Link先を確認	Vimal Mollyn, Riku Arakawa, Mayank Goel, Chris Harrison, Karan Ahuja	(参考訳) 体の動きの追跡は、フィットネス、モバイルゲーム、コンテキスト対応バーチャルアシスタント、リハビリに強力な用途を持つ可能性がある。しかし、ユーザーはこの目的を達成するために特別なスーツやセンサーアレイを装着する可能性は低い。代わりに、多くのユーザーが所有しているスマートフォン、スマートウォッチ、イヤホンなどのデバイスに既に搭載されているIMUを使って身体のポーズを推定できる可能性を探る。このアプローチには、低価格のコモディティIMUからのノイズデータや、ユーザの身体上の計測点数がまばらで流動的であることなど、いくつかの課題がある。私たちのパイプラインは、利用可能なIMUデータのサブセット(場合によっては単一のデバイスのみ)を受け取り、最良の推定ポーズを出力します。このモデルを評価するために、我々は、さまざまなアクティビティコンテキストにわたって、市販の消費者デバイスを装着または保持する10人の参加者から収集したIMUPoserデータセットを作成した。独自のデータセットと既存のIMUデータセットの両方でベンチマークを行い、システムの包括的な評価を行う。 Tracking body pose on-the-go could have powerful uses in fitness, mobile gaming, context-aware virtual assistants, and rehabilitation. However, users are unlikely to buy and wear special suits or sensor arrays to achieve this end. Instead, in this work, we explore the feasibility of estimating body pose using IMUs already in devices that many users own -- namely smartphones, smartwatches, and earbuds. This approach has several challenges, including noisy data from low-cost commodity IMUs, and the fact that the number of instrumentation points on a user's body is both sparse and in flux. Our pipeline receives whatever subset of IMU data is available, potentially from just a single device, and produces a best-guess pose. To evaluate our model, we created the IMUPoser Dataset, collected from 10 participants wearing or holding off-the-shelf consumer devices and across a variety of activity contexts. We provide a comprehensive evaluation of our system, benchmarking it on both our own and existing IMU datasets.	翻訳日:2023-04-26 22:17:18 公開日:2023-04-25
# 大規模言語モデルによる意味圧縮 Semantic Compression With Large Language Models ( http://arxiv.org/abs/2304.12512v1 ) ライセンス: Link先を確認	Henry Gilbert, Michael Sandborn, Douglas C. Schmidt, Jesse Spencer-Smith, Jules White	(参考訳) 大規模言語モデル(LLM)の台頭は、情報検索、質問応答、要約、コード生成タスクに革命をもたらしている。しかしながら、事実的に不正確な情報を時折提示すること(「幻覚」と呼ばれる)に加えて、LLMは本質的に一度に処理できる入出力トークンの数によって制限されるため、大きなセットや連続的な情報ストリームを処理するタスクでは効果が低下する可能性がある。データのサイズを減らす一般的なアプローチは、可逆(ロスレス)圧縮または非可逆(ロッシー)圧縮である。しかし、いくつかのケースでは、必要な意味的精度や意図が伝達される限り、元のデータからすべての詳細を完全回復する必要はないかもしれない。本稿では,LLMの研究への3つの貢献について述べる。まず, GPT-3.5 と GPT-4 を ChatGPT インタフェースを用いて, LLM を用いた近似圧縮の実現可能性について検討した。第2に,LLMがテキストやコードを圧縮し,プロンプトの圧縮表現をリコールし,操作する能力について検討し,定量化する。第3に,本研究では,LLMによって圧縮されたテキストと復元されたテキスト間の保存意図のレベルを定量化する2つの新しい指標,ERE(Exact Reconstructive Effectiveness)とSRE(Semantic Reconstruction Effectiveness)を提案する。我々の最初の結果は、GPT-4がテキストのセマンティックな意味を保ちながら、テキストを効果的に圧縮して再構築できることを示し、現在の制限よりも$\sim$5$\times$多くのトークンを活用するための道を提供する。 The rise of large language models (LLMs) is revolutionizing information retrieval, question answering, summarization, and code generation tasks. However, in addition to confidently presenting factually inaccurate information at times (known as "hallucinations"), LLMs are also inherently limited by the number of input and output tokens that can be processed at once, making them potentially less effective on tasks that require processing a large set or continuous stream of information. A common approach to reducing the size of data is through lossless or lossy compression. Yet, in some cases it may not be strictly necessary to perfectly recover every detail from the original data, as long as a requisite level of semantic precision or intent is conveyed. This paper presents three contributions to research on LLMs. First, we present the results from experiments exploring the viability of approximate compression using LLMs, focusing specifically on GPT-3.5 and GPT-4 via ChatGPT interfaces. 
Second, we investigate and quantify the capability of LLMs to compress text and code, as well as to recall and manipulate compressed representations of prompts. Third, we present two novel metrics -- Exact Reconstructive Effectiveness (ERE) and Semantic Reconstruction Effectiveness (SRE) -- that quantify the level of preserved intent between text compressed and decompressed by the LLMs we studied. Our initial results indicate that GPT-4 can effectively compress and reconstruct text while preserving the semantic essence of the original text, providing a path to leverage $\sim$5$\times$ more tokens than present limits allow.	翻訳日:2023-04-26 22:16:59 公開日:2023-04-25
# モデルフリー強化学習による形式的仕様の実現 Fulfilling Formal Specifications ASAP by Model-free Reinforcement Learning ( http://arxiv.org/abs/2304.12508v1 ) ライセンス: Link先を確認	Mengyu Liu and Pengyuan Lu and Xin Chen and Fanxin Kong and Oleg Sokolsky and Insup Lee	(参考訳) モデルフリー強化学習ソリューション,すなわちASAP-Phiフレームワークを提案し,エージェントがASAPで形式的仕様を満たすことを奨励する。このフレームワークは、仕様を満たさないトレースに定量的なセマンティック報酬を割り当て、残りのトレースに高い一定報酬を付与するピースワイズ報酬関数を利用する。次に、soft actor-critic(SAC)やdeep deterministic policy gradient(DDPG)などのアクタ-クリティックベースのアルゴリズムでエージェントを訓練する。さらに、ASAP-PhiがASAPでの仕様達成を優先するポリシーを生成することを証明する。最先端ベンチマークに関するアブレーション研究を含む広範な実験が行われている。その結果,最大97%のテストケースで十分な速度のトラジェクトリを見つけ出すことができ,ベースラインを打ち破ることができた。 We propose a model-free reinforcement learning solution, namely the ASAP-Phi framework, to encourage an agent to fulfill a formal specification ASAP. The framework leverages a piece-wise reward function that assigns quantitative semantic reward to traces not satisfying the specification, and a high constant reward to the remaining traces. Then, it trains an agent with an actor-critic-based algorithm, such as soft actor-critic (SAC), or deep deterministic policy gradient (DDPG). Moreover, we prove that ASAP-Phi produces policies that prioritize fulfilling a specification ASAP. Extensive experiments are run, including ablation studies, on state-of-the-art benchmarks. Results show that our framework succeeds in finding sufficiently fast trajectories for up to 97% test cases and defeats baselines.	翻訳日:2023-04-26 22:16:32 公開日:2023-04-25
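The piece-wise reward described in the ASAP-Phi abstract above (a quantitative semantic reward for traces that do not yet satisfy the specification, a high constant reward otherwise) can be sketched for one toy specification. Everything concrete here is an assumption for illustration: the specification "eventually be within `tol` of `goal`", the robustness formula, the function names, and the constant `high_reward` are hypothetical and are not the paper's definitions.

```python
def robustness_eventually_near(trace, goal, tol):
    """Quantitative semantics of 'eventually |x - goal| <= tol' on a 1-D trace.

    The value is non-negative exactly when some state in the trace satisfies
    the tolerance; otherwise it is negative and measures how far the closest
    state missed it.
    """
    return max(tol - abs(x - goal) for x in trace)


def piecewise_reward(trace, goal, tol, high_reward=10.0):
    """Piece-wise reward: a constant bonus if satisfied, else the (negative) robustness."""
    rho = robustness_eventually_near(trace, goal, tol)
    if rho >= 0:
        return high_reward  # specification already fulfilled by this trace
    return rho              # not fulfilled: graded signal pointing toward satisfaction


# Hypothetical traces: the first reaches the goal, the second does not.
reward_satisfied = piecewise_reward([0.0, 0.5, 1.0], goal=1.0, tol=0.1)
reward_unsatisfied = piecewise_reward([0.0, 0.5], goal=1.0, tol=0.1)
```

One plausible reading of why such a shape favors early fulfillment is that a trace satisfied sooner collects the constant bonus over more steps, but that interpretation is an assumption here and should be checked against the paper.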
# 加速度MRIのためのタスク特化戦略の学習 Learning Task-Specific Strategies for Accelerated MRI ( http://arxiv.org/abs/2304.12507v1 ) ライセンス: Link先を確認	Zihui Wu, Tianwei Yin, Yu Sun, Robert Frost, Andre van der Kouwe, Adrian V. Dalca, Katherine L. Bouman	(参考訳) 圧縮型磁気共鳴イメージング(CS-MRI)は、診断タスクのためのサブサンプル計測から視覚情報を回復しようとする。従来のCS-MRI法は、計測サブサンプリング、画像再構成、タスク予測を別々に扱うことが多く、結果として準最適エンドツーエンドのパフォーマンスが得られる。本研究では,特定のタスクに適したCS-MRIシステムを設計するための統合フレームワークとしてTACKLEを提案する。最近の共同設計技術を活用して、TACKLEはサブサンプリング、再構築、予測戦略を共同で最適化し、下流タスクのパフォーマンスを向上させる。提案手法は従来のCS-MRI法よりも様々なタスクの性能向上を実現している。また、トレーニングデータから異なる取得設定を用いて新しいデータセットを実験的に収集することにより、TACKLEの一般化能力を評価する。さらなる微調整がなければ、TACKLEは堅牢に機能し、数値と視覚の両方の改善につながる。 Compressed sensing magnetic resonance imaging (CS-MRI) seeks to recover visual information from subsampled measurements for diagnostic tasks. Traditional CS-MRI methods often separately address measurement subsampling, image reconstruction, and task prediction, resulting in suboptimal end-to-end performance. In this work, we propose TACKLE as a unified framework for designing CS-MRI systems tailored to specific tasks. Leveraging recent co-design techniques, TACKLE jointly optimizes subsampling, reconstruction, and prediction strategies to enhance the performance on the downstream task. Our results on multiple public MRI datasets show that the proposed framework achieves improved performance on various tasks over traditional CS-MRI methods. We also evaluate the generalization ability of TACKLE by experimentally collecting a new dataset using different acquisition setups from the training data. Without additional fine-tuning, TACKLE functions robustly and leads to both numerical and visual improvements.	翻訳日:2023-04-26 22:16:20 公開日:2023-04-25
# 一般化ベイズ加法回帰木に対する後方濃度の理論 Theory of Posterior Concentration for Generalized Bayesian Additive Regression Trees ( http://arxiv.org/abs/2304.12505v1 ) ライセンス: Link先を確認	Enakshi Saha	(参考訳) Bayesian Additive Regression Trees (BART) は非線形回帰関数をモデル化するための強力な半パラメトリックアンサンブル学習技術である。当初、BARTは連続した応答変数とバイナリな応答変数のみを予測するために提案されていたが、長年にわたって、様々なアプリケーション領域においてより広範な応答変数(例えばカテゴリやカウントデータ)を推定するのに適する複数の拡張が出現してきた。本稿では、ベイズ木に対する一般化された枠組みと、応答変数が指数関数的なファミリー分布から来る付加的なアンサンブルについて述べる。応答分布について十分な条件を導出し, 後部が最小マックスで集中する条件を対数係数まで導出する。本稿では,BARTとその変種を実証的に成功させる理論的根拠を提供する。 Bayesian Additive Regression Trees (BART) are a powerful semiparametric ensemble learning technique for modeling nonlinear regression functions. Although initially BART was proposed for predicting only continuous and binary response variables, over the years multiple extensions have emerged that are suitable for estimating a wider class of response variables (e.g. categorical and count data) in a multitude of application areas. In this paper we describe a Generalized framework for Bayesian trees and their additive ensembles where the response variable comes from an exponential family distribution and hence encompasses a majority of these variants of BART. We derive sufficient conditions on the response distribution, under which the posterior concentrates at a minimax rate, up to a logarithmic factor. In this regard our results provide theoretical justification for the empirical success of BART and its variants.	翻訳日:2023-04-26 22:16:04 公開日:2023-04-25
# オブジェクトセマンティクスは私たちが必要とする深さを与える:空中深度補完へのマルチタスクアプローチ Object Semantics Give Us the Depth We Need: Multi-task Approach to Aerial Depth Completion ( http://arxiv.org/abs/2304.12542v1 ) ライセンス: Link先を確認	Sara Hatami Gazani, Fardad Dadboud, Miodrag Bolic, Iraj Mantegh, Homayoun Najjaran	(参考訳) 深度完了と物体検出は、しばしば空中3Dマッピング、経路計画、無人航空機(UAV)の衝突回避に使用される2つの重要なタスクである。一般的な解決策としては、LiDARセンサーによる測定があるが、生成された点雲はスパースで不規則であり、3Dレンダリングと安全クリティカルな意思決定におけるシステムの能力を制限していることが多い。この課題を軽減するために、UAV上の他のセンサー(オブジェクト検出に使用されるカメラ)からの情報を利用して、深度補正プロセスがより高密度な3Dモデルを生成するのに役立つ。 2つのセンサーからのデータを融合させながら、空中深度補完と物体検出の両方を実行することは、資源効率に課題をもたらす。本稿では,2つのタスクをひとつのパスで共同実行するための新しいアプローチを提案する。提案手法は,2つのタスクを共同学習機能に公開するエンコーダに着目したマルチタスク学習モデルに基づく。物体検出経路によって学習されたシーンにおける物体の意味的期待が、不足した深さ値を置きながら深さ完了経路の性能をいかに高めるかを示す。実験の結果,提案するマルチタスクネットワークは,特に欠陥入力に対して,シングルタスクネットワークよりも優れていることがわかった。 Depth completion and object detection are two crucial tasks often used for aerial 3D mapping, path planning, and collision avoidance of Uncrewed Aerial Vehicles (UAVs). Common solutions include using measurements from a LiDAR sensor; however, the generated point cloud is often sparse and irregular and limits the system's capabilities in 3D rendering and safety-critical decision-making. To mitigate this challenge, information from other sensors on the UAV (viz., a camera used for object detection) is utilized to help the depth completion process generate denser 3D models. Performing both aerial depth completion and object detection tasks while fusing the data from the two sensors poses a challenge to resource efficiency. We address this challenge by proposing a novel approach to jointly execute the two tasks in a single pass. The proposed method is based on an encoder-focused multi-task learning model that exposes the two tasks to jointly learned features. We demonstrate how semantic expectations of the objects in the scene learned by the object detection pathway can boost the performance of the depth completion pathway while placing the missing depth values. 
Experimental results show that the proposed multi-task network outperforms its single-task counterpart, particularly when exposed to defective inputs.	翻訳日:2023-04-26 22:10:32 公開日:2023-04-25
# 物理インフォームドインバータブルニューラルネットワークを用いた逆問題に対する効率的なベイズ推論 Efficient Bayesian inference using physics-informed invertible neural networks for inverse problems ( http://arxiv.org/abs/2304.12541v1 ) ライセンス: Link先を確認	Xiaofei Guan, Xintong Wang, Hao Wu	(参考訳) 本稿では,物理インバータブルニューラルネットワーク (pi-inn) を用いたベイズ逆問題に対する新しい解法を提案する。 PI-INNのアーキテクチャは、可逆ニューラルネットワーク(INN)とニューラルネットワーク(NB-Net)の2つのサブネットワークで構成されている。 NB-Netの助けを借りてパラメトリック入力とIPN出力の間の可逆写像を構築し、後方分布の抽出可能な推定を行い、効率的なサンプリングと精度の高い密度評価を可能にする。さらに、PI-INNの損失関数は、残基物理インフォームド損失項と、新しい独立損失項の2つの成分を含む。提案する独立損失項は、推定密度関数を有効利用することにより、ランダム潜在変数をガウス化し、inn出力の2つの部分間の統計的独立性を確保することができる。逆運動学, 1-d, 2-d拡散方程式の逆問題, 地震時トモグラフィなど, 提案したPI-INNの効率と精度を示す数値実験を行った。 In the paper, we propose a novel approach for solving Bayesian inverse problems with physics-informed invertible neural networks (PI-INN). The architecture of PI-INN consists of two sub-networks: an invertible neural network (INN) and a neural basis network (NB-Net). The invertible map between the parametric input and the INN output with the aid of NB-Net is constructed to provide a tractable estimation of the posterior distribution, which enables efficient sampling and accurate density evaluation. Furthermore, the loss function of PI-INN includes two components: a residual-based physics-informed loss term and a new independence loss term. The presented independence loss term can Gaussianize the random latent variables and ensure statistical independence between two parts of INN output by effectively utilizing the estimated density function. Several numerical experiments are presented to demonstrate the efficiency and accuracy of the proposed PI-INN, including inverse kinematics, inverse problems of the 1-d and 2-d diffusion equations, and seismic traveltime tomography.	翻訳日:2023-04-26 22:10:11 公開日:2023-04-25
# 敵対的ネットワーク摂動下における意見制御--stackelbergゲームアプローチ Opinion Control under Adversarial Network Perturbation: A Stackelberg Game Approach ( http://arxiv.org/abs/2304.12540v1 ) ライセンス: Link先を確認	Yuejiang Li, Zhanjiang Chen, H. Vicky Zhao	(参考訳) 新たなソーシャルネットワークプラットフォームによって、ユーザーは自分の意見を共有したり、他人と意見を交換したりできる。しかし、悪質なユーザーが故意に極端な意見や噂、誤った情報を他人に広める、敵対的なネットワークの混乱は、ソーシャルネットワークにおいてユビキタスである。このような敵対的ネットワークの摂動は、世論の形成に大きな影響を与え、我々の社会を脅かす。したがって、敵ネットワーク摂動の影響を研究・制御することが重要である。学界と業界の両方で、世論のダイナミクスをガイドし、制御するために多大な努力がなされてきたが、これらの研究の多くは、ネットワークが静的であり、そのような敵のネットワーク摂動を無視していると仮定している。本研究は,Friedkin-Johnsen意見力学モデルに基づいて,敵対的ネットワーク摂動をモデル化し,そのネットワークの意見への影響を分析する。そして, 敵の視点から, その最適ネットワーク摂動を解析し, ネットワークの意見を最大に変化させる。次に,ネットワークディフェンダーの観点から,stackelbergゲームを定式化し,そのような敵対的ネットワーク摂動下でもネットワークの意見を制御することを目的とする。定式化されたstackelbergゲームを解くために,計画的サブグレードエントアルゴリズムを考案する。実社会ネットワークにおける大規模シミュレーションは, 敵ネットワーク摂動の影響と, 提案した意見制御アルゴリズムの有効性を検証した。 The emerging social network platforms enable users to share their own opinions, as well as to exchange opinions with others. However, adversarial network perturbation, where malicious users intentionally spread their extreme opinions, rumors, and misinformation to others, is ubiquitous in social networks. Such adversarial network perturbation greatly influences the opinion formation of the public and threatens our societies. Thus, it is critical to study and control the influence of adversarial network perturbation. Although tremendous efforts have been made in both academia and industry to guide and control the public opinion dynamics, most of these works assume that the network is static, and ignore such adversarial network perturbation. In this work, based on the well-accepted Friedkin-Johnsen opinion dynamics model, we model the adversarial network perturbation and analyze its impact on the networks' opinion. Then, from the adversary's perspective, we analyze its optimal network perturbation, which maximally changes the network's opinion. 
Next, from the network defender's perspective, we formulate a Stackelberg game and aim to control the network's opinion even under such adversarial network perturbation. We devise a projected subgradient algorithm to solve the formulated Stackelberg game. Extensive simulations on real social networks validate our analysis of the adversarial network perturbation's influence and the effectiveness of the proposed opinion control algorithm.	翻訳日:2023-04-26 22:09:51 公開日:2023-04-25
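The Friedkin-Johnsen model that the abstract above builds on can be sketched in a few lines. The update rule used here, x_i(t+1) = lam_i * sum_j W[i][j] * x_j(t) + (1 - lam_i) * s_i, is the standard form of the model; the two-agent network, the susceptibility values `lam`, and the edge rewiring in `W_perturbed` are hypothetical examples chosen for illustration and are not taken from the paper.

```python
# Sketch of Friedkin-Johnsen opinion dynamics in pure Python.
# x_i(t+1) = lam_i * sum_j W[i][j] * x_j(t) + (1 - lam_i) * s_i
# W: row-stochastic influence matrix, s: innate opinions,
# lam: susceptibility to social influence (all values here are illustrative).


def fj_step(W, lam, s, x):
    """Apply one synchronous Friedkin-Johnsen update."""
    n = len(x)
    return [
        lam[i] * sum(W[i][j] * x[j] for j in range(n)) + (1.0 - lam[i]) * s[i]
        for i in range(n)
    ]


def fj_equilibrium(W, lam, s, tol=1e-12, max_iter=10_000):
    """Iterate from the innate opinions until the update stops changing."""
    x = list(s)
    for _ in range(max_iter):
        nxt = fj_step(W, lam, s, x)
        if max(abs(a - b) for a, b in zip(nxt, x)) < tol:
            return nxt
        x = nxt
    return x


s = [0.0, 1.0]
lam = [0.5, 0.5]
W_clean = [[0.5, 0.5], [0.5, 0.5]]
# Hypothetical perturbation: agent 1 stops listening to agent 0.
W_perturbed = [[0.5, 0.5], [0.0, 1.0]]

x_clean = fj_equilibrium(W_clean, lam, s)
x_pert = fj_equilibrium(W_perturbed, lam, s)
print(sum(x_clean) / 2, sum(x_pert) / 2)
```

In this toy case the rewiring shifts the average equilibrium opinion upward, which illustrates the kind of effect a defender in the Stackelberg formulation would try to counteract.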
# 空間制約付きテキスト誘導眼鏡操作 Text-guided Eyeglasses Manipulation with Spatial Constraints ( http://arxiv.org/abs/2304.12539v1 ) ライセンス: Link先を確認	Jiacheng Wang, Ping Liu, Jingen Liu, Wei Xu	(参考訳) メガネのバーチャル試着には、異なる形状とスタイルの眼鏡を物理的に試すことなく、顔画像に配置する。既存の方法は印象的な結果を示しているが、様々な眼鏡のスタイルは限られており、相互作用は常に直感的あるいは効率的であるとは限らない。そこで本稿では,これらの制約に対処するために,バイナリマスクとテキストに基づく眼鏡形状とスタイルをそれぞれ制御可能な眼鏡操作方式を提案する。具体的には,マスク条件を抽出するマスクエンコーダと,テキストとマスク条件を同時に注入可能な変調モジュールを提案する。この設計により、テクスト記述と空間制約の両方に基づいて眼鏡の外観を細かく制御することができる。提案手法は,無関係な領域を保存し,局所的な編集を向上する疎結合マッパーと疎結合戦略を含む。様々なモーダリティ条件の異なる収束速度を扱うために2段階のトレーニングスキームを用い,眼鏡の形状とスタイルの両方をうまく制御した。広範な比較実験とアブレーション分析により,無関係領域を保ちながら多様な眼鏡スタイルを実現するためのアプローチの有効性が示された。 Virtual try-on of eyeglasses involves placing eyeglasses of different shapes and styles onto a face image without physically trying them on. While existing methods have shown impressive results, the variety of eyeglasses styles is limited and the interactions are not always intuitive or efficient. To address these limitations, we propose a Text-guided Eyeglasses Manipulation method that allows for control of the eyeglasses shape and style based on a binary mask and text, respectively. Specifically, we introduce a mask encoder to extract mask conditions and a modulation module that enables simultaneous injection of text and mask conditions. This design allows for fine-grained control of the eyeglasses' appearance based on both textual descriptions and spatial constraints. Our approach includes a disentangled mapper and a decoupling strategy that preserves irrelevant areas, resulting in better local editing. We employ a two-stage training scheme to handle the different convergence speeds of the various modality conditions, successfully controlling both the shape and style of eyeglasses. Extensive comparison experiments and ablation analyses demonstrate the effectiveness of our approach in achieving diverse eyeglasses styles while preserving irrelevant areas.	翻訳日:2023-04-26 22:09:27 公開日:2023-04-25
# GARCIA:多粒性コントラスト学習を用いたロングテールクエリの表現 GARCIA: Powering Representations of Long-tail Query with Multi-granularity Contrastive Learning ( http://arxiv.org/abs/2304.12537v1 ) ライセンス: Link先を確認	Weifan Wang, Binbin Hu, Zhicheng Peng, Mingjie Zhong, Zhiqiang Zhang, Zhongyi Liu, Guannan Zhang, Jun Zhou	(参考訳) 近年、サービスプラットフォームの成長は、ユーザと商店双方にとって大きな便宜をもたらし、サービス検索エンジンは、テキストクエリによる望ましい結果の迅速な取得により、ユーザエクスペリエンスの向上に重要な役割を担っている。残念ながら、ユーザーの制御不能な検索習慣は、通常大量のロングテールクエリを持ち込み、検索モデルの能力を著しく脅かす。最近出現しているグラフニューラルネットワーク(GNN)とコントラスト学習(CL)に触発されて、ロングテール問題を緩和し、かなりのパフォーマンスを達成するために、いくつかの取り組みが行われた。それでも、いくつかの大きな弱点に直面している。最も重要なことは、彼らは効果的な知識伝達のために頭と尾の間の文脈構造を明示的に利用せず、より一般化された表現に対して意図レベル情報は一般的に無視されることである。そこで本研究では,グラフに基づく知識伝達と意図に基づく表現一般化を対比的に活用する新しい枠組み garcia を開発した。特に、適応エンコーダを用いて、クエリやサービスの情報表現と、意図の階層構造を考慮した表現を生成する。テールクエリとサービスを完全に理解するために,我々は,知識伝達,構造拡張,意図の一般化を通じて表現を駆動する,新しい多粒性コントラスト学習モジュールをGARCIAに装備する。その後、完全なガルシアは事前訓練と微調整の方法でよく訓練される。最後に、オフライン環境とオンライン環境の両方で広範な実験を行い、サービス検索シナリオにおけるテールクエリの改善と全体的なパフォーマンスに関するGARCIAの優れた能力を示す。 Recently, the growth of service platforms brings great convenience to both users and merchants, where the service search engine plays a vital role in improving the user experience by quickly obtaining desirable results via textual queries. Unfortunately, users' uncontrollable search customs usually bring vast amounts of long-tail queries, which severely threaten the capability of search models. Inspired by recently emerging graph neural networks (GNNs) and contrastive learning (CL), several efforts have been made in alleviating the long-tail issue and achieve considerable performance. Nevertheless, they still face a few major weaknesses. Most importantly, they do not explicitly utilize the contextual structure between heads and tails for effective knowledge transfer, and intention-level information is commonly ignored for more generalized representations. 
To this end, we develop a novel framework, GARCIA, which exploits graph-based knowledge transfer and intention-based representation generalization in a contrastive setting. In particular, we employ an adaptive encoder to produce informative representations for queries and services, as well as hierarchical structure-aware representations of intentions. To fully understand tail queries and services, we equip GARCIA with a novel multi-granularity contrastive learning module, which powers representations through knowledge transfer, structure enhancement, and intention generalization. Subsequently, the complete GARCIA is trained in a pre-training and fine-tuning manner. Finally, we conduct extensive experiments in both offline and online environments, which demonstrate the superior capability of GARCIA in improving tail queries and overall performance in service search scenarios.	翻訳日:2023-04-26 22:09:09 公開日:2023-04-25
# ラテント分類器誘導による合成視覚生成の探索 Exploring Compositional Visual Generation with Latent Classifier Guidance ( http://arxiv.org/abs/2304.12536v1 ) ライセンス: Link先を確認	Changhao Shi, Haomiao Ni, Kai Li, Shaobo Han, Mingfu Liang, Martin Renqiang Min	(参考訳) 拡散確率モデルは画像生成と操作の分野で大きな成功を収めている。本稿では,合成視覚タスクの潜在意味空間における拡散モデルと分類器指導を用いた新しいパラダイムについて検討する。具体的には、有意味な潜在空間を持つ任意の事前学習された生成モデルに対して、潜在拡散モデルと補助潜在分類器を訓練し、潜在表現生成の非線形ナビゲーションを容易にする。潜在分類器指導による条件付き生成は,訓練中の条件付きログ確率の下限を最大化する。操作中に元のセマンティクスを維持するために,合成性を達成する上で重要な新しい指導用語を導入する。さらなる仮定により、非線形演算は単純な潜在算術的アプローチに還元されることを示す。潜在分類器指導に基づくこのパラダイムは,事前学習した生成モデルと無関係であり,実画像および合成画像の逐次操作と画像生成における競合結果を示す。以上の結果から,潜在型分類法は,他の強力な競合手法が存在する場合でも,さらなる探索に役立つ有望なアプローチであることが示唆された。 Diffusion probabilistic models have achieved enormous success in the field of image generation and manipulation. In this paper, we explore a novel paradigm of using the diffusion model and classifier guidance in the latent semantic space for compositional visual tasks. Specifically, we train latent diffusion models and auxiliary latent classifiers to facilitate non-linear navigation of latent representation generation for any pre-trained generative model with a semantic latent space. We demonstrate that such conditional generation achieved by latent classifier guidance provably maximizes a lower bound of the conditional log probability during training. To maintain the original semantics during manipulation, we introduce a new guidance term, which we show is crucial for achieving compositionality. With additional assumptions, we show that the non-linear manipulation reduces to a simple latent arithmetic approach. We show that this paradigm based on latent classifier guidance is agnostic to pre-trained generative models, and present competitive results for both image generation and sequential manipulation of real and synthetic images. 
Our findings suggest that latent classifier guidance is a promising approach that merits further exploration, even in the presence of other strong competing methods.	翻訳日:2023-04-26 22:08:42 公開日:2023-04-25
# Img2Vec:Token-Diversityの教師がマスクオートエンコーダーを支援 Img2Vec: A Teacher of High Token-Diversity Helps Masked AutoEncoders ( http://arxiv.org/abs/2304.12535v1 ) ライセンス: Link先を確認	Heng Pan, Chenyang Liu, Wenxiao Wang, Li Yuan, Hongfa Wang, Zhifeng Li, Wei Liu	(参考訳) 本稿では,マスク画像モデリング(mim)のための画像からベクトルへのパイプライン(img2vec)を提案する。学習対象としてmimにどのような深い特徴が適しているかを検討するため,我々は,学習対象として画像から特徴ベクトルに変換するための訓練された自己教師モデルを用いた簡易mimフレームワークを提案し,その特徴抽出器を教師モデルとしても知られている。驚くべきことに、MIMモデルは、Transformerベースのモデル(例えば、ViT-Large、307M)のような面倒な教師によるものよりも、より軽いモデル(例えば、ResNet-50、26M)によって生成される画像特徴の恩恵を経験的に見出した。この注目すべき現象を分析するために,新しい特徴であるトークン多様性を考案し,異なるモデルから生成した特徴の特性を評価する。トークンの多様性は、異なるトークン間の特徴差を測定する。広範な実験と可視化を通じて,大規模モデルがmimを改善できるという認識を超えて,教師モデルの高いトークン多様性も重要であると仮定する。以上の議論に基づき、Img2Vecは高いトークン多様性を持つ教師モデルを採用し、画像特徴を生成する。 Img2VecはImageNetの未ラベルデータにViT-Bで事前トレーニングされた。さらに、大型モデル、ViT-L と ViT-H で Img2Vec をスケールアップし、それぞれ 86.7\% と 87.5\% の精度を得る。また、COCOでは51.8\% mAP、ADE20Kでは50.7\% mIoUなど、他の下流タスクでは最先端の結果も達成している。 Img2Vecは、MIM学習を深く特徴付けるのに適した、シンプルで効果的なフレームワークである。 We present a pipeline of Image to Vector (Img2Vec) for masked image modeling (MIM) with deep features. To study which type of deep features is appropriate for MIM as a learning target, we propose a simple MIM framework with a series of well-trained self-supervised models to convert an Image to a feature Vector as the learning target of MIM, where the feature extractor is also known as a teacher model. Surprisingly, we empirically find that an MIM model benefits more from image features generated by some lighter models (e.g., ResNet-50, 26M) than from those by a cumbersome teacher like Transformer-based models (e.g., ViT-Large, 307M). To analyze this remarkable phenomenon, we devise a novel attribute, token diversity, to evaluate the characteristics of generated features from different models. Token diversity measures the feature dissimilarity among different tokens. 
Through extensive experiments and visualizations, we hypothesize that beyond the acknowledgment that a large model can improve MIM, a high token-diversity of a teacher model is also crucial. Based on the above discussion, Img2Vec adopts a teacher model with high token-diversity to generate image features. Img2Vec pre-trained on ImageNet unlabeled data with ViT-B yields 85.1\% top-1 accuracy on fine-tuning. Moreover, we scale up Img2Vec on larger models, ViT-L and ViT-H, and get $86.7\%$ and $87.5\%$ accuracy respectively. It also achieves state-of-the-art results on other downstream tasks, e.g., 51.8\% mAP on COCO and 50.7\% mIoU on ADE20K. Img2Vec is a simple yet effective framework tailored to deep feature MIM learning, accomplishing superb comprehensive performance on representative vision tasks.	翻訳日:2023-04-26 22:08:25 公開日:2023-04-25
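To make the token diversity notion in the Img2Vec abstract above concrete, here is one possible reading, sketched in Python. The abstract says only that token diversity "measures the feature dissimilarity among different tokens"; the specific choice below (mean pairwise cosine dissimilarity) and the function name `token_diversity` are assumptions for illustration and may differ from the paper's definition.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two non-zero feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def token_diversity(tokens):
    """Assumed metric: mean of (1 - cosine similarity) over all token pairs.
    Expects at least two non-zero token feature vectors."""
    pairs = list(combinations(tokens, 2))
    return sum(1.0 - cosine(u, v) for u, v in pairs) / len(pairs)


# Teacher whose token features all point the same way (low diversity).
collapsed = [[1.0, 0.0], [2.0, 0.0], [3.0, 0.0]]
# Teacher whose token features point in different directions (high diversity).
spread = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]

print(token_diversity(collapsed), token_diversity(spread))
```

Under this assumed definition, a teacher whose tokens collapse onto one direction scores zero, while a teacher with well-separated token features scores higher.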
# ランダムウォーク確率ADMMによる個人化フェデレーション学習の安定化 Mobilizing Personalized Federated Learning via Random Walk Stochastic ADMM ( http://arxiv.org/abs/2304.12534v1 ) ライセンス: Link先を確認	Ziba Parsons, Fei Dou, Houyi Du, Jin Lu	(参考訳) 本研究では,中央サーバと全クライアントとの一貫した接続が維持できず,データ分散が不均一である実世界シナリオにおいて,連合学習(fl)を実装する際の障壁について検討する。これらの課題に対処するために、サーバが隣接するクライアントのグループ間を移動してローカルモデルを学習するフェデレーション設定の動員に焦点を当てる。具体的には,モデルの学習に十分な数の接続クライアントがある限り,動的およびアドホックなネットワーク条件に適応可能な乗算器のランダムウォーク確率的交互方向法(rwsadmm)を提案する。 RWSADMMでは、サーバはクライアントのグループに向かってランダムに歩く。データの不均一性に対処するコンセンサス更新の代わりに、ハード不等式制約に基づいて、隣接するクライアント間の局所的な近接を定式化する。提案手法は,集中サーバが通信するクライアント数を削減し,通信コストを低減し,スケーラビリティを向上させる。 In this research, we investigate the barriers associated with implementing Federated Learning (FL) in real-world scenarios, where a consistent connection between the central server and all clients cannot be maintained, and data distribution is heterogeneous. To address these challenges, we focus on mobilizing the federated setting, where the server moves between groups of adjacent clients to learn local models. Specifically, we propose a new algorithm, Random Walk Stochastic Alternating Direction Method of Multipliers (RWSADMM), capable of adapting to dynamic and ad-hoc network conditions as long as a sufficient number of connected clients are available for model training. In RWSADMM, the server walks randomly toward a group of clients. It formulates local proximity among adjacent clients based on hard inequality constraints instead of consensus updates to address data heterogeneity. Our proposed method is convergent, reduces communication costs, and enhances scalability by reducing the number of clients the central server needs to communicate with.	翻訳日:2023-04-26 22:07:50 公開日:2023-04-25
# SEA:マルチエージェント強化学習のための空間的に明示的なアーキテクチャ SEA: A Spatially Explicit Architecture for Multi-Agent Reinforcement Learning ( http://arxiv.org/abs/2304.12532v1 ) ライセンス: Link先を確認	Dapeng Li, Zhiwei Xu, Bin Zhang, Guoliang Fan	(参考訳) 空間情報は様々な分野において不可欠である。エージェントの空間的位置に応じて明確にモデル化する方法は、特にエージェントの数が変化し、スケールが巨大である場合に、マルチエージェント問題にとって非常に重要である。本稿では,コンピュータビジョンにおけるポイントクラウドタスクに着想を得て,マルチエージェント強化学習のための空間情報抽出構造を提案する。エージェントは、空間エンコーダデコーダ構造を通じて、近隣とグローバル情報を効果的に共有することができる。本手法は,分散実行(CTDE)パラダイムを用いた集中型学習に準じる。さらに,本手法は,既存の多種多様な強化学習アルゴリズムに対して,小さな修正を加えることで適用可能であり,様々なエージェントで問題に対処することができる。複数のマルチエージェントシナリオにおける実験は、既存の手法が空間的に明示的なアーキテクチャを追加することで説得力のある結果が得られることを示している。 Spatial information is essential in various fields. Explicitly modeling agents according to their spatial locations is also very important for multi-agent problems, especially when the number of agents changes and the scale is enormous. Inspired by the point cloud task in computer vision, we propose a spatial information extraction structure for multi-agent reinforcement learning in this paper. Agents can effectively share neighborhood and global information through a spatial encoder-decoder structure. Our method follows the centralized training with decentralized execution (CTDE) paradigm. In addition, our structure can be applied to various existing mainstream reinforcement learning algorithms with minor modifications and can handle problems with a variable number of agents. The experiments in several multi-agent scenarios show that the existing methods can get convincing results by adding our spatially explicit architecture.	翻訳日:2023-04-26 22:07:33 公開日:2023-04-25
# ChatGPTを用いた人間-ロボット協調の信頼性向上 Improved Trust in Human-Robot Collaboration with ChatGPT ( http://arxiv.org/abs/2304.12529v1 ) ライセンス: Link先を確認	Yang Ye, Hengxu You, Jing Du	(参考訳) 人工知能の時代、ロボットが人間の生活の様々な側面に深く関わるようになるにつれ、人間とロボットのコラボレーションはますます重要になっている。しかし、ロボットを信頼する人間のオペレーターの問題は、主に人間とロボット間の適切な意味理解とコミュニケーションが欠如しているため、重要な関心事である。 ChatGPTのような大規模言語モデル(LLM)の出現は、対話的でコミュニケーション的で堅牢な人間とロボットのコラボレーションアプローチを開発する機会を提供する。本稿では,ChatGPTが人間とロボットの協調作業における信頼に与える影響を考察する。本研究は, chatgpt を用いたロボット制御システム robogpt を設計, 操作者が道具を取り出すのを助けるために7自由度ロボットアームを制御し, 操作者は自然言語でロボットアームを操作できる。ロボットにChatGPTを組み込むことは、ロボットが人間とより効果的にコミュニケーションできる能力に起因して、人間とロボットのコラボレーションに対する信頼を著しく高めることを示した。さらに、ChatGPTは人間の言語のニュアンスを理解し、適切に応答することで、より自然で直感的な人間とロボットの相互作用を構築するのに役立つ。本研究の成果は,人間ロボット協調システムの開発に重要な意味を持つ。 Human-robot collaboration is becoming increasingly important as robots become more involved in various aspects of human life in the era of Artificial Intelligence. However, the issue of human operators' trust in robots remains a significant concern, primarily due to the lack of adequate semantic understanding and communication between humans and robots. The emergence of Large Language Models (LLMs), such as ChatGPT, provides an opportunity to develop an interactive, communicative, and robust human-robot collaboration approach. This paper explores the impact of ChatGPT on trust in a human-robot collaboration assembly task. This study designs a robot control system called RoboGPT that uses ChatGPT to control a 7-degree-of-freedom robot arm to help human operators fetch and place tools, while human operators can communicate with and control the robot arm using natural language. A human-subject experiment showed that incorporating ChatGPT in robots significantly increased trust in human-robot collaboration, which can be attributed to the robot's ability to communicate more effectively with humans. 
Furthermore, ChatGPT's ability to understand the nuances of human language and respond appropriately helps to build a more natural and intuitive human-robot interaction. The findings of this study have significant implications for the development of human-robot collaboration systems.	翻訳日:2023-04-26 22:07:20 公開日:2023-04-25
# 画像テキスト検索のための学習可能なピラーベースリグレード Learnable Pillar-based Re-ranking for Image-Text Retrieval ( http://arxiv.org/abs/2304.12570v1 ) ライセンス: Link先を確認	Leigang Qu, Meng Liu, Wenjie Wang, Zhedong Zheng, Liqiang Nie, Tat-Seng Chua	(参考訳) 画像テキスト検索は、意味的類似性に基づいて、モダリティギャップを橋渡しし、クロスモーダルコンテンツを取得することを目的としている。先行研究は通常、ペアワイズ関係(すなわち、データサンプルが他のデータと一致するかどうか)に焦点を当てるが、高次隣接関係(すなわち、複数のデータサンプル間のマッチング構造)を無視している。一般的なポストプロセッシング手法であるリグレードは, 単一モダリティ検索タスクにおいて, 隣り合う関係を捕捉する優位性を明らかにしている。しかし、既存の再分類アルゴリズムを直接画像テキスト検索に拡張するのは効果がない。本稿では,一般化,柔軟性,スパーシティ,非対称性という4つの視点から理由を分析し,新しい学習可能な柱型再ランキングパラダイムを提案する。具体的には,まず最上位の個体間およびモード間近傍を柱として選択し,それらと柱間の隣接関係でデータサンプルを再構成する。このように、各サンプルは類似性のみを用いてマルチモーダルピラー空間にマッピングでき、一般化が保証される。その後、関係を柔軟に活用し、近傍のばらばらな正の項目を発掘するために、隣り合うグラフ推論モジュールを設計する。また,クロスモーダル協調を促進し,非対称モダリティを整合させる構造アライメント制約を提案する。さまざまなベースバックボーンに加えて,flickr30kとms-cocoという2つのベンチマークデータセットで広範な実験を行い,提案手法の有効性,優越性,一般化,転送性について実証した。 Image-text retrieval aims to bridge the modality gap and retrieve cross-modal content based on semantic similarities. Prior work usually focuses on the pairwise relations (i.e., whether a data sample matches another) but ignores the higher-order neighbor relations (i.e., a matching structure among multiple data samples). Re-ranking, a popular post-processing practice, has revealed the superiority of capturing neighbor relations in single-modality retrieval tasks. However, it is ineffective to directly extend existing re-ranking algorithms to image-text retrieval. In this paper, we analyze the reason from four perspectives, i.e., generalization, flexibility, sparsity, and asymmetry, and propose a novel learnable pillar-based re-ranking paradigm. Concretely, we first select top-ranked intra- and inter-modal neighbors as pillars, and then reconstruct data samples with the neighbor relations between them and the pillars. In this way, each sample can be mapped into a multimodal pillar space only using similarities, ensuring generalization. 
After that, we design a neighbor-aware graph reasoning module to flexibly exploit the relations and excavate the sparse positive items within a neighborhood. We also present a structure alignment constraint to promote cross-modal collaboration and align the asymmetric modalities. On top of various base backbones, we carry out extensive experiments on two benchmark datasets, i.e., Flickr30K and MS-COCO, demonstrating the effectiveness, superiority, generalization, and transferability of our proposed re-ranking paradigm.	翻訳日:2023-04-26 22:00:27 公開日:2023-04-25
# KINLP at SemEval-2023 Task 12: Kinyarwanda Tweet Sentiment Analysis KINLP at SemEval-2023 Task 12: Kinyarwanda Tweet Sentiment Analysis ( http://arxiv.org/abs/2304.12569v1 ) ライセンス: Link先を確認	Antoine Nzeyimana	(参考訳) 本稿では、著者が「セメヴァル-2023タスク12:アフリカ語感情分析」に入力したシステムについて述べる。システムはKinyarwanda言語に焦点を当て、言語固有のモデルを使用する。 kinyarwanda形態素は2層トランスフォーマーアーキテクチャでモデル化され、トランスフォーマーモデルはマルチタスクマスク形態素予測を用いて大きなテキストコーパスで事前学習される。このモデルは実験的なプラットフォームにデプロイされ、ユーザーは機械学習コードを書くことなく、トレーニング済みの言語モデルの微調整を試すことができる。共有タスクへの最後の応募は、34チーム中2位を獲得し、72.50%の重み付きF1得点を達成しました。評価結果の分析は,タスクの高精度化における課題を強調し,改善すべき領域を特定する。 This paper describes the system entered by the author to the SemEval-2023 Task 12: Sentiment analysis for African languages. The system focuses on the Kinyarwanda language and uses a language-specific model. Kinyarwanda morphology is modeled in a two tier transformer architecture and the transformer model is pre-trained on a large text corpus using multi-task masked morphology prediction. The model is deployed on an experimental platform that allows users to experiment with the pre-trained language model fine-tuning without the need to write machine learning code. Our final submission to the shared task achieves second ranking out of 34 teams in the competition, achieving 72.50% weighted F1 score. Our analysis of the evaluation results highlights challenges in achieving high accuracy on the task and identifies areas for improvement.	翻訳日:2023-04-26 21:59:58 公開日:2023-04-25
# マルチモーダルモデリングと異種GNNを用いた性能最適化 Performance Optimization using Multimodal Modeling and Heterogeneous GNN ( http://arxiv.org/abs/2304.12568v1 ) ライセンス: Link先を確認	Akash Dutta, Jordi Alcaraz, Ali TehraniJamsaz, Anna Sikora, Eduardo Cesar, Ali Jannesari	(参考訳) HPCアーキテクチャにおける不均一性と構成性の向上は、これらのシステムにおける自動チューニングアプリケーションとランタイムパラメータを非常に複雑にしている。ユーザはパラメータを設定するためのオプションを多数提示する。アプリケーション固有のソリューションに加えて、汎用的な検索戦略を使用することも一般的なアプローチであり、最良の構成や収束までの時間を特定することが大きな障壁となることが多い。したがって、様々なチューニングタスクに容易にスケールして適応できる汎用的で効率的なチューニングアプローチが必要となる。本稿では,複数のタスクに適応できるほど汎用的な並列コード領域のチューニング手法を提案する。本稿では、IRに基づくプログラミングモデルを分析し、タスク固有の性能最適化を行う。この目的のために,多モードグラフニューラルネットワークとオートエンコーダ(MGA)チューナを提案する。これは,異種グラフニューラルネットワークに適応したマルチモーダル深層学習に基づくアプローチであり,別個のモダリティとして機能するIRベースのコード表現をモデル化するための自動エンコーダをデノライズする。このアプローチは、並列コード領域/カーネルをチューニングするための構文、セマンティクス、構造対応irベースのコード表現をモデル化するパイプラインの一部として使用します。我々はPolyBench, Rodinia, STREAM, DataRaceBench, AMD SDK, NPB, NVIDIA SDK, Parboil, SHOC, LULESHベンチマークから得られたOpenMPおよびOpenCLコード領域/カーネルを広範囲に実験した。タスクにマルチモーダル学習技術を適用する。 i)openmpループにおけるスレッド数、スケジューリングポリシー、チャンクサイズを最適化すること。 ii)openclカーネルの異種デバイスマッピングのための最善のデバイスを特定すること。実験の結果,このマルチモーダル学習に基づくアプローチは,すべての実験で最先端技術を上回ることがわかった。 Growing heterogeneity and configurability in HPC architectures has made auto-tuning applications and runtime parameters on these systems very complex. Users are presented with a multitude of options to configure parameters. In addition to application specific solutions, a common approach is to use general purpose search strategies, which often might not identify the best configurations or their time to convergence is a significant barrier. There is, thus, a need for a general purpose and efficient tuning approach that can be easily scaled and adapted to various tuning tasks. We propose a technique for tuning parallel code regions that is general enough to be adapted to multiple tasks. In this paper, we analyze IR-based programming models to make task-specific performance optimizations. 
To this end, we propose the Multimodal Graph Neural Network and Autoencoder (MGA) tuner, a multimodal deep-learning-based approach that adapts Heterogeneous Graph Neural Networks and Denoising Autoencoders for modeling IR-based code representations that serve as separate modalities. This approach is used as part of our pipeline to model a syntax-, semantics-, and structure-aware IR-based code representation for tuning parallel code regions/kernels. We extensively experiment on OpenMP and OpenCL code regions/kernels obtained from PolyBench, Rodinia, STREAM, DataRaceBench, AMD SDK, NPB, NVIDIA SDK, Parboil, SHOC, and LULESH benchmarks. We apply our multimodal learning techniques to the tasks of (i) optimizing the number of threads, scheduling policy, and chunk size in OpenMP loops and (ii) identifying the best device for heterogeneous device mapping of OpenCL kernels. Our experiments show that this multimodal-learning-based approach outperforms the state-of-the-art in all experiments.	翻訳日:2023-04-26 21:59:45 公開日:2023-04-25
# proto-value network: 補助タスクによる表現学習のスケーリング Proto-Value Networks: Scaling Representation Learning with Auxiliary Tasks ( http://arxiv.org/abs/2304.12567v1 ) ライセンス: Link先を確認	Jesse Farebrother, Joshua Greaves, Rishabh Agarwal, Charline Le Lan, Ross Goroshin, Pablo Samuel Castro, Marc G. Bellemare	(参考訳) 補助的タスクは、深層強化学習エージェントが学習した表現を改善する。分析学的には、それらの効果は合理的によく理解されているが、実際には、その主な用途は、表現の学習方法としてではなく、主要な学習目標を支持することである。多くの補助的なタスクが手続き的に定義されるので、環境に関する情報の本質的に無限の情報源として扱うことができるので、これはおそらく驚くべきことである。本研究は,エージェントネットワークのタスク数とサイズを同時に増加させる設定に着目し,豊かな表現を学習するための補助的タスクの有効性について検討する。この目的のために、後継の尺度に基づく補助タスクの新しいファミリーを導出する。これらのタスクは実装が容易であり、理論的な特性をアピールする。適切なオフポリシー学習ルールと組み合わせることで、結果は表現学習アルゴリズムであり、mahadevan & maggioni (2007)のproto-value関数を深層強化学習に拡張したものと解釈できる。アーケード学習環境における一連の実験を通じて,proto-valueネットワークは,線形近似と環境の報酬関数との相互作用(約4m)のみを用いて,確立されたアルゴリズムに匹敵する性能を得るための豊富な特徴を生成できることを実証した。 Auxiliary tasks improve the representations learned by deep reinforcement learning agents. Analytically, their effect is reasonably well understood; in practice, however, their primary use remains in support of a main learning objective, rather than as a method for learning representations. This is perhaps surprising given that many auxiliary tasks are defined procedurally, and hence can be treated as an essentially infinite source of information about the environment. Based on this observation, we study the effectiveness of auxiliary tasks for learning rich representations, focusing on the setting where the number of tasks and the size of the agent's network are simultaneously increased. For this purpose, we derive a new family of auxiliary tasks based on the successor measure. These tasks are easy to implement and have appealing theoretical properties. Combined with a suitable off-policy learning rule, the result is a representation learning algorithm that can be understood as extending Mahadevan & Maggioni (2007)'s proto-value functions to deep reinforcement learning -- accordingly, we call the resulting object proto-value networks. 
Through a series of experiments on the Arcade Learning Environment, we demonstrate that proto-value networks produce rich features that may be used to obtain performance comparable to established algorithms, using only linear approximation and a small number (~4M) of interactions with the environment's reward function.	翻訳日:2023-04-26 21:59:14 公開日:2023-04-25
# AdaNPC:テスト時間適応のための非パラメトリック分類器の探索 AdaNPC: Exploring Non-Parametric Classifier for Test-Time Adaptation ( http://arxiv.org/abs/2304.12566v1 ) ライセンス: Link先を確認	Yi-Fan Zhang, Xue Wang, Kexin Jin, Kun Yuan, Zhang Zhang, Liang Wang, Rong Jin, Tieniu Tan	(参考訳) 最近の機械学習タスクの多くは、未知の分布に一般化できるモデルの開発に重点を置いている。ドメイン一般化(DG)は、様々な分野において重要なトピックの一つとなっている。いくつかの研究は、ターゲットドメインの情報を利用しなければDGが任意に困難になりうることを示している。この問題に対処するため、テスト時適応(TTA)手法が提案されている。既存のTTA手法では、推論段階でオフラインのターゲットデータや、より高度な追加の最適化手順が必要となる。本研究では,テスト時間適応を実行するために非パラメトリック分類器を用いる(AdaNPC)。具体的には、トレーニングドメインの特徴とラベルのペアを含むメモリを構築する。推論中、テストインスタンスが与えられた場合、AdaNPCはまずメモリからK個の最近傍サンプルを呼び出して投票により予測を決定し、次にテスト特徴と予測ラベルをメモリに追加する。このようにして、メモリ内のサンプル分布は、ごくわずかな追加計算コストで、トレーニング分布からテスト分布へと徐々に変化する。提案手法の背後にある合理性を理論的に正当化する。さらに,広範な数値実験でモデルをテストする。 AdaNPCは様々なDGベンチマークの競争力のあるベースラインを大幅に上回っている。特に、適応ターゲットが一連のドメインである場合、AdaNPCの適応精度は高度なTTA法よりも50%高い。コードはhttps://github.com/yfzhang114/AdaNPCで入手できる。 Many recent machine learning tasks focus on developing models that can generalize to unseen distributions. Domain generalization (DG) has become one of the key topics in various fields. Several studies show that DG can be arbitrarily hard without exploiting target domain information. To address this issue, test-time adaptation (TTA) methods have been proposed. Existing TTA methods require offline target data or extra sophisticated optimization procedures during the inference stage. In this work, we adopt a Non-Parametric Classifier to perform test-time Adaptation (AdaNPC). In particular, we construct a memory that contains the feature and label pairs from training domains. During inference, given a test instance, AdaNPC first recalls the K closest samples from the memory to vote for the prediction, and then the test feature and predicted label are added to the memory. In this way, the sample distribution in the memory can be gradually changed from the training distribution towards the test distribution with very little extra computation cost. We theoretically justify the rationality behind the proposed method. 
In addition, we evaluate our model in extensive numerical experiments. AdaNPC significantly outperforms competitive baselines on various DG benchmarks. In particular, when the adaptation target is a series of domains, the adaptation accuracy of AdaNPC is 50% higher than that of advanced TTA methods. The code is available at https://github.com/yfzhang114/AdaNPC.	翻訳日:2023-04-26 21:58:51 公開日:2023-04-25
# 要求情報検索におけるChatGPTの予備評価 A Preliminary Evaluation of ChatGPT in Requirements Information Retrieval ( http://arxiv.org/abs/2304.12562v1 ) ライセンス: Link先を確認	Jianzhang Zhang, Yiyang Chen, Nan Niu, Chuang Liu	(参考訳) コンテキスト: 最近、多くの事例により、ChatGPTがプログラミングタスクを実行し、一般的なドメインの質問に答える優れた能力を持つことが示されている。目的:我々は,ChatGPTが要求分析タスクでどのように機能するかを実証的に評価し,ChatGPTに代表される生成型大規模言語モデルが,要求工学における自然言語処理の研究と実践に与える影響について知見を得る。方法:2つの一般的な要求情報検索タスク,2つの典型的な要求アーティファクトを含む4つの公開データセット,固定のタスクプロンプトによるChatGPTへのクエリ,定量的および定性的な結果分析を含む評価パイプラインを設計する。結果: 定量的な結果から、ChatGPTはゼロショット設定ですべてのデータセットにおいて同等またはそれ以上の$F_\beta$値を達成する。定性的分析は、ChatGPTの強力な自然言語処理能力と限定的な要求工学ドメイン知識を示している。結論: 評価結果から,ChatGPTはゼロショット設定下で,複数の言語を含む異なるタイプのアーティファクトから要求情報を検索する優れた能力を持つことが示された。生成型大規模言語モデルに基づく要求検索モデルの研究と,それに対応するツールの開発は,研究コミュニティや産業コミュニティにとって取り組む価値がある。 Context: Recently, many illustrative examples have shown ChatGPT's impressive ability to perform programming tasks and answer general domain questions. Objective: We empirically evaluate how ChatGPT performs on requirements analysis tasks to derive insights into how generative large language models, represented by ChatGPT, influence the research and practice of natural language processing for requirements engineering. Method: We design an evaluation pipeline including two common requirements information retrieval tasks, four public datasets involving two typical requirements artifacts, querying ChatGPT with fixed task prompts, and quantitative and qualitative results analysis. Results: Quantitative results show that ChatGPT achieves comparable or better $F_\beta$ values in all datasets under a zero-shot setting. Qualitative analysis further illustrates ChatGPT's powerful natural language processing ability and limited requirements engineering domain knowledge. Conclusion: The evaluation results demonstrate ChatGPT's impressive ability to retrieve requirements information from different types of artifacts involving multiple languages under a zero-shot setting. 
It is worthwhile for the research and industry communities to study requirements retrieval models based on generative large language models and to develop corresponding tools.	翻訳日:2023-04-26 21:58:34 公開日:2023-04-25
# TCR:ショートビデオのタイトル生成とアテンションリファインメントによるカバー選択 TCR: Short Video Title Generation and Cover Selection with Attention Refinement ( http://arxiv.org/abs/2304.12561v1 ) ライセンス: Link先を確認	Yakun Yu, Jiuding Yang, Weidong Guo, Hui Liu, Yu Xu, and Di Niu	(参考訳) ユーザー生成ショートビデオの普及に伴い、コンテンツクリエイターがコンテンツを潜在的視聴者に宣伝することはますます困難になっている。短いビデオのタイトルやカバーを自動的に生成することで、視聴者の注意を引くことができる。既存のビデオキャプションの研究は主に、視聴者の注意を引くためのビデオタイトルに適合しない行動の事実記述を生成することに焦点を当てている。さらに,マルチモーダル情報に基づくカバー選択の研究は少ない。これらの問題は、短いビデオタイトル生成とカバーセレクション(TG-CS)のジョイントタスクを具体的にサポートするための調整された方法の必要性と、研究を支援するための対応するデータセットの作成の必要性を動機付けている。本稿では,まず,魅力あるタイトルとカバー付きビデオを含む,SVTG(Short Video Title Generation)という実世界のデータセットを収集し,提示する。そこで我々は,TG-CS の注意再定義 (TCR) 手法を用いたタイトル生成とカバー選択を提案する。精錬手順は、モデルトレーニングを洗練させるために、各サンプル内の高品質なサンプルと高関連フレームとテキストトークンを段階的に選択する。広範にわたる実験により,tcr手法は既存の様々な字幕生成手法より優れており,ノイズの多い実世界のショートビデオに対して,より優れたカバーを選択できることを示した。 With the widespread popularity of user-generated short videos, it becomes increasingly challenging for content creators to promote their content to potential viewers. Automatically generating appealing titles and covers for short videos can help grab viewers' attention. Existing studies on video captioning mostly focus on generating factual descriptions of actions, which do not conform to video titles intended for catching viewer attention. Furthermore, research for cover selection based on multimodal information is sparse. These problems motivate the need for tailored methods to specifically support the joint task of short video title generation and cover selection (TG-CS) as well as the demand for creating corresponding datasets to support the studies. In this paper, we first collect and present a real-world dataset named Short Video Title Generation (SVTG) that contains videos with appealing titles and covers. We then propose a Title generation and Cover selection with attention Refinement (TCR) method for TG-CS. 
The refinement procedure progressively selects high-quality samples and highly relevant frames and text tokens within each sample to refine model training. Extensive experiments show that our TCR method is superior to various existing video captioning methods in generating titles and is able to select better covers for noisy real-world short videos.	翻訳日:2023-04-26 21:58:13 公開日:2023-04-25
# SwinFSR: SwinIRと周波数領域知識を用いたステレオ画像超解像 SwinFSR: Stereo Image Super-Resolution using SwinIR and Frequency Domain Knowledge ( http://arxiv.org/abs/2304.12556v1 ) ライセンス: Link先を確認	Ke Chen, Liangyan Li, Huan Liu, Yunzhe Li, Congling Tang and Jun Chen	(参考訳) ステレオ画像超解像(StereoSR)は、携帯電話、自動運転車、ロボットにデュアルカメラが広く搭載されていることから、近年大きな注目を集めている。本稿では,もともと単一画像復元のために設計されたSwinIRの拡張と,高速フーリエ畳み込み(FFC)によって得られる周波数領域知識に基づく,SwinFSRという新しいStereoSR手法を提案する。具体的には、グローバル情報を効果的に収集するために、FFCを用いて周波数領域の知識を明示的に取り入れることでSwinIRのResidual Swin Transformerブロック(RSTB)を変更し、その結果得られるResidual Swin Fourier Transformerブロック(RSFTB)を特徴抽出に用いる。また、ステレオビューを効率良く正確に融合するために、RCAMと呼ばれる新しいクロスアテンションモジュールを提案し、最先端のクロスアテンションモジュールよりも少ない計算コストで高い競争力のある性能を実現する。広範な実験結果とアブレーション実験により,提案するSwinFSRの有効性と効率性が実証された。 Stereo Image Super-Resolution (StereoSR) has attracted significant attention in recent years due to the extensive deployment of dual cameras in mobile phones, autonomous vehicles and robots. In this work, we propose a new StereoSR method, named SwinFSR, based on an extension of SwinIR, originally designed for single image restoration, and the frequency domain knowledge obtained by the Fast Fourier Convolution (FFC). Specifically, to effectively gather global information, we modify the Residual Swin Transformer blocks (RSTBs) in SwinIR by explicitly incorporating the frequency domain knowledge using the FFC and employing the resulting residual Swin Fourier Transformer blocks (RSFTBs) for feature extraction. Besides, for the efficient and accurate fusion of stereo views, we propose a new cross-attention module referred to as RCAM, which achieves highly competitive performance while requiring less computational cost than the state-of-the-art cross-attention modules. Extensive experimental results and ablation studies demonstrate the effectiveness and efficiency of our proposed SwinFSR.	翻訳日:2023-04-26 21:57:53 公開日:2023-04-25
# 訓練における敵対者と反敵対者の組み合わせ Combining Adversaries with Anti-adversaries in Training ( http://arxiv.org/abs/2304.12550v1 ) ライセンス: Link先を確認	Xiaoling Zhou, Nan Yang, Ou Wu	(参考訳) 敵対的訓練は、ディープニューラルネットワークの堅牢性を改善する効果的な学習技術である。本研究では,異なるサンプルが異なる摂動方向(敵対的方向と反敵対的方向)と様々な摂動境界を持つことができるという,より一般的な摂動範囲の下で,敵対的訓練が公平性,頑健性,汎化の観点から深層学習モデルに与える影響を理論的に検討した。理論的な考察から,訓練における敵対者と反敵対者(反敵対的摂動を加えたサンプル)の組み合わせは,いくつかの典型的な学習シナリオ(例えば,ノイズラベル学習と不均衡学習)において,標準的な敵対的訓練と比べて,クラス間の公平性向上と頑健性と汎化のトレードオフの改善により有効でありうることが示唆された。本研究の理論的知見に基づいて,各トレーニングサンプルに異なる境界を持つ敵対者と反敵対者を組み合わせた,より一般的な学習目標を示す。組み合わせの重みの最適化にはメタ学習を利用する。異なる学習シナリオにおけるベンチマークデータセットの実験により,理論的知見と提案手法の有効性が検証された。 Adversarial training is an effective learning technique to improve the robustness of deep neural networks. In this study, the influence of adversarial training on deep learning models in terms of fairness, robustness, and generalization is theoretically investigated under a more general perturbation scope in which different samples can have different perturbation directions (the adversarial and anti-adversarial directions) and varied perturbation bounds. Our theoretical explorations suggest that the combination of adversaries and anti-adversaries (samples with anti-adversarial perturbations) in training can be more effective in achieving better fairness between classes and a better tradeoff between robustness and generalization in some typical learning scenarios (e.g., noisy label learning and imbalance learning) compared with standard adversarial training. On the basis of our theoretical findings, a more general learning objective that combines adversaries and anti-adversaries with varied bounds on each training sample is presented. Meta learning is utilized to optimize the combination weights. Experiments on benchmark datasets under different learning scenarios verify our theoretical findings and the effectiveness of the proposed methodology.	翻訳日:2023-04-26 21:57:33 公開日:2023-04-25
# COUPA: オンラインからオフラインのサービスプラットフォームのための産業向けレコメンデーションシステム COUPA: An Industrial Recommender System for Online to Offline Service Platforms ( http://arxiv.org/abs/2304.12549v1 ) ライセンス: Link先を確認	Sicong Xie, Binbin Hu, Fengze Li, Ziqi Liu, Zhiqiang Zhang, Wenliang Zhong, Jun Zhou	(参考訳) ユーザーが小売サービス(例えばエンターテイメントやダイニング)をローカルに発見することを支援するため、オンライン・トゥ・オフライン(O2O)サービスプラットフォームは近年人気を集めており、現在のレコメンデーションシステムに大きな課題を突きつけている。 O2OサービスのフィードライクなシナリオであるAlipayの実際のデータから、当社のシナリオでは、再帰に基づく時間パターンと位置バイアスが一般的に存在し、推薦の有効性を深刻に脅かしていることが分かった。そこで本研究では, 以下の2点を考慮してユーザの嗜好を特徴付ける産業システムであるCOUPAを提案する。(1) 時間認識型嗜好: 注意機構を備えた連続時間認識型点過程を用いて, 推薦のための時間パターンを完全に把握する。(2) 位置認識型嗜好: 位置パーソナライズモジュールを備えた位置セレクタコンポーネントを、位置バイアスをパーソナライズされた方法で緩和するよう入念に設計する。最後に、エッジ、ストリーミング、バッチコンピューティングの連携と2段階のオンラインサービングモードにより、Alipay上でCOUPAを慎重に実装、デプロイし、いくつかの一般的な推薦シナリオをサポートする。我々は、COUPAが一貫して優れたパフォーマンスを達成し、推薦に対する直感的な根拠を提供する可能性を持つことを実証する広範な実験を行う。 Aiming at helping users locally discover retail services (e.g., entertainment and dining), Online to Offline (O2O) service platforms have become popular in recent years, which greatly challenges current recommender systems. With real data from Alipay, a feeds-like scenario for O2O services, we find that recurrence based temporal patterns and position biases commonly exist in our scenarios, which seriously threaten the recommendation effectiveness. To this end, we propose COUPA, an industrial system that characterizes user preference with the following two considerations: (1) Time aware preference: we employ the continuous time aware point process equipped with an attention mechanism to fully capture temporal patterns for recommendation. (2) Position aware preference: a position selector component equipped with a position personalization module is elaborately designed to mitigate position bias in a personalized manner. 
Finally, we carefully implement and deploy COUPA on Alipay through the cooperation of edge, streaming and batch computing, as well as a two-stage online serving mode, to support several popular recommendation scenarios. We conduct extensive experiments to demonstrate that COUPA consistently achieves superior performance and has the potential to provide intuitive evidence for recommendation.	翻訳日:2023-04-26 21:57:12 公開日:2023-04-25
# MMRDN:物体が積み重なったシーンにおけるマルチビュー操作関係検出のための一貫性表現 MMRDN: Consistent Representation for Multi-View Manipulation Relationship Detection in Object-Stacked Scenes ( http://arxiv.org/abs/2304.12592v1 ) ライセンス: Link先を確認	Han Wang, Jiayuan Zhang, Lipeng Wan, Xingyu Chen, Xuguang Lan, Nanning Zheng	(参考訳) 操作関係検出(MRD)は、ロボットが物体を正しい順に掴むように誘導することを目的としており、物体が積み重なったシーンにおける把持の安全性と信頼性を確保するために重要である。従来の研究は、事前定義された視点から収集されたデータでトレーニングされたディープニューラルネットワークによって操作関係を推定しており、非構造化環境での視覚的なずれへの対応に限界がある。マルチビューデータは、より包括的な空間情報を提供するが、マルチビューMRDの課題はドメインシフトである。本稿では,2次元および3次元マルチビューデータを用いて訓練を行うマルチビューMRDネットワーク(MMRDN)という,新しいマルチビュー融合フレームワークを提案する。異なるビューからの2Dデータを共通の隠れ空間に投影し、埋め込みをVon-Mises-Fisher分布の集合に適合させて一貫した表現を学習する。さらに、3Dデータ内の位置情報を利用して、各オブジェクト対の点群から$K$ Maximum Vertical Neighbors (KMVN) 点のセットを選択し、これら2つのオブジェクトの相対的な位置を符号化する。最後に、多視点2Dデータと3Dデータの特徴を結合して、オブジェクトのペアごとの関係を予測する。挑戦的なREGRADデータセットの実験結果から、MMRDNはマルチビューMRDタスクにおいて最先端の手法よりも優れていることが示された。また,合成データで学習したモデルが実世界のシナリオに転移できることも実証した。 Manipulation relationship detection (MRD) aims to guide the robot to grasp objects in the right order, which is important to ensure the safety and reliability of grasping in object-stacked scenes. Previous works infer manipulation relationships with deep neural networks trained on data collected from a predefined view, which limits their ability to handle visual dislocation in unstructured environments. Multi-view data provide more comprehensive information in space, while a challenge of multi-view MRD is domain shift. In this paper, we propose a novel multi-view fusion framework, namely multi-view MRD network (MMRDN), which is trained on 2D and 3D multi-view data. We project the 2D data from different views into a common hidden space and fit the embeddings with a set of Von-Mises-Fisher distributions to learn the consistent representations. Besides, taking advantage of position information within the 3D data, we select a set of $K$ Maximum Vertical Neighbors (KMVN) points from the point cloud of each object pair, which encodes the relative position of these two objects. 
Finally, the features of multi-view 2D and 3D data are concatenated to predict the pairwise relationship of objects. Experimental results on the challenging REGRAD dataset show that MMRDN outperforms the state-of-the-art methods in multi-view MRD tasks. The results also demonstrate that our model trained on synthetic data is capable of transferring to real-world scenarios.	翻訳日:2023-04-26 21:51:12 公開日:2023-04-25
# コントラスト学習と一貫した意味・構造制約による教師なし合成画像の洗練 Unsupervised Synthetic Image Refinement via Contrastive Learning and Consistent Semantic and Structure Constraints ( http://arxiv.org/abs/2304.12591v1 ) ライセンス: Link先を確認	Ganning Zhao, Tingwei Shen, Suya You, and C.-C. Jay Kuo	(参考訳) ディープニューラルネットワーク(DNN)のトレーニングには,コンピュータ生成合成画像のリアリズムの確保が不可欠である。合成データセットと実世界で撮影されたデータセットのセマンティックな分布が異なるため、合成画像と精細化後の画像の間にセマンティックなミスマッチが存在し、それが意味的歪みを引き起こす。近年,相関のあるパッチを引き寄せ,相関のないパッチを引き離すためにコントラスト学習(CL)が成功裏に用いられている。本研究では,合成画像と精細化後の画像間の意味的・構造的整合性を利用し,意味的歪みを低減するためにCLを採用する。さらに, ハードネガティブマイニングを取り入れて, さらなる性能向上を図る。定性的および定量的な指標を用いて他のいくつかのベンチマーク手法と比較し,本手法が最先端の性能を提供することを示す。 Ensuring the realism of computer-generated synthetic images is crucial to deep neural network (DNN) training. Due to different semantic distributions between synthetic and real-world captured datasets, a semantic mismatch exists between synthetic and refined images, which in turn results in semantic distortion. Recently, contrastive learning (CL) has been successfully used to pull correlated patches together and push uncorrelated ones apart. In this work, we exploit semantic and structural consistency between synthetic and refined images and adopt CL to reduce the semantic distortion. In addition, we incorporate hard negative mining to further improve performance. We compare the performance of our method with several other benchmarking methods using qualitative and quantitative measures and show that our method offers state-of-the-art performance.	翻訳日:2023-04-26 21:50:45 公開日:2023-04-25
# ContrastMotion:大規模LiDAR点群に対する自己教師型シーンモーション学習 ContrastMotion: Self-supervised Scene Motion Learning for Large-Scale LiDAR Point Clouds ( http://arxiv.org/abs/2304.12589v1 ) ライセンス: Link先を確認	Xiangze Jia, Hui Zhou, Xinge Zhu, Yandong Guo, Ji Zhang, Yuexin Ma	(参考訳) 本稿では,BEV表現を用いた,LiDARに基づく自動運転のための新しい自己教師付き動き推定器を提案する。通常採用されるデータレベルの構造一貫性のための自己教師付き戦略とは異なり、連続フレーム内のピラー間の特徴レベルの一貫性を通じてシーンの動きを予測することにより、動的シーンにおけるノイズ点と視点が変化する点群の影響を解消する。具体的には,識別的かつロバストな特徴を対照学習の方法で学習するために,ネットワークにより多くの疑似教師信号を与える Soft Discriminative Loss を提案する。また,点群フレーム間の有効な補償を自動的に学習し,特徴抽出を強化する Gated Multi-frame Fusion ブロックを提案する。最後に、特徴距離に基づいてピラーの対応確率を予測し、それによってさらにシーンの動きを予測する pillar association を提案する。広範な実験により,シーンフローと動き予測の両タスクにおけるContrastMotionの有効性と優位性を示した。コードは近日公開予定である。 In this paper, we propose a novel self-supervised motion estimator for LiDAR-based autonomous driving via BEV representation. Unlike the commonly adopted self-supervised strategies for data-level structure consistency, we predict scene motion via feature-level consistency between pillars in consecutive frames, which can eliminate the effect caused by noise points and view-changing point clouds in dynamic scenes. Specifically, we propose the Soft Discriminative Loss, which provides the network with more pseudo-supervised signals to learn discriminative and robust features in a contrastive learning manner. We also propose the Gated Multi-frame Fusion block, which learns valid compensation between point cloud frames automatically to enhance feature extraction. Finally, pillar association is proposed to predict pillar correspondence probabilities based on feature distance, and thereby further predict scene motion. Extensive experiments show the effectiveness and superiority of our ContrastMotion on both scene flow and motion prediction tasks. The code will be available soon.	翻訳日:2023-04-26 21:50:30 公開日:2023-04-25
# MixNeRF: 特徴混在ハッシュテーブルを備えたメモリ効率の良いNeRF MixNeRF: Memory Efficient NeRF with Feature Mixed-up Hash Table ( http://arxiv.org/abs/2304.12587v1 ) ライセンス: Link先を確認	Yongjae Lee, Li Yang and Deliang Fan	(参考訳) ニューラル・ラディアンス・フィールド(NeRF)はフォトリアリスティックな新規視点画像の生成において顕著な性能を示した。 NeRFの出現以来、多くの研究が行われており、その中でもグリッドなどの明示的な構造で特徴を管理する手法は、多層パーセプトロン(MLP)ネットワークの複雑さを減らすことで極めて高速なトレーニングを実現している。しかし、高密度グリッドに特徴を格納するには非常に大きなメモリスペースが必要であり、それによってコンピュータシステムのメモリボトルネックが発生し、トレーニング時間も長くなる。そこで本研究では,mixed-up hash tableを用いてメモリ効率を向上し,復元品質を維持しつつトレーニング時間を短縮する,メモリ効率のよいNeRFフレームワークであるMixNeRFを提案する。まず,マルチレベル特徴グリッドの一部を1つに適応的に混合し,単一のハッシュテーブルにマップする mixed-up hash table を設計する。その後、グリッド点の正しいインデックスを得るために、任意のレベルのグリッドのインデックスを標準グリッドのインデックスに変換する index transformation 法をさらに設計する。最先端のInstant-NGP、TensoRF、DVGOとベンチマークした大規模な実験によると、MixNeRFは、同じGPUハードウェア上で、同等あるいはそれ以上の再構成品質で、最速のトレーニング時間を達成できる。ソースコードは https://github.com/nfyfamr/MixNeRF で入手できる。 Neural radiance field (NeRF) has shown remarkable performance in generating photo-realistic novel views. Since the emergence of NeRF, many studies have been conducted, among which managing features with explicit structures such as grids has achieved exceptionally fast training by reducing the complexity of multilayer perceptron (MLP) networks. However, storing features in dense grids requires significantly large memory space, which leads to memory bottleneck in computer systems and thus large training time. To address this issue, in this work, we propose MixNeRF, a memory-efficient NeRF framework that employs a mixed-up hash table to improve memory efficiency and reduce training time while maintaining reconstruction quality. We first design a mixed-up hash table to adaptively mix part of multi-level feature grids into one and map it to a single hash table. Following that, in order to obtain the correct index of a grid point, we further design an index transformation method that transforms indices of an arbitrary level grid to those of a canonical grid. 
Extensive experiments benchmarking against the state-of-the-art Instant-NGP, TensoRF, and DVGO indicate that our MixNeRF could achieve the fastest training time on the same GPU hardware with similar or even higher reconstruction quality. Source code is available at https://github.com/nfyfamr/MixNeRF.	翻訳日:2023-04-26 21:50:14 公開日:2023-04-25
# 複素力学系における創発的組織のための物理インフォームド表現学習 Physics-Informed Representation Learning for Emergent Organization in Complex Dynamical Systems ( http://arxiv.org/abs/2304.12586v1 ) ライセンス: Link先を確認	Adam Rupe and Karthik Kashinath and Nalini Kumar and James P. Crutchfield	(参考訳) 非線形に相互作用するシステムコンポーネントは、新しい特性と異なる時空スケールで現象を生成する不安定性を導入することが多い。これは自発的自己組織化として知られ、熱力学的平衡から遠く離れた系においてユビキタスである。我々は,データ駆動型アルゴリズムを実践的に構築する,創発的組織のための理論的基盤的枠組みを導入する。ビルディングブロックは時空の光錐で、局所的な相互作用を通じてシステムがどのように情報を伝達するかを捉えます。複雑な時空間系において,光円錐,局所因果状態,組織的挙動,コヒーレント構造の予測等価クラスが成立することを示す。物理インフォームド機械学習アルゴリズムと高性能コンピューティング実装を用いて,実世界の領域科学問題に対する局所因果状態の適用性を実証した。局所因果状態が渦を捕捉し, 2次元乱流中におけるパワーロー崩壊挙動を示す。そして、既知の(ハリケーンや大気の川)と新しい極端な気象事象がピクセルレベルで識別され、高解像度の気候データで時間を通して追跡されることを示す。 Nonlinearly interacting system components often introduce instabilities that generate phenomena with new properties and at different space-time scales than the components. This is known as spontaneous self-organization and is ubiquitous in systems far from thermodynamic equilibrium. We introduce a theoretically-grounded framework for emergent organization that, via data-driven algorithms, is constructive in practice. Its building blocks are spacetime lightcones that capture how information propagates across a system through local interactions. We show that predictive equivalence classes of lightcones, local causal states, capture organized behaviors and coherent structures in complex spatiotemporal systems. Using our unsupervised physics-informed machine learning algorithm and a high-performance computing implementation, we demonstrate the applicability of the local causal states for real-world domain science problems. We show that the local causal states capture vortices and their power-law decay behavior in two-dimensional turbulence. We then show that known (hurricanes and atmospheric rivers) and novel extreme weather events can be identified on a pixel-level basis and tracked through time in high-resolution climate data.	翻訳日:2023-04-26 21:49:47 公開日:2023-04-25
# 光学顕微鏡観察から直接学習するイメージングメカニズム Learning imaging mechanism directly from optical microscopy observations ( http://arxiv.org/abs/2304.12584v1 ) ライセンス: Link先を確認	Ze-Hao Wang (1 and 2), Long-Kun Shan (1 and 2), Tong-Tian Weng (1 and 2), Tian-Long Chen (3), Qi-Yu Wang (1 and 2), Xiang-Dong Chen (1, 2 and 4), Zhang-Yang Wang (3), Guang-Can Guo (1, 2 and 4), Fang-Wen Sun (1, 2 and 4) ((1) CAS Key Laboratory of Quantum Information, University of Science and Technology of China, Hefei, 230026, China, (2) CAS Center For Excellence in Quantum Information and Quantum Physics, University of Science and Technology of China, Hefei, 230026, China, (3) University of Texas at Austin, Austin, TX 78705, USA, (4) Hefei National Laboratory, University of Science and Technology of China, Hefei 230088, China)	(参考訳) 光顕微鏡画像はナノ世界の直接可視化を通じて科学研究において重要な役割を担い、そこではイメージング機構は点拡散関数(PSF)とエミッターの畳み込みとして記述される。 PSFや同等のPSFの事前知識に基づいて、ナノ世界のより正確な探査が可能である。しかし,PSFを顕微鏡画像から直接抽出することは極めて困難である。本稿では,自己教師型学習の助けを借りて,生の顕微鏡画像から直接PSFとエミッタの学習可能な推定を可能にする物理インフォームドマスク付きオートエンコーダ(PiMAE)を提案する。本手法を合成データと実世界実験で実証し,高い精度と雑音のロバスト性を示した。 PiMAEは、正規化ルート平均二乗誤差(NRMSE)測定値によって測定された平均19.6%と50.7%(35タスク)で、合成データタスクにおいてDeepSTORMとRichardson-Lucyアルゴリズムを上回っている。これは、DeepSTORMで使われている教師ありアプローチや、リチャードソン・ルーシーアルゴリズムで知られているPSF仮定とは対照的である。本手法は,光学顕微鏡における隠蔽イメージング機構の実現に有効な手法であり,さらに多くのシステムで隠蔽機構を学習することができる。 Optical microscopy image plays an important role in scientific research through the direct visualization of the nanoworld, where the imaging mechanism is described as the convolution of the point spread function (PSF) and emitters. Based on a priori knowledge of the PSF or equivalent PSF, it is possible to achieve more precise exploration of the nanoworld. However, it is an outstanding challenge to directly extract the PSF from microscopy images. Here, with the help of self-supervised learning, we propose a physics-informed masked autoencoder (PiMAE) that enables a learnable estimation of the PSF and emitters directly from the raw microscopy images. 
We demonstrate our method in synthetic data and real-world experiments with significant accuracy and noise robustness. PiMAE outperforms DeepSTORM and the Richardson-Lucy algorithm in synthetic data tasks with an average improvement of 19.6% and 50.7% (35 tasks), respectively, as measured by the normalized root mean square error (NRMSE) metric. This is achieved without prior knowledge of the PSF, in contrast to the supervised approach used by DeepSTORM and the known PSF assumption in the Richardson-Lucy algorithm. Our method, PiMAE, provides a feasible scheme for achieving the hidden imaging mechanism in optical microscopy and has the potential to learn hidden mechanisms in many more systems.	翻訳日:2023-04-26 21:49:29 公開日:2023-04-25
# 非定常環境における動的システムのリアルタイム安全性評価:方法と手法のレビュー Real-time Safety Assessment of Dynamic Systems in Non-stationary Environments: A Review of Methods and Techniques ( http://arxiv.org/abs/2304.12583v1 ) ライセンス: Link先を確認	Zeyi Liu and Songqiao Hu and Xiao He	(参考訳) 動的システムのリアルタイム安全性評価(RTSA)は,特に非定常環境において,産業や輸送などの分野において重要な意味を持つ重要な課題である。しかし,非定常環境におけるリアルタイム安全性評価手法の包括的レビューの欠如は,関連手法の進歩と洗練を妨げている。本稿では,非定常環境におけるRTSAタスクの手法と手法について概説する。特に、非定常環境におけるrtsaアプローチの背景と意義を最初に強調する。次に、定義、分類、および主な課題をカバーする問題記述を示す。本稿では,オンラインアクティブラーニング,オンラインセミ教師付きラーニング,オンライン転送学習,オンライン異常検出といった関連技術の最近の進歩を概観する。最後に,今後の展望と今後の研究の方向性について論じる。本総説は,非定常環境におけるリアルタイム安全評価手法の総合的かつ最新の概観を提供することを目的としており,この分野の研究者や実践者にとって貴重な資源となる。 Real-time safety assessment (RTSA) of dynamic systems is a critical task that has significant implications for various fields such as industrial and transportation applications, especially in non-stationary environments. However, the absence of a comprehensive review of real-time safety assessment methods in non-stationary environments impedes the progress and refinement of related methods. In this paper, a review of methods and techniques for RTSA tasks in non-stationary environments is provided. Specifically, the background and significance of RTSA approaches in non-stationary environments are firstly highlighted. We then present a problem description that covers the definition, classification, and main challenges. We review recent developments in related technologies such as online active learning, online semi-supervised learning, online transfer learning, and online anomaly detection. Finally, we discuss future outlooks and potential directions for further research. Our review aims to provide a comprehensive and up-to-date overview of real-time safety assessment methods in non-stationary environments, which can serve as a valuable resource for researchers and practitioners in this field.	翻訳日:2023-04-26 21:49:04 公開日:2023-04-25
# 学習軌跡は汎化指標である Learning Trajectories are Generalization Indicators ( http://arxiv.org/abs/2304.12579v1 ) ライセンス: Link先を確認	Jingwen Fu, Zhizheng Zhang, Dacheng Yin, Yan Lu, Nanning Zheng	(参考訳) 本稿では,広く使用される勾配降下法と確率的勾配降下法で最適化された深層ニューラルネットワーク(DNN)の学習軌跡と,それに対応する汎化能力との関係について検討する。本稿では,軌跡情報をモデル化するための線形近似関数を構築し,それに基づいて,よりリッチな軌跡情報を持つ新しい汎化境界を提案する。提案する汎化境界は,学習軌跡の複雑さと,学習集合のバイアスと多様性の比率に依存する。実験結果から,提案手法は様々な学習ステップ,学習率,ラベルノイズレベルにわたる汎化の傾向を効果的に捉えていることがわかった。 The aim of this paper is to investigate the connection between learning trajectories of Deep Neural Networks (DNNs) and their corresponding generalization capabilities when they are optimized with widely used gradient descent and stochastic gradient descent algorithms. In this paper, we construct a Linear Approximation Function to model the trajectory information, and based on it we propose a new generalization bound with richer trajectory information. Our proposed generalization bound relies on the complexity of the learning trajectory and the ratio between the bias and diversity of the training set. Experimental results indicate that the proposed method effectively captures the generalization trend across various training steps, learning rates, and label noise levels.	翻訳日:2023-04-26 21:48:50 公開日:2023-04-25
# CPUアーキテクチャの高レベルループとテンソル抽象化によるディープラーニングとHPCカーネルの調和 Harnessing Deep Learning and HPC Kernels via High-Level Loop and Tensor Abstractions on CPU Architectures ( http://arxiv.org/abs/2304.12576v1 ) ライセンス: Link先を確認	Evangelos Georganas, Dhiraj Kalamkar, Kirill Voronin, Antonio Noack, Hans Pabst, Alexander Breuer, Alexander Heinecke	(参考訳) 過去10年間で、ディープラーニング(DL)アルゴリズム、プログラミングシステム、ハードウェアは、ハイパフォーマンスコンピューティング(HPC)のアルゴリズムと融合してきた。それでも、DLとHPCシステムのプログラミング手法は停滞しており、高度に最適化されているが、プラットフォーム固有の非フレキシブルなベンダー最適化ライブラリに依存している。このようなライブラリは、特定のプラットフォーム、カーネル、その形状に対して、ベンダーが専用の最適化作業を行っているのに対して、残りのユースケースではパフォーマンスが劣り、パフォーマンスガラスジャウの非可搬性コードが得られる。この研究は、現代的なCPUアーキテクチャのための効率的でポータブルなDLとHPCカーネルを開発するためのフレームワークを導入する。カーネル開発を2つのステップで分解する。 1)テンソル処理プリミティブ(tpps: compact, versatile set of 2d-tensor operator)を用いた計算コアの表現 2) TPPのまわりの論理ループを高水準で宣言的に表現するのに対して, 正確なインスタンス化(順序付け, タイリング, 並列化)は単純なノブによって決定される。我々は、スタンドアロンカーネルと、さまざまなCPUプラットフォームにおける最先端実装よりも優れたエンドツーエンドワークロードを使用して、このアプローチの有効性を実証する。 During the past decade, Deep Learning (DL) algorithms, programming systems and hardware have converged with the High Performance Computing (HPC) counterparts. Nevertheless, the programming methodology of DL and HPC systems is stagnant, relying on highly-optimized, yet platform-specific and inflexible vendor-optimized libraries. Such libraries provide close-to-peak performance on specific platforms, kernels and shapes thereof that vendors have dedicated optimizations efforts, while they underperform in the remaining use-cases, yielding non-portable codes with performance glass-jaws. This work introduces a framework to develop efficient, portable DL and HPC kernels for modern CPU architectures. 
We decompose kernel development into two steps: 1) expressing the computational core using Tensor Processing Primitives (TPPs), a compact, versatile set of 2D-tensor operators; 2) expressing the logical loops around TPPs in a high-level, declarative fashion, while the exact instantiation (ordering, tiling, parallelization) is determined via simple knobs. We demonstrate the efficacy of our approach using standalone kernels and end-to-end workloads that outperform state-of-the-art implementations on diverse CPU platforms.	翻訳日:2023-04-26 21:48:38 公開日:2023-04-25
# 真理探索アルゴリズムの公正性とバイアス:実験的解析 Fairness and Bias in Truth Discovery Algorithms: An Experimental Analysis ( http://arxiv.org/abs/2304.12573v1 ) ライセンス: Link先を確認	Simone Lazier, Saravanan Thirumuruganathan, Hadis Anahideh	(参考訳) 機械学習(ML)ベースのアプローチは、社会的影響のある多くのアプリケーションでますます使われている。 mlモデルのトレーニングには大量のラベル付きデータが必要であり、クラウドソーシングは複数のワーカーからラベルを取得するための主要なパラダイムである。群衆労働者は信頼できないラベルを提供し、これに対処するために、多数決のような真理発見(TD)アルゴリズムを適用して、労働者の反応の矛盾からコンセンサスラベルを決定する。しかし、これらのコンセンサスラベルは、性別、人種、政治的所属といったセンシティブな属性に基づいてバイアスを受ける可能性があることに注意する必要がある。センシティブな属性が関与していない場合でも、ラベルは毒性のような主観的な側面の異なる視点でバイアスを受けることができる。本稿では,TDアルゴリズムのバイアスと公平性を系統的に研究する。既存の2つの群集ラベル付きデータセットを用いて,非自明な割合の労働者が偏りのある結果をもたらし,TDに対する単純なアプローチが準最適であることを明らかにする。私たちの研究は、一般的なTDアルゴリズムがパナセアではないことも示しています。さらに,このような不公平な作業員が下流のmlタスクに与える影響を定量化し,公平性とラベルバイアスの修正が非効率であることを示す。我々はこれらの問題を改善できる新しいバイアス対応の真理発見アルゴリズムの設計を訴えて、論文を締めくくります。 Machine learning (ML) based approaches are increasingly being used in a number of applications with societal impact. Training ML models often require vast amounts of labeled data, and crowdsourcing is a dominant paradigm for obtaining labels from multiple workers. Crowd workers may sometimes provide unreliable labels, and to address this, truth discovery (TD) algorithms such as majority voting are applied to determine the consensus labels from conflicting worker responses. However, it is important to note that these consensus labels may still be biased based on sensitive attributes such as gender, race, or political affiliation. Even when sensitive attributes are not involved, the labels can be biased due to different perspectives of subjective aspects such as toxicity. In this paper, we conduct a systematic study of the bias and fairness of TD algorithms. Our findings using two existing crowd-labeled datasets, reveal that a non-trivial proportion of workers provide biased results, and using simple approaches for TD is sub-optimal. Our study also demonstrates that popular TD algorithms are not a panacea. 
Additionally, we quantify the impact of these unfair workers on downstream ML tasks and show that conventional methods for achieving fairness and correcting label biases are ineffective in this setting. We end the paper with a plea for the design of novel bias-aware truth discovery algorithms that can ameliorate these issues.	翻訳日:2023-04-26 21:48:16 公開日:2023-04-25
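The abstract above names majority voting as a basic truth discovery (TD) algorithm. A minimal sketch of that baseline follows; the item names, labels, and the `majority_vote` helper are hypothetical illustrations and are not taken from the paper.

```python
from collections import Counter


def majority_vote(labels_by_item):
    """Return the most frequent worker label for each item.

    Ties are resolved in favour of the label that appears first in the
    worker list (Counter preserves first-seen order for equal counts).
    """
    consensus = {}
    for item, labels in labels_by_item.items():
        consensus[item] = Counter(labels).most_common(1)[0][0]
    return consensus


# Hypothetical crowd responses: three workers label each comment.
responses = {
    "comment_1": ["toxic", "toxic", "not_toxic"],
    "comment_2": ["not_toxic", "not_toxic", "toxic"],
}
consensus = majority_vote(responses)
```

Majority voting weights every worker equally, so a consistent minority view is always outvoted; this is one way consensus labels can stay biased.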
# 遺伝的にインスパイアされた対流伝熱促進 Genetically-inspired convective heat transfer enhancement ( http://arxiv.org/abs/2304.12618v1 ) ライセンス: Link先を確認	Rodrigo Castellanos and Andrea Ianiro and Stefano Discetti	(参考訳) 平坦なプレート上の乱流境界層(TBL)における対流熱伝達を、線形遺伝的アルゴリズム制御(LGAC)に基づく人工知能アプローチを用いて促進する。アクチュエータは、フリーストリームに整列した6つのスロットジェットの集合である。開ループ最適周期強制は、キャリア周波数、デューティサイクル、アクチュエータ間の位相差を制御パラメータとして定義する。制御法則は、未飽和のTBLと定常ジェットによる作動に関して最適化される。コスト関数は、壁対流熱伝達率とアクチュエータのコストを含む。制御器の性能は赤外線サーモグラフィにより評価され、粒子画像速度測定でも特徴付けられる。最適制御器はわずかに非対称な流れ場を与える。 LGACアルゴリズムは、すべてのアクチュエータに対して同じ周波数とデューティサイクルに収束する。この周波数は, 壁近傍で発生する大規模乱流構造の特性移動時間の逆数と著しく等しいことに注意が必要である。複数のジェットアクチュエータ間の位相差は非常に関係があることが示され、フロー非対称性の主要因となった。その結果、アクティベーション空間内の未探索のコントローラに対する機械学習制御の可能性が特定される。さらに,本研究は,高度な計測技術と高度なアルゴリズムを併用した実験研究の可能性を実証するものである。 The convective heat transfer in a turbulent boundary layer (TBL) on a flat plate is enhanced using an artificial intelligence approach based on linear genetic algorithms control (LGAC). The actuator is a set of six slot jets in crossflow aligned with the freestream. An open-loop optimal periodic forcing is defined by the carrier frequency, the duty cycle and the phase difference between actuators as control parameters. The control laws are optimised with respect to the unperturbed TBL and to the actuation with a steady jet. The cost function includes the wall convective heat transfer rate and the cost of the actuation. The performance of the controller is assessed by infrared thermography and characterised also with particle image velocimetry measurements. The optimal controller yields a slightly asymmetric flow field. The LGAC algorithm converges to the same frequency and duty cycle for all the actuators. It is noted that such frequency is strikingly equal to the inverse of the characteristic travel time of large-scale turbulent structures advected within the near-wall region. The phase difference between multiple jet actuation has shown to be very relevant and the main driver of flow asymmetry. 
The results pinpoint the potential of machine learning control in unravelling unexplored controllers within the actuation space. Our study furthermore demonstrates the viability of employing sophisticated measurement techniques together with advanced algorithms in an experimental investigation.	翻訳日:2023-04-26 21:41:10 公開日:2023-04-25
# 双方向セマンティック整合性制約を用いた弱覚的時間的行動定位 Weakly-Supervised Temporal Action Localization with Bidirectional Semantic Consistency Constraint ( http://arxiv.org/abs/2304.12616v1 ) ライセンス: Link先を確認	Guozhang Li, De Cheng, Xinpeng Ding, Nannan Wang, Jie Li, Xinbo Gao	(参考訳) WTAL(Weakly Supervised Temporal Action Localization)は、トレーニングデータセット内のビデオレベルのカテゴリラベルのみを考慮し、ビデオに対するアクションの時間的境界を分類し、ローカライズすることを目的としている。トレーニング中の境界情報の欠如により、既存のアプローチではwtalを分類問題、すなわち局所化のための時間クラス活性化マップ(t-cam)の生成として定式化している。しかし、分類損失のみの場合、モデルはサブ最適化されるため、アクション関連のシーンは異なるクラスラベルを区別するのに十分である。アクション関連シーンにおける他のアクション(すなわち、ポジティブアクションと同じシーン)について、このサブ最適化モデルは、コシーンアクションをポジティブアクションと誤分類する。この誤分類に対処するために,双方向意味一貫性制約(bi-scc)という,単純かつ効率的な手法を提案する。提案するbi-sccは,まず,映像間における肯定的行動とコシーン的動作の相関関係を破る拡張映像を生成するために,時間的文脈拡張を採用し,その後に意味的一貫性制約(scc)を用いて,オリジナル映像と拡張映像の予測を一貫性を持たせ,コシーン動作を抑制する。しかし、この拡張ビデオは、当初の時間的文脈を破壊してしまう。一貫性の制約を単純に適用すれば、局所化されたポジティブアクションの完全性に影響を及ぼす。そこで我々は,オリジナルビデオと拡張ビデオの相互監督により,協調行動の抑制と肯定的行動の整合性を確保しつつ,双方向的にSCCを増強する。最後に,提案するBi-SCCを現在のWTALアプローチに適用し,その性能を向上する。実験の結果,THUMOS14およびActivityNetの最先端手法よりも優れた性能を示した。 Weakly Supervised Temporal Action Localization (WTAL) aims to classify and localize temporal boundaries of actions for the video, given only video-level category labels in the training datasets. Due to the lack of boundary information during training, existing approaches formulate WTAL as a classification problem, i.e., generating the temporal class activation map (T-CAM) for localization. However, with only classification loss, the model would be sub-optimized, i.e., the action-related scenes are enough to distinguish different class labels. Regarding other actions in the action-related scene (i.e., the scene same as positive actions) as co-scene actions, this sub-optimized model would misclassify the co-scene actions as positive actions. To address this misclassification, we propose a simple yet efficient method, named bidirectional semantic consistency constraint (Bi-SCC), to discriminate the positive actions from co-scene actions. 
The proposed Bi-SCC first adopts a temporal context augmentation to generate an augmented video that breaks the correlation between positive actions and their co-scene actions in the inter-video; then, a semantic consistency constraint (SCC) is used to enforce the predictions of the original video and augmented video to be consistent, hence suppressing the co-scene actions. However, we find that this augmented video would destroy the original temporal context. Simply applying the consistency constraint would affect the completeness of localized positive actions. Hence, we boost the SCC in a bidirectional way to suppress co-scene actions while ensuring the integrity of positive actions, by cross-supervising the original and augmented videos. Finally, our proposed Bi-SCC can be applied to current WTAL approaches, and improve their performance. Experimental results show that our approach outperforms the state-of-the-art methods on THUMOS14 and ActivityNet.	翻訳日:2023-04-26 21:40:52 公開日:2023-04-25
# STM-UNet:スウィントランスとマルチスケールMLPを用いた医用画像分割のための効率的なU字型アーキテクチャ STM-UNet: An Efficient U-shaped Architecture Based on Swin Transformer and Multi-scale MLP for Medical Image Segmentation ( http://arxiv.org/abs/2304.12615v1 ) ライセンス: Link先を確認	Lei Shi, Tianyu Gao, Zheng Zhang and Junxing Zhang	(参考訳) 自動医療画像分割は、医師がより早く正確に診断するのに役立つ。近年,医用画像分割のための深層学習モデルが大きな進歩を遂げている。しかし、既存のモデルはu字型アーキテクチャを効率的に改善するためにトランスフォーマーやmlpを効果的に活用できなかった。さらに,MLPのマルチスケール特徴は,U字型アーキテクチャのボトルネックにおいて完全に抽出されていない。本稿では,Swin TransformerとマルチスケールMLP,すなわちSTM-UNetに基づく効率的なU字型アーキテクチャを提案する。特に、スウィントランスブロックは、残留接続の形でstm-unetの接続をスキップするために追加され、グローバル特徴のモデリング能力と長距離依存性を高めることができる。一方,並列畳み込みモジュールを備えた新しいpcas-mlpは,セグメンテーション性能の向上に寄与するため,アーキテクチャのボトルネックとして設計・実装されている。 isic 2016とisic 2018の実験結果は,提案手法の有効性を示している。また,本手法はIoUとDiceの観点から,最先端の手法よりも優れている。提案手法は,高セグメンテーション精度と低モデル複雑性とのトレードオフを向上した。 Automated medical image segmentation can assist doctors to diagnose faster and more accurate. Deep learning based models for medical image segmentation have made great progress in recent years. However, the existing models fail to effectively leverage Transformer and MLP for improving U-shaped architecture efficiently. In addition, the multi-scale features of the MLP have not been fully extracted in the bottleneck of U-shaped architecture. In this paper, we propose an efficient U-shaped architecture based on Swin Transformer and multi-scale MLP, namely STM-UNet. Specifically, the Swin Transformer block is added to skip connection of STM-UNet in form of residual connection, which can enhance the modeling ability of global features and long-range dependency. Meanwhile, a novel PCAS-MLP with parallel convolution module is designed and placed into the bottleneck of our architecture to contribute to the improvement of segmentation performance. The experimental results on ISIC 2016 and ISIC 2018 demonstrate the effectiveness of our proposed method. Our method also outperforms several state-of-the-art methods in terms of IoU and Dice. 
Our method has achieved a better trade-off between high segmentation accuracy and low model complexity.	翻訳日:2023-04-26 21:40:19 公開日:2023-04-25
# ゼーマン場におけるスピン軌道結合を持つ量子磁石中のマグノンのボース・アインシュタイン凝縮 Bose-Einstein condensations of magnons in quantum magnets with spin-orbit coupling in a Zeeman field ( http://arxiv.org/abs/2304.12612v1 ) ライセンス: Link先を確認	Fadi Sun and Jinwu Ye	(参考訳) スピン軌道結合(soc)を有する量子磁石のゼーマン場への応答を,効果的な動作の構築と再正規化群(rg)解析により検討した。低$ h_{c1} $ と上限臨界場 $ h_{c2} $ の量子相転移には、コンペンサート (c-) または in-commensurate (ic-) でのマグノン凝縮によって駆動される新しいクラスが存在する。中間IC-スカイミオン結晶(IC-SkX)相は、k_0$とラベルされたRGフローの固定点の列によって制御される。我々は、IC-SkX相のスピン軌道構造を決定する実効作用の量子スピンと順序パラメータの関係を導出する。また、ic-skx内のエキゾチック励起スペクトルを決定する$ h_{c1} $および$ h_{c2} $付近の演算子内容も分析した。 c-およびic-momentaにおけるマグノン凝縮の差を考察した。 2つの臨界場 $ h_{c1} < h_{c2} $ と中間 IC-SkX 相はゼーマン場における SOC を持つ任意の量子磁石の一般的な特徴である。ゼーマン場におけるSOCを含むいくつかの材料や低温原子系への実験的含意を示す。 We study the response of a quantum magnet with spin-orbit coupling (SOC) to a Zeeman field by constructing effective actions and performing Renormalization Group (RG) analysis. There are several novel classes of quantum phase transitions at a low $ h_{c1} $ and an upper critical field $ h_{c2} $ driven by magnon condensations at commensurate (C-) or in-commensurate (IC-) momenta $ 0 < k_0 < \pi $. The intermediate IC- Skyrmion crystal (IC-SkX) phase is controlled by a line of fixed points in the RG flows labeled by $ k_0 $. We derive the relations between the quantum spin and the order parameters of the effective actions which determine the spin-orbital structures of the IC-SkX phase. We also analyze the operator contents near $ h_{c1} $ and $ h_{c2} $ which determine the exotic excitation spectra inside the IC-SkX. The intrinsic differences between the magnon condensations at the C- and IC- momenta are explored. The two critical fields $ h_{c1} < h_{c2} $ and the intermediate IC-SkX phase could be a generic feature to any quantum magnets with SOC in a Zeeman field. Experimental implications to some materials or cold atom systems with SOC in a Zeeman field are presented.	翻訳日:2023-04-26 21:40:00 公開日:2023-04-25
# 不確かさ・劣化ヒステリシスシステムのモデリングのための双方向DeepONetアプローチ A Bi-fidelity DeepONet Approach for Modeling Uncertain and Degrading Hysteretic Systems ( http://arxiv.org/abs/2304.12609v1 ) ライセンス: Link先を確認	Subhayan De and Patrick T. Brewick	(参考訳) ヒステリシスの低下のような非線形系は、工学的応用においてしばしば発生する。また,不確実性の存在やそのようなシステムのモデル化がますます困難になっている。一方で、劣化効果の性質を知らずに開発された原始モデルからのデータセットを容易に得ることができる。本稿では,ヒステリティックシステムの劣化効果を,ディープオペレータネットワーク(deeponet)を訓練する真のシステムの動作の重要な特性の多くをキャプチャする低忠実性表現として考慮せずに,原始モデルからのデータセットを使用する。 3つの数値例を用いて,低忠実度モデルと真のシステムの応答との差をモデル化するためにdeeponetsを用いた場合,モデルパラメータの不確実性が存在する場合の予測誤差が大幅に向上することを示す。 Nonlinear systems, such as those with degrading hysteretic behavior, are often encountered in engineering applications. In addition, due to the ubiquitous presence of uncertainty, the modeling of such systems becomes increasingly difficult. On the other hand, datasets from pristine models developed without knowing the nature of the degrading effects can be easily obtained. In this paper, we use datasets from pristine models without considering the degrading effects of hysteretic systems as low-fidelity representations that capture many of the important characteristics of the true system's behavior to train a deep operator network (DeepONet). Three numerical examples are used to show that the proposed use of the DeepONets to model the discrepancies between the low-fidelity model and the true system's response leads to significant improvements in the prediction error in the presence of uncertainty in the model parameters for degrading hysteretic systems.	翻訳日:2023-04-26 21:39:29 公開日:2023-04-25
# 医療保険コスト予測における回帰モデルの性能評価 Performance Evaluation of Regression Models in Predicting the Cost of Medical Insurance ( http://arxiv.org/abs/2304.12605v1 ) ライセンス: Link先を確認	Jonelle Angelo S. Cenita, Paul Richie F. Asuncion, Jayson M. Victoriano	(参考訳) 本研究は,医療保険の費用予測における回帰モデルの性能評価を目的とした。機械学習における3つの回帰モデル、すなわち線形回帰、グラディエントブースティング、サポートベクトルマシンが使用された。性能はRMSE(Root Mean Square Error)、r2(R Square)、K-Fold Cross-validationを用いて評価される。この研究はまた、医療保険のコストを予測する上で最も重要な特徴を指摘し、データベース(KDD)プロセスにおける知識発見に依存している。 (KDD)プロセスとは、データから有用な知識を発見するプロセス全般を指す。その結果, 3つの回帰モデルのうち, 勾配ブースティングが最も高い r2 (r 平方) 0.892 と最低 rmse (根平均平方根) 1336.594 が得られた。さらに,3つの回帰モデルのr2(R角)結果と10Foldクロスバリデーション重み付き平均値の有意差は認められなかった。また、記述統計のボックスプロットを用いた探索データ解析(eda)では、電荷と喫煙者の特徴として、あるグループの中央値が他のグループのボックスの外側にあることが観察されているため、2つのグループの間には違いがある。グラディエント・ブースティングは3つの回帰モデルでより良い性能を発揮すると結論付けている。 K-Fold Cross-Validation は、3つの回帰モデルが良いと結論付けた。さらに、記述統計のボックスプロットを用いた探索データ分析(EDA)では、最も高い料金は喫煙者の特徴によるものであると断定する。 The study aimed to evaluate the regression models' performance in predicting the cost of medical insurance. Three (3) regression models in machine learning, namely Linear Regression, Gradient Boosting, and Support Vector Machine, were used. The performance is evaluated using the metrics RMSE (Root Mean Square Error), r2 (R Square), and K-Fold Cross-validation. The study also sought to pinpoint the feature that would be most important in predicting the cost of medical insurance. The study is anchored on the knowledge discovery in databases (KDD) process. The KDD process refers to the overall process of discovering useful knowledge from data. The performance evaluation results reveal that among the three (3) regression models, Gradient Boosting received the highest r2 (R Square) of 0.892 and the lowest RMSE (Root Mean Square Error) of 1336.594. Furthermore, the 10-Fold Cross-validation weighted mean findings are not significantly different from the r2 (R Square) results of the three (3) regression models. 
In addition, Exploratory Data Analysis (EDA) using a box plot of descriptive statistics observed that in the charges and smoker features the median of one group lies outside of the box of the other group, so there is a difference between the two groups. The study concludes that Gradient Boosting appears to perform better among the three (3) regression models. K-Fold Cross-Validation indicated that the three (3) regression models are good. Moreover, Exploratory Data Analysis (EDA) using a box plot of descriptive statistics indicates that the highest charges are due to the smoker feature.	翻訳日:2023-04-26 21:39:15 公開日:2023-04-25
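The abstract above compares models by RMSE (root mean square error) and r2 (the coefficient of determination). A minimal sketch of how the two metrics are computed follows; the charge values are made up for illustration and are not data from the study.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error: square root of the mean squared residual."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(y_true))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical insurance charges and model predictions (illustration only).
charges_true = [1000.0, 2000.0, 3000.0, 4000.0]
charges_pred = [1100.0, 1900.0, 3200.0, 3800.0]

model_rmse = rmse(charges_true, charges_pred)
model_r2 = r_squared(charges_true, charges_pred)
```

A lower RMSE and a higher r2 both indicate a closer fit. RMSE is in the units of the target (here, charges), while r2 is unitless.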
# 時間知識グラフ推論のための適応パスメモリネットワーク Adaptive Path-Memory Network for Temporal Knowledge Graph Reasoning ( http://arxiv.org/abs/2304.12604v1 ) ライセンス: Link先を確認	Hao Dong, Zhiyuan Ning, Pengyang Wang, Ziyue Qiao, Pengfei Wang, Yuanchun Zhou, Yanjie Fu	(参考訳) 時間的知識グラフ(TKG)推論は、歴史情報に基づく未来の行方不明事実の予測を目的としており、近年研究の関心が高まっている。推論作業の歴史的構造と時間的特性をモデル化するための多くの作品が作成されている。ほとんどの既存の作業は、主にエンティティ表現に依存するグラフ構造をモデル化している。しかしながら、現実のシナリオにおけるtkgエンティティの大きさは相当であり、時間が経つにつれて新しいエンティティが増えている。そこで本研究では,問合せ対象と各対象候補間の時間的経路情報を履歴時間を通して適応的にモデル化する適応パスメモリネットワーク(daemon)というtkgの特徴を持つ新しいアーキテクチャモデルを提案する。実体表現に頼らずに歴史的な情報をモデル化する。具体的には、DaeMonはパスメモリを使用して、隣接するタイムスタンプ間のメモリパス戦略を考慮して、パス集約ユニットから得られた時間パス情報を記録する。実世界の4つのTKGデータセットで実施された大規模な実験により、提案モデルが大幅に性能向上し、MRRにおいて最大4.8%の絶対値を達成することを示した。 Temporal knowledge graph (TKG) reasoning aims to predict the future missing facts based on historical information and has gained increasing research interest recently. Many works have been proposed to model the historical structural and temporal characteristics for the reasoning task. Most existing works model the graph structure mainly depending on entity representation. However, the magnitude of TKG entities in real-world scenarios is considerable, and an increasing number of new entities will arise as time goes on. Therefore, we propose a novel architecture modeling with relation feature of TKG, namely aDAptivE path-MemOry Network (DaeMon), which adaptively models the temporal path information between query subject and each object candidate across history time. It models the historical information without depending on entity representation. Specifically, DaeMon uses path memory to record the temporal path information derived from path aggregation unit across timeline considering the memory passing strategy between adjacent timestamps. Extensive experiments conducted on four real-world TKG datasets demonstrate that our proposed model obtains substantial performance improvement and outperforms the state of the art by up to 4.8% absolute in MRR.	翻訳日:2023-04-26 21:38:49 公開日:2023-04-25
# 深層学習は純粋数学者にとって有用なツールか? Is deep learning a useful tool for the pure mathematician? ( http://arxiv.org/abs/2304.12602v1 ) ライセンス: Link先を確認	Geordie Williamson	(参考訳) 純粋数学者がディープラーニングのツールを研究で使う際に期待するものを、個人的および非公式に説明します。 A personal and informal account of what a pure mathematician might expect when using tools from deep learning in their research.	翻訳日:2023-04-26 21:38:28 公開日:2023-04-25
# セグメンテーション・オールモデルの土木インフラ欠陥評価への適用 Application of Segment Anything Model for Civil Infrastructure Defect Assessment ( http://arxiv.org/abs/2304.12600v1 ) ライセンス: Link先を確認	Mohsen Ahmadi, Ahmad Gholizadeh Lonbar, Abbas Sharifi, Ali Tarlani Beris, Mohammadsadegh Nouri, Amir Sharifzadeh Javidi	(参考訳) 本研究では,コンクリート構造物のひび割れ検出のための2つの深層学習モデルSAMとU-Netの性能評価を行う。その結果, 各モデルにはそれぞれ, 異なる種類のひび割れを検出するための強みと限界があることがわかった。 SAMのユニークな亀裂検出手法を用いて、画像は亀裂の位置を識別する様々な部分に分割され、縦断裂の検出をより効果的にする。一方、U-Netモデルは正のラベル画素を識別し、スポーリングクラックのサイズと位置を正確に検出する。両モデルを組み合わせることで、より正確で包括的なき裂検出結果が得られる。コンクリート構造物の安全性と長寿命を確保するためには, ひび割れ検出に先進技術を用いることが重要である。本研究は, 橋梁, 建物, 道路など, 各種コンクリート構造物にSAMおよびU-Netモデルを用いることで, ひび割れ検出の精度と効率を向上し, 維持・修理に要する時間と資源を削減できることから, 土木工学に重要な意味を持つ可能性がある。結論として,本研究で提示されたSAMおよびU-Netモデルは,コンクリート構造物のひび割れを検知し,より正確かつ包括的な結果をもたらすような両モデルの強度を活用する,有望なソリューションを提供する。 This research assesses the performance of two deep learning models, SAM and U-Net, for detecting cracks in concrete structures. The results indicate that each model has its own strengths and limitations for detecting different types of cracks. Using the SAM's unique crack detection approach, the image is divided into various parts that identify the location of the crack, making it more effective at detecting longitudinal cracks. On the other hand, the U-Net model can identify positive label pixels to accurately detect the size and location of spalling cracks. By combining both models, more accurate and comprehensive crack detection results can be achieved. The importance of using advanced technologies for crack detection in ensuring the safety and longevity of concrete structures cannot be overstated. This research can have significant implications for civil engineering, as the SAM and U-Net model can be used for a variety of concrete structures, including bridges, buildings, and roads, improving the accuracy and efficiency of crack detection and saving time and resources in maintenance and repair. 
In conclusion, the SAM and U-Net models presented in this study offer promising solutions for detecting cracks in concrete structures, and leveraging the strengths of both models can lead to more accurate and comprehensive results.	翻訳日:2023-04-26 21:38:25 公開日:2023-04-25
# 変圧器とUNetの深層学習モデルによる舗装き裂の検出 Detection of Pavement Cracks by Deep Learning Models of Transformer and UNet ( http://arxiv.org/abs/2304.12596v1 ) ライセンス: Link先を確認	Yu Zhang and Lin Zhang	(参考訳) 破壊は、建物や道路などのエンジニアリング構造の主要な破壊モードの1つである。表面き裂の効果的検出は損傷評価と構造維持に重要である。近年,深層学習技術の出現と発展により,表面き裂検出が容易になる可能性が示唆されている。現在、ほとんどのタスクは畳み込みニューラルネットワーク(CNN)によって実行されており、CNNの制限は、最近導入されたトランスフォーマーアーキテクチャによって改善される可能性がある。本研究では, モデル精度, 計算複雑性, モデル安定性により, 舗装面き裂検出の性能を評価するための9つの有望モデルについて検討した。亀裂ラベル付き224×224ピクセルの711画像を作成し、最適損失関数を選択し、検証データセットとテストデータセットの評価指標を比較し、データ詳細を分析し、各モデルのセグメンテーション結果を確認した。一般に、トランスフォーマーベースのモデルは、トレーニングプロセス中に収束しやすく、精度が高いが、通常、メモリ消費が増加し、処理効率が低下する。 9つのモデルのうち、スウィノネットは他の2つのトランスフォーマーよりも優れており、9つのモデルの中で最も高い精度を示している。その結果,様々な深層学習モデルによる表面き裂検出に光を当て,今後の応用の指針を提供する必要がある。 Fracture is one of the main failure modes of engineering structures such as buildings and roads. Effective detection of surface cracks is significant for damage evaluation and structure maintenance. In recent years, the emergence and development of deep learning techniques have shown great potential to facilitate surface crack detection. Currently, most reported tasks were performed by a convolutional neural network (CNN), while the limitation of CNN may be improved by the transformer architecture introduced recently. In this study, we investigated nine promising models to evaluate their performance in pavement surface crack detection by model accuracy, computational complexity, and model stability. We created 711 images of 224 by 224 pixels with crack labels, selected an optimal loss function, compared the evaluation metrics of the validation dataset and test dataset, analyzed the data details, and checked the segmentation outcomes of each model. We find that transformer-based models generally are easier to converge during the training process and have higher accuracy, but usually exhibit more memory consumption and low processing efficiency. 
Among the nine models, SwinUNet outperforms the other two transformers and shows the highest accuracy. The results should shed light on surface crack detection by various deep-learning models and provide a guideline for future applications in this field.	翻訳日:2023-04-26 21:38:02 公開日:2023-04-25
# 医用画像のためのジェネリストビジョン基礎モデル:ゼロショット・メディカル・セグメンテーションにおけるセグメンテーションモデルの一事例 Generalist Vision Foundation Models for Medical Imaging: A Case Study of Segment Anything Model on Zero-Shot Medical Segmentation ( http://arxiv.org/abs/2304.12637v1 ) ライセンス: Link先を確認	Peilun Shi, Jianing Qiu, Sai Mu Dalike Abaxi, Hao Wei, Frank P.-W. Lo, Wu Yuan	(参考訳) 医用画像における最近のsegment anything model (sam) について検討し、光学コヒーレンス断層撮影(oct)、磁気共鳴画像(mri)、ct(ct)などの様々な画像モダリティをカバーする9つの医用画像分割ベンチマークの定量的・定性的ゼロショットセグメンテーション結果と、皮膚科、眼科、放射線科の異なる応用について報告する。実験の結果,samは一般領域の画像,例えば医用画像などではゼロショットセグメンテーション性能は限定されているものの,一般領域の画像では見事なセグメンテーション性能を示していることが明らかとなった。さらに、SAMは、異なる未知の医療領域で異なるゼロショットセグメンテーション性能を示した。例えば、0.8704の平均ダイススコアは、網膜オクターのブラッハ膜層下のセグメンテーションで、セグメンテーション精度は、網膜色素上皮のセグメンテーション時に0.0688に低下する。血管などの特定の構造的標的ではSAMのゼロショットセグメンテーションは完全に失敗したが、少量のデータによる単純な微調整はセグメンテーションの品質を著しく改善する可能性がある。本研究は,医用画像における特定の課題を解くための汎用的ビジョン基盤モデルの汎用性を示し,また,様々な医療データセットにアクセスし,医療領域の複雑さを克服する上での課題に対処する上で,必要なパフォーマンスを実現する大きな可能性を示した。 We examine the recent Segment Anything Model (SAM) on medical images, and report both quantitative and qualitative zero-shot segmentation results on nine medical image segmentation benchmarks, covering various imaging modalities, such as optical coherence tomography (OCT), magnetic resonance imaging (MRI), and computed tomography (CT), as well as different applications including dermatology, ophthalmology, and radiology. Our experiments reveal that while SAM demonstrates stunning segmentation performance on images from the general domain, for those out-of-distribution images, e.g., medical images, its zero-shot segmentation performance is still limited. Furthermore, SAM demonstrated varying zero-shot segmentation performance across different unseen medical domains. For example, it had a 0.8704 mean Dice score on segmenting under-bruch's membrane layer of retinal OCT, whereas the segmentation accuracy drops to 0.0688 when segmenting retinal pigment epithelium. 
For certain structured targets, e.g., blood vessels, the zero-shot segmentation of SAM completely failed, whereas a simple fine-tuning of it with a small amount of data could lead to remarkable improvements of the segmentation quality. Our study indicates the versatility of generalist vision foundation models in solving specific tasks in medical imaging, and their great potential to achieve desired performance through fine-tuning and eventually tackle the challenges of accessing large diverse medical datasets and the complexity of medical domains.	翻訳日:2023-04-26 21:32:50 公開日:2023-04-25
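The abstract above reports segmentation quality as mean Dice scores (for example 0.8704 versus 0.0688). A minimal sketch of how the Dice score and the related IoU are computed for one pair of binary masks follows. The masks are toy examples, and returning 1.0 when both masks are empty is an assumed convention, not something stated in the paper.

```python
def dice_and_iou(pred_mask, true_mask):
    """Compute (Dice, IoU) for two binary masks given as nested lists."""
    pred = [bool(v) for row in pred_mask for v in row]
    true = [bool(v) for row in true_mask for v in row]
    intersection = sum(p and t for p, t in zip(pred, true))
    pred_area = sum(pred)
    true_area = sum(true)
    union = pred_area + true_area - intersection
    if union == 0:
        # Assumed convention: two empty masks count as a perfect match.
        return 1.0, 1.0
    dice = 2 * intersection / (pred_area + true_area)
    iou = intersection / union
    return dice, iou


# Toy 2x3 masks (illustration only): 1 marks a foreground pixel.
predicted = [[1, 1, 0],
             [0, 1, 0]]
reference = [[1, 0, 0],
             [0, 1, 1]]

dice, iou = dice_and_iou(predicted, reference)
```

Dice is never lower than IoU for the same pair of masks, so scores reported under the two metrics are not directly comparable.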
# 教師なし人物再識別のためのカメラ内類似性を用いた擬似ラベル再構成 Pseudo Labels Refinement with Intra-camera Similarity for Unsupervised Person Re-identification ( http://arxiv.org/abs/2304.12634v1 ) ライセンス: Link先を確認	Pengna Li, Kangyi Wu, Sanping Zhou, Qianxin Huang, Jinjun Wang	(参考訳) unsupervised person re-id(re-id)は、個人識別ラベルなしで、カメラ間で人物画像を取得することを目的としている。ほとんどのクラスタリングベースの方法は、画像の特徴をクラスタに大まかに分割し、異なるカメラ間のドメインシフトによる特徴分布ノイズを無視する。この課題に対処するために,カメラ内類似性をクラスタリングした新しいラベルリファインメントフレームワークを提案する。カメラ内特徴分布は、歩行者やラベルの出現により多くの注意を払っている。我々は各カメラにそれぞれローカルクラスタを取得するためのカメラ内トレーニングを行い、各カメラ間クラスタを局所的な結果で洗練する。したがって、信頼性の高い擬似ラベルを自己ペーストした方法でRe-IDモデルをトレーニングする。実験により,提案手法が最先端性能を上回ることを示した。 Unsupervised person re-identification (Re-ID) aims to retrieve person images across cameras without any identity labels. Most clustering-based methods roughly divide image features into clusters and neglect the feature distribution noise caused by domain shifts among different cameras, leading to inevitable performance degradation. To address this challenge, we propose a novel label refinement framework with clustering intra-camera similarity. Intra-camera feature distribution pays more attention to the appearance of pedestrians and labels are more reliable. We conduct intra-camera training to get local clusters in each camera, respectively, and refine inter-camera clusters with local results. We hence train the Re-ID model with refined reliable pseudo labels in a self-paced way. Extensive experiments demonstrate that the proposed method surpasses state-of-the-art performance.	翻訳日:2023-04-26 21:32:18 公開日:2023-04-25
# PUNR:ニュースレコメンデーションのためのユーザ行動モデリングによる事前学習 PUNR: Pre-training with User Behavior Modeling for News Recommendation ( http://arxiv.org/abs/2304.12633v1 ) ライセンス: Link先を確認	Guangyuan Ma, Hongtao Liu, Xing Wu, Wanhui Qian, Zhepeng Lv, Qing Yang, Songlin Hu	(参考訳) ニュースレコメンデーションは、ユーザーの行動に基づいてクリック行動を予測することを目的としている。ユーザの表現を効果的にモデル化する方法は、望ましいニュースを推奨するキーとなる。既存の作品は、主に監督された微調整段階の改善に焦点を当てている。しかし、ユーザ表現に最適化された PLM ベースの教師なし事前学習手法がまだ存在しない。本研究では,ユーザ行動マスキングとユーザ行動生成という2つのタスクを備えた教師なし事前学習パラダイムを提案する。まず,ユーザ行動マスキング事前学習タスクを導入し,その状況行動に基づいてマスキングユーザ行動の復元を行う。このようにして、このモデルはより強く、より包括的なユーザーニュースリーディングパターンを捉えることができる。さらに,ユーザエンコーダから派生したユーザ表現ベクトルを強化するために,新しいユーザ行動生成事前学習タスクを導入する。上記の事前学習したユーザモデリングエンコーダを用いて、下流の微調整でニュースやユーザ表現を得る。実世界のニュースベンチマークの評価では、既存のベースラインよりも大幅にパフォーマンスが向上している。 News recommendation aims to predict click behaviors based on user behaviors. How to effectively model the user representations is the key to recommending preferred news. Existing works are mostly focused on improvements in the supervised fine-tuning stage. However, there is still a lack of PLM-based unsupervised pre-training methods optimized for user representations. In this work, we propose an unsupervised pre-training paradigm with two tasks, i.e. user behavior masking and user behavior generation, both towards effective user behavior modeling. Firstly, we introduce the user behavior masking pre-training task to recover the masked user behaviors based on their contextual behaviors. In this way, the model could capture a much stronger and more comprehensive user news reading pattern. Besides, we incorporate a novel auxiliary user behavior generation pre-training task to enhance the user representation vector derived from the user encoder. We use the above pre-trained user modeling encoder to obtain news and user representations in downstream fine-tuning. Evaluations on the real-world news benchmark show significant performance improvements over existing baselines.	翻訳日:2023-04-26 21:32:04 公開日:2023-04-25
# 私がbm25のように説明する:密集したモデルのランクリストをスパース近似で解釈する Explain like I am BM25: Interpreting a Dense Model's Ranked-List with a Sparse Approximation ( http://arxiv.org/abs/2304.12631v1 ) ライセンス: Link先を確認	Michael Llordes, Debasis Ganguly, Sumit Bhatia and Chirag Agarwal	(参考訳) ニューラル検索モデル (NRM) は、密集した文書表現を通して意味的意味を捉える能力により、統計的に優れていることが示されている。しかしこれらのモデルは、明示的な項マッチングに依存しないため、解釈性に乏しい。局所的なクエリごとの説明の一形態として,NAMの結果とスパース検索システムの結果集合との類似性を最大化することによって生成される等価クエリの概念を導入する。このアプローチをrm3ベースのクエリ拡張や検索効率のコントラストの違い、および各アプローチによって生成された用語と比較する。 Neural retrieval models (NRMs) have been shown to outperform their statistical counterparts owing to their ability to capture semantic meaning via dense document representations. These models, however, suffer from poor interpretability as they do not rely on explicit term matching. As a form of local per-query explanations, we introduce the notion of equivalent queries that are generated by maximizing the similarity between the NRM's results and the result set of a sparse retrieval system with the equivalent query. We then compare this approach with existing methods such as RM3-based query expansion and contrast differences in retrieval effectiveness and in the terms generated by each approach.	翻訳日:2023-04-26 21:31:50 公開日:2023-04-25
# 都市全体の大気汚染予測のための時空間グラフ畳み込みニューラルネットワークモデル Spatiotemporal Graph Convolutional Recurrent Neural Network Model for Citywide Air Pollution Forecasting ( http://arxiv.org/abs/2304.12630v1 ) ライセンス: Link先を確認	Van-Duc Le	(参考訳) 都市全体の大気汚染予測は、市全体の大気質を正確に予測しようと試みている。大気汚染は時空間的に変化し、多くの複雑な要因に依存するため、この問題は解決される。過去の研究では,都市全体をイメージとして考慮し,コンボリューショナル・ロング・短期記憶(ConvLSTM)モデルを用いて時空間の特徴を学習することで,この問題を解決している。しかし、大気汚染やその他の影響要因が自然グラフ構造を持つため、画像に基づく表現は理想的ではないかもしれない。本研究では, グラフ畳み込みネットワーク (gcn) が都市全体の空気品質を読み取る空間的特徴を効率的に表現できることを考察する。具体的には、GCNアーキテクチャをRNN構造に密に統合し、空気質値とその影響因子の時空間特性を効率よく学習することにより、ConvLSTMモデルを時空間グラフ畳み込みリカレントニューラルネットワーク(時空間GCRNN)モデルに拡張する。提案手法は, 大気汚染予測のための最新のConvLSTMモデルと比較して, パラメータの数ははるかに少ないが, 優れた性能を示す。また,本手法は,実世界の大気汚染データセットにおけるハイブリッドgcn法よりも優れている。 Citywide Air Pollution Forecasting tries to precisely predict the air quality multiple hours ahead for the entire city. This topic is challenged since air pollution varies in a spatiotemporal manner and depends on many complicated factors. Our previous research has solved the problem by considering the whole city as an image and leveraged a Convolutional Long Short-Term Memory (ConvLSTM) model to learn the spatiotemporal features. However, an image-based representation may not be ideal as air pollution and other impact factors have natural graph structures. In this research, we argue that a Graph Convolutional Network (GCN) can efficiently represent the spatial features of air quality readings in the whole city. Specially, we extend the ConvLSTM model to a Spatiotemporal Graph Convolutional Recurrent Neural Network (Spatiotemporal GCRNN) model by tightly integrating a GCN architecture into an RNN structure for efficient learning spatiotemporal characteristics of air quality values and their influential factors. Our extensive experiments prove the proposed model has a better performance compare to the state-of-the-art ConvLSTM model for air pollution predicting while the number of parameters is much smaller. 
Moreover, our approach is also superior to a hybrid GCN-based method on a real-world air pollution dataset.	翻訳日:2023-04-26 21:31:39 公開日:2023-04-25
# ヤンミルズ方程式に基づく角運動波の予測 Predicting Angular-Momentum Waves Based on Yang-Mills Equation ( http://arxiv.org/abs/2304.12625v1 ) ライセンス: Link先を確認	Xing-Yan Fan, Xiang-Ru Xie, and Jing-Ling Chen	(参考訳) 物理学における最もエレガントな理論の1つとして、ヤン=ミルズ理論は古典的な電磁現象を統一するマクスウェルの方程式を取り入れるだけでなく、電弱と強い相互作用を簡潔に説明する標準模型を基礎としている。アービアン$U(1)$の場合、電磁場はヤン・ミルズ方程式の最も単純な古典解である。それにもかかわらず、最も単純な量子状態、すなわち、マクスウェルの非可換ポテンシャルを持つ方程式における「磁気」と「電気」の場の考察について、多くの研究がなされている。マクスウェル方程式によって予測される電磁波と同様に、最も単純なyang-mills方程式の量子解はsu(2)角運動量波を予測できる。このような角運動量波は、スピン角運動量(ディラック電子の'spin zitterbewegung''のような)の振動の実験で実現可能である。 As one of the most elegant theories in physics, Yang-Mills theory not only incorporates Maxwell's equations unifying the classical electromagnetic phenomena, but also underpins the standard model explaining the electroweak and strong interactions in a succinct way. As an Abelian $U(1)$ case, the electromagnetic field is the simplest classical solution of Yang-Mills equation. Notwithstanding, there is a paucity of studies about the simplest quantum situation, namely the consideration of the ``magnetic'' and ``electric'' fields in Maxwell's equations with non-Abelian potentials, which is exactly the staple of our present work. Akin to the electromagnetic waves predicted by Maxwell's equations, the quantum solution of the simplest Yang-Mills equation may predict the SU(2) angular-momentum waves. Such angular-momentum waves can be possibly realized in the experiments with oscillations of the spin angular momentum (such as the ``spin Zitterbewegung'' of Dirac's electron).	翻訳日:2023-04-26 21:31:19 公開日:2023-04-25
# 形状ネット:3次元形状の知識蒸留を付加入力として用いたパノラマ画像からのルームレイアウト推定 Shape-Net: Room Layout Estimation from Panoramic Images Robust to Occlusion using Knowledge Distillation with 3D Shapes as Additional Inputs ( http://arxiv.org/abs/2304.12624v1 ) ライセンス: Link先を確認	Mizuki Tabata, Kana Kurata, Junichiro Tamamatsu	(参考訳) パノラマ画像から部屋のレイアウトを推定することは、バーチャル/拡張現実と家具レイアウトシミュレーションにおいて重要である。これは、角や境界の位置などの3次元(3D)幾何を識別し、3D再構成を行う。しかし,オクルージョンは部屋のレイアウト推定に悪影響を及ぼしうる一般的な問題であり,これまでは十分に研究されていない。建物の図面として部屋の3次元形状情報とコーナーの座標を画像データセットから得ることができるので,2次元パノラマ情報と3次元情報の両方をモデルに提供して咬合を効果的に処理することを提案する。しかし、モデルに3d情報を送るだけでは、遮蔽領域の形状情報を利用するには不十分である。そこで我々は、3D情報を有効に活用するために、3Dインターセクション・オーバー・ユニオン(IoU)ロスを導入した。図面が手に入らない場合や、図面から逸脱した場合もある。そこで本研究では,画像と3次元情報の両方を訓練したモデルから,画像のみを入力とするモデルへ知識を抽出する手法を提案する。提案モデルはShape-Netと呼ばれ,ベンチマークデータセット上での最先端(SOTA)性能を実現する。また, 既存のモデルと比較して咬合像の精度が有意に向上し, 咬合処置の有効性も確認した。 Estimating the layout of a room from a single-shot panoramic image is important in virtual/augmented reality and furniture layout simulation. This involves identifying three-dimensional (3D) geometry, such as the location of corners and boundaries, and performing 3D reconstruction. However, occlusion is a common issue that can negatively impact room layout estimation, and this has not been thoroughly studied to date. It is possible to obtain 3D shape information of rooms as drawings of buildings and coordinates of corners from image datasets, thus we propose providing both 2D panoramic and 3D information to a model to effectively deal with occlusion. However, simply feeding 3D information to a model is not sufficient to utilize the shape information for an occluded area. Therefore, we improve the model by introducing 3D Intersection over Union (IoU) loss to effectively use 3D information. In some cases, drawings are not available or the construction deviates from a drawing. 
Considering such practical cases, we propose a method for distilling knowledge from a model trained with both images and 3D information to a model that takes only images as input. The proposed model, which is called Shape-Net, achieves state-of-the-art (SOTA) performance on benchmark datasets. We also confirmed its effectiveness in dealing with occlusion through significantly improved accuracy on images with occlusion compared with existing models.	翻訳日:2023-04-26 21:30:55 公開日:2023-04-25
# Rydberg-dressed原子によるハドロン状態の量子シミュレーション Quantum simulation of hadronic states with Rydberg-dressed atoms ( http://arxiv.org/abs/2304.12623v1 ) ライセンス: Link先を確認	Zihan Wang, Feiyang Wang, Joseph Vovrosh, Johannes Knolle, Florian Mintert and Rick Mukherjee	(参考訳) 閉じ込め現象は高エネルギー物理学でよく知られており、一次元量子スピン鎖の低エネルギーのドメイン壁励起に対しても実現可能である。 2つのドメイン壁からなる束縛状態は中間子のように振る舞うことができ、Vovroshらの最近の研究 [PRX Quantum 3, 040309 (2022)] では、一対の中間子がハドロン状態に類似した準安定な閉じ込め誘起束縛状態(4つのドメイン壁からなる)を動的に形成できることが実証された。しかし、Vovroshら [PRX Quantum 3, 040309 (2022)] で議論されたプロトコルは、特徴的に非単調な距離依存性を持つ相互作用を用いており、そのような相互作用は自然界では容易に得られないため、実験的な実現には課題がある。この点において、Rydberg原子は閉じ込め関連物理学をシミュレートするために必要なプラットフォームを提供することができる。我々は、相互作用するRydberg-dressed原子がもたらす柔軟性を利用して、一次元横磁場イジングモデルのスピン-スピン相互作用を改変する。我々の数値シミュレーションは、Rydberg-dressedの相互作用がハドロン生成に適する様々な有効ポテンシャルをもたらすことを示しており、現在の捕捉イオン実験の代替として、Rydbergプラットフォームによる閉じ込め物理学をシミュレートする可能性を開く。 The phenomenon of confinement is well known in high-energy physics and can also be realized for low-energy domain-wall excitations in one-dimensional quantum spin chains. A bound state consisting of two domain-walls can behave like a meson, and in a recent work of Vovrosh et al. [PRX Quantum 3, 040309 (2022)], it was demonstrated that a pair of mesons could dynamically form a meta-stable confinement-induced bound state (consisting of four domain-walls) akin to a hadronic state. However, the protocol discussed in Vovrosh et al. [PRX Quantum 3, 040309 (2022)] relies on interactions with a characteristically non-monotonic distance dependence, which are not easy to come by in nature, thus posing a challenge for its experimental realization. In this regard, Rydberg atoms can provide the required platform for simulating confinement-related physics. We exploit the flexibility offered by interacting Rydberg-dressed atoms to engineer modified spin-spin interactions for the one-dimensional transverse field Ising model. 
Our numerical simulations show how Rydberg-dressed interactions can give rise to a variety of effective potentials that are suitable for hadron formation, which opens the possibility of simulating confinement physics with Rydberg platforms as a viable alternative to current trapped-ion experiments.	翻訳日:2023-04-26 21:30:13 公開日:2023-04-25
# 刈り取った視覚モデルのバイアス : 奥行き解析と対策 Bias in Pruned Vision Models: In-Depth Analysis and Countermeasures ( http://arxiv.org/abs/2304.12622v1 ) ライセンス: Link先を確認	Eugenia Iofinova, Alexandra Peste, Dan Alistarh	(参考訳) プルーニング(pruning) - すなわち、ニューラルネットワークのパラメータのかなりのサブセットをゼロに設定する - は、モデル圧縮の最も一般的な方法の1つである。しかし、最近のいくつかの研究は、プルーニングが圧縮モデルの出力にバイアスを誘導または悪化させる可能性があるという問題を提起している。この現象の既存の証拠にもかかわらず、ニューラルネットワークのプルーニングと誘導バイアスの関係はよく理解されていない。本研究では,コンピュータビジョンのための畳み込みニューラルネットワークにおいて,この現象を系統的に研究し,特徴付ける。第一に, 密度の高いモデルに比べて精度が低下せず, バイアスが著しく増加するような, 10%未満の残量で, 高いスパースモデルを得ることが可能であることを示す。同時に、高い空間では、プルーニングされたモデルは出力に高い不確実性を示し、相関性も増加し、バイアスの増加に直接関連していることもわかりました。本研究では,非圧縮モデルのみに基づいて,刈り込みによってバイアスが増大するかどうかを判定し,圧縮後の予測に最も影響を受けやすい試料を同定する。 Pruning - that is, setting a significant subset of the parameters of a neural network to zero - is one of the most popular methods of model compression. Yet, several recent works have raised the issue that pruning may induce or exacerbate bias in the output of the compressed model. Despite existing evidence for this phenomenon, the relationship between neural network pruning and induced bias is not well-understood. In this work, we systematically investigate and characterize this phenomenon in Convolutional Neural Networks for computer vision. First, we show that it is in fact possible to obtain highly-sparse models, e.g. with less than 10% remaining weights, which do not decrease in accuracy nor substantially increase in bias when compared to dense models. At the same time, we also find that, at higher sparsities, pruned models exhibit higher uncertainty in their outputs, as well as increased correlations, which we directly link to increased bias. We propose easy-to-use criteria which, based only on the uncompressed model, establish whether bias will increase with pruning, and identify the samples most susceptible to biased predictions post-compression.	翻訳日:2023-04-26 21:29:45 公開日:2023-04-25
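The abstract above defines pruning as setting a subset of a network's parameters to zero. As a minimal sketch of that idea, the snippet below zeroes the smallest-magnitude entries of a flat weight list. The magnitude criterion, the function name `magnitude_prune`, and the toy weights are assumptions made for illustration; the abstract does not say which pruning criterion the authors use.

```python
def magnitude_prune(weights, sparsity):
    """Return a copy of `weights` with the smallest-magnitude fraction set to zero.

    `sparsity` is the fraction of weights to remove (0.9 keeps 10%).
    Magnitude-based selection is an illustrative assumption, not the paper's method.
    """
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be in [0, 1]")
    n_prune = int(len(weights) * sparsity)
    # Indices ordered by absolute value, smallest first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned


# Toy weights (hypothetical values); 90% sparsity leaves one of ten weights.
weights = [0.8, -0.05, 0.3, -1.2, 0.01, 0.4, -0.02, 0.09, 0.6, -0.7]
pruned = magnitude_prune(weights, sparsity=0.9)
remaining = sum(1 for w in pruned if w != 0.0)
```

In practice pruning is usually applied per layer or globally across tensors and followed by fine-tuning; this sketch only shows the zeroing step on a flat list.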
# 医用SAMアダプタ : 医用画像分割のためのSegment Anythingモデルの適用 Medical SAM Adapter: Adapting Segment Anything Model for Medical Image Segmentation ( http://arxiv.org/abs/2304.12620v1 ) ライセンス: Link先を確認	Junde Wu and Rao Fu and Huihui Fang and Yuanpei Liu and Zhaowei Wang and Yanwu Xu and Yueming Jin and Tal Arbel	(参考訳) Segment Anything Model (SAM)は画像セグメンテーションの分野で最近人気を集めている。全面的なセグメンテーションタスクとプロンプトベースのインターフェースの素晴らしい機能のおかげで、SAMはコミュニティ内で激しい議論を巻き起こした。イメージセグメンテーションのタスクはSAMによって「完了」されたと多くの名高い専門家から言われている。しかし, 医用画像セグメンテーションは, イメージセグメンテーションファミリーの重要な分枝であるが, セグメンテーション"Anything"の範囲には含まれていないようである。多くの個人実験や最近の研究では、SAMは医療画像のセグメンテーションでは性能が劣ることが示されている。自然な疑問は、SAMの強力なセグメンテーション能力を医療画像セグメンテーションに拡張するために、パズルの欠片を見つける方法である。本稿では、パラメータ効率のよい微調整パラダイムに従って事前学習したSAMモデルをAdapterで微調整することで、可能な解を提案する。本研究は、一般的なNLP技術であるAdapterをコンピュータビジョンのケースに転用する数少ない試みの1つであるが、この単純な実装は、医療画像のセグメンテーションにおいて驚くほど優れた性能を示している。医用SAMアダプタ (MSA) は, CT, MRI, 超音波画像, 眼底画像, 皮膚内視鏡画像など, 様々な画像モダリティを有する19の医用画像セグメンテーションタスクにおいて, 優れた性能を示した。 MSAは、nnUNet、TransUNet、UNetr、MedSegDiffなど、幅広い最先端(SOTA)医療画像セグメンテーション手法より優れている。コードは https://github.com/WuJunde/Medical-SAM-Adapter でリリースされる。 The Segment Anything Model (SAM) has recently gained popularity in the field of image segmentation. Thanks to its impressive capabilities in all-round segmentation tasks and its prompt-based interface, SAM has sparked intensive discussion within the community. It is even said by many prestigious experts that the image segmentation task has been "finished" by SAM. However, medical image segmentation, although an important branch of the image segmentation family, seems not to be included in the scope of Segmenting "Anything". Many individual experiments and recent studies have shown that SAM performs subpar in medical image segmentation. A natural question is how to find the missing piece of the puzzle to extend the strong segmentation capability of SAM to medical image segmentation. 
In this paper, we present a possible solution by fine-tuning the pretrained SAM model following parameter-efficient fine-tuning paradigm with Adapter. Although this work is still one of a few to transfer the popular NLP technique Adapter to computer vision cases, this simple implementation shows surprisingly good performance on medical image segmentation. A medical image adapted SAM, which we have dubbed Medical SAM Adapter (MSA), shows superior performance on 19 medical image segmentation tasks with various image modalities including CT, MRI, ultrasound image, fundus image, and dermoscopic images. MSA outperforms a wide range of state-of-the-art (SOTA) medical image segmentation methods, such as nnUNet, TransUNet, UNetr, MedSegDiff, and so on. Code will be released at: https://github.com/WuJunde/Medical-SAM-Adapter.	翻訳日:2023-04-26 21:29:25 公開日:2023-04-25
# パッチベース3次元自然シーン生成の一例 Patch-based 3D Natural Scene Generation from a Single Example ( http://arxiv.org/abs/2304.12670v1 ) ライセンス: Link先を確認	Weiyu Li, Xuelin Chen, Jue Wang, Baoquan Chen	(参考訳) 典型的にはユニークで複雑な自然シーンの3次元生成モデルを対象としている。必要な量のトレーニングデータの欠如と、様々なシーン特性の存在下でアドホックなデザインを持つことの難しさにより、既存の設定が難解になる。従来のパッチベースのイメージモデルに触発されて,パッチレベルでの3Dシーンの合成を提唱する。この研究の核心は、シーン表現と生成パッチが隣のモジュールに最も近い重要なアルゴリズム設計であり、古典的な2Dパッチベースのフレームワークから3D生成まで、ユニークな課題に対処する。これらのデザイン選択は、集合レベルでは、様々な模範的なシーンで示されるように、現実的な幾何学的構造と視覚的外観の両方を持つ高品質な一般的な自然のシーンを多種多様な量で生成できる、堅牢で効果的で効率的なモデルに寄与する。 We target a 3D generative model for general natural scenes that are typically unique and intricate. Lacking the necessary volumes of training data, along with the difficulties of having ad hoc designs in presence of varying scene characteristics, renders existing setups intractable. Inspired by classical patch-based image models, we advocate for synthesizing 3D scenes at the patch level, given a single example. At the core of this work lies important algorithmic designs w.r.t the scene representation and generative patch nearest-neighbor module, that address unique challenges arising from lifting classical 2D patch-based framework to 3D generation. These design choices, on a collective level, contribute to a robust, effective, and efficient model that can generate high-quality general natural scenes with both realistic geometric structure and visual appearance, in large quantities and varieties, as demonstrated upon a variety of exemplar scenes.	翻訳日:2023-04-26 21:21:41 公開日:2023-04-25
# 反事実的説明の相違:透明性がいかに欺くか Disagreement amongst counterfactual explanations: How transparency can be deceptive ( http://arxiv.org/abs/2304.12667v1 ) ライセンス: Link先を確認	Dieter Brughmans, Lissa Melis, David Martens	(参考訳) 事実的説明は、複雑な機械学習アルゴリズムのステークホルダーにデータ駆動決定の説明を提供するための説明可能な人工知能(XAI)技術として、ますます使われている。反事実的説明の人気は、それらを生成するアルゴリズムのブームをもたらした。しかし、全てのアルゴリズムが同じインスタンスに対して一様説明を生成するわけではない。いくつかの文脈において、複数の可能な説明が有用であるが、反事実的説明の多様性が利害関係者の間で意見の相違をもたらす状況がある。倫理的な問題は、例えば、悪意のあるエージェントがこの多様性を使用して、センシティブな特徴を隠すことによって不公平な機械学習モデルを公正にするときに発生する。世界中の議員は、データ駆動、高リスクの決定に関する説明の権利を政策に含める傾向があるため、これらの倫理的な問題を理解して対処すべきである。 XAIにおける不一致問題に関する文献レビューでは、この問題は実証的に反実的な説明のために評価されていないことが判明した。そこで本研究では,40のデータセットに対して,ブラックボックスモデル2つのモデルに対して,説明生成手法12を用いて大規模な実験分析を行い,1920000以上の説明を得た。本研究は,試験方法の相違点が著しく高いことを明らかにする。悪意のあるユーザは、複数の偽りの説明が利用可能であれば、望ましい機能を除外したり、含めたりすることができる。この不一致は、主にデータセットの特徴と反現実的アルゴリズムのタイプによって引き起こされているようだ。 XAIはアルゴリズムによる意思決定の透明性を重視しているが、我々の分析は、この自己宣言の透明性を主張する。 Counterfactual explanations are increasingly used as an Explainable Artificial Intelligence (XAI) technique to provide stakeholders of complex machine learning algorithms with explanations for data-driven decisions. The popularity of counterfactual explanations resulted in a boom in the algorithms generating them. However, not every algorithm creates uniform explanations for the same instance. Even though in some contexts multiple possible explanations are beneficial, there are circumstances where diversity amongst counterfactual explanations results in a potential disagreement problem among stakeholders. Ethical issues arise when for example, malicious agents use this diversity to fairwash an unfair machine learning model by hiding sensitive features. As legislators worldwide tend to start including the right to explanations for data-driven, high-stakes decisions in their policies, these ethical issues should be understood and addressed. 
Our literature review on the disagreement problem in XAI reveals that this problem has never been empirically assessed for counterfactual explanations. Therefore, in this work, we conduct a large-scale empirical analysis on 40 datasets, using 12 explanation-generating methods, for two black-box models, yielding over 192.0000 explanations. Our study finds alarmingly high disagreement levels between the methods tested. A malicious user is able to both exclude and include desired features when multiple counterfactual explanations are available. This disagreement seems to be driven mainly by the dataset characteristics and the type of counterfactual algorithm. XAI centers on the transparency of algorithmic decision-making, but our analysis advocates for transparency about this self-proclaimed transparency.	翻訳日:2023-04-26 21:21:24 公開日:2023-04-25
# ベイズ最適化と自己蒸留 Bayesian Optimization Meets Self-Distillation ( http://arxiv.org/abs/2304.12666v1 ) ライセンス: Link先を確認	HyunJae Lee, Heon Song, Hyeonsoo Lee, Gi-hyeon Lee, Suyeong Park and Donggeun Yoo	(参考訳) ベイズ最適化(BO)は、複数のトレーニング試験からの観察に基づいて、約束されるハイパーパラメータ構成を反復的に提案することにより、モデル性能の向上に大きく貢献している。しかし、前回の試験から得られた部分的な知識(すなわち、トレーニングされたモデルの性能とそのハイパーパラメータ構成)のみを転送する。一方、自己蒸留(SD)はタスクモデル自体から学んだ部分的知識のみを伝達する。すべてのトレーニングトライアルから得られた知識をフル活用するために,BOとSDを組み合わせたBOSSフレームワークを提案する。 BOSS は BO を通じて有望なハイパーパラメータ構成を提案し、従来の BO プロセスでは放棄されていた SD の以前の試行から事前訓練されたモデルを慎重に選択する。 BOSSは、一般的な画像分類、ノイズラベルによる学習、半教師付き学習、医療画像解析タスクなど、幅広いタスクにおいてBOとSDの両方よりもはるかに優れたパフォーマンスを実現している。 Bayesian optimization (BO) has contributed greatly to improving model performance by suggesting promising hyperparameter configurations iteratively based on observations from multiple training trials. However, only partial knowledge (i.e., the measured performances of trained models and their hyperparameter configurations) from previous trials is transferred. On the other hand, Self-Distillation (SD) only transfers partial knowledge learned by the task model itself. To fully leverage the various knowledge gained from all training trials, we propose the BOSS framework, which combines BO and SD. BOSS suggests promising hyperparameter configurations through BO and carefully selects pre-trained models from previous trials for SD, which are otherwise abandoned in the conventional BO process. BOSS achieves significantly better performance than both BO and SD in a wide range of tasks including general image classification, learning with noisy labels, semi-supervised learning, and medical image analysis tasks.	翻訳日:2023-04-26 21:20:58 公開日:2023-04-25
# 難易度の事前評価を統合した動的ビデオフレーム補間 Dynamic Video Frame Interpolation with integrated Difficulty Pre-Assessment ( http://arxiv.org/abs/2304.12664v1 ) ライセンス: Link先を確認	Ban Chen, Xin Jin, Youxin Chen, Longhai Wu, Jie Chen, Jayoon Koo, Cheul-hee Hahm	(参考訳) ビデオフレーム補間(VFI)は近年、大きな進歩を遂げている。しかし、既存のVFIモデルは、精度と効率の良好なトレードオフを達成するのに依然として苦労している。高速なモデルは精度が劣ることが多く、高精度なモデルは一般に低速である。一方、動きが小さい、あるいはテクスチャが明瞭な簡単なサンプルは、単純なモデルでも競争力のある結果を得ることができ、重い計算を必要としない。本稿では,難易度評価とビデオフレーム補間を組み合わせた統合パイプラインを提案する。具体的には、まずプレアセスメントモデルを利用して入力フレームの補間難度を測定し、次に動的に適切なVFIモデルを選択して補間結果を生成する。さらに、大規模なVFI難易度評価データセットを収集し、アノテートして、事前評価モデルをトレーニングする。大規模な実験により, 簡単なサンプルは高速なモデルで処理され, 難しいサンプルは重いモデルで推論されること, また提案パイプラインがVFIの精度と効率のトレードオフを改善できることが示された。 Video frame interpolation (VFI) has witnessed great progress in recent years. However, existing VFI models still struggle to achieve a good trade-off between accuracy and efficiency: fast models often have inferior accuracy, while accurate models typically run slowly. Meanwhile, easy samples with small motion or clear texture can achieve competitive results with simple models and do not require heavy computation. In this paper, we present an integrated pipeline which combines difficulty assessment with video frame interpolation. Specifically, it first leverages a pre-assessment model to measure the interpolation difficulty level of input frames, and then dynamically selects an appropriate VFI model to generate interpolation results. Furthermore, a large-scale VFI difficulty assessment dataset is collected and annotated to train our pre-assessment model. Extensive experiments show that easy samples pass through fast models while difficult samples are inferred with heavy models, and that our proposed pipeline can improve the accuracy-efficiency trade-off for VFI.	翻訳日:2023-04-26 21:20:43 publication placeholder
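The pipeline described above first scores how hard a frame pair is to interpolate and then dispatches it to a fast or a heavy model. The sketch below illustrates only that routing step. The paper trains a dedicated pre-assessment model; here a mean absolute pixel difference stands in for it, and the names `frame_difficulty`, `pick_model`, and the threshold value are all hypothetical.

```python
def frame_difficulty(frame_a, frame_b):
    """Placeholder difficulty score: mean absolute pixel difference.

    Stand-in for the learned pre-assessment model, assumed for illustration only.
    Frames are flat lists of pixel intensities of equal length.
    """
    if len(frame_a) != len(frame_b) or not frame_a:
        raise ValueError("frames must be non-empty and the same size")
    return sum(abs(a - b) for a, b in zip(frame_a, frame_b)) / len(frame_a)


def pick_model(difficulty, threshold=10.0):
    """Route easy pairs to the fast model and hard pairs to the heavy one."""
    return "fast" if difficulty < threshold else "heavy"


# Small motion: nearly identical frames.
easy_route = pick_model(frame_difficulty([10, 10, 10, 10], [11, 10, 9, 10]))
# Large motion: very different frames.
hard_route = pick_model(frame_difficulty([0, 0, 0, 0], [50, 60, 40, 70]))
```

A single threshold gives two tiers; more tiers would need a sorted list of thresholds, one per model.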
# 資源割当のためのロバスト深層強化学習のためのマルチタスクアプローチ A Multi-Task Approach to Robust Deep Reinforcement Learning for Resource Allocation ( http://arxiv.org/abs/2304.12660v1 ) ライセンス: Link先を確認	Steffen Gracla, Carsten Bockelmann, Armin Dekorsy	(参考訳) 現代のコミュニケーションシステムの複雑さが増すにつれ、機械学習アルゴリズムは研究の焦点となっている。しかし、パフォーマンス要求は複雑さと並行して厳しくなっています。医療分野など、将来のワイヤレスをターゲットとするいくつかの重要なアプリケーションでは、厳格で信頼性の高いパフォーマンス保証が不可欠だが、バニラ機械学習手法はこの種の要件に対処することが示されている。そのため、このようなアプリケーションによる要求に対処するため、これらの手法を拡張できるかどうかが疑問視される。本稿では,稀で重要なイベントを適切に処理しなければならない組み合わせ資源配分問題について考察する。本稿では,これをマルチタスク学習問題として扱い,この領域から弾性重み強化と勾配エピソディックメモリという2つの方法を選択し,それらをバニラアクタ批判スケジューラに統合する。我々は、ブラックスワンイベントを扱う際の彼らのパフォーマンスと、トレーニングデータ分布を増強する最新技術を比較し、マルチタスクアプローチが極めて有効であることを報告した。 With increasing complexity of modern communication systems, machine learning algorithms have become a focal point of research. However, performance demands have tightened in parallel to complexity. For some of the key applications targeted by future wireless, such as the medical field, strict and reliable performance guarantees are essential, but vanilla machine learning methods have been shown to struggle with these types of requirements. Therefore, the question is raised whether these methods can be extended to better deal with the demands imposed by such applications. In this paper, we look at a combinatorial resource allocation challenge with rare, significant events which must be handled properly. We propose to treat this as a multi-task learning problem, select two methods from this domain, Elastic Weight Consolidation and Gradient Episodic Memory, and integrate them into a vanilla actor-critic scheduler. We compare their performance in dealing with Black Swan Events with the state-of-the-art of augmenting the training data distribution and report that the multi-task approach proves highly effective.	翻訳日:2023-04-26 21:20:25 公開日:2023-04-25
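One of the two multi-task methods named above, Elastic Weight Consolidation, adds a quadratic penalty that keeps parameters close to values learned on an earlier task, weighted by an importance estimate (the diagonal Fisher information). The sketch below computes that penalty for flat parameter lists. It illustrates the standard EWC regulariser only; how the authors integrate it into their actor-critic scheduler is not described in the abstract, and all names and numbers here are illustrative.

```python
def ewc_penalty(params, anchor_params, fisher, lam):
    """Quadratic EWC penalty: (lam / 2) * sum_i F_i * (theta_i - theta*_i)^2.

    params        -- current parameter values
    anchor_params -- parameter values after training on the earlier task
    fisher        -- per-parameter importance (diagonal Fisher estimate)
    lam           -- strength of the regulariser
    """
    if not (len(params) == len(anchor_params) == len(fisher)):
        raise ValueError("all inputs must have the same length")
    total = sum(f * (p - a) ** 2 for p, a, f in zip(params, anchor_params, fisher))
    return 0.5 * lam * total


# Toy values: only the first parameter has drifted from its anchor.
penalty = ewc_penalty(
    params=[1.0, 2.0],
    anchor_params=[0.0, 2.0],
    fisher=[4.0, 1.0],
    lam=0.5,
)
```

The penalty is added to the loss of the new task, so parameters with high importance are pulled back toward their earlier values more strongly.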
# CoDi:混合型語彙合成のためのコントラスト拡散モデル CoDi: Co-evolving Contrastive Diffusion Models for Mixed-type Tabular Synthesis ( http://arxiv.org/abs/2304.12654v1 ) ライセンス: Link先を確認	Chaejeong Lee, Jayoung Kim, Noseong Park	(参考訳) 近年、表データへの注目が高まり、様々なタスクに合成テーブルを適用する試みが様々なシナリオに向けて拡大されている。最近の生成モデリングの進歩により、表データ合成モデルによって生成された偽データは洗練され現実的になる。しかし、表データの離散変数(コラム)のモデル化は依然として困難である。本研究では,2つの拡散モデルを用いて連続変数と離散変数を別々に処理することを提案する。 2つの拡散モデルは、互いに読み合うことによって訓練中に共存する。さらに,拡散モデルをさらにバインドするために,負のサンプリング法を用いたコントラスト学習手法を導入する。実世界の11の表型データセットと8つのベースラインメソッドを用いた実験で,提案手法であるcodiの有効性を実証した。 With growing attention to tabular data these days, the attempt to apply a synthetic table to various tasks has been expanded toward various scenarios. Owing to the recent advances in generative modeling, fake data generated by tabular data synthesis models become sophisticated and realistic. However, there still exists a difficulty in modeling discrete variables (columns) of tabular data. In this work, we propose to process continuous and discrete variables separately (but being conditioned on each other) by two diffusion models. The two diffusion models are co-evolved during training by reading conditions from each other. In order to further bind the diffusion models, moreover, we introduce a contrastive learning method with a negative sampling method. In our experiments with 11 real-world tabular datasets and 8 baseline methods, we prove the efficacy of the proposed method, called CoDi.	翻訳日:2023-04-26 21:20:07 公開日:2023-04-25
# グラフ注意に基づく部分観測可能平均場マルチエージェント強化学習 Partially Observable Mean Field Multi-Agent Reinforcement Learning Based on Graph-Attention ( http://arxiv.org/abs/2304.12653v1 ) ライセンス: Link先を確認	Min Yang, Guanjun Liu, Ziyuan Zhou	(参考訳) 従来のマルチエージェント強化学習アルゴリズムは大規模マルチエージェント環境への適用が難しい。近年,平均場理論の導入により,マルチエージェント強化学習のスケーラビリティが向上している。本稿では、各エージェントが一定の範囲内の他のエージェントのみを観察できる部分観測可能マルチエージェント強化学習(MARL)について考察する。この部分的観測性は、エージェントが周囲のエージェントの行動の質を評価する能力に影響する。本稿では,より効果的な行動を選択するために,局所観測からより効果的な情報を取り出す手法の開発に着目する。この分野での以前の研究では、近傍エージェントの平均アクションを更新するために確率分布や重み付け平均場を用いるが、近隣エージェントの特徴情報を十分に考慮しておらず、局所最適に陥る。本稿では,この欠点を改善するために,グラフ注意に基づく部分観測可能平均場マルチエージェント強化学習(GAMFQ)という新しいアルゴリズムを提案する。 GAMFQは、グラフ注意モジュールと平均場モジュールを用いて、各タイムステップでエージェントが他のエージェントの行動からどのように影響を受けるかを記述する。グラフ注意モジュールはグラフ注意エンコーダと微分可能な注意機構で構成され、この機構は中心エージェントに対する近傍エージェントの有効性を表す動的グラフを出力する。平均場モジュールは、近傍エージェントが中心エージェントに与える影響を、有効な近傍エージェントの平均的な影響として近似する。我々は,MAgentsフレームワークにおける3つの課題に対してGAMFQを評価する。実験により、GAMFQは最先端の部分観測可能平均場強化学習アルゴリズムを含むベースラインを上回っていることが示された。 Traditional multi-agent reinforcement learning algorithms are difficult to apply in large-scale multi-agent environments. The introduction of mean field theory has enhanced the scalability of multi-agent reinforcement learning in recent years. This paper considers partially observable multi-agent reinforcement learning (MARL), where each agent can only observe other agents within a fixed range. This partial observability affects the agent's ability to assess the quality of the actions of surrounding agents. 
This paper focuses on developing a method to capture more effective information from local observations in order to select more effective actions. Previous work in this field employs probability distributions or weighted mean field to update the average actions of neighborhood agents, but it does not fully consider the feature information of surrounding neighbors and leads to a local optimum. In this paper, we propose a novel multi-agent reinforcement learning algorithm, Partially Observable Mean Field Multi-Agent Reinforcement Learning based on Graph-Attention (GAMFQ) to remedy this flaw. GAMFQ uses a graph attention module and a mean field module to describe how an agent is influenced by the actions of other agents at each time step. This graph attention module consists of a graph attention encoder and a differentiable attention mechanism, and this mechanism outputs a dynamic graph to represent the effectiveness of neighborhood agents against central agents. The mean-field module approximates the effect of a neighborhood agent on a central agent as the average effect of effective neighborhood agents. We evaluate GAMFQ on three challenging tasks in the MAgents framework. Experiments show that GAMFQ outperforms baselines including the state-of-the-art partially observable mean-field reinforcement learning algorithms.	翻訳日:2023-04-26 21:19:54 公開日:2023-04-25
# 動きブレアを有する大規模シーンのためのハイブリッドニューラルレンダリング Hybrid Neural Rendering for Large-Scale Scenes with Motion Blur ( http://arxiv.org/abs/2304.12652v1 ) ライセンス: Link先を確認	Peng Dai, Yinda Zhang, Xin Yu, Xiaoyang Lyu, Xiaojuan Qi	(参考訳) 新規なビューイメージのレンダリングは多くのアプリケーションにとって非常に望ましい。近年の進歩にもかかわらず、不可避なアーティファクト(例えば、動きのぼかし)で、野生のイメージから大規模シーンの高忠実さとビュー一貫性を保った斬新なビューをレンダリングすることは、依然として困難である。そこで我々は,画像ベース表現とニューラル3D表現を結合して高品質なビュー一貫性画像を生成するハイブリッドなニューラルレンダリングモデルを開発した。さらに、野生で撮影された画像には、レンダリングされた画像の品質を劣化させる動きのぼやけなど、必然的に人工物が含まれている。そこで本研究では,画像のぼかし効果をシミュレートし,ぼやけた画像の悪影響を軽減し,事前計算した品質認識重みに基づいて学習中の重要度を低減させる手法を提案する。実データおよび合成データに関する広範な実験により,新しい視点合成のための最先端のポイントベース手法を超越したモデルが証明された。コードはhttps://daipengwa.github.io/hybrid-rendering-projectpageで入手できる。 Rendering novel view images is highly desirable for many applications. Despite recent progress, it remains challenging to render high-fidelity and view-consistent novel views of large-scale scenes from in-the-wild images with inevitable artifacts (e.g., motion blur). To this end, we develop a hybrid neural rendering model that makes image-based representation and neural 3D representation join forces to render high-quality, view-consistent images. Besides, images captured in the wild inevitably contain artifacts, such as motion blur, which deteriorates the quality of rendered images. Accordingly, we propose strategies to simulate blur effects on the rendered images to mitigate the negative influence of blurriness images and reduce their importance during training based on precomputed quality-aware weights. Extensive experiments on real and synthetic data demonstrate our model surpasses state-of-the-art point-based methods for novel view synthesis. The code is available at https://daipengwa.github.io/Hybrid-Rendering-ProjectPage.	翻訳日:2023-04-26 21:19:32 公開日:2023-04-25
# Qベース平衡 Q-based Equilibria ( http://arxiv.org/abs/2304.12647v1 ) ライセンス: Link先を確認	Olivier Compte (Paris School of Economics)	(参考訳) 動的環境において、Q-learningは、各選択肢に関連する継続値の見積もり(Q値)を提供する適応規則である。ナイーブポリシーは、常に最も高いQ値を持つ選択肢を選択することである。例えば、協力を優先する寛大さのバイアスを組み込んだルールなど、他の選択肢よりも体系的にいくつかの選択肢を好むようなQに基づく政策ルールのファミリーを考える。 Compte と Postlewaite [2018] の精神では、この Q ベースの規則の族の中で平衡バイアス(あるいは Qb-平衡)を求める。各種モニタリング技術による古典ゲームについて検討する。 In dynamic environments, Q-learning is an adaptive rule that provides an estimate (a Q-value) of the continuation value associated with each alternative. A naive policy consists in always choosing the alternative with the highest Q-value. We consider a family of Q-based policy rules that may systematically favor some alternatives over others, for example rules that incorporate a leniency bias that favors cooperation. In the spirit of Compte and Postlewaite [2018], we look for equilibrium biases (or Qb-equilibria) within this family of Q-based rules. We examine classic games under various monitoring technologies.	翻訳日:2023-04-26 21:19:11 公開日:2023-04-25
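The abstract above contrasts a naive policy, which always picks the alternative with the highest Q-value, with Q-based rules that systematically favor some alternatives. A minimal sketch of that contrast follows. Modelling the bias as an additive bonus on the Q-value, the stateless update rule, and the names `update_q` and `choose` are assumptions for illustration; the abstract does not specify the functional form the author uses.

```python
def update_q(q_values, action, payoff, learning_rate=0.1):
    """Move the Q-value of `action` a small step toward the observed payoff.

    Simplified stateless update, assumed here for illustration.
    """
    q_values[action] += learning_rate * (payoff - q_values[action])


def choose(q_values, bias=None):
    """Pick the alternative with the highest biased Q-value.

    With no bias this is the naive policy; a positive bias on an
    alternative makes the rule lean toward it (additive form is an assumption).
    """
    bias = bias or {}
    return max(q_values, key=lambda a: q_values[a] + bias.get(a, 0.0))


q = {"cooperate": 1.0, "defect": 1.2}
naive_choice = choose(q)                              # highest raw Q-value
lenient_choice = choose(q, bias={"cooperate": 0.5})   # leniency favors cooperation
update_q(q, "defect", payoff=0.2)                     # a poor payoff lowers the estimate
```

The equilibrium question in the abstract concerns which bias each player would want to adopt given the other's; the sketch only shows how a given bias changes the choice.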
# 変化検出には変化情報が必要:深層学習による3D点群変化検出の改善 Change detection needs change information: improving deep 3D point cloud change detection ( http://arxiv.org/abs/2304.12639v1 ) ライセンス: Link先を確認	Iris de Gélis (1 and 2), Thomas Corpetti (3) and Sébastien Lefèvre (2) ((1) Magellium, (2) Institut de Recherche en Informatique et Systèmes Aléatoires IRISA - UMR 6074 - Université Bretagne Sud, (3) Littoral - Environnement - Télédétection - Géomatique LETG - UMR 6554 - Université Rennes 2)	(参考訳) 変化検出は、特にマルチテンポラルデータに関して、変化領域を迅速に識別する重要なタスクである。都市環境などの複雑な形状を持つ景観では、垂直情報は変化をハイライトするだけでなく、異なるカテゴリーに分類するためにも非常に有用な知識である。本稿では,ラスタ化プロセスによる情報の損失を回避するために,生の3D点群(PC)を直接用いた変化セグメンテーションに着目する。近年,深層学習はSiameseネットワークを通じて情報を符号化することで,このタスクにおける有効性を示しているが,本研究では,深層ネットワークの初期段階で変化情報も利用するという考えを検討する。そこで我々はまず,最先端(SoTA)のSiamese KPConvネットワークに手作り特徴量,特に変化関連の特徴量を与えることを提案する。これにより、変化クラスに対するIoU(Intersection over Union)の平均は4.70%向上する。主な改善が変化関連の特徴量によって得られたことを考慮し、3D点群の変化セグメンテーションに対応する3つの新しいアーキテクチャ、OneConvFusion、Triplet KPConv、Encoder Fusion SiamKPConvを提案する。 3つのネットワークはいずれも初期段階で変化情報を考慮しており、SoTA法より優れている。特に、最後のネットワークであるEncoder Fusion SiamKPConvは、変化クラスに対するIoUの平均でSoTAを5%以上上回っており、変化検出タスクにおいてネットワークを変化情報に集中させることの価値を強調している。 Change detection is an important task to rapidly identify modified areas, in particular when multi-temporal data are concerned. In landscapes with complex geometry such as urban environments, vertical information turns out to be very useful knowledge not only to highlight changes but also to classify them into different categories. In this paper, we focus on change segmentation directly using raw 3D point clouds (PCs), to avoid any loss of information due to rasterization processes. While deep learning has recently proved its effectiveness for this particular task by encoding the information through Siamese networks, we investigate here the idea of also using change information in early steps of deep networks. 
To do this, we first propose to provide the Siamese KPConv State-of-The-Art (SoTA) network with hand-crafted features and especially a change-related one. This improves the mean of Intersection over Union (IoU) over classes of change by 4.70%. Considering that the major improvement was obtained thanks to the change-related feature, we propose three new architectures to address 3D PCs change segmentation: OneConvFusion, Triplet KPConv, and Encoder Fusion SiamKPConv. All three networks take into account change information in early steps and outperform SoTA methods. In particular, the last network, entitled Encoder Fusion SiamKPConv, surpasses SoTA by more than 5% in mean IoU over classes of change, emphasizing the value of having the network focus on change information for the change detection task.	翻訳日:2023-04-26 21:19:02 公開日:2023-04-25
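The results above are reported as the mean of Intersection over Union (IoU) over classes of change. For readers unfamiliar with the metric, the sketch below computes it from per-point labels: for each class, IoU is the number of points where truth and prediction both carry that class, divided by the number where either does. The function name, the label encoding (0 for "unchanged"), and the toy labels are illustrative assumptions, not the paper's evaluation code.

```python
def mean_iou(y_true, y_pred, classes):
    """Mean IoU over `classes`, skipping classes absent from both label lists."""
    if len(y_true) != len(y_pred):
        raise ValueError("label lists must have the same length")
    ious = []
    for c in classes:
        inter = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        union = sum(1 for t, p in zip(y_true, y_pred) if t == c or p == c)
        if union > 0:
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Hypothetical per-point labels: 0 = unchanged, 1 and 2 = two kinds of change.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
# Averaging only over the change classes, as in "mean IoU over classes of change".
score = mean_iou(y_true, y_pred, classes=[1, 2])
```

Here class 1 has IoU 2/3 and class 2 has IoU 1/2, so the mean is 7/12. Leaving the "unchanged" class out of the average keeps a dominant background class from inflating the score.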
# Phylo2Vec:バイナリツリーのベクトル表現 Phylo2Vec: a vector representation for binary trees ( http://arxiv.org/abs/2304.12693v1 ) ライセンス: Link先を確認	Matthew J Penn, Neil Scheidwasser, Mark P Khurana, David A Duchêne, Christl A Donnelly, Samir Bhatt	(参考訳) 生物学的データから推定される二分系統樹は、生物の共有進化の歴史を理解する上で中心的な役割を果たす。任意の最適性基準(例えば最尤)による木内の潜在ノード配置の推定はNP困難な問題であり、無数のヒューリスティックなアプローチの発展を促している。しかし、これらのヒューリスティックは、ランダムな木を一様にサンプリングしたり、階乗的に成長する木空間を効果的に探索したりする体系的な手段を欠いていることが多く、これらは機械学習のような最適化問題において重要である。そこで本研究では,系統樹の新規で簡潔な表現であるPhylo2Vecについて述べる。 Phylo2Vecは、$n$個の葉を持つ任意の二分木を長さ$n$の整数ベクトルにマッピングする。我々はPhylo2Vecが系統樹の空間に対してwell-definedかつ全単射であることを証明する。 Phylo2Vecの利点は2つある。i) 二分木を簡単に一様サンプリングできること、ii) 非常に大きな、あるいは小さなジャンプで木空間を体系的に横断できることである。概念実証として,Phylo2Vecを用いて5つの実世界のデータセットで最尤推定を行い,単純な山登り法に基づく最適化が、ランダムな木から最適な木へと広大な木空間を効率的に横断することを示す。 Binary phylogenetic trees inferred from biological data are central to understanding the shared evolutionary history of organisms. Inferring the placement of latent nodes in a tree by any optimality criterion (e.g., maximum likelihood) is an NP-hard problem, propelling the development of myriad heuristic approaches. Yet, these heuristics often lack a systematic means of uniformly sampling random trees or effectively exploring a tree space that grows factorially, which are crucial to optimisation problems such as machine learning. Accordingly, we present Phylo2Vec, a new parsimonious representation of a phylogenetic tree. Phylo2Vec maps any binary tree with $n$ leaves to an integer vector of length $n$. We prove that Phylo2Vec is both well-defined and bijective to the space of phylogenetic trees. The advantages of Phylo2Vec are twofold: i) easy uniform sampling of binary trees and ii) systematic ability to traverse tree space in very large or small jumps. As a proof of concept, we use Phylo2Vec for maximum likelihood inference on five real-world datasets and show that a simple hill climbing-based optimisation efficiently traverses the vastness of tree space from a random to an optimal tree.	翻訳日:2023-04-26 21:13:30 公開日:2023-04-25
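To give a feel for the "tree space that grows factorially" mentioned above, the sketch below counts rooted binary trees with labelled leaves using the standard double-factorial formula (2n-3)!! for n leaves. This is a well-known combinatorial count, not the Phylo2Vec encoding itself, and treating the trees as rooted with labelled leaves is an assumption made for the illustration.

```python
def num_rooted_binary_trees(n_leaves):
    """Number of rooted binary trees with `n_leaves` labelled leaves: (2n - 3)!!.

    Computed as the product 3 * 5 * ... * (2n - 3); equals 1 for one or two leaves.
    """
    if n_leaves < 1:
        raise ValueError("a tree needs at least one leaf")
    count = 1
    for k in range(3, 2 * n_leaves - 2, 2):
        count *= k
    return count


# The count explodes quickly as leaves are added.
counts = {n: num_rooted_binary_trees(n) for n in (2, 3, 4, 10)}
```

With only ten leaves there are already more than 34 million candidate trees, which is why exhaustive search is impractical and a representation that supports uniform sampling and controlled jumps is attractive.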
# 量子スキームによる古典的相関の生成 The Generations of Classical Correlations via Quantum Schemes ( http://arxiv.org/abs/2304.12690v1 ) ライセンス: Link先を確認	Zhenyu Chen and Lijinzhi Lin and Xiaodie Lin and Zhaohui Wei and Penghui Yao	(参考訳) アリスとボブの2つの分離したパーティが2部的な量子状態またはシードと呼ばれる古典的な相関を共有し、シード上で局所的な量子または古典演算を行うことで、ターゲットとなる古典的相関を生成しようとすると仮定する。 Alice と Bob が対象とする古典的相関を生成するために与えられた種を使うことができるかどうか。この問題にはリッチな数学的構造があることが示される。まず、種が純粋な二成分状態であっても、上記の決定問題はnp困難であり、種が古典的相関である場合にも同様の結論を導くことができ、この問題は一般に解くのが困難であることを示す。さらに, シードが純粋な量子状態である場合, 対象の古典的相関がシード純状態と一致する正の半定値分解の正の形式を持つかどうかを突き止め, 現在の問題と最適化理論の興味深い関係を明らかにした。この観測および他の知見に基づいて、ターゲットの古典的相関を生成するために、シード純状態が満たさなければならないいくつかの必要条件を与え、これらの条件は、シードが混合量子状態である場合にも一般化できることを示した。最後に、正の半定値分解の正の形式が問題を解く上で重要な役割を担っているため、任意の古典的相関を計算できるアルゴリズムを開発し、テストするケースで十分な性能を発揮する。 Suppose two separated parties, Alice and Bob, share a bipartite quantum state or a classical correlation called a seed, and they try to generate a target classical correlation by performing local quantum or classical operations on the seed, i.e., any communications are not allowed. We consider the following fundamental problem about this setting: whether Alice and Bob can use a given seed to generate a target classical correlation. We show that this problem has rich mathematical structures. Firstly, we prove that even if the seed is a pure bipartite state, the above decision problem is already NP-hard and a similar conclusion can also be drawn when the seed is also a classical correlation, implying that this problem is hard to solve generally. Furthermore, we prove that when the seed is a pure quantum state, solving the problem is equivalent to finding out whether the target classical correlation has some canonical form of positive semi-definite factorizations that matches the seed pure state, revealing an interesting connection between the current problem and optimization theory. 
Based on this observation and other insights, we give several necessary conditions where the seed pure state has to satisfy to generate the target classical correlation, and it turns out that these conditions can also be generalized to the case that the seed is a mixed quantum state. Lastly, since canonical forms of positive semi-definite factorizations play a crucial role in solving the problem, we develop an algorithm that can compute them for an arbitrary classical correlation, which has decent performance on the cases we test.	翻訳日:2023-04-26 21:13:12 公開日:2023-04-25
# 意味, 言語モデル, 理解不能なホラーの計算について On the Computation of Meaning, Language Models and Incomprehensible Horrors ( http://arxiv.org/abs/2304.12686v1 ) ライセンス: Link先を確認	Michael Timothy Bennett	(参考訳) 我々は、意味、コミュニケーション、シンボル出現に関する包括的機械論的説明を提供するために、意味の基礎理論と人工知能(agi)の数学的形式理論を統合する。この合成は、プラグマティクス、論理的真理条件意味論、パーセアン・セミオティックスを統一し、伝統的に機械的説明を避けてきた現象に対処する計算可能モデルとして、AGIと言語の性質に関するより広範な議論の両方に重要である。機械が有意義な発話や人間の意味を理解することができる条件を調べることにより、現在の言語モデルでは、人間と同じ意味の理解を持たず、その応答に特徴付けるような意味も持たないことを示す。そこで本研究では,人間の感情をシミュレーションし,弱表現を構築するための最適化モデルを提案する。我々の発見は、意味と知性の関係と、意味を理解して意図する機械を構築する方法に光を当てた。 We integrate foundational theories of meaning with a mathematical formalism of artificial general intelligence (AGI) to offer a comprehensive mechanistic explanation of meaning, communication, and symbol emergence. This synthesis holds significance for both AGI and broader debates concerning the nature of language, as it unifies pragmatics, logical truth conditional semantics, Peircean semiotics, and a computable model of enactive cognition, addressing phenomena that have traditionally evaded mechanistic explanation. By examining the conditions under which a machine can generate meaningful utterances or comprehend human meaning, we establish that the current generation of language models do not possess the same understanding of meaning as humans nor intend any meaning that we might attribute to their responses. To address this, we propose simulating human feelings and optimising models to construct weak representations. Our findings shed light on the relationship between meaning and intelligence, and how we can build machines that comprehend and intend meaning.	翻訳日:2023-04-26 21:12:43 公開日:2023-04-25
# 自己教師付きシングルフレーム深度推定とマルチフレーム深度推定の相互影響の探究 Exploring the Mutual Influence between Self-Supervised Single-Frame and Multi-Frame Depth Estimation ( http://arxiv.org/abs/2304.12685v1 ) ライセンス: Link先を確認	Jie Xiang, Yun Wang, Lifeng An, Haiyang Liu and Jian Liu	(参考訳) 自己教師付きのシングルフレーム深度推定とマルチフレーム深度推定は、いずれも学習にラベルなしの単眼ビデオしか必要としないが、シングルフレーム手法は主に外観に基づく特徴に依存し、マルチフレーム手法は幾何学的手がかりに着目するため、利用する情報は異なる。両者の情報が相補的であることから、シングルフレーム深度を活用してマルチフレーム深度を改善しようとする研究もある。しかし、これらの手法は、シングルフレーム深度とマルチフレーム深度の差異を利用してマルチフレーム深度を改善することも、マルチフレーム深度を活用してシングルフレーム深度モデルを最適化することもできない。シングルフレーム手法とマルチフレーム手法の相互影響を十分に活用するために、新しい自己教師付き学習フレームワークを提案する。具体的には、まず、シングルフレーム深度に誘導される画素単位の適応的深度サンプリングモジュールを導入してマルチフレームモデルを学習する。次に、最小再投影に基づく蒸留損失を用いてマルチフレーム深度ネットワークからシングルフレームネットワークへ知識を転移し、シングルフレーム深度を改善する。最後に、改善されたシングルフレーム深度を事前情報として用い、マルチフレーム深度推定の性能をさらに向上させる。KITTIとCityscapesのデータセットでの実験結果から、本手法は自己教師付き単眼設定において既存手法を上回ることが示された。 Although both self-supervised single-frame and multi-frame depth estimation methods only require unlabeled monocular videos for training, the information they leverage varies because single-frame methods mainly rely on appearance-based features while multi-frame methods focus on geometric cues. Considering the complementary information of single-frame and multi-frame methods, some works attempt to leverage single-frame depth to improve multi-frame depth. However, these methods can neither exploit the difference between single-frame depth and multi-frame depth to improve multi-frame depth nor leverage multi-frame depth to optimize single-frame depth models. To fully utilize the mutual influence between single-frame and multi-frame methods, we propose a novel self-supervised training framework. Specifically, we first introduce a pixel-wise adaptive depth sampling module guided by single-frame depth to train the multi-frame model. Then, we leverage the minimum reprojection based distillation loss to transfer the knowledge from the multi-frame depth network to the single-frame network to improve single-frame depth. Finally, we regard the improved single-frame depth as a prior to further boost the performance of multi-frame depth estimation. Experimental results on the KITTI and Cityscapes datasets show that our method outperforms existing approaches in the self-supervised monocular setting.	翻訳日:2023-04-26 21:12:25 公開日:2023-04-25
# Docmarking: リアルタイムでスクリーン撮影に頑健な文書画像透かし Docmarking: Real-Time Screen-Cam Robust Document Image Watermarking ( http://arxiv.org/abs/2304.12682v1 ) ライセンス: Link先を確認	Aleksey Yakushev, Yury Markin, Dmitry Obydenkov, Alexander Frolov, Stas Fomin, Manuk Akopyan, Alexander Kozachok, Arthur Gaynov	(参考訳) 本稿では,画面を撮影した写真という形での機密文書漏洩の調査に焦点をあてる。提案手法は,漏洩そのものを防ぐのではなく,漏洩元の特定を目的とする。本手法は,人間の目にはほとんど知覚できない半透明画像として,固有の識別用透かしを画面上に表示する。透かし画像は静的で常に画面上に表示されるため,画面を撮影したすべての写真に透かしが含まれる。本手法の主要な構成要素は3つのニューラルネットワークである。第1のネットワークは,画面に表示してもほとんど見えないように,メッセージを埋め込んだ画像を生成する。残りの2つのネットワークは,埋め込まれたメッセージを高精度に取り出すために用いられる。開発した手法は,さまざまな画面とカメラで包括的にテストされた。テストの結果,提案手法は高い有効性を示した。 This paper focuses on investigation of confidential documents leaks in the form of screen photographs. Proposed approach does not try to prevent leak in the first place but rather aims to determine source of the leak. Method works by applying on the screen a unique identifying watermark as semi-transparent image that is almost imperceptible for human eyes. Watermark image is static and stays on the screen all the time thus watermark present on every captured photograph of the screen. The key components of the approach are three neural networks. The first network generates an image with embedded message in a way that this image is almost invisible when displayed on the screen. The other two neural networks are used to retrieve embedded message with high accuracy. Developed method was comprehensively tested on different screen and cameras. Test results showed high efficiency of the proposed approach.	翻訳日:2023-04-26 21:11:59 公開日:2023-04-25
# 分散ロバスト最適化による微分プライバシー Differential Privacy via Distributionally Robust Optimization ( http://arxiv.org/abs/2304.12681v1 ) ライセンス: Link先を確認	Aras Selvi and Huikang Liu and Wolfram Wiesemann	(参考訳) 近年、データセットの統計情報を共有するためのデファクトスタンダードとしてディファレンシャルプライバシが登場し、関連する個人に関する個人情報の開示が制限されている。これは、公開する統計をランダムに摂動させることによって達成され、結果として、プライバシの正確さのトレードオフにつながる: より大きな摂動は、より強力なプライバシー保証を提供するが、それらは、受信者に対して低いユーティリティを提供する、正確さの少ない統計をもたらす。したがって、特に興味を持つのは、選択されたプライバシーレベルに対して最高の精度を提供する最適なメカニズムである。これまで、この分野の研究は、事前の摂動の族を特定し、その漸近的および/または最良クラスの最適性を証明することに重点を置いてきた。本稿では,非漸近的かつ非条件的最適性を保証するメカニズムのクラスを開発する。この目的のために、無限次元分布ロバスト最適化問題として機構設計問題を定式化する。この問題には強い双対性が与えられ、この双対性を利用して有限次元上界および下界問題の収束階層を構築する。上界 (primal) は実装可能な摂動に対応しており、その準最適性は下界 (dual) で有界である。両方の境界問題は、固有の問題構造を利用する切断平面技術によって数秒以内に解決できる。数値実験により,我々の摂動は,標準ベンチマーク問題と同様に人工物に関する文献のこれまでの最良の結果を上回ることができることを示した。 In recent years, differential privacy has emerged as the de facto standard for sharing statistics of datasets while limiting the disclosure of private information about the involved individuals. This is achieved by randomly perturbing the statistics to be published, which in turn leads to a privacy-accuracy trade-off: larger perturbations provide stronger privacy guarantees, but they result in less accurate statistics that offer lower utility to the recipients. Of particular interest are therefore optimal mechanisms that provide the highest accuracy for a pre-selected level of privacy. To date, work in this area has focused on specifying families of perturbations a priori and subsequently proving their asymptotic and/or best-in-class optimality. In this paper, we develop a class of mechanisms that enjoy non-asymptotic and unconditional optimality guarantees. To this end, we formulate the mechanism design problem as an infinite-dimensional distributionally robust optimization problem. We show that the problem affords a strong dual, and we exploit this duality to develop converging hierarchies of finite-dimensional upper and lower bounding problems. 
Our upper (primal) bounds correspond to implementable perturbations whose suboptimality can be bounded by our lower (dual) bounds. Both bounding problems can be solved within seconds via cutting plane techniques that exploit the inherent problem structure. Our numerical experiments demonstrate that our perturbations can outperform the previously best results from the literature on artificial as well as standard benchmark problems.	翻訳日:2023-04-26 21:11:50 公開日:2023-04-25
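The privacy-accuracy trade-off described in the abstract above can be illustrated with the classical Laplace mechanism, one of the a priori perturbation families that such work is usually compared against. The sketch below is not the optimization-based mechanism of the paper; it is a minimal baseline, assuming a scalar query with known L1 sensitivity. The function names (`laplace_scale`, `laplace_mechanism`, `mean_abs_error`) are hypothetical and chosen only for this illustration.

```python
import random


def laplace_scale(sensitivity: float, epsilon: float) -> float:
    """Noise scale b = sensitivity / epsilon used by the Laplace mechanism."""
    if sensitivity <= 0 or epsilon <= 0:
        raise ValueError("sensitivity and epsilon must be positive")
    return sensitivity / epsilon


def laplace_mechanism(true_value: float, sensitivity: float,
                      epsilon: float, rng: random.Random) -> float:
    """Return true_value perturbed by Laplace(0, b) noise."""
    b = laplace_scale(sensitivity, epsilon)
    # The difference of two i.i.d. exponentials with mean b is Laplace(0, b).
    noise = rng.expovariate(1.0 / b) - rng.expovariate(1.0 / b)
    return true_value + noise


def mean_abs_error(sensitivity: float, epsilon: float,
                   trials: int = 20000, seed: int = 0) -> float:
    """Empirical mean absolute error of the mechanism (illustrative helper)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += abs(laplace_mechanism(0.0, sensitivity, epsilon, rng))
    return total / trials


if __name__ == "__main__":
    # Stronger privacy (smaller epsilon) costs accuracy (larger error).
    for eps in (0.5, 1.0, 2.0):
        print(eps, round(mean_abs_error(1.0, eps), 3))
```

The expected absolute error of Laplace noise equals its scale b, so halving epsilon doubles the typical error; this is the trade-off that optimal mechanisms try to improve on for a fixed privacy level.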
# 加法的ガウス雑音下における通信制約付きバンディット Communication-Constrained Bandits under Additive Gaussian Noise ( http://arxiv.org/abs/2304.12680v1 ) ライセンス: Link先を確認	Prathamesh Mayekar, Jonathan Scarlett, and Vincent Y.F. Tan	(参考訳) 本研究では,クライアントが対応するアームの選択に対する報酬に基づいて,通信制約のあるフィードバックを学習者に提供する分散確率的多腕バンディットを検討する。本設定では,クライアントは符号化後の報酬の2次モーメントが$P$以下になるように報酬を符号化しなければならず,この符号化された報酬はさらに分散$\sigma^2$の加法的ガウス雑音によって劣化する。学習者はこの劣化した報酬にしかアクセスできない。この設定に対し,任意の方式のミニマックスリグレットについて情報理論的な下界 $\Omega\left(\sqrt{\frac{KT}{\mathtt{SNR} \wedge1}} \right)$ を導出する。ここで $ \mathtt{SNR} := \frac{P}{\sigma^2}$ であり,$K$ と $T$ はそれぞれアーム数と時間ホライズンである。さらに,この下界に小さな加法的因子を除いて一致する多段階バンディットアルゴリズム $\mathtt{UE\text{-}UCB++}$ を提案する。 $\mathtt{UE\text{-}UCB++}$ は初期のフェーズで一様探索を行い,最終フェーズで {\em upper confidence bound }(UCB) バンディットアルゴリズムを用いる。 $\mathtt{UE\text{-}UCB++}$ の興味深い特徴は,一様探索フェーズで得られた平均報酬の粗い推定が次のフェーズの符号化プロトコルの改良に役立ち,後続フェーズでより正確な平均報酬の推定をもたらすことである。この正のフィードバックの循環は,一様探索ラウンド数を減らし,下界に近い性能を得るうえで重要である。 We study a distributed stochastic multi-armed bandit where a client supplies the learner with communication-constrained feedback based on the rewards for the corresponding arm pulls. In our setup, the client must encode the rewards such that the second moment of the encoded rewards is no more than $P$, and this encoded reward is further corrupted by additive Gaussian noise of variance $\sigma^2$; the learner only has access to this corrupted reward. For this setting, we derive an information-theoretic lower bound of $\Omega\left(\sqrt{\frac{KT}{\mathtt{SNR} \wedge1}} \right)$ on the minimax regret of any scheme, where $ \mathtt{SNR} := \frac{P}{\sigma^2}$, and $K$ and $T$ are the number of arms and time horizon, respectively. Furthermore, we propose a multi-phase bandit algorithm, $\mathtt{UE\text{-}UCB++}$, which matches this lower bound to a minor additive factor. $\mathtt{UE\text{-}UCB++}$ performs uniform exploration in its initial phases and then utilizes the {\em upper confidence bound }(UCB) bandit algorithm in its final phase. An interesting feature of $\mathtt{UE\text{-}UCB++}$ is that the coarser estimates of the mean rewards formed during a uniform exploration phase help to refine the encoding protocol in the next phase, leading to more accurate mean estimates of the rewards in the subsequent phase. This positive reinforcement cycle is critical to reducing the number of uniform exploration rounds and closely matching our lower bound.	翻訳日:2023-04-26 21:11:25 公開日:2023-04-25
# 量子エンタングルメント浄化の進歩 Advances in quantum entanglement purification ( http://arxiv.org/abs/2304.12679v1 ) ライセンス: Link先を確認	Peishun Yan, Lan Zhou, Wei Zhong, and Yubo Sheng	(参考訳) その発見以来、量子の絡み合いは量子通信と計算において有望な資源となる。しかし、量子チャネルにノイズが存在するため、絡み合いは脆弱である。絡み合い精製は、低品質の絡み合い状態から高品質の絡み合い状態を蒸留する強力なツールである。本稿では, エンタングルメント浄化理論, 線形光学を用いたエンタングルメント浄化プロトコル(EPP), クロスカー非線形性を持つEPP, ハイパーエンタングルメントEPP, 決定論的EPP, 測定に基づくEPPなど, 絡み合い浄化の概観を紹介する。また、線形光学におけるEPPの実験的進歩についても概説する。最後に,EPPの今後の発展に向けた展望について考察する。このレビューは、将来の長距離量子通信と量子ネットワークにおける実践的な実装の道を開くかもしれない。 Since its discovery, the quantum entanglement becomes a promising resource in quantum communication and computation. However, the entanglement is fragile due to the presence of noise in quantum channels. Entanglement purification is a powerful tool to distill high quality entangled states from the low quality entangled states. In this review, we present an overview of entanglement purification, including the basic entanglement purification theory, the entanglement purification protocols (EPPs) with linear optics, EPPs with cross-Kerr nonlinearities, hyperentanglement EPPs, deterministic EPPs, and measurement-based EPPs. We also review experimental progresses of EPPs in linear optics. Finally, we give the discussion on potential outlook for the future development of EPPs. This review may pave the way for practical implementations in future long-distance quantum communication and quantum network.	翻訳日:2023-04-26 21:10:48 公開日:2023-04-25
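As a concrete illustration of "distilling high quality entangled states from low quality ones", the sketch below iterates the textbook fidelity recurrence for two Werner pairs of equal fidelity, as in the original recurrence protocol of Bennett et al. (often called BBPSSW). This specific formula is not stated in the abstract above; it is supplied here as a well-known example, under the assumption that the pairs are returned to Werner form after each round. The names `purify_step` and `rounds_to_reach` are hypothetical.

```python
def purify_step(f: float) -> tuple[float, float]:
    """One purification round on two Werner pairs of fidelity f.

    Returns (new_fidelity, success_probability).
    """
    if not 0.0 <= f <= 1.0:
        raise ValueError("fidelity must lie in [0, 1]")
    g = (1.0 - f) / 3.0  # weight of each of the three unwanted Bell states
    success = f * f + 2.0 * f * g + 5.0 * g * g
    new_f = (f * f + g * g) / success
    return new_f, success


def rounds_to_reach(f0: float, target: float, max_rounds: int = 100) -> int:
    """Number of successful rounds needed to lift fidelity f0 to target.

    Assumes every round succeeds; raises if the target is not reached.
    """
    f = f0
    for n in range(max_rounds + 1):
        if f >= target:
            return n
        f, _ = purify_step(f)
    raise RuntimeError("target fidelity not reached within max_rounds")


if __name__ == "__main__":
    f = 0.75
    for r in range(5):
        f, p = purify_step(f)
        print(r + 1, round(f, 4), round(p, 4))
```

Under this recurrence the fidelity increases only when the input fidelity exceeds 1/2, and each round consumes two pairs and succeeds only probabilistically, which is one reason deterministic and measurement-based variants are of interest.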
# 最大符号化率削減による文表現の圧縮 Compressing Sentence Representation with maximum Coding Rate Reduction ( http://arxiv.org/abs/2304.12674v1 ) ライセンス: Link先を確認	Domagoj \v{S}everdija, Tomislav Prusina, Antonio Jovanovi\'c, Luka Borozan, Jurica Maltar, and Domagoj Matijevi\'c	(参考訳) ほとんどの自然言語推論問題では、意味検索タスクのために文表現が必要となる。近年、事前学習済みの大規模言語モデルは、そのような表現の計算に非常に有効である。これらのモデルは高次元の文埋め込みを生成する。実際には、大型モデルと小型モデルの間に明らかな性能差がある。したがって、メモリや計算時間といったハードウェア上の制約から、通常は大規模言語モデルの蒸留版である小型モデルを用いる場合にも同等の結果を得ることが求められる。本稿では,汎用的な多様体クラスタリングのために開発された新しい手法である最大符号化率削減(Maximum Coding Rate Reduction, MCR2)目的関数で追加学習した射影層を事前学習済みの蒸留モデルに加えることで,文表現モデルSentence-BERTのモデル蒸留を評価する。複雑さと文埋め込みサイズを削減した新しい言語モデルが,意味検索ベンチマークにおいて同等の結果を達成できることを示す。 In most natural language inference problems, sentence representation is needed for semantic retrieval tasks. In recent years, pre-trained large language models have been quite effective for computing such representations. These models produce high-dimensional sentence embeddings. An evident performance gap between large and small models exists in practice. Hence, due to space and time hardware limitations, there is a need to attain comparable results when using the smaller model, which is usually a distilled version of the large language model. In this paper, we assess the model distillation of the sentence representation model Sentence-BERT by augmenting the pre-trained distilled model with a projection layer additionally learned on the Maximum Coding Rate Reduction (MCR2) objective, a novel approach developed for general-purpose manifold clustering. We demonstrate that the new language model with reduced complexity and sentence embedding size can achieve comparable results on semantic retrieval benchmarks.	翻訳日:2023-04-26 21:10:33 公開日:2023-04-25
# 時間窓における状態生成を必要とする量子プロトコルの解析ツール Tools for the analysis of quantum protocols requiring state generation within a time window ( http://arxiv.org/abs/2304.12673v1 ) ライセンス: Link先を確認	Bethany Davies, Thomas Beauchamp, Gayane Vardoyan, Stephanie Wehner	(参考訳) 量子プロトコルは一般に、特定の数の量子リソース状態を同時に利用できる必要がある。重要な例の1つは、ある数の絡み合った対を必要とする量子ネットワークプロトコルである。ここでは、プロセスが時間ステップ毎に何らかの確率 p$ を持つ量子資源状態を生成し、時間依存ノイズの対象となる量子メモリに格納する設定を考える。アプリケーションに十分な品質を維持するため、各リソース状態は$w$の時間ステップ後にメモリから破棄される。 $s$ をプロトコルによって要求される所望のリソース状態の数とする。確率分布 $X_{(w,s)}$ の量子リソース状態の年齢を特徴付け、$s$状態がウィンドウ$w$で生成される。時間依存ノイズモデルと組み合わせることで、この分布の知識は$s$量子リソースの忠実度統計量を計算することができる。また、待ち時間 $\tau_{(w,s)}$ がウィンドウ$w$内で生成されるまで、待ち時間 $\tau_{(w,s)}$ の第1と第2の瞬間の正確なソリューションも提供します。期待される待ち時間 $\mathbb{E}(\tau_{(w,s)})$ と分布 $X_{(w,s)}$ を記述する統計量に対する一般閉形式式を得るのは難しいので、あるパラメータ体系における計算を支援する2つの新しい結果を示す。この研究で示された手法は、量子プロトコルの実行の分析と最適化に利用できる。具体的には、Blind Quantum Computing(BQC)プロトコルの例として、$w$と$p$を推論して、プロトコル実行の成功率を最適化する方法について説明する。 Quantum protocols commonly require a certain number of quantum resource states to be available simultaneously. An important class of examples is quantum network protocols that require a certain number of entangled pairs. Here, we consider a setting in which a process generates a quantum resource state with some probability $p$ in each time step, and stores it in a quantum memory that is subject to time-dependent noise. To maintain sufficient quality for an application, each resource state is discarded from the memory after $w$ time steps. Let $s$ be the number of desired resource states required by a protocol. We characterise the probability distribution $X_{(w,s)}$ of the ages of the quantum resource states, once $s$ states have been generated in a window $w$. Combined with a time-dependent noise model, the knowledge of this distribution allows for the calculation of fidelity statistics of the $s$ quantum resources. 
We also give exact solutions for the first and second moments of the waiting time $\tau_{(w,s)}$ until $s$ resources are produced within a window $w$, which provides information about the rate of the protocol. Since it is difficult to obtain general closed-form expressions for statistical quantities describing the expected waiting time $\mathbb{E}(\tau_{(w,s)})$ and the distribution $X_{(w,s)}$, we present two novel results that aid their computation in certain parameter regimes. The methods presented in this work can be used to analyse and optimise the execution of quantum protocols. Specifically, with an example of a Blind Quantum Computing (BQC) protocol, we illustrate how they may be used to infer $w$ and $p$ to optimise the rate of successful protocol execution.	翻訳日:2023-04-26 21:10:20 公開日:2023-04-25
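The setting in the abstract above (one generation attempt per time step with success probability $p$, states discarded after $w$ steps, $s$ states required at once) is easy to explore numerically. The sketch below is a Monte Carlo estimate of the waiting time, not the exact solutions of the paper. It assumes a particular convention: a state generated at step t is usable at steps t through t + w - 1. The helper names `waiting_time` and `mean_waiting_time` are hypothetical.

```python
import random
from collections import deque


def waiting_time(p: float, w: int, s: int, rng: random.Random,
                 max_steps: int = 10**6) -> int:
    """Steps until s resource states are simultaneously alive in memory."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    if s < 1 or w < 1:
        raise ValueError("s and w must be at least 1")
    if s > w:
        # At most one state is generated per step, so s > w can never succeed.
        raise ValueError("s states cannot coexist when s > w")
    alive = deque()  # generation times of states still in memory
    for t in range(1, max_steps + 1):
        # Discard states that were generated w or more steps ago.
        while alive and alive[0] <= t - w:
            alive.popleft()
        if rng.random() < p:
            alive.append(t)
            if len(alive) >= s:
                return t
    raise RuntimeError("no success within max_steps")


def mean_waiting_time(p: float, w: int, s: int,
                      trials: int = 20000, seed: int = 0) -> float:
    """Monte Carlo estimate of the expected waiting time."""
    rng = random.Random(seed)
    return sum(waiting_time(p, w, s, rng) for _ in range(trials)) / trials


if __name__ == "__main__":
    # A longer window makes it easier to collect s states in time.
    for w in (2, 4, 8):
        print(w, round(mean_waiting_time(0.25, w, 2), 2))
```

For s = 1 the waiting time is geometric with mean 1/p, which gives a simple sanity check; for larger s the estimate shows how strongly the expected waiting time depends on the window size w.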
# 勾配ブースティング法による銀河系外電波源の形態分類 Morphological Classification of Extragalactic Radio Sources Using Gradient Boosting Methods ( http://arxiv.org/abs/2304.12729v1 ) ライセンス: Link先を確認	Abdollah Masoud Darya, Ilias Fernini, Marley Vellasco, Abir Hussain	(参考訳) 電波天文学の分野は、新しく任命された電波望遠鏡によって、1日に生成されるデータ量の増加を目撃している。この分野で最も重要な問題の1つは、銀河系外電波源のモルフォロジーに基づく自動分類である。銀河系外電波源の形態分類の分野での最近の貢献は、畳み込みニューラルネットワークに基づく分類器の提案である。あるいは、畳み込みニューラルネットワークに対するデータ効率の代替として、主成分分析を伴う勾配向上機械学習手法を提案する。近年, 表型データを用いた分類問題に対して, 深層学習における勾配促進法の有効性が示されている。この研究で考慮された勾配向上手法は、XGBoost、LightGBM、CatBoostの実装に基づいている。本研究は,データセットサイズが分類器の性能に及ぼす影響についても検討する。この研究では、Best-Heckmanサンプルからの電波源を用いて、クラス0、クラスI、クラスIIの3つの主要なファナロフ・ライリークラスに基づいて、3クラス分類問題を考える。提案された3つの勾配向上手法は、画像の4分の1未満を使用して、最先端の畳み込みニューラルネットワークベースの分類器より優れており、CatBoostが最も精度が高い。これは主にファナロフ・ライリークラスiiソースの分類における勾配促進法の精度が向上し、3-4\%高いリコールが得られたためである。 The field of radio astronomy is witnessing a boom in the amount of data produced per day due to newly commissioned radio telescopes. One of the most crucial problems in this field is the automatic classification of extragalactic radio sources based on their morphologies. Most recent contributions in the field of morphological classification of extragalactic radio sources have proposed classifiers based on convolutional neural networks. Alternatively, this work proposes gradient boosting machine learning methods accompanied by principal component analysis as data-efficient alternatives to convolutional neural networks. Recent findings have shown the efficacy of gradient boosting methods in outperforming deep learning methods for classification problems with tabular data. The gradient boosting methods considered in this work are based on the XGBoost, LightGBM, and CatBoost implementations. This work also studies the effect of dataset size on classifier performance. 
A three-class classification problem is considered in this work based on the three main Fanaroff-Riley classes: class 0, class I, and class II, using radio sources from the Best-Heckman sample. All three proposed gradient boosting methods outperformed a state-of-the-art convolutional neural networks-based classifier using less than a quarter of the number of images, with CatBoost having the highest accuracy. This was mainly due to the superior accuracy of gradient boosting methods in classifying Fanaroff-Riley class II sources, with 3--4\% higher recall.	翻訳日:2023-04-26 21:03:35 公開日:2023-04-25
# 非エルミート2軌道モデルの位相的性質 Topological properties of a non-Hermitian two-orbital model ( http://arxiv.org/abs/2304.12723v1 ) ライセンス: Link先を確認	Dipendu Halder and Saurabh Basu	(参考訳) 単位セル毎に2つの軌道からなるタイトな結合鎖の非エルミタン(NH)バージョンを徹底的に解析する。非ハーミティシティは、それぞれPT対称および非PT対称のケースに分岐し、非相互近接ホッピング振幅と純粋に想像上の現場電位エネルギーによって特徴づけられる。モデルの局所化とトポロジカルな性質に関する研究は、いくつかの興味深い結果を示している。例えば、それらは異なる特徴、すなわち非PT対称の場合のラインギャップとPT対称の場合のポイントギャップを持つ複素エネルギーギャップを持つ。さらに、NH系の特徴であるNH皮膚効果は、ここでは存在せず、状態の局所密度を計算することによって確認される。両NH変種に対するバルク境界対応は両直交条件に従う。さらに、逆参加比によって得られるエッジモードの局在化は、ハミルトニアンのパラメータに様々な依存性を示す。また、位相的性質は位相不変量の挙動、すなわち有限値からゼロへの急な遷移を示す複素ベリー位相と区別できる。興味深いことに、PT対称系は、パラメータの値によってPT破壊と未破壊の位相に分割される。最後に、結果がhermitianモデルでベンチマークされ、nh変種で得られた結果の比較と対比が行われる。 We perform a thorough analysis of a non-Hermitian (NH) version of a tight binding chain comprising of two orbitals per unit cell. The non-Hermiticity is further bifurcated into PT symmetric and non-PT symmetric cases, respectively, characterized by non-reciprocal nearest neighbour hopping amplitudes and purely imaginary onsite potential energies. The studies on the localization and the topological properties of our models reveal several intriguing results. For example, they have complex energy gaps with distinct features, that is, a line gap for the non-PT symmetric case and a point gap for the PT symmetric case. Further, the NH skin effect, a distinctive feature of the NH system, is non-existent here and is confirmed via computing the local density of states. The bulk-boundary correspondence for both the NH variants obeys a bi-orthogonal condition. Moreover, the localization of the edge modes obtained via the inverse participation ratio shows diverse dependencies on the parameters of the Hamiltonian. Also, the topological properties are discernible from the behaviour of the topological invariant, namely, the complex Berry phase, which shows a sharp transition from a finite value to zero. 
Interestingly, the PT symmetric system is found to split between a PT broken and an unbroken phase depending on the values of the parameters. Finally, the results are benchmarked with the Hermitian model to compare and contrast those obtained for the NH variants.	翻訳日:2023-04-26 21:03:14 公開日:2023-04-25
# dual cross-attention を用いた眼球追跡誘導型深層複数インスタンス学習による眼底疾患検出 Eye tracking guided deep multiple instance learning with dual cross-attention for fundus disease detection ( http://arxiv.org/abs/2304.12719v1 ) ライセンス: Link先を確認	Hongyang Jiang, Jingqi Huang, Chen Tang, Xiaoqing Zhang, Mengdi Gao, Jiang Liu	(参考訳) 深層ニューラルネットワーク(dnns)は,眼科医の診断ミスや誤診率の軽減を支援するため,眼底疾患のコンピュータ支援診断(cad)システムの開発を促進する。しかし、CADシステムの大部分はデータ駆動であるが、パフォーマンスに優しい医学的事前知識が不足している。そこで本稿では,眼科医の視線追跡情報を利用したHuman-in-the-loop (HITL) CADシステムを提案する。具体的には,視線追跡による視線マップがチェリーピックの診断関連事例に有用であったマルチ・インスタンス・ラーニング(MIL)に基づいてHITL CADシステムを実装した。さらに, 二重クロスアテンションMIL (DCAMIL) ネットワークを用いて, ノイズの抑制効果について検討した。一方, トレーニングバッグのインスタンスを充実・標準化するために, シーケンス拡張モジュールとドメイン逆数モジュールの両方を導入し, 本手法の堅牢性を高めた。我々は,新たに構築したデータセット(amd-gazeとdr-gaze)について,それぞれamdと早期dr検出のための比較実験を行った。眼科医の視線追跡情報を完全に探索し, HITL CADシステムの実現可能性と提案したDCAMILの優位性を実証した。これらの調査から,医学的先行知識として医師の視線マップが臨床疾患のCADシステムに寄与する可能性が示唆された。 Deep neural networks (DNNs) have promoted the development of computer aided diagnosis (CAD) systems for fundus diseases, helping ophthalmologists reduce missed diagnosis and misdiagnosis rate. However, the majority of CAD systems are data-driven but lack of medical prior knowledge which can be performance-friendly. In this regard, we innovatively proposed a human-in-the-loop (HITL) CAD system by leveraging ophthalmologists' eye-tracking information, which is more efficient and accurate. Concretely, the HITL CAD system was implemented on the multiple instance learning (MIL), where eye-tracking gaze maps were beneficial to cherry-pick diagnosis-related instances. Furthermore, the dual-cross-attention MIL (DCAMIL) network was utilized to curb the adverse effects of noisy instances. Meanwhile, both sequence augmentation module and domain adversarial module were introduced to enrich and standardize instances in the training bag, respectively, thereby enhancing the robustness of our method. 
We conduct comparative experiments on our newly constructed datasets (namely, AMD-Gaze and DR-Gaze), respectively for the AMD and early DR detection. Rigorous experiments demonstrate the feasibility of our HITL CAD system and the superiority of the proposed DCAMIL, fully exploring the ophthalmologists' eye-tracking information. These investigations indicate that physicians' gaze maps, as medical prior knowledge, is potential to contribute to the CAD systems of clinical diseases.	翻訳日:2023-04-26 21:02:55 公開日:2023-04-25
# 量子サービス提供の比較:MaxCutにおけるQAOAの事例 Comparing Quantum Service Offerings: A Case Study of QAOA for MaxCut ( http://arxiv.org/abs/2304.12718v1 ) ライセンス: Link先を確認	Julian Obst and Johanna Barzen and Martin Beisel and Frank Leymann and Marie Salm and Felix Truger	(参考訳) 量子コンピューティングの出現に伴い、クラウドサービスを通じて利用できる量子デバイスが増えている。しかし、この分野の急速な発展により、これらの量子特化サービスは、その機能とソフトウェア開発者に課す要件が大きく異なる。これは、これらのサービスをアプリケーションの一部として利用したい量子コンピューティング分野外の実務者にとって、特に大きな課題となる。本稿では,異なるハードウェア技術に基づき,異なるサービスを通じて提供される複数のデバイスを,それぞれで同じ実験を行うことにより比較する。実験から得られた教訓を文書化することにより,量子特化サービスの利用を容易にし,主要な量子ハードウェア技術間の違いを示すことを目指す。 With the emergence of quantum computing, a growing number of quantum devices is accessible via cloud offerings. However, due to the rapid development of the field, these quantum-specific service offerings vary significantly in capabilities and requirements they impose on software developers. This is particularly challenging for practitioners from outside the quantum computing domain who are interested in using these offerings as parts of their applications. In this paper, we compare several devices based on different hardware technologies and provided through different offerings, by conducting the same experiment on each of them. By documenting the lessons learned from our experiments, we aim to simplify the usage of quantum-specific offerings and illustrate the differences between predominant quantum hardware technologies.	翻訳日:2023-04-26 21:02:28 公開日:2023-04-25
# 格子ゲージ理論における閉じ込め物質のメソスコピックスケールにおけるコヒーレンス Coherence of confined matter in lattice gauge theories at the mesoscopic scale ( http://arxiv.org/abs/2304.12713v1 ) ライセンス: Link先を確認	Enrico C. Domanti, Paolo Castorina, Dario Zappal\`a and Luigi Amico	(参考訳) ゲージ理論は時空局所対称性を示す物理系に現れる。基本的な相互作用から統計力学、凝縮物質、最近では量子計算まで、物理学の重要な領域の強力な記述を提供する。そのため、この分野では極めて深い理解が得られている。量子技術の出現により、量子シミュレーションによって元の量子場理論の重要な特徴を捉えることができる低エネルギーアナログが集中的に研究されている。本稿では,メソスコピック空間スケールに制約のある格子ゲージ理論について検討する。そこで本研究では,メゾスコピックサイズのリング状格子に存在する中間子を有効磁場で貫通するダイナミクスについて検討する。これらの条件下では、中間子はユニークな特徴によって特徴づけられる。我々は、磁場と中間子の内部構造との結合を反映した新しいタイプのアハロノフ・ボーム振動を発見した。中間子のコヒーレンス特性は、持続電流と相関関数の特定の特徴によって定量化される。磁場がクエンチされると、アハロノフ・ボーム振動と中間子電流のメゾスコピック的特徴と相関関係は特定の物質波ダイナミクスを開始する。 Gauge theories arise in physical systems displaying space-time local symmetries. They provide a powerful description of important realms of physics ranging from fundamental interactions, to statistical mechanics, condensed matter and more recently quantum computation. As such, a remarkably deep understanding has been achieved in the field. With the advent of quantum technology, lower energy analogs, capable to capture important features of the original quantum field theories through quantum simulation, have been intensively studied. Here, we study lattice gauge theories constrained to mesoscopic spatial scales. To this end, we study the dynamics of mesons residing in a ring-shaped lattice of mesoscopic size pierced by an effective magnetic field. We demonstrate that, in these conditions, mesons are characterized by unique features. We find a new type of Aharonov-Bohm oscillations reflecting the coupling between the magnetic field and the internal structure of the meson. The coherence properties of the meson are quantified by the persistent current and by specific features of the correlation functions. When the magnetic field is quenched, Aharonov-Bohm oscillations and mesoscopic features of the meson current and correlations start a specific matter-wave dynamics.	翻訳日:2023-04-26 21:02:18 公開日:2023-04-25
# ロバストな深層平衡モデルの学習 Learning Robust Deep Equilibrium Models ( http://arxiv.org/abs/2304.12707v1 ) ライセンス: Link先を確認	Haoyu Chu, Shikui Wei, and Ting Liu	(参考訳) 深層平衡(DEQ)モデルは、単一の非線形層の不動点を求めることで従来の深さを不要にする、深層学習における有望な暗黙層モデルのクラスとして登場した。その成功にもかかわらず、これらのモデルの不動点の安定性は未だよく分かっていない。近年、別のタイプの暗黙層モデルであるNeural ODEにリアプノフ理論が適用され、敵対的ロバスト性が付与されている。我々はDEQモデルを非線形力学系とみなし、リアプノフ理論によって証明可能な安定性を保証した頑健なDEQモデルLyaDEQを提案する。本手法の要点は、DEQモデルの不動点がリアプノフ安定であることを保証する点にあり、これによりLyaDEQモデルは小さな初期摂動に耐えられる。リアプノフ安定な不動点同士が近接することで敵対的防御性能が低下するのを避けるため、リアプノフ安定性モジュールの後に直交全結合層を追加して異なる不動点を分離する。広く使われている複数のデータセット上で、よく知られた敵対的攻撃のもとでLyaDEQモデルを評価し、実験結果からロバスト性の大幅な向上が示された。さらに、LyaDEQモデルは敵対的訓練などの他の防御手法と組み合わせることで、より高い敵対的ロバスト性を達成できることを示す。 Deep equilibrium (DEQ) models have emerged as a promising class of implicit layer models in deep learning, which abandon traditional depth by solving for the fixed points of a single nonlinear layer. Despite their success, the stability of the fixed points for these models remains poorly understood. Recently, Lyapunov theory has been applied to Neural ODEs, another type of implicit layer model, to confer adversarial robustness. By considering DEQ models as nonlinear dynamic systems, we propose a robust DEQ model named LyaDEQ with guaranteed provable stability via Lyapunov theory. The crux of our method is ensuring the fixed points of the DEQ models are Lyapunov stable, which enables the LyaDEQ models to resist the minor initial perturbations. To avoid poor adversarial defense due to Lyapunov-stable fixed points being located near each other, we add an orthogonal fully connected layer after the Lyapunov stability module to separate different fixed points. We evaluate LyaDEQ models on several widely used datasets under well-known adversarial attacks, and experimental results demonstrate significant improvement in robustness. Furthermore, we show that the LyaDEQ model can be combined with other defense methods, such as adversarial training, to achieve even better adversarial robustness.	翻訳日:2023-04-26 21:02:00 公開日:2023-04-25
# BERTは韻律について何を学ぶのか? What does BERT learn about prosody? ( http://arxiv.org/abs/2304.12706v1 ) ライセンス: Link先を確認	Sofoklis Kakouros and Johannah O'Mahony	(参考訳) 言語モデルは自然言語処理アプリケーションにおいてほぼ至るところで使われるようになり、韻律を含む多くのタスクで最先端の結果を達成している。モデルの設計は、訓練時にあらかじめ言語学的な目標を定めるのではなく、言語の汎化された表現を学習することを目指しているため、モデルが暗黙的に捉える表現を分析・解釈することは、解釈可能性とモデル性能のギャップを埋めるうえで重要である。モデルが捉える言語情報を調べた研究はいくつかあり、その表現能力に関する一定の知見が得られている。しかし、韻律がモデルの学習する言語の構造情報の一部であるかどうかは、これまで検討されていない。本研究では、BERTの各層で捉えられた表現をプロービングする一連の実験を行う。その結果、韻律的プロミネンスに関する情報は多くの層にまたがって存在するが、主に中間層に集中しており、BERTが主に統語情報と意味情報に依存していることが示唆される。 Language models have become nearly ubiquitous in natural language processing applications achieving state-of-the-art results in many tasks including prosody. As the model design does not define predetermined linguistic targets during training but rather aims at learning generalized representations of the language, analyzing and interpreting the representations that models implicitly capture is important in bridging the gap between interpretability and model performance. Several studies have explored the linguistic information that models capture providing some insights on their representational capacity. However, the current studies have not explored whether prosody is part of the structural information of the language that models learn. In this work, we perform a series of experiments on BERT probing the representations captured at different layers. Our results show that information about prosodic prominence spans across many layers but is mostly focused in middle layers suggesting that BERT relies mostly on syntactic and semantic information.	翻訳日:2023-04-26 21:01:37 公開日:2023-04-25
# 野生動物保護者エンパワーメント:深層学習と3/4gカメラトラップを用いた生物多様性保全のための公平なデジタルスチュワードシップと報酬システム Empowering Wildlife Guardians: An Equitable Digital Stewardship and Reward System for Biodiversity Conservation using Deep Learning and 3/4G Camera Traps ( http://arxiv.org/abs/2304.12703v1 ) ライセンス: Link先を確認	Paul Fergus, Carl Chalmers, Steven Longmore, Serge Wich, Carmen Warmenhove, Jonathan Swart, Thuto Ngongwane, Andr\'e Burger, Jonathan Ledgard, and Erik Meijaard	(参考訳) 我々の惑星の生物多様性は脅威にさらされており、約100万種が数十年以内に絶滅すると予想されている。理由は、狩猟、過剰漁、汚染、都市化と農業のための土地の転換など、ネガティブな人間の行動である。自然に利益をもたらす活動のための慈善団体や政府からのかなりの投資にもかかわらず、世界の野生生物の数は減少し続けている。地域の野生生物保護者は歴史的に地球環境保全活動において重要な役割を担い、様々なレベルで持続可能性を達成する能力を示した。 2021年、COP26は彼らの貢献を認め、年間170億米ドルを約束したが、これは地球生物多様性の80%を保護するため、利用可能な世界の生物多様性予算(年間1240億米ドルと年間143億米ドル)のごく一部である。本稿では,動物が自身の資金を所有する「種間貨幣」に基づく急進的な新しいソリューションを提案する。デジタル双生児を各種のために作ることで、動物は提供したサービスのために保護者に資金を分配することができる。例えば、サイは、生存状態が良好である限り、カメラトラップで検出されるたびに、その保護者に対して支払いを行うことができる。このアプローチの有効性をテストするため、南アフリカのリンポポ州のヴェルゲヴォンデンゲーム保護区の400km2のエリアに27台のカメラトラップが配備された。モーショントリガーで撮影されたカメラトラップは10ヶ月間動作し、ディープラーニングを使って12種の動物を撮影しました。各種について、その場しのぎの銀行口座を設置し、クレジットは {\pounds}100。動物がカメラで捕獲され、うまく分類された度に、1ペニー(種の実際の価値を決定するための任意の量-メカニズム)が動物アカウントから関連する保護者に転送された。 The biodiversity of our planet is under threat, with approximately one million species expected to become extinct within decades. The reason; negative human actions, which include hunting, overfishing, pollution, and the conversion of land for urbanisation and agricultural purposes. Despite significant investment from charities and governments for activities that benefit nature, global wildlife populations continue to decline. Local wildlife guardians have historically played a critical role in global conservation efforts and have shown their ability to achieve sustainability at various levels. 
In 2021, COP26 recognised their contributions and pledged US$1.7 billion per year; however, this is a fraction of the global biodiversity budget available (between US$124 billion and US$143 billion annually) given they protect 80% of the planets biodiversity. This paper proposes a radical new solution based on "Interspecies Money," where animals own their own money. Creating a digital twin for each species allows animals to dispense funds to their guardians for the services they provide. For example, a rhinoceros may release a payment to its guardian each time it is detected in a camera trap as long as it remains alive and well. To test the efficacy of this approach 27 camera traps were deployed over a 400km2 area in Welgevonden Game Reserve in Limpopo Province in South Africa. The motion-triggered camera traps were operational for ten months and, using deep learning, we managed to capture images of 12 distinct animal species. For each species, a makeshift bank account was set up and credited with {\pounds}100. Each time an animal was captured in a camera and successfully classified, 1 penny (an arbitrary amount - mechanisms still need to be developed to determine the real value of species) was transferred from the animal account to its associated guardian.	翻訳日:2023-04-26 21:01:21 公開日:2023-04-25
# 参加ゲーム The Participation Game ( http://arxiv.org/abs/2304.12700v1 ) ライセンス: Link先を確認	Mark Thomas Kennedy, Nelson Phillips	(参考訳) チューリングの有名な「模倣ゲーム」と生成的事前学習トランスフォーマーの最近の進歩に着想を得て、我々は「参加ゲーム」を提起し、機械が社会的構築のプロセスに参加者として人間に加わるという、AI進化の新たなフロンティアを示す。参加ゲームは創造的で遊び心のある競争であり、人間が自らの世界を理解し秩序づけるために用いるカテゴリを適用し、曲げ、拡張することを求める。ゲームを定義し、AIのテストとして模倣を超えるべき理由を述べたうえで、参加ゲームと、人間の知性の特徴である社会的構築のプロセスとの類似点を明らかにする。最後に、社会の基本的な構成概念への含意とガバナンスの選択肢について論じる。 Inspired by Turing's famous "imitation game" and recent advances in generative pre-trained transformers, we pose the participation game to point to a new frontier in AI evolution where machines will join with humans as participants in social construction processes. The participation game is a creative, playful competition that calls for applying, bending, and stretching the categories humans use to make sense of and order their worlds. After defining the game and giving reasons for moving beyond imitation as a test of AI, we highlight parallels between the participation game and processes of social construction, a hallmark of human intelligence. We then discuss implications for fundamental constructs of societies and options for governance.	翻訳日:2023-04-26 21:00:47 公開日:2023-04-25
# 漏洩波ホログラムによる軌道角運動量発生器の設計のためのディープラーニングフレームワーク Deep Learning Framework for the Design of Orbital Angular Momentum Generators Enabled by Leaky-wave Holograms ( http://arxiv.org/abs/2304.12695v1 ) ライセンス: Link先を確認	Naser Omrani, Fardin Ghorbani, Sina Beyraghi, Homayoon Oraizi, Hossein Soleimani	(参考訳) 本稿では,Flat Optics (FO) と機械学習 (ML) 技術を組み合わせて,OAMを駆動する電磁波を発生させる漏洩波ホログラフィックアンテナの設計手法を提案する。本システムの性能を向上させるために,機械学習を用いて放射線パターン全体を効果的に制御できる数学的関数,すなわち,放射線パターンの中心ヌル深さを増加させると同時にサイドローブレベル(sll)を低下させる。様々なシナリオにおいて最適な結果を得るためには,ホログラフィック理論に基づくインピーダンス方程式のパラメータの精密チューニングが必要である。本研究では,パラメータの近似値を決定するために機械学習を適用した。各パラメータの最適な値を決定でき、合計77,000個の生成されたデータセットを使用して、所望の放射線パターンが得られる。さらに、MLの使用は時間を節約するだけでなく、手動パラメータチューニングや従来の最適化手法よりも正確で正確な結果をもたらす。 In this paper, we present a novel approach for the design of leaky-wave holographic antennas that generates OAM-carrying electromagnetic waves by combining Flat Optics (FO) and machine learning (ML) techniques. To improve the performance of our system, we use a machine learning technique to discover a mathematical function that can effectively control the entire radiation pattern, i.e., decrease the side lobe level (SLL) while simultaneously increasing the central null depth of the radiation pattern. Precise tuning of the parameters of the impedance equation based on holographic theory is necessary to achieve optimal results in a variety of scenarios. In this research, we applied machine learning to determine the approximate values of the parameters. We can determine the optimal values for each parameter, resulting in the desired radiation pattern, using a total of 77,000 generated datasets. Furthermore, the use of ML not only saves time, but also yields more precise and accurate results than manual parameter tuning and conventional optimization methods.	翻訳日:2023-04-26 21:00:34 公開日:2023-04-25
# 高効率・長期依存性学習能力を有する平行スパイキングニューロン Parallel Spiking Neurons with High Efficiency and Long-term Dependencies Learning Ability ( http://arxiv.org/abs/2304.12760v1 ) ライセンス: Link先を確認	Wei Fang, Zhaofei Yu, Zhaokun Zhou, Yanqi Chen, Zhengyu Ma, Timoth\'ee Masquelier, Yonghong Tian	(参考訳) スパイキングニューラルネットワーク(SNN)のバニラスパイクニューロンは、チャージ・ファイア・リセット・ニューラルダイナミクスを使用し、シリアルでしかシミュレートできず、長期間の依存関係を学べない。リセットを取り除くと、ニューロンのダイナミクスは非イテレーティブな形で再構成され、並列化できる。一般定式化にリセットすることなくニューロンのダイナミクスを書き換えることにより,時間ステップ間の密接な接続を用いて時間情報の利用を最大化する並列スパイキングニューロン(psn)を提案する。低遅延推論における将来の入力の使用を避けるため、重みにマスクを追加し、マスク付きPSNを得る。時間ステップ間で重みを共有することにより、スライディングPSNは可変長のシーケンスを扱うことができる。シミュレーション速度と時間・静的データ分類におけるpsnファミリーの評価を行い,psnファミリーの効率と精度において圧倒的な優位性を示した。私たちの知る限りでは、これはスパイクニューロンの並列化に関する最初の研究であり、スパイク深層学習コミュニティの基盤となるでしょう。我々のコードは \url{https://github.com/fangwei123456/Parallel-Spiking-Neuron} で公開されている。 Vanilla spiking neurons in Spiking Neural Networks (SNNs) use charge-fire-reset neuronal dynamics, which can only be simulated in serial and can hardly learn long-time dependencies. We find that when removing reset, the neuronal dynamics are reformulated in a non-iterative form and can be parallelized. By rewriting neuronal dynamics without resetting to a general formulation, we propose the Parallel Spiking Neuron (PSN), which uses dense connections between time-steps to maximize the utilization of temporal information. To avoid the use of future inputs for low-latency inference, we add masks on the weights and obtain the masked PSN. By sharing weights across time-steps, the sliding PSN is proposed with the ability to deal with sequences with variant lengths. We evaluate the PSN family on simulation speed and temporal/static data classification, and the results show the overwhelming advantage of the PSN family in efficiency and accuracy. To our best knowledge, this is the first research about parallelizing spiking neurons and can be a cornerstone for the spiking deep learning community. 
Our code is available at https://github.com/fangwei123456/Parallel-Spiking-Neuron.	翻訳日:2023-04-26 20:54:57 公開日:2023-04-25
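The idea of removing the reset can be sketched as follows. This is a minimal illustration that assumes the formulation suggested by the abstract: without reset, the membrane potential at each time-step is a weighted sum of the inputs, so every time-step can be computed independently of the others. The function names `causal_mask` and `masked_psn`, the window size `k`, and all numeric values are illustrative assumptions, not the authors' code; their repository holds the actual implementation.

```python
def heaviside(x: float) -> int:
    """Spike (1) when the potential reaches the threshold, else 0."""
    return 1 if x >= 0.0 else 0


def causal_mask(num_steps: int, k: int) -> list[list[int]]:
    """mask[t][s] is 1 only for the k most recent inputs up to step t,
    so no future input is used (the masking idea from the abstract)."""
    return [[1 if 0 <= t - s < k else 0 for s in range(num_steps)]
            for t in range(num_steps)]


def masked_psn(x: list[float], weights: list[list[float]],
               thresholds: list[float], k: int) -> list[int]:
    """Compute all spikes without iterating a hidden state.
    Each row t depends only on the inputs, never on spikes[t - 1],
    which is what makes the rows parallelizable."""
    num_steps = len(x)
    mask = causal_mask(num_steps, k)
    spikes = []
    for t in range(num_steps):  # rows are independent of each other
        h = sum(weights[t][s] * mask[t][s] * x[s] for s in range(num_steps))
        spikes.append(heaviside(h - thresholds[t]))
    return spikes


inputs = [0.2, 0.9, 0.3, 0.5]
ones = [[1.0] * 4 for _ in range(4)]  # hypothetical weights
print(masked_psn(inputs, ones, [1.0] * 4, k=2))  # [0, 1, 1, 0]
```

The loop over `t` is written sequentially here for readability; because no row reads another row's result, it could be replaced by a single matrix product on parallel hardware.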
# Node機能拡張によるネットワークアライメントの仮想化 Node Feature Augmentation Vitaminizes Network Alignment ( http://arxiv.org/abs/2304.12751v1 ) ライセンス: Link先を確認	Jin-Duk Park, Cong Tran, Won-Yong Shin, Xin Cao	(参考訳) ネットワークアライメント(NA)は、与えられたネットワークのトポロジカルおよび/または特徴情報を用いて、複数のネットワークにまたがるノード対応を発見するタスクである。 naメソッドは無数のシナリオで目覚ましい成功を収めてきたが、プライバシの懸念やアクセス制限のために常に利用できるとは限らない、事前のアンカーリンクや/またはノード機能などの追加情報なしでは有効ではない。そこで本研究では,新しいna法であるgrad-align+を提案する。grad-align+は最先端のna法であるgrad-alignをベースとし,全てのノード対が見つかるまでノード対の一部のみを徐々に発見する。 Grad-Align+を設計する際には、NAタスクの実行という意味でノード機能を拡張する方法と、拡張ノード機能を最大限活用してNAメソッドを設計する方法を説明します。この目的を達成するために、3つの主要コンポーネントからなるGrad-Align+を開発します。 1)中心性に基づくノード特徴増強(CNFA) 2)グラフニューラルネットワーク(gnn)による拡張ノードの特徴と組込み類似度計算 3)アライメント・クロスネットワーク・ニアペア(ACN)の情報を用いた類似度計算による段階的NA。包括的実験を通して、Grad-Align+が示すことを実証する。 (a)ベンチマークNAメソッドよりも大きなマージンによる優位性。 (b)CNFAの有効性を確認するための実証的検証と理論的知見。 (c)各構成要素の影響 (d)ネットワークノイズに対する堅牢性、及び (e)計算効率。 Network alignment (NA) is the task of discovering node correspondences across multiple networks using topological and/or feature information of given networks. Although NA methods have achieved remarkable success in a myriad of scenarios, their effectiveness is not without additional information such as prior anchor links and/or node features, which may not always be available due to privacy concerns or access restrictions. To tackle this practical challenge, we propose Grad-Align+, a novel NA method built upon a recent state-of-the-art NA method, the so-called Grad-Align, that gradually discovers only a part of node pairs until all node pairs are found. In designing Grad-Align+, we account for how to augment node features in the sense of performing the NA task and how to design our NA method by maximally exploiting the augmented node features. 
To achieve this goal, we develop Grad-Align+, which consists of three key components: 1) centrality-based node feature augmentation (CNFA), 2) graph neural network (GNN)-aided embedding similarity calculation alongside the augmented node features, and 3) gradual NA with similarity calculation using the information of aligned cross-network neighbor-pairs (ACNs). Through comprehensive experiments, we demonstrate (a) the superiority of Grad-Align+ over benchmark NA methods by a large margin, (b) empirical validation and theoretical findings that support the effectiveness of CNFA, (c) the influence of each component, (d) robustness to network noise, and (e) computational efficiency.	翻訳日:2023-04-26 20:54:38 公開日:2023-04-25
# ブロックチェーンの大規模言語モデル Blockchain Large Language Models ( http://arxiv.org/abs/2304.12749v1 ) ライセンス: Link先を確認	Yu Gai, Liyi Zhou, Kaihua Qin, Dawn Song, Arthur Gervais	(参考訳) 本稿では,異常なブロックチェーントランザクションを検出するための動的,リアルタイムなアプローチを提案する。提案するツールであるTXRANKは,ブロックチェーンアクティビティのトレース表現を生成して,大規模な言語モデルをスクラッチからトレーニングして,リアルタイム侵入検出システムとして動作させる。従来の方法とは異なり、txrankは制限のない検索空間を提供し、事前定義されたルールやパターンに依存しないように設計されている。本稿では,Ethereumトランザクションの異常検出ツールとしてTXRANKの有効性を示す。実験では,68万トランザクションのデータセット間の異常なトランザクションを効果的に識別し,バッチ処理のスループットは平均で2284トランザクションである。以上の結果から,TXRANKは,被害者契約と相互作用する最も異常なトランザクションのうち,124件中49件をランク付けし,異常なトランザクションを識別した。この研究は、トランスフォーマーアーキテクチャと互換性のあるカスタムデータエンコーディング、ドメイン固有のトークン化技術、Ethereum仮想マシン(EVM)トレース表現用に特別に開発されたツリーエンコーディングメソッドを導入することで、ブロックチェーントランザクション分析の分野に貢献する。 This paper presents a dynamic, real-time approach to detecting anomalous blockchain transactions. The proposed tool, TXRANK, generates tracing representations of blockchain activity and trains from scratch a large language model to act as a real-time Intrusion Detection System. Unlike traditional methods, TXRANK is designed to offer an unrestricted search space and does not rely on predefined rules or patterns, enabling it to detect a broader range of anomalies. We demonstrate the effectiveness of TXRANK through its use as an anomaly detection tool for Ethereum transactions. In our experiments, it effectively identifies abnormal transactions among a dataset of 68M transactions and has a batched throughput of 2284 transactions per second on average. Our results show that, TXRANK identifies abnormal transactions by ranking 49 out of 124 attacks among the top-3 most abnormal transactions interacting with their victim contracts. This work makes contributions to the field of blockchain transaction analysis by introducing a custom data encoding compatible with the transformer architecture, a domain-specific tokenization technique, and a tree encoding method specifically crafted for the Ethereum Virtual Machine (EVM) trace representation.	
翻訳日:2023-04-26 20:54:17 公開日:2023-04-25
# 暗黙的カメラモデル学習による撮像過程の反転 Inverting the Imaging Process by Learning an Implicit Camera Model ( http://arxiv.org/abs/2304.12748v1 ) ライセンス: Link先を確認	Xin Huang, Qi Zhang, Ying Feng, Hongdong Li, Qing Wang	(参考訳) 暗黙の座標に基づくニューラルネットワークによる視覚的信号の表現は、従来の離散的信号表現の効果的な代替として、コンピュータビジョンやグラフィックスでかなりの人気を集めている。本稿では、シーンのみをモデル化する既存の暗黙的ニューラルネットワークとは対照的に、カメラの物理的撮像過程をディープニューラルネットワークとして表現する新しい暗黙的カメラモデルを提案する。新たな暗黙カメラモデルが2つの逆イメージングタスクに与える影響を実証する。i) オールインフォーカス写真の生成、ii) HDRイメージング。具体的には、暗黙のぼかし発生器と暗黙のトーンマッパーを考案し、それぞれカメラの撮像プロセスの開口と露出をモデル化する。我々の暗黙カメラモデルは、マルチフォーカススタックとマルチ露光ブラケット監視の下で暗黙のシーンモデルと共同で学習する。我々は,多数のテスト画像とビデオに対して,新しいモデルの有効性を実証し,高精度で視覚に訴えるオールインフォーカスと高ダイナミックレンジの画像を生成する。原則として、新しい暗黙的ニューラルカメラモデルは、他の様々な逆画像処理の恩恵を受ける可能性がある。 Representing visual signals with implicit coordinate-based neural networks, as an effective replacement of the traditional discrete signal representation, has gained considerable popularity in computer vision and graphics. In contrast to existing implicit neural representations which focus on modelling the scene only, this paper proposes a novel implicit camera model which represents the physical imaging process of a camera as a deep neural network. We demonstrate the power of this new implicit camera model on two inverse imaging tasks: i) generating all-in-focus photos, and ii) HDR imaging. Specifically, we devise an implicit blur generator and an implicit tone mapper to model the aperture and exposure of the camera's imaging process, respectively. Our implicit camera model is jointly learned together with implicit scene models under multi-focus stack and multi-exposure bracket supervision. We have demonstrated the effectiveness of our new model on a large number of test images and videos, producing accurate and visually appealing all-in-focus and high dynamic range images. In principle, our new implicit neural camera model has the potential to benefit a wide array of other inverse imaging tasks.	翻訳日:2023-04-26 20:53:56 公開日:2023-04-25
# データベース管理システムのためのディープラーニングに基づくオートチューニング Deep learning based Auto Tuning for Database Management System ( http://arxiv.org/abs/2304.12747v1 ) ライセンス: Link先を確認	Karthick Prasad Gunasekaran, Kajal Tiwari, Rachana Acharya	(参考訳) データベースシステム構成の管理は、システムのあらゆる側面を制御する何百もの構成ノブがあるため、難しい作業である。これは、これらのノブが標準化、独立、あるいは普遍的でないという事実によって複雑であり、最適な設定を決定するのが困難である。 An automated approach to address this problem using supervised and unsupervised machine learning methods to select impactful knobs, map unseen workloads, and recommend knob settings was implemented in a new tool called OtterTune and is being evaluated on three DBMSs, with results demonstrating that it recommends configurations as good as or better than those generated by existing tools or a human expert.In this work, we extend an automated technique based on Ottertune [1] to reuse training data gathered from previous sessions to tune new DBMS deployments with the help of supervised and unsupervised machine learning methods to improve latency prediction. 本手法は,本論文で提案する手法の拡張に関するものである。我々はgmmクラスタリングを用いて,ランダムフォレストなどのアンサンブルモデルとニューラルネットワークなどの非線形モデルを組み合わせた予測モデルを構築した。 The management of database system configurations is a challenging task, as there are hundreds of configuration knobs that control every aspect of the system. This is complicated by the fact that these knobs are not standardized, independent, or universal, making it difficult to determine optimal settings. 
An automated approach to this problem, which uses supervised and unsupervised machine learning methods to select impactful knobs, map unseen workloads, and recommend knob settings, was implemented in a new tool called OtterTune and is being evaluated on three DBMSs, with results demonstrating that it recommends configurations as good as or better than those generated by existing tools or a human expert. In this work, we extend an automated technique based on OtterTune [1] to reuse training data gathered from previous sessions to tune new DBMS deployments with the help of supervised and unsupervised machine learning methods to improve latency prediction. Our approach involves the expansion of the methods proposed in the original paper. We use GMM clustering to prune metrics and combine ensemble models, such as RandomForest, with non-linear models, like neural networks, for prediction modeling.	翻訳日:2023-04-26 20:53:38 公開日:2023-04-25
# 一般化ラミアンス場表現のための局所暗黙的光線関数 Local Implicit Ray Function for Generalizable Radiance Field Representation ( http://arxiv.org/abs/2304.12746v1 ) ライセンス: Link先を確認	Xin Huang, Qi Zhang, Ying Feng, Xiaoyu Li, Xuan Wang, Qing Wang	(参考訳) 本稿では、新しいビューレンダリングのための一般化可能なニューラルレンダリング手法であるLIRF(Local Implicit Ray Function)を提案する。現在一般化されているneural radiance fields(nerf)メソッドは、ピクセル毎に1光線でシーンをサンプリングし、入力ビューとレンダリングビューが異なる解像度でシーンコンテンツをキャプチャすると、ぼやけやエイリアスされたビューをレンダリングする。そこで本研究では,円錐フラストラムからの情報を集約して光線を構成するLIRFを提案する。円錐フラスタム内の3次元位置が与えられた場合、LIRFは3次元座標と円錐フラスタムの特徴を入力とし、局所体積放射場を予測する。座標は連続しているため、LIRFはボリュームレンダリングを通じて、高品質の新規ビューを継続的に評価する。さらに,トランスを用いた特徴マッチングによる各入力ビューの可視重量予測を行い,閉鎖領域の性能向上を図る。実世界のシーンにおける実験結果から,任意のスケールで見ないシーンの新規なビューレンダリングにおいて,この手法が最先端の手法よりも優れていることが確認された。 We propose LIRF (Local Implicit Ray Function), a generalizable neural rendering approach for novel view rendering. Current generalizable neural radiance fields (NeRF) methods sample a scene with a single ray per pixel and may therefore render blurred or aliased views when the input views and rendered views capture scene content with different resolutions. To solve this problem, we propose LIRF to aggregate the information from conical frustums to construct a ray. Given 3D positions within conical frustums, LIRF takes 3D coordinates and the features of conical frustums as inputs and predicts a local volumetric radiance field. Since the coordinates are continuous, LIRF renders high-quality novel views at a continuously-valued scale via volume rendering. Besides, we predict the visible weights for each input view via transformer-based feature matching to improve the performance in occluded areas. Experimental results on real-world scenes validate that our method outperforms state-of-the-art methods on novel view rendering of unseen scenes at arbitrary scales.	翻訳日:2023-04-26 20:53:23 公開日:2023-04-25
# サブギガヘルツ周波数における小型インダクタ・キャパシタ共振器 Compact inductor-capacitor resonators at sub-gigahertz frequencies ( http://arxiv.org/abs/2304.12744v1 ) ライセンス: Link先を確認	Qi-Ming Chen and Priyank Singh and Rostislav Duda and Giacomo Catto and Aarne Keränen and Arman Alizadeh and Timm Mörstedt and Aashish Sah and András Gunyhó and Wei Liu and Mikko Möttönen	(参考訳) 小型インダクタ・キャパシタ(LC)共振器は、コプラナー導波路(CPW)共振器とは対照的に、単純な集中定数回路表現を持つが、通常は高精度なモデリングのための洗練された有限要素法(FEM)シミュレーションを必要とする。本稿では、回路形状から直接電気的特性を満足な精度で得られるコプラナーLC共振器の簡単な解析モデルを提案する。周波数が約$300\,{\rm MHz}$から$1\,{\rm GHz}$の範囲にある$10$個の高内部品質係数共振器($Q_{\rm i}\gtrsim 2\times 10^{5}$)に対する実験結果は、導出した解析モデルと詳細なFEMシミュレーションの両方に優れた整合性を示す。これらの結果から、共鳴周波数の偏差が$2\%$未満のサブギガヘルツ共振器の設計が可能となり、例えば超感度低温検出器の実装にすぐに応用できることを示した。達成された平方ミリメートルオーダーのコンパクト共振器サイズは、数百個のマイクロ波共振器を単一のチップに統合してフォトニック格子を実現するための実現可能な方法を示している。 Compact inductor-capacitor (LC) resonators, in contrast to coplanar waveguide (CPW) resonators, have a simple lumped-element circuit representation but usually call for sophisticated finite-element method (FEM) simulations for accurate modelling. Here we present a simple analytical model for a family of coplanar LC resonators where the electrical properties are directly obtained from the circuit geometry with satisfactory accuracy. Our experimental results on $10$ high-internal-quality-factor resonators ($Q_{\rm i}\gtrsim 2\times 10^{5}$), with frequency ranging roughly from $300\,{\rm MHz}$ to $1\,{\rm GHz}$, show an excellent consistency with both the derived analytical model and detailed FEM simulations. These results showcase the ability to design sub-gigahertz resonators with less than $2\%$ deviation in the resonance frequency, which has immediate applications, for example, in the implementation of ultrasensitive cryogenic detectors. The achieved compact resonator size of the order of a square millimeter indicates a feasible way to integrate hundreds of microwave resonators on a single chip for realizing photonic lattices.	
翻訳日:2023-04-26 20:53:05 公開日:2023-04-25
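For orientation, the lumped-element picture mentioned in this abstract rests on the standard resonance relation f0 = 1 / (2π√(LC)). The sketch below evaluates that textbook formula and a relative-deviation check against the 2% figure quoted in the abstract. The inductance, capacitance, and "measured" frequency are hypothetical values chosen only to land in the 300 MHz to 1 GHz range; the paper's geometry-based analytical model is not reproduced here.

```python
import math


def lc_resonance_hz(inductance_h: float, capacitance_f: float) -> float:
    """Resonance frequency of an ideal lumped LC circuit: 1 / (2*pi*sqrt(L*C))."""
    return 1.0 / (2.0 * math.pi * math.sqrt(inductance_h * capacitance_f))


def relative_deviation(measured_hz: float, designed_hz: float) -> float:
    """Fractional difference between a measured and a designed frequency."""
    return abs(measured_hz - designed_hz) / designed_hz


# Hypothetical component values: 10 nH and 10 pF.
f0 = lc_resonance_hz(10e-9, 10e-12)
print(f"{f0 / 1e6:.1f} MHz")  # 503.3 MHz

# Hypothetical measurement, compared against the 2% target from the abstract.
print(relative_deviation(510e6, f0) < 0.02)  # True
```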
# 二次元系における位相相転移の偏光ジャンプ Polarization Jumps across Topological Phase Transitions in Two-dimensional Systems ( http://arxiv.org/abs/2304.12742v1 ) ライセンス: Link先を確認	Hiroki Yoshida, Tiantian Zhang, Shuichi Murakami	(参考訳) チャーン数や$\mathbb{z}_2$位相不変量のような位相不変量の変化を伴う位相相転移では、ギャップは閉まり、電気分極は遷移時に定義されない。本稿では,2次元の位相相転移における偏極の跳躍が,中間ワイル半金属相におけるワイル点の位置と単極電荷によって説明されることを示す。偏極の跳躍は、チャーン数の値を変えることなく、$\mathbb{z}_2$位相相転移および相転移においてワイル双極子によって記述される。一方、チャーン数が相転移で変化するとき、ジャンプは相互空間の基準点から測定されたワイル点の相対的な位置で表される。 In topological phase transitions involving a change in topological invariants such as the Chern number and the $\mathbb{Z}_2$ topological invariant, the gap closes, and the electric polarization becomes undefined at the transition. In this paper, we show that the jump of polarization across such topological phase transitions in two dimensions is described in terms of positions and monopole charges of Weyl points in the intermediate Weyl semimetal phase. We find that the jump of polarization is described by the Weyl dipole at $\mathbb{Z}_2$ topological phase transitions and at phase transitions without any change in the value of the Chern number. Meanwhile, when the Chern number changes at the phase transition, the jump is expressed in terms of the relative positions of Weyl points measured from a reference point in the reciprocal space.	翻訳日:2023-04-26 20:52:36 公開日:2023-04-25
# ICU外傷患者の早期敗血症発症予測のためのNPRL:夜間プロファイル表現学習 NPRL: Nightly Profile Representation Learning for Early Sepsis Onset Prediction in ICU Trauma Patients ( http://arxiv.org/abs/2304.12737v1 ) ライセンス: Link先を確認	Tucker Stewart, Katherine Stern, Grant O'Keefe, Ankur Teredesai, Juhua Hu	(参考訳) 敗血症(Sepsis)は、感染の存在に反応して発症する症候群である。重篤な臓器機能障害を特徴とし、世界中の集中治療室(ICU)で主要な死因の1つとなっている。これらの合併症は抗生物質の早期投与によって軽減できるため、敗血症の発症を早期に予測する能力は患者の生存と幸福に不可欠である。医療インフラ内に展開されている現在の機械学習アルゴリズムは、パフォーマンスが悪く、早期の敗血症を予測できない。近年では、深層学習の手法が敗血症を予測するために提案されているが、発症時期を把握できないもの(例えば、患者の入院全体を敗血症を発症するか否かで分類するなど)や、医療施設に展開するには非現実的なもの(例えば、発症時期を事前に知っておく必要がある、発症までの固定時間を用いてトレーニングインスタンスを作成するなど)もある。そこで本研究では,夜間に収集したデータを用いて,毎朝24時間以内に敗血症が発症するか否かを予測する新しい現実的な予測フレームワークを提案する。しかし, 予測頻度を日次に引き上げるにつれ, 負のインスタンス数が増加する一方, 正のインスタンス数は同じである。その結果,重度のクラス不均衡が問題となり,稀な敗血症症例の把握が困難となる。この問題に対処するため,各患者に対して夜間プロファイル表現学習(NPRL)を提案する。NPRLが理論的にレアイベント問題を緩和できることを証明する。レベル1トラウマセンターのデータを用いた実証研究により,提案手法の有効性がさらに示された。 Sepsis is a syndrome that develops in response to the presence of infection. It is characterized by severe organ dysfunction and is one of the leading causes of mortality in Intensive Care Units (ICUs) worldwide. These complications can be reduced through early application of antibiotics, hence the ability to anticipate the onset of sepsis early is crucial to the survival and well-being of patients. Current machine learning algorithms deployed inside medical infrastructures have demonstrated poor performance and are insufficient for anticipating sepsis onset early. In recent years, deep learning methodologies have been proposed to predict sepsis, but some fail to capture the time of onset (e.g., classifying patients' entire visits as developing sepsis or not) and others are unrealistic to deploy in medical facilities (e.g., creating training instances using a fixed time to onset where the time of onset needs to be known a priori). 
Therefore, in this paper, we first propose a novel but realistic prediction framework that predicts each morning whether sepsis onset will occur within the next 24 hours using data collected at night, when patient-provider ratios are higher due to cross-coverage, resulting in limited observation of each patient. However, as we increase the prediction frequency to daily, the number of negative instances increases while the number of positive ones remains the same. Consequently, we face a severe class imbalance problem, which makes it hard for a machine learning model to capture rare sepsis cases. To address this problem, we propose to do nightly profile representation learning (NPRL) for each patient. We prove that NPRL can theoretically alleviate the rare event problem. Our empirical study using data from a level-1 trauma center further demonstrates the effectiveness of our proposal.	翻訳日:2023-04-26 20:52:20 公開日:2023-04-25
# CitePrompt: 科学論文の引用意図の特定にプロンプトを使う CitePrompt: Using Prompts to Identify Citation Intent in Scientific Papers ( http://arxiv.org/abs/2304.12730v1 ) ライセンス: Link先を確認	Avishek Lahiri, Debarshi Kumar Sanyal, Imon Mukherjee	(参考訳) 科学論文の引用は、知的系統の追跡に役立つだけでなく、作品の科学的意義を示す有用な指標でもある。引用意図は、与えられた文脈における引用の役割を特定することで有益である。本稿では,引用意図分類のためのプロンプトベース学習という,これまで未開拓のアプローチを用いたフレームワークであるCitePromptを提案する。我々は、事前学習された言語モデル、プロンプトテンプレート、およびプロンプトバーバライザの適切な選択により、最先端の手法で得られたものと同等以上の結果を得られるだけでなく、科学的文書に関する外部情報をはるかに少なくしてそれを実現できると主張している。 ACL-ARCデータセットの最先端結果を報告するとともに、SciCiteデータセットは1つを除くすべてのベースラインモデルに対して大幅に改善されている。引用意図分類のための大きなラベル付きデータセットを見つけるのは非常に難しいため、まず、このタスクを少数ショットおよびゼロショット設定に変換することを提案する。 ACL-ARCデータセットでは、ゼロショット設定で53.86%のF1スコアを報告し、5ショット設定と10ショット設定でそれぞれ63.61%と66.99%に改善した。 Citations in scientific papers not only help us trace the intellectual lineage but also are a useful indicator of the scientific significance of the work. Citation intents prove beneficial as they specify the role of the citation in a given context. In this paper, we present CitePrompt, a framework which uses the hitherto unexplored approach of prompt-based learning for citation intent classification. We argue that with the proper choice of the pretrained language model, the prompt template, and the prompt verbalizer, we can not only get results that are better than or comparable to those obtained with the state-of-the-art methods but also do it with much less external information about the scientific document. We report state-of-the-art results on the ACL-ARC dataset, and also show significant improvement on the SciCite dataset over all baseline models except one. As suitably large labelled datasets for citation intent classification can be quite hard to find, we propose, for the first time, the conversion of this task to the few-shot and zero-shot settings. For the ACL-ARC dataset, we report a 53.86% F1 score for the zero-shot setting, which improves to 63.61% and 66.99% for the 5-shot and 10-shot settings, respectively.	
翻訳日:2023-04-26 20:51:53 公開日:2023-04-25
# ガスおよび超臨界キセノンの2光子励起と吸収分光 Two-photon excitation and absorption spectroscopy of gaseous and supercritical xenon ( http://arxiv.org/abs/2304.12803v1 ) ライセンス: Link先を確認	Thilo vom Hövel (1), Franz Huybrechts (1), Eric Boltersdorf (1), Christian Wahl (1), Frank Vewinger (1), Martin Weitz (1) ((1) Institut für Angewandte Physik, Universität Bonn)	(参考訳) 高圧条件下での気体の分光は、プラズマ物理学や天体物理学などの様々な分野に関心がある。近年,光子ボース・アインシュタイン凝縮体の波長範囲を真空紫外へ拡張するために,高気圧の希ガス環境を熱化媒体として利用することも提案されている。本研究では,$95 \; \text{bar}$までの圧力におけるガス状および超臨界キセノンの2光子分光の実験結果について報告し,$5p^6$電子基底状態から$5p^56p$および$5p^56p^\prime$励起状態配置への遷移を調べる。将来的な真空紫外光子凝縮のポンプ方式の探求を目指して,これらの高密度キセノン試料の縮退2光子励起スペクトルを記録した。さらに, キセノンの第2エキシマ連続体における放射の再吸収が, ストークスシフトの影響を受け, 補助光場の照射によって促進されるかどうかを検討した。この目的のために吸収測定を行い、$5p^6 \rightarrow 5p^56p$ 2光子遷移を非縮退的に駆動した。 Spectroscopy of gases under high-pressure conditions is of interest in various fields such as plasma physics or astrophysics. Recently, it has also been proposed to utilize a high-pressure noble gas environment as a thermalization medium to extend the wavelength range of photon Bose-Einstein condensates to the vacuum-ultraviolet, from the presently accessible visible and near-infrared spectral regimes. In this work, we report on experimental results of two-photon spectroscopy of gaseous and supercritical xenon for pressures as high as $95 \; \text{bar}$, probing the transitions from the $5p^6$ electronic ground state to the $5p^56p$ and $5p^56p^\prime$ excited state configurations. Aiming at the exploration of possible pumping schemes for future vacuum-ultraviolet photon condensates, we have recorded degenerate two-photon excitation spectra of such dense xenon samples. In further measurements, we have investigated whether irradiation of an auxiliary light field can enhance the reabsorption of the emission on the second excimer continuum of xenon, which is subject to a large Stokes shift. 
To this end, absorption measurements have been conducted, driving the $5p^6 \rightarrow 5p^56p$ two-photon transitions non-degenerately.	翻訳日:2023-04-26 20:45:15 公開日:2023-04-25
# 拡張クラスタ: ニューラルネットワークの正確なパラメータ回復 Expand-and-Cluster: Exact Parameter Recovery of Neural Networks ( http://arxiv.org/abs/2304.12794v1 ) ライセンス: Link先を確認	Flavio Martinelli, Berfin Simsek, Johanni Brea and Wulfram Gerstner	(参考訳) インプット・アウトプット・マッピングを用いて,ニューラルネットワーク(ANN)の隠れパラメータを復元できるか? 本稿では,全ネットワークパラメータを識別するために,隠れレイヤの数と探索されたANNのアクティベーション関数だけを必要とする,'Expand-and-Cluster'と呼ばれる方式を提案する。拡張フェーズでは,教師としてANNの探索データを用いて,学生ネットワークの規模を拡大する一連のネットワークを訓練する。拡張は、特定のサイズの学生ネットワークにおいて最小限の損失が一貫して到達した場合に停止する。クラスタリングフェーズでは、拡張した学生の重みベクトルがクラスター化され、超流動ニューロンを原理的に構造的プルーニングすることができる。因子4の過度パラメータ化は、最小数のニューロンを確実に同定し、元のネットワークパラメータを、可変困難な150の玩具問題のファミリーで80\%のタスクで検索するのに十分である。さらに、MNISTデータに基づいてトレーニングされた教師ネットワークは、ニューロン番号の5\%以下のオーバーヘッドで識別することができる。したがって、教師と同一の大きさの学生ネットワークの直接訓練は、非凸損失関数のため事実上不可能であるが、軽度のオーバーパラメータ化とクラスタリングと構造化プルーニングによるトレーニングは、ターゲットネットワークを正しく識別する。 Can we recover the hidden parameters of an Artificial Neural Network (ANN) by probing its input-output mapping? We propose a systematic method, called `Expand-and-Cluster' that needs only the number of hidden layers and the activation function of the probed ANN to identify all network parameters. In the expansion phase, we train a series of student networks of increasing size using the probed data of the ANN as a teacher. Expansion stops when a minimal loss is consistently reached in student networks of a given size. In the clustering phase, weight vectors of the expanded students are clustered, which allows structured pruning of superfluous neurons in a principled way. We find that an overparameterization of a factor four is sufficient to reliably identify the minimal number of neurons and to retrieve the original network parameters in $80\%$ of tasks across a family of 150 toy problems of variable difficulty. Furthermore, a teacher network trained on MNIST data can be identified with less than $5\%$ overhead in the neuron number. 
Thus, while direct training of a student network with a size identical to that of the teacher is practically impossible because of the non-convex loss function, training with mild overparameterization followed by clustering and structured pruning correctly identifies the target network.	翻訳日:2023-04-26 20:44:50 公開日:2023-04-25
# 超音波イメージングのためのクラッタフィルタとしての特異値分解法について On the Use of Singular Value Decomposition as a Clutter Filter for Ultrasound Flow Imaging ( http://arxiv.org/abs/2304.12783v1 ) ライセンス: Link先を確認	Kai Riemer, Marcelo Lerendegui, Matthieu Toulemonde, Jiaqi Zhu, Christopher Dunsby, Peter D. Weinberg, Meng-Xing Tang	(参考訳) Singular Value Decomposition (SVD) に基づくフィルタリングは, 高フレームレート超音波流画像におけるクラッタ, 流れ, ノイズをかなり分離する。 SVDをクラッタフィルタとして用いることで、ベクトルフローイメージング、機能超音波、超高分解能超音波ローカライゼーション顕微鏡などの技術が大幅に改善された。クラッタとノイズの除去は、組織、流れ、ノイズがそれぞれ特異値の異なる部分集合で表されるという仮定に依存し、それらの信号は非相関であり直交部分空間に置かれる。この仮定は、組織の動きがある場合や、壁近傍あるいは微小血管の流れに対しては成立せず、特異値閾値の誤った選択の影響も受けうる。したがって、フロー、クラッタ、ノイズの分離は不完全であり、元のデータに存在しない画像アーティファクトにつながる可能性がある。強度の時間的および空間的変動は最も一般的なアーティファクトであり、外観や強度によって異なる。フロー信号がばらばらに分布する微小血管では、ゴーストとスプリットアーティファクトが観察される。特異値閾値選択, 組織運動, フレーム速度, 流れ信号振幅, 取得長は, これらのアーティファクトの有病率に影響を及ぼす。 SVDクラッタやノイズ除去によるアーティファクトの原因を理解することは,その解釈に必要である。 Filtering based on Singular Value Decomposition (SVD) provides substantial separation of clutter, flow and noise in high frame rate ultrasound flow imaging. The use of SVD as a clutter filter has greatly improved techniques such as vector flow imaging, functional ultrasound and super-resolution ultrasound localization microscopy. The removal of clutter and noise relies on the assumption that tissue, flow and noise are each represented by different subsets of singular values, so that their signals are uncorrelated and lie in orthogonal subspaces. This assumption fails in the presence of tissue motion, for near-wall or microvascular flow, and can be influenced by an incorrect choice of singular value thresholds. Consequently, separation of flow, clutter and noise is imperfect, which can lead to image artefacts not present in the original data. Temporal and spatial fluctuations in intensity are the most common artefacts, and they vary in appearance and strength. Ghosting and splitting artefacts are observed in the microvasculature where the flow signal is sparsely distributed. 
Singular value threshold selection, tissue motion, frame rate, flow signal amplitude and acquisition length affect the prevalence of these artefacts. Understanding what causes artefacts due to SVD clutter and noise removal is necessary for their interpretation.	翻訳日:2023-04-26 20:44:28 公開日:2023-04-25
# 分散強化学習における学習力向上のための損失と報酬の重み付け Loss and Reward Weighing for increased learning in Distributed Reinforcement Learning ( http://arxiv.org/abs/2304.12778v1 ) ライセンス: Link先を確認	Martin Holen, Per-Arne Andersen, Kristian Muri Knausgård, Morten Goodwin	(参考訳) 本稿では,Reinforcement Learning (RL)環境における分散エージェントの学習手法として,Reward-Weighted (R-Weighted) とLoss-Weighted (L-Weighted) の2つを紹介する。 R/L重み付け法は、勾配の和や平均化など、複数のエージェントを訓練するための標準的な慣行を置き換える。我々の手法のコアは、報酬(R-Weighted)や損失(L-Weighted)が他のアクターと比較してどれだけ高いかに基づいて、各アクターの勾配をスケールすることである。トレーニング中、各エージェントは同じ環境の異なる初期化バージョンで動作し、異なるアクターとは異なる勾配を与える。本質的に、各エージェントのR重みとL重みは、他のエージェントにその可能性を知らせ、学習のためにどの環境を優先すべきかを示す。分散学習のこのアプローチは、報酬が高い、あるいは損失が低い環境の方が、報酬が低い、あるいは損失が高い環境よりも重要な情報を持っているため可能である。 R-Weighted法は複数のRL環境において最先端の手法よりも優れていることを実証的に実証した。 This paper introduces two learning schemes for distributed agents in Reinforcement Learning (RL) environments, namely Reward-Weighted (R-Weighted) and Loss-Weighted (L-Weighted) gradient merger. The R/L weighted methods replace standard practices for training multiple agents, such as summing or averaging the gradients. The core of our methods is to scale the gradient of each actor based on how high the reward (for R-Weighted) or the loss (for L-Weighted) is compared to the other actors. During training, each agent operates in differently initialized versions of the same environment, which gives different gradients from different actors. In essence, the R-Weights and L-Weights of each agent inform the other agents of its potential, which in turn indicates which environments should be prioritized for learning. This approach of distributed learning is possible because environments that yield higher rewards, or lower losses, have more critical information than environments that yield lower rewards or higher losses. We empirically demonstrate that the R-Weighted methods outperform the state of the art in multiple RL environments.	翻訳日:2023-04-26 20:44:08 公開日:2023-04-25
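The gradient merger described in this abstract can be sketched as a weighted sum in which each actor's gradient is scaled by its share of a per-actor score. This is a hedged illustration only: the abstract does not give the exact weighting formula, so the normalization used here (score divided by the sum of all scores, assuming positive scores) is an assumption, and `merge_weighted` is a hypothetical name. For the R-Weighted variant the scores would be the actors' rewards.

```python
def merge_weighted(gradients: list[list[float]],
                   scores: list[float]) -> list[float]:
    """Combine per-actor gradients, weighting each by its share of the total score.

    Assumption (not from the paper): weight_i = score_i / sum(scores),
    which requires a positive total.
    """
    if len(gradients) != len(scores):
        raise ValueError("need exactly one score per gradient")
    total = sum(scores)
    if total <= 0:
        raise ValueError("scores must sum to a positive value")
    weights = [s / total for s in scores]
    dim = len(gradients[0])
    return [sum(w * g[j] for w, g in zip(weights, gradients))
            for j in range(dim)]


# Two actors with toy gradients; the first earned three times the reward.
grads = [[1.0, 0.0], [0.0, 1.0]]
rewards = [3.0, 1.0]
print(merge_weighted(grads, rewards))     # [0.75, 0.25]
print(merge_weighted(grads, [1.0, 1.0]))  # [0.5, 0.5], i.e. plain averaging
```

With equal scores the merge reduces to ordinary gradient averaging, which is the standard practice the abstract says these schemes replace.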
# クラス注意伝達に基づく知識蒸留 Class Attention Transfer Based Knowledge Distillation ( http://arxiv.org/abs/2304.12777v1 ) ライセンス: Link先を確認	Ziyao Guo, Haonan Yan, Hui Li, Xiaodong Lin	(参考訳) 従来の知識蒸留法は, モデル圧縮作業において, 優れた性能を示してきたが, 学生ネットワークの性能向上にどのように役立つかを説明することは困難である。本研究では,高い解釈性と競争性能を有する知識蒸留法を提案する。まず、主流CNNモデルの構造を再検討し、クラス識別領域を識別する能力を持つことがCNNにとって重要であることを明らかにした。さらに,クラスアクティベーションマップの転送により,この能力の獲得と向上が可能であることを示す。そこで本研究では,cat-kd(class attention transfer based knowledge distillation)を提案する。従来のKD法とは違って,CAT-KDの解釈性の向上だけでなく,CNNの理解の向上にも寄与する知識のいくつかの特性を探索し,提示する。高い解釈性を持つ一方で、CAT-KDは複数のベンチマークで最先端のパフォーマンスを達成する。コードはhttps://github.com/gzyaftermath/cat-kd。 Previous knowledge distillation methods have shown their impressive performance on model compression tasks, however, it is hard to explain how the knowledge they transferred helps to improve the performance of the student network. In this work, we focus on proposing a knowledge distillation method that has both high interpretability and competitive performance. We first revisit the structure of mainstream CNN models and reveal that possessing the capacity of identifying class discriminative regions of input is critical for CNN to perform classification. Furthermore, we demonstrate that this capacity can be obtained and enhanced by transferring class activation maps. Based on our findings, we propose class attention transfer based knowledge distillation (CAT-KD). Different from previous KD methods, we explore and present several properties of the knowledge transferred by our method, which not only improve the interpretability of CAT-KD but also contribute to a better understanding of CNN. While having high interpretability, CAT-KD achieves state-of-the-art performance on multiple benchmarks. Code is available at: https://github.com/GzyAftermath/CAT-KD.	翻訳日:2023-04-26 20:43:46 公開日:2023-04-25
# 状態空間だけでは不十分: 機械翻訳にはアテンションが必要 State Spaces Aren't Enough: Machine Translation Needs Attention ( http://arxiv.org/abs/2304.12776v1 ) ライセンス: Link先を確認	Ali Vardasbi, Telmo Pessoa Pires, Robin M. Schmidt, Stephan Peitz	(参考訳) 構造化状態空間 (Structured State Spaces for Sequences, S4) は、視覚、言語モデリング、オーディオなどの様々なタスクで成功した、最近提案されたシーケンスモデルである。数学的定式化のおかげで、入力を1つの隠れ状態に圧縮し、アテンション機構を必要とせずに、長距離の依存関係をキャプチャできる。本研究では,S4を機械翻訳(MT)に適用し,WMT'14とWMT'16でエンコーダ・デコーダの複数の変種を評価する。言語モデリングの成功とは対照的に、S4 は Transformer に約4 BLEU ポイント遅れており、直感に反して長文に苦労している。最後に、このギャップは、S4が完全なソース文を単一の隠れ状態に要約できないことによるものであり、アテンション機構を導入することでギャップを閉じることができることを示す。 Structured State Spaces for Sequences (S4) is a recently proposed sequence model with successful applications in various tasks, e.g. vision, language modeling, and audio. Thanks to its mathematical formulation, it compresses its input to a single hidden state, and is able to capture long range dependencies while avoiding the need for an attention mechanism. In this work, we apply S4 to Machine Translation (MT), and evaluate several encoder-decoder variants on WMT'14 and WMT'16. In contrast with the success in language modeling, we find that S4 lags behind the Transformer by approximately 4 BLEU points, and that it counter-intuitively struggles with long sentences. Finally, we show that this gap is caused by S4's inability to summarize the full source sentence in a single hidden state, and show that we can close the gap by introducing an attention mechanism.	翻訳日:2023-04-26 20:43:29 公開日:2023-04-25
# デコーダネットワーク上の逆リプシッツ制約による後部崩壊の制御 Controlling Posterior Collapse by an Inverse Lipschitz Constraint on the Decoder Network ( http://arxiv.org/abs/2304.12770v1 ) ライセンス: Link先を確認	Yuri Kinoshita, Kenta Oono, Kenji Fukumizu, Yuichi Yoshida, Shin-ichi Maeda	(参考訳) 変分オートエンコーダ(VAE)は、過去数十年で大きな成功を収めてきた深層生成モデルの1つである。しかし、実際には、エンコーダが一致したり、あるいは崩壊した場合に発生する後方崩壊と呼ばれる問題に苦しんでおり、前者は入力データの潜在構造からの情報を取得していない。本研究では,デコーダに逆リプシッツニューラルネットワークを導入し,このアーキテクチャに基づいて,具体的な理論的保証を備えた多種多様なVAEモデルに対する後方崩壊の度合いを,単純かつ明確な方法で制御できる新しい手法を提案する。また,いくつかの数値実験により,本手法の有効性を示す。 Variational autoencoders (VAEs) are one of the deep generative models that have experienced enormous success over the past decades. However, in practice, they suffer from a problem called posterior collapse, which occurs when the encoder coincides, or collapses, with the prior taking no information from the latent structure of the input data into consideration. In this work, we introduce an inverse Lipschitz neural network into the decoder and, based on this architecture, provide a new method that can control in a simple and clear manner the degree of posterior collapse for a wide range of VAE models equipped with a concrete theoretical guarantee. We also illustrate the effectiveness of our method through several numerical experiments.	翻訳日:2023-04-26 20:43:12 公開日:2023-04-25
# ゼロサム行列ゲームにおける(近似)ナッシュ均衡学習の1次クエリ複雑度の特徴付けに向けて Towards Characterizing the First-order Query Complexity of Learning (Approximate) Nash Equilibria in Zero-sum Matrix Games ( http://arxiv.org/abs/2304.12768v1 ) ライセンス: Link先を確認	Hédi Hadiji, Sarah Sachs (UvA), Tim van Erven (UvA), Wouter M. Koolen (CWI)	(参考訳) ゼロサム $K\times K$ 行列ゲームに対する1次クエリモデルでは、プレイヤーは、相手がプレイするランダム化アクションの下で、自分の可能なすべてのアクションに対する期待利得を観測する。これは古典的なモデルであり、Rakhlin と Sridharan によって、$\epsilon$-近似ナッシュ均衡が $O(\ln K / \epsilon^2)$ ではなく $O(\ln K / \epsilon)$ 回のクエリから効率的に計算できることが発見された後、新たな関心を集めている。驚くべきことに、$\epsilon$ と $K$ の両方の関数としての最適なクエリ数は知られていない。我々はこの問題について2つの面で前進する。第一に、厳密な均衡 ($\epsilon=0$) を学習するクエリ複雑度を、$K$ に線形な数のクエリを必要とすることを示すことによって、完全に特徴付ける。これは、同じく $K$ 回のクエリで可能な行列全体の問い合わせと本質的に同程度に難しいことを意味する。第二に、$\epsilon > 0$ の場合、現在のクエリ複雑度の上界は $O(\min(\ln(K) / \epsilon, K))$ である。残念ながら、既存の手法ではこれに一致する下界を得ることはできないと我々は主張する。すなわち、成分が既知の可算集合に値をとる困難な行列を構成しても下界は導出できないことを証明する。そのような行列は単一のクエリで完全に同定できるからである。これにより、例えば、ハイパーキューブ上の劣モジュラ最適化問題をバイナリ行列として符号化して帰着させる方法は排除される。次に、下界に対する新しい手法を導入し、任意の $\epsilon \leq 1 / cK^4$ に対して $\tilde\Omega(\log(1 / (K\epsilon)))$ のオーダーの下界を得る。ここで $c$ は $K$ に依存しない定数である。さらに、上界とのギャップを埋めるために我々の手法を改善する今後の方向性について議論する。 In the first-order query model for zero-sum $K\times K$ matrix games, players observe the expected pay-offs for all their possible actions under the randomized action played by their opponent. This is a classical model, which has received renewed interest after the discovery by Rakhlin and Sridharan that $\epsilon$-approximate Nash equilibria can be computed efficiently from $O(\ln K / \epsilon)$ instead of $O(\ln K / \epsilon^2)$ queries. Surprisingly, the optimal number of such queries, as a function of both $\epsilon$ and $K$, is not known. We make progress on this question on two fronts. First, we fully characterise the query complexity of learning exact equilibria ($\epsilon=0$), by showing that they require a number of queries that is linear in $K$, which means that it is essentially as hard as querying the whole matrix, which can also be done with $K$ queries. 
Second, for $\epsilon > 0$, the current query complexity upper bound stands at $O(\min(\ln(K) / \epsilon, K))$. We argue that, unfortunately, obtaining a matching lower bound is not possible with existing techniques: we prove that no lower bound can be derived by constructing hard matrices whose entries take values in a known countable set, because such matrices can be fully identified by a single query. This rules out, for instance, reducing to a submodular optimization problem over the hypercube by encoding it as a binary matrix. We then introduce a new technique for lower bounds, which allows us to obtain lower bounds of order $\tilde\Omega(\log(1 / (K\epsilon)))$ for any $\epsilon \leq 1 / cK^4$, where $c$ is a constant independent of $K$. We further discuss possible future directions to improve on our techniques in order to close the gap with the upper bounds.	翻訳日:2023-04-26 20:43:00 公開日:2023-04-25
# 損失関数からの量子表現の分離 Decoupling Quantile Representations from Loss Functions ( http://arxiv.org/abs/2304.12766v1 ) ライセンス: Link先を確認	Aditya Challa, Snehanshu Saha, Soma Dhavala	(参考訳) 同時量子化回帰(sqr)手法はディープラーニングモデルの不確かさを推定するために用いられてきたが、その応用は中央量子化の解 ({\tau} = 0.5) が平均絶対誤差 (mae) を最小化しなければならないという要件によって制限されている。本稿では、この制限を、同時二項量子化回帰(SBQR)の場合に、量子化と推定確率の双対性を示すことによって解決する。これにより、損失関数から量子化表現の構成を分離し、中央の量子化関数に任意の分類器 f(x) を割り当て、異なる {\tau} の値でSBQR量子化表現の全スペクトルを生成することができる。アプローチを2つのアプリケーションで検証します。 (i)分布外サンプルを検出し、量子表現が標準確率出力を上回ることを示す。 (II) 歪みに対する量子表現のロバスト性を示すモデルを校正する。結論として,これらの結果から生じるいくつかの仮説を考察した。 The simultaneous quantile regression (SQR) technique has been used to estimate uncertainties for deep learning models, but its application is limited by the requirement that the solution at the median quantile ({\tau} = 0.5) must minimize the mean absolute error (MAE). In this article, we address this limitation by demonstrating a duality between quantiles and estimated probabilities in the case of simultaneous binary quantile regression (SBQR). This allows us to decouple the construction of quantile representations from the loss function, enabling us to assign an arbitrary classifier f(x) at the median quantile and generate the full spectrum of SBQR quantile representations at different {\tau} values. We validate our approach through two applications: (i) detecting out-of-distribution samples, where we show that quantile representations outperform standard probability outputs, and (ii) calibrating models, where we demonstrate the robustness of quantile representations to distortions. We conclude with a discussion of several hypotheses arising from these findings.	翻訳日:2023-04-26 20:42:17 公開日:2023-04-25
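The constraint mentioned in the abstract above, that the median quantile ($\tau = 0.5$) is tied to the mean absolute error, follows from the standard pinball (quantile) loss. The sketch below shows only that textbook loss, not the paper's SBQR construction; the function names are chosen here for illustration.

```python
def pinball_loss(y_true, y_pred, tau):
    """Mean pinball (quantile) loss at quantile level tau in (0, 1)."""
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must lie strictly between 0 and 1")
    total = 0.0
    for y, q in zip(y_true, y_pred):
        diff = y - q
        # Under-prediction (diff > 0) costs tau * diff,
        # over-prediction (diff < 0) costs (1 - tau) * |diff|.
        total += max(tau * diff, (tau - 1.0) * diff)
    return total / len(y_true)


def mean_absolute_error(y_true, y_pred):
    """Plain MAE, used to check the tau = 0.5 special case."""
    return sum(abs(y - q) for y, q in zip(y_true, y_pred)) / len(y_true)


y_true = [1.0, 2.0, 3.0]
y_pred = [2.0, 2.0, 2.0]
median_loss = pinball_loss(y_true, y_pred, 0.5)
half_mae = 0.5 * mean_absolute_error(y_true, y_pred)
```

At $\tau = 0.5$ both branches of the `max` carry the same weight, so the loss is exactly half the MAE and has the same minimizer. That is the coupling between the median quantile and the loss function that the abstract describes as a limitation.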
# 摂動一貫性学習によるテスト時間適応 Test-Time Adaptation with Perturbation Consistency Learning ( http://arxiv.org/abs/2304.12764v1 ) ライセンス: Link先を確認	Yi Su, Yixin Ji, Juntao Li, Hai Ye, Min Zhang	(参考訳) 現在、事前学習された言語モデル(plm)は、分散シフト問題にうまく対応できず、実際のテストシナリオで失敗するトレーニングセットでトレーニングされたモデルとなる。この問題に対処するため、テスト時間適応(TTA)は、テスト時にテストデータに適合するようにモデルパラメータを更新する大きな可能性を示す。既存のTTA手法は、よく設計された補助的タスクや擬似ラベルに基づく自己学習戦略に依存している。しかし,これらの手法は性能向上と計算コストに関して良好なトレードオフを達成できない。このようなジレンマに関するいくつかの知見を得るために、我々は2つの代表的TTA手法、すなわち、テントとオイルを探索し、安定した予測が良いバランスを達成するための鍵であることを確かめる。そこで本研究では, 分散シフトを伴うサンプルに対して安定な予測を行うために, 簡易なテスト時間適応法である摂動整合学習(PCL)を提案する。逆方向の強靭性および言語間移動に関する広範囲な実験により,本手法は強いPLMバックボーンと従来の最先端TTA法よりも推論時間が少なく,高い,あるいは同等の性能を達成できることが証明された。 Currently, pre-trained language models (PLMs) do not cope well with the distribution shift problem, resulting in models trained on the training set failing in real test scenarios. To address this problem, the test-time adaptation (TTA) shows great potential, which updates model parameters to suit the test data at the testing time. Existing TTA methods rely on well-designed auxiliary tasks or self-training strategies based on pseudo-label. However, these methods do not achieve good trade-offs regarding performance gains and computational costs. To obtain some insights into such a dilemma, we take two representative TTA methods, i.e., Tent and OIL, for exploration and find that stable prediction is the key to achieving a good balance. Accordingly, in this paper, we propose perturbation consistency learning (PCL), a simple test-time adaptation method to promote the model to make stable predictions for samples with distribution shifts. Extensive experiments on adversarial robustness and cross-lingual transferring demonstrate that our method can achieve higher or comparable performance with less inference time over strong PLM backbones and previous state-of-the-art TTA methods.	翻訳日:2023-04-26 20:41:57 公開日:2023-04-25
# 自然言語処理のための市民科学プロジェクトから学んだこと Lessons Learned from a Citizen Science Project for Natural Language Processing ( http://arxiv.org/abs/2304.12836v1 ) ライセンス: Link先を確認	Jan-Christoph Klie, Ji-Ung Lee, Kevin Stowe, Gözde Gül Şahin, Nafise Sadat Moosavi, Luke Bates, Dominic Petrak, Richard Eckart de Castilho, Iryna Gurevych	(参考訳) 多くの自然言語処理(NLP)システムは、訓練と評価に注釈付きコーパスを使用する。しかし、ラベル付きデータはしばしば入手するのにコストがかかり、アノテーションプロジェクトのスケーリングは難しいため、アノテーションタスクは有料のクラウドワーカーにアウトソースされることが多い。市民科学はクラウドソーシングの代替であり、NLPの文脈では比較的研究されていない。この環境で市民科学がどの程度有効かを調べるため、既存のクラウドソースデータセットの一部を注釈付けすることで、NLPの市民科学における様々なボランティアグループへの参加を探索研究する。この結果から,高品質なアノテーションが得られ,モチベーションの高いボランティアを惹きつけるだけでなく,スケーラビリティや時間的関与,法的・倫理的問題といった要因も考慮する必要があることがわかった。ガイドラインの形で学んだ教訓を要約し、市民科学の今後の取り組みを支援するコードとデータを提供します。 Many Natural Language Processing (NLP) systems use annotated corpora for training and evaluation. However, labeled data is often costly to obtain and scaling annotation projects is difficult, which is why annotation tasks are often outsourced to paid crowdworkers. Citizen Science is an alternative to crowdsourcing that is relatively unexplored in the context of NLP. To investigate whether and how well Citizen Science can be applied in this setting, we conduct an exploratory study into engaging different groups of volunteers in Citizen Science for NLP by re-annotating parts of a pre-existing crowdsourced dataset. Our results show that this can yield high-quality annotations and attract motivated volunteers, but also requires considering factors such as scalability, participation over time, and legal and ethical issues. We summarize lessons learned in the form of guidelines and provide our code and data to aid future work on Citizen Science.	翻訳日:2023-04-26 20:35:46 公開日:2023-04-25
# 機械学習のための確実性の新しい情報理論 A New Information Theory of Certainty for Machine Learning ( http://arxiv.org/abs/2304.12833v1 ) ライセンス: Link先を確認	Arthur Jun Zhang	(参考訳) クロード・シャノンは、通信符号化理論におけるランダム分布の不確かさを定量化するためにエントロピーを考案した。エントロピーの不確実性という性質は、数学的モデリングにおける直接的使用を制限する。そこで我々は,エントロピーの正準双対として,基礎となる分布の確実性を定量化する新しい概念トロエンピー(troenpy)を提案する。機械学習の応用例を2つ紹介する。まず, 古典的文書分類について, 文書クラスラベルを活用すべく, トロエンピーに基づく重み付けスキームを開発した。 2つ目は、シーケンシャルデータに対する自己トロエンピー重み付け方式で、ニューラルネットワークベースの言語モデルに簡単に組み込めることを示し、劇的なパープレキシティ低減を実現する。また、量子トロエンピーをフォン・ノイマンエントロピーの双対として定義し、量子系の確実性を定量化する。 Claude Shannon coined entropy to quantify the uncertainty of a random distribution for communication coding theory. We observe that the uncertainty nature of entropy also limits its direct usage in mathematical modeling. Therefore we propose a new concept troenpy, as the canonical dual of entropy, to quantify the certainty of the underlying distribution. We demonstrate two applications in machine learning. The first is for the classical document classification, we develop a troenpy based weighting scheme to leverage the document class label. The second is a self-troenpy weighting scheme for sequential data and show that it can be easily included in neural network based language models and achieve dramatic perplexity reduction. We also define quantum troenpy as the dual of the Von Neumann entropy to quantify the certainty of quantum systems.	翻訳日:2023-04-26 20:35:32 公開日:2023-04-25
# 深部量子化ニューラルネットワークによる敵攻撃に対するロバスト性の改善 Improving Robustness Against Adversarial Attacks with Deeply Quantized Neural Networks ( http://arxiv.org/abs/2304.12829v1 ) ライセンス: Link先を確認	Ferheen Ayaz, Idris Zakariyya, José Cano, Sye Loong Keoh, Jeremy Singer, Danilo Pau, Mounia Kharbouche-Harrari	(参考訳) 機械学習(ML)モデル、特にディープニューラルネットワーク(DNN)のメモリフットプリントを削減することは、リソースに制約のある小さなデバイスへのデプロイメントを可能にする上で不可欠である。しかし、DNNモデルの欠点は、敵対的攻撃に対する脆弱性であり、入力にわずかな摂動を加えることで騙される可能性がある。そのため、リソース制約のある組み込みデバイスにデプロイ可能な正確で堅牢で小さなDNNモデルをどうやって作るかが課題である。本稿では,学習ループに深い量子化損失を考慮に入れたQKerasという自動量子化対応トレーニングフレームワークでトレーニングした,ブラックボックスおよびホワイトボックスの敵対的攻撃に対して堅牢な,小さなDNNモデルの開発結果について報告する。そこで我々は,QKeras と敵対的ロバスト性手法であるヤコビアン正則化 (JR) が,DNN トポロジと層ごとのJRアプローチを利用して,頑健でありながら小さく深く量子化された DNN モデルを生成することで,共最適化戦略を実現する方法について検討した。その結果、この共最適化戦略を実装した新しいDNNモデルが、画像とオーディオの入力の両方を含む3つのデータセット上で考案、開発、テストされ、そのパフォーマンスは、ホワイトボックスとブラックボックスのさまざまな攻撃に対する既存のベンチマークと比較された。実験結果から,提案したDNNモデルの平均精度は,CIFAR-10画像データセットとGoogle Speech Commands音声データセットのサブセットにおいて,ホワイトボックス攻撃およびブラックボックス攻撃の下で,MLCommons/Tinyベンチマークよりそれぞれ8.3%、79.5%高かった。 SVHNイメージデータセットのブラックボックス攻撃でも6.5%精度が向上した。 Reducing the memory footprint of Machine Learning (ML) models, particularly Deep Neural Networks (DNNs), is essential to enable their deployment into resource-constrained tiny devices. However, a disadvantage of DNN models is their vulnerability to adversarial attacks, as they can be fooled by adding slight perturbations to the inputs. Therefore, the challenge is how to create accurate, robust, and tiny DNN models deployable on resource-constrained embedded devices. This paper reports the results of devising a tiny DNN model, robust to adversarial black and white box attacks, trained with an automatic quantization-aware training framework, i.e. QKeras, with deep quantization loss accounted in the learning loop, thereby making the designed DNNs more accurate for deployment on tiny devices. 
We investigated how QKeras and an adversarial robustness technique, Jacobian Regularization (JR), can provide a co-optimization strategy by exploiting the DNN topology and the per layer JR approach to produce robust yet tiny deeply quantized DNN models. As a result, a new DNN model implementing this co-optimization strategy was conceived, developed and tested on three datasets containing both images and audio inputs, and its performance was compared with existing benchmarks against various white-box and black-box attacks. Experimental results demonstrated that on average our proposed DNN model resulted in 8.3% and 79.5% higher accuracy than MLCommons/Tiny benchmarks in the presence of white-box and black-box attacks on the CIFAR-10 image dataset and a subset of the Google Speech Commands audio dataset respectively. It was also 6.5% more accurate for black-box attacks on the SVHN image dataset.	翻訳日:2023-04-26 20:35:16 公開日:2023-04-25
# 深層強化学習に基づく漢方処方計画のための最適化フレームワーク A optimization framework for herbal prescription planning based on deep reinforcement learning ( http://arxiv.org/abs/2304.12828v1 ) ライセンス: Link先を確認	Kuo Yang, Zecong Yu, Xin Su, Xiong He, Ning Wang, Qiguang Zheng, Feidie Yu, Zhuang Liu, Tiancai Wen and Xuezhong Zhou	(参考訳) 慢性疾患の治療計画は、医学的人工知能、特に伝統中国医学(tcm)において重要な課題である。しかし, 臨床経験の異なる慢性疾患患者に対して, 最適な逐次治療戦略を作成することは, さらなる探索を必要とする課題である。本研究では,慢性疾患治療のための深層強化学習(PrescDRL)に基づくTCMハーブ処方計画フレームワークを提案する。 PrescDRLは、すべてのステップで最大報酬を得るのではなく、長期的な効果に焦点を当てたシーケンシャルなハーバル処方の最適化モデルである。糖尿病の経時的診断と治療のための高品質ベンチマークデータセットを構築し,本ベンチマークに対するprescdrlの評価を行った。以上の結果から,PrescDRLは,医師と比較して1段階の報酬が117%,153%改善した。さらにprescdrlは処方薬の予測においてベンチマークを上回り、精度は40.5%向上し、リコールは63%向上した。本研究は,TCMにおける臨床知能診断と治療の改善に人工知能を用いる可能性を示すものである。 Treatment planning for chronic diseases is a critical task in medical artificial intelligence, particularly in traditional Chinese medicine (TCM). However, generating optimized sequential treatment strategies for patients with chronic diseases in different clinical encounters remains a challenging issue that requires further exploration. In this study, we proposed a TCM herbal prescription planning framework based on deep reinforcement learning for chronic disease treatment (PrescDRL). PrescDRL is a sequential herbal prescription optimization model that focuses on long-term effectiveness rather than achieving maximum reward at every step, thereby ensuring better patient outcomes. We constructed a high-quality benchmark dataset for sequential diagnosis and treatment of diabetes and evaluated PrescDRL against this benchmark. Our results showed that PrescDRL achieved a higher curative effect, with the single-step reward improving by 117% and 153% compared to doctors. Furthermore, PrescDRL outperformed the benchmark in prescription prediction, with precision improving by 40.5% and recall improving by 63%. 
Overall, our study demonstrates the potential of using artificial intelligence to improve clinical intelligent diagnosis and treatment in TCM.	翻訳日:2023-04-26 20:34:42 公開日:2023-04-25
# オフライン強化学習におけるExact Energy-Guided Diffusion Smplingのコントラストエネルギー予測 Contrastive Energy Prediction for Exact Energy-Guided Diffusion Sampling in Offline Reinforcement Learning ( http://arxiv.org/abs/2304.12824v1 ) ライセンス: Link先を確認	Cheng Lu, Huayu Chen, Jianfei Chen, Hang Su, Chongxuan Li, Jun Zhu	(参考訳) ガイドサンプリングは実世界のタスクに拡散モデルを適用するための重要なアプローチであり、サンプリング手順中に人間の定義したガイダンスを埋め込む。本稿では、誘導が(正規化されていない)エネルギー関数によって定義される一般的な設定を考える。この設定の主な課題は、サンプリング分布とエネルギー関数によって共同で定義される拡散サンプリング手順の中間ガイダンスが未知であり、推定が難しいことである。この課題に対処するために,中間ガイダンスの正確な定式化と,コントラストエネルギー予測(CEP)と呼ばれる新たなトレーニング目標を提案する。提案手法は,モデル容量とデータサンプルの無制限で正確なガイダンスに収束することが保証されている。オフライン強化学習(RL)に適用することで,本手法の有効性を示す。 D4RLベンチマークの大規模な実験により、我々の手法は既存の最先端アルゴリズムよりも優れていることが示された。また,高次元データにおけるCEPのスケーラビリティを示すために,画像合成にCEPを適用する例を示す。 Guided sampling is a vital approach for applying diffusion models in real-world tasks that embeds human-defined guidance during the sampling procedure. This paper considers a general setting where the guidance is defined by an (unnormalized) energy function. The main challenge for this setting is that the intermediate guidance during the diffusion sampling procedure, which is jointly defined by the sampling distribution and the energy function, is unknown and is hard to estimate. To address this challenge, we propose an exact formulation of the intermediate guidance as well as a novel training objective named contrastive energy prediction (CEP) to learn the exact guidance. Our method is guaranteed to converge to the exact guidance under unlimited model capacity and data samples, while previous methods can not. We demonstrate the effectiveness of our method by applying it to offline reinforcement learning (RL). Extensive experiments on D4RL benchmarks demonstrate that our method outperforms existing state-of-the-art algorithms. We also provide some examples of applying CEP for image synthesis to demonstrate the scalability of CEP on high-dimensional data.	翻訳日:2023-04-26 20:34:22 公開日:2023-04-25
# シャノン情報と重み付けスキームの新規双対化 A Novel Dual of Shannon Information and Weighting Scheme ( http://arxiv.org/abs/2304.12814v1 ) ライセンス: Link先を確認	Arthur Jun Zhang	(参考訳) シャノン情報理論は、もともと開発された通信技術だけでなく、機械学習や人工知能といった多くの科学や工学分野でも大きな成功を収めている。有名な重み付けスキームtf-idfに触発されて,情報エントロピーが自然双対であることを発見した。古典的シャノン情報理論を補うために、新しい量、すなわちトレンピーを提案する。トレンピーは、基盤となる分布の確実性、共通性、および類似性を測定する。そこで本研究では,クラスラベル付き文書に対するトレンピーに基づく重み付け方式,すなわち正のクラス周波数(pcf)を提案する。公開データセットの集合では、PCFに基づく重み付け方式が古典的なTF-IDFと、kNN設定で一般的な最適輸送に基づく単語移動距離アルゴリズムより優れていることを示す。我々はさらに,情報量エントロピーとトロエンピーの期待オッズ比とみなすことができる,新たなオッズ比型機能である期待クラス情報バイアス(ECIB)を開発した。実験では、単純なロジスティック回帰モデルにおいて、新しいECIB機能と単純なバイナリ項機能を含めることで、さらなる性能向上が期待できる。単純な新しい重み付けスキームとECIB機能は、非常に効果的であり、線形順序複雑性で計算できる。 Shannon Information theory has achieved great success in not only communication technology where it was originally developed for but also many other science and engineering fields such as machine learning and artificial intelligence. Inspired by the famous weighting scheme TF-IDF, we discovered that information entropy has a natural dual. We complement the classical Shannon information theory by proposing a novel quantity, namely troenpy. Troenpy measures the certainty, commonness and similarity of the underlying distribution. To demonstrate its usefulness, we propose a troenpy based weighting scheme for document with class labels, namely positive class frequency (PCF). On a collection of public datasets we show the PCF based weighting scheme outperforms the classical TF-IDF and a popular Optimal Transportation based word moving distance algorithm in a kNN setting. We further developed a new odds-ratio type feature, namely Expected Class Information Bias(ECIB), which can be regarded as the expected odds ratio of the information quantity entropy and troenpy. In the experiments we observe that including the new ECIB features and simple binary term features in a simple logistic regression model can further significantly improve the performance. 
The simple new weighting scheme and ECIB features are very effective and can be computed with linear order complexity.	翻訳日:2023-04-26 20:34:03 公開日:2023-04-25
# 多光子高次元GHZ状態の合成 Preparation of multiphoton high-dimensional GHZ state ( http://arxiv.org/abs/2304.12813v1 ) ライセンス: Link先を確認	Wen-Bo Xing, Xiao-Min Hu, Yu Guo, Bi-Heng Liu, Chuan-Feng Li and Guang-Can Guo	(参考訳) 多部類高次元絡み合わせは多部類2次元絡み合わせとは異なる物理を呈する。しかし、多次元高次元絡み合わせの作り方はまだ線形光学の課題である。本稿では,光学系において任意の次元の準備プロトコルを持つ多光子GHZ状態を提案する。本プロトコルでは,高次元エンタングルメントゲートを実現するために補助エンタングルメントを用い,高次元エンタングルペアを多成分の高次元ghz状態に接続する。具体的には、光子の経路自由度を用いて4粒子の3次元ghz状態を作成する例を示す。本手法は他の自由度まで拡張でき、任意の次元で任意のghz絡み合いを生成することができる。 Multipartite high-dimensional entanglement presents different physics from multipartite two-dimensional entanglement. However, how to prepare multipartite high-dimensional entanglement is still a challenge with linear optics. In this paper, a multiphoton GHZ state with arbitrary dimensions preparation protocol is proposed in optical systems. In this protocol, we use auxiliary entanglements to realize a high-dimensional entanglement gate, so that high-dimensional entangled pairs can be connected into a multipartite high-dimensional GHZ state. Specifically, we give an example of using photons' path degree of freedom to prepare a 4-particle 3-dimensional GHZ state. Our method can be extended to other degrees of freedom and can generate arbitrary GHZ entanglement in any dimension.	翻訳日:2023-04-26 20:33:43 公開日:2023-04-25
# 科学のための教師なしドメイン転送:ディファリング応答モデルを用いたLArTPC検出器シミュレーション間の翻訳のための深層学習手法の探索 Unsupervised Domain Transfer for Science: Exploring Deep Learning Methods for Translation between LArTPC Detector Simulations with Differing Response Models ( http://arxiv.org/abs/2304.12858v1 ) ライセンス: Link先を確認	Yi Huang, Dmitrii Torbunov, Brett Viren, Haiwang Yu, Jin Huang, Meifeng Lin, Yihui Ren	(参考訳) 深層学習(DL)技術は科学、特に潜在的な解法や発見への道筋の合理化に広く応用されている。しかし、DLモデルは実際の実験データに適用されていないシミュレーションの結果に基づいて訓練されることが多い。このように、シミュレーションされたデータと実際のデータの系統的な違いは、モデルのパフォーマンスを低下させる可能性がある。本研究は,シミュレーションデータと実データとの系統的差異の玩具モデルに関する研究である。完全に教師なしでタスクに依存しない方法で、体系的に異なる2つのサンプルの違いを減らす。本手法は, 画像対画像変換技術の最近の進歩に基づき, 模擬液体アルゴン時間投影室 (lartpc) 検出器の2組の試料について検証を行い, シミュレーションデータと実データとの共通系統的差異を制御的に示す。 LArTPCベースの検出器は次世代粒子検出器を表現し、独自の高分解能粒子トラックデータを生成する。この研究は、Simple Liquid-Argon Track Samples(SLATS)と呼ばれる生成されたLArTPCデータセットをオープンソースとして公開した。 Deep learning (DL) techniques have broad applications in science, especially in seeking to streamline the pathway to potential solutions and discoveries. Frequently, however, DL models are trained on the results of simulation yet applied to real experimental data. As such, any systematic differences between the simulated and real data may degrade the model's performance -- an effect known as "domain shift." This work studies a toy model of the systematic differences between simulated and real data. It presents a fully unsupervised, task-agnostic method to reduce differences between two systematically different samples. The method is based on the recent advances in unpaired image-to-image translation techniques and is validated on two sets of samples of simulated Liquid Argon Time Projection Chamber (LArTPC) detector events, created to illustrate common systematic differences between the simulated and real data in a controlled way. LArTPC-based detectors represent the next-generation particle detectors, producing unique high-resolution particle track data. 
This work open-sources the generated LArTPC data set, called Simple Liquid-Argon Track Samples (or SLATS), allowing researchers from diverse domains to study the LArTPC-like data for the first time.	翻訳日:2023-04-26 20:25:58 公開日:2023-04-25
# 機械学習アプリケーションにおける例外の原因は何か? Stack Overflowにおける機械学習関連スタックトレースのマイニング What Causes Exceptions in Machine Learning Applications? Mining Machine Learning-Related Stack Traces on Stack Overflow ( http://arxiv.org/abs/2304.12857v1 ) ライセンス: Link先を確認	Amin Ghadesi, and Maxime Lamothe, and Heng Li	(参考訳) ディープラーニングを含む機械学習(ML)は、最近、広範囲のアプリケーションで大きな人気を集めている。しかし、従来のソフトウェアと同様に、MLアプリケーションはプログラミングエラーに起因するバグと無縁ではない。明示的なプログラミングエラーは通常、エラーメッセージとスタックトレースを通じて現れる。これらのスタックトレースは、異常な状況や例外につながる関数呼び出しの連鎖を記述する。実際、これらの例外はソフトウェアスタック全体(アプリケーションやライブラリを含む)にまたがる可能性がある。したがって、スタックトレースのパターンを研究することは、実践者や研究者がMLアプリケーションにおける例外の原因と、ML開発者が直面する課題を理解するのに役立つ。そのために、Stack Overflow (SO)をマイニングし、7つの人気のあるPython MLライブラリに関連する11,449のスタックトレースを調査しました。まず,スタックトレースを含むML質問は,スタックトレースのない質問よりも人気が高いが,回答が受け入れられる可能性は低い。第2に,MLスタックトレースに繰り返し発生するパターンは,さまざまなMLライブラリにわたっても存在し,ごく一部のパターンが多数のスタックトレースをカバーしている。第3に、スタックトレースパターンから5つの高レベルカテゴリと25の低レベルタイプを導出します。ほとんどのパターンは、Pythonの基本構文、モデルのトレーニング、並列化、データ変換、サブプロセスの実行と関連しています。さらに、サブプロセス呼び出し、外部モジュール実行、リモートAPI呼び出しに関連するパターンは、SOで受け入れられた回答を得る可能性が最も低い。この結果から,研究者,MLライブラリプロバイダ,およびMLアプリケーション開発者に,MLライブラリとそのアプリケーションの品質向上に関する知見が得られた。 Machine learning (ML), including deep learning, has recently gained tremendous popularity in a wide range of applications. However, like traditional software, ML applications are not immune to the bugs that result from programming errors. Explicit programming errors usually manifest through error messages and stack traces. These stack traces describe the chain of function calls that lead to an anomalous situation, or exception. Indeed, these exceptions may cross the entire software stack (including applications and libraries). Thus, studying the patterns in stack traces can help practitioners and researchers understand the causes of exceptions in ML applications and the challenges faced by ML developers. To that end, we mine Stack Overflow (SO) and study 11,449 stack traces related to seven popular Python ML libraries. 
First, we observe that ML questions that contain stack traces gain more popularity than questions without stack traces; however, they are less likely to get accepted answers. Second, we observe that recurrent patterns exist in ML stack traces, even across different ML libraries, with a small portion of patterns covering many stack traces. Third, we derive five high-level categories and 25 low-level types from the stack trace patterns: most patterns are related to basic Python syntax, model training, parallelization, data transformation, and subprocess invocation. Furthermore, the patterns related to subprocess invocation, external module execution, and remote API call are among the least likely to get accepted answers on SO. Our findings provide insights for researchers, ML library providers, and ML application developers to improve the quality of ML libraries and their applications.	翻訳日:2023-04-26 20:25:36 公開日:2023-04-25
# マルチレゾリューション・コンテクストネットワークによる網膜血管セグメンテーションと逆学習 Retinal Vessel Segmentation via a Multi-resolution Contextual Network and Adversarial Learning ( http://arxiv.org/abs/2304.12856v1 ) ライセンス: Link先を確認	Tariq M. Khan, Syed S. Naqvi, Antonio Robles-Kelly, Imran Razzak	(参考訳) 網膜疾患のタイムリーで手頃なコンピュータ支援診断は、視覚障害の予防に不可欠である。正確な網膜血管セグメンテーションは、このような視力低下疾患の進行と診断において重要な役割を果たす。そこで本稿では,意味的に異なる特徴間のコンテキスト依存を学習するためのマルチスケール特徴を抽出し,複数方向のリカレント学習を用いて従来と後者の依存関係をモデル化することにより,これらの問題に対処する多分解コンテキストネットワーク(MRC-Net)を提案する。もう1つの鍵となるアイデアは、地域ベースのスコアの最適化による前景セグメンテーション改善のための敵の設定のトレーニングである。この新たな戦略は、訓練可能なパラメータの数を比較的低く保ちながら、サイススコア(およびそれに対応するジャカードインデックス)の観点からセグメンテーションネットワークの性能を高める。我々は,drive,stare, chaseの3つのベンチマークデータセットを用いて本手法を評価し,他の文献と比較して優れた性能を示す。 Timely and affordable computer-aided diagnosis of retinal diseases is pivotal in precluding blindness. Accurate retinal vessel segmentation plays an important role in disease progression and diagnosis of such vision-threatening diseases. To this end, we propose a Multi-resolution Contextual Network (MRC-Net) that addresses these issues by extracting multi-scale features to learn contextual dependencies between semantically different features and using bi-directional recurrent learning to model former-latter and latter-former dependencies. Another key idea is training in adversarial settings for foreground segmentation improvement through optimization of the region-based scores. This novel strategy boosts the performance of the segmentation network in terms of the Dice score (and correspondingly Jaccard index) while keeping the number of trainable parameters comparatively low. We have evaluated our method on three benchmark datasets, including DRIVE, STARE, and CHASE, demonstrating its superior performance as compared with competitive approaches elsewhere in the literature.	翻訳日:2023-04-26 20:25:12 公開日:2023-04-25
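The abstract above reports segmentation quality as a Dice score "and correspondingly Jaccard index". The two metrics are monotonically related by $J = D / (2 - D)$, which the short sketch below illustrates on flat binary masks. It shows only the standard metric definitions, not the MRC-Net training objective, and the function name is chosen here for illustration.

```python
def dice_and_jaccard(pred, target):
    """Return (Dice, Jaccard) for two equal-length binary masks."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same length")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    pred_size = sum(1 for p in pred if p)
    target_size = sum(1 for t in target if t)
    union = pred_size + target_size - intersection
    if union == 0:
        # Both masks are empty: treat them as a perfect match.
        return 1.0, 1.0
    dice = 2.0 * intersection / (pred_size + target_size)
    jaccard = intersection / union
    return dice, jaccard


# One overlapping foreground pixel out of two predicted and two true pixels.
dice, jaccard = dice_and_jaccard([1, 1, 0, 0], [1, 0, 1, 0])
```

Because the relation is monotone, a change that raises the Dice score always raises the Jaccard index as well, which is why the abstract can speak of the two together.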
# デジタルヘルスツインユースケースのための適応型サービス機能チェーンオーケストレーション:ヒューリスティックブーストq-learningアプローチ Adaptive Services Function Chain Orchestration For Digital Health Twin Use Cases: Heuristic-boosted Q-Learning Approach ( http://arxiv.org/abs/2304.12853v1 ) ライセンス: Link先を確認	Jamila Alsayed Kassem, Li Zhong, Arie Taal, Paola Grosso	(参考訳) デジタルツイン(Digital Twin, DT)は、医療部門で活用および展開するための重要な技術である。しかし、このようなアプリケーションで直面する主な課題は、厳格な健康データ共有ポリシー、高性能ネットワーク要件、インフラストラクチャリソースの制限である。本稿では,さまざまなデータ共有シナリオに関連するセキュリティポリシを強制するために適応型仮想ネットワーク機能(VNF)をプロビジョニングすることで,これらすべての課題に対処する。柔軟性と動的コンテナスケジューリングのためのマルチノードクラスタインフラストラクチャ上に,Cloud-Native Networkオーケストレータを定義する。提案フレームワークでは,対象とするデータ共有ユースケース,関連するポリシ,インフラストラクチャ構成を考慮し,サービス機能チェーン(SFC)をプロビジョニングし,人的介入をほとんど必要とせずにルーティング構成を提供する。さらに、SFCをデプロイする際に何が「最適」であるかはユースケース自体に依存しており、パフォーマンス要件を満たすためにリソース利用やレイテンシを優先するようにハイパーパラメータを調整する。その結果、デジタルヘルスツインのユースケースに対して、ポリシーアウェア、要件アウェア、リソースアウェアといった適応型ネットワークオーケストレーションを提供する。 Digital Twin (DT) is a prominent technology to utilise and deploy within the healthcare sector. Yet, the main challenges facing such applications are: Strict health data-sharing policies, high-performance network requirements, and possible infrastructure resource limitations. In this paper, we address all the challenges by provisioning adaptive Virtual Network Functions (VNFs) to enforce security policies associated with different data-sharing scenarios. We define a Cloud-Native Network orchestrator on top of a multi-node cluster mesh infrastructure for flexible and dynamic container scheduling. The proposed framework considers the intended data-sharing use case, the policies associated, and infrastructure configurations, then provisions Service Function Chaining (SFC) and provides routing configurations accordingly with little to no human intervention. 
Moreover, what is optimal when deploying SFC is dependent on the use case itself, and we tune the hyperparameters to prioritise resource utilisation or latency in an effort to comply with the performance requirements. As a result, we provide an adaptive network orchestration for digital health twin use cases, that is policy-aware, requirements-aware, and resource-aware.	翻訳日:2023-04-26 20:24:55 公開日:2023-04-25
# モノのインターネットによるスマート教育に向けて:レビュー Towards Smart Education through the Internet of Things: A Review ( http://arxiv.org/abs/2304.12851v1 ) ライセンス: Link先を確認	Afzal Badshah, Anwar Ghani, Ali Daud, Ateeqa Jalal, Muhammad Bilal, Jon Crowcroft	(参考訳) IoTは、効果的な対面およびオンライン教育システムを支援するスマートスペースを作成するための基本的な技術である。スマート教育(IoTとAIを教育システムに統合する)への移行は、学習者のエンゲージメント、モチベーション、出席、深層学習に具体的な影響を与えている。伝統的な教育は管理、教育、評価、教室の監督など多くの課題に直面している。近年のICT(IoT、AI、5Gなど)の発展は、様々な面でのスマートソリューションを生み出しているが、スマートソリューションは教育システムにはあまり組み込まれていない。特に新型コロナウイルスのパンデミックは、教育における新しいスマートソリューションの採用をさらに強調してきた。本研究は関連研究をレビューし,対処する。 (i)解決可能な伝統的教育制度の問題点 (二)スマート教育への移行、及び (III)スマート教育への移行(計算と社会的抵抗)における研究課題これらの研究を踏まえ、従来のシステムの問題に対して、スマートソリューション(スマート教育、スマートアセスメント、スマート教室、スマート管理など)が導入された。この探索的研究は、ICT、IoT、AIをスマート教育に統合する学者や市場にとって、新たなトレンドを開くものだ。 IoT is a fundamental enabling technology for creating smart spaces, which can assist the effective face-to-face and online education systems. The transition to smart education (integrating IoT and AI into the education system) is appealing, which has a concrete impact on learners' engagement, motivation, attendance, and deep learning. Traditional education faces many challenges, including administration, pedagogy, assessment, and classroom supervision. Recent developments in ICT (e.g., IoT, AI and 5G, etc.) have yielded lots of smart solutions for various aspects of life; however, smart solutions are not well integrated into the education system. In particular, the COVID-19 pandemic situation had further emphasized the adoption of new smart solutions in education. This study reviews the related studies and addresses the (i) problems in the traditional education system with possible solutions, (ii) the transition towards smart education, and (iii) research challenges in the transition to smart education (i.e, computational and social resistance). Considering these studies, smart solutions (e.g., smart pedagogy, smart assessment, smart classroom, smart administration, etc.) 
are introduced to the problems of the traditional system. This exploratory study opens new trends for scholars and the market to integrate ICT, IoT, and AI into smart education.	翻訳日:2023-04-26 20:24:35 公開日:2023-04-25
# 単眼深度推定のための深さ関係自己注意 Depth-Relative Self Attention for Monocular Depth Estimation ( http://arxiv.org/abs/2304.12849v1 ) ライセンス: Link先を確認	Kyuhong Shim, Jiyoung Kim, Gusang Lee, Byonghyo Shim	(参考訳) 単一のRGB画像において、正確な深さの手がかりが不完全であるため、単眼深度推定は非常に難しい。この制限を克服するために、ディープニューラルネットワークは、RGB情報から抽出されたサイズ、日陰、テクスチャなど、さまざまな視覚的ヒントに依存している。しかし,そのようなヒントを過度に活用すると,網羅的な視点を考慮せずにRGB情報に偏りが生じる。本稿では,相対深度を自己注意のガイダンスとして用いたRelative Depth Transformer (RED-T) という新しい深度推定モデルを提案する。特に、モデルでは、高い注意重みを近深さの画素に、低い注意重みを遠深のピクセルに割り当てる。その結果、類似した深度の特徴は互いにより近づきやすくなり、視覚的ヒントが誤用されることが少なくなる。提案モデルでは, 単分子深度推定ベンチマークにおいて競合結果が得られ, RGB情報に偏りが小さいことを示す。さらに,学習中の観測可能な深度範囲を制限し,未知の深度に対するモデルのロバスト性を評価するための新しい単眼深度推定ベンチマークを提案する。 Monocular depth estimation is very challenging because clues to the exact depth are incomplete in a single RGB image. To overcome the limitation, deep neural networks rely on various visual hints such as size, shade, and texture extracted from RGB information. However, we observe that if such hints are overly exploited, the network can be biased on RGB information without considering the comprehensive view. We propose a novel depth estimation model named RElative Depth Transformer (RED-T) that uses relative depth as guidance in self-attention. Specifically, the model assigns high attention weights to pixels of close depth and low attention weights to pixels of distant depth. As a result, the features of similar depth can become more likely to each other and thus less prone to misused visual hints. We show that the proposed model achieves competitive results in monocular depth estimation benchmarks and is less biased to RGB information. In addition, we propose a novel monocular depth estimation benchmark that limits the observable depth range during training in order to evaluate the robustness of the model for unseen depths.	翻訳日:2023-04-26 20:24:16 公開日:2023-04-25
# semeval-2023タスク10:不均衡データセットにおけるテキスト分類性能に及ぼすデータ拡張と半教師付き学習技術の影響 NLP-LTU at SemEval-2023 Task 10: The Impact of Data Augmentation and Semi-Supervised Learning Techniques on Text Classification Performance on an Imbalanced Dataset ( http://arxiv.org/abs/2304.12847v1 ) ライセンス: Link先を確認	Sana Sabah Al-Azzawi, Gy\"orgy Kov\'acs, Filip Nilsson, Tosin Adewumi, Marcus Liwicki	(参考訳) 本稿では,ソーシャルメディア投稿におけるオンライン性差別の検出と分類に着目し,semeval23タスク10の方法論を提案する。ソーシャルメディアプラットフォーム上で有害なコンテンツを検出することは、こうした投稿のユーザーへの害を軽減する上で非常に重要である。このタスクの解決策は、細調整されたトランスフォーマーベースモデル(BERTweet、RoBERTa、DeBERTa)のアンサンブルに基づいています。クラス不均衡に関する問題を緩和し,モデルの一般化能力を向上させるため,データ強化と半教師付き学習も実験した。特に、データ拡張では、すべてのクラスで、または表現不足のクラスでのみ、バックトランスレーションを使用します。これらの戦略がパイプライン全体の性能に与える影響を広範な実験を通じて分析する。半教師付き学習では、かなりの量のドメイン内データが利用可能な場合、半教師付き学習は特定のモデルの性能を高めることができる。提案手法(Githubでソースコードが公開されている)では,サブタスクAのF1スコアが0.8613に達した。 In this paper, we propose a methodology for task 10 of SemEval23, focusing on detecting and classifying online sexism in social media posts. The task is tackling a serious issue, as detecting harmful content on social media platforms is crucial for mitigating the harm of these posts on users. Our solution for this task is based on an ensemble of fine-tuned transformer-based models (BERTweet, RoBERTa, and DeBERTa). To alleviate problems related to class imbalance, and to improve the generalization capability of our model, we also experiment with data augmentation and semi-supervised learning. In particular, for data augmentation, we use back-translation, either on all classes, or on the underrepresented classes only. We analyze the impact of these strategies on the overall performance of the pipeline through extensive experiments. while for semi-supervised learning, we found that with a substantial amount of unlabelled, in-domain data available, semi-supervised learning can enhance the performance of certain models. 
Our proposed method (for which the source code is available on GitHub) attains an F1-score of 0.8613 for sub-task A, which ranked us 10th in the competition.	翻訳日:2023-04-26 20:23:57 公開日:2023-04-25
# (地方)差別プライバシーは公平性に異なる影響を与えない (Local) Differential Privacy has NO Disparate Impact on Fairness ( http://arxiv.org/abs/2304.12845v1 ) ライセンス: Link先を確認	H\'eber H. Arcolezi, Karima Makhlouf, Catuscia Palamidessi	(参考訳) 近年、堅牢なプライバシー保護手法であるローカル微分プライバシー(LDP)が、現実世界のアプリケーションに広く採用されている。 LDPを使えば、ユーザーは分析のためにデータを送信する前にデバイス上でデータを摂動することができる。しかし、複数の機密情報の収集が様々な産業で普及するにつれて、LDPの下での単一機密属性の収集は不十分である。データ内の関連属性は、それでも機密属性に関する推論につながる可能性がある。本稿では,LPP下での複数属性の収集が公平性に及ぼす影響を実証研究する。機密属性のドメインサイズの変化を考慮した新しいプライバシ予算配分方式を提案する。これは一般的に、最先端のソリューションよりも、私たちの実験におけるプライバシーと実用性と公正性のトレードオフに結びつきました。その結果, LDPは, モデルの性能に悪影響を及ぼすことなく, 学習問題の公平性をわずかに向上させることがわかった。我々は,グループフェアネスの指標と7つの最新LDPプロトコルを用いて,3つのベンチマークデータセットの評価実験を行った。全体として、この研究は、差分プライバシーが機械学習における公平性の悪化につながるという一般的な信念に挑戦する。 In recent years, Local Differential Privacy (LDP), a robust privacy-preserving methodology, has gained widespread adoption in real-world applications. With LDP, users can perturb their data on their devices before sending it out for analysis. However, as the collection of multiple sensitive information becomes more prevalent across various industries, collecting a single sensitive attribute under LDP may not be sufficient. Correlated attributes in the data may still lead to inferences about the sensitive attribute. This paper empirically studies the impact of collecting multiple sensitive attributes under LDP on fairness. We propose a novel privacy budget allocation scheme that considers the varying domain size of sensitive attributes. This generally led to a better privacy-utility-fairness trade-off in our experiments than the state-of-art solution. Our results show that LDP leads to slightly improved fairness in learning problems without significantly affecting the performance of the models. We conduct extensive experiments evaluating three benchmark datasets using several group fairness metrics and seven state-of-the-art LDP protocols. 
Overall, this study challenges the common belief that differential privacy necessarily leads to worsened fairness in machine learning.	翻訳日:2023-04-26 20:23:36 公開日:2023-04-25
# 寒冷電磁ダイスプロシウムダイポール用最先端装置の包括的特性評価 Comprehensive Characterization of a State-of-the-Art Apparatus for Cold Electromagnetic Dysprosium Dipoles ( http://arxiv.org/abs/2304.12844v1 ) ライセンス: Link先を確認	Gregor Anich, Rudolf Grimm, Emil Kirilov	(参考訳) 我々は、量子ガス顕微鏡(qgm)を4分の1マイクロメートルの解像度で組み込んだ新しい超低温ジスプロシウム(dy)装置を開発した。 QGMと冷却・トラップ領域は同じ真空ガラス容器内にあり、それらの間の単純な原子輸送を保証している。我々は,レーザーおよび蒸発冷却,格子負荷,輸送およびQGM焦点面におけるボゾン同位体164 Dyの雲の正確な位置決めについて実験を行った。フル容量化に向けたQGMの基本的特徴と今後の計画について概説する。また、大きな磁気と電気の双極子モーメントを持つDyの密接な正対準レベルを利用すれば、XYZモデルのような量子磁性の複雑なスピンモデルをシミュレートできるプラットフォームも提示する。磁気双極子-双極子結合を持ち,Ising,交換,スピン軌道を含む縮退アイソスピン-1/2系を分離する。最後は、格子幾何学に依存する非対称な可変率を持つスピンモデルをもたらす。 We developed a new advanced ultra-cold Dysprosium (Dy) apparatus, which incorporates a quantum gas microscope (QGM) with a resolution of a quarter micrometer. The QGM and the cooling and trapping regions are within the same vacuum glass vessel assuring simple atom transport between them. We demonstrate the essential experimental steps of laser and evaporative cooling, lattice loading, transporting and precise positioning of a cloud of the bosonic isotope 164 Dy at the QGM focal plane. Preliminary basic characterization of the QGM and future plans in enabling its full capacity are outlined. We also present a feasible platform for simulating complex spin models of quantum magnetism, such as XYZ model, by exploiting a set of closely spaced opposite parity levels in Dy with a large magnetic and electric dipole moment. We isolate a degenerate isospin-1/2 system, which possesses both magnetic and electric dipole-dipole coupling, containing Ising, exchange and spin-orbit terms. The last gives rise to a spin model with asymmetric tunable rates, dependable on the lattice geometry.	翻訳日:2023-04-26 20:23:18 公開日:2023-04-25
# 都市ビブランシーにおける時空間性差 Spatiotemporal gender differences in urban vibrancy ( http://arxiv.org/abs/2304.12840v1 ) ライセンス: Link先を確認	Thomas R. Collins and Riccardo Di Clemente and Mario Guti\'errez-Roig and Federico Botta	(参考訳) 都市活力は都市部における人間のダイナミックな活動である。都市の特徴や人間との交流の機会によって異なる場合もあるが、都市住民の社会環境や社会環境によっても異なる可能性がある。異なる人口集団がどのように都市を経験するかの不均一性は、住民の嗜好、アクセシビリティと機会、大規模な移動行動の違いにより、性別分離を引き起こす可能性がある。しかし、伝統的な研究は、都市の活力と都市の特徴との関係、異性間の違い、都市における人種差別にどのように影響するかについて、高頻度で理解できていない。以上の結果から,(1)都会の活力には男女差があり,(2)「関心の点」と交通ネットワークの相違がみられ,(3)各都市に肯定的・否定的な「空間的流出」が存在することが示唆された。そこで我々は,携帯電話のほぼユビキタスな利用を生かしたコールディテールデータを用いた定量的手法を用いて,イタリア7都市における空間行動の高周波観測を行う。都会の特徴から直接的効果と「スパイルオーバー」効果の空間モデルによる男女差の比較を行った。私たちの結果は、都市における不平等と将来の都市をより公平にする方法についての理解を深めます。 Urban vibrancy is the dynamic activity of humans in urban locations. It can vary with urban features and the opportunities for human interactions, but it might also differ according to the underlying social conditions of city inhabitants across and within social surroundings. Such heterogeneity in how different demographic groups may experience cities has the potential to cause gender segregation because of differences in the preferences of inhabitants, their accessibility and opportunities, and large-scale mobility behaviours. However, traditional studies have failed to capture fully a high-frequency understanding of how urban vibrancy is linked to urban features, how this might differ for different genders, and how this might affect segregation in cities. Our results show that (1) there are differences between males and females in terms of urban vibrancy, (2) the differences relate to `Points of Interest` as well as transportation networks, and (3) that there are both positive and negative `spatial spillovers` existing across each city. 
To do this, we use a quantitative approach using Call Detail Record data--taking advantage of the near-ubiquitous use of mobile phones--to gain high-frequency observations of spatial behaviours across the seven most prominent cities of Italy. We use a spatial model comparison approach of the direct and `spillover` effects from urban features on male-female differences. Our results increase our understanding of inequality in cities and how we can make future cities fairer.	翻訳日:2023-04-26 20:22:58 公開日:2023-04-25
# parity アーキテクチャにおけるフレキシブル制約コンパイル Flexible constraint compilation in the parity architecture ( http://arxiv.org/abs/2304.12879v1 ) ライセンス: Link先を確認	Roeland ter Hoeven, Anette Messinger, Wolfgang Lechner	(参考訳) 本稿では,任意の接続グラフを持つディジタル量子コンピューティングデバイスへのパリティコンパイルを一般化するツールと手法を提案し,高階制約付きバイナリ最適化問題の制約ハミルトニアンの回路実装について述べる。特に,非局所制約でさえ,高価なSWAPゲートを使わずに効率的に実装できることを示す。本稿では,並列性アーキテクチャにおける量子近似最適化アルゴリズムの全回路深さとcnot数を最適化する手法を示し,様々な例を用いたフレキシブルコンパイルの利点を強調する。開発したゲートシーケンスとスワップゲートを用いた従来のアプローチとの関係を導出する。この結果は、他の多くの非局所作用素の実装を改善するために適用することができる。 We present tools and methods to generalize parity compilation to digital quantum computing devices with arbitrary connectivity graphs and construct circuit implementations for the constraint Hamiltonian of higher-order constrained binary optimization problems. In particular, we show how even non-local constraints can be efficiently implemented without expensive SWAP gates. We show how the presented tools can be used to optimize the total circuit depth and CNOT count of the quantum approximate optimization algorithm in the parity architecture and highlight the advantages of the flexible compilation using various examples. We derive the relation between the developed gate sequences and the traditional approach that uses SWAP gates. The result can be applied to improve the implementation of many other non-local operators.	翻訳日:2023-04-26 20:17:44 公開日:2023-04-25
# 強化学習エージェントのための近位カリキュラム Proximal Curriculum for Reinforcement Learning Agents ( http://arxiv.org/abs/2304.12877v1 ) ライセンス: Link先を確認	Georgios Tzannetos, B\'arbara Gomes Ribeiro, Parameswaran Kamalaruban, Adish Singla	(参考訳) マルチタスク環境における強化学習(RL)エージェントのカリキュラム設計の問題点を考察する。既存の自動カリキュラム設計技術では、ドメイン固有のハイパーパラメータチューニングが必要か、理論的な基盤が限られている。これらの制約に対処するため,我々は,ZPD(Zone of Proximal Development)という教育的概念に触発されたカリキュラム戦略であるProCuRLを設計する。 ProCuRLは、学習者にとって難しすぎず易しすぎないタスクを選択するとき、学習の進捗が最大になるという直感を捉えます。 2つの単純な学習設定を解析することで、ProCuRLを数学的に導出する。また,最小限のハイパーパラメータチューニングを施した深部RLフレームワークと直接統合可能なProCuRLの実用版も提示する。各種領域に対する実験結果から, 深部RLエージェントのトレーニングプロセスの促進に向け, 最先端のベースラインに対するカリキュラム戦略の有効性が示された。 We consider the problem of curriculum design for reinforcement learning (RL) agents in contextual multi-task settings. Existing techniques on automatic curriculum design typically require domain-specific hyperparameter tuning or have limited theoretical underpinnings. To tackle these limitations, we design our curriculum strategy, ProCuRL, inspired by the pedagogical concept of Zone of Proximal Development (ZPD). ProCuRL captures the intuition that learning progress is maximized when picking tasks that are neither too hard nor too easy for the learner. We mathematically derive ProCuRL by analyzing two simple learning settings. We also present a practical variant of ProCuRL that can be directly integrated with deep RL frameworks with minimal hyperparameter tuning. Experimental results on a variety of domains demonstrate the effectiveness of our curriculum strategy over state-of-the-art baselines in accelerating the training process of deep RL agents.	翻訳日:2023-04-26 20:17:32 公開日:2023-04-25
# レーザ注入による埋め込みニューラルネットワークに対するパラメータベース攻撃の評価 Evaluation of Parameter-based Attacks against Embedded Neural Networks with Laser Injection ( http://arxiv.org/abs/2304.12876v1 ) ライセンス: Link先を確認	Mathieu Dumont, Kevin Hector, Pierre-Alain Moellic, Jean-Max Dutertre, Simon Ponti\'e	(参考訳) 機械学習(ML)ベースのシステムのセキュリティに関する今後の認証アクションは、多くのハードウェアプラットフォームにおけるモデルの大規模展開によって増幅される大きな評価課題を提起する。最近まで、ほとんどの研究は、MLモデルを純粋にアルゴリズムの抽象化と見なすAPIベースの攻撃に焦点を当てていた。しかし、新しい実装ベースの脅威が明らかになり、モデルの堅牢性を適切に評価する実用的な手法とシミュレーションベースの手法の両方を提案する緊急性を強調している。主な関心事はパラメータベースの攻撃(Bit-Flip Attack, BFAなど)であり、メモリに格納された内部パラメータの正確かつ最適な変更に直面した場合、典型的なディープニューラルネットワークモデルの堅牢性の欠如を強調する。セキュリティテストの目的で設定されたこの研究は、32ビットのcortex-mマイクロコントローラにレーザーフォールトインジェクションを用いてbfaの派生型を初めて報告した。セキュリティ評価のための標準的なフォールトインジェクション手段であり、空間的および時間的に正確な障害を注入することができる。非現実的なブルートフォース戦略を避けるため、シミュレーションはレーザー断層モデルを考慮したパラメータから最も敏感なビットセットを選択するのにどのように役立つかを示す。 Upcoming certification actions related to the security of machine learning (ML) based systems raise major evaluation challenges that are amplified by the large-scale deployment of models in many hardware platforms. Until recently, most of research works focused on API-based attacks that consider a ML model as a pure algorithmic abstraction. However, new implementation-based threats have been revealed, emphasizing the urgency to propose both practical and simulation-based methods to properly evaluate the robustness of models. A major concern is parameter-based attacks (such as the Bit-Flip Attack, BFA) that highlight the lack of robustness of typical deep neural network models when confronted by accurate and optimal alterations of their internal parameters stored in memory. Setting in a security testing purpose, this work practically reports, for the first time, a successful variant of the BFA on a 32-bit Cortex-M microcontroller using laser fault injection. It is a standard fault injection means for security evaluation, that enables to inject spatially and temporally accurate faults. 
To avoid unrealistic brute-force strategies, we show how simulations help select the most sensitive set of bits from the parameters, taking into account the laser fault model.	翻訳日:2023-04-26 20:17:17 公開日:2023-04-25
# 交互局所列挙(TnALE):低評価によるテンソルネットワーク構造探索の解法 Alternating Local Enumeration (TnALE): Solving Tensor Network Structure Search with Fewer Evaluations ( http://arxiv.org/abs/2304.12875v1 ) ライセンス: Link先を確認	Chao Li, Junhua Zeng, Chunmei Li, Cesar Caiafa, Qibin Zhao	(参考訳) テンソルネットワーク(TN)は機械学習の強力なフレームワークであるが、TN構造探索(TN-SS)として知られる優れたTNモデルを選択することは困難で計算集約的なタスクである。 TNLS~\cite{li2022permutation} の最近のアプローチは、このタスクに対して有望な結果を示したが、その計算効率はまだ不満足であり、目的関数の評価が多すぎる。本稿では,各構造関連変数を局所列挙によって交互に更新し,TNLSと比較して評価回数を\emph{大幅に}削減する新しいアルゴリズムTnALEを提案する。 TNLS と TnALE の降下ステップを理論的に検討し、両アルゴリズムが各近傍で目的の十分な減算が \emph{reached} であれば、定数まで線形収束を達成できることを証明した。また、TNLS と TnALE の評価効率も比較し、TNLS では \emph{reaching} に対して $\Omega(2^N)$ 評価が要求されるのに対し、理想的には $O(N^2R)$ 評価は TnALE では十分であり、$N$ はテンソル次数を表し、$R$ は近傍の \emph{``low-rankness''} を反映する。実験の結果、TnALEは最先端のアルゴリズムよりもはるかに少ない評価で、実用的に優れたTNランクと置換を見出すことができた。 Tensor network (TN) is a powerful framework in machine learning, but selecting a good TN model, known as TN structure search (TN-SS), is a challenging and computationally intensive task. The recent approach TNLS~\cite{li2022permutation} showed promising results for this task; however, its computational efficiency is still unaffordable, requiring too many evaluations of the objective function. We propose TnALE, a new algorithm that updates each structure-related variable alternately by local enumeration, \emph{greatly} reducing the number of evaluations compared to TNLS. We theoretically investigate the descent steps for TNLS and TnALE, proving that both algorithms can achieve linear convergence up to a constant if a sufficient reduction of the objective is \emph{reached} in each neighborhood. We also compare the evaluation efficiency of TNLS and TnALE, revealing that $\Omega(2^N)$ evaluations are typically required in TNLS for \emph{reaching} the objective reduction in the neighborhood, while ideally $O(N^2R)$ evaluations are sufficient in TnALE, where $N$ denotes the tensor order and $R$ reflects the \emph{``low-rankness''} of the neighborhood. 
Experimental results verify that TnALE can find practically good TN-ranks and permutations with vastly fewer evaluations than the state-of-the-art algorithms.	翻訳日:2023-04-26 20:16:40 公開日:2023-04-25
# 相対論的量子系におけるベルの不等式とハイゼンベルク測定 Bell's Inequality and Heisenberg Measurements on Relativistic Quantum Systems ( http://arxiv.org/abs/2304.12873v1 ) ライセンス: Link先を確認	Ulrich Faigle	(参考訳) ベルの不等式は、量子論の物理的現実に関するアインシュタイン問題において重要な役割を果たす。ベルの不等式は一般にヒルベルト空間の量子モデルの幾何学的枠組みの中で見なされるが、本稿はハイゼンベルク測定の理論を一般直交幾何学空間の表現、特に相対性理論のミンコフスキー空間を持つ量子系へ拡張するものである。ファインマンの数値例では、ミンコフスキー空間における合同確率論的解釈がヒルベルト空間では観測できないものの、2つの測定値を示す。この分析は、量子測定の確率論的解釈は測定器やシステム状態だけでなく、測定を行う幾何学的空間にも依存することを示している。特に、明快な数値の例は、ミンコフスキー空間におけるベルの不等式に反し、ヒルベルト空間でそれを満たすような可観測性の完全な集合を持つハイゼンベルク測定から与えられる。 Bell's inequality plays an important role with respect to the Einsteinian question about the physical reality of quantum theory. While Bell's inequality is usually viewed within the geometric framework of a Hilbert space quantum model, the present note extends the theory of Heisenberg measurements to quantum systems with representations in general orthogonal geometric spaces and, in particular, the Minkowski spaces of relativity theory. A Feynmanian numerical example exhibits two measurements that admit a joint probabilistic interpretation in Minkowski space while they are not jointly observable in Hilbert space. The analysis shows that probabilistic interpretations of quantum measurements may depend not only on the measuring instruments and the system states but also on the geometric space in which the measurements are conducted. In particular, an explicit numerical example is given of a Heisenberg measurement with a complete set of common observables that violates Bell's inequality in Minkowski space but, mutatis mutandis, satisfies it in Hilbert space.	翻訳日:2023-04-26 20:15:50 公開日:2023-04-25
# 量子アニーリングにおける指数閉ギャップとしてのアンチクロスの発生 Anti-crossings occurrence as exponentially closing gaps in Quantum Annealing ( http://arxiv.org/abs/2304.12872v1 ) ライセンス: Link先を確認	Arthur Braida, Simon Martiel and Ioan Todinca	(参考訳) 本稿では,量子アニーリングにおける回避レベル交差現象について考察する。量子コンピューティングのための将来的なフレームワークであり,特定のタスクに量子的優位性をもたらす可能性がある。量子アニーリング(quantum annealing)は、最終状態の測定を通じて最適化問題に対する最適解を得ることを目的として、Schr\\odinger方程式に従って量子システムを進化させる。しかしながら、量子アニーリングの連続性は解析解析を特に瞬時固有エネルギーに関して困難にする。断熱定理は、最小スペクトルギャップの2乗に反比例する高い確率で最適解を得るのに必要なアニーリング時間の理論的結果を与える。回避されたレベルの交差は指数関数的に閉じるギャップを生じさせ、最適化問題に対して指数関数的に長い実行時間をもたらす。本稿では, 焼鈍過程における回避レベル交差の発生条件を導出するために, 摂動膨張を用いた。次に、この条件を二部グラフ上のMaxCut問題に適用する。正規二部グラフに対して指数的に小さなギャップは生じないことを示し、QAがMaxCutを効率的に解けることを示唆する。一方,頂点度の不規則性は,回避された踏切発生条件の満足度につながる可能性が示唆された。この理論的発展を支える数値的な証拠を提供し,指数閉ギャップの存在と量子アニーリングの失敗との関係について論じる。 This paper explores the phenomenon of avoided level crossings in quantum annealing, a promising framework for quantum computing that may provide a quantum advantage for certain tasks. Quantum annealing involves letting a quantum system evolve according to the Schr\"odinger equation, with the goal of obtaining the optimal solution to an optimization problem through measurements of the final state. However, the continuous nature of quantum annealing makes analytical analysis challenging, particularly with regard to the instantaneous eigenenergies. The adiabatic theorem provides a theoretical result for the annealing time required to obtain the optimal solution with high probability, which is inversely proportional to the square of the minimum spectral gap. Avoided level crossings can create exponentially closing gaps, which can lead to exponentially long running times for optimization problems. In this paper, we use a perturbative expansion to derive a condition for the occurrence of an avoided level crossing during the annealing process. We then apply this condition to the MaxCut problem on bipartite graphs. 
We show that no exponentially small gaps arise for regular bipartite graphs, implying that QA can efficiently solve MaxCut in that case. On the other hand, we show that irregularities in the vertex degrees can lead to the satisfaction of the avoided level crossing occurrence condition. We provide numerical evidence to support this theoretical development, and discuss the relation between the presence of exponentially closing gaps and the failure of quantum annealing.	翻訳日:2023-04-26 20:15:22 公開日:2023-04-25
# 計算化学のための線形スケーリング量子回路 Linear-Scaling Quantum Circuits for Computational Chemistry ( http://arxiv.org/abs/2304.12870v1 ) ライセンス: Link先を確認	Ilias Magoulas and Francesco A. Evangelista	(参考訳) 我々は最近、任意の多体ランク(I. Magoulas and F.A. Evangelista, J. Chem. Theory Comput. 19, 822 (2023))のフェルミオンおよび量子ビット励起のためのコンパクトでCNOT効率の良い量子回路を構築した。ここでは,CNOT数を大幅に減少させる回路の近似について述べる。予備的な数値データは、選択された射影量子固有解法を用いて、帰納対称性の破れが本質的に無視される一方で、親実装と比較して、実質的にエネルギーの精度の損失はないことを示す。 We have recently constructed compact, CNOT-efficient, quantum circuits for fermionic and qubit excitations of arbitrary many-body rank [I. Magoulas and F.A. Evangelista, J. Chem. Theory Comput. 19, 822 (2023)]. Here, we present approximations to these circuits that substantially reduce the CNOT counts even further. Our preliminary numerical data, using the selected projective quantum eigensolver approach, demonstrate that there is practically no loss of accuracy in the energies compared to the parent implementation while the ensuing symmetry breaking is essentially negligible.	翻訳日:2023-04-26 20:14:57 公開日:2023-04-25
# 量子有限オートマタの浅実装のためのGAP GAPs for Shallow Implementation of Quantum Finite Automata ( http://arxiv.org/abs/2304.12868v1 ) ライセンス: Link先を確認	Mansur Ziiatdinov, Aliya Khadieva, Abuzer Yakary{\i}lmaz	(参考訳) 量子フィンガープリントは古典的な入力語を量子状態にマッピングする技法である。結果として生じる量子状態は元の単語よりもはるかに短く、その処理はリソースを少なくし、量子アルゴリズム、通信、暗号において有用である。量子フィンガープリントの例の1つは、言語 $MOD_{p}=\{a^{i\cdot p} \mid i \geq 0\}$ の量子オートマトンであり、$p$は素数である。しかし、このオートマトンを現在の量子ハードウェアで実装することは効率的ではない。量子フィンガープリントは、長さ$n$の単語$x \in \{0,1\}^{n}$を$O(\log \log n)$量子ビットの状態$|\psi(x)\rangle$に写像し、$O(\log n)$個のユニタリ演算を必要とする。現在の量子コンピュータの全メモリを用いた量子指紋の計算は、多くの量子演算が必要なため、現在不可能である。量子フィンガープリントを実用的なものにするためには、回路の幅よりも奥行きを最適化する必要がある。一般化算術進行(GAP)などの加法的組合せ論のツールに基づく量子フィンガープリントの明示的な手法を提案し,これらの手法が確率的手法に匹敵する回路深さを提供することを示す。また,提案手法を,明示的な量子フィンガープリンティング手法の先行研究と比較した。 Quantum fingerprinting is a technique that maps a classical input word to a quantum state. The resulting quantum state is much shorter than the original word, and its processing requires fewer resources, making it useful in quantum algorithms, communication and cryptography. One example of quantum fingerprinting is the quantum automaton for the language $MOD_{p}=\{a^{i\cdot p} \mid i \geq 0\}$, where $p$ is a prime number. However, implementing this automaton on current quantum hardware is not efficient. Quantum fingerprinting maps a word $x \in \{0,1\}^{n}$ of length $n$ to a state $|\psi(x)\rangle$ of $O(\log \log n)$ qubits, and requires $O(\log n)$ unitary operations. Computing a quantum fingerprint using all the memory of current quantum computers is currently infeasible due to the large number of quantum operations necessary. In order to make quantum fingerprinting practical, we must optimize the circuit for depth instead of width as previous works did. We propose explicit methods of quantum fingerprinting based on tools from additive combinatorics, such as generalized arithmetic progressions (GAPs), and prove that these methods provide circuit depth comparable to the probabilistic method. 
We also compare our method to prior work on explicit quantum fingerprinting methods.	翻訳日:2023-04-26 20:14:44 公開日:2023-04-25
# 高能率ニューロモルフィック深層学習の両眼確率性によるソフトウェア精度の向上 Binary stochasticity enabled highly efficient neuromorphic deep learning achieves better-than-software accuracy ( http://arxiv.org/abs/2304.12866v1 ) ライセンス: Link先を確認	Yang Li, Wei Wang, Ming Wang, Chunmeng Dou, Zhengyu Ma, Huihui Zhou, Peng Zhang, Nicola Lepri, Xumeng Zhang, Qing Luo, Xiaoxin Xu, Guanhua Yang, Feng Zhang, Ling Li, Daniele Ielmini, and Ming Liu	(参考訳) ディープラーニングには、フォワーディング信号の高精度処理、バックプロパゲーションエラー、ウェイトのアップデートが必要だ。これは、勾配降下学習規則が部分微分の連鎖積に依存しているため、学習アルゴリズムによって本質的に必要となる。しかし, 人工シナプスとしてノイズの多いアナログメムリスタを用いるハードウェアシステムにおいて, 生物学的に妥当でないような深層学習を実現することは困難である。 memristorベースの実装は一般に、ニューロン回路の過大なコストと理想化されたシナプスデバイスに対する厳しい要求をもたらす。そこで本研究では,高精度の要求は不要であり,この要求が解除された場合により効率的な深層学習を実現することを実証する。本稿では,すべての基本ニューラルネットワーク操作を修飾する二元確率学習アルゴリズムを提案する。 (i)転送信号と活性化関数の導関数の確率的二乗化 (ii)バックプロパゲーションエラーの符号付きバイナリ化、 (iii)段階的な重み付け更新。ソフトウェアシミュレーションとハードウェア実験のハイブリッドアプローチにより、二進確率深層学習システムは、高精度学習アルゴリズムを用いて、ソフトウェアベースのベンチマークよりも優れた性能を提供できることがわかった。また、二項確率アルゴリズムはハードウェアにおけるニューラルネットワーク操作を強く単純化し、乗算および累積演算のエネルギー効率を3桁以上改善する。 Deep learning needs high-precision handling of forwarding signals, backpropagating errors, and updating weights. This is inherently required by the learning algorithm since the gradient descent learning rule relies on the chain product of partial derivatives. However, it is challenging to implement deep learning in hardware systems that use noisy analog memristors as artificial synapses, as well as not being biologically plausible. Memristor-based implementations generally result in an excessive cost of neuronal circuits and stringent demands for idealized synaptic devices. Here, we demonstrate that the requirement for high precision is not necessary and that more efficient deep learning can be achieved when this requirement is lifted. 
We propose a binary stochastic learning algorithm that modifies all elementary neural network operations by introducing (i) stochastic binarization of both the forwarding signals and the activation function derivatives, (ii) signed binarization of the backpropagating errors, and (iii) step-wise weight updates. Through an extensive hybrid approach of software simulation and hardware experiments, we find that binary stochastic deep learning systems can provide better performance than the software-based benchmarks using the high-precision learning algorithm. Also, the binary stochastic algorithm strongly simplifies the neural network operations in hardware, resulting in an improvement of the energy efficiency for the multiply-and-accumulate operations by more than three orders of magnitude.	翻訳日:2023-04-26 20:14:15 公開日:2023-04-25
# 簡易性と有効性を追求する: 分布特性テストのための量子アルゴリズム Striving for simplicity and effectiveness: quantum algorithm for distribution property testing ( http://arxiv.org/abs/2304.12916v1 ) ライセンス: Link先を確認	Jingquan Luo and Lvzhou Li	(参考訳) 分布特性の試験法の基本問題に対する潜在的な量子スピードアップについて検討する。特に、2つの異なる問題に焦点を当てている: 1つは、2つの未知の古典分布が十分に近いか遠くにあるかをテストし、もう1つは$\{0, 1\}^n$ 上の与えられた分布が$k$-wise 一様か、あるいは任意の$k$-wise 一様分布から遠く離れているかをテストすることである。最初の問題として、現在最高の量子アルゴリズムを$l^1$-distanceと$l^2$-distanceのメトリクスで提案する。量子特異値変換 (qsvt) の手法に依存する \cite{gilyen2019distributional} の最新の結果と比較すると, アルゴリズムはより簡潔なだけでなく, より効率的である。後者の問題に対しては、最先端の古典的アルゴリズムを2次高速化する最初の量子アルゴリズムを提案する。量子アルゴリズムの分析は、従来のものよりもはるかに直感的で簡潔であることは注目に値する。 We explore potential quantum speedups for the fundamental problem of testing properties of distributions. In particular, we focus on two different problems: the first one is to test whether two unknown classical distributions are close or far enough, and the second one is to test whether a given distribution over $\{0, 1\}^n$ is $k$-wise uniform or far from any $k$-wise uniform distribution. For the first problem, we propose the currently best quantum algorithm under the metrics of $l^1$-distance and $l^2$-distance. Compared with the latest result given in \cite{gilyen2019distributional}, which relied on the technique of quantum singular value transformation (QSVT), our algorithm is not only more concise, but also more efficient. For the latter problem, we propose the first quantum algorithm achieving a quadratic speedup over the state-of-the-art classical algorithm. It is worth noting that the analysis of our quantum algorithm is much more intuitive and concise than that of the classical one.	翻訳日:2023-04-26 20:06:18 公開日:2023-04-25
# 動的断熱工学による例外点近傍のキラルおよび非キラル急速モード変換 Chiral and non-chiral swift mode conversion near an exception point with dynamic adiabaticity engineering ( http://arxiv.org/abs/2304.12912v1 ) ライセンス: Link先を確認	Dong Wang, Wen-Xi Huang, Pei-Chao Cao, Yu-Gui Peng, Xue-Feng Zhu, Ying Li	(参考訳) 非エルミート的ハミルトニアンの固有値は、しばしば自己交差リーマン曲面を形成するため、ハミルトニアンが例外点 (EP) の周りの特定のループ経路に沿って進化する際、ユニークなモード変換現象を引き起こす。モード変換の速度は断熱的な要求によって制限され、キラリティーは自由に制御できない。我々は、同じ経路上でカイラルモードと非カイラルモードの変換を可能にする非エルミートハミルトニアンの進化における動的工学的断熱法を提案する。本手法は, 即時断熱性の定量化と制御を基本とし, 経路全体の不均一な進化を可能にする。断熱性の分布的性質に基づく進化を最適化することにより、従来の準断熱的進化と同じ品質を3分の1の時間で達成する。我々のアプローチはEPを取り巻くスピードとキラリティの問題に対処するための包括的で普遍的な解決策を提供する。また、非断熱的なプロセスの動的な操作と制御が容易になり、操作を加速し、様々なモード変換パターンを選択できる。 The eigenvalue of a non-Hermitian Hamiltonian often forms a self-intersecting Riemann surface, leading to a unique mode conversion phenomenon when the Hamiltonian evolves along certain loop paths around an exceptional point (EP). However, two fundamental problems exist with the conventional scheme of EP encircling: the speed of mode conversion is restricted by the adiabatic requirement, and the chirality cannot be freely controlled. We introduce a method for dynamically engineering adiabaticity in the evolution of non-Hermitian Hamiltonians that allows for both chiral and non-chiral mode conversion on the same path. Our method is based on quantifying and controlling the instantaneous adiabaticity, allowing for non-uniform evolution throughout the entire path. By optimizing the evolution based on the distributed nature of adiabaticity, we achieve the same quality as conventional quasi-adiabatic evolution in only one-third of the time. Our approach provides a comprehensive and universal solution to address the speed and chirality challenges associated with EP encircling. It also facilitates the dynamic manipulation and regulation of non-adiabatic processes, thereby accelerating the operation and allowing for a selection among various mode conversion patterns.	
翻訳日:2023-04-26 20:05:59 公開日:2023-04-25
# インプシット生成モデルのためのスコア差流 The Score-Difference Flow for Implicit Generative Modeling ( http://arxiv.org/abs/2304.12906v1 ) ライセンス: Link先を確認	Romann M. Weber	(参考訳) 暗黙的生成モデリング(igm)は、ターゲットデータ分布の特性にマッチする合成データのサンプルを作成することを目的としている。最近の研究(例えばスコアマッチングネットワーク、拡散モデル)は、動的摂動や周囲空間の流れを通じて、合成音源データを目標分布へ押し上げるという観点から、igm問題にアプローチしている。 Implicit generative modeling (IGM) aims to produce samples of synthetic data matching the characteristics of a target data distribution. Recent work (e.g. score-matching networks, diffusion models) has approached the IGM problem from the perspective of pushing synthetic source data toward the target distribution via dynamical perturbations or flows in the ambient space. We introduce the score difference (SD) between arbitrary target and source distributions as a flow that optimally reduces the Kullback-Leibler divergence between them while also solving the Schr\"odinger bridge problem. 
We apply the SD flow to convenient proxy distributions, which are aligned if and only if the original distributions are aligned. We demonstrate the formal equivalence of this formulation to denoising diffusion models under certain conditions. However, unlike diffusion models, SD flow places no restrictions on the prior distribution. We also show that the training of generative adversarial networks includes a hidden data-optimization sub-problem, which induces the SD flow under certain choices of loss function when the discriminator is optimal. As a result, the SD flow provides a theoretical link between model classes that, taken together, address all three challenges of the "generative modeling trilemma": high sample quality, mode coverage, and fast sampling.	翻訳日:2023-04-26 20:05:38 公開日:2023-04-25
# 可変原子ミラーを用いた非エルミート導波路キャビティQED Non-Hermitian Waveguide Cavity QED with Tunable Atomic Mirrors ( http://arxiv.org/abs/2304.12897v1 ) ライセンス: Link先を確認	Wei Nie, Tao Shi, Yu-xi Liu, Franco Nori	(参考訳) 光学ミラーは光の反射によってキャビティの特性を決定する。不完全な反射は、光子損失を伴う開いたキャビティを生じさせる。我々は、可変な反射スペクトルを持つ原子二量体ミラーからなる開いたキャビティを調べる。この原子キャビティは反$\mathcal{PT}$対称性を示すことがわかった。ミラー内の原子間結合によって制御される反$\mathcal{PT}$相転移は、2つの縮退したキャビティスーパーモードの出現を示す。興味深いことに、強いコヒーレントなキャビティ-原子結合を実現するためのミラー反射のしきい値が同定される。この反射しきい値は、良好なキャビティを生み出すための原子ミラーの基準を明らかにする。さらに、プローブ原子を用いたキャビティ量子電気力学は、キャビティとプローブ原子によって形成される反射依存のポラリトンを含む、ミラーによって調整される特性を示す。我々の研究は、反$\mathcal{PT}$原子キャビティの非エルミート理論を提示するものであり、量子光学や量子計算に応用できる可能性がある。 Optical mirrors determine cavity properties by means of light reflection. Imperfect reflection gives rise to open cavities with photon loss. We study an open cavity made of atom-dimer mirrors with a tunable reflection spectrum. We find that the atomic cavity shows anti-$\mathcal{PT}$ symmetry. The anti-$\mathcal{PT}$ phase transition controlled by atomic couplings in mirrors indicates the emergence of two degenerate cavity supermodes. Interestingly, a threshold of mirror reflection is identified for realizing strong coherent cavity-atom coupling. This reflection threshold reveals the criterion of atomic mirrors to produce a good cavity. Moreover, cavity quantum electrodynamics with a probe atom shows mirror-tuned properties, including reflection-dependent polaritons formed by the cavity and probe atom. Our work presents a non-Hermitian theory of an anti-$\mathcal{PT}$ atomic cavity, which may have applications in quantum optics and quantum computation.	翻訳日:2023-04-26 20:05:19 公開日:2023-04-25
# グラフ生成アルゴリズムの発見 Discovering Graph Generation Algorithms ( http://arxiv.org/abs/2304.12895v1 ) ライセンス: Link先を確認	Mihai Babiac, Karolis Martinkus and Roger Wattenhofer	(参考訳) グラフの生成モデルを構築するための新しいアプローチを提案する。従来の確率モデルや深層生成モデルを使う代わりに、データを生成するアルゴリズムそのものを見つけることを提案する。進化的探索と、ランダムに初期化されたグラフニューラルネットワークによって実装された強力な適応度関数を用いてこれを実現する。これにより、例えば、訓練分布外への汎化の可能性が高まり、最終的なグラフ生成プロセスがPython関数として表現されるため直接的な解釈可能性が得られるなど、現在の深層生成モデルに対していくつかの利点がもたらされる。我々は、このアプローチが深層生成モデルと競合しうること、そして状況によっては真のグラフ生成プロセスを見つけ、完全に汎化できることを示す。 We provide a novel approach to construct generative models for graphs. Instead of using the traditional probabilistic models or deep generative models, we propose to instead find an algorithm that generates the data. We achieve this using evolutionary search and a powerful fitness function, implemented by a randomly initialized graph neural network. This brings certain advantages over current deep generative models, for instance, a higher potential for out-of-training-distribution generalization and direct interpretability, as the final graph generative process is expressed as a Python function. We show that this approach can be competitive with deep generative models and under some circumstances can even find the true graph generative process, and as such perfectly generalize.	翻訳日:2023-04-26 20:05:07 公開日:2023-04-25
# 正確な不確実性定量化を伴う生成的降水ナウキャスティングのための潜在拡散モデル Latent diffusion models for generative precipitation nowcasting with accurate uncertainty quantification ( http://arxiv.org/abs/2304.12891v1 ) ライセンス: Link先を確認	Jussi Leinonen, Ulrich Hamann, Daniele Nerini, Urs Germann, Gabriele Franch	(参考訳) 拡散モデルは画像生成に広く採用されており、GAN (Generative Adversarial Network) よりも高品質で多様なサンプルを生成する。本研究では、降水ナウキャスティング（最新の観測データに基づく短時間予測）のための潜在拡散モデル (LDM) を提案する。 LDMはGANよりも安定しており、訓練に必要な計算量も少ないが、生成の計算コストはより高い。我々はLDMを、GANベースのDGMR (Deep Generative Models of Rainfall) および統計モデルPySTEPSと比較評価する。 LDMはより正確な降水予測を生成するが、降水が事前定義されたしきい値を超えるかどうかの予測では、比較結果は優劣が分かれる。 LDMの最も明確な利点は、DGMRやPySTEPSよりも多様な予測を生成することである。ランク分布検定は、LDMからのサンプルの分布が予測の不確実性を正確に反映していることを示す。したがって、LDMは気象や気候など、不確実性定量化が重要なあらゆる応用に有望である。 Diffusion models have been widely adopted in image generation, producing higher-quality and more diverse samples than generative adversarial networks (GANs). We introduce a latent diffusion model (LDM) for precipitation nowcasting - short-term forecasting based on the latest observational data. The LDM is more stable and requires less computation to train than GANs, albeit with more computationally expensive generation. We benchmark it against the GAN-based Deep Generative Models of Rainfall (DGMR) and a statistical model, PySTEPS. The LDM produces more accurate precipitation predictions, while the comparisons are more mixed when predicting whether the precipitation exceeds predefined thresholds. The clearest advantage of the LDM is that it generates more diverse predictions than DGMR or PySTEPS. Rank distribution tests indicate that the distribution of samples from the LDM accurately reflects the uncertainty of the predictions. Thus, LDMs are promising for any applications where uncertainty quantification is important, such as weather and climate.	翻訳日:2023-04-26 20:04:56 公開日:2023-04-25
# Internet-of-Thingsのための信頼できる実行環境におけるセキュアアグリゲーションを用いたブロックチェーンベースのフェデレーション学習 Blockchain-based Federated Learning with Secure Aggregation in Trusted Execution Environment for Internet-of-Things ( http://arxiv.org/abs/2304.12889v1 ) ライセンス: Link先を確認	Aditya Pribadi Kalapaaking, Ibrahim Khalil, Mohammad Saidur Rahman, Mohammed Atiquzzaman, Xun Yi, and Mahathir Almashor	(参考訳) 本稿では、産業用Internet-of-Things (IIoT) においてローカルモデルを安全に集約するため、Intel Software Guard Extension (SGX) ベースのTrusted Execution Environment (TEE) を用いたブロックチェーンベースのフェデレートラーニング(FL)フレームワークを提案する。 FLでは、ローカルモデルが攻撃者によって改ざんされうる。したがって、改ざんされたローカルモデルから生成されるグローバルモデルは誤ったものになりうる。そのため、提案フレームワークはセキュアなモデルアグリゲーションにブロックチェーンネットワークを活用する。各ブロックチェーンノードはSGX対応プロセッサをホストし、FLベースの集約タスクを安全に実行してグローバルモデルを生成する。ブロックチェーンノードは、集約されたモデルの真正性を検証し、モデルの完全性を保証するためにブロックチェーンコンセンサスメカニズムを実行し、改ざん耐性のあるストレージとして分散台帳に追加することができる。各クラスタは、ブロックチェーンから集約モデルを取得し、使用する前にその完全性を検証することができる。提案フレームワークの性能を評価するために、様々なCNNモデルとデータセットを用いていくつかの実験を行った。 This paper proposes a blockchain-based Federated Learning (FL) framework with Intel Software Guard Extension (SGX)-based Trusted Execution Environment (TEE) to securely aggregate local models in Industrial Internet-of-Things (IIoTs). In FL, local models can be tampered with by attackers. Hence, a global model generated from the tampered local models can be erroneous. Therefore, the proposed framework leverages a blockchain network for secure model aggregation. Each blockchain node hosts an SGX-enabled processor that securely performs the FL-based aggregation tasks to generate a global model. Blockchain nodes can verify the authenticity of the aggregated model, run a blockchain consensus mechanism to ensure the integrity of the model, and add it to the distributed ledger for tamper-proof storage. Each cluster can obtain the aggregated model from the blockchain and verify its integrity before using it. We conducted several experiments with different CNN models and datasets to evaluate the performance of the proposed framework.	翻訳日:2023-04-26 20:04:39 公開日:2023-04-25
# 二重敵対的デバイアスによる分布外エビデンス考慮型フェイクニュース検出 Out-of-distribution Evidence-aware Fake News Detection via Dual Adversarial Debiasing ( http://arxiv.org/abs/2304.12888v1 ) ライセンス: Link先を確認	Qiang Liu, Junfei Wu, Shu Wu, Liang Wang	(参考訳) エビデンスを考慮したフェイクニュース検出は、ニュースと、ニュース内容に基づいて検索されたエビデンスとの間で推論を行い、一致や矛盾を見つけることを目的としている。しかし、エビデンス考慮型の検出モデルはバイアス、すなわちニュース/エビデンスの内容と真偽ラベルとの間の疑似相関に影響され、分布外 (OOD) の状況への汎化が難しいことがわかった。そこで本研究では、新しい二重敵対的学習 (DAL) 手法を提案する。 DALには、いずれも真偽のニュースラベルをターゲットとする、ニュース側とエビデンス側のデバイアス識別器を組み込む。そして、DALはこれらの識別器を逆方向に最適化し、ニュースおよびエビデンスの内容バイアスの影響を軽減する。同時に、DALはメインのフェイクニュース予測器も最適化し、ニュース-エビデンス相互作用モジュールを学習できるようにする。このプロセスにより、エビデンス考慮型のフェイクニュース検出モデルに、ニュースとエビデンスの間の推論をより適切に行わせ、内容バイアスの影響を最小限に抑えることができる。なお、提案するDALアプローチは、既存のバックボーンとうまく連携するプラグアンドプレイモジュールである。 2つのOOD設定下で包括的な実験を行い、4つのエビデンス考慮型フェイクニュース検出バックボーンにDALを組み込む。その結果、DALは元のバックボーンおよびいくつかの競合するデバイアス手法を、有意かつ安定して上回ることが示された。 Evidence-aware fake news detection aims to conduct reasoning between news and evidence, which is retrieved based on news content, to find uniformity or inconsistency. However, we find evidence-aware detection models suffer from biases, i.e., spurious correlations between news/evidence contents and true/fake news labels, and are hard to be generalized to Out-Of-Distribution (OOD) situations. To deal with this, we propose a novel Dual Adversarial Learning (DAL) approach. We incorporate news-aspect and evidence-aspect debiasing discriminators, whose targets are both true/fake news labels, in DAL. Then, DAL reversely optimizes news-aspect and evidence-aspect debiasing discriminators to mitigate the impact of news and evidence content biases. At the same time, DAL also optimizes the main fake news predictor, so that the news-evidence interaction module can be learned. This process allows us to teach evidence-aware fake news detection models to better conduct news-evidence reasoning, and minimize the impact of content biases. To be noted, our proposed DAL approach is a plug-and-play module that works well with existing backbones. We conduct comprehensive experiments under two OOD settings, and plug DAL in four evidence-aware fake news detection backbones. Results demonstrate that, DAL significantly and stably outperforms the original backbones and some competitive debiasing methods.	翻訳日:2023-04-26 20:04:21 公開日:2023-04-25
# 関数近似を用いた効率的なオンラインRLにおける一般的なカバレッジ条件の証明可能な利点 Provable benefits of general coverage conditions in efficient online RL with function approximation ( http://arxiv.org/abs/2304.12886v1 ) ライセンス: Link先を確認	Fanghui Liu, Luca Viano, Volkan Cevher	(参考訳) オンライン強化学習(RL)では、マルコフ決定過程(MDP)に対する標準的な構造的仮定を採用する代わりに、(もともとオフラインRLに由来する)特定のカバレッジ条件を用いるだけで、サンプル効率の保証を確保するのに十分である(Xie et al. 2023)。本研究では、この新たな方向性に焦点を当て、より多くの可能かつ一般的なカバレッジ条件を掘り下げ、効率的なオンラインRLにおけるその可能性と有用性を調べる。我々は、集中可能性(concentrability)の$L^p$変種、密度比の実現可能性、部分/残余カバレッジ条件のトレードオフなど、サンプル効率の良いオンラインRLにも有益で、改善されたリグレット上界を達成しうる概念をさらに特定する。さらに、探索的なオフラインデータを用いる場合、我々のカバレッジ条件の下で、オンラインRLに対して統計的にも計算的にも効率的な保証を達成できる。加えて、MDP構造(例えば線形MDP)が与えられている場合でも、良好なカバレッジ条件は、$\widetilde{O}(\sqrt{T})$を超えるより速いリグレット上界、さらには対数オーダーのリグレットを得るうえで依然として有益であることを明らかにする。これらの結果は、効率的なオンラインRLにおける一般的なカバレッジ条件の使用を正当化する。 In online reinforcement learning (RL), instead of employing standard structural assumptions on Markov decision processes (MDPs), using a certain coverage condition (original from offline RL) is enough to ensure sample-efficient guarantees (Xie et al. 2023). In this work, we focus on this new direction by digging more possible and general coverage conditions, and study the potential and the utility of them in efficient online RL. We identify more concepts, including the $L^p$ variant of concentrability, the density ratio realizability, and trade-off on the partial/rest coverage condition, that can be also beneficial to sample-efficient online RL, achieving improved regret bound. Furthermore, if exploratory offline data are used, under our coverage conditions, both statistically and computationally efficient guarantees can be achieved for online RL. Besides, even though the MDP structure is given, e.g., linear MDP, we elucidate that, good coverage conditions are still beneficial to obtain faster regret bound beyond $\widetilde{O}(\sqrt{T})$ and even a logarithmic order regret. These results provide a good justification for the usage of general coverage conditions in efficient online RL.	翻訳日:2023-04-26 20:03:53 公開日:2023-04-25
# ポテンシャル流としての生成モデルにおける潜在トラバーサル Latent Traversals in Generative Models as Potential Flows ( http://arxiv.org/abs/2304.12944v1 ) ライセンス: Link先を確認	Yue Song, Andy Keller, Nicu Sebe, Max Welling	(参考訳) 深層生成モデルにおける最近の顕著な進歩にもかかわらず、その潜在空間の基礎構造はいまだ十分に理解されておらず、意味的に有意義な潜在トラバーサルの実行は未解決の研究課題となっている。ほとんどの先行研究は、潜在構造を線形にモデル化し、「ディスエンタングルされた」生成をもたらす対応する線形方向を見つけることで、この課題の解決を目指してきた。これに対し本研究では、学習された動的ポテンシャルランドスケープによって潜在構造をモデル化し、ランドスケープの勾配を下るサンプルの流れとして潜在トラバーサルを行うことを提案する。物理学、最適輸送、神経科学に着想を得て、これらのポテンシャルランドスケープは物理的に現実的な偏微分方程式として学習され、空間と時間の両方にわたって柔軟に変化できる。ディスエンタングルメントを実現するために、複数のポテンシャルを同時に学習し、それらが互いに異なり意味的に自己整合するよう分類器によって制約する。実験により、本手法は最先端のベースラインよりも定性的にも定量的にもディスエンタングルされた軌跡を達成することを示す。さらに、本手法を訓練中の正則化項として統合できることを示す。これは構造化表現の学習に向けた帰納的バイアスとして作用し、最終的に同様の構造を持つデータに対するモデルの尤度を向上させる。 Despite the significant recent progress in deep generative models, the underlying structure of their latent spaces is still poorly understood, thereby making the task of performing semantically meaningful latent traversals an open research challenge. Most prior work has aimed to solve this challenge by modeling latent structures linearly, and finding corresponding linear directions which result in 'disentangled' generations. In this work, we instead propose to model latent structures with a learned dynamic potential landscape, thereby performing latent traversals as the flow of samples down the landscape's gradient. Inspired by physics, optimal transport, and neuroscience, these potential landscapes are learned as physically realistic partial differential equations, thereby allowing them to flexibly vary over both space and time. To achieve disentanglement, multiple potentials are learned simultaneously, and are constrained by a classifier to be distinct and semantically self-consistent. Experimentally, we demonstrate that our method achieves both more qualitatively and quantitatively disentangled trajectories than state-of-the-art baselines. Further, we demonstrate that our method can be integrated as a regularization term during training, thereby acting as an inductive bias towards the learning of structured representations, ultimately improving model likelihood on similarly structured data.	翻訳日:2023-04-26 19:57:00 公開日:2023-04-25
# ユーザ中心のフェデレーション学習:パーソナライズのためのワイヤレスリソースのトレードオフ User-Centric Federated Learning: Trading off Wireless Resources for Personalization ( http://arxiv.org/abs/2304.12930v1 ) ライセンス: Link先を確認	Mohamad Mestoukirdi, Matteo Zecchin, David Gesbert, Qianrui Li	(参考訳) フェデレートラーニング(FL)システムにおけるクライアント間の統計的不均一性は、アルゴリズムの収束時間を増加させ、一般化性能を低下させる。その結果、大きな通信オーバーヘッドを払っても貧弱なモデルしか得られない。 FLが課すプライバシー制約に違反することなく上記の問題に対処するため、パーソナライズされたFL手法は、プライバシーを保護した転送を保証するために、データに直接アクセスすることなく統計的に類似したクライアントを結び付ける必要がある。本研究では、容易に利用可能な勾配情報に基づき、各FLクライアントに対してパーソナライズされたモデルを生成できるユーザ中心のアグリゲーションルールをパラメータサーバ(PS)上に設計する。提案する集約ルールは、重み付き集約経験的リスク最小化器の上界に着想を得たものである。さらに、ユーザクラスタリングに基づく通信効率のよい変種を導出し、通信制約のあるシステムへの適用性を大幅に向上させる。提案アルゴリズムは、平均精度、最悪ノード性能、訓練時の通信オーバーヘッドの点で、一般的なパーソナライズドFLベースラインを上回る。 Statistical heterogeneity across clients in a Federated Learning (FL) system increases the algorithm convergence time and reduces the generalization performance, resulting in a large communication overhead in return for a poor model. To tackle the above problems without violating the privacy constraints that FL imposes, personalized FL methods have to couple statistically similar clients without directly accessing their data in order to guarantee a privacy-preserving transfer. In this work, we design user-centric aggregation rules at the parameter server (PS) that are based on readily available gradient information and are capable of producing personalized models for each FL client. The proposed aggregation rules are inspired by an upper bound of the weighted aggregate empirical risk minimizer. Secondly, we derive a communication-efficient variant based on user clustering which greatly enhances its applicability to communication-constrained systems. Our algorithm outperforms popular personalized FL baselines in terms of average accuracy, worst node performance, and training communication overhead.	翻訳日:2023-04-26 19:55:45 公開日:2023-04-25
# ベイズ最適化のための量子ガウス過程回帰 Quantum Gaussian Process Regression for Bayesian Optimization ( http://arxiv.org/abs/2304.12923v1 ) ライセンス: Link先を確認	Frederic Rapp and Marco Roth	(参考訳) ガウス過程回帰は確立されたベイズ機械学習手法である。本稿では、パラメータ化量子回路に基づく量子カーネルを用いたガウス過程回帰の新しい手法を提案する。ハードウェア効率の良い特徴写像とグラム行列の注意深い正則化を用いることで、得られる量子ガウス過程の分散情報を保存できることを実証する。また、量子ガウス過程が、サロゲートモデルの分散に決定的に依存するタスクであるベイズ最適化のサロゲートモデルとして利用できることを示す。この量子ベイズ最適化アルゴリズムの性能を示すために、実世界のデータセット上で回帰を行う機械学習モデルのハイパーパラメータ最適化に適用する。量子ベイズ最適化を古典版と比較し、量子版が古典版の性能に匹敵しうることを示す。 Gaussian process regression is a well-established Bayesian machine learning method. We propose a new approach to Gaussian process regression using quantum kernels based on parameterized quantum circuits. By employing a hardware-efficient feature map and careful regularization of the Gram matrix, we demonstrate that the variance information of the resulting quantum Gaussian process can be preserved. We also show that quantum Gaussian processes can be used as a surrogate model for Bayesian optimization, a task that critically relies on the variance of the surrogate model. To demonstrate the performance of this quantum Bayesian optimization algorithm, we apply it to the hyperparameter optimization of a machine learning model which performs regression on a real-world dataset. We benchmark the quantum Bayesian optimization against its classical counterpart and show that quantum version can match its performance.	翻訳日:2023-04-26 19:55:27 公開日:2023-04-25
# GMNLP at SemEval-2023 Task 12: 系統ベースのアダプタを用いた感情分析 GMNLP at SemEval-2023 Task 12: Sentiment Analysis with Phylogeny-Based Adapters ( http://arxiv.org/abs/2304.12979v1 ) ライセンス: Link先を確認	Md Mahfuz Ibn Alam, Ruoyu Xie, Fahim Faisal, Antonios Anastasopoulos	(参考訳) 本報告では、SemEval-2023共有タスクAfriSenti-SemEvalに対するGMUの感情分析システムについて述べる。我々は、モノリンガル、マルチリンガル、ゼロショットの3つのサブタスクすべてに参加した。我々のアプローチは、アフリカ諸言語で訓練された事前学習済み多言語言語モデルAfroXLMR-largeで初期化し、各タスクに応じて微調整したモデルを用いる。また、元の訓練データに加えて拡張訓練データも導入する。微調整と並行して、系統ベースのアダプタチューニングを行って複数のモデルを作成し、最終提出用に最良のモデルをアンサンブルする。本システムはトラック5（アムハラ語）で最高のF1スコアを達成し、同トラックの2位のシステムをF1スコアで6.2ポイント上回った。全体として、本システムは全15トラックに参加した10システムの中で5位である。 This report describes GMU's sentiment analysis system for the SemEval-2023 shared task AfriSenti-SemEval. We participated in all three sub-tasks: Monolingual, Multilingual, and Zero-Shot. Our approach uses models initialized with AfroXLMR-large, a pre-trained multilingual language model trained on African languages and fine-tuned correspondingly. We also introduce augmented training data along with original training data. Alongside finetuning, we perform phylogeny-based adapter tuning to create several models and ensemble the best models for the final submission. Our system achieves the best F1-score on track 5: Amharic, with 6.2 points higher F1-score than the second-best performing system on this track. Overall, our system ranks 5th among the 10 systems participating in all 15 tracks.	翻訳日:2023-04-26 19:48:45 公開日:2023-04-25
# 逆強化学習の理論的理解に向けて Towards Theoretical Understanding of Inverse Reinforcement Learning ( http://arxiv.org/abs/2304.12966v1 ) ライセンス: Link先を確認	Alberto Maria Metelli, Filippo Lazzati, Marcello Restelli	(参考訳) 逆強化学習(IRL)は、エキスパートエージェントが示す振る舞いを正当化する報酬関数を回復する強力なアルゴリズム群である。 IRLのよく知られた制限は、観察された振る舞いを説明する報酬が複数存在するために生じる、報酬関数の選択の曖昧さである。この制限は近年、IRLを実現可能報酬集合、すなわちエキスパートの振る舞いと両立する報酬の領域を推定する問題として定式化することで回避されている。本稿では、生成モデルを用いた有限ホライズン問題において、IRLの理論上のギャップを埋めるための一歩を踏み出す。まず、実現可能報酬集合を推定する問題と対応するPAC要件を形式的に導入し、特定の報酬クラスの性質を議論する。次に、実現可能報酬集合を推定する問題のサンプル複雑性に対する最初のミニマックス下界として、オーダー${\Omega}\Bigl( \frac{H^3SA}{\epsilon^2} \bigl( \log \bigl(\frac{1}{\delta}\bigr) + S \bigr)\Bigr)$を与える。ここで$S$と$A$はそれぞれ状態数と行動数、$H$はホライズン、$\epsilon$は所望の精度、$\delta$は信頼度である。一様サンプリング戦略(US-IRL)のサンプル複雑性を解析し、対数因子を除いて一致する上界を証明する。最後に、IRLにおけるいくつかの未解決問題を概説し、今後の研究の方向性を提案する。 Inverse reinforcement learning (IRL) denotes a powerful family of algorithms for recovering a reward function justifying the behavior demonstrated by an expert agent. A well-known limitation of IRL is the ambiguity in the choice of the reward function, due to the existence of multiple rewards that explain the observed behavior. This limitation has been recently circumvented by formulating IRL as the problem of estimating the feasible reward set, i.e., the region of the rewards compatible with the expert's behavior. In this paper, we make a step towards closing the theory gap of IRL in the case of finite-horizon problems with a generative model. We start by formally introducing the problem of estimating the feasible reward set, the corresponding PAC requirement, and discussing the properties of particular classes of rewards. Then, we provide the first minimax lower bound on the sample complexity for the problem of estimating the feasible reward set of order ${\Omega}\Bigl( \frac{H^3SA}{\epsilon^2} \bigl( \log \bigl(\frac{1}{\delta}\bigr) + S \bigr)\Bigr)$, being $S$ and $A$ the number of states and actions respectively, $H$ the horizon, $\epsilon$ the desired accuracy, and $\delta$ the confidence. We analyze the sample complexity of a uniform sampling strategy (US-IRL), proving a matching upper bound up to logarithmic factors. Finally, we outline several open questions in IRL and propose future research directions.	翻訳日:2023-04-26 19:48:31 公開日:2023-04-25
# ユニタリ回路ゲームにおけるエンタングルメント転移 Entanglement Transitions in Unitary Circuit Games ( http://arxiv.org/abs/2304.12965v1 ) ライセンス: Link先を確認	Raúl Morral-Yepes, Adam Smith, S. L. Sondhi, Frank Pollmann	(参考訳) ユニタリ回路における繰り返しの射影測定は、測定レートを調整すると、エンタングルメント相転移を引き起こしうる。本研究では、射影測定を、エンタングルメントを最小化するように動的に選択されるユニタリゲートに置き換えた別の設定を考える。これは、2人のプレーヤーがランダムに割り当てられたボンドに異なるレートでユニタリゲートを配置する、1次元のユニタリ回路ゲームと見なすことができる。「エンタングラー」は、広範な(体積則の)エンタングルメントを生成することを目的として、ランダムな局所ユニタリゲートを適用する。「ディスエンタングラー」は、状態に関する限られた知識に基づき、有限の(面積則の)エンタングルメントに抑えることを目標として、割り当てられたボンド上のエンタングルメントエントロピーを減らすユニタリゲートを選択する。結果として生じるエンタングルメントダイナミクスを明らかにするために、3つの異なるシナリオ、すなわち (i) 古典的な離散高さモデル、(ii) クリフォード回路、(iii) 一般的な$U(4)$ユニタリ回路を考える。古典モデルとクリフォード回路モデルはいずれも、ディスエンタングラーがゲートを配置するレートの関数として相転移を示し、それらは確率的フレドキン鎖との関係を通して理解できる類似の性質を持つ。対照的に、ハールランダムなユニタリゲートを用いる場合は常に「エンタングラー」が勝ち、エンタングルするレートがゼロでない限り、広範な体積則のエンタングルメントが観測される。 Repeated projective measurements in unitary circuits can lead to an entanglement phase transition as the measurement rate is tuned. In this work, we consider a different setting in which the projective measurements are replaced by dynamically chosen unitary gates that minimize the entanglement. This can be seen as a one-dimensional unitary circuit game in which two players get to place unitary gates on randomly assigned bonds at different rates: The "entangler" applies a random local unitary gate with the aim of generating extensive (volume law) entanglement. The "disentangler", based on limited knowledge about the state, chooses a unitary gate to reduce the entanglement entropy on the assigned bond with the goal of limiting to only finite (area law) entanglement. In order to elucidate the resulting entanglement dynamics, we consider three different scenarios: (i) a classical discrete height model, (ii) a Clifford circuit, and (iii) a general $U(4)$ unitary circuit. We find that both the classical and Clifford circuit models exhibit phase transitions as a function of the rate that the disentangler places a gate, which have similar properties that can be understood through a connection to the stochastic Fredkin chain. In contrast, the "entangler" always wins when using Haar random unitary gates and we observe extensive, volume law entanglement for all non-zero rates of entangling.	翻訳日:2023-04-26 19:47:55 公開日:2023-04-25
# Chameleon: 連合学習における耐久性のあるバックドアの植え付けのためのピア画像への適応 Chameleon: Adapting to Peer Images for Planting Durable Backdoors in Federated Learning ( http://arxiv.org/abs/2304.12961v1 ) ライセンス: Link先を確認	Yanbo Dai, Songze Li	(参考訳) フェデレーション学習(FL)システムでは、分散したクライアントがローカルモデルを中央サーバにアップロードし、グローバルモデルに集約する。悪意のあるクライアントは、汚染されたローカルモデルをアップロードすることでグローバルモデルにバックドアを植え付け、特定のパターンを持つ画像を特定のターゲットラベルに誤分類させることができる。現在の攻撃によって植え付けられたバックドアは耐久性がなく、攻撃者がモデル汚染をやめるとすぐに消滅する。本稿では、FLバックドアの耐久性と、良性画像と汚染画像（ローカル訓練中にラベルがターゲットラベルに反転された画像）との関係とのつながりを調べる。具体的には、汚染画像の元のラベルおよびターゲットラベルを持つ良性画像が、バックドアの耐久性に重要な影響を及ぼすことがわかった。そこで我々は、より耐久性の高いバックドアに向けて、コントラスト学習を利用してこの効果をさらに増幅する新たな攻撃Chameleonを提案する。広範な実験により、Chameleonは、幅広い画像データセット、バックドアタイプ、モデルアーキテクチャにおいて、ベースラインよりもバックドアの寿命を$1.2\times \sim 4\times$と大幅に延ばすことが示された。 In a federated learning (FL) system, distributed clients upload their local models to a central server to aggregate into a global model. Malicious clients may plant backdoors into the global model through uploading poisoned local models, causing images with specific patterns to be misclassified into some target labels. Backdoors planted by current attacks are not durable, and vanish quickly once the attackers stop model poisoning. In this paper, we investigate the connection between the durability of FL backdoors and the relationships between benign images and poisoned images (i.e., the images whose labels are flipped to the target label during local training). Specifically, benign images with the original and the target labels of the poisoned images are found to have key effects on backdoor durability. Consequently, we propose a novel attack, Chameleon, which utilizes contrastive learning to further amplify such effects towards a more durable backdoor. Extensive experiments demonstrate that Chameleon significantly extends the backdoor lifespan over baselines by $1.2\times \sim 4\times$, for a wide range of image datasets, backdoor types, and model architectures.	翻訳日:2023-04-26 19:47:28 公開日:2023-04-25
# 機械翻訳における文レベルパラダイムからの脱却 Escaping the sentence-level paradigm in machine translation ( http://arxiv.org/abs/2304.12959v1 ) ライセンス: Link先を確認	Matt Post and Marcin Junczys-Dowmunt	(参考訳) 文書の文脈がさまざまな翻訳の曖昧さを解消するのに不可欠であることはよく知られており、実際、文書単位の設定はほぼすべての翻訳にとって最も自然な設定である。したがって、機械翻訳が研究と実運用の両面で、数十年前からの文レベルの翻訳パラダイムにほぼ留まっているのは残念である。また、本来的に文書ベースである大規模言語モデルからの競争圧力を考えると、これはますます目立つ問題でもある。文書文脈を用いた機械翻訳の研究は数多く存在するが、様々な理由により定着していない。本稿では、3つの障害に同時に対処することで、この停滞から抜け出す道を提案する。すなわち、どのアーキテクチャを使うべきか、訓練のための文書レベルの情報をどこから得るのか、そしてそれが良いものかどうかをどうやって知るのか、である。特殊なアーキテクチャに関する研究とは対照的に、標準的なTransformerアーキテクチャで、十分な容量さえあれば足りることを示す。次に、逆翻訳データのみから文書サンプルを取ることで訓練データの問題に対処する。逆翻訳データは入手が容易であるだけでなく、機械翻訳出力を含む可能性のある並列文書データよりも高品質である。最後に、文書システム間の識別能力がより高い、既存のコントラスト指標の生成的変種を提案する。大規模データを持つ4つの言語ペア(DE$\rightarrow$EN, EN$\rightarrow$DE, EN$\rightarrow$FR, EN$\rightarrow$RU)での結果は、これら3つの要素を組み合わせることで文書レベルの性能が向上することを示している。 It is well-known that document context is vital for resolving a range of translation ambiguities, and in fact the document setting is the most natural setting for nearly all translation. It is therefore unfortunate that machine translation -- both research and production -- largely remains stuck in a decades-old sentence-level translation paradigm. It is also an increasingly glaring problem in light of competitive pressure from large language models, which are natively document-based. Much work in document-context machine translation exists, but for various reasons has been unable to catch hold. This paper suggests a path out of this rut by addressing three impediments at once: what architectures should we use? where do we get document-level information for training them? and how do we know whether they are any good? In contrast to work on specialized architectures, we show that the standard Transformer architecture is sufficient, provided it has enough capacity. Next, we address the training data issue by taking document samples from back-translated data only, where the data is not only more readily available, but is also of higher quality compared to parallel document data, which may contain machine translation output. Finally, we propose generative variants of existing contrastive metrics that are better able to discriminate among document systems. Results in four large-data language pairs (DE$\rightarrow$EN, EN$\rightarrow$DE, EN$\rightarrow$FR, and EN$\rightarrow$RU) establish the success of these three pieces together in improving document-level performance.	翻訳日:2023-04-26 19:47:07 公開日:2023-04-25
# 高レベルなロボット説明のための報酬分解の詳細な検討 A Closer Look at Reward Decomposition for High-Level Robotic Explanations ( http://arxiv.org/abs/2304.12958v1 ) ライセンス: Link先を確認	Wenhao Lu, Sven Magg, Xufeng Zhao, Martin Gromniak, Stefan Wermter	(参考訳) ロボットのような知的エージェントの振る舞いを人間に説明することは、理解しにくい固有受容状態、変動する中間目標、そしてその結果としての予測不可能性のために困難である。さらに、強化学習エージェントの1ステップの説明は、各遷移におけるエージェントの将来の振る舞いを考慮できないため曖昧になりうるもので、ロボットの行動の説明をいっそう複雑にする。タスク固有のプリミティブに対応する抽象化された行動を活用することで、動作レベルでの説明を避ける。提案するフレームワークは、報酬分解(RD)と抽象化された行動空間を組み合わせて説明可能な学習フレームワークとし、タスク内のオブジェクト特性に基づく曖昧さのない高レベルな説明を可能にする。2つのロボットシナリオの定量的・定性的な分析を通じてフレームワークの有効性を実証し、RD説明の出力成果物から得られる、人間が理解しやすい視覚的・テキスト的説明を示す。さらに、これらの成果物を大規模言語モデルと統合して推論や対話的な問い合わせを行う汎用性を示す。 Explaining the behavior of intelligent agents such as robots to humans is challenging due to their incomprehensible proprioceptive states, variational intermediate goals, and resultant unpredictability. Moreover, one-step explanations for reinforcement learning agents can be ambiguous as they fail to account for the agent's future behavior at each transition, adding to the complexity of explaining robot actions. By leveraging abstracted actions that map to task-specific primitives, we avoid explanations on the movement level. Our proposed framework combines reward decomposition (RD) with abstracted action spaces into an explainable learning framework, allowing for non-ambiguous and high-level explanations based on object properties in the task. We demonstrate the effectiveness of our framework through quantitative and qualitative analysis of two robot scenarios, showcasing visual and textual explanations, from output artifacts of RD explanation, that are easy for humans to comprehend. Additionally, we demonstrate the versatility of integrating these artifacts with large language models for reasoning and interactive querying.	翻訳日:2023-04-26 19:46:45 公開日:2023-04-25
# 量子スピン鎖における弦破れの動的局在転移 Dynamical localization transition of string breaking in quantum spin chains ( http://arxiv.org/abs/2304.12957v1 ) ライセンス: Link先を確認	Roberto Verdel and Guo-Yi Zhu and Markus Heyl	(参考訳) 2つの電荷を繋ぐ弦の分裂は、閉じ込めを示すゲージ理論における最も驚くべき現象の1つである。この過程のダイナミクスは最近集中的に研究されており、多くの数値結果が次の二分法を示唆している。すなわち、閉じ込め弦は比較的速く崩壊することもあれば、極めて長時間にわたって持続することもある。ここでは、この二分法の基礎となるメカニズムとして、動的局在転移を提唱する。この目的のために、閉じ込めを示すスピン鎖の軽い中間子セクターにおける有効な弦破れの記述を導出し、この問題をフォック空間における動的局在転移と見なせることを示す。高速な弦破れダイナミクスと抑制された弦破れダイナミクスは、それぞれ非局在的な振る舞いと局在的な振る舞いに対応づけられる。次に、動的な弦破れ問題を、弦が中間子浴に浸された「不純物」として表される量子不純物モデルへとさらに縮約する。この現象論的モデルは局在-非局在転移を示し、定性的に異なる弦破れの領域を理解するための一般的で単純な物理的基礎を与える。これらの発見は、1次元およびより高次元の、閉じ込めを示す格子モデルのより広いクラスに直接関係しており、現在のRydberg量子シミュレータで実現できる可能性がある。 The fission of a string connecting two charges is one of the most astounding phenomena in confining gauge theories. The dynamics of this process has been the subject of recent intensive studies, in which plenty of numerical results suggest the following dichotomy: the confining string can decay relatively fast or persist up to extremely long times. Here, we put forward a dynamical localization transition as the mechanism underlying this dichotomy. To this end, we derive an effective string breaking description in the light-meson sector of a confined spin chain and show that the problem can be regarded as a dynamical localization transition in Fock space. Fast and suppressed string breaking dynamics are identified with delocalized and localized behavior, respectively. We then provide a further reduction of the dynamical string breaking problem onto a quantum impurity model, where the string is represented as an "impurity" immersed in a meson bath. It is shown that this phenomenological model features a localization-delocalization transition, giving a general and simple physical basis to understand the qualitatively distinct string breaking regimes. These findings are directly relevant for a wider class of confining lattice models in one and higher dimensions and could be realized on present-day Rydberg quantum simulators.	翻訳日:2023-04-26 19:46:27 公開日:2023-04-25
# 単一能動素子によるデマルチプレクス多光子源 Single-active-element demultiplexed multi-photon source ( http://arxiv.org/abs/2304.12956v1 ) ライセンス: Link先を確認	Lena M. Hansen, Lorenzo Carosini, Lennart Jehle, Francesco Giorgino, Romane Houvenaghel, Michal Vyvlecka, Juan C. Loredo, Philip Walther	(参考訳) 時間-空間デマルチプレキシングは、同じ空間モードにおける非同時事象を異なる出力経路へルーティングする。この技術は、固体量子エミッタを利用する際により光子数の多い多光子状態にアクセスできるため、現在広く採用されている。しかしながら、これまでの実装では能動素子の数を増やし続ける必要があり、すぐにリソースの制約に直面していた。本稿では、単一の能動素子のみを用いて、原理的には任意の数の出力へルーティングするデマルチプレキシング手法を提案し、実証する。本デバイスを高効率な量子ドットベースの単一光子源と組み合わせて用い、デマルチプレクスされた識別不可能性の高い単一光子を最大8個測定する。本手法の実用上の限界を議論し、例えば数十の出力のデマルチプレクスに使用できる条件を述べる。以上の結果は、リソース効率の高い大規模多光子源を実現するための道筋を与える。 Temporal-to-spatial demultiplexing routes non-simultaneous events of the same spatial mode to distinct output trajectories. This technique has now been widely adopted because it gives access to higher-number multi-photon states when exploiting solid-state quantum emitters. However, implementations so far have required an always-increasing number of active elements, rapidly facing resource constraints. Here, we propose and demonstrate a demultiplexing approach that utilizes only a single active element for routing to, in principle, an arbitrary number of outputs. We employ our device in combination with a high-efficiency quantum dot based single-photon source, and measure up to eight demultiplexed highly indistinguishable single photons. We discuss the practical limitations of our approach, and describe in which conditions it can be used to demultiplex, e.g., tens of outputs. Our results thus provide a path for the preparation of resource-efficient larger-scale multi-photon sources.	翻訳日:2023-04-26 19:46:07 公開日:2023-04-25
# ニューラルネットワークにおける非決定論的スタック Nondeterministic Stacks in Neural Networks ( http://arxiv.org/abs/2304.12955v1 ) ライセンス: Link先を確認	Brian DuSell	(参考訳) ニューラルネットワークは、言語を処理するコンピュータシステムの画期的な改善に寄与しているが、広く使われているニューラルネットワークアーキテクチャは、構文を処理する能力の限界をまだ示している。この問題に対処するため、以前の研究では、ニューラルネットワークにスタックデータ構造を追加し、構文とスタック間の理論的接続からインスピレーションを得ている。しかし、これらの手法は一度に1つのパースを追跡するように設計された決定論的スタックを用いるが、構文的曖昧さは解析に非決定論的スタックを必要とするが、言語では極めて一般的である。この論文では,非決定論的スタックをニューラルネットワークに組み込む手法を提案することで,この不一致を解消する。本研究では,動的プログラミングアルゴリズムを用いて,指数関数数を表す非決定論的プッシュダウンオートマトンを効率的にシミュレートする微分可能なデータ構造を開発する。このモジュールをリカレントニューラルネットワーク(RNN)とトランスフォーマーの2つの主要なアーキテクチャに組み込む。これにより、任意の文脈自由言語に対する形式的認識能力が向上し、決定論的文脈自由言語においてもトレーニングを支援することが示される。経験的に、非決定論的スタックを持つニューラルネットワークは、理論的に最大解析の難しい言語を含む、以前のスタック推論モデルよりもずっと効果的に文脈自由言語を学習する。また,非決定性スタックを付加したrnnでは,非コンテキストフリーパターンであるクロスシリアル依存性の学習など,驚くほど強力な動作が可能であることも示している。自然言語モデリングの改善を実証し,構文一般化ベンチマークの分析を行う。この作業は、より人間的な方法で構文の使用を学ぶシステムを構築するための重要なステップである。 Human language is full of compositional syntactic structures, and although neural networks have contributed to groundbreaking improvements in computer systems that process language, widely-used neural network architectures still exhibit limitations in their ability to process syntax. To address this issue, prior work has proposed adding stack data structures to neural networks, drawing inspiration from theoretical connections between syntax and stacks. However, these methods employ deterministic stacks that are designed to track one parse at a time, whereas syntactic ambiguity, which requires a nondeterministic stack to parse, is extremely common in language. In this dissertation, we remedy this discrepancy by proposing a method of incorporating nondeterministic stacks into neural networks. We develop a differentiable data structure that efficiently simulates a nondeterministic pushdown automaton, representing an exponential number of computations with a dynamic programming algorithm. 
We incorporate this module into two predominant architectures: recurrent neural networks (RNNs) and transformers. We show that this raises their formal recognition power to arbitrary context-free languages, and also aids training, even on deterministic context-free languages. Empirically, neural networks with nondeterministic stacks learn context-free languages much more effectively than prior stack-augmented models, including a language with theoretically maximal parsing difficulty. We also show that an RNN augmented with a nondeterministic stack is capable of surprisingly powerful behavior, such as learning cross-serial dependencies, a well-known non-context-free pattern. We demonstrate improvements on natural language modeling and provide analysis on a syntactic generalization benchmark. This work represents an important step toward building systems that learn to use syntax in a more human-like fashion.	翻訳日:2023-04-26 19:45:54 公開日:2023-04-25
# 射影確率近似における漸近的挙動と相転移:ジャンプ拡散アプローチ Asymptotic Behaviors and Phase Transitions in Projected Stochastic Approximation: A Jump Diffusion Approach ( http://arxiv.org/abs/2304.12953v1 ) ライセンス: Link先を確認	Jiadong Liang, Yuze Han, Xiang Li, Zhihua Zhang	(参考訳) 本稿では,線形制約付き最適化問題を考察し,ループレス射影確率近似(LPSA)アルゴリズムを提案する。実行可能性を確保するために、$n$-thイテレーションで確率$p_n$でプロジェクションを実行する。確率 $p_n$ とステップサイズ $\eta_n$ の特定の族を考えると、漸近的かつ連続的な観点からアルゴリズムを分析する。新しいジャンプ拡散近似を用いて、それらの再スケールされた最後のイテレートを特定の確率微分方程式(sdes)の解に弱収束させる軌道を示す。 SDEを解析することにより、LPSAの漸近挙動を$(p_n, \eta_n)$の異なる選択に対して同定する。このアルゴリズムは興味深い漸近バイアス分散トレードオフを示し、相対等級$p_n$ w.r.t.$\eta_n$に従って位相遷移現象を生じる。この発見は射影コストを最小化するために$\{(p_n, \eta_n)\}_{n \geq 1}$を選択するための洞察を与える。さらに,ジャンプ拡散近似の実用的応用として,脱バイアスLPSA(DLPSA)を提案する。 DLPSAはバニラLPSAと比較してプロジェクションの複雑さを効果的に減少させる。 In this paper we consider linearly constrained optimization problems and propose a loopless projection stochastic approximation (LPSA) algorithm. It performs the projection with probability $p_n$ at the $n$-th iteration to ensure feasibility. Considering a specific family of the probability $p_n$ and step size $\eta_n$, we analyze our algorithm from an asymptotic and continuous perspective. Using a novel jump diffusion approximation, we show that the trajectories connecting those properly rescaled last iterates weakly converge to the solution of specific stochastic differential equations (SDEs). By analyzing SDEs, we identify the asymptotic behaviors of LPSA for different choices of $(p_n, \eta_n)$. We find that the algorithm presents an intriguing asymptotic bias-variance trade-off and yields phase transition phenomena, according to the relative magnitude of $p_n$ w.r.t. $\eta_n$. This finding provides insights on selecting appropriate $\{(p_n, \eta_n)\}_{n \geq 1}$ to minimize the projection cost. Additionally, we propose the Debiased LPSA (DLPSA) as a practical application of our jump diffusion approximation result. DLPSA is shown to effectively reduce projection complexity compared to vanilla LPSA.	
翻訳日:2023-04-26 19:45:26 公開日:2023-04-25
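The core idea stated in the LPSA abstract above (take a stochastic gradient step, then project onto the constraint set only with probability $p_n$) can be illustrated with a small toy sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the quadratic objective, the single hyperplane constraint, the schedules `eta0 / n**alpha` and `p0 / n**beta`, and the function names.

```python
import random


def project_onto_hyperplane(x, a, b):
    """Euclidean projection of x onto the hyperplane {z : a.z = b}."""
    dot_ax = sum(ai * xi for ai, xi in zip(a, x))
    norm_sq = sum(ai * ai for ai in a)
    shift = (dot_ax - b) / norm_sq
    return [xi - shift * ai for ai, xi in zip(a, x)]


def lpsa_sketch(c, a, b, steps=2000, eta0=0.5, p0=1.0,
                alpha=1.0, beta=0.5, noise=0.1, seed=0):
    """Toy run: minimize 0.5 * ||x - c||^2 subject to a.x = b.

    Hypothetical schedules (assumptions, not the paper's):
      step size   eta_n = eta0 / n**alpha
      projection  p_n   = min(1, p0 / n**beta)
    Returns the last iterate and the number of projections performed.
    """
    rng = random.Random(seed)
    x = project_onto_hyperplane([0.0] * len(c), a, b)  # feasible start
    projections = 0
    for n in range(1, steps + 1):
        eta_n = eta0 / n ** alpha
        p_n = min(1.0, p0 / n ** beta)
        # Noisy gradient of the quadratic objective.
        grad = [xi - ci + rng.gauss(0.0, noise) for xi, ci in zip(x, c)]
        x = [xi - eta_n * gi for xi, gi in zip(x, grad)]
        # Project only occasionally, with probability p_n.
        if rng.random() < p_n:
            x = project_onto_hyperplane(x, a, b)
            projections += 1
    return x, projections
```

Calling `lpsa_sketch([2.0, 0.0], [1.0, 1.0], 1.0)` returns the final iterate together with how many projections were actually performed, which is typically far fewer than the number of steps. The last iterate is only feasible if the final step happened to project, so a caller that needs exact feasibility could apply `project_onto_hyperplane` once more at the end.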
# 宇宙から何か分離する? Segment anything, from space? ( http://arxiv.org/abs/2304.13000v1 ) ライセンス: Link先を確認	Simiao Ren, Francesco Luzi, Saad Lahrichi, Kaleb Kassaw, Leslie M. Collins, Kyle Bradbury, Jordan M. Malof	(参考訳) 近年,視覚タスク用に開発された最初の基礎モデルが開発され,SAM (Segment Anything Model) と呼ばれる。 SAMは1つ(またはそれ以上)のポイント、バウンディングボックス、マスクなど、安価な入力プロンプトに基づいて入力画像にオブジェクトを分割することができる。著者らは、多数の視覚ベンチマークタスクにおいてSAMのゼロショット画像分割精度を検証し、SAMは通常、目標タスクで訓練された視覚モデルと似ているか、時には超過している。セグメンテーションのためのSAMの印象的な一般化は、自然画像の研究に重要な意味を持つ。本研究では,SAMの優れた性能が画像のオーバーヘッド問題にまで及んでいるかどうかを考察し,その開発に対するコミュニティの反応のガイドに役立てる。 SAMの性能を多様で広く研究されているベンチマークタスクのセットで検証する。 SAMはオーバヘッド画像によく当てはまるが、オーバヘッド画像とターゲットオブジェクトのユニークな特徴のため、いくつかのケースではフェールする。リモートセンシング画像に対するこれらのユニークな系統的障害事例について報告する。これは作業用紙であり、追加の分析と結果が完了すると更新される。 Recently, the first foundation model developed specifically for vision tasks was developed, termed the "Segment Anything Model" (SAM). SAM can segment objects in input imagery based upon cheap input prompts, such as one (or more) points, a bounding box, or a mask. The authors examined the zero-shot image segmentation accuracy of SAM on a large number of vision benchmark tasks and found that SAM usually achieved recognition accuracy similar to, or sometimes exceeding, vision models that had been trained on the target tasks. The impressive generalization of SAM for segmentation has major implications for vision researchers working on natural imagery. In this work, we examine whether SAM's impressive performance extends to overhead imagery problems, and help guide the community's response to its development. We examine SAM's performance on a set of diverse and widely-studied benchmark tasks. We find that SAM does often generalize well to overhead imagery, although it fails in some cases due to the unique characteristics of overhead imagery and the target objects. We report on these unique systematic failure cases for remote sensing imagery that may comprise useful future research for the community. 
Note that this is a working paper, and it will be updated as additional analysis and results are completed.	翻訳日:2023-04-26 19:39:30 公開日:2023-04-25
# AudioGPT: 音声、音楽、音声、トーキングヘッドの理解と生成 AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head ( http://arxiv.org/abs/2304.12995v1 ) ライセンス: Link先を確認	Rongjie Huang, Mingze Li, Dongchao Yang, Jiatong Shi, Xuankai Chang, Zhenhui Ye, Yuning Wu, Zhiqing Hong, Jiawei Huang, Jinglin Liu, Yi Ren, Zhou Zhao, Shinji Watanabe	(参考訳) 大規模言語モデル(LLM)は、さまざまな領域やタスクにまたがって顕著な能力を示し、学習と認知の理解に挑戦しています。最近の成功にもかかわらず、現在のLLMは複雑なオーディオ情報を処理したり、(SiriやAlexaのような)会話を行うことができない。本研究では,LLM(すなわちChatGPT)を補完するマルチモーダルAIシステムであるAudioGPTを提案する。 1)複雑な音声情報を処理し、多数の理解・生成課題を解決する基礎モデル 2)音声対話を支援するための入力/出力インタフェース(ASR, TTS)。人間の意図的理解と基礎モデルとの協調によるマルチモーダルLLMの評価の必要性が高まる中、我々はAudioGPTの原則とプロセスの概要を一貫性、能力、堅牢性の観点から検証する。実験の結果,複数回対話における音声,音楽,音声,会話の頭部理解と生成によるai課題の解決におけるaudiogptの能力が実証された。本システムは,<url{https://github.com/AIGC-Audio/AudioGPT}で公開されている。 Large language models (LLMs) have exhibited remarkable capabilities across a variety of domains and tasks, challenging our understanding of learning and cognition. Despite the recent success, current LLMs are not capable of processing complex audio information or conducting spoken conversations (like Siri or Alexa). In this work, we propose a multi-modal AI system named AudioGPT, which complements LLMs (i.e., ChatGPT) with 1) foundation models to process complex audio information and solve numerous understanding and generation tasks; and 2) the input/output interface (ASR, TTS) to support spoken dialogue. With an increasing demand to evaluate multi-modal LLMs of human intention understanding and cooperation with foundation models, we outline the principles and processes and test AudioGPT in terms of consistency, capability, and robustness. Experimental results demonstrate the capabilities of AudioGPT in solving AI tasks with speech, music, sound, and talking head understanding and generation in multi-round dialogues, which empower humans to create rich and diverse audio content with unprecedented ease. 
Our system is publicly available at https://github.com/AIGC-Audio/AudioGPT.	翻訳日:2023-04-26 19:38:51 公開日:2023-04-25
# R'enyi divergencesの有効性 Sufficiency of R\'enyi divergences ( http://arxiv.org/abs/2304.12989v1 ) ライセンス: Link先を確認	Niklas Galke, Lauritz van Luijk, Henrik Wilming	(参考訳) 古典的あるいは量子的状態の集合が、古典的または量子的チャネルのペアが他方にセットされた場合、別のものと同値である。ディコトミー(状態のペア)の場合、これは(古典的または量子的) R\'enyi divergences (RD) とデータ処理の不等式と密接に結びついている。ここでは、古典的二分法について、RDs の等式だけでは、2つの方向のいずれかのチャネルの存在に十分であることを示すとともに、いくつかの応用について議論する。最小量子RDの等式は量子の場合で十分であり、特殊の場合では証明できる。また、ペッツ量子も最大量子RDも十分でないことを示す。我々の手法の副作用として、古典、ペッツ量子、最大量子RDによって満たされる無限の不等式のリストを得る。これらの不等式は最小量子rdsには当てはまらない。 A set of classical or quantum states is equivalent to another one if there exists a pair of classical or quantum channels mapping either set to the other one. For dichotomies (pairs of states) this is closely connected to (classical or quantum) R\'enyi divergences (RD) and the data-processing inequality: If a RD remains unchanged when a channel is applied to the dichotomy, then there is a recovery channel mapping the image back to the initial dichotomy. Here, we prove for classical dichotomies that equality of the RDs alone is already sufficient for the existence of a channel in any of the two directions and discuss some applications. We conjecture that equality of the minimal quantum RDs is sufficient in the quantum case and prove it for special cases. We also show that neither the Petz quantum nor the maximal quantum RDs are sufficient. As a side-result of our techniques we obtain an infinite list of inequalities fulfilled by the classical, the Petz quantum, and the maximal quantum RDs. These inequalities are not true for the minimal quantum RDs.	翻訳日:2023-04-26 19:38:30 公開日:2023-04-25
# 胸部x線診断のためのパラレルアテンションブロックを用いたマルチスケール特徴核融合 Multi-Scale Feature Fusion using Parallel-Attention Block for COVID-19 Chest X-ray Diagnosis ( http://arxiv.org/abs/2304.12988v1 ) ライセンス: Link先を確認	Xiao Qi, David J. Foran, John L. Nosher, and Ilker Hacihaliloglu	(参考訳) 世界的なcovid-19危機では、胸部x線(cxr)画像からのcovid-19の正確な診断が重要である。放射線学的評価において, 医療的意思決定と後続の疾患管理を補完するために, コンピュータ支援診断ツールが活用されている。患者を迅速にトリアージし, 放射線科医を支援するためには, 高精度で頑健な計算方法が必要である。本研究では,並列注意ブロックを用いてオリジナルcxr画像と局所位相特徴強調cxr画像をマルチスケールで融合する,新しい多機能融合ネットワークを提案する。我々は、さまざまな組織から取得したさまざまなCOVID-19データセットのモデルを検証し、一般化能力を評価する。本実験は,本手法が最先端性能を実現し,一般化能力の向上を図っている。 Under the global COVID-19 crisis, accurate diagnosis of COVID-19 from Chest X-ray (CXR) images is critical. To reduce intra- and inter-observer variability during radiological assessment, computer-aided diagnostic tools have been utilized to supplement medical decision-making and subsequent disease management. Computational methods with high accuracy and robustness are required for rapid triaging of patients and aiding radiologists in the interpretation of the collected data. In this study, we propose a novel multi-feature fusion network using parallel attention blocks to fuse the original CXR images and local-phase feature-enhanced CXR images at multiple scales. We examine our model on various COVID-19 datasets acquired from different organizations to assess the generalization ability. Our experiments demonstrate that our method achieves state-of-the-art performance and has improved generalization capability, which is crucial for widespread deployment.	翻訳日:2023-04-26 19:38:06 公開日:2023-04-25
# 糖尿病足部潰瘍分類における画像類似性の影響の定量化 Quantifying the Effect of Image Similarity on Diabetic Foot Ulcer Classification ( http://arxiv.org/abs/2304.12987v1 ) ライセンス: Link先を確認	Imran Chowdhury Dipto, Bill Cassidy, Connah Kendrick, Neil D. Reeves, Joseph M. Pappachan, Vishnu Chandrabalan, Moi Hoon Yap	(参考訳) 本研究は,深層学習分類ネットワークを訓練する際の糖尿病性足底潰瘍データセットにおける視覚的に類似した画像の効果について検討する。ディープラーニングアルゴリズムのトレーニングに使用されるデータセットにバイナリIDの重複画像が存在することは、ネットワークパフォーマンスを劣化させる不必要なバイアスを生じさせる、よく知られた問題である。しかし, 視覚的に類似した非同一性像の影響は未検討の話題であり, 糖尿病性足部潰瘍研究ではまだ検討されていない。我々は,糖尿病性足潰瘍2021(dfuc2021)トレーニングデータセットにおける類似画像群を,オープンソースのファジィアルゴリズムを用いて同定する。それぞれの類似度しきい値に基づいて、ディープラーニング多クラス分類器のトレーニングに使用する新しいトレーニングセットを作成します。次に,dfuc2021テストセットにおけるベストパフォーマンスモデルの性能評価を行った。 InceptionResNetV2ネットワークを用いて,80\%の類似度閾値画像が除去されたトレーニングセットでトレーニングしたモデルが最も優れた性能を示した。このモデルは、それぞれ0.023、0.029、0.013のf1-score、精度、リコールを改善した。これらの結果から, 糖尿病性フット潰瘍チャレンジ2021データセットにおけるパフォーマンス劣化バイアスの存在に極めて類似した画像が寄与し, トレーニングセットから80%類似した画像の除去が分類性能の向上に有効であることが示唆された。 This research conducts an investigation on the effect of visually similar images within a publicly available diabetic foot ulcer dataset when training deep learning classification networks. The presence of binary-identical duplicate images in datasets used to train deep learning algorithms is a well known issue that can introduce unwanted bias which can degrade network performance. However, the effect of visually similar non-identical images is an under-researched topic, and has so far not been investigated in any diabetic foot ulcer studies. We use an open-source fuzzy algorithm to identify groups of increasingly similar images in the Diabetic Foot Ulcers Challenge 2021 (DFUC2021) training dataset. Based on each similarity threshold, we create new training sets that we use to train a range of deep learning multi-class classifiers. We then evaluate the performance of the best performing model on the DFUC2021 test set. 
Our findings show that the model trained on the training set with the 80% similarity threshold images removed achieved the best performance using the InceptionResNetV2 network. This model showed improvements in F1-score, precision, and recall of 0.023, 0.029, and 0.013, respectively. These results indicate that highly similar images can contribute towards the presence of performance-degrading bias within the Diabetic Foot Ulcers Challenge 2021 dataset, and that the removal of images that are 80% similar from the training set can help to boost classification performance.	翻訳日:2023-04-26 19:37:51 公開日:2023-04-25
# 大規模マルチタスク中国語理解の測定 Measuring Massive Multitask Chinese Understanding ( http://arxiv.org/abs/2304.12986v1 ) ライセンス: Link先を確認	Hui Zeng	(参考訳) 大規模な中国語モデルの開発は盛んであるが、それに対応する能力評価が不足している。そこで本研究では,大規模中国語モデルのマルチタスク精度を計測するテストを提案する。このテストは、医学、法学、心理学、教育を含む4つの主要な領域を含み、15のサブタスクと8のサブタスクがある。その結果、ゼロショット設定における最高のパフォーマンスモデルは、最悪のパフォーマンスモデルよりも平均22ポイント向上した。 4つの主要領域全体で、全てのモデルの平均ゼロショット精度は0.5を超えなかった。サブドメインでは、gpt-3.5-turboモデルのみが臨床医学において0.703のゼロショット精度を達成した。すべてのモデルは法律領域で性能が悪く、最高ゼロショット精度は0.259にしか達しなかった。複数の分野にわたる知識の幅と深さを包括的に評価することにより、このテストはモデルの欠点をより正確に識別することができる。 The development of large-scale Chinese language models is flourishing, yet there is a lack of corresponding capability assessments. Therefore, we propose a test to measure the multitask accuracy of large Chinese language models. This test encompasses four major domains, including medicine, law, psychology, and education, with 15 subtasks in medicine and 8 subtasks in education. We found that the best-performing models in the zero-shot setting outperformed the worst-performing models by nearly 22 percentage points on average. Across the four major domains, the average zero-shot accuracy of all models did not exceed 0.5. In the subdomains, only the GPT-3.5-turbo model achieved a zero-shot accuracy of 0.703 in clinical medicine, which was the highest accuracy among all models across all subtasks. All models performed poorly in the legal domain, with the highest zero-shot accuracy reaching only 0.259. By comprehensively evaluating the breadth and depth of knowledge across multiple disciplines, this test can more accurately identify the shortcomings of the models.	翻訳日:2023-04-26 19:37:29 公開日:2023-04-25
# Rubikの光ニューラルネットワーク:物理対応ローテーションアーキテクチャによるマルチタスク学習 Rubik's Optical Neural Networks: Multi-task Learning with Physics-aware Rotation Architecture ( http://arxiv.org/abs/2304.12985v1 ) ライセンス: Link先を確認	Yingjie Li, Weilu Gao, Cunxi Yu	(参考訳) 近年、電力効率、並列性、計算速度の面で機械学習(ML)に大きな利点をもたらす光学ニューラルネットワーク(ONN)の進歩への取り組みが増えている。計算速度とエネルギー効率にかなりの利点があるため、onnを医療センシング、セキュリティスクリーニング、薬物検出、自動運転に活用することには大きな関心がある。しかしながら、再構成可能性を実装することの難しさから、マルチタスク学習(mtl)アルゴリズムをonnにデプロイするには、実際のアプリケーションシナリオにおけるエネルギーとコスト効率を大幅に低下させる物理的拡散システムの再構築と複製が必要となる。この論文は、光学系の物理的性質を利用して複数のフィードフォワード関数をエンコードし、 \textit{rubik's cube} を回転させるのと同様にハードウェアを物理的に回転させることによって、新しい onns アーキテクチャ、すなわち \textit{rubikonns} を提案する。 RubikONN 上での MTL 性能を最適化するために,ドメイン固有の物理認識トレーニングアルゴリズム \textit{RotAgg} と \textit{RotSeq} を提案する。実験の結果, 最先端の手法と比較して, エネルギーとコストの効率が改善し, 限界精度が低下することを示した。 Recently, there are increasing efforts on advancing optical neural networks (ONNs), which bring significant advantages for machine learning (ML) in terms of power efficiency, parallelism, and computational speed. With the considerable benefits in computation speed and energy efficiency, there are significant interests in leveraging ONNs into medical sensing, security screening, drug detection, and autonomous driving. However, due to the challenge of implementing reconfigurability, deploying multi-task learning (MTL) algorithms on ONNs requires re-building and duplicating the physical diffractive systems, which significantly degrades the energy and cost efficiency in practical application scenarios. This work presents a novel ONNs architecture, namely, \textit{RubikONNs}, which utilizes the physical properties of optical systems to encode multiple feed-forward functions by physically rotating the hardware similarly to rotating a \textit{Rubik's Cube}. To optimize MTL performance on RubikONNs, two domain-specific physics-aware training algorithms \textit{RotAgg} and \textit{RotSeq} are proposed. 
Our experimental results demonstrate more than 4$\times$ improvements in energy and cost efficiency with marginal accuracy degradation compared to the state-of-the-art approaches.	翻訳日:2023-04-26 19:37:12 公開日:2023-04-25
# ホッターは簡単:スピン量子ビット周波数の予期せぬ温度依存性 Hotter is easier: unexpected temperature dependence of spin qubit frequencies ( http://arxiv.org/abs/2304.12984v1 ) ライセンス: Link先を確認	Brennan Undseth, Oriol Pietx-Casas, Eline Raymenants, Mohammad Mehmandoost, Mateusz T. Madzik, Stephan G.J. Philips, Sander L. de Snoo, David J. Michalak, Sergey V. Amitonov, Larysa Tryputen, Brian Paquelet Wuetz, Viviana Fezzi, Davide Degli Esposti, Amir Sammak, Giordano Scappucci, Lieven M. K. Vandersypen	(参考訳) スピンベースの量子プロセッサのサイズと複雑さが大きくなるにつれて、高いフィダリティの維持とクロストークの最小化が量子アルゴリズムと誤り訂正プロトコルの実装の成功に不可欠となる。特に最近の実験では、マイクロ波キュービット駆動に伴う過度な過渡的キュービット周波数シフトが強調されている。オフ共振マイクロ波バーストをプリパルスしてデバイスを定常状態にし、測定に先立って待ち時間、キュービット固有のキャリブレーションなど、小さなデバイスに対する回避策は、デバイススケーラビリティに悪影響を及ぼす。ここでは、この効果を理解し、克服する上で大きな進歩を遂げます。マイクロ波とベースバンドの制御信号による観測周波数シフトと一致した混合室温度とスピンラーモア周波数の驚くべき非単調関係について報告する。この装置を200mKで故意に動作させることは、キュービットコヒーレンスや単一キュービット忠実度ベンチマークを損なうことなく、有害な加熱効果を著しく抑制することを発見した。さらに、系統的非マルコフクロストークは大幅に削減される。本結果は,将来のスピンベース量子プロセッサのキャリブレーション手順を簡素化しつつ,マルチスピン制御の品質を向上させるための簡単な手段を提供する。 As spin-based quantum processors grow in size and complexity, maintaining high fidelities and minimizing crosstalk will be essential for the successful implementation of quantum algorithms and error-correction protocols. In particular, recent experiments have highlighted pernicious transient qubit frequency shifts associated with microwave qubit driving. Workarounds for small devices, including prepulsing with an off-resonant microwave burst to bring a device to a steady-state, wait times prior to measurement, and qubit-specific calibrations all bode ill for device scalability. Here, we make substantial progress in understanding and overcoming this effect. We report a surprising non-monotonic relation between mixing chamber temperature and spin Larmor frequency which is consistent with observed frequency shifts induced by microwave and baseband control signals. 
We find that purposefully operating the device at 200 mK greatly suppresses the adverse heating effect while not compromising qubit coherence or single-qubit fidelity benchmarks. Furthermore, systematic non-Markovian crosstalk is greatly reduced. Our results provide a straightforward means of improving the quality of multi-spin control while simplifying calibration procedures for future spin-based quantum processors.	翻訳日:2023-04-26 19:36:48 公開日:2023-04-25
# DSTC11におけるタスク指向対話トラックの会話からのインテント誘導 Intent Induction from Conversations for Task-Oriented Dialogue Track at DSTC 11 ( http://arxiv.org/abs/2304.12982v1 ) ライセンス: Link先を確認	James Gung, Raphael Shu, Emily Moeng, Wesley Rose, Salvatore Romeo, Yassine Benajiba, Arshit Gupta, Saab Mansour and Yi Zhang	(参考訳) 近年,仮想アシスタントの需要増加と採用に伴い,インテントの自動誘導やスロットや対話状態の誘導を通じて,ボットスキーマ設計を高速化する方法が研究されている。しかしながら、専用のベンチマークと標準化された評価の欠如により、システム間の進捗の追跡と比較が困難になっている。このチャレンジトラックは、第11回ダイアログシステム技術チャレンジの一環として開催され、人間エージェントと顧客間のカスタマーサービスインタラクションの現実的な設定において、顧客意図の自動誘導方法を評価するためのベンチマークを導入している。本稿では,インテントの自動誘導とそれに対応する評価手法に取り組むための2つのサブタスクを提案する。次に,タスク評価に適したデータセットを3つ提示し,簡単なベースラインを提案する。最後に、課題トラックの提出内容と結果を要約し、34チームから応募を受け取りました。 With increasing demand for and adoption of virtual assistants, recent work has investigated ways to accelerate bot schema design through the automatic induction of intents or the induction of slots and dialogue states. However, a lack of dedicated benchmarks and standardized evaluation has made progress difficult to track and comparisons between systems difficult to make. This challenge track, held as part of the Eleventh Dialog Systems Technology Challenge, introduces a benchmark that aims to evaluate methods for the automatic induction of customer intents in a realistic setting of customer service interactions between human agents and customers. We propose two subtasks for progressively tackling the automatic induction of intents and corresponding evaluation methodologies. We then present three datasets suitable for evaluating the tasks and propose simple baselines. Finally, we summarize the submissions and results of the challenge track, for which we received submissions from 34 teams.	翻訳日:2023-04-26 19:36:27 公開日:2023-04-25
# flickr-pad:新しい顔高分解能プレゼンテーションアタック検出データベース Flickr-PAD: New Face High-Resolution Presentation Attack Detection Database ( http://arxiv.org/abs/2304.13015v1 ) ライセンス: Link先を確認	Diego Pasmino, Carlos Aravena, Juan Tapia and Christoph Busch	(参考訳) 現在,プレゼンテーションアタック検出は非常に活発な研究分野である。映像から抽出した画像を用いて,最先端のデータベースを構成する。主な問題は、多くのデータベースが低品質で小さな画像サイズであり、実際の遠隔バイオメトリックシステムでは運用シナリオを表現していないことである。現在、これらの画像は高品質で解像度の高いスマートフォンから撮影されている。画像品質の多様性を高めるため、この研究は「Flickr-PAD」と呼ばれるオープンアクセスFlickrイメージに基づいた新しいPADデータベースを提供する。新しい手作りデータベースは、高品質な印刷と画面のシナリオを示しています。これにより、研究者はより広いデータベース上の既存のアルゴリズムと新しいアプローチを比較することができる。このデータベースは他の研究者も利用できる。 MobileNet-V3(小さくて大きい)とEfficientNet-B0に基づく3つのPADモデルのトレーニングと評価に、Left-one-outプロトコルが使用された。最大の成果はMobileNet-V3で、BPCER10は7.08%、BPCER20は11.15%だった。 Nowadays, Presentation Attack Detection is a very active research area. Several databases are constituted in the state-of-the-art using images extracted from videos. One of the main problems identified is that many databases present a low-quality, small image size and do not represent an operational scenario in a real remote biometric system. Currently, these images are captured from smartphones with high-quality and bigger resolutions. In order to increase the diversity of image quality, this work presents a new PAD database based on open-access Flickr images called: "Flickr-PAD". Our new hand-made database shows high-quality printed and screen scenarios. This will help researchers to compare new approaches to existing algorithms on a wider database. This database will be available for other researchers. A leave-one-out protocol was used to train and evaluate three PAD models based on MobileNet-V3 (small and large) and EfficientNet-B0. The best result was reached with MobileNet-V3 large with BPCER10 of 7.08% and BPCER20 of 11.15%.	翻訳日:2023-04-26 19:29:06 公開日:2023-04-25
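The Flickr-PAD results above are reported as BPCER10 and BPCER20. Under the usual reading of these metrics in presentation attack detection, BPCER10 is the bona fide presentation classification error rate at the operating point where the attack presentation classification error rate (APCER) is 10%, and BPCER20 is the same at APCER 5%. The sketch below shows how such a value could be computed from classifier scores. It assumes that higher scores mean "more likely bona fide" and that a sample is accepted when its score is at least the threshold. The function name and the score convention are illustrative and not taken from the paper.

```python
def bpcer_at_apcer(bona_fide_scores, attack_scores, target_apcer):
    """Return (BPCER, threshold) at the lowest threshold with APCER <= target.

    Assumed convention: a sample is accepted as bona fide when score >= threshold.
      APCER = fraction of attack samples accepted as bona fide
      BPCER = fraction of bona fide samples rejected as attacks
    """
    candidates = sorted(set(bona_fide_scores) | set(attack_scores))
    candidates.append(float("inf"))  # rejecting everything gives APCER = 0
    for threshold in candidates:
        apcer = sum(s >= threshold for s in attack_scores) / len(attack_scores)
        if apcer <= target_apcer:
            # APCER only falls and BPCER only rises as the threshold grows,
            # so the first admissible threshold gives the smallest BPCER.
            rejected = sum(s < threshold for s in bona_fide_scores)
            return rejected / len(bona_fide_scores), threshold
```

With this helper, BPCER10 would be `bpcer_at_apcer(bona, attacks, 0.10)[0]` and BPCER20 would be `bpcer_at_apcer(bona, attacks, 0.05)[0]`, where `bona` and `attacks` are hypothetical lists of scores from a held-out test set.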
# 内視鏡画像とビデオにおける最小侵襲手術器具の分節化のための方法とデータセット:術法の現状について Methods and datasets for segmentation of minimally invasive surgical instruments in endoscopic images and videos: A review of the state of the art ( http://arxiv.org/abs/2304.13014v1 ) ライセンス: Link先を確認	Tobias Rueckert (1), Daniel Rueckert (2 and 3), Christoph Palm (1 and 4) ((1) Regensburg Medical Image Computing (ReMIC), Ostbayerische Technische Hochschule Regensburg (OTH Regensburg), Germany, (2) Artificial Intelligence in Healthcare and Medicine, Klinikum rechts der Isar, Technical University of Munich, Germany, (3) Department of Computing, Imperial College London, UK, (4) Regensburg Center of Health Sciences and Technology (RCHST), OTH Regensburg, Germany)	(参考訳) コンピュータ・ロボット支援の低侵襲手術の分野では,内視鏡画像における手術器具の認識により,近年,大きな進歩を遂げている。特に楽器の位置や種類の決定は、非常に興味深い。現在の研究は、空間的情報と時間的情報の両方が関係しており、外科的ツールの時間的移動の予測が最終セグメンテーションの質を向上させる可能性がある。公開データセットの提供は、最近、主にディープラーニングに基づく新しい手法の開発を奨励している。本稿では,手法の開発と評価に使用されるデータセットを特定し,文献上での使用頻度を定量化する。内視鏡画像における低侵襲手術器具のセグメンテーションと追跡に関する研究の現状について概説する。本論文は,1フレーム分割手法と時間情報を含む手法を考慮し,楽器の任意の種類のマーカーを付加せずに純粋に視覚的に機能する手法に焦点を当てた。レビューされた文献に関する議論が提供され、既存の欠点と将来の発展の可能性を強調している。検討された出版物は、Google Scholar、Web of Science、PubMedを通じて特定された。検索用語は「構造的セグメンテーション」、「構造的トラッキング」、「外科的ツールセグメンテーション」、「外科的ツールトラッキング」であり、2015年から2022年にかけて408の論文が発行され、109が体系的選択基準で含まれていた。 In the field of computer- and robot-assisted minimally invasive surgery, enormous progress has been made in recent years based on the recognition of surgical instruments in endoscopic images. Especially the determination of the position and type of the instruments is of great interest here. Current work involves both spatial and temporal information with the idea, that the prediction of movement of surgical tools over time may improve the quality of final segmentations. The provision of publicly available datasets has recently encouraged the development of new methods, mainly based on deep learning. 
In this review, we identify datasets used for method development and evaluation, as well as quantify their frequency of use in the literature. We further present an overview of the current state of research regarding the segmentation and tracking of minimally invasive surgical instruments in endoscopic images. The paper focuses on methods that work purely visually without attached markers of any kind on the instruments, taking into account both single-frame segmentation approaches as well as those involving temporal information. A discussion of the reviewed literature is provided, highlighting existing shortcomings and emphasizing available potential for future developments. The publications considered were identified through the platforms Google Scholar, Web of Science, and PubMed. The search terms used were "instrument segmentation", "instrument tracking", "surgical tool segmentation", and "surgical tool tracking". They yielded 408 articles published between 2015 and 2022, of which 109 were included using systematic selection criteria.	翻訳日:2023-04-26 19:28:50 公開日:2023-04-25
# 大規模視覚言語モデルのための安定・低精度トレーニング Stable and low-precision training for large-scale vision-language models ( http://arxiv.org/abs/2304.13013v1 ) ライセンス: Link先を確認	Mitchell Wortsman, Tim Dettmers, Luke Zettlemoyer, Ari Morcos, Ali Farhadi, Ludwig Schmidt	(参考訳) 新しい方法を紹介します 1)加速・加速 2)大規模言語視モデルの安定化訓練。 1) Int8量子化トレーニングの線形層であるSwitchBackを導入し,bfloat16トレーニングのパフォーマンスを1BパラメータCLIP ViT-Hugeの0.1パーセントの範囲内で一致させながら,13～25%の高速化を実現した。 float8のgpuサポートは稀ですが、シミュレーションを通じてfloat8トレーニングも分析しています。 SwitchBackはfloat8に有効であることが証明されているが、ネットワークがトレーニングされ初期化され、大きな特徴が無視され、ゼロで初期化された層スケールで達成される場合、標準技術も成功していることを示す。 2)安定トレーニングに向けて損失スパイクを分析し,AdamW第2モーメント推定器で2乗勾配が過小評価された後に連続して1～8回発生することを示した。結果として,CLIP ViT-Hugeモデルのトレーニング時に損失のスパイクを回避し,勾配クリッピングよりも優れているため,StableAdamWと呼ぶAdamW-Adafactorハイブリッドを推奨する。 We introduce new methods for 1) accelerating and 2) stabilizing training for large language-vision models. 1) Towards accelerating training, we introduce SwitchBack, a linear layer for int8 quantized training which provides a speed-up of 13-25% while matching the performance of bfloat16 training within 0.1 percentage points for the 1B parameter CLIP ViT-Huge -- the largest int8 training to date. Our main focus is int8 as GPU support for float8 is rare, though we also analyze float8 training through simulation. While SwitchBack proves effective for float8, we show that standard techniques are also successful if the network is trained and initialized so that large feature magnitudes are discouraged, which we accomplish via layer-scale initialized with zeros. 2) Towards stable training, we analyze loss spikes and find they consistently occur 1-8 iterations after the squared gradients become under-estimated by their AdamW second moment estimator. As a result, we recommend an AdamW-Adafactor hybrid, which we refer to as StableAdamW because it avoids loss spikes when training a CLIP ViT-Huge model and outperforms gradient clipping.	翻訳日:2023-04-26 19:28:25 公開日:2023-04-25
# 非構造化データと構造化データ: 大きな言語モデルを持つ両方の世界のベストを得られるか? Unstructured and structured data: Can we have the best of both worlds with large language models? ( http://arxiv.org/abs/2304.13010v1 ) ライセンス: Link先を確認	Wang-Chiew Tan	(参考訳) 本稿では,大規模言語モデルを用いて非構造化データと構造化データの両方を問合せする可能性について考察する。また,両タイプのデータを対象とした質問応答システムの構築に関する研究課題についても概説する。 This paper presents an opinion on the potential of using large language models to query on both unstructured and structured data. It also outlines some research challenges related to the topic of building question-answering systems for both types of data.	翻訳日:2023-04-26 19:28:07 公開日:2023-04-25
# リモートセンシングにおけるビジュアルチャットGPTの可能性 The Potential of Visual ChatGPT For Remote Sensing ( http://arxiv.org/abs/2304.13009v1 ) ライセンス: Link先を確認	Lucas Prado Osco, Eduardo Lopes de Lemos, Wesley Nunes Gon\c{c}alves, Ana Paula Marques Ramos and Jos\'e Marcato Junior	(参考訳) 自然言語処理(NLP)の最近の進歩、特にディープラーニングベースのコンピュータビジョン技術に関連するLarge Language Models(LLMs)は、様々なタスクを自動化する可能性を示している。 1つの注目すべきモデルはVisual ChatGPTであり、これはChatGPTのLLM機能とビジュアル計算を組み合わせて、効果的な画像解析を可能にする。テキスト入力に基づく画像の処理能力は、様々な分野に革命をもたらす可能性がある。しかし、リモートセンシング領域での応用は未検討のままである。 GPTアーキテクチャ上に構築された最先端のLCMである Visual ChatGPT は,リモートセンシング領域に関連する画像処理の課題に対処するための最初の論文である。現在の機能の中で、Visual ChatGPTは画像のテキスト記述を生成し、キャニーエッジと直線検出を実行し、画像セグメンテーションを実行することができる。これらは画像コンテンツに関する貴重な洞察を与え、情報の解釈と抽出を容易にする。衛星画像の公開データセットにおけるこれらの技術の適用性を探ることで、リモートセンシング画像を扱う際の現在のモデルの限界を実証し、その課題と今後の展望を明らかにする。 LLMとビジュアルモデルの組み合わせは、まだ開発の初期段階であるが、リモートセンシング画像処理を変換し、現場でアクセスしやすく実用的な応用機会を生み出す大きな可能性を秘めている。 Recent advancements in Natural Language Processing (NLP), particularly in Large Language Models (LLMs), associated with deep learning-based computer vision techniques, have shown substantial potential for automating a variety of tasks. One notable model is Visual ChatGPT, which combines ChatGPT's LLM capabilities with visual computation to enable effective image analysis. The model's ability to process images based on textual inputs can revolutionize diverse fields. However, its application in the remote sensing domain remains unexplored. This is the first paper to examine the potential of Visual ChatGPT, a cutting-edge LLM founded on the GPT architecture, to tackle the aspects of image processing related to the remote sensing domain. Among its current capabilities, Visual ChatGPT can generate textual descriptions of images, perform canny edge and straight line detection, and conduct image segmentation. These offer valuable insights into image content and facilitate the interpretation and extraction of information. 
By exploring the applicability of these techniques within publicly available datasets of satellite images, we demonstrate the current model's limitations in dealing with remote sensing images, highlighting its challenges and future prospects. Although still in early development, we believe that the combination of LLMs and visual models holds a significant potential to transform remote sensing image processing, creating accessible and practical application opportunities in the field.	翻訳日:2023-04-26 19:28:01 公開日:2023-04-25
# Answering Questions by Meta-Reasoning over Multiple Chains of Thought ( http://arxiv.org/abs/2304.13007v1 ) License: see linked page	Ori Yoran, Tomer Wolfson, Ben Bogin, Uri Katz, Daniel Deutch, Jonathan Berant	Modern systems for multi-hop question answering (QA) typically break questions into a sequence of reasoning steps, termed chain-of-thought (CoT), before arriving at a final answer. Often, multiple chains are sampled and aggregated through a voting mechanism over the final answers, but the intermediate steps themselves are discarded. While such approaches improve performance, they do not consider the relations between intermediate steps across chains and do not provide a unified explanation for the predicted answer. We introduce Multi-Chain Reasoning (MCR), an approach which prompts large language models to meta-reason over multiple chains of thought, rather than aggregating their answers. MCR examines different reasoning chains, mixes information between them and selects the most relevant facts in generating an explanation and predicting the answer. MCR outperforms strong baselines on 7 multi-hop QA datasets. Moreover, our analysis reveals that MCR explanations exhibit high quality, enabling humans to verify its answers.	Translated: 2023-04-26 19:27:39 Published: 2023-04-25
# PoseVocab: Learning Joint-structured Pose Embeddings for Human Avatar Modeling ( http://arxiv.org/abs/2304.13006v1 ) License: see linked page	Zhe Li, Zerong Zheng, Yuxiao Liu, Boyao Zhou, Yebin Liu	Creating pose-driven human avatars is about modeling the mapping from the low-frequency driving pose to high-frequency dynamic human appearances, so an effective pose encoding method that can encode high-fidelity human details is essential to human avatar modeling. To this end, we present PoseVocab, a novel pose encoding method that encourages the network to discover the optimal pose embeddings for learning the dynamic human appearance. Given multi-view RGB videos of a character, PoseVocab constructs key poses and latent embeddings based on the training poses. To achieve pose generalization and temporal consistency, we sample key rotations in $so(3)$ of each joint rather than the global pose vectors, and assign a pose embedding to each sampled key rotation. These joint-structured pose embeddings not only encode the dynamic appearances under different key poses, but also factorize the global pose embedding into joint-structured ones to better learn the appearance variation related to the motion of each joint. To improve the representation ability of the pose embedding while maintaining memory efficiency, we introduce feature lines, a compact yet effective 3D representation, to model more fine-grained details of human appearances. Furthermore, given a query pose and a spatial position, a hierarchical query strategy is introduced to interpolate pose embeddings and acquire the conditional pose feature for dynamic human synthesis. Overall, PoseVocab effectively encodes the dynamic details of human appearance and enables realistic and generalized animation under novel poses. Experiments show that our method outperforms other state-of-the-art baselines both qualitatively and quantitatively in terms of synthesis quality. Code is available at https://github.com/lizhe00/PoseVocab.	Translated: 2023-04-26 19:27:11 Published: 2023-04-25
# Evaluating Inter-Bilingual Semantic Parsing for Indian Languages ( http://arxiv.org/abs/2304.13005v1 ) License: see linked page	Divyanshu Aggarwal, Vivek Gupta, Anoop Kunchukuttan	Despite significant progress in Natural Language Generation for Indian languages (IndicNLP), there is a lack of datasets around complex structured tasks such as semantic parsing. One reason for this imminent gap is the complexity of the logical form, which makes English to multilingual translation difficult. The process involves alignment of logical forms, intents and slots with translated unstructured utterance. To address this, we propose an Inter-bilingual Seq2seq Semantic parsing dataset IE-SEMPARSE for 11 distinct Indian languages. We highlight the proposed task's practicality, and evaluate existing multilingual seq2seq models across several train-test strategies. Our experiment reveals a high correlation across performance of original multilingual semantic parsing datasets (such as mTOP, multilingual TOP and multiATIS++) and our proposed IE-SEMPARSE suite.	Translated: 2023-04-26 19:26:25 Published: 2023-04-25
# Centralized control for multi-agent RL in a complex Real-Time-Strategy game ( http://arxiv.org/abs/2304.13004v1 ) License: see linked page	Roger Creus Castanyer	Multi-agent Reinforcement learning (MARL) studies the behaviour of multiple learning agents that coexist in a shared environment. MARL is more challenging than single-agent RL because it involves more complex learning dynamics: the observations and rewards of each agent are functions of all other agents. In the context of MARL, Real-Time Strategy (RTS) games represent very challenging environments where multiple players interact simultaneously and control many units of different natures all at once. In fact, RTS games are so challenging for the current RL methods, that just being able to tackle them with RL is interesting. This project provides the end-to-end experience of applying RL in the Lux AI v2 Kaggle competition, where competitors design agents to control variable-sized fleets of units and tackle a multi-variable optimization, resource gathering, and allocation problem in a 1v1 scenario against other competitors. We use a centralized approach for training the RL agents, and report multiple design decisions along the process. We provide the source code of the project: https://github.com/roger-creus/centralized-control-lux.	Translated: 2023-04-26 19:26:10 Published: 2023-04-25
# On the Generalization of Learned Structured Representations ( http://arxiv.org/abs/2304.13001v1 ) License: see linked page	Andrea Dittadi	Despite tremendous progress over the past decade, deep learning methods generally fall short of human-level systematic generalization. It has been argued that explicitly capturing the underlying structure of data should allow connectionist systems to generalize in a more predictable and systematic manner. Indeed, evidence in humans suggests that interpreting the world in terms of symbol-like compositional entities may be crucial for intelligent behavior and high-level reasoning. Another common limitation of deep learning systems is that they require large amounts of training data, which can be expensive to obtain. In representation learning, large datasets are leveraged to learn generic data representations that may be useful for efficient learning of arbitrary downstream tasks. This thesis is about structured representation learning. We study methods that learn, with little or no supervision, representations of unstructured data that capture its hidden structure. In the first part of the thesis, we focus on representations that disentangle the explanatory factors of variation of the data. We scale up disentangled representation learning to a novel robotic dataset, and perform a systematic large-scale study on the role of pretrained representations for out-of-distribution generalization in downstream robotic tasks. The second part of this thesis focuses on object-centric representations, which capture the compositional structure of the input in terms of symbol-like entities, such as objects in visual scenes. Object-centric learning methods learn to form meaningful entities from unstructured input, enabling symbolic information processing on a connectionist substrate. In this study, we train a selection of methods on several common datasets, and investigate their usefulness for downstream tasks and their ability to generalize out of distribution.	Translated: 2023-04-26 19:25:50 Published: 2023-04-25
# DQS3D: Densely-matched Quantization-aware Semi-supervised 3D Detection ( http://arxiv.org/abs/2304.13031v1 ) License: see linked page	Huan-ang Gao, Beiwen Tian, Pengfei Li, Hao Zhao, Guyue Zhou	In this paper, we study the problem of semi-supervised 3D object detection, which is of great importance considering the high annotation cost for cluttered 3D indoor scenes. We resort to the robust and principled framework of self-teaching, which has triggered notable progress for semi-supervised learning recently. While this paradigm is natural for image-level or pixel-level prediction, adapting it to the detection problem is challenged by the issue of proposal matching. Prior methods are based upon two-stage pipelines, matching heuristically selected proposals generated in the first stage and resulting in spatially sparse training signals. In contrast, we propose the first semi-supervised 3D detection algorithm that works in the single-stage manner and allows spatially dense training signals. A fundamental issue of this new design is the quantization error caused by point-to-voxel discretization, which inevitably leads to misalignment between two transformed views in the voxel domain. To this end, we derive and implement closed-form rules that compensate this misalignment on-the-fly. Our results are significant, e.g., promoting ScanNet mAP@0.5 from 35.2% to 48.5% using 20% annotation. Codes and data will be publicly available.	Translated: 2023-04-26 19:20:05 Published: 2023-04-25
# CompletionFormer: Depth Completion with Convolutions and Vision Transformers ( http://arxiv.org/abs/2304.13030v1 ) License: see linked page	Zhang Youmin, Guo Xianda, Poggi Matteo, Zhu Zheng, Huang Guan, Mattoccia Stefano	Given sparse depths and the corresponding RGB images, depth completion aims at spatially propagating the sparse measurements throughout the whole image to get a dense depth prediction. Despite the tremendous progress of deep-learning-based depth completion methods, the locality of the convolutional layer or graph model makes it hard for the network to model the long-range relationship between pixels. While recent fully Transformer-based architecture has reported encouraging results with the global receptive field, the performance and efficiency gaps to the well-developed CNN models still exist because of its deteriorative local feature details. This paper proposes a Joint Convolutional Attention and Transformer block (JCAT), which deeply couples the convolutional attention layer and Vision Transformer into one block, as the basic unit to construct our depth completion model in a pyramidal structure. This hybrid architecture naturally benefits both the local connectivity of convolutions and the global context of the Transformer in one single model. As a result, our CompletionFormer outperforms state-of-the-art CNNs-based methods on the outdoor KITTI Depth Completion benchmark and indoor NYUv2 dataset, achieving significantly higher efficiency (nearly 1/3 FLOPs) compared to pure Transformer-based methods. Code is available at https://github.com/youmi-zym/CompletionFormer.	Translated: 2023-04-26 19:19:42 Published: 2023-04-25
# Bake off redux: a review and experimental evaluation of recent time series classification algorithms ( http://arxiv.org/abs/2304.13029v1 ) License: see linked page	Matthew Middlehurst, Patrick Schäfer and Anthony Bagnall	In 2017, a research paper compared 18 Time Series Classification (TSC) algorithms on 85 datasets from the University of California, Riverside (UCR) archive. This study, commonly referred to as a 'bake off', identified that only nine algorithms performed significantly better than the Dynamic Time Warping (DTW) and Rotation Forest benchmarks that were used. The study categorised each algorithm by the type of feature they extract from time series data, forming a taxonomy of five main algorithm types. This categorisation of algorithms alongside the provision of code and accessible results for reproducibility has helped fuel an increase in popularity of the TSC field. Over six years have passed since this bake off, the UCR archive has expanded to 112 datasets and there have been a large number of new algorithms proposed. We revisit the bake off, seeing how each of the proposed categories have advanced since the original publication, and evaluate the performance of newer algorithms against the previous best-of-category using an expanded UCR archive. We extend the taxonomy to include three new categories to reflect recent developments. Alongside the originally proposed distance, interval, shapelet, dictionary and hybrid based algorithms, we compare newer convolution and feature based algorithms as well as deep learning approaches. We introduce 30 classification datasets either recently donated to the archive or reformatted to the TSC format, and use these to further evaluate the best performing algorithm from each category. Overall, we find that two recently proposed algorithms, Hydra+MultiROCKET and HIVE-COTEv2, perform significantly better than other approaches on both the current and new TSC problems.	Translated: 2023-04-26 19:19:13 Published: 2023-04-25
# Unifying Emergent Hydrodynamics and Lindbladian Low Energy Spectra across Symmetries, Constraints, and Long-Range Interactions ( http://arxiv.org/abs/2304.13028v1 ) License: see linked page	Olumakinde Ogunnaike, Johannes Feldmeier, Jong Yeon Lee	We identify emergent hydrodynamics governing charge transport in Brownian random circuits with various symmetries, constraints, and ranges of interactions. This is accomplished via a mapping between the averaged dynamics and the low energy spectrum of a Lindblad operator, which acts as an effective Hamiltonian in a doubled Hilbert space. By explicitly constructing dispersive excited states of this effective Hamiltonian using a single mode approximation, we provide a comprehensive understanding of diffusive, subdiffusive, and superdiffusive relaxation in many-body systems with conserved multipole moments and variable interaction ranges. Our approach further allows us to identify exotic Krylov-space-resolved hydrodynamics exhibiting diffusive relaxation despite the presence of dipole conservation, which we verify numerically. Our approach provides a general and versatile framework to qualitatively understand the dynamics of conserved operators under random unitary time evolution.	Translated: 2023-04-26 19:18:42 Published: 2023-04-25
# A Strong and Reproducible Object Detector with Only Public Datasets ( http://arxiv.org/abs/2304.13027v1 ) License: see linked page	Tianhe Ren, Jianwei Yang, Shilong Liu, Ailing Zeng, Feng Li, Hao Zhang, Hongyang Li, Zhaoyang Zeng, Lei Zhang	This work presents Focal-Stable-DINO, a strong and reproducible object detection model which achieves 64.6 AP on COCO val2017 and 64.8 AP on COCO test-dev using only 700M parameters without any test time augmentation. It explores the combination of the powerful FocalNet-Huge backbone with the effective Stable-DINO detector. Different from existing SOTA models that utilize an extensive number of parameters and complex training techniques on large-scale private data or merged data, our model is exclusively trained on the publicly available dataset Objects365, which ensures the reproducibility of our approach.	Translated: 2023-04-26 19:18:23 Published: 2023-04-25
# Seeing is not always believing: A Quantitative Study on Human Perception of AI-Generated Images ( http://arxiv.org/abs/2304.13023v1 ) License: see linked page	Zeyu Lu, Di Huang, Lei Bai, Xihui Liu, Jingjing Qu, Wanli Ouyang	Photos serve as a way for humans to record what they experience in their daily lives, and they are often regarded as trustworthy sources of information. However, there is a growing concern that the advancement of artificial intelligence (AI) technology may produce fake photos, which can create confusion and diminish trust in photographs. This study aims to answer the question of whether the current state-of-the-art AI-based visual content generation models can consistently deceive human eyes and convey false information. By conducting a high-quality quantitative study with fifty participants, we reveal, for the first time, that humans cannot distinguish between real photos and AI-created fake photos to a significant degree (38.7%). Our study also finds that an individual's background, such as their gender, age, and experience with AI-generated content (AIGC), does not significantly affect their ability to distinguish AI-generated images from real photographs. However, we do observe that there tend to be certain defects in AI-generated images that serve as cues for people to distinguish between real and fake photos. We hope that our study can raise awareness of the potential risks of AI-generated images and encourage further research to prevent the spread of false information. From a positive perspective, AI-generated images have the potential to revolutionize various industries and create a better future for humanity if they are used and regulated properly.	Translated: 2023-04-26 19:18:11 Published: 2023-04-25
# Face Feature Visualisation of Single Morphing Attack Detection ( http://arxiv.org/abs/2304.13021v1 ) License: see linked page	Juan Tapia and Christoph Busch	This paper proposes an explainable visualisation of different face feature extraction algorithms that enable the detection of bona fide and morphing images for single morphing attack detection. The feature extraction is based on raw image, shape, texture, frequency and compression. This visualisation may help to develop a Graphical User Interface for border policies and specifically for border guard personnel that have to investigate details of suspect images. A Random forest classifier was trained in a leave-one-out protocol on three landmarks-based face morphing methods and a StyleGAN-based morphing method for which morphed images are available in the FRLL database. For morphing attack detection, the Discrete Cosine-Transformation-based method obtained the best results for synthetic images and BSIF for landmark-based image features.	Translated: 2023-04-26 19:17:49 Published: 2023-04-25
# Certifying Ensembles: A General Certification Theory with S-Lipschitzness ( http://arxiv.org/abs/2304.13019v1 ) License: see linked page	Aleksandar Petrov, Francisco Eiras, Amartya Sanyal, Philip H.S. Torr, Adel Bibi	Improving and guaranteeing the robustness of deep learning models has been a topic of intense research. Ensembling, which combines several classifiers to provide a better model, has been shown to be beneficial for generalisation, uncertainty estimation, calibration, and mitigating the effects of concept drift. However, the impact of ensembling on certified robustness is less well understood. In this work, we generalise Lipschitz continuity by introducing S-Lipschitz classifiers, which we use to analyse the theoretical robustness of ensembles. Our results are precise conditions when ensembles of robust classifiers are more robust than any constituent classifier, as well as conditions when they are less robust.	Translated: 2023-04-26 19:17:34 Published: 2023-04-25
# DuETT: Dual Event Time Transformer for Electronic Health Records ( http://arxiv.org/abs/2304.13017v1 ) License: see linked page	Alex Labach, Aslesha Pokhrel, Xiao Shi Huang, Saba Zuberi, Seung Eun Yi, Maksims Volkovs, Tomi Poutanen, Rahul G. Krishnan	Electronic health records (EHRs) recorded in hospital settings typically contain a wide range of numeric time series data that is characterized by high sparsity and irregular observations. Effective modelling for such data must exploit its time series nature, the semantic relationship between different types of observations, and information in the sparsity structure of the data. Self-supervised Transformers have shown outstanding performance in a variety of structured tasks in NLP and computer vision. But multivariate time series data contains structured relationships over two dimensions: time and recorded event type, and straightforward applications of Transformers to time series data do not leverage this distinct structure. The quadratic scaling of self-attention layers can also significantly limit the input sequence length without appropriate input engineering. We introduce the DuETT architecture, an extension of Transformers designed to attend over both time and event type dimensions, yielding robust representations from EHR data. DuETT uses an aggregated input where sparse time series are transformed into a regular sequence with fixed length; this lowers the computational complexity relative to previous EHR Transformer models and, more importantly, enables the use of larger and deeper neural networks. When trained with self-supervised prediction tasks, that provide rich and informative signals for model pre-training, our model outperforms state-of-the-art deep learning models on multiple downstream tasks from the MIMIC-IV and PhysioNet-2012 EHR datasets.	Translated: 2023-04-26 19:17:21 Published: 2023-04-25
# Subsample Ridge Ensembles: Equivalences and Generalized Cross-Validation ( http://arxiv.org/abs/2304.13016v1 ) License: see linked page	Jin-Hong Du, Pratik Patil, Arun Kumar Kuchibhotla	We study subsampling-based ridge ensembles in the proportional asymptotics regime, where the feature size grows proportionally with the sample size such that their ratio converges to a constant. By analyzing the squared prediction risk of ridge ensembles as a function of the explicit penalty $\lambda$ and the limiting subsample aspect ratio $\phi_s$ (the ratio of the feature size to the subsample size), we characterize contours in the $(\lambda, \phi_s)$-plane at any achievable risk. As a consequence, we prove that the risk of the optimal full ridgeless ensemble (fitted on all possible subsamples) matches that of the optimal ridge predictor. In addition, we prove strong uniform consistency of generalized cross-validation (GCV) over the subsample sizes for estimating the prediction risk of ridge ensembles. This allows for GCV-based tuning of full ridgeless ensembles without sample splitting and yields a predictor whose risk matches optimal ridge risk.	Translated: 2023-04-26 19:16:57 Published: 2023-04-25
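
As a rough illustration of the generalized cross-validation (GCV) estimate mentioned in the Subsample Ridge Ensembles abstract above, the following is a minimal NumPy sketch for a single ridge predictor. It is not the paper's subsample ensemble. The function names `ridge_gcv` and `tune_lambda` and the unscaled penalty `lam * I` are assumptions made for illustration; the paper's normalization of the penalty may differ.

```python
import numpy as np

def ridge_gcv(X, y, lam):
    """GCV estimate of prediction risk for ridge regression with penalty lam.

    Builds the smoother matrix S = X (X^T X + lam I)^{-1} X^T and returns the
    mean squared residual divided by (1 - trace(S)/n)^2.
    """
    n, p = X.shape
    # Solve the regularized normal equations instead of forming an inverse.
    A = X.T @ X + lam * np.eye(p)
    S = X @ np.linalg.solve(A, X.T)
    residuals = y - S @ y
    denom = (1.0 - np.trace(S) / n) ** 2
    return float(np.mean(residuals ** 2) / denom)

def tune_lambda(X, y, grid):
    """Return the penalty in `grid` with the smallest GCV score."""
    scores = [ridge_gcv(X, y, lam) for lam in grid]
    return grid[int(np.argmin(scores))]
```

A call such as `tune_lambda(X, y, [0.01, 1.0, 100.0])` picks a penalty without holding out a validation split, which is the practical appeal of GCV that the abstract refers to.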

# NUANCE:ネットワーク通信環境に対する近超音波攻撃

NUANCE: Near Ultrasound Attack On Networked Communication Environments ( http://arxiv.org/abs/2305.10358v1 )

ライセンス: Link先を確認

Forrest McKee and David Noever

(参考訳) 本研究では,近超音波トロイの木馬を用いたAmazon Alexa音声サービスに対する主要な不可聴攻撃ベクトルを調査し,攻撃面の特徴付けと,不可聴音声コマンド発行の実際的意義の検討に焦点を当てる。この研究は、各攻撃ベクトルを、エンタープライズ、モバイル、産業制御システム(ICS)フレームワークをカバーするMITRE ATT&CKマトリクスの戦術またはテクニックにマッピングする。実験では50個の近超音波オーディオを生成・調査して攻撃の有効性を評価し、未処理のコマンドは100%、処理されたコマンドは全体で58%の成功率を達成した。この体系的なアプローチは、これまで対処されていなかった攻撃面を検証し、各ATT&CK識別子とテスト済みの防御手法を対にすることで、包括的な検知と攻撃設計を確保する。主な知見として、この攻撃手法はSUSBAM(Single Upper Sideband Amplitude Modulation)を用いて可聴音源から近超音波オーディオを生成し、音声コマンドを成人の可聴域を超える周波数帯に変換する。下側波帯を除去することで、変換後も不可聴なまま、16-22kHzの範囲で最小6kHzの帯域を実現する。研究は、1つのデバイスが同時に複数のアクションやデバイスをトリガーする1対多の攻撃面を調査した。さらに、この研究は不可聴信号の可逆性(復調)を示し、潜在的な警告手法と、音声ステガノグラフィのように秘密メッセージを埋め込む可能性を示唆している。

This study investigates a primary inaudible attack vector on Amazon Alexa voice services using near ultrasound trojans and focuses on characterizing the attack surface and examining the practical implications of issuing inaudible voice commands. The research maps each attack vector to a tactic or technique from the MITRE ATT&CK matrix, covering enterprise, mobile, and Industrial Control System (ICS) frameworks. The experiment involved generating and surveying fifty near-ultrasonic audios to assess the attacks' effectiveness, with unprocessed commands having a 100% success rate and processed ones achieving a 58% overall success rate. This systematic approach stimulates previously unaddressed attack surfaces, ensuring comprehensive detection and attack design while pairing each ATT&CK Identifier with a tested defensive method, providing attack and defense tactics for prompt-response options. The main findings reveal that the attack method employs Single Upper Sideband Amplitude Modulation (SUSBAM) to generate near-ultrasonic audio from audible sources, transforming spoken commands into a frequency range beyond human-adult hearing. By eliminating the lower sideband, the design achieves a 6 kHz minimum from 16-22 kHz while remaining inaudible after transformation. The research investigates the one-to-many attack surface where a single device simultaneously triggers multiple actions or devices. Additionally, the study demonstrates the reversibility or demodulation of the inaudible signal, suggesting potential alerting methods and the possibility of embedding secret messages like audio steganography.

翻訳日:2023-05-21 10:34:40 公開日:2023-04-25
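To make the single-upper-sideband idea concrete, here is a minimal sketch for a single message tone, using the textbook phasing identity. The 16 kHz carrier, 48 kHz sample rate, and the helper names `usb_band` and `usb_tone` are assumptions chosen to match the 16-22 kHz range quoted in the abstract; this is not the authors' signal chain.

```python
import math

def usb_band(carrier_hz, bandwidth_hz):
    """Band occupied when only the upper sideband is kept."""
    return (carrier_hz, carrier_hz + bandwidth_hz)

def dsb_band(carrier_hz, bandwidth_hz):
    """Band occupied by ordinary double-sideband AM, for comparison."""
    return (carrier_hz - bandwidth_hz, carrier_hz + bandwidth_hz)

def usb_tone(carrier_hz, tone_hz, sample_rate_hz, n_samples):
    """Upper-sideband modulation of one cosine tone (phasing method)."""
    samples = []
    for i in range(n_samples):
        t = i / sample_rate_hz
        m = math.cos(2 * math.pi * tone_hz * t)       # message tone
        m_hat = math.sin(2 * math.pi * tone_hz * t)   # its Hilbert transform
        # m*cos(wc t) - m_hat*sin(wc t) = cos((wc + wm) t): only the upper sideband remains
        s = (m * math.cos(2 * math.pi * carrier_hz * t)
             - m_hat * math.sin(2 * math.pi * carrier_hz * t))
        samples.append(s)
    return samples

# Assumed values: a 6 kHz voice band moved above a 16 kHz carrier.
upper_only = usb_band(16_000, 6_000)
both_sidebands = dsb_band(16_000, 6_000)
shifted = usb_tone(16_000, 1_000, 48_000, 64)   # a 1 kHz tone becomes a 17 kHz tone
```

Under these assumed values the double-sideband band would reach down to 10 kHz, which is audible, while the upper-sideband band stays within 16-22 kHz.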

# 固有値解に対する逆二次湯川プラス逆二乗ポテンシャルによる位相効果

Topological Effects With Inverse Quadratic Yukawa Plus Inverse Square Potential on Eigenvalue Solutions ( http://arxiv.org/abs/2305.04823v1 )

ライセンス: Link先を確認

Faizuddin Ahmed

(参考訳) 本研究では,非相対論的シュロディンガー波動方程式を,点状大域モノポール(PGM)の背景に相互作用ポテンシャルを持つ量子流束場の影響下で検討する。実際、逆二次湯川プラス逆2乗ポテンシャルを考え、遠心項におけるグリーン・アルドリッチ近似スキームを用いた半径式を導出する。パラメトリックなNikiforov-Uvarov法を用いて近似固有値解を決定し,解析する。その後、指数ポテンシャルの級数展開法を用いて同じポテンシャルを用いて放射波方程式を導出し、解析的に解いた。エネルギー固有値は、平坦な空間結果と比較して点状の大域モノポールの位相的欠陥によってシフトすることを示す。加えて、エネルギー固有値はアハルノフ・ボーム効果の類似性を示す量子束場に依存することが分かる。

In this analysis, we study the non-relativistic Schrödinger wave equation under the influence of a quantum flux field with an interaction potential in the background of a point-like global monopole (PGM). In fact, we consider an inverse quadratic Yukawa plus inverse square potential and derive the radial equation employing the Greene-Aldrich approximation scheme in the centrifugal term. We determine the approximate eigenvalue solution using the parametric Nikiforov-Uvarov method and analyze the result. Afterwards, we derive the radial wave equation using the same potential employing a power series expansion method in the exponential potential and solve it analytically. We show that the energy eigenvalues are shifted by the topological defects of a point-like global monopole compared to the flat space result. In addition, we see that the energy eigenvalues depend on the quantum flux field that shows an analogue to the Aharonov-Bohm effect.

翻訳日:2023-05-14 21:08:06 公開日:2023-04-25

# ダーツボードによる量子コンピューティング

Quantum Computing with dartboards ( http://arxiv.org/abs/2305.06153v1 )

ライセンス: Link先を確認

Ishaan Ganti, Srinivasan S. Iyengar

(参考訳) ダーツゲーム用に構築されたルールを用いて,量子コンピューティングを物理的に魅力的かつエレガントに表現する。ダーツボードは量子力学における状態空間を表すために使用され、ダーツを投げる行為は測定の概念、すなわち量子力学における波動関数の崩壊とよく似ていることが示される。このアナロジーは任意次元のダーツボードを用いて任意次元の空間で構成され、そのような任意の空間に対して不確かさの「視覚的」な記述も与える。最後に、量子ビットと量子コンピューティングアルゴリズムとの関連も示し、量子アルゴリズムと連成ダーツ投げ競技との類似性を構築する可能性を開く。

We present a physically appealing and elegant picture for quantum computing using rules constructed for a game of darts. A dartboard is used to represent the state space in quantum mechanics and the act of throwing the dart is shown to have close similarities to the concept of measurement, or collapse of the wavefunction in quantum mechanics. The analogy is constructed in arbitrary dimensional spaces, that is using arbitrary dimensional dartboards, and for such arbitrary spaces this also provides us a "visual" description of uncertainty. Finally, connections between qubits and quantum computing algorithms are also made, opening the possibility to construct analogies between quantum algorithms and coupled dart-throw competitions.

翻訳日:2023-05-14 20:57:29 公開日:2023-04-25
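The comparison between throwing a dart and measuring a quantum state can be sketched with the standard Born rule: each outcome is hit with probability proportional to the squared magnitude of its amplitude. Treating each outcome as one region of the board is an assumption made here for illustration; the names `measurement_probabilities` and `throw_dart` are hypothetical and do not come from the paper.

```python
import random

def measurement_probabilities(amplitudes):
    """Born rule: probability of each outcome is |amplitude|^2, normalized."""
    weights = [abs(a) ** 2 for a in amplitudes]
    total = sum(weights)
    return [w / total for w in weights]

def throw_dart(amplitudes, rng):
    """Sample one outcome index, as if a dart landed in a region of that size."""
    probs = measurement_probabilities(amplitudes)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

# An equal superposition of two basis states (amplitudes 1 and i, unnormalized).
probs = measurement_probabilities([1, 1j])
outcome = throw_dart([1, 1j], random.Random(0))
```

Repeating `throw_dart` many times gives outcome frequencies that approach `probs`, which is the sense in which a single throw resembles a single measurement.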

# UAVのパーチのためのマルチマーカーを用いた視覚的目標位置推定

Vision-based Target Pose Estimation with Multiple Markers for the Perching of UAVs ( http://arxiv.org/abs/2304.14838v1 )

ライセンス: Link先を確認

Truong-Dong Do, Nguyen Xuan-Mung and Sung-Kyung Hong

(参考訳) 自律型ナノ航空機は、その効率性と操縦性のために、監視およびモニタリング活動でますます人気を集めている。目標地点に到達した後は、ドローンは任務の間ずっと稼働し続ける必要はない。そのような状況では、機体がパーチ(着座)してモーターを停止することで、エネルギーを節約し、好ましくない飛行条件下でも静止位置を維持することができる。パーチング目標の推定フェーズでは,マーカーを用いた視覚カメラの安定性と精度が大きな課題である。大きなマーカーを使用すると遠くから素早く検出できるが、ドローンが近づくと、すぐにカメラの視界から外れてしまう。本稿では,上記の問題に対処するために,複数のマーカーを用いた視覚ベースのターゲットポーズ推定手法を提案する。まず, 遠距離と近距離の両方で検出能力を向上させるため, 大きなマーカーの内側に小さなマーカーを配置したパーチングターゲットを設計する。第2に、単眼カメラを用いて、検出されたマーカーから飛行体の相対的なポーズを算出する。次にカルマンフィルタを適用し、特に予期せぬ理由により測定データが欠落している場合に、より安定して信頼性の高いポーズ推定を行う。最後に,複数マーカーからのポーズデータをマージするアルゴリズムを導入する。その後、ポーズは位置制御器に送られ、ドローンをマーカーの中心に合わせ、ターゲットにパーチするよう誘導する。実験の結果,本手法の有効性と実現可能性が示された。ドローンは、直径25mmの丸型磁石を取り付けた状態で、マーカーの中心にパーチすることに成功した。

Autonomous Nano Aerial Vehicles have become increasingly popular in surveillance and monitoring operations due to their efficiency and maneuverability. Once a target location has been reached, drones do not have to remain active during the mission. It is possible for the vehicle to perch and stop its motors in such situations to conserve energy, as well as maintain a static position in unfavorable flying conditions. In the perching target estimation phase, the stability and accuracy of a visual camera with markers are a significant challenge. A large marker is quickly detectable from afar, but as the drone approaches, it soon falls out of the camera's view. In this paper, a vision-based target pose estimation method using multiple markers is proposed to deal with the above-mentioned problems. First, a perching target with a small marker inside a larger one is designed to improve detection capability at wide and close ranges. Second, the relative poses of the flying vehicle are calculated from detected markers using a monocular camera. Next, a Kalman filter is applied to provide a more stable and reliable pose estimation, especially when the measurement data is missing due to unexpected reasons. Finally, an algorithm for merging the pose data from multiple markers is introduced. The poses are then sent to the position controller to align the drone and the marker's center and steer it to perch on the target. The experimental results demonstrated the effectiveness and feasibility of the adopted approach. The drone can perch successfully onto the center of the markers with the attached 25 mm-diameter rounded magnet.

翻訳日:2023-05-07 16:14:34 公開日:2023-04-25
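The role of the Kalman filter when marker measurements drop out can be sketched with a scalar filter for a single pose coordinate. The constant-position model, the noise values `q` and `r`, and the function name `kalman_step` are assumptions for illustration; the abstract does not specify the filter design used in the paper.

```python
def kalman_step(x, p, z, q=0.01, r=0.25):
    """One predict/update cycle for a scalar constant-position model.

    x, p: prior estimate and its variance.
    z: new measurement, or None when no marker was detected.
    q, r: assumed process and measurement noise variances.
    """
    p = p + q                 # predict: the state is kept, uncertainty grows
    if z is None:
        return x, p           # no measurement: return the prediction only
    k = p / (p + r)           # Kalman gain
    x = x + k * (z - x)       # correct the estimate toward the measurement
    p = (1 - k) * p           # uncertainty shrinks after the update
    return x, p

# Hypothetical track of one coordinate with a missing detection in the middle.
x, p = 0.0, 1.0
for z in [1.0, None, 1.2]:
    x, p = kalman_step(x, p, z)
```

When `z` is `None` the estimate is held and only its variance grows, so the controller keeps receiving a pose until detections resume.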

# 金融マーケティングのためのマルチタスク学習によるターゲット間の依存性のモデル化

Curriculum Modeling the Dependence among Targets with Multi-task Learning for Financial Marketing ( http://arxiv.org/abs/2305.01514v1 )

ライセンス: Link先を確認

Yunpeng Weng, Xing Tang, Liang Chen, Xiuqiang He

(参考訳) 様々な実世界のアプリケーションに対するマルチタスク学習は通常、論理的な逐次依存を伴うタスクを含む。例えば、オンラインマーケティングでは、$impression \rightarrow click \rightarrow conversion$のカスケード行動パターンは、通常マルチタスク方式で複数のタスクとしてモデル化され、既存研究ではタスク間の逐次依存は、明示的に定義された関数または暗黙的に転送される情報によって単純に結び付けられている。これらの手法は、タスク系列が進むにつれて正のフィードバックが疎になる長いパスの逐次タスクに対して、データスパース性の問題を緩和する。しかし、下流タスクではエラーの蓄積と負の転移が深刻な問題となる。特に、トレーニングの初期段階では、先行タスクのパラメータの最適化がまだ収束しておらず、下流タスクに転送される情報は負の影響を与える。本稿では,複数の逐次依存タスクをカリキュラム方式で学習するために,新しい事前情報マージ(PIM)モジュールを用いてタスク間の論理的依存を明示的にモデル化する事前情報マージモデル(PIMM)を提案する。具体的には、PIMは、トレーニング中に下流タスクへ転送する情報として、ソフトサンプリング戦略を用いて真のラベル情報または先行タスクの予測のいずれかをランダムに選択する。易しいものから難しいものへと進むカリキュラムパラダイムに従って,サンプリング確率を動的に調整することで,下流タスクがトレーニングの進行とともに効果的な情報を取得できるようにする。公開データセットと製品データセットのオフライン実験結果は、PIMMが最先端のベースラインを上回っていることを確認する。さらに,大規模なFinTechプラットフォームにPIMMをデプロイし,オンライン実験によりPIMMの有効性を実証した。

Multi-task learning for various real-world applications usually involves tasks with logical sequential dependence. For example, in online marketing, the cascade behavior pattern of $impression \rightarrow click \rightarrow conversion$ is usually modeled as multiple tasks in a multi-task manner, where the sequential dependence between tasks is simply connected with an explicitly defined function or implicitly transferred information in current works. These methods alleviate the data sparsity problem for long-path sequential tasks as the positive feedback becomes sparser along with the task sequence. However, the error accumulation and negative transfer will be a severe problem for downstream tasks. Especially, at the beginning stage of training, the optimization for parameters of former tasks is not converged yet, and thus the information transferred to downstream tasks is negative. In this paper, we propose a prior information merged model (\textbf{PIMM}), which explicitly models the logical dependence among tasks with a novel prior information merged (\textbf{PIM}) module for multiple sequential dependence task learning in a curriculum manner. Specifically, the PIM randomly selects the true label information or the prior task prediction with a soft sampling strategy to transfer to the downstream task during the training. Following an easy-to-difficult curriculum paradigm, we dynamically adjust the sampling probability to ensure that the downstream task will get the effective information along with the training. The offline experimental results on both public and product datasets verify that PIMM outperforms state-of-the-art baselines. Moreover, we deploy the PIMM in a large-scale FinTech platform, and the online experiments also demonstrate the effectiveness of PIMM.

翻訳日:2023-05-07 16:03:46 公開日:2023-04-25
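The soft sampling idea in this abstract can be sketched as a scheduled choice between the true label and the upstream task's prediction. The linear decay in `label_probability` and the function names are assumptions made for illustration; the abstract only states that the sampling probability is adjusted dynamically along an easy-to-difficult curriculum.

```python
import random

def label_probability(step, total_steps):
    """Assumed schedule: chance of passing the true label decays linearly from 1 to 0."""
    return max(0.0, 1.0 - step / total_steps)

def pim_input(true_label, prior_prediction, step, total_steps, rng):
    """Pick what the downstream task receives from the upstream task at this step."""
    if rng.random() < label_probability(step, total_steps):
        return true_label         # early training: reliable ground truth (easy)
    return prior_prediction       # later training: the upstream model's own output (hard)

rng = random.Random(0)
early = pim_input(1, 0.3, step=0, total_steps=100, rng=rng)
late = pim_input(1, 0.3, step=100, total_steps=100, rng=rng)
```

Under this assumed schedule the downstream task is shielded from unconverged upstream predictions at the start and gradually learns to work with them, which is the behavior needed at inference time when no true label is available.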

# PopSim:都市資源の公平配分のための個人レベル人口シミュレータ

PopSim: An Individual-level Population Simulator for Equitable Allocation of City Resources ( http://arxiv.org/abs/2305.02204v1 )

ライセンス: Link先を確認

Khanh Duy Nguyen, Nima Shahbazi and Abolfazl Asudeh

(参考訳) 人種に基づく歴史的体系的な排除戦術は、特定の人口集団の人々が特定の都市部に集結することを強制した。このような分離の倫理的側面とは別に、これらの政策は都市内の公共交通、医療、教育などの都市資源の配分に影響を及ぼす。これらの問題に対処するための最初のステップは、公平なリソース割り当ての状態を評価する監査を行うことである。しかし、プライバシーや機密性の懸念から、人口統計情報を含む個人レベルのデータは公開できない。人口統計データを活用することで、人口統計情報を用いた半合成個人レベルの人口データを生成するシステムであるPopSimを導入する。 PopSimを使って、シカゴ市のために複数のベンチマークデータセットを生成し、それらを検証するために広範な統計的評価を行う。都市資源の公平な配分を監査するためのシステムの適用例を示すいくつかのケーススタディで,我々はさらにデータセットを活用した。

Historical systematic exclusionary tactics based on race have forced people of certain demographic groups to congregate in specific urban areas. Aside from the ethical aspects of such segregation, these policies have implications for the allocation of urban resources including public transportation, healthcare, and education within the cities. The initial step towards addressing these issues involves conducting an audit to assess the status of equitable resource allocation. However, due to privacy and confidentiality concerns, individual-level data containing demographic information cannot be made publicly available. By leveraging publicly available aggregated demographic statistics data, we introduce PopSim, a system for generating semi-synthetic individual-level population data with demographic information. We use PopSim to generate multiple benchmark datasets for the city of Chicago and conduct extensive statistical evaluations to validate those. We further use our datasets for several case studies that showcase the application of our system for auditing equitable allocation of city resources.

翻訳日:2023-05-07 15:54:51 公開日:2023-04-25

# TCN-LSTMとマルチタスク学習モデルによる車線変更意図認識と運転状況予測への統一的アプローチ

A Unified Approach to Lane Change Intention Recognition and Driving Status Prediction through TCN-LSTM and Multi-Task Learning Models ( http://arxiv.org/abs/2304.13732v1 )

ライセンス: Link先を確認

Renteng Yuan, Mohamed Abdel-Aty, Xin Gu, Ou Zheng, Qiaojun Xiang

(参考訳) 車線変更(Lane Change, LC)は、連続的で複雑な操作プロセスである。LCプロセスの正確な検出と予測は、交通参加者が周囲の環境をよりよく理解し、LCに伴う潜在的な安全上の危険を認識し、交通安全を改善するのに役立つ。本稿ではLCプロセスに着目し,LC意図認識(LC-IR)モデルとLC状態予測(LC-SP)モデルを開発する。まず、シーケンシャルデータにおける長距離依存関係を捉えるために、Long Short-Term Memoryユニットを持つ新しいアンサンブル時間畳み込みネットワーク(TCN-LSTM)を提案する。次に、出力指標間の内在的関係を捉えるために、3つのマルチタスクモデル(MTL-LSTM, MTL-TCN, MTL-TCN-LSTM)を開発する。さらに,LC意図認識と運転状態予測のための統合モデリングフレームワーク(LC-IR-SP)を開発する。提案モデルの性能を検証するため,CitySimデータセットから合計1023件の車両軌跡を抽出した。関連する指標の決定にはピアソン係数を用いる。その結果,150フレームを入力長として用いた場合,TCN-LSTMモデルは96.67%の精度でLC意図分類においてTCNおよびLSTMモデルを上回り,各クラスに対してよりバランスの取れた結果が得られた。提案した3つのマルチタスク学習モデルは、対応するシングルタスクモデルと比較して性能が大幅に向上し、平均絶対誤差(MAE)と二乗平均平方根誤差(RMSE)がそれぞれ平均24.24%、22.86%減少した。開発したLC-IR-SPモデルは,自動運転車における車線変更行動の識別,リアルタイム交通コンフリクト指標の算出,車両制御戦略の改善への応用が期待できる。

Lane change (LC) is a continuous and complex operation process. Accurately detecting and predicting LC processes can help traffic participants better understand their surrounding environment, recognize potential LC safety hazards, and improve traffic safety. This present paper focuses on LC processes, developing an LC intention recognition (LC-IR) model and an LC status prediction (LC-SP) model. A novel ensemble temporal convolutional network with Long Short-Term Memory units (TCN-LSTM) is first proposed to capture long-range dependencies in sequential data. Then, three multi-task models (MTL-LSTM, MTL-TCN, MTL-TCN-LSTM) are developed to capture the intrinsic relationship among output indicators. Furthermore, a unified modeling framework for LC intention recognition and driving status prediction (LC-IR-SP) is developed. To validate the performance of the proposed models, a total number of 1023 vehicle trajectories is extracted from the CitySim dataset. The Pearson coefficient is employed to determine the related indicators. The results indicate that using 150 frames as input length, the TCN-LSTM model with 96.67% accuracy outperforms TCN and LSTM models in LC intention classification and provides more balanced results for each class. Three proposed multi-tasking learning models provide markedly increased performance compared to corresponding single-task models, with an average reduction of 24.24% and 22.86% in the Mean Absolute Error (MAE) and Root Mean Square Error (RMSE), respectively. The developed LC-IR-SP model has promising applications for autonomous vehicles to identify lane change behaviors, calculate a real-time traffic conflict index and improve vehicle control strategies.

翻訳日:2023-04-28 15:40:03 公開日:2023-04-25
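The two error metrics quoted in this abstract, and the notion of a relative reduction between a single-task and a multi-task model, can be written out as a short sketch. The helper `relative_reduction` and the numbers in the usage lines are illustrative assumptions, not values from the paper.

```python
import math

def mae(y_true, y_pred):
    """Mean Absolute Error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root Mean Square Error."""
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))

def relative_reduction(baseline_error, improved_error):
    """Fraction by which the error dropped relative to the baseline."""
    return (baseline_error - improved_error) / baseline_error

# Made-up errors for a single-task baseline and a multi-task model.
reduction = relative_reduction(2.0, 1.5)
```

Because RMSE squares each error before averaging, it penalizes occasional large misses more heavily than MAE, which is why the two reductions reported for the same models can differ.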

# 名前の由来は? CADファイルのユーザ指定名を用いた言語モデルにおけるアセンブリー部分意味的知識の評価

What's in a Name? Evaluating Assembly-Part Semantic Knowledge in Language Models through User-Provided Names in CAD Files ( http://arxiv.org/abs/2304.14275v1 )

ライセンス: Link先を確認

Peter Meltzer, Joseph G. Lambourne, Daniele Grandi

(参考訳) 集合における部分的および部分的関係に関する意味的知識は、設計リポジトリの検索からエンジニアリング的知識ベースの構築まで、様々なタスクに有用である。本稿では,設計者がCAD(Computer Aided Design)ソフトウェアで使用する自然言語名は,そのような知識の貴重な情報源であり,Large Language Models(LLM)には,このデータを扱う上で有用なドメイン固有情報や,他のCADやエンジニアリング関連のタスクが含まれていることを提案する。特に、自然言語部分、特徴、文書名の大きなコーパスを抽出し、これを用いて、事前学習された言語モデルが、前例のない3つの自己教師型タスクにおいて、多数のベンチマークを上回り得ることを定量的に示す。さらに,テキストデータコーパスの微調整により全タスクのパフォーマンスが向上し,これまで無視されてきたテキストデータの価値が証明された。また,テキストデータのみを用いた LLM の利用に対する重要な制限も指摘し,本研究はマルチモーダルテキスト幾何学モデルへのさらなる取り組みに強い動機を与える。この分野でのさらなる作業を支援するために、私たちはすべてのデータとコードを公開しています。

Semantic knowledge of part-part and part-whole relationships in assemblies is useful for a variety of tasks from searching design repositories to the construction of engineering knowledge bases. In this work we propose that the natural language names designers use in Computer Aided Design (CAD) software are a valuable source of such knowledge, and that Large Language Models (LLMs) contain useful domain-specific information for working with this data as well as other CAD and engineering-related tasks. In particular we extract and clean a large corpus of natural language part, feature and document names and use this to quantitatively demonstrate that a pre-trained language model can outperform numerous benchmarks on three self-supervised tasks, without ever having seen this data before. Moreover, we show that fine-tuning on the text data corpus further boosts the performance on all tasks, thus demonstrating the value of the text data which until now has been largely ignored. We also identify key limitations to using LLMs with text data alone, and our findings provide a strong motivation for further work into multi-modal text-geometry models. To aid and encourage further work in this area we make all our data and code publicly available.

翻訳日:2023-04-28 13:02:19 公開日:2023-04-25

# グラフニューラルネットワークはノード分類に役立つか--ノード識別性に関するホモフィリー原理の検討

When Do Graph Neural Networks Help with Node Classification: Investigating the Homophily Principle on Node Distinguishability ( http://arxiv.org/abs/2304.14274v1 )

ライセンス: Link先を確認

Sitao Luan, Chenqing Hua, Minkai Xu, Qincheng Lu, Jiaqi Zhu, Xiao-Wen Chang, Jie Fu, Jure Leskovec, Doina Precup

(参考訳) 同じラベルを持つノードほど接続されやすいというホモフィリー原理(homophily principle)は、ノード分類(NC)タスクにおいてグラフニューラルネットワーク(GNN)がニューラルネットワーク(NN)より優れた性能を示す主な理由であると考えられてきた。近年, ホモフィリー原理が破られたとしても, 同一クラスのノードが類似した近傍パターンを共有する限りGNNの優位性は維持されると主張する理論的結果が示され, ホモフィリーの妥当性が疑問視されている。しかし、この議論はクラス内のノード識別可能性(ND)のみを考慮し、クラス間NDを無視しており、ホモフィリーの効果を研究するには不十分である。本論文では,まずこの不十分さを例で示し,NDの理想的な状況はクラス内NDがクラス間NDよりも小さいことである,と論じる。この考え方を定式化し, ホモフィリーの理解を深めるために, CSBM-H (Contextual Stochastic Block Model for Homophily) を提案し, 確率的ベイズ誤差 (Probabilistic Bayes Error, PBE) と期待負KLダイバージェンス (Expected Negative KL-divergence, ENKL) という2つの指標を定義してNDを定量化する。これにより、クラス内NDとクラス間NDがともにNDにどう影響するかも明らかにできる。結果を可視化し、詳細な分析を行う。実験により,GNNの優位性は,ホモフィリーレベルにかかわらず,クラス内NDとクラス間NDの両方に密接に関係していることを確認し,これに基づいてKPM (Kernel Performance Metric) を定義する。KPMは新しい非線形の特徴ベースの指標であり、合成および実世界のデータセット上でGNNの利点と欠点を明らかにする上で、既存のホモフィリー指標よりも効果的であることが確認されている。

Homophily principle, i.e. nodes with the same labels are more likely to be connected, was believed to be the main reason for the performance superiority of Graph Neural Networks (GNNs) over Neural Networks (NNs) on Node Classification (NC) tasks. Recently, people have developed theoretical results arguing that, even though the homophily principle is broken, the advantage of GNNs can still hold as long as nodes from the same class share similar neighborhood patterns, which questions the validity of homophily. However, this argument only considers intra-class Node Distinguishability (ND) and ignores inter-class ND, which is insufficient to study the effect of homophily. In this paper, we first demonstrate the aforementioned insufficiency with examples and argue that an ideal situation for ND is to have smaller intra-class ND than inter-class ND. To formulate this idea and have a better understanding of homophily, we propose Contextual Stochastic Block Model for Homophily (CSBM-H) and define two metrics, Probabilistic Bayes Error (PBE) and Expected Negative KL-divergence (ENKL), to quantify ND, through which we can also find how intra- and inter-class ND influence ND together. We visualize the results and give detailed analysis. Through experiments, we verified that the superiority of GNNs is indeed closely related to both intra- and inter-class ND regardless of homophily levels, based on which we define Kernel Performance Metric (KPM). KPM is a new non-linear, feature-based metric, which is tested to be more effective than the existing homophily metrics on revealing the advantage and disadvantage of GNNs on synthetic and real-world datasets.

翻訳日:2023-04-28 13:01:57 公開日:2023-04-25
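For readers unfamiliar with how homophily is usually measured, the sketch below computes the widely used edge homophily ratio: the fraction of edges whose two endpoints share a label. This is one of the existing metrics the abstract contrasts with, not the proposed KPM, PBE, or ENKL; the toy graph is a made-up example.

```python
def edge_homophily(edges, labels):
    """Fraction of edges that connect two nodes with the same label."""
    same = sum(1 for u, v in edges if labels[u] == labels[v])
    return same / len(edges)

# Toy path graph 0-1-2-3 with two classes.
edges = [(0, 1), (1, 2), (2, 3)]
labels = [0, 0, 1, 1]
ratio = edge_homophily(edges, labels)
```

A ratio near 1 means neighbors mostly share a class, while a low ratio marks a heterophilous graph. The abstract's point is that this single number does not, on its own, determine whether a GNN will outperform a plain neural network.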

# 子どもへのAIナッジ導入のための監査フレームワーク

An Audit Framework for Adopting AI-Nudging on Children ( http://arxiv.org/abs/2304.14338v1 )

ライセンス: Link先を確認

Marianna Ganapini and Enrico Panai

(参考訳) これはAIナッジのための監査フレームワークである。文献で通常議論されるような静的なナッジとは異なり,ここでは大量のデータを使用してパーソナライズされた動的なフィードバックとインターフェースを提供するタイプのナッジに注目する。私たちはこれをAIナッジと呼ぶ(Lanzing, 2019, p. 549; Yeung, 2017)。ここで概説する監査の最終的な目標は、ナッジを使用するAIシステムが、監査の勧告、要件、提案(言い換えれば、監査の基準)に従うことで、一定水準の道徳的慣性と中立性を維持するよう保証することである。意図しないネガティブな結果が生じた場合、監査は実施可能なリスク軽減メカニズムを提案する。意図しないポジティブな結果の場合には、いくつかの強化メカニズムを提案する。IBM-Notre Dame Tech Ethics Labの支援を受けている。

This is an audit framework for AI-nudging. Unlike the static form of nudging usually discussed in the literature, we focus here on a type of nudging that uses large amounts of data to provide personalized, dynamic feedback and interfaces. We call this AI-nudging (Lanzing, 2019, p. 549; Yeung, 2017). The ultimate goal of the audit outlined here is to ensure that an AI system that uses nudges will maintain a level of moral inertia and neutrality by complying with the recommendations, requirements, or suggestions of the audit (in other words, the criteria of the audit). In the case of unintended negative consequences, the audit suggests risk mitigation mechanisms that can be put in place. In the case of unintended positive consequences, it suggests some reinforcement mechanisms. Sponsored by the IBM-Notre Dame Tech Ethics Lab.

翻訳日:2023-04-28 12:32:28 公開日:2023-04-25

# チャットボットの教育への脅威を超えて考える--文字とコーディングプロセスの可視化

Thinking beyond chatbots' threat to education: Visualizations to elucidate the writing and coding process ( http://arxiv.org/abs/2304.14342v1 )

ライセンス: Link先を確認

Badri Adhikari

(参考訳) 言語教育と学習のための教育実践の展望は、主に結果駆動のアプローチを中心にしている。最近の大規模言語モデルのアクセシビリティは、これらのアプローチを徹底的に妨げている。この混乱を考慮し、言語教育と学習プラクティスを変革する上で、言語学習が人間の知性の発展において重要な役割を担っていることに注意する必要がある。ライティングとコンピュータプログラミングは、教育システムにとって不可欠な2つのスキルです。何とどのように書くかが思考を形作り、自己指向学習の道筋を定めている。ほとんどの教育者は、‘プロセス’と‘プロダクト’はどちらも重要かつ不可分であることを理解しているが、ほとんどの教育環境では、学習者の形成過程に対する建設的なフィードバックを提供することは困難である。例えば、学習者が入力したコードが実行されるかどうかをコンピュータプログラミングで評価するのは簡単である。しかし、学習者の創造的プロセスを評価し、プロセスに対して有意義なフィードバックを提供するのは難しい。教育(および学習)におけるこの長年の課題に対処するため、本研究では、学習者の執筆やプログラミングプロセスの本質的で教えられた能力を要約する、新しい可視化ツールセットを提案する。これらの対話型プロセス可視化(PV)は、学習者に洞察力、権限、パーソナライズされたプロセス指向のフィードバックを提供する。ツールボックスは、教育者や学習者がテストする準備ができており、www.processfeedback.orgで公開されている。学習者のプロセス - 自己、仲間、教育者から - に対するフィードバックを提供することに重点を置くことで、学習者の自己指向学習やメタ認知といった高次スキル獲得能力が向上する。

The landscape of educational practices for teaching and learning languages has been predominantly centered around outcome-driven approaches. The recent accessibility of large language models has thoroughly disrupted these approaches. As we transform our language teaching and learning practices to account for this disruption, it is important to note that language learning plays a pivotal role in developing human intelligence. Writing and computer programming are two essential skills integral to our education systems. What and how we write shapes our thinking and sets us on the path of self-directed learning. While most educators understand that `process' and `product' are both important and inseparable, in most educational settings, providing constructive feedback on a learner's formative process is challenging. For instance, it is straightforward in computer programming to assess whether a learner-submitted code runs. However, evaluating the learner's creative process and providing meaningful feedback on the process can be challenging. To address this long-standing issue in education (and learning), this work presents a new set of visualization tools to summarize the inherent and taught capabilities of a learner's writing or programming process. These interactive Process Visualizations (PVs) provide insightful, empowering, and personalized process-oriented feedback to the learners. The toolbox is ready to be tested by educators and learners and is publicly available at www.processfeedback.org. Focusing on providing feedback on a learner's process--from self, peers, and educators--will facilitate learners' ability to acquire higher-order skills such as self-directed learning and metacognition.

翻訳日:2023-04-28 12:21:56 公開日:2023-04-25

# 量子クエリー通信シミュレーションにおける対称性の役割

The Role of Symmetry in Quantum Query-to-Communication Simulation ( http://arxiv.org/abs/2012.05233v2 )

ライセンス: Link先を確認

Sourav Chakraborty, Arkadev Chattopadhyay, Peter H{\o}yer, Nikhil S. Mande, Manaswi Paraashar, Ronald de Wolf

(参考訳) Buhrman, Cleve and Wigderson (STOC'98) は、すべてのブール関数 f : {-1,1}^n to {-1,1} と G in {AND_2, XOR_2} に対して、合成関数 f o G の有界エラー量子通信複雑性は O(Q(f) log n) と等しいことを示した。これは、Alice が f に対して最適な量子クエリアルゴリズムを実行し、各クエリを実装するために、O(log n) qubit の丸い通信を用いて実現している。これは古典的な設定とは対照的であり、R^{cc}(f o G) が少なくとも 2R(f) であることは容易に示され、R^{cc} と R はそれぞれ有界エラー通信とクエリ複雑性を表す。量子設定におけるO(log n)オーバーヘッドはいくつかの関数に対して必要であり、したがってBCWシミュレーションは厳密であることを示す。ここでは、我々の研究に先立ち、すべての f に対して Q^{cc}(f o G) = O(Q(f)) の可能性と {AND_2, XOR_2} におけるすべての G が除外されていないことに注意する。より具体的には、以下に示す。 - 対数 n のオーバーヘッドは、f が対称であるときに *not* が要求されることを示し、Aaronson と Ambainis の結果を集合交叉関数に対して一般化する(Theory of Computing'05)。 -上記のことを証明するため、fがor関数であるときに結果を証明できる雑音振幅増幅の効率的な分散バージョンを設計。 - 上記の最初の結果から、BCWシミュレーションにおける対数 n のオーバーヘッドは、f が推移的であっても回避できるかどうかを問うことができるが、これは対称性の弱い概念である。量子通信プロトコルが任意に1/2に近い誤差確率を許容しても、ある推移関数に対して、ログ n のオーバーヘッドが依然として必要であることを示すことで、強い負の答えを与える。また、bcwシミュレーションにおいて、有界エラー通信モデルにおいて、log n のオーバーヘッドを必要とする関数を構築するための一般的なレシピも提供します。

Buhrman, Cleve and Wigderson (STOC'98) showed that for every Boolean function f : {-1,1}^n to {-1,1} and G in {AND_2, XOR_2}, the bounded-error quantum communication complexity of the composed function f o G equals O(Q(f) log n), where Q(f) denotes the bounded-error quantum query complexity of f. This is achieved by Alice running the optimal quantum query algorithm for f, using a round of O(log n) qubits of communication to implement each query. This is in contrast with the classical setting, where it is easy to show that R^{cc}(f o G) is at most 2R(f), where R^{cc} and R denote bounded-error communication and query complexity, respectively. We show that the O(log n) overhead is required for some functions in the quantum setting, and thus the BCW simulation is tight. We note here that prior to our work, the possibility of Q^{cc}(f o G) = O(Q(f)), for all f and all G in {AND_2, XOR_2}, had not been ruled out. More specifically, we show the following. - We show that the log n overhead is *not* required when f is symmetric, generalizing a result of Aaronson and Ambainis for the Set-Disjointness function (Theory of Computing'05). - In order to prove the above, we design an efficient distributed version of noisy amplitude amplification that allows us to prove the result when f is the OR function. - In view of our first result above, one may ask whether the log n overhead in the BCW simulation can be avoided even when f is transitive, which is a weaker notion of symmetry. We give a strong negative answer by showing that the log n overhead is still necessary for some transitive functions even when we allow the quantum communication protocol an error probability that can be arbitrarily close to 1/2. - We also give, among other things, a general recipe to construct functions for which the log n overhead is required in the BCW simulation in the bounded-error communication model.

翻訳日:2023-04-27 18:53:47 公開日:2023-04-25

# 機械学習を用いた救急部トリアージ中の敗血症検出

Detection of sepsis during emergency department triage using machine learning ( http://arxiv.org/abs/2204.07657v5 )

ライセンス: Link先を確認

Oleksandr Ivanov, Karin Molander, Robert Dunne, Stephen Liu, Deena Brecher, Kevin Masek, Erica Lewis, Lisa Wolf, Debbie Travers, Deb Delaney, Kyla Montgomery, Christian Reilly

(参考訳) 敗血症は臓器機能不全を伴う生命を脅かす疾患であり、世界における死亡および重篤な疾患の主要な原因である。敗血症の治療が数時間遅れるだけでも死亡率が上昇する。救急部トリアージ中に敗血症を早期発見できれば、検査、抗生物質投与、その他の敗血症治療プロトコルを早期に開始できる。本研究の目的は、EDトリアージ時点(臨床検査の実施前)における敗血症検出性能を、標準敗血症スクリーニングアルゴリズム(感染源を伴うSIRS)と、EHRトリアージデータで訓練された機械学習アルゴリズムとで比較することである。16の参加病院のトリアージデータを含む受診記録を用いて機械学習モデル(KATE Sepsis)を開発した。KATE Sepsisと標準スクリーニングは、成人の医療記録512,949件を対象に後ろ向きに評価した。KATE SepsisのAUCは0.9423 (0.9401 - 0.9441)、感度は71.09% (70.12% - 71.98%)、特異度は94.81% (94.75% - 94.87%)である。標準スクリーニングのAUCは0.6826 (0.6774 - 0.6878)、感度は40.8% (39.71% - 41.86%)、特異度は95.72% (95.68% - 95.78%)である。敗血症を検出するよう訓練されたKATE Sepsisモデルは、重症敗血症の検出感度が77.67% (75.78% - 79.42%)、敗血症性ショックの検出感度が86.95% (84.2% - 88.81%)である。標準スクリーニングプロトコルは、重症敗血症の検出感度が43.06% (41% - 45.87%)、敗血症性ショックの検出感度が40% (36.55% - 43.26%)である。今後の研究は、KATE Sepsisが抗生物質投与、再入院率、罹患率、死亡率に与える前向きな影響に焦点を当てるべきである。

Sepsis is a life-threatening condition with organ dysfunction and is a leading cause of death and critical illness worldwide. Even a few hours of delay in the treatment of sepsis results in increased mortality. Early detection of sepsis during emergency department triage would allow early initiation of lab analysis, antibiotic administration, and other sepsis treatment protocols. The purpose of this study was to compare sepsis detection performance at ED triage (prior to the use of laboratory diagnostics) of the standard sepsis screening algorithm (SIRS with source of infection) and a machine learning algorithm trained on EHR triage data. A machine learning model (KATE Sepsis) was developed using patient encounters with triage data from 16 participating hospitals. KATE Sepsis and standard screening were retrospectively evaluated on the adult population of 512,949 medical records. KATE Sepsis demonstrates an AUC of 0.9423 (0.9401 - 0.9441) with sensitivity of 71.09% (70.12% - 71.98%) and specificity of 94.81% (94.75% - 94.87%). Standard screening demonstrates an AUC of 0.6826 (0.6774 - 0.6878) with sensitivity of 40.8% (39.71% - 41.86%) and specificity of 95.72% (95.68% - 95.78%). The KATE Sepsis model trained to detect sepsis demonstrates 77.67% (75.78% - 79.42%) sensitivity in detecting severe sepsis and 86.95% (84.2% - 88.81%) sensitivity in detecting septic shock. The standard screening protocol demonstrates 43.06% (41% - 45.87%) sensitivity in detecting severe sepsis and 40% (36.55% - 43.26%) sensitivity in detecting septic shock. Future research should focus on the prospective impact of KATE Sepsis on administration of antibiotics, readmission rate, morbidity and mortality.

翻訳日:2023-04-27 18:47:07 公開日:2023-04-25
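The sensitivity and specificity figures reported in this abstract follow the standard confusion-matrix definitions, sketched below. The counts in the usage lines are invented for illustration and are not taken from the study.

```python
def sensitivity(true_positives, false_negatives):
    """Share of actual sepsis cases that the screen flags (recall)."""
    return true_positives / (true_positives + false_negatives)

def specificity(true_negatives, false_positives):
    """Share of non-sepsis cases that the screen correctly leaves unflagged."""
    return true_negatives / (true_negatives + false_positives)

# Invented counts: 100 sepsis cases and 1,000 non-sepsis cases.
sens = sensitivity(true_positives=71, false_negatives=29)
spec = specificity(true_negatives=950, false_positives=50)
```

The two quantities trade off against each other. In the abstract the model and the standard screen have similar specificity, so the large gap in sensitivity is what separates them.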

# 新型コロナウイルス感染拡大に伴うメンタルヘルスのパンデミック : 公衆メンタルヘルスの窓口としてのソーシャルメディア

Mental Health Pandemic during the COVID-19 Outbreak: Social Media as a Window to Public Mental Health ( http://arxiv.org/abs/2203.00237v4 )

ライセンス: Link先を確認

Michelle Bak, Chungyi Chiu, Jessie Chin

(参考訳) ロックダウンやソーシャルディスタンシングなどの新型コロナウイルス(covid-19)パンデミックの予防対策が強化され、若者の社会的孤立(社会的ニーズと社会的環境の規定の相違)に対する認識が著しく高まった。社会的孤立は、抑うつ症状の危険因子である状況的孤独(環境変化から生じる孤独)と密接に関連している。以前の研究は、脆弱な若者がredditのようなオンラインソーシャルプラットフォームから支援を求める可能性が高いことを示唆していた。そこで本研究は、新型コロナウイルス(covid-19)の流行による孤独感サブredditにおけるうつ病関連対話の同定と分析を目的としている。本研究は,ロジスティック回帰と話題モデルを用いて,パンデミック前後の孤独度に関する抑うつ関連議論を分類・検討した。その結果、パンデミックの期間中に課題が報告されたうつ病に関する議論(メンタルヘルス、社会的相互作用、家族、感情など)の量が大幅に増加した。また, 抑うつに関する議論から, デート(プレパンデミック)からオンライン交流やコミュニティ(パンデミック)への転換がみられ, パンデミックにおけるオンラインソーシャルサポートの必要性や表現の高まりが示唆された。現在の調査結果は、ソーシャルメディアが公衆のメンタルヘルスを監視する窓口になる可能性を示している。今後の研究は,危機時の監視システム設計に影響を及ぼす現在のアプローチを臨床的に検証する。

Intensified preventive measures during the COVID-19 pandemic, such as lockdown and social distancing, heavily increased the perception of social isolation (i.e., a discrepancy between one's social needs and the provisions of the social environment) among young adults. Social isolation is closely associated with situational loneliness (i.e., loneliness emerging from environmental change), a risk factor for depressive symptoms. Prior research suggested vulnerable young adults are likely to seek support from an online social platform such as Reddit, a perceived comfortable environment for lonely individuals to seek mental health help through anonymous communication with a broad social network. Therefore, this study aims to identify and analyze depression-related dialogues on loneliness subreddits during the COVID-19 outbreak, with the implications on depression-related infoveillance during the pandemic. Our study utilized logistic regression and topic modeling to classify and examine depression-related discussions on loneliness subreddits before and during the pandemic. Our results showed significant increases in the volume of depression-related discussions (i.e., topics related to mental health, social interaction, family, and emotion) where challenges were reported during the pandemic. We also found a switch in dominant topics emerging from depression-related discussions on loneliness subreddits, from dating (prepandemic) to online interaction and community (pandemic), suggesting the increased expressions or need of online social support during the pandemic. The current findings suggest the potential of social media to serve as a window for monitoring public mental health. Our future study will clinically validate the current approach, which has implications for designing a surveillance system during the crisis.

翻訳日:2023-04-27 18:45:32 公開日:2023-04-25

# 手術映像理解のための概念グラフニューラルネットワーク

Concept Graph Neural Networks for Surgical Video Understanding ( http://arxiv.org/abs/2202.13402v2 )

ライセンス: Link先を確認

Yutong Ban, Jennifer A. Eckhoff, Thomas M. Ward, Daniel A. Hashimoto, Ozanan R. Meireles, Daniela Rus, Guy Rosman

(参考訳) 私たちは世界の知識と理解を常に統合し、見るものに対する私たちの解釈を強化します。この能力は、AI強化手術など、複数のエンティティや概念を推論するアプリケーションドメインにおいて不可欠である。本稿では,概念知識を時間的概念グラフネットワークを介して時間分析タスクに統合する新しい手法を提案する。提案するネットワークでは,大域的知識グラフが手術例の時間的分析に組み込まれ,データに適用される概念や関係の意味を学習する。本研究は,安全の重要視の検証や,パークランドグレーティングスケールの推定などの作業において,手術映像データから得られた結果を示す。その結果,本手法は複雑なベンチマークの認識と検出を改善し,他の解析的応用も可能となった。

We constantly integrate our knowledge and understanding of the world to enhance our interpretation of what we see. This ability is crucial in application domains which entail reasoning about multiple entities and concepts, such as AI-augmented surgery. In this paper, we propose a novel way of integrating conceptual knowledge into temporal analysis tasks via temporal concept graph networks. In the proposed networks, a global knowledge graph is incorporated into the temporal analysis of surgical instances, learning the meaning of concepts and relations as they apply to the data. We demonstrate our results in surgical video data for tasks such as verification of critical view of safety, as well as estimation of Parkland grading scale. The results show that our method improves the recognition and detection of complex benchmarks as well as enables other analytic applications of interest.

翻訳日:2023-04-27 18:44:58 公開日:2023-04-25

# 高速・高精度で汎用的な圧縮ビデオ品質向上のためのビットストリームメタデータの活用

Leveraging Bitstream Metadata for Fast, Accurate, Generalized Compressed Video Quality Enhancement ( http://arxiv.org/abs/2202.00011v2 )

ライセンス: Link先を確認

Max Ehrlich, Jon Barker, Namitha Padmanabhan, Larry Davis, Andrew Tao, Bryan Catanzaro, Abhinav Shrivastava

(参考訳) ビデオ圧縮は、ソーシャルメディアからビデオ会議まで、現代のインターネットを支える技術の中心的な特徴である。ビデオ圧縮は成熟を続けていますが、多くの圧縮設定では品質の低下が顕著です。これらの設定は、帯域制限や不安定な接続による効率的な動画伝送に重要な応用をもたらす。本研究では,ビデオビットストリームに埋め込まれた構造と動作情報を活用する圧縮ビデオに詳細を復元する深層学習アーキテクチャを開発した。その結果,従来の圧縮補正法と比較して復元精度が向上し,高スループットを実現しつつ,近年のディープラーニングビデオ圧縮法と比較した場合の競合性が示された。さらに、ビットストリームで容易に利用できる量子化データに対して、我々のモデルを条件付けする。これにより、1つのモデルでさまざまな圧縮品質の設定を処理でき、事前作業で複数のモデルが必要になります。

Video compression is a central feature of the modern internet powering technologies from social media to video conferencing. While video compression continues to mature, for many compression settings, quality loss is still noticeable. These settings nevertheless have important applications to the efficient transmission of videos over bandwidth constrained or otherwise unstable connections. In this work, we develop a deep learning architecture capable of restoring detail to compressed videos which leverages the underlying structure and motion information embedded in the video bitstream. We show that this improves restoration accuracy compared to prior compression correction methods and is competitive when compared with recent deep-learning-based video compression methods on rate-distortion while achieving higher throughput. Furthermore, we condition our model on quantization data which is readily available in the bitstream. This allows our single model to handle a variety of different compression quality settings which required an ensemble of models in prior work.

翻訳日:2023-04-27 18:44:24 公開日:2023-04-25

# モデル変換によるフレキシブル微分可能最適化

Flexible Differentiable Optimization via Model Transformations ( http://arxiv.org/abs/2206.06135v2 )

ライセンス: Link先を確認

Akshay Sharma and Mathieu Besançon and Joaquim Dias Garcia and Benoît Legat

(参考訳) DiffOpt.jlは、目的関数や制約に現れる任意のパラメータに関して最適化問題の解を微分するJuliaライブラリである。このライブラリはMathOptInterface上に構築されており、ソルバの豊富なエコシステムを活用でき、JuMPのようなモデリング言語とうまく連携する。DiffOptは前方微分モードと逆微分モードの両方を提供し、ハイパーパラメータ最適化からバックプロパゲーションや感度分析まで複数のユースケースを可能にし、制約付き最適化とエンドツーエンドの微分可能プログラミングを橋渡しする。DiffOptは、二次計画と錐計画の標準形を微分するための2つの既知の規則に基づいている。ただし、モデル変換を通じて微分できるため、ユーザはこれらの形式に限定されず、これらの標準形に再定式化できる任意のモデルのパラメータに関して微分できる。これには特に、アフィン錐制約と凸二次制約または凸二次目的関数を混在させたプログラムが含まれる。

We introduce DiffOpt.jl, a Julia library to differentiate through the solution of optimization problems with respect to arbitrary parameters present in the objective and/or constraints. The library builds upon MathOptInterface, thus leveraging the rich ecosystem of solvers and composing well with modeling languages like JuMP. DiffOpt offers both forward and reverse differentiation modes, enabling multiple use cases from hyperparameter optimization to backpropagation and sensitivity analysis, bridging constrained optimization with end-to-end differentiable programming. DiffOpt is built on two known rules for differentiating quadratic programming and conic programming standard forms. However, thanks to its ability to differentiate through model transformations, the user is not limited to these forms and can differentiate with respect to the parameters of any model that can be reformulated into these standard forms. This notably includes programs mixing affine conic constraints and convex quadratic constraints or objective functions.
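
To make the idea of differentiating a solution with respect to a parameter concrete, here is a minimal Python sketch. It is not DiffOpt.jl's API; the function names and the toy problem are assumptions chosen for illustration. The problem is an equality-constrained quadratic program whose KKT system has a closed-form solution, so the sensitivity of the solution to the linear cost vector `q` can be written down and checked against finite differences.

```python
# Illustrative sketch only: NOT DiffOpt.jl's API. All names are hypothetical.
# Toy problem:  minimize 0.5 * ||x||^2 + q . x   subject to   a . x = b

def solve_qp(q, a, b):
    """Closed-form solution obtained from the KKT conditions of the toy problem."""
    aa = sum(ai * ai for ai in a)
    # Stationarity gives x = -q - nu * a; feasibility fixes nu = -(b + a.q) / (a.a).
    scale = (b + sum(ai * qi for ai, qi in zip(a, q))) / aa  # equals -nu
    return [-qi + scale * ai for qi, ai in zip(q, a)]

def jacobian_wrt_q(a):
    """Sensitivity dx/dq of the solution, from differentiating the closed form."""
    aa = sum(ai * ai for ai in a)
    n = len(a)
    return [[a[i] * a[j] / aa - (1.0 if i == j else 0.0) for j in range(n)]
            for i in range(n)]

q, a, b = [1.0, -2.0], [1.0, 1.0], 1.0
x_star = solve_qp(q, a, b)
J = jacobian_wrt_q(a)
```

For general problems the closed form is unavailable, which is why rules for differentiating the KKT or conic optimality conditions of standard forms are needed.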

翻訳日:2023-04-27 18:36:39 公開日:2023-04-25

# オープンアクセスパブリッシングと関連する要因は何か? springer natureのケーススタディ

Which Factors are associated with Open Access Publishing? A Springer Nature Case Study ( http://arxiv.org/abs/2208.08221v4 )

ライセンス: Link先を確認

Fakhri Momeni, Stefan Dietze, Philipp Mayr, Kristin Biesenbender and Isabella Peters

(参考訳) Open Access (OA)は、記事へのアクセスを容易にする。しかし、著者や資金提供者は、OAの出版に金銭的支援を受けていない著者がOAの記事の引用に関わらないよう、出版費用を支払わなければならないことが多い。 OAは、出版システムにおける既存の不平等を克服するよりも、さらに悪化させる可能性がある。そこで,Springer Natureに掲載された522,411の論文を調査した。相関分析と回帰分析を用いて、異なる所得水準の国に属する著者間の関係、出版モデルの選択、論文の引用効果について述べる。機械学習の分類手法は,出版モデルの予測における特徴の重要性を検討するのに役立った。以上の結果から, APC ウェイバーの著者はゴールドOA 誌に他よりも多く掲載している。対照的に、APC割引を受ける著者は、OA出版物の中で最も低い割合であり、この割引が著者にゴールドOA雑誌に掲載する動機を不十分にしていると仮定する。 oaオプションはハイブリッドジャーナルでは避けられているが,gold-oaジャーナルではジャーナルランクと出版モデルとの間に強い相関関係がみられた。また,OA出版の収入レベル,年長性,経験が,ハイブリッド雑誌におけるOA出版の予測因子であることも示唆した。

Open Access (OA) facilitates access to articles. However, authors or funders often must pay the publishing costs, which prevents authors who do not receive financial support from participating in OA publishing and from benefiting from the citation advantage of OA articles. OA may exacerbate existing inequalities in the publication system rather than overcome them. To investigate this, we studied 522,411 articles published by Springer Nature. Employing correlation and regression analyses, we describe the relationship between authors affiliated with countries from different income levels, their choice of publishing model, and the citation impact of their papers. A machine learning classification method helped us to explore the importance of different features in predicting the publishing model. The results show that authors eligible for APC waivers publish more in gold-OA journals than others. In contrast, authors eligible for an APC discount have the lowest ratio of OA publications, leading to the assumption that this discount insufficiently motivates authors to publish in gold-OA journals. We found a strong correlation between the journal rank and the publishing model in gold-OA journals, whereas the OA option is mostly avoided in hybrid journals. Also, results show that the countries' income level, seniority, and experience with OA publications are the most predictive factors for OA publishing in hybrid journals.

翻訳日:2023-04-27 18:27:58 公開日:2023-04-25

# BSMS-GNNを用いたメッシュ型物理シミュレーションの効率的学習

Efficient Learning of Mesh-Based Physical Simulation with BSMS-GNN ( http://arxiv.org/abs/2210.02573v3 )

ライセンス: Link先を確認

Yadi Cao, Menglei Chai, Minchen Li, Chenfanfu Jiang

(参考訳) フラットなグラフニューラルネットワーク(GNN)とメッセージパッシング(MP)の積み重ねによって大規模メッシュ上の物理シミュレーションを学習することは、ノード数に対する計算量のスケーリングと過平滑化のために難しい。物理シミュレーション向けのGNNにマルチスケール構造を導入することへのコミュニティの関心が高まっている。しかし、現在の最先端手法は、粗いメッシュを手作業で作成する労力のかかる工程に依存するか、空間的近接性に基づいて粗いレベルを構築するため、形状の境界をまたぐ誤ったエッジが生じうるという制約がある。二部グラフ判定に着想を得て、上記の制約に対処する新しいプーリング戦略bi-strideを提案する。bi-strideは幅優先探索(BFS)のフロンティアを1つおきに選んでノードをプーリングするため、粗いメッシュの手作業による作成が不要で、空間的近接性による誤ったエッジも回避できる。さらに、レベルごとに1回のMPを行う方式と、補間によるパラメータを持たないプーリングおよびアンプーリングを可能にし、U-Netに似た構造で計算コストを大幅に削減する。実験の結果、提案するフレームワークBSMS-GNNは、代表的な物理シミュレーションにおいて精度と計算効率の両面で既存手法を大きく上回った。

Learning the physical simulation on large-scale meshes with flat Graph Neural Networks (GNNs) and stacking Message Passings (MPs) is challenging due to the scaling complexity w.r.t. the number of nodes and over-smoothing. There has been growing interest in the community to introduce multi-scale structures to GNNs for physical simulation. However, current state-of-the-art methods are limited by their reliance on the labor-intensive drawing of coarser meshes or building coarser levels based on spatial proximity, which can introduce wrong edges across geometry boundaries. Inspired by the bipartite graph determination, we propose a novel pooling strategy, bi-stride, to tackle the aforementioned limitations. Bi-stride pools nodes on every other frontier of the breadth-first search (BFS), without the need for the manual drawing of coarser meshes and avoiding the wrong edges by spatial proximity. Additionally, it enables a one-MP scheme per level and non-parametrized pooling and unpooling by interpolations, resembling U-Nets, which significantly reduces computational costs. Experiments show that the proposed framework, BSMS-GNN, significantly outperforms existing methods in terms of both accuracy and computational efficiency in representative physical simulations.
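
The pooling rule described above (keep the nodes on every other BFS frontier) can be sketched in a few lines of Python. This is a hedged illustration, not the authors' implementation: the seed choice, the function names, and the way coarse edges are rebuilt (reconnecting kept nodes that were at most two hops apart in the fine graph) are assumptions made here for the example.

```python
from collections import deque

def bfs_depths(adj, seed):
    """Breadth-first search depth of every node reachable from `seed`."""
    depth = {seed: 0}
    queue = deque([seed])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in depth:
                depth[v] = depth[u] + 1
                queue.append(v)
    return depth

def bi_stride_pool(adj, seed=0):
    """Keep nodes on even BFS frontiers; nodes unreachable from `seed` are dropped."""
    depth = bfs_depths(adj, seed)
    kept = sorted(n for n, d in depth.items() if d % 2 == 0)
    kept_set = set(kept)
    coarse = {n: set() for n in kept}
    for u in kept:
        for v in adj[u]:
            if v in kept_set and v != u:      # direct neighbour that was also kept
                coarse[u].add(v)
            for w in adj[v]:                  # two-hop neighbour (assumed reconnection rule)
                if w in kept_set and w != u:
                    coarse[u].add(w)
    return kept, coarse

# Path graph 0-1-2-3-4: the frontiers are {0}, {1}, {2}, {3}, {4}.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
kept_nodes, coarse_adj = bi_stride_pool(path, seed=0)
```

Because the coarse level depends only on graph connectivity, no spatial-proximity test is involved, which is the property the abstract highlights.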

翻訳日:2023-04-27 18:16:55 公開日:2023-04-25

# CRONOS:Wi-Fi CSIを用いたデバイスフリーNLoS人間プレゼンス検出のためのカラー化とコントラスト学習

CRONOS: Colorization and Contrastive Learning for Device-Free NLoS Human Presence Detection using Wi-Fi CSI ( http://arxiv.org/abs/2211.10354v2 )

ライセンス: Link先を確認

Chia-Che Hsieh, An-Hung Hsiao, Li-Hsiang Shen, Kai-Ten Feng

(参考訳) 近年、広く普及するスマートサービスやアプリケーションに対する需要は急速に増加している。センサーやカメラによるデバイスなしの人間検出は広く採用されているが、プライバシーの問題や、動きのない人の誤検知が伴っている。これらの欠点に対処するため、商用Wi-Fiデバイスから取得したチャネル状態情報(CSI)は、正確な検出のための豊富な信号機能を提供する。しかしながら、既存のシステムは、非視線(NLoS)の下での不正確な分類と、部屋の隅に立っているときのような固定的なシナリオに悩まされている。本研究では,動的な再帰プロット(rps)を生成するcronos(colorization and contrastive learning enhanced nlos human presence detection)と呼ばれるシステムを提案する。また、教師付きコントラスト学習を取り入れて実質的な表現を抽出し、コンサルテーション損失を定式化し、動的ケースと定常ケースの代表的な距離を区別する。さらに,rssとカラーコードcsi比のどちらを利用するかを決定するために,自己切り替え型静的特徴拡張分類器(s3fec)を提案する。包括的実験の結果,cronosは,機械学習や非学習型手法,オープン文学における非csi型機能などを適用した既存システムよりも優れていた。 CRONOSは、空白、移動性、視線(LoS)、NLoSシナリオにおいて、最も高い存在検出精度を達成する。

In recent years, the demand for pervasive smart services and applications has increased rapidly. Device-free human detection through sensors or cameras has been widely adopted, but it comes with privacy issues as well as misdetection for motionless people. To address these drawbacks, channel state information (CSI) captured from commercialized Wi-Fi devices provides rich signal features for accurate detection. However, existing systems suffer from inaccurate classification under a non-line-of-sight (NLoS) and stationary scenario, such as when a person is standing still in a room corner. In this work, we propose a system called CRONOS (Colorization and Contrastive Learning Enhanced NLoS Human Presence Detection), which generates dynamic recurrence plots (RPs) and color-coded CSI ratios to distinguish mobile people from vacancy in a room, respectively. We also incorporate supervised contrastive learning to retrieve substantial representations, where consultation loss is formulated to differentiate the representative distances between dynamic and stationary cases. Furthermore, we propose a self-switched static feature enhanced classifier (S3FEC) to determine the utilization of either RPs or color-coded CSI ratios. Our comprehensive experimental results show that CRONOS outperforms existing systems that apply machine learning, non-learning based methods, as well as non-CSI based features in open literature. CRONOS achieves the highest presence detection accuracy in vacancy, mobility, line-of-sight (LoS), and NLoS scenarios.
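
As background for the recurrence plots (RPs) mentioned above, the following Python sketch builds the textbook thresholded recurrence matrix of a scalar time series. How CRONOS constructs its dynamic RPs from CSI is not specified in the abstract, so the scalar input, the threshold `eps`, and the function name are assumptions for illustration only.

```python
def recurrence_plot(series, eps):
    """Binary recurrence matrix: entry (i, j) is 1 when samples i and j are within eps."""
    n = len(series)
    return [[1 if abs(series[i] - series[j]) <= eps else 0 for j in range(n)]
            for i in range(n)]

# A nearly constant stretch followed by a jump produces a block structure.
rp = recurrence_plot([0.0, 0.1, 1.0], eps=0.2)
```

A signal that barely changes yields large uniform blocks, whereas a fluctuating signal yields fine texture, which is what makes such plots usable as image-like input to a classifier.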

翻訳日:2023-04-27 18:09:42 公開日:2023-04-25

# 宇宙デコヒーレンス : 原始パワースペクトルと非ガウス性

Cosmic decoherence: primordial power spectra and non-Gaussianities ( http://arxiv.org/abs/2211.07598v2 )

ライセンス: Link先を確認

Aoumeur Daddi Hammou, Nicola Bartolo

(参考訳) 量子デコヒーレンスがインフレーション宇宙論的摂動に与える影響について検討する。このプロセスは、インフレーションのメカニズムの量子的性質が、インフレーションの変動の量子-古典的遷移の長年の問題と関連していることを示す特定の観察的なサインを印字するかもしれない。いくつかの研究は、原始変動の統計的性質に対する量子デコヒーレンスの影響を調査している。特に、宇宙デコヒーレンスが標準のスローロールインフレーションによって予測される曲率パワースペクトルの補正につながることが示されている。同様に、非ゼロ曲率トリスペクトラムは宇宙デコヒーレンスによって純粋に誘導されることが示されているが、驚くべきことにデコヒーレンスはバイスペクトルを発生しないようである。さらに, ポインターオブザーバブルの一般化形式を採用し, 非消滅曲率双スペクトルをデコヒーレンスが引き起こすことを示し, 具体的な具体的な物理プロセスを提供することにより, 解析をさらに発展させる。原始双スペクトルに関する現在の制約は、環境-システム相互作用の強さに上限を置くことができる。完全な一般性において、デコヒーレンス誘起双スペクトルはスケール依存であり、スケール独立となるパワースペクトルに対応する補正を課す。このような宇宙スケールへのスケール依存は、インフレーション中に起こる量子デコヒーレンス過程の顕著なインプリントを表しているかもしれない。また,宇宙デコヒーレンスが環境の種類とは無関係にスケール独立な補正を誘導する過程を理解するための基準を提供する。最後に,宇宙デコヒーレンスがテンソル摂動に及ぼす影響を考察し,デコヒーレンス補正したテンソル-スカラー摂動比を導出する。特定の場合、デコヒーレンスは標準テンソルパワースペクトルに青い傾いた補正を誘導する。

We study the effect of quantum decoherence on inflationary cosmological perturbations. This process might imprint specific observational signatures revealing the quantum nature of the inflationary mechanism, and it is related to the longstanding issue of the quantum-to-classical transition of inflationary fluctuations. Several works have investigated the effect of quantum decoherence on the statistical properties of primordial fluctuations. In particular, it has been shown that cosmic decoherence leads to corrections to the curvature power spectrum predicted by standard slow-roll inflation. Equally interesting, a nonzero curvature trispectrum has been shown to be purely induced by cosmic decoherence, but surprisingly, decoherence seems not to generate any bispectrum. We further develop such an analysis by adopting a generalized form of the pointer observable, showing that decoherence does induce a non-vanishing curvature bispectrum and providing a specific underlying concrete physical process. Present constraints on primordial bispectra allow one to put an upper bound on the strength of the environment-system interaction. In full generality, the decoherence-induced bispectrum can be scale dependent provided one imposes the corresponding correction to the power spectrum to be scale independent. Such scale dependence on the largest cosmological scales might represent a distinctive imprint of the quantum decoherence process taking place during inflation. We also provide a criterion that allows one to understand when cosmic decoherence induces scale-independent corrections, independently of the type of environment considered. As a final result, we study the effect of cosmic decoherence on tensor perturbations and we derive the decoherence-corrected tensor-to-scalar perturbation ratio. In specific cases, decoherence induces a blue-tilted correction to the standard tensor power spectrum.

翻訳日:2023-04-27 18:08:56 公開日:2023-04-25

# FingerFlex:ECoG信号から指の軌道を推定する

FingerFlex: Inferring Finger Trajectories from ECoG signals ( http://arxiv.org/abs/2211.01960v2 )

ライセンス: Link先を確認

Vladislav Lomtev, Alexander Kovalev, Alexey Timchenko

(参考訳) 運動脳コンピュータインタフェース(BCI)の開発は、ニューラルネットワークの時系列復号アルゴリズムに大きく依存している。ディープラーニングアーキテクチャの最近の進歩により、データ内の高次依存性を近似する自動機能選択が可能になった。本稿では,脳波(ECoG)データに対する指の動き回帰に適応した畳み込みエンコーダデコーダアーキテクチャであるFingerFlexモデルについて述べる。実測軌道と予測軌道の相関係数が最大0.74であるBCIコンペティションIVデータセット4で最先端の性能が達成された。提案手法は,完全機能型高精度皮質運動脳-コンピュータインタフェースを開発する機会を提供する。

Motor brain-computer interface (BCI) development relies critically on neural time series decoding algorithms. Recent advances in deep learning architectures allow for automatic feature selection to approximate higher-order dependencies in data. This article presents the FingerFlex model - a convolutional encoder-decoder architecture adapted for finger movement regression on electrocorticographic (ECoG) brain data. State-of-the-art performance was achieved on a publicly available BCI competition IV dataset 4 with a correlation coefficient between true and predicted trajectories up to 0.74. The presented method provides the opportunity for developing fully-functional high-precision cortical motor brain-computer interfaces.
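
The reported metric is a correlation coefficient between true and predicted finger trajectories. The sketch below assumes this means Pearson's r computed per trajectory; that reading, and the function name, are assumptions rather than details taken from the abstract.

```python
import math

def pearson_r(y_true, y_pred):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)

# A prediction that is an affine function of the target correlates perfectly.
r_same_shape = pearson_r([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
r_reversed = pearson_r([0.0, 1.0, 2.0], [2.0, 1.0, 0.0])
```

Note that this metric is insensitive to scale and offset, so a high correlation indicates that the shape of the movement is tracked, not that the amplitude is exact.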

翻訳日:2023-04-27 18:06:53 公開日:2023-04-25

# BTS: 時間変化するCSI下での屋内二室在室検知のための半教師あり学習における二重教師・生徒(Bifold Teacher-Student)

BTS: Bifold Teacher-Student in Semi-Supervised Learning for Indoor Two-Room Presence Detection Under Time-Varying CSI ( http://arxiv.org/abs/2212.10802v2 )

ライセンス: Link先を確認

Li-Hsiang Shen, Kai-Jui Chen, An-Hung Hsiao, Kai-Ten Feng

(参考訳) 近年,教師付き学習(SL)とチャネル状態情報(CSI)に基づく屋内人間の存在検知が注目されている。しかし、csiの空間情報に依存する既存の研究は、予測精度を低下させる物体移動、大気要因、機械の再起動などの環境変化に影響を受けやすい。さらに、SLベースの手法では、モデルの再トレーニングに時間を要する。したがって、半教師付き学習方式(SSL)を用いて、継続的に監視されるモデルライフサイクルを設計することが不可欠である。本稿では,SSLとラベル付けされていないデータセットを併用した存在検出システムに対して,BTS学習手法を提案する。提案する教師学習ネットワークは,ラベル付きcsiとラベル付きcsiから空間的・時間的特徴をインテリジェントに学習する。さらに、強化されたペナル化損失関数はエントロピーと距離の計測を利用して、漂流したデータ、すなわち時間変化の影響を受け、元の分布から変化した新しいデータセットの特徴を区別する。実験の結果,BTSシステムはラベルのないデータでモデルを再訓練した後,漸近的精度を保っていることがわかった。さらに、ラベルのないBTSは、SLベースの手法の漸近性能を達成しつつ、最大検出精度で既存のSSLベースのモデルより優れている。

In recent years, indoor human presence detection based on supervised learning (SL) and channel state information (CSI) has attracted much attention. However, the existing studies that rely on spatial information of CSI are susceptible to environmental changes, such as object movement, atmospheric factors, and machine rebooting, which degrade prediction accuracy. Moreover, SL-based methods require time-consuming labeling for retraining models. Therefore, it is imperative to design a continuously monitored model life-cycle using a semi-supervised learning (SSL) based scheme. In this paper, we conceive a bifold teacher-student (BTS) learning approach for presence detection systems that combines SSL by utilizing partially labeled and unlabeled datasets. The proposed primal-dual teacher-student network intelligently learns spatial and temporal features from labeled and unlabeled CSI. Additionally, the enhanced penalized loss function leverages entropy and distance measures to distinguish drifted data, i.e., features of new datasets affected by time-varying effects and altered from the original distribution. The experimental results demonstrate that the proposed BTS system sustains asymptotic accuracy after retraining the model with unlabeled data. Furthermore, the label-free BTS outperforms existing SSL-based models in terms of the highest detection accuracy while achieving the asymptotic performance of SL-based methods.

翻訳日:2023-04-27 17:59:35 公開日:2023-04-25

# アンサンブル機械学習における公平性とその構成の理解に向けて

Towards Understanding Fairness and its Composition in Ensemble Machine Learning ( http://arxiv.org/abs/2212.04593v2 )

ライセンス: Link先を確認

Usman Gohar, Sumon Biswas, Hridesh Rajan

(参考訳) 機械学習(ML)ソフトウェアは現代社会において広く採用されており、人種、性別、年齢などに基づく少数派グループに公正な影響が報告されている。近年,MLモデルのアルゴリズムバイアスを計測・緩和する手法が提案されている。既存のアプローチでは、単一分類器ベースのMLモデルに重点を置いている。しかし、現実のMLモデルは複数の独立した学習者(例えばランダムフォレスト)で構成され、フェアネスは非自明な方法で構成される。アンサンブルの公平さはどのように構成されますか。アンサンブルの究極の公平性に対する学習者の公平性の影響はどのようなものか? 公平な学習者は不公平なアンサンブルを生み出すことができるか? さらに、ハイパーパラメータがMLモデルの公平性に影響を与えることが研究によって示されている。アンサンブルハイパーパラメータは、学習者が異なるカテゴリのアンサンブルでどのように結合されるかに影響するため、より複雑である。アンサンブルハイパーパラメータがフェアネスに与える影響を理解することは、プログラマがフェアアンサンブルを設計するのに役立つ。今日では、これらを異なるアンサンブルアルゴリズムについて完全には理解していない。本稿では,バッキング,ブースティング,積み重ね,投票など,現実世界で人気のあるアンサンブルを包括的に研究する。我々は,4つの人気フェアネスデータセットを用いて,Kaggleから収集した168アンサンブルモデルのベンチマークを開発した。私たちはフェアネスの構成を理解するために既存のフェアネスメトリクスを使用します。その結果,アンサンブルは緩和技術を用いることなく,より公平に設計できることがわかった。また,フェアネス構成とデータ特性との相互作用を識別し,フェアアンサンブル設計を導く。最後に、我々のベンチマークはフェアアンサンブルのさらなる研究に活用できる。私たちの知る限りでは、これはまだ文献で提示されていないアンサンブルにおける公正な構成に関する最初のかつ最大の研究の1つである。

Machine Learning (ML) software has been widely adopted in modern society, with reported fairness implications for minority groups based on race, sex, age, etc. Many recent works have proposed methods to measure and mitigate algorithmic bias in ML models. The existing approaches focus on single classifier-based ML models. However, real-world ML models are often composed of multiple independent or dependent learners in an ensemble (e.g., Random Forest), where the fairness composes in a non-trivial way. How does fairness compose in ensembles? What are the fairness impacts of the learners on the ultimate fairness of the ensemble? Can fair learners result in an unfair ensemble? Furthermore, studies have shown that hyperparameters influence the fairness of ML models. Ensemble hyperparameters are more complex since they affect how learners are combined in different categories of ensembles. Understanding the impact of ensemble hyperparameters on fairness will help programmers design fair ensembles. Today, we do not understand these fully for different ensemble algorithms. In this paper, we comprehensively study popular real-world ensembles: bagging, boosting, stacking and voting. We have developed a benchmark of 168 ensemble models collected from Kaggle on four popular fairness datasets. We use existing fairness metrics to understand the composition of fairness. Our results show that ensembles can be designed to be fairer without using mitigation techniques. We also identify the interplay between fairness composition and data characteristics to guide fair ensemble design. Finally, our benchmark can be leveraged for further research on fair ensembles. To the best of our knowledge, this is one of the first and largest studies on fairness composition in ensembles yet presented in the literature.
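
The question "Can fair learners result in an unfair ensemble?" can be illustrated with a toy example. The Python sketch below uses statistical parity difference as the fairness metric and a majority-vote ensemble; the data are constructed by hand for illustration and are not results from the paper, and the function names are hypothetical.

```python
def statistical_parity_difference(preds, groups):
    """Positive-prediction rate of group 0 minus that of group 1 (0 means parity)."""
    rate = {}
    for g in (0, 1):
        members = [p for p, gg in zip(preds, groups) if gg == g]
        rate[g] = sum(members) / len(members)
    return rate[0] - rate[1]

def majority_vote(all_preds):
    """Combine binary predictions of several learners by strict majority."""
    return [1 if 2 * sum(votes) > len(votes) else 0 for votes in zip(*all_preds)]

groups = [0, 0, 0, 0, 1, 1, 1, 1]
learners = [
    [1, 1, 0, 0, 1, 1, 0, 0],
    [0, 1, 1, 0, 1, 1, 0, 0],
    [1, 0, 1, 0, 1, 1, 0, 0],
]
individual_spd = [statistical_parity_difference(p, groups) for p in learners]
ensemble_spd = statistical_parity_difference(majority_vote(learners), groups)
```

Each learner gives both groups the same positive rate, yet the vote does not: in group 0 the learners disagree on who is positive, so more individuals reach a majority. This is one simple way in which fairness can compose non-trivially.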

翻訳日:2023-04-27 17:58:45 公開日:2023-04-25

# ニューラルフーリエフィルタバンク

Neural Fourier Filter Bank ( http://arxiv.org/abs/2212.01735v2 )

ライセンス: Link先を確認

Zhijie Wu and Yuhe Jin and Kwang Moo Yi

(参考訳) 本稿では、効率的かつ高精細な再構成を実現する新しい手法を提案する。ウェーブレットに着想を得て、信号を空間的にも周波数的にも分解するニューラルフィールドを学習する。空間分解については最近のグリッドベースのパラダイムに従うが、既存の研究とは異なり、フーリエ特徴エンコーディングを通じて各グリッドに特定の周波数が格納されるよう促す。次に、正弦(sine)活性化を持つ多層パーセプトロンを適用し、これらのフーリエエンコードされた特徴を適切な層に入力することで、高周波成分を低周波成分の上に順次積み重ね、それらを合計して最終的な出力を形成する。本手法は、2次元画像フィッティング、3次元形状再構成、ニューラル放射場といった複数のタスクにおいて、モデルのコンパクトさと収束速度の点で最先端の手法を上回ることを示す。コードはhttps://github.com/ubc-vision/NFFBで利用可能である。

We present a novel method to provide efficient and highly detailed reconstructions. Inspired by wavelets, we learn a neural field that decomposes the signal both spatially and frequency-wise. We follow the recent grid-based paradigm for spatial decomposition, but unlike existing work, encourage specific frequencies to be stored in each grid via Fourier feature encodings. We then apply a multi-layer perceptron with sine activations, taking these Fourier encoded features in at appropriate layers so that higher-frequency components are accumulated on top of lower-frequency components sequentially, which we sum up to form the final output. We demonstrate that our method outperforms the state of the art regarding model compactness and convergence speed on multiple tasks: 2D image fitting, 3D shape reconstruction, and neural radiance fields. Our code is available at https://github.com/ubc-vision/NFFB.
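
For readers unfamiliar with Fourier feature encodings, here is a minimal Python sketch of the generic idea: a scalar input is mapped to sines and cosines at a ladder of frequencies. The octave schedule `2**k * pi` and the function name are assumptions made for this illustration; the abstract does not state which frequencies the method assigns to each grid.

```python
import math

def fourier_features(x, num_bands):
    """Encode scalar x as [sin(f_0 x), cos(f_0 x), ..., sin(f_K x), cos(f_K x)]."""
    feats = []
    for k in range(num_bands):
        freq = (2.0 ** k) * math.pi   # assumed octave-spaced frequency schedule
        feats.append(math.sin(freq * x))
        feats.append(math.cos(freq * x))
    return feats

encoded = fourier_features(0.25, num_bands=3)
```

Low bands vary slowly with `x` and capture coarse structure, while high bands oscillate quickly and let a downstream network represent fine detail.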

翻訳日:2023-04-27 17:58:05 公開日:2023-04-25

# ランク顕在化QRアルゴリズムを用いたリザバーコンピューティングのための時間シフト選択

Time-shift selection for reservoir computing using a rank-revealing QR algorithm ( http://arxiv.org/abs/2211.17095v3 )

ライセンス: Link先を確認

Joseph D. Hart and Francesco Sorrentino and Thomas L. Carroll

(参考訳) 出力層のみを学習するリカレントニューラルネットワークのパラダイムであるリザバーコンピューティングは、非線形システムの予測や制御といったタスクで顕著な性能を示している。近年、リザバーが生成する信号に時間シフトを加えることで、性能が大きく向上することが実証された。本研究では、ランク顕在化(rank-revealing)QRアルゴリズムを用いてリザバー行列のランクを最大化することにより、時間シフトを選択する手法を提案する。この手法はタスクに依存せず、システムのモデルも必要としないため、アナログハードウェアのリザバーコンピュータに直接適用できる。光電子発振器に基づくリザバーコンピュータと、$\tanh$活性化関数を持つ従来型のリカレントネットワークという2種類のリザバーコンピュータで、時間シフト選択手法を実証する。この手法は、ランダムな時間シフト選択と比べて、ほぼすべてのケースで精度が向上することがわかった。

Reservoir computing, a recurrent neural network paradigm in which only the output layer is trained, has demonstrated remarkable performance on tasks such as prediction and control of nonlinear systems. Recently, it was demonstrated that adding time-shifts to the signals generated by a reservoir can provide large improvements in performance accuracy. In this work, we present a technique to choose the time-shifts by maximizing the rank of the reservoir matrix using a rank-revealing QR algorithm. This technique, which is not task dependent, does not require a model of the system, and therefore is directly applicable to analog hardware reservoir computers. We demonstrate our time-shift selection technique on two types of reservoir computer: one based on an opto-electronic oscillator and the traditional recurrent network with a $\tanh$ activation function. We find that our technique provides improved accuracy over random time-shift selection in essentially all cases.
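
The selection idea can be sketched with a greedy column-pivoted Gram-Schmidt procedure, which picks columns in the same order as the pivots of a column-pivoted QR factorization. This is a simplified stand-in written in plain Python, not the authors' code: the way candidate columns are built from a single signal, the function names, and the tolerance are assumptions for illustration.

```python
import math

def shifted_columns(signal, shifts, length):
    """One candidate column per time-shift: a window of `signal` delayed by that shift."""
    start = max(shifts)
    return [signal[start - s:start - s + length] for s in shifts]

def select_columns_pivoted_qr(columns, k, tol=1e-12):
    """Return indices of up to k columns chosen by greedy column pivoting."""
    residual = [list(c) for c in columns]
    chosen = []
    for _ in range(k):
        norms = [0.0 if j in chosen else math.sqrt(sum(v * v for v in residual[j]))
                 for j in range(len(residual))]
        pivot = max(range(len(residual)), key=lambda j: norms[j])
        if norms[pivot] < tol:        # remaining columns add no new direction
            break
        chosen.append(pivot)
        q = [v / norms[pivot] for v in residual[pivot]]
        for j in range(len(residual)):
            if j in chosen:
                continue
            proj = sum(qv * rv for qv, rv in zip(q, residual[j]))
            residual[j] = [rv - proj * qv for rv, qv in zip(residual[j], q)]
    return chosen

candidates = shifted_columns([0.0, 1.0, 2.0, 3.0, 4.0, 5.0], shifts=[0, 1, 2], length=3)
picked = select_columns_pivoted_qr(candidates, k=3)
```

In this toy case the three shifted windows of a linear ramp span only two dimensions, so the procedure stops after two picks; candidates that are nearly linear combinations of already selected ones are skipped, which is how the selection favors a high-rank set of shifted signals.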

翻訳日:2023-04-27 17:57:28 公開日:2023-04-25

# ボトル内の言語:解釈可能な画像分類のための言語モデルガイド型概念ボトルネック

Language in a Bottle: Language Model Guided Concept Bottlenecks for Interpretable Image Classification ( http://arxiv.org/abs/2211.11158v2 )

ライセンス: Link先を確認

Yue Yang, Artemis Panagopoulou, Shenghao Zhou, Daniel Jin, Chris Callison-Burch, Mark Yatskar

(参考訳) 概念ボトルネックモデル(cbm)は本質的に解釈可能なモデルであり、モデル決定を人間の可読概念に分解する。これにより、モデルが失敗した理由を簡単に理解できるようになる。 CBMは手動で指定した概念を必要とし、しばしばブラックボックスの能力に劣る。まず,ブラックボックスモデルと同様の精度を手作業で指定することなく,高性能なcbmを構築する方法を示す。当社のアプローチであるlanguage guided bottlenecks(labo)は、言語モデルgpt-3を活用して、可能なボトルネックの大きな空間を定義します。問題領域が与えられた場合、LaBoはGPT-3を使用してカテゴリに関する事実文を生成し、候補概念を形成する。 laboは、識別的かつ多様な情報の選択を促進する新しいサブモジュラーユーティリティを通じて、可能なボトルネックを効率的に検索する。最終的に、GPT-3の知覚概念は、CLIPを使用して画像に整列してボトルネック層を形成することができる。実験により、LaBoは視覚認識にとって重要な概念の非常に効果的な事前であることが示された。 11の多様なデータセットによる評価では、LaBoボトルネックは数ショットの分類で優れており、1ショットでのブラックボックス線形プローブよりも11.7%正確で、より多くのデータに匹敵する。全体として、LaBoはブラックボックスアプローチよりも、本質的に解釈可能なモデルが、同じような、あるいはより良いパフォーマンスで広く適用可能であることを示した。

Concept Bottleneck Models (CBM) are inherently interpretable models that factor model decisions into human-readable concepts. They allow people to easily understand why a model is failing, a critical feature for high-stakes applications. CBMs require manually specified concepts and often under-perform their black box counterparts, preventing their broad adoption. We address these shortcomings and are first to show how to construct high-performance CBMs without manual specification of similar accuracy to black box models. Our approach, Language Guided Bottlenecks (LaBo), leverages a language model, GPT-3, to define a large space of possible bottlenecks. Given a problem domain, LaBo uses GPT-3 to produce factual sentences about categories to form candidate concepts. LaBo efficiently searches possible bottlenecks through a novel submodular utility that promotes the selection of discriminative and diverse information. Ultimately, GPT-3's sentential concepts can be aligned to images using CLIP, to form a bottleneck layer. Experiments demonstrate that LaBo is a highly effective prior for concepts important to visual recognition. In the evaluation with 11 diverse datasets, LaBo bottlenecks excel at few-shot classification: they are 11.7% more accurate than black box linear probes at 1 shot and comparable with more data. Overall, LaBo demonstrates that inherently interpretable models can be widely applied at similar, or better, performance than black box approaches.

翻訳日:2023-04-27 17:56:38 公開日:2023-04-25

# 魚眼画像の空間的統合に基づく人物再同定

Spatio-Visual Fusion-Based Person Re-Identification for Overhead Fisheye Images ( http://arxiv.org/abs/2212.11477v2 )

ライセンス: Link先を確認

Mertcan Cokbas, Prakash Ishwar, Janusz Konrad

(参考訳) パーソナライズ再識別(prid)は、様々なシーンをサイドマウントの直線レンズカメラで監視する典型的な監視シナリオで徹底的に研究されている。これまで魚眼カメラを頭上に搭載する手法は提案されておらず、性能に乏しい。この性能ギャップを解消するために,魚眼PRIDのための多機能フレームワークを提案する。魚眼PRIDデータセットであるFRIDAを用いた各種特徴組合せのためのフレームワークの性能評価を行った。提案手法は,近年の外観に基づくディープラーニング手法を約18%,位置ベース手法を約3%,マッチング精度を約3%向上させた。また,提案するpridフレームワークを,屋内の大規模密集した空間で数える人々に適用する可能性を示す。

Person re-identification (PRID) has been thoroughly researched in typical surveillance scenarios where various scenes are monitored by side-mounted, rectilinear-lens cameras. To date, few methods have been proposed for fisheye cameras mounted overhead and their performance is lacking. In order to close this performance gap, we propose a multi-feature framework for fisheye PRID where we combine deep-learning, color-based and location-based features by means of novel feature fusion. We evaluate the performance of our framework for various feature combinations on FRIDA, a public fisheye PRID dataset. The results demonstrate that our multi-feature approach outperforms recent appearance-based deep-learning methods by almost 18% points and location-based methods by almost 3% points in matching accuracy. We also demonstrate the potential application of the proposed PRID framework to people counting in large, crowded indoor spaces.

翻訳日:2023-04-27 17:47:34 公開日:2023-04-25

# サブガウス分布の高速, サンプル効率, アフィン不変プライベート平均と共分散推定

Fast, Sample-Efficient, Affine-Invariant Private Mean and Covariance Estimation for Subgaussian Distributions ( http://arxiv.org/abs/2301.12250v2 )

ライセンス: Link先を確認

Gavin Brown, Samuel B. Hopkins and Adam Smith

(参考訳) ほぼ最適なサンプル複雑性を持つ、高次元の共分散を考慮した平均推定のための高速な差分プライベートアルゴリズムを提案する。これまで、この保証を達成することが知られていたのは指数時間の推定器のみであった。未知の平均$\mu$と共分散$\Sigma$を持つ(サブ)ガウス分布からの$n$個のサンプルが与えられると、我々の$(\varepsilon,\delta)$-差分プライベート推定器は、$n \gtrsim \tfrac d {\alpha^2} + \tfrac{d \sqrt{\log 1/\delta}}{\alpha \varepsilon}+\frac{d\log 1/\delta}{\varepsilon}$である限り、$\|\mu - \tilde{\mu}\|_{\Sigma} \leq \alpha$を満たす$\tilde{\mu}$を出力する。マハラノビス誤差指標$\|\mu - \hat{\mu}\|_{\Sigma}$は、$\hat \mu$と$\mu$の間の距離を$\Sigma$に対して相対的に測るものであり、標本平均の誤差を特徴付ける。我々のアルゴリズムは時間$\tilde{O}(nd^{\omega - 1} + nd/\varepsilon)$で動作し、ここで$\omega < 2.38$は行列乗算指数である。Brown, Gaboardi, Smith, Ullman, Zakynthinou (2021) の指数時間アプローチを応用し、安定な平均推定および共分散推定サブルーチンの効率的な変種を与え、サンプル複雑性も上記のほぼ最適な上界まで改善する。この安定な共分散推定器は、制約のないサブガウス分布に対するプライベート共分散推定に転用できる。$n\gtrsim d^{3/2}$個のサンプルがあれば、推定値はスペクトルノルムで正確である。これは$n= o(d^2)$個のサンプルを用いる最初のアルゴリズムであり、Alabi et al. (2022) が提起した未解決問題に答えるものである。$n\gtrsim d^2$個のサンプルがあれば、推定値はフロベニウスノルムで正確である。これにより、全変動(TV)距離における制約のないガウス分布のプライベート学習のための、高速でほぼ最適なアルゴリズムが得られる。Duchi, Haque, Kuditipudi (2023) も同様の結果を独立かつ同時期に得ている。

We present a fast, differentially private algorithm for high-dimensional covariance-aware mean estimation with nearly optimal sample complexity. Only exponential-time estimators were previously known to achieve this guarantee. Given $n$ samples from a (sub-)Gaussian distribution with unknown mean $\mu$ and covariance $\Sigma$, our $(\varepsilon,\delta)$-differentially private estimator produces $\tilde{\mu}$ such that $\|\mu - \tilde{\mu}\|_{\Sigma} \leq \alpha$ as long as $n \gtrsim \tfrac d {\alpha^2} + \tfrac{d \sqrt{\log 1/\delta}}{\alpha \varepsilon}+\frac{d\log 1/\delta}{\varepsilon}$. The Mahalanobis error metric $\|\mu - \hat{\mu}\|_{\Sigma}$ measures the distance between $\hat \mu$ and $\mu$ relative to $\Sigma$; it characterizes the error of the sample mean. Our algorithm runs in time $\tilde{O}(nd^{\omega - 1} + nd/\varepsilon)$, where $\omega < 2.38$ is the matrix multiplication exponent. We adapt an exponential-time approach of Brown, Gaboardi, Smith, Ullman, and Zakynthinou (2021), giving efficient variants of stable mean and covariance estimation subroutines that also improve the sample complexity to the nearly optimal bound above. Our stable covariance estimator can be turned to private covariance estimation for unrestricted subgaussian distributions. With $n\gtrsim d^{3/2}$ samples, our estimate is accurate in spectral norm. This is the first such algorithm using $n= o(d^2)$ samples, answering an open question posed by Alabi et al. (2022). With $n\gtrsim d^2$ samples, our estimate is accurate in Frobenius norm. This leads to a fast, nearly optimal algorithm for private learning of unrestricted Gaussian distributions in TV distance. Duchi, Haque, and Kuditipudi (2023) obtained similar results independently and concurrently.
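
For reference, the Mahalanobis norm used in the error guarantee can be written out explicitly, together with the short calculation behind the "affine-invariant" label in the title. This is the standard definition, stated here under the assumption that $\Sigma$ is positive definite and $A$ is invertible; the symbols $v$, $A$ and $b$ are introduced only for this note.

```latex
\[
  \|v\|_{\Sigma} \;=\; \bigl\|\Sigma^{-1/2} v\bigr\|_{2} \;=\; \sqrt{v^{\top}\Sigma^{-1} v}.
\]
% Under an affine map x -> A x + b the mean, its estimate and the covariance become
% A\mu + b, A\tilde{\mu} + b and A \Sigma A^{\top}, and the error is unchanged:
\[
  \bigl\|A(\mu - \tilde{\mu})\bigr\|_{A\Sigma A^{\top}}^{2}
  \;=\; (\mu - \tilde{\mu})^{\top} A^{\top}\bigl(A\Sigma A^{\top}\bigr)^{-1} A\,(\mu - \tilde{\mu})
  \;=\; \|\mu - \tilde{\mu}\|_{\Sigma}^{2}.
\]
```

An error bound in this norm therefore does not depend on the units or coordinate system in which the data are expressed.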

翻訳日:2023-04-27 17:40:07 公開日:2023-04-25

# 非線型対流拡散吸着系の効率的なハイブリッドモデリングと吸着モデル発見--系統的科学的機械学習アプローチ

Efficient hybrid modeling and sorption model discovery for non-linear advection-diffusion-sorption systems: A systematic scientific machine learning approach ( http://arxiv.org/abs/2303.13555v3 )

ライセンス: Link先を確認

Vinicius V. Santana, Erbet Costa, Carine M. Rebello, Ana Mafalda Ribeiro, Chris Rackauckas, Idelfonso B. R. Nogueira

(参考訳) 本研究では、非線形移流拡散吸着系における効率的なハイブリッドモデルの構築と吸着取り込みモデルの発見のための体系的な機械学習アプローチを提案する。勾配に基づく最適化手法、随伴感度解析、JITコンパイルされたベクトル・ヤコビアン積を、空間離散化および適応積分器と組み合わせることで、これらの複雑な系を効果的に学習する方法を示す。人工ニューラルネットワーク中の欠落した関数を同定するために、スパース回帰とシンボリック回帰を用いた。提案手法のロバスト性は、固定層吸着のノイズを含む破過曲線観測からなるin silico(計算機上で生成した)データセットで検証され、よく適合したハイブリッドモデルが得られた。本研究は、スパース回帰とシンボリック回帰を用いて吸着取り込み速度論を再構成することに成功し、同定された多項式を用いて破過曲線を精度良く予測した。これは、吸着速度則の構造を発見するための枠組みとしての可能性を示している。

This study presents a systematic machine learning approach for creating efficient hybrid models and discovering sorption uptake models in non-linear advection-diffusion-sorption systems. It demonstrates an effective method to train these complex systems using gradient-based optimizers, adjoint sensitivity analysis, and JIT-compiled vector-Jacobian products, combined with spatial discretization and adaptive integrators. Sparse and symbolic regression were employed to identify missing functions in the artificial neural network. The robustness of the proposed method was tested on an in-silico data set of noisy breakthrough curve observations of fixed-bed adsorption, resulting in a well-fitted hybrid model. The study successfully reconstructed sorption uptake kinetics using sparse and symbolic regression, and accurately predicted breakthrough curves using identified polynomials, highlighting the potential of the proposed framework for discovering sorption kinetic law structures.

翻訳日:2023-04-27 17:31:47 公開日:2023-04-25

# 規制市場:AIガバナンスの未来

Regulatory Markets: The Future of AI Governance ( http://arxiv.org/abs/2304.04914v4 )

ライセンス: Link先を確認

Gillian K. Hadfield, Jack Clark

(参考訳) 人工知能を適切に規制することは、ますます緊急の政策課題である。立法府や規制当局は、公共の要求を法的要件に最善に翻訳するために必要な専門知識を欠いている。産業の自己規制への過度な依存は、民主的要求に責任を負うAIシステムの生産者とユーザを保持することに失敗する。民間規制当局から規制サービスを購入するための規制対象を政府が求める規制市場が提案されている。 ai規制に対するこのアプローチは、指揮統制規制と自己規制の両方の限界を克服する可能性がある。規制市場は、政策立案者の指示された目的を最も達成するための規制方法を開拓する市場力と産業R&Dの努力に頼りながら、AI規制のための政策優先順位を確立することができる。

Appropriately regulating artificial intelligence is an increasingly urgent policy challenge. Legislatures and regulators lack the specialized knowledge required to best translate public demands into legal requirements. Overreliance on industry self-regulation fails to hold producers and users of AI systems accountable to democratic demands. Regulatory markets, in which governments require the targets of regulation to purchase regulatory services from a private regulator, are proposed. This approach to AI regulation could overcome the limitations of both command-and-control regulation and self-regulation. Regulatory markets could enable governments to establish policy priorities for the regulation of AI, whilst relying on market forces and industry R&D efforts to pioneer the methods of regulation that best achieve policymakers' stated objectives.

翻訳日:2023-04-27 17:21:54 公開日:2023-04-25

# 木構造Parzen推定器:アルゴリズム成分の理解と実験性能向上のための役割

Tree-structured Parzen estimator: Understanding its algorithm components and their roles for better empirical performance ( http://arxiv.org/abs/2304.11127v2 )

ライセンス: Link先を確認

Shuhei Watanabe

(参考訳) 多くの領域における最近の進歩は、より複雑な実験設計を必要とする。このような複雑な実験は、しばしばパラメータチューニングを必要とする多くのパラメータを持つ。ベイズ最適化手法であるTPE(Tree-structured Parzen estimator)は,最近のパラメータチューニングフレームワークで広く利用されている。その人気にもかかわらず、制御パラメータとアルゴリズム直観の役割については議論されていない。本チュートリアルでは,多種多様なベンチマークを用いて,各制御パラメータの役割とハイパーパラメータ最適化への影響を明らかにする。アブレーション研究から得られた推奨設定とベースライン手法を比較し,提案設定がTPEの性能を向上させることを示す。 tpeの実装はhttps://github.com/nabenabe0928/tpe/tree/single-optで利用可能です。

Recent advances in many domains require more and more complicated experiment design. Such complicated experiments often have many parameters, which necessitate parameter tuning. Tree-structured Parzen estimator (TPE), a Bayesian optimization method, is widely used in recent parameter tuning frameworks. Despite its popularity, the roles of each control parameter and the algorithm intuition have not been discussed so far. In this tutorial, we will identify the roles of each control parameter and their impacts on hyperparameter optimization using a diverse set of benchmarks. We compare our recommended setting drawn from the ablation study with baseline methods and demonstrate that our recommended setting improves the performance of TPE. Our TPE implementation is available at https://github.com/nabenabe0928/tpe/tree/single-opt.
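
To make the control parameters mentioned in the abstract concrete, here is a minimal one-dimensional sketch of the general TPE idea: split observations at a quantile, fit one density to the better group and one to the rest, and pick the candidate with the highest density ratio. It is not the author's implementation; the function names, the fixed `bandwidth`, and the default values of `gamma` and `n_candidates` are assumptions chosen for illustration.

```python
import math
import random


def kde(points, bandwidth):
    """Return a Gaussian kernel density estimate over 1-D points."""
    norm = len(points) * bandwidth * math.sqrt(2.0 * math.pi)

    def density(x):
        return sum(math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in points) / norm

    return density


def split_observations(observations, gamma):
    """Split (x, loss) pairs into the best gamma fraction and the remainder."""
    ranked = sorted(observations, key=lambda o: o[1])  # lower loss is better
    n_good = max(1, math.ceil(gamma * len(ranked)))
    n_good = min(n_good, len(ranked) - 1)  # keep at least one point in the "bad" group
    good = [x for x, _ in ranked[:n_good]]
    bad = [x for x, _ in ranked[n_good:]]
    return good, bad


def tpe_suggest(observations, low, high, gamma=0.25, n_candidates=24,
                bandwidth=0.5, rng=random):
    """Suggest the next x to evaluate; needs at least two observations."""
    good, bad = split_observations(observations, gamma)
    l, g = kde(good, bandwidth), kde(bad, bandwidth)
    # Sample candidates from the "good" density and clip them to the search bounds.
    candidates = [
        min(high, max(low, rng.gauss(rng.choice(good), bandwidth)))
        for _ in range(n_candidates)
    ]
    # Choose the candidate that maximizes the density ratio l(x) / g(x).
    return max(candidates, key=lambda x: l(x) / (g(x) + 1e-12))
```

In this sketch, `gamma` controls how many observations count as good, while `bandwidth` and `n_candidates` trade exploration against exploitation.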

翻訳日:2023-04-27 17:02:23 公開日:2023-04-25

# 医用画像解析のためのSegment Anything Model--実験的検討

Segment Anything Model for Medical Image Analysis: an Experimental Study ( http://arxiv.org/abs/2304.10517v2 )

ライセンス: Link先を確認

Maciej A. Mazurowski, Haoyu Dong, Hanxue Gu, Jichen Yang, Nicholas Konz, Yixin Zhang

(参考訳) 医用画像のセグメンテーションモデルは、データアノテーションの可用性と取得費用が限られているため、いまだに困難である。 Segment Anything Model (SAM)は10億以上のアノテーションに基づいてトレーニングされた基礎モデルであり、主に自然画像を対象としており、ユーザ定義の関心対象をインタラクティブな方法でセグメント化することを目的としている。自然画像における印象的な性能にもかかわらず、医療画像領域に移行する際にモデルがどのように影響を受けるかは不明だ。本稿では,様々な形態や解剖から11の医用画像データセットを収集し,samの医療画像のセグメント化能力について広範な評価を行った。実験では,対話的セグメンテーションをシミュレートする標準手法を用いて点プロンプトを生成した。実験の結果,1回のプロンプトに基づくSAMのパフォーマンスは,脊椎MRIデータセットの0.1135から股関節X線データセットの0.8650まで,タスクやデータセットによって大きく異なることがわかった。腫瘍のセグメンテーションのような他の多くのシナリオでは、不明瞭なプロンプトと貧弱なプロンプトを持つ、よく知られたオブジェクトを含むタスクのパフォーマンスは高いように見える。複数のプロンプトが提供されると、パフォーマンスがわずかに改善されるだけでなく、オブジェクトが連続していないデータセットも改善される。 RITMと比較すると、SAMは1つのプロンプトに対してより優れた性能を示したが、2つのメソッドの同様の性能はより多くのプロンプトに対して高い性能を示した。ゼロショット学習のセットアップでは、samはいくつかのデータセットで印象的なパフォーマンスを示すが、他のデータセットではパフォーマンスが低かった。 SAMは、モデルとして、そして学習パラダイムとして、医療画像領域に影響を及ぼすかもしれないが、この領域に適応する適切な方法を特定するためには、広範な研究が必要である。

Training segmentation models for medical images continues to be challenging due to the limited availability and acquisition expense of data annotations. Segment Anything Model (SAM) is a foundation model trained on over 1 billion annotations, predominantly for natural images, that is intended to segment the user-defined object of interest in an interactive manner. Despite its impressive performance on natural images, it is unclear how the model is affected when shifting to medical image domains. Here, we perform an extensive evaluation of SAM's ability to segment medical images on a collection of 11 medical imaging datasets from various modalities and anatomies. In our experiments, we generated point prompts using a standard method that simulates interactive segmentation. Experimental results show that SAM's performance based on single prompts varies widely depending on the task and the dataset, ranging from an IoU of 0.1135 for a spine MRI dataset to 0.8650 for a hip x-ray dataset. Performance appears to be high for tasks including well-circumscribed objects with unambiguous prompts and poorer in many other scenarios such as segmentation of tumors. When multiple prompts are provided, performance improves only slightly overall, but more so for datasets where the object is not contiguous. An additional comparison to RITM showed a much better performance of SAM for one prompt but a similar performance of the two methods for a larger number of prompts. We conclude that SAM shows impressive performance for some datasets given the zero-shot learning setup but poor to moderate performance for multiple other datasets. While SAM as a model and as a learning paradigm might be impactful in the medical imaging domain, extensive research is needed to identify the proper ways of adapting it in this domain.
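
The scores quoted above are intersection-over-union (IoU) values. As a rough illustration of the metric only, the sketch below computes IoU for two binary masks given as nested lists; the function name `iou` and the convention of returning 1.0 when both masks are empty are assumptions, not taken from the paper's evaluation code.

```python
def iou(pred, target):
    """Intersection over union of two equally sized binary masks (nested lists of 0/1)."""
    intersection = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            intersection += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    if union == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return intersection / union
```

For example, a predicted mask that covers one of the two target pixels plus one background pixel shares one pixel out of a union of three, giving an IoU of one third.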

翻訳日:2023-04-27 17:01:48 公開日:2023-04-25

# ロボット脳としてのLLM : エゴセントリック記憶と制御の統合

LLM as A Robotic Brain: Unifying Egocentric Memory and Control ( http://arxiv.org/abs/2304.09349v2 )

ライセンス: Link先を確認

Jinjie Mai, Jun Chen, Bing Li, Guocheng Qian, Mohamed Elhoseiny, Bernard Ghanem

(参考訳) embodied aiは、物理的または仮想の体型(つまりロボット)を持ち、環境と動的に相互作用できるインテリジェントなシステムの研究と開発に焦点を当てている。メモリと制御は、具体化されたシステムの2つの重要な部分であり、通常、それぞれをモデル化するために別々のフレームワークが必要です。本稿では,ロボット脳として大規模言語モデルを用いて自己中心記憶と制御を統一する,llm-brainと呼ばれる新しい汎用フレームワークを提案する。 LLM-Brainフレームワークは、ゼロショット学習アプローチを利用して、ロボットタスクのための複数のマルチモーダル言語モデルを統合する。 LLM-Brain内の全てのコンポーネントは、認識、計画、制御、記憶を含む閉ループ多ラウンド対話において自然言語を用いて通信する。システムのコアは、エゴセントリックメモリを維持し、ロボットを制御するための具体化されたllmである。 LLM-Brainは,アクティブ探索と具体的質問応答という,下流の2つの課題を調べることで実証する。アクティブな探索タスクでは、ロボットは限られた数のアクションで未知の環境を広範囲に探索する必要がある。一方、具体的質問応答タスクでは、ロボットが事前探索中に得られた観察に基づいて質問に答える必要がある。

Embodied AI focuses on the study and development of intelligent systems that possess a physical or virtual embodiment (i.e. robots) and are able to dynamically interact with their environment. Memory and control are the two essential parts of an embodied system and usually require separate frameworks to model each of them. In this paper, we propose a novel and generalizable framework called LLM-Brain: using Large-scale Language Model as a robotic brain to unify egocentric memory and control. The LLM-Brain framework integrates multiple multimodal language models for robotic tasks, utilizing a zero-shot learning approach. All components within LLM-Brain communicate using natural language in closed-loop multi-round dialogues that encompass perception, planning, control, and memory. The core of the system is an embodied LLM to maintain egocentric memory and control the robot. We demonstrate LLM-Brain by examining two downstream tasks: active exploration and embodied question answering. The active exploration tasks require the robot to extensively explore an unknown environment within a limited number of actions. Meanwhile, the embodied question answering tasks necessitate that the robot answers questions based on observations acquired during prior explorations.

翻訳日:2023-04-27 16:59:59 公開日:2023-04-25

# 強化学習に基づく制御器に対するモデル抽出攻撃

Model Extraction Attacks Against Reinforcement Learning Based Controllers ( http://arxiv.org/abs/2304.13090v1 )

ライセンス: Link先を確認

Momina Sajid, Yanning Shen, Yasser Shoukry

(参考訳) 本稿では,攻撃者がシステムのフィードバックコントローラを推定(あるいは抽出)しようとするサイバー物理システムにおけるモデル抽出攻撃の問題を紹介する。コントローラの抽出(または推定)は、システムの将来の制御アクションを予測し、それに応じて攻撃を計画できるため、攻撃者に対して一致しないエッジを提供する。したがって、攻撃者がそのような攻撃を行う能力を理解することが重要である。本稿では,Reinforcement Learning (RL)アルゴリズムを用いてディープニューラルネットワーク(DNN)コントローラをトレーニングし,確率的システムを制御する際の設定に焦点を当てる。我々は、そのような未知のDNNコントローラを推定することを目的とした攻撃者の役割を担い、二相アルゴリズムを提案する。オフラインフェーズとも呼ばれる第1フェーズでは、攻撃者はRL-リワード関数とシステムダイナミクスに関するサイドチャネル情報を使用して、未知のDNNの候補推定セットを特定する。オンラインフェーズとも呼ばれる第2フェーズでは、攻撃者は未知のDNNの行動を観察し、これらの観察を使用して最終的なポリシー推定のセットをショートリスト化する。未知のDNNと推定したDNNの誤差を理論的に解析する。また,提案アルゴリズムの有効性を示す数値的な結果も提供する。

We introduce the problem of model-extraction attacks in cyber-physical systems in which an attacker attempts to estimate (or extract) the feedback controller of the system. Extracting (or estimating) the controller provides an unmatched edge to attackers since it allows them to predict the future control actions of the system and plan their attack accordingly. Hence, it is important to understand the ability of the attackers to perform such an attack. In this paper, we focus on the setting when a Deep Neural Network (DNN) controller is trained using Reinforcement Learning (RL) algorithms and is used to control a stochastic system. We play the role of the attacker that aims to estimate such an unknown DNN controller, and we propose a two-phase algorithm. In the first phase, also called the offline phase, the attacker uses side-channel information about the RL-reward function and the system dynamics to identify a set of candidate estimates of the unknown DNN. In the second phase, also called the online phase, the attacker observes the behavior of the unknown DNN and uses these observations to shortlist the set of final policy estimates. We provide theoretical analysis of the error between the unknown DNN and the estimated one. We also provide numerical results showing the effectiveness of the proposed algorithm.

翻訳日:2023-04-27 16:54:33 公開日:2023-04-25

# 目的:自己監督目標が視覚トランスフォーマー表現に与える影響を理解すること

Objectives Matter: Understanding the Impact of Self-Supervised Objectives on Vision Transformer Representations ( http://arxiv.org/abs/2304.13089v1 )

ライセンス: Link先を確認

Shashank Shekhar, Florian Bordes, Pascal Vincent, Ari Morcos

(参考訳) 共同学習(例: simclr, moco, dino)と再構成学習(例: beit, simmim, mae)は視覚トランスフォーマーの自己教師付き学習のための2つの主要なパラダイムであるが、それらは転送性能において大きく異なる。本稿では,これらの目的が学習表現の構造と伝達性に与える影響を分析することにより,これらの違いを説明することを目的とする。分析の結果,リコンストラクションに基づく学習機能は,共同インベディングに基づく学習機能とは大きく異なっており,類似した目的を持ったモデルでは,アーキテクチャ全体でも類似した機能を学習できることが判明した。これらの違いはネットワークの初期に発生し、主に注目層と正規化層によって引き起こされる。異なる目的が異なる情報分布と学習表現の不変性を駆動するため,ジョイントエンベディング特徴は分類のためのより良い線形プローブ移動をもたらすことがわかった。これらの違いは、機能に空間的特異性を必要とする下流タスクの転送性能の逆の傾向を説明する。最後に, 微調整による再構成表現が, より優れた伝達を可能にすること, 微調整による情報再構成が, 事前訓練された関節埋め込みモデルとよりよく似たものになることを示す。

Joint-embedding based learning (e.g., SimCLR, MoCo, DINO) and reconstruction-based learning (e.g., BEiT, SimMIM, MAE) are the two leading paradigms for self-supervised learning of vision transformers, but they differ substantially in their transfer performance. Here, we aim to explain these differences by analyzing the impact of these objectives on the structure and transferability of the learned representations. Our analysis reveals that reconstruction-based learning features are significantly dissimilar to joint-embedding based learning features and that models trained with similar objectives learn similar features even across architectures. These differences arise early in the network and are primarily driven by attention and normalization layers. We find that joint-embedding features yield better linear probe transfer for classification because the different objectives drive different distributions of information and invariances in the learned representation. These differences explain opposite trends in transfer performance for downstream tasks that require spatial specificity in features. Finally, we address how fine-tuning changes reconstructive representations to enable better transfer, showing that fine-tuning re-organizes the information to be more similar to pre-trained joint embedding models.

翻訳日:2023-04-27 16:54:11 公開日:2023-04-25

# 新興技術の組織的ガバナンス - 医療におけるAI導入

Organizational Governance of Emerging Technologies: AI Adoption in Healthcare ( http://arxiv.org/abs/2304.13081v1 )

ライセンス: Link先を確認

Jee Young Kim, William Boag, Freya Gulamali, Alifia Hasan, Henry David Jeffry Hogg, Mark Lifson, Deirdre Mulligan, Manesh Patel, Inioluwa Deborah Raji, Ajai Sehgal, Keo Shaw, Danny Tobey, Alexandra Valladares, David Vidal, Suresh Balu, Mark Sendak

(参考訳) 民間および公共セクターの構造と規範は、新しい技術が実際にどのように使われているかを洗練している。医療分野では、AIの採用が急増しているにもかかわらず、その利用と統合を取り巻く組織ガバナンスはしばしば理解されていない。この研究でHealth AI Partnership(HAIP)が目指すのは、医療設定におけるAIシステムの適切な組織的ガバナンスの要件をより適切に定義し、ヘルスシステムリーダを支援して、AIの採用に関するより詳細な決定を行うことだ。この理解に向けて、私たちはまず、医療におけるAI採用の標準をどのように設計して、簡単かつ効率的に使用できるかを特定する。次に、特定医療システムにおけるAI技術の実践的導入に関わる、正確な決定ポイントを図示する。実際に、米国の主要医療機関のリーダーと関連する分野の重要情報提供者との複数組織的なコラボレーションを通じて、これを達成します。コンサルタントのIDEO.orgを使って、医療やAI倫理の専門家とユーザビリティテストのセッションを行うことができた。ユーザビリティ分析では、組織リーダが技術導入にアプローチする方法に合わせて、モックの重要な決定ポイントを中心に構成されたプロトタイプが明らかになった。同時に,医療関連分野の専門家89人と半構造化インタビューを行った。修正された基盤理論アプローチを使用して、AI導入ライフサイクルを通じて8つの重要な決定ポイントと包括的な手順を特定できた。これは、米国の医療システムによるAI導入に関わる、現在のガバナンス構造とプロセスに関する、最も詳細な定性的な分析の1つである。これらの発見が、医療における新興テクノロジーの安全で効果的で責任ある採用を促進する能力を構築するための将来の取り組みを知らせてくれることを期待している。

Private and public sector structures and norms refine how emerging technology is used in practice. In healthcare, despite a proliferation of AI adoption, the organizational governance surrounding its use and integration is often poorly understood. In this research, the Health AI Partnership (HAIP) aims to better define the requirements for adequate organizational governance of AI systems in healthcare settings and to support health system leaders in making more informed decisions around AI adoption. To work towards this understanding, we first identify how standards for AI adoption in healthcare may be designed to be used easily and efficiently. Then, we map out the precise decision points involved in the practical institutional adoption of AI technology within specific health systems. Practically, we achieve this through a multi-organizational collaboration with leaders from major health systems across the United States and key informants from related fields. Working with the consultancy IDEO.org, we were able to conduct usability-testing sessions with healthcare and AI ethics professionals. Usability analysis revealed a prototype structured around mock key decision points that align with how organizational leaders approach technology adoption. Concurrently, we conducted semi-structured interviews with 89 professionals in healthcare and other relevant fields. Using a modified grounded theory approach, we were able to identify 8 key decision points and comprehensive procedures throughout the AI adoption lifecycle. This is one of the most detailed qualitative analyses to date of the current governance structures and processes involved in AI adoption by health systems in the United States. We hope these findings can inform future efforts to build capabilities to promote the safe, effective, and responsible adoption of emerging technologies in healthcare.

翻訳日:2023-04-27 16:53:46 公開日:2023-04-25

# iMixer:階層型のHopfieldネットワークは、可逆的で暗黙的で反復的なMLP-Mixerを意味する

iMixer: hierarchical Hopfield network implies an invertible, implicit and iterative MLP-Mixer ( http://arxiv.org/abs/2304.13061v1 )

ライセンス: Link先を確認

Toshihiro Ota, Masato Taki

(参考訳) ここ数年、コンピュータビジョンにおけるトランスフォーマーの成功は、MLP-Mixerのようなトランスフォーマーと競合する多くの代替モデルの発見を刺激してきた。弱い誘導バイアスにもかかわらず、これらのモデルはよく研究された畳み込みニューラルネットワークに匹敵する性能を達成した。最近のホップフィールドネットワークの研究は、あるエネルギーベースの連想メモリモデルとトランスフォーマーまたはMLPミクサーの対応を示唆し、トランスフォーマー型アーキテクチャの設計の理論的背景に光を当てた。本稿では,最近導入された階層型ホップフィールドネットワークへの対応を一般化し,新しいMLP-Mixerモデルの一般化であるiMixerを求める。通常のフィードフォワードニューラルネットワークとは異なり、iMixerは出力側から入力側へ前進するMLP層を含んでいる。モジュールを可逆的,暗黙的,反復的混合モジュールの例として特徴づける。画像分類タスクの様々なデータセットを用いてモデル性能を評価し,ベースラインのバニラMLP-Mixerと比較して,iMixerが合理的に改善できることを確認した。その結果、ホップフィールドネットワークとミキサーモデルとの対応は、より広い種類のトランスフォーマーライクなアーキテクチャ設計を理解するための原則であることが示唆された。

In the last few years, the success of Transformers in computer vision has stimulated the discovery of many alternative models that compete with Transformers, such as the MLP-Mixer. Despite their weak inductive bias, these models have achieved performance comparable to well-studied convolutional neural networks. Recent studies on modern Hopfield networks suggest a correspondence between certain energy-based associative memory models and Transformers or MLP-Mixer, and shed some light on the theoretical background of Transformer-type architecture design. In this paper we generalize the correspondence to the recently introduced hierarchical Hopfield network, and find iMixer, a novel generalization of the MLP-Mixer model. Unlike ordinary feedforward neural networks, iMixer involves MLP layers that propagate forward from the output side to the input side. We characterize the module as an example of an invertible, implicit, and iterative mixing module. We evaluate the model performance with various datasets on image classification tasks, and find that iMixer achieves a reasonable improvement over the baseline vanilla MLP-Mixer. The results imply that the correspondence between the Hopfield networks and the Mixer models serves as a principle for understanding a broader class of Transformer-like architecture designs.

翻訳日:2023-04-27 16:53:23 公開日:2023-04-25

# 構造だけで事前学習する : 転移学習による言語的帰納バイアスの理解

Pretrain on just structure: Understanding linguistic inductive biases using transfer learning ( http://arxiv.org/abs/2304.13060v1 )

ライセンス: Link先を確認

Isabel Papadimitriou and Dan Jurafsky

(参考訳) 人間とトランスフォーマーの両方の言語モデルは、明示的な構造的監督なしに言語を学べる。この学習を可能にする帰納的学習バイアスは何か? 本研究では,人工構造データへの事前学習による構造バイアスを伴う言語モデルの提案と,英語の微調整による評価により,異なる帰納的学習バイアスの効果について検討する。実験的なセットアップにより、言語モデルの帰納バイアスを積極的に制御できるようになります。実験では,3種類の帰納バイアスの比較成功について検討した。 1)帰納的階層的処理のための帰納的バイアス 2)文脈自由文法でモデル化できない制約のないトークン分岐依存性に対する帰納的バイアス 3) zipfian power-law vocabulary distribution に対する帰納的バイアス。複雑なトークン-トークン間の相互作用が最高の帰納バイアスを形成し、非文脈自由の場合ではこれが最強であることを示す。また、Zipf の語彙分布は文法構造とは独立に優れた帰納的バイアスを形成することを示す。本研究は,人間では実行できない制御型言語学習実験を行うトランスフォーマーモデルの能力を活用して,人間と機械の両方で言語学習を促進する構造に関する仮説を提示する。

Both humans and transformer language models are able to learn language without explicit structural supervision. What inductive learning biases make this learning possible? In this study, we examine the effect of different inductive learning biases by predisposing language models with structural biases through pretraining on artificial structured data, and then evaluating by fine-tuning on English. Our experimental setup gives us the ability to actively control the inductive bias of language models. With our experiments, we investigate the comparative success of three types of inductive bias: 1) an inductive bias for recursive, hierarchical processing 2) an inductive bias for unrestricted token-token dependencies that can't be modeled by context-free grammars, and 3) an inductive bias for a Zipfian power-law vocabulary distribution. We show that complex token-token interactions form the best inductive biases, and that this is strongest in the non-context-free case. We also show that a Zipfian vocabulary distribution forms a good inductive bias independently from grammatical structure. Our study leverages the capabilities of transformer models to run controlled language learning experiments that are not possible to run in humans, and surfaces hypotheses about the structures that facilitate language learning in both humans and machines.

翻訳日:2023-04-27 16:53:00 公開日:2023-04-25

# 摂動散乱におけるエントロピー成長について

On Entropy Growth in Perturbative Scattering ( http://arxiv.org/abs/2304.13052v1 )

ライセンス: Link先を確認

Clifford Cheung, Temple He, Allic Sivaramakrishnan

(参考訳) 熱力学の第2法則に触発されて,二成分系における生成状態の動的ユニタリ進化によって生じるサブシステムエントロピーの変化を考察する。摂動相互作用における先行次数において、サブシステムの量子$n$-Tsallisエントロピーが決して減少しないことを証明し、サブシステムが等確率状態の統計的混合として初期化されることを条件として、$\Delta S_n \geq 0$ とする。これは任意のインタラクションの選択と補完サブシステムの初期化に対して当てはまる。この初期状態の条件が破られると、サブシステムエントロピーである$\delta s_n < 0$ を減少させる ``maxwell's demon''' プロセスを明示的に構築することができる。注目すべきは、粒子散乱の場合、$n$-Tsallisエントロピーに対応する回路図は、現代の散乱振幅プログラムで現れるオンシェル図と同じであり、$\Delta S_n \geq 0$ は断面の非負性と密接に関連していることである。

Inspired by the second law of thermodynamics, we study the change in subsystem entropy generated by dynamical unitary evolution of a product state in a bipartite system. Working at leading order in perturbative interactions, we prove that the quantum $n$-Tsallis entropy of a subsystem never decreases, $\Delta S_n \geq 0$, provided that subsystem is initialized as a statistical mixture of states of equal probability. This is true for any choice of interactions and any initialization of the complementary subsystem. When this condition on the initial state is violated, it is always possible to explicitly construct a ``Maxwell's demon'' process that decreases the subsystem entropy, $\Delta S_n < 0$. Remarkably, for the case of particle scattering, the circuit diagrams corresponding to $n$-Tsallis entropy are the same as the on-shell diagrams that have appeared in the modern scattering amplitudes program, and $\Delta S_n \geq 0$ is intimately related to the nonnegativity of cross-sections.
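
For readers unfamiliar with the quantity, the sketch below evaluates an $n$-Tsallis entropy for a small density matrix. It assumes the common definition $S_n = (1 - \mathrm{Tr}\,\rho^n)/(n-1)$ for integer $n \geq 2$; the paper's exact normalization may differ, and the helper names are hypothetical.

```python
def matmul(a, b):
    """Multiply two square matrices given as nested lists."""
    size = len(a)
    return [
        [sum(a[i][k] * b[k][j] for k in range(size)) for j in range(size)]
        for i in range(size)
    ]


def tsallis_entropy(rho, n):
    """Return (1 - Tr rho^n) / (n - 1) for an integer n >= 2 (assumed definition)."""
    if n < 2:
        raise ValueError("this sketch only handles integer n >= 2")
    power = rho
    for _ in range(n - 1):
        power = matmul(power, rho)  # builds rho^n by repeated multiplication
    trace = sum(power[i][i] for i in range(len(rho)))
    return (1.0 - complex(trace).real) / (n - 1)
```

Under this definition a pure state has zero entropy, while an equal-probability mixture of $k$ orthogonal states gives $(1 - k^{1-n})/(n-1)$, the kind of initial state for which the abstract states that $\Delta S_n \geq 0$.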

翻訳日:2023-04-27 16:52:40 公開日:2023-04-25

# GULP: SMS通知と機械学習画像処理を備えたソーラーパワーのスマートガベージセグメンテーションビン

GULP: Solar-Powered Smart Garbage Segregation Bins with SMS Notification and Machine Learning Image Processing ( http://arxiv.org/abs/2304.13040v1 )

ライセンス: Link先を確認

Jerome B. Sigongan, Hamer P. Sinodlay, Shahida Xerxy P. Cuizon, Joanna S. Redondo, Maricel G. Macapulay, Charlene O. Bulahan-Undag and Kenn Migan Vincent C. Gumonan

(参考訳) 本研究は, 廃棄物をそれぞれの容器に分離するスマートビンを構築することを目的としている。廃棄物管理プロセスをエンドユーザーにとってより面白くし、スマートビンを降ろす必要があるときにユーティリティスタッフに通知し、再生可能太陽エネルギー源を利用して環境にやさしいスマートビンを奨励する。研究者たちは、チームがワークロードをうまく管理でき、割り当てられた予算内に留まらずに最高の製品を作ることができるため、アジャイル開発アプローチを採用した。 6つの基本的なフェーズは、計画、設計、開発、テスト、リリース、フィードバックです。 iso/iec 25010による全体的な品質テストの結果は肯定的な結果となった。全体の平均は4.55で、これは口頭で素晴らしいと解釈される。さらに、このアプリケーションは太陽エネルギー源と独立して動作することができる。ユーザは、その興味深いメカニズムを通じて廃棄物処理の全過程を楽しんだ。以上の結果から, コンプレッサーは, ごみレベルが最大値に達すると圧縮し, より多くのゴミを収容できる部屋を造ることを推奨した。同時に複数のガベージを判定するアルゴリズムも推奨されている。ソーラーパネルとソーラーパネルを組み合わせることで、スマートビンの再生可能エネルギーを増やすことができる。

This study aims to build a smart bin that segregates solid waste into its respective bins. Its further goals are to make the waste management process more engaging for end users, to notify utility staff when the smart bin needs to be unloaded, and to encourage environmentally friendly operation by using a renewable solar energy source. The researchers employed an Agile development approach because it enables teams to manage their workloads successfully and create the highest-quality product while staying within their allocated budget. The six fundamental phases are planning, design, development, test, release, and feedback. Overall quality testing based on the ISO/IEC 25010 evaluation yielded a positive outcome. The overall average was 4.55, which is verbally interpreted as excellent. Additionally, the application can run independently on its solar energy source. Users were able to enjoy the whole process of waste disposal through its interesting mechanisms. Based on the findings, a compressor is recommended to compress the trash when the trash level reaches its maximum, making room for more garbage. An algorithm that identifies multiple pieces of garbage at a time is also recommended. Adding a solar tracker coupled with the solar panel would help produce more renewable energy for the smart bin.

翻訳日:2023-04-27 16:52:19 公開日:2023-04-25

# Raspberry Piのディープラーニングモデル最適化

Optimizing Deep Learning Models For Raspberry Pi ( http://arxiv.org/abs/2304.13039v1 )

ライセンス: Link先を確認

Salem Ameen and Kangaranmulle Siriwardana and Theo Theodoridis

(参考訳) ディープラーニングモデルは、コンピュータビジョン、自然言語処理、音声認識など、幅広いアプリケーションで広く普及しています。しかし、これらのモデルは通常、大量の計算リソースを必要とするため、raspberry piのような低消費電力デバイスでの実行は困難である。この課題に対処する1つのアプローチは、プルーニング技術を使用してディープラーニングモデルのサイズを減らすことだ。プルーニングは、重要でない重みと接続をモデルから取り除き、より小さく、より効率的なモデルをもたらす。プルーニングはトレーニング中またはモデルがトレーニングされた後に行うことができる。もう1つのアプローチは、特にraspberry piアーキテクチャのためにディープラーニングモデルを最適化することです。これには、モデルのアーキテクチャとパラメータを最適化して、CPUやGPUなどのRaspberry Piのハードウェア機能を活用することが含まれる。さらに、モデルに必要な計算量を最小化することで、エネルギー効率に最適化することができる。 raspberry pi用のディープラーニングモデルのプルーニングと最適化は、低消費電力デバイスの計算とエネルギーの制約を克服する上で有効であり、幅広いデバイスでディープラーニングモデルを実行できる。以下の節では、これらのアプローチをさらに詳細に検討し、Raspberry Piのディープラーニングモデルを最適化する効果について論じる。

Deep learning models have become increasingly popular for a wide range of applications, including computer vision, natural language processing, and speech recognition. However, these models typically require large amounts of computational resources, making them challenging to run on low-power devices such as the Raspberry Pi. One approach to addressing this challenge is to use pruning techniques to reduce the size of the deep learning models. Pruning involves removing unimportant weights and connections from the model, resulting in a smaller and more efficient model. Pruning can be done during training or after the model has been trained. Another approach is to optimize the deep learning models specifically for the Raspberry Pi architecture. This can include optimizing the model's architecture and parameters to take advantage of the Raspberry Pi's hardware capabilities, such as its CPU and GPU. Additionally, the model can be optimized for energy efficiency by minimizing the amount of computation required. Pruning and optimizing deep learning models for the Raspberry Pi can help overcome the computational and energy constraints of low-power devices, making it possible to run deep learning models on a wider range of devices. In the following sections, we will explore these approaches in more detail and discuss their effectiveness for optimizing deep learning models for the Raspberry Pi.
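
The pruning step described above can be sketched as simple magnitude pruning of a weight matrix. This is a minimal illustration under assumptions: the abstract does not specify a pruning criterion, so removing the smallest-magnitude weights is a swapped-in, commonly used choice, and `magnitude_prune` is a hypothetical helper rather than an API of any framework.

```python
def magnitude_prune(weights, sparsity):
    """Zero out roughly the `sparsity` fraction of weights with the smallest magnitude."""
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must lie in [0, 1]")
    magnitudes = sorted(abs(w) for row in weights for w in row)
    n_prune = int(sparsity * len(magnitudes))
    if n_prune == 0:
        return [row[:] for row in weights]  # nothing to prune, return a copy
    threshold = magnitudes[n_prune - 1]
    # Weights tied with the threshold are pruned too, so ties can exceed the target.
    return [[0.0 if abs(w) <= threshold else w for w in row] for row in weights]
```

Pruned weights only save memory and computation on the device if the runtime stores or executes the resulting matrix in a sparse-aware way, which is outside the scope of this sketch.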

翻訳日:2023-04-27 16:51:59 公開日:2023-04-25

# 拡散確率モデルに基づく高精度・高自由度メタ表面逆設計

Diffusion Probabilistic Model Based Accurate and High-Degree-of-Freedom Metasurface Inverse Design ( http://arxiv.org/abs/2304.13038v1 )

ライセンス: Link先を確認

Zezhou Zhang, Chuanchuan Yang, Yifeng Qin, Hao Feng, Jiqiang Feng, Hongbin Li

(参考訳) 従来のメタ原子設計は、全波シミュレーションを用いた研究者の事前知識と試行錯誤検索に重きを置き、結果として時間の消費と非効率なプロセスを生み出す。進化アルゴリズムやトポロジカル最適化といった最適化アルゴリズムに基づく逆設計法がメタマテリアルの設計に導入されている。しかし、これらのアルゴリズムはいずれも多目的タスクを満足するほど一般的ではない。近年, メタマテリアルの逆設計にGAN(Generative Adversarial Networks)で表される深層学習法が適用されており, Sパラメータ要求に基づいて, 直接的に自由度の高いメタ原子を生成することができる。しかし、gansの敵対的な訓練プロセスはネットワークを不安定にさせ、高いモデリングコストをもたらす。本稿では拡散確率理論に基づく新しいメタマテリアル逆設計法を提案する。元の構造をガウス分布に変換するマルコフ過程を学習することにより、ガウス分布から徐々にノイズを除去し、Sパラメータ条件を満たす新しい高次自由度メタ原子を生成することができる。モデル収束速度, 生成精度, 品質の観点から, 提案手法はGANの代表的な手法よりも優れていることが実証された。

Conventional meta-atom designs rely heavily on researchers' prior knowledge and trial-and-error searches using full-wave simulations, resulting in time-consuming and inefficient processes. Inverse design methods based on optimization algorithms, such as evolutionary algorithms, and topological optimizations, have been introduced to design metamaterials. However, none of these algorithms are general enough to fulfill multi-objective tasks. Recently, deep learning methods represented by Generative Adversarial Networks (GANs) have been applied to inverse design of metamaterials, which can directly generate high-degree-of-freedom meta-atoms based on S-parameter requirements. However, the adversarial training process of GANs makes the network unstable and results in high modeling costs. This paper proposes a novel metamaterial inverse design method based on the diffusion probability theory. By learning the Markov process that transforms the original structure into a Gaussian distribution, the proposed method can gradually remove the noise starting from the Gaussian distribution and generate new high-degree-of-freedom meta-atoms that meet S-parameter conditions, which avoids the model instability introduced by the adversarial training process of GANs and ensures more accurate and high-quality generation results. Experiments have proven that our method is superior to representative methods of GANs in terms of model convergence speed, generation accuracy, and quality.
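
The Markov process that turns a structure into Gaussian noise can be sketched with the forward step commonly used in denoising diffusion models, $x_t = \sqrt{\bar{\alpha}_t}\,x_0 + \sqrt{1-\bar{\alpha}_t}\,\epsilon$. The linear noise schedule, its endpoint values, and all function names below are assumptions for illustration; the abstract does not state the schedule or the network used for the reverse (denoising) process.

```python
import math
import random


def linear_betas(steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances (an assumed, commonly used schedule)."""
    if steps == 1:
        return [beta_start]
    return [beta_start + (beta_end - beta_start) * t / (steps - 1) for t in range(steps)]


def cumulative_alphas(betas):
    """Running product of (1 - beta_t), often written alpha-bar_t."""
    products, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        products.append(running)
    return products


def q_sample(x0, t, alpha_bars, rng=random):
    """Draw x_t given x_0 in closed form by mixing the signal with Gaussian noise."""
    a = alpha_bars[t]
    return [
        math.sqrt(a) * x + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0)
        for x in x0
    ]
```

As the cumulative product approaches zero, the sample approaches pure Gaussian noise; generation runs the learned reverse process from that noise, conditioned here on the S-parameter requirements.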

翻訳日:2023-04-27 16:51:40 公開日:2023-04-25

# VeML:大規模高次元データのためのエンドツーエンド機械学習ライフサイクル

VeML: An End-to-End Machine Learning Lifecycle for Large-scale and High-dimensional Data ( http://arxiv.org/abs/2304.13037v1 )

ライセンス: Link先を確認

Van-Duc Le

(参考訳) エンドツーエンドの機械学習(ML)ライフサイクルは、データ準備やMLモデル設計からモデルトレーニング、そして推論のためのトレーニングされたモデルのデプロイに至るまで、多くの反復プロセスで構成されている。 ML問題のためのエンドツーエンドライフサイクルを構築する場合、多くのMLパイプラインを設計して実行し、多数のライフサイクルバージョンを生成する必要がある。そこで本稿では,エンドツーエンドMLライフサイクル専用のバージョン管理システムであるVeMLを紹介する。我々のシステムは、他のシステムが解決していないいくつかの重要な問題に取り組む。まず、特に大規模かつ高次元のデータセットにおいて、MLライフサイクルを構築するための高コストに対処する。我々は、システム内で管理されている類似データセットのライフサイクルを、新しいトレーニングデータに転送することで、この問題を解決する。大規模・高次元データの類似性を効率的に計算するためのコアセットに基づくアルゴリズムを設計する。もうひとつの重要な問題は、トレーニングデータとML寿命中のテストデータの違いによるモデルの精度低下であり、リカバリにつながる。このシステムは、テストデータからラベル付きデータを取得し、新しいデータバージョンのmlライフサイクルを再構築することなく、このミスマッチを検出するのに役立ちます。本研究は,運転画像と時空間センサデータを用いた実世界の大規模データセット実験を行い,有望な結果を示す。

An end-to-end machine learning (ML) lifecycle consists of many iterative processes, from data preparation and ML model design to model training and then deploying the trained model for inference. When building an end-to-end lifecycle for an ML problem, many ML pipelines must be designed and executed, producing a huge number of lifecycle versions. Therefore, this paper introduces VeML, a Version management system dedicated to end-to-end ML Lifecycle. Our system tackles several crucial problems that other systems have not solved. First, we address the high cost of building an ML lifecycle, especially for large-scale and high-dimensional datasets. We solve this problem by proposing to transfer the lifecycle of similar datasets managed in our system to the new training data. We design an algorithm based on the core set to compute similarity for large-scale, high-dimensional data efficiently. Another critical issue is model accuracy degradation caused by the difference between training data and testing data during the ML lifetime, which forces a lifecycle rebuild. Our system helps detect this mismatch without requiring labels for the testing data and rebuilds the ML lifecycle for a new data version. To demonstrate our contributions, we conduct experiments on real-world, large-scale datasets of driving images and spatiotemporal sensor data and show promising results.

翻訳日:2023-04-27 16:51:17 公開日:2023-04-25

# 離散化すべき時:離散化連続問題におけるアルゴリズムの性能解析

When to be Discrete: Analyzing Algorithm Performance on Discretized Continuous Problems ( http://arxiv.org/abs/2304.13117v1 )

ライセンス: Link先を確認

André Thomaser and Jacob de Nobel and Diederick Vermetten and Furong Ye and Thomas Bäck and Anna V. Kononova

The domain of an optimization problem is seen as one of its most important characteristics. In particular, the distinction between continuous and discrete optimization is rather impactful. Based on this, the optimizing algorithm, analyzing method, and more are specified. However, in practice, no problem is ever truly continuous. Whether this is caused by computing limits or more tangible properties of the problem, most variables have a finite resolution. In this work, we use the notion of the resolution of continuous variables to discretize problems from the continuous domain. We explore how the resolution impacts the performance of continuous optimization algorithms. Through a mapping to integer space, we are able to compare these continuous optimizers to discrete algorithms on the exact same problems. We show that the standard $(\mu_W, \lambda)$-CMA-ES fails when discretization is added to the problem.
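As a minimal sketch of the resolution idea described in this abstract, and not the authors' benchmark setup, the snippet below snaps each continuous variable to a grid before the objective is evaluated and also maps points to integer grid indices. The sphere objective, the helper names, and the resolution value are illustrative assumptions.

```python
def discretize(x, resolution):
    """Snap each coordinate to the nearest multiple of `resolution`."""
    return [round(v / resolution) * resolution for v in x]


def to_integers(x, resolution):
    """Map a point to integer grid indices, usable by a discrete optimizer."""
    return [round(v / resolution) for v in x]


def sphere(x):
    """Simple continuous test objective (an assumption, not from the paper)."""
    return sum(v * v for v in x)


def discretized_objective(f, resolution):
    """Wrap f so that it is only ever evaluated on grid points."""
    return lambda x: f(discretize(x, resolution))


# Hypothetical resolution of 0.5 per variable.
f_coarse = discretized_objective(sphere, 0.5)
```

With this wrapper the objective becomes piecewise constant: every point inside the same grid cell receives the same value, so a continuous optimizer sees plateaus where the original function had a slope.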

Translated: 2023-04-27 16:44:29 Published: 2023-04-25

# AVFace: Towards Detailed Audio-Visual 4D Face Reconstruction ( http://arxiv.org/abs/2304.13115v1 )

License: see the linked page

Aggelina Chatziagapi, Dimitris Samaras

In this work, we present a multimodal solution to the problem of 4D face reconstruction from monocular videos. 3D face reconstruction from 2D images is an under-constrained problem due to the ambiguity of depth. State-of-the-art methods try to solve this problem by leveraging visual information from a single image or video, whereas 3D mesh animation approaches rely more on audio. However, in most cases (e.g. AR/VR applications), videos include both visual and speech information. We propose AVFace, which incorporates both modalities and accurately reconstructs the 4D facial and lip motion of any speaker, without requiring any 3D ground truth for training. A coarse stage estimates the per-frame parameters of a 3D morphable model, followed by a lip refinement, and then a fine stage recovers facial geometric details. Because transformer-based modules capture temporal audio and video information, our method is robust in cases where either modality is insufficient (e.g. face occlusions). Extensive qualitative and quantitative evaluation demonstrates the superiority of our method over the current state-of-the-art.

Translated: 2023-04-27 16:44:19 Published: 2023-04-25

# BO-ICP: Initialization of Iterative Closest Point Based on Bayesian Optimization ( http://arxiv.org/abs/2304.13114v1 )

License: see the linked page

Harel Biggie, Andrew Beathard, Christoffer Heckman

Typical algorithms for point cloud registration such as Iterative Closest Point (ICP) require a favorable initial transform estimate between two point clouds in order to perform a successful registration. State-of-the-art methods for choosing this starting condition rely on stochastic sampling or global optimization techniques such as branch and bound. In this work, we present a new method based on Bayesian optimization for finding the critical initial ICP transform. We provide three different configurations for our method, which highlight the versatility of the algorithm both to find rapid results and to refine them in situations where more runtime is available, such as offline map building. Experiments are run on popular data sets and we show that our approach outperforms state-of-the-art methods when given similar computation time. Furthermore, it is compatible with other improvements to ICP, as it focuses solely on the selection of an initial transform, a starting point for all ICP-based methods.

Translated: 2023-04-27 16:44:00 Published: 2023-04-25

# Federated Deep Reinforcement Learning for THz-Beam Search with Limited CSI ( http://arxiv.org/abs/2304.13109v1 )

License: see the linked page

Po-Chun Hsu, Li-Hsiang Shen, Chun-Hung Liu, and Kai-Ten Feng

Terahertz (THz) communication with ultra-wide available spectrum is a promising technique that can meet the stringent requirement of high data rate in next-generation wireless networks, yet its severe propagation attenuation significantly hinders its implementation in practice. Finding beam directions for a large-scale antenna array to effectively overcome the severe propagation attenuation of THz signals is a pressing need. This paper proposes a novel approach of federated deep reinforcement learning (FDRL) to swiftly perform THz-beam search for multiple base stations (BSs) coordinated by an edge server in a cellular network. All the BSs conduct deep deterministic policy gradient (DDPG)-based DRL to obtain a THz beamforming policy with limited channel state information (CSI). They update their DDPG models with hidden information in order to mitigate inter-cell interference. We demonstrate that the cellular network can achieve higher throughput as more THz CSI and hidden neurons of DDPG are adopted. We also show that FDRL with partial model update achieves nearly the same performance as FDRL with full model update, which indicates that partial model uploading is an effective means to reduce the communication load between the edge server and the BSs. Moreover, the proposed FDRL outperforms conventional non-learning-based and existing non-FDRL benchmark optimization methods.

Translated: 2023-04-27 16:43:46 Published: 2023-04-25

# Time-Selective RNN for Device-Free Multi-Room Human Presence Detection Using WiFi CSI ( http://arxiv.org/abs/2304.13107v1 )

License: see the linked page

Fang-Yu Chu, Li-Hsiang Shen, An-Hung Hsiao, Kai-Ten Feng

Human presence detection is a crucial technology for various applications, including home automation, security, and healthcare. While camera-based systems have traditionally been used for this purpose, they raise privacy concerns. To address this issue, recent research has explored the use of channel state information (CSI), which can be extracted from commercial WiFi access points (APs) and provides detailed channel characteristics. In this thesis, we propose a device-free human presence detection system for multi-room scenarios using a time-selective conditional dual feature extract recurrent Network (TCD-FERN). Our system is designed to capture significant time features conditioned on current human features, using a dynamic and static (DaS) data preprocessing technique to extract moving and spatial features of people and to differentiate between line-of-sight (LoS) path blocking and non-blocking cases. To mitigate the feature attenuation problem caused by room partitions, we employ a voting scheme. We conduct evaluation and real-time experiments to demonstrate that our proposed TCD-FERN system can achieve human presence detection for multi-room scenarios using fewer commodity WiFi APs.

Translated: 2023-04-27 16:43:27 Published: 2023-04-25

# Attention-Enhanced Deep Learning for Device-Free Through-the-Wall Presence Detection Using Indoor WiFi System ( http://arxiv.org/abs/2304.13105v1 )

License: see the linked page

Li-Hsiang Shen, Kuan-I Lu, An-Hung Hsiao and Kai-Ten Feng

Accurate detection of human presence in indoor environments is important for various applications, such as energy management and security. In this paper, we propose a novel system for human presence detection using the channel state information (CSI) of WiFi signals. Our system, named attention-enhanced deep learning for presence detection (ALPD), employs an attention mechanism to automatically select informative subcarriers from the CSI data and a bidirectional long short-term memory (LSTM) network to capture temporal dependencies in CSI. Additionally, we utilize a static feature to improve the accuracy of human presence detection in static states. We evaluate the proposed ALPD system by deploying a pair of WiFi access points (APs) to collect a CSI dataset, and compare it with several benchmarks. The results demonstrate that our ALPD system outperforms the benchmarks in terms of accuracy, especially in the presence of interference. Moreover, bidirectional transmission data benefits training, improving stability and accuracy and reducing the cost of data collection. Overall, our proposed ALPD system shows promising results for human presence detection using WiFi CSI signals.

Translated: 2023-04-27 16:43:06 Published: 2023-04-25

# LSTM-based Load Forecasting Robustness Against Noise Injection Attack in Microgrid ( http://arxiv.org/abs/2304.13104v1 )

License: see the linked page

Amirhossein Nazeri and Pierluigi Pisu

In this paper, we investigate the robustness of an LSTM neural network against noise injection attacks for electric load forecasting in an ideal microgrid. The performance of the LSTM model is investigated under a black-box Gaussian noise attack with different SNRs. It is assumed that attackers have access only to the input data of the LSTM model. The results show that the noise attack affects the performance of the LSTM model. The mean absolute error (MAE) of the load prediction is 0.047 MW for a healthy prediction, while this value increases up to 0.097 MW for a Gaussian noise insertion with SNR = 6 dB. To robustify the LSTM model against noise attacks, a low-pass filter with optimal cut-off frequency is applied at the model's input to remove the injected noise. The filter performs better for noise with lower SNR and is less promising for small noise.
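To make the quantities in this abstract concrete, here is a hedged sketch of Gaussian noise injection at a target SNR, the MAE metric, and a simple low-pass step. It operates on a synthetic load curve rather than the paper's data, there is no LSTM involved, and the moving-average filter merely stands in for the paper's low-pass filter, whose type and cut-off are not given in the abstract. SNR is taken here as total signal power over noise power, which is also an assumption.

```python
import math
import random


def add_gaussian_noise(signal, snr_db, rng):
    """Add zero-mean Gaussian noise so that signal power / noise power matches snr_db."""
    signal_power = sum(s * s for s in signal) / len(signal)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [s + rng.gauss(0.0, sigma) for s in signal]


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between two equally long sequences."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)


def moving_average(signal, window):
    """Very simple low-pass filter: centered moving average, shorter windows at the edges."""
    half = window // 2
    out = []
    for i in range(len(signal)):
        lo, hi = max(0, i - half), min(len(signal), i + half + 1)
        out.append(sum(signal[lo:hi]) / (hi - lo))
    return out


rng = random.Random(0)
# Synthetic load curve with a slow periodic component (illustrative only).
load = [1.0 + 0.5 * math.sin(2 * math.pi * t / 96) for t in range(960)]
noisy = add_gaussian_noise(load, snr_db=6, rng=rng)
filtered = moving_average(noisy, window=9)
```

Comparing `mean_absolute_error(load, noisy)` with `mean_absolute_error(load, filtered)` shows how much of the injected noise the smoothing step removes before the data would reach a forecasting model.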

Translated: 2023-04-27 16:42:43 Published: 2023-04-25

# HyMo: Vulnerability Detection in Smart Contracts using a Novel Multi-Modal Hybrid Model ( http://arxiv.org/abs/2304.13103v1 )

License: see the linked page

Mohammad Khodadadi, Jafar Tahmoresnezhad (1) ((1) Department of IT & Computer Engineering, Urmia University of Technology, Orūmīyeh, Iran)

With blockchain technology progressing rapidly, smart contracts have become a common tool in a number of industries including finance, healthcare, insurance and gaming. The number of smart contracts has multiplied, and at the same time, the security of smart contracts has drawn considerable attention due to the monetary losses brought on by smart contract vulnerabilities. Existing analysis techniques are capable of identifying a large number of smart contract security flaws, but they rely too much on rigid criteria established by specialists, and the detection process takes much longer as the complexity of the smart contract rises. In this paper, we propose HyMo, a multi-modal hybrid deep learning model that combines several input representations with the FastText word embedding technique, which represents each word as a bag of character n-grams, and the BiGRU deep learning technique, a sequence processing model that consists of two GRUs, to achieve higher accuracy in smart contract vulnerability detection. The model gathers features using various deep learning models to identify smart contract vulnerabilities. Through a series of studies on currently publicly accessible datasets such as ScrawlD, we show that our hybrid HyMo model has excellent smart contract vulnerability detection performance. HyMo therefore detects smart contract vulnerabilities better than other approaches.

Translated: 2023-04-27 16:42:20 Published: 2023-04-25

# Uncovering the Representation of Spiking Neural Networks Trained with Surrogate Gradient ( http://arxiv.org/abs/2304.13098v1 )

License: see the linked page

Yuhang Li, Youngeun Kim, Hyoungseob Park, Priyadarshini Panda

Spiking Neural Networks (SNNs) are recognized as a candidate for next-generation neural networks due to their bio-plausibility and energy efficiency. Recently, researchers have demonstrated that SNNs are able to achieve nearly state-of-the-art performance in image recognition tasks using surrogate gradient training. However, some essential questions pertaining to SNNs remain little studied: Do SNNs trained with surrogate gradient learn different representations from traditional Artificial Neural Networks (ANNs)? Does the time dimension in SNNs provide unique representation power? In this paper, we aim to answer these questions by conducting a representation similarity analysis between SNNs and ANNs using Centered Kernel Alignment (CKA). We start by analyzing the spatial dimension of the networks, including both the width and the depth. Furthermore, our analysis of residual connections shows that SNNs learn a periodic pattern, which rectifies the representations in SNNs to be ANN-like. We additionally investigate the effect of the time dimension on SNN representation, finding that deeper layers encourage more dynamics along the time dimension. We also investigate the impact of input data such as event-stream data and adversarial attacks. Our work uncovers a host of new findings of representations in SNNs. We hope this work will inspire future research to fully comprehend the representation power of SNNs. Code is released at https://github.com/Intelligent-Computing-Lab-Yale/SNNCKA.
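For readers unfamiliar with the similarity measure named in this abstract, the following is a small sketch of linear CKA in plain Python. It is not taken from the linked repository; whether the authors use this exact estimator (linear kernel, full batch) is an assumption, and the matrices are toy activations with rows as examples and columns as features.

```python
import math


def center_columns(m):
    """Subtract the column mean from every column of a row-major matrix."""
    n = len(m)
    means = [sum(col) / n for col in zip(*m)]
    return [[v - mu for v, mu in zip(row, means)] for row in m]


def cross_gram_sq_norm(x, y):
    """Squared Frobenius norm of x^T y for matrices with the same number of rows."""
    total = 0.0
    for cx in zip(*x):
        for cy in zip(*y):
            dot = sum(a * b for a, b in zip(cx, cy))
            total += dot * dot
    return total


def linear_cka(x, y):
    """Linear CKA: ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F) on centered data."""
    xc, yc = center_columns(x), center_columns(y)
    return cross_gram_sq_norm(xc, yc) / math.sqrt(
        cross_gram_sq_norm(xc, xc) * cross_gram_sq_norm(yc, yc)
    )


# Toy activations from two hypothetical layers evaluated on the same four inputs.
layer_a = [[1.0, 2.0], [3.0, 1.0], [0.0, 4.0], [2.0, 2.0]]
layer_b = [[-2.0, 1.0], [-1.0, 3.0], [-4.0, 0.0], [-2.0, 2.0]]  # layer_a rotated by 90 degrees
```

Linear CKA is invariant to orthogonal transformations and isotropic scaling of either representation, which is why `layer_a` and its rotated copy `layer_b` count as the same representation under this measure.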

Translated: 2023-04-27 16:41:43 Published: 2023-04-25

# Making Video Quality Assessment Models Robust to Bit Depth ( http://arxiv.org/abs/2304.13092v1 )

License: see the linked page

Joshua P. Ebenezer, Zaixi Shang, Yongjun Wu, Hai Wei, Sriram Sethuraman and Alan C. Bovik

We introduce a novel feature set, which we call HDRMAX features, that, when included in Video Quality Assessment (VQA) algorithms designed for Standard Dynamic Range (SDR) videos, sensitizes them to distortions of High Dynamic Range (HDR) videos that are inadequately accounted for by these algorithms. While these features are not specific to HDR, and also augment the quality prediction performances of VQA models on SDR content, they are especially effective on HDR. HDRMAX features modify powerful priors drawn from Natural Video Statistics (NVS) models by enhancing their measurability where they visually impact the brightest and darkest local portions of videos, thereby capturing distortions that are often poorly accounted for by existing VQA models. As a demonstration of the efficacy of our approach, we show that, while current state-of-the-art VQA models perform poorly on 10-bit HDR databases, their performances are greatly improved by the inclusion of HDRMAX features when tested on HDR and 10-bit distorted videos.

Translated: 2023-04-27 16:41:07 Published: 2023-04-25

# Exact recovery for the non-uniform Hypergraph Stochastic Block Model ( http://arxiv.org/abs/2304.13139v1 )

License: see the linked page

Ioana Dumitriu, Haixiao Wang

Consider the community detection problem in random hypergraphs under the non-uniform hypergraph stochastic block model (HSBM), where each hyperedge appears independently with some given probability depending only on the labels of its vertices. We establish, for the first time in the literature, a sharp threshold for exact recovery under this non-uniform case, subject to minor constraints; in particular, we consider the model with $K$ classes as well as the symmetric binary model ($K=2$). One crucial point here is that by aggregating information from all the uniform layers, we may obtain exact recovery even in cases when this may appear impossible if each layer were considered alone. Two efficient algorithms that successfully achieve exact recovery above the threshold are provided. The theoretical analysis of our algorithms relies on the concentration and regularization of the adjacency matrix for non-uniform random hypergraphs, which could be of independent interest. We also address some open problems regarding parameter knowledge and estimation.

Translated: 2023-04-27 16:35:18 Published: 2023-04-25

# The Update Equivalence Framework for Decision-Time Planning ( http://arxiv.org/abs/2304.13138v1 )

License: see the linked page

Samuel Sokota, Gabriele Farina, David J. Wu, Hengyuan Hu, Kevin A. Wang, J. Zico Kolter, Noam Brown

The process of revising (or constructing) a policy immediately prior to execution -- known as decision-time planning -- is key to achieving superhuman performance in perfect-information settings like chess and Go. A recent line of work has extended decision-time planning to more general imperfect-information settings, leading to superhuman performance in poker. However, these methods require considering subgames whose sizes grow quickly in the amount of non-public information, making them unhelpful when the amount of non-public information is large. Motivated by this issue, we introduce an alternative framework for decision-time planning that is not based on subgames but rather on the notion of update equivalence. In this framework, decision-time planning algorithms simulate updates of synchronous learning algorithms. This framework enables us to introduce a new family of principled decision-time planning algorithms that do not rely on public information, opening the door to sound and effective decision-time planning in settings with large amounts of non-public information. In experiments, members of this family produce comparable or superior results compared to state-of-the-art approaches in Hanabi and improve performance in 3x3 Abrupt Dark Hex and Phantom Tic-Tac-Toe.

Translated: 2023-04-27 16:35:02 Published: 2023-04-25

# MEDNC: Multi-ensemble deep neural network for COVID-19 diagnosis ( http://arxiv.org/abs/2304.13135v1 )

License: see the linked page

Lin Yang, Shuihua Wang, Yudong Zhang

Coronavirus disease 2019 (COVID-19) has spread all over the world for three years, but medical facilities in many areas are still inadequate. There is a need for rapid COVID-19 diagnosis to identify high-risk patients and maximize the use of limited medical resources. Motivated by this fact, we propose the deep learning framework MEDNC for automatic prediction and diagnosis of COVID-19 using computed tomography (CT) images. Our model was trained using two publicly available sets of COVID-19 data and was built on ideas from transfer learning. Results indicate that MEDNC greatly enhances the detection of COVID-19 infections, reaching an accuracy of 98.79% and 99.82%, respectively. We also tested MEDNC on a brain tumor dataset and a blood cell dataset to show that our model applies to a wide range of problems. The outcomes demonstrate that our proposed model attains an accuracy of 99.39% and 99.28%, respectively. This COVID-19 recognition tool could help optimize healthcare resources and reduce clinicians' workload when screening for the virus.

Translated: 2023-04-27 16:34:36 Published: 2023-04-25

# LAST: Scalable Lattice-Based Speech Modelling in JAX ( http://arxiv.org/abs/2304.13134v1 )

License: see the linked page

Ke Wu, Ehsan Variani, Tom Bagby, Michael Riley

We introduce LAST, a LAttice-based Speech Transducer library in JAX. With an emphasis on flexibility, ease-of-use, and scalability, LAST implements the differentiable weighted finite state automaton (WFSA) algorithms needed for training and inference that scale to a large WFSA such as a recognition lattice over the entire utterance. Although these WFSA algorithms are well known in the literature, new challenges arise from the performance characteristics of modern architectures and from nuances in automatic differentiation. We describe a suite of generally applicable techniques employed in LAST to address these challenges, and demonstrate their effectiveness with benchmarks on TPUv3 and V100 GPU.
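As background for the kind of WFSA computation mentioned in this abstract, the sketch below computes the total log-weight of all paths through a small acyclic lattice with the forward algorithm in the log semiring. It is plain Python rather than JAX, it does not use or imitate LAST's API, and the lattice layout (integer state ids already in topological order) is an assumption made for brevity.

```python
import math


def log_add(a, b):
    """Numerically stable log(exp(a) + exp(b)); -inf acts as the semiring zero."""
    if a == -math.inf:
        return b
    if b == -math.inf:
        return a
    hi, lo = max(a, b), min(a, b)
    return hi + math.log1p(math.exp(lo - hi))


def forward_log_weight(num_states, arcs, start, final):
    """Total log-weight of all paths from start to final in an acyclic WFSA.

    arcs holds (src, dst, log_weight) tuples with src < dst, so visiting arcs
    in order of src processes every state only after all of its incoming arcs.
    """
    alpha = [-math.inf] * num_states
    alpha[start] = 0.0
    for src, dst, weight in sorted(arcs):
        alpha[dst] = log_add(alpha[dst], alpha[src] + weight)
    return alpha[final]
```

For a lattice with two parallel two-arc paths, the result is the log of the sum of the two path probabilities. In training, this quantity would typically be differentiated with respect to the arc weights, which is where automatic differentiation enters.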

Translated: 2023-04-27 16:34:20 Published: 2023-04-25

# Directed Chain Generative Adversarial Networks ( http://arxiv.org/abs/2304.13131v1 )

License: see the linked page

Ming Min, Ruimeng Hu, Tomoyuki Ichiba

Real-world data can be multimodal distributed, e.g., data describing the opinion divergence in a community, the interspike interval distribution of neurons, and the natural frequencies of oscillators. Generating multimodal distributed real-world data has become a challenge to existing generative adversarial networks (GANs). For example, neural stochastic differential equations (Neural SDEs), treated as infinite-dimensional GANs, have demonstrated successful performance mainly in generating unimodal time series data. In this paper, we propose a novel time series generator, named directed chain GANs (DC-GANs), which inserts a time series dataset (called a neighborhood process of the directed chain or input) into the drift and diffusion coefficients of the directed chain SDEs with distributional constraints. DC-GANs can generate new time series of the same distribution as the neighborhood process, and the neighborhood process provides the key step in learning and generating multimodal distributed time series. The proposed DC-GANs are examined on four datasets, including two stochastic models from social sciences and computational neuroscience, and two real-world datasets on stock prices and energy consumption. To the best of our knowledge, DC-GANs are the first work that can generate multimodal time series data, and they consistently outperform state-of-the-art benchmarks with respect to measures of distribution, data similarity, and predictive ability.

Translated: 2023-04-27 16:34:07 Published: 2023-04-25

# Hypernymization of named entity-rich captions for grounding-based multi-modal pretraining ( http://arxiv.org/abs/2304.13130v1 )

License: see the linked page

Giacomo Nebbia, Adriana Kovashka

Named entities are ubiquitous in text that naturally accompanies images, especially in domains such as news or Wikipedia articles. In previous work, named entities have been identified as a likely reason for the low performance of image-text retrieval models pretrained on Wikipedia and evaluated on named-entity-free benchmark datasets. Because they are rarely mentioned, named entities can be challenging to model. They also represent missed learning opportunities for self-supervised models: the link between a named entity and an object in the image may be missed by the model, but it would not be if the object were mentioned using a more common term. In this work, we investigate hypernymization as a way to deal with named entities for pretraining grounding-based multi-modal models and for fine-tuning on open-vocabulary detection. We propose two ways to perform hypernymization: (1) a "manual" pipeline relying on a comprehensive ontology of concepts, and (2) a "learned" approach where we train a language model to perform hypernymization. We run experiments on data from Wikipedia and from The New York Times. We report improved pretraining performance on objects of interest following hypernymization, and we show the promise of hypernymization on open-vocabulary detection, specifically on classes not seen during training.

Translated: 2023-04-27 16:33:41 Published: 2023-04-25

# Azimuthal backflow in light carrying orbital angular momentum ( http://arxiv.org/abs/2304.13124v1 )

License: see the linked page

Bohnishikha Ghosh, Anat Daniel, Bernard Gorzkowski, Radek Lapkiewicz

M.V. Berry's work [J. Phys. A: Math. Theor. 43, 415302 (2010)] highlighted the correspondence between backflow in quantum mechanics and superoscillations in waves. Superoscillations refer to situations where the local oscillation of a superposition is faster than its fastest Fourier component. This concept has been used to demonstrate backflow in transverse linear momentum for optical waves. In this work, we examine the interference of classical light carrying only negative orbital angular momentum and observe positive local orbital angular momentum in the dark fringes of the interference pattern. This finding may have implications for studies of light-matter interaction and represents a step towards observing quantum backflow in two dimensions.
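The effect described in this abstract can be illustrated numerically with a two-mode superposition. The sketch below is an idealized toy model, not the authors' experiment: the mode indices (-1 and -2), the relative amplitude 0.8, and the purely azimuthal field with no radial structure are all assumptions. The local azimuthal phase gradient, Im(ψ'/ψ), plays the role of the local orbital angular momentum in units of ħ.

```python
import cmath
import math


def superposition(phi, l1=-1, l2=-2, b=0.8):
    """Field exp(i*l1*phi) + b*exp(i*l2*phi); both mode indices are negative."""
    return cmath.exp(1j * l1 * phi) + b * cmath.exp(1j * l2 * phi)


def local_oam(phi, l1=-1, l2=-2, b=0.8):
    """Local azimuthal phase gradient Im(psi' / psi) at angle phi."""
    psi = superposition(phi, l1, l2, b)
    dpsi = 1j * l1 * cmath.exp(1j * l1 * phi) + 1j * l2 * b * cmath.exp(1j * l2 * phi)
    return (dpsi / psi).imag


bright = local_oam(0.0)      # constructive interference
dark = local_oam(math.pi)    # destructive interference (dark fringe)
```

At the bright fringe the local value is negative, as expected from the two components. At the dark fringe, where the field amplitude drops to 0.2, the local value is +3, which is positive even though every component carries negative orbital angular momentum.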

Translated: 2023-04-27 16:33:18 Published: 2023-04-25

# External gauge field coupled quantum dynamics: gauge choices, Heisenberg algebra representations and gauge invariance in general, and the Landau problem in particular ( http://arxiv.org/abs/2304.13122v1 )

License: see the linked page

Jan Govaerts (CP3, Univ. cath. Louvain, UCLouvain, Louvain-la-Neuve, Belgium)

When an action is redefined by adding a total derivative or divergence term (a total time derivative, in the case of a mechanical system), the classical equations of motion are left invariant, yet the transformation has nontrivial consequences for the system's canonical phase space formulation. This is even more true, and in more subtle ways, for the canonically quantised dynamics, in particular through an induced transformation in the unitary configuration space representation of the Heisenberg algebra used for the quantum system. When the system is coupled to a background gauge field, such considerations become crucial for a proper understanding of the consequences, for the system's quantum dynamics, of gauge transformations of that classical external background gauge field, while under such transformations the system's degrees of freedom, abstract quantum states and quantum dynamics are certainly strictly invariant. After a detailed analysis of these different points in a general context, they are illustrated specifically in the case of the quantum Landau problem with its classical external background magnetic vector potential, for which the most general possible parametrised gauge choice is implemented herein. The latter discussion also aims to clarify some perplexing statements in the literature regarding the status of gauge choices to be made for the magnetic vector potential for that quantum system. The role of the global space-time symmetries of the Landau problem and their gauge invariant Noether charges is also emphasized.

翻訳日:2023-04-27 16:33:02 公開日:2023-04-25

# 機械学習によるイベントバイイベントドップラー補正を用いた高速・高温エキゾチック同位体の精密分光

Precision Spectroscopy of Fast, Hot Exotic Isotopes Using Machine Learning Assisted Event-by-Event Doppler Correction ( http://arxiv.org/abs/2304.13120v1 )

ライセンス: Link先を確認

Silviu-Marian Udrescu, Diego Alejandro Torres, Ronald Fernando Garcia Ruiz

(参考訳) 本研究では、高速エキゾチック同位体に対する高感度・高精度レーザー分光の実験手法を提案する。電場内を進む原子に段階的な共鳴イオン化を誘起し、その後イオンと対応する電子を検出することで、生成粒子の時間および位置に感度のある測定を行うことができる。混合密度ネットワーク(MDN)を用いてこの情報を活用し、個々の原子の初期エネルギーを予測することで、観測された遷移周波数のドップラー補正を事象ごとに適用できる。提案する実験手法の数値シミュレーションを行い、極端な高温($> 10^8$ K)で生成され、エネルギー広がりが最大$10$ keVで非一様な速度分布を持つイオンビームに対して、kHzレベルの不確かさが達成できることを示す。高エネルギービームに対して直接飛行中の分光を行えることは、冷却技術を必要とせずに、高温で汚染の多い環境で少量しか生成されない、寿命がミリ秒以下の短寿命同位体を研究する特別な機会を提供する。このような核種は、核構造、天体物理学、新物理探索において大きな関心を集めている。

We propose an experimental scheme for performing sensitive, high-precision laser spectroscopy studies on fast exotic isotopes. By inducing a step-wise resonant ionization of the atoms travelling inside an electric field and subsequently detecting the ion and the corresponding electron, time- and position-sensitive measurements of the resulting particles can be performed. Using a Mixture Density Network (MDN), we can leverage this information to predict the initial energy of individual atoms and thus apply a Doppler correction of the observed transition frequencies on an event-by-event basis. We conduct numerical simulations of the proposed experimental scheme and show that kHz-level uncertainties can be achieved for ion beams produced at extreme temperatures ($> 10^8$ K), with energy spreads as large as $10$ keV and non-uniform velocity distributions. The ability to perform in-flight spectroscopy, directly on highly energetic beams, offers unique opportunities for studying short-lived isotopes with lifetimes in the millisecond range and below, produced in low quantities, in hot and highly contaminated environments, without the need for cooling techniques. Such species are of marked interest for nuclear structure, astrophysics, and new physics searches.
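
As a rough illustration of the event-by-event Doppler correction mentioned in this abstract, the sketch below converts a per-event kinetic energy estimate into a velocity and shifts the lab-frame laser frequency into the atom's rest frame. It assumes a collinear or anti-collinear laser geometry and uses the standard relativistic Doppler formula. The function names, the example mass, the laser frequency, and the energies are hypothetical and are not taken from the paper, and the MDN that would supply the energy estimates is not modeled.

```python
import math

AMU_EV = 931.49410242e6  # atomic mass unit expressed in eV/c^2

def beta_from_kinetic_energy(kinetic_energy_ev, mass_amu):
    """Relativistic speed v/c of a particle with the given kinetic energy."""
    rest_energy_ev = mass_amu * AMU_EV
    gamma = 1.0 + kinetic_energy_ev / rest_energy_ev
    return math.sqrt(1.0 - 1.0 / gamma**2)

def rest_frame_frequency(lab_frequency_hz, beta, collinear=True):
    """Laser frequency seen by the atom.

    collinear=True: laser propagates along the atom's velocity (red-shifted).
    collinear=False: laser propagates against the atom's velocity (blue-shifted).
    """
    if collinear:
        return lab_frequency_hz * math.sqrt((1.0 - beta) / (1.0 + beta))
    return lab_frequency_hz * math.sqrt((1.0 + beta) / (1.0 - beta))

# Hypothetical per-event energy estimates (eV) for an ion of mass 100 u.
estimated_energies_ev = [30e3, 35e3, 40e3]
laser_frequency_hz = 4.0e14  # hypothetical lab-frame laser frequency

for energy in estimated_energies_ev:
    beta = beta_from_kinetic_energy(energy, mass_amu=100.0)
    corrected = rest_frame_frequency(laser_frequency_hz, beta, collinear=True)
    print(f"E = {energy:.0f} eV, beta = {beta:.6e}, rest-frame f = {corrected:.6e} Hz")
```

Because each event gets its own energy estimate, the spread of rest-frame frequencies reflects the quality of the energy prediction rather than the full energy spread of the beam, which is the point of correcting event by event.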

翻訳日:2023-04-27 16:32:35 公開日:2023-04-25

# 光学システムにおける非線形チャネル補償へのTransformerの応用

Application of Transformers for Nonlinear Channel Compensation in Optical Systems ( http://arxiv.org/abs/2304.13119v1 )

ライセンス: Link先を確認

Behnam Behinaein Hamgini, Hossein Najafi, Ali Bakhshali, and Zhuhong Zhang

(参考訳) 本稿では、Transformerに基づく、コヒーレント長距離伝送のための新しい非線形チャネル等化手法を提案する。シンボル列にわたるメモリに直接注意を向ける能力により、Transformerは並列化された構造で効果的に利用できることを示す。非線形等化のためにTransformerのエンコーダ部分を実装し、その性能を幅広いハイパーパラメータにわたって解析する。各反復でシンボルのブロックを処理し、まとめて処理するエンコーダ出力のサブセットを慎重に選択することで、効率的な非線形補償を実現できることを示す。また、非線形摂動理論に着想を得た物理情報に基づくマスクを用いて、Transformerによる非線形等化の計算複雑性を低減する手法を提案する。

In this paper, we introduce a new nonlinear channel equalization method for coherent long-haul transmission based on Transformers. We show that due to their capability to attend directly to the memory across a sequence of symbols, Transformers can be used effectively with a parallelized structure. We present an implementation of the encoder part of the Transformer for nonlinear equalization and analyze its performance over a wide range of hyper-parameters. It is shown that by processing blocks of symbols at each iteration and carefully selecting subsets of the encoder's output to be processed together, efficient nonlinear compensation can be achieved. We also propose the use of a physics-informed mask inspired by nonlinear perturbation theory for reducing the computational complexity of Transformer nonlinear equalization.

翻訳日:2023-04-27 16:32:12 公開日:2023-04-25

# オートエンコーダを用いたSMAPパッシブラジオメータの電波干渉低減

Autoencoder-based Radio Frequency Interference Mitigation For SMAP Passive Radiometer ( http://arxiv.org/abs/2304.13158v1 )

ライセンス: Link先を確認

Ali Owfi, Fatemeh Afghah

(参考訳) 1400-1427 MHzの保護周波数帯域で運用される宇宙搭載型の受動放射計は、地上の発生源からの電波干渉(RFI)を受ける。無線デバイスの増加と新しい技術の出現に伴い、このスペクトルを他の技術と共有することになれば、これらの放射計にさらに多くのRFIがもたらされる可能性がある。このバンドは高い容量と良好なカバレッジを提供するため、5G and Beyondにとって理想的なミッドバンド周波数となりうる。SMAP (Soil Moisture Active Passive) における現在のRFI検出・緩和技術は、汚染されたデータを正しく検出して破棄またはフィルタリングすることに依存しており、特に重度のRFIの場合には貴重な情報の損失につながる。本稿では、共存しうる地上ユーザ(例えば5G基地局)によって引き起こされる支配的なRFIを、受動受信機側で受信した汚染信号から除去するオートエンコーダに基づくRFI緩和手法を提案する。これにより、貴重な情報を保存し、汚染データの破棄を防げる可能性がある。

Passive space-borne radiometers operating in the 1400-1427 MHz protected frequency band face radio frequency interference (RFI) from terrestrial sources. With the growth of wireless devices and the appearance of new technologies, the possibility of sharing this spectrum with other technologies would introduce more RFI to these radiometers. This band could be an ideal mid-band frequency for 5G and Beyond, as it offers high capacity and good coverage. Current RFI detection and mitigation techniques at SMAP (Soil Moisture Active Passive) depend on correctly detecting and discarding or filtering the contaminated data, leading to the loss of valuable information, especially in severe RFI cases. In this paper, we propose an autoencoder-based RFI mitigation method to remove the dominant RFI caused by potential coexistent terrestrial users (e.g., a 5G base station) from the received contaminated signal at the passive receiver side, potentially preserving valuable information and preventing the contaminated data from being discarded.

翻訳日:2023-04-27 16:25:16 公開日:2023-04-25

# HDR-ChipQA:高ダイナミックレンジ映像の非参照品質評価

HDR-ChipQA: No-Reference Quality Assessment on High Dynamic Range Videos ( http://arxiv.org/abs/2304.13156v1 )

ライセンス: Link先を確認

Joshua P. Ebenezer, Zaixi Shang, Yongjun Wu, Hai Wei, Sriram Sethuraman and Alan C. Bovik

(参考訳) 我々は、ハイダイナミックレンジ(HDR)ビデオに対して際立った性能を発揮する、参照なし(no-reference)のビデオ品質モデルおよびアルゴリズムを提案し、これをHDR-ChipQAと呼ぶ。HDRビデオは、標準ダイナミックレンジ(SDR)ビデオよりも幅広い輝度、ディテール、色を表現する。大規模なビデオネットワークにおけるHDRの採用拡大により、HDRコンテンツの歪みをより適切に考慮したビデオ品質評価(VQA)アルゴリズムの必要性が高まっている。特に、標準的なVQAモデルは、それを駆動する特徴が信号の中間域に広がる歪みに支配されうるため、ダイナミックレンジの両端における顕著な歪みを捉えられない可能性がある。我々は、局所的な拡張的非線形性によって局所的なluma範囲の上端と下端で発生する歪みを強調する新しいアプローチを導入し、別経路で計算される追加の品質認識特徴を定義できるようにした。これらの特徴はHDRに固有のものではなく、程度は小さいもののSDRビデオコンテンツのVQAも改善する。この前処理ステップが、HDRコンテンツの品質予測に用いた際に、歪みに敏感な自然ビデオ統計(NVS)特徴の性能を大幅に高めることを示す。同様に、同じ非線形処理ステップを用いて、新しい広色域の色特徴を別途計算する。我々のモデルは、公開されている唯一の包括的なHDRデータベースにおいてSDR向けVQAアルゴリズムを大幅に上回ると同時に、SDRコンテンツでも最先端の性能を達成することが分かった。

We present a no-reference video quality model and algorithm that delivers standout performance for High Dynamic Range (HDR) videos, which we call HDR-ChipQA. HDR videos represent wider ranges of luminances, details, and colors than Standard Dynamic Range (SDR) videos. The growing adoption of HDR in massively scaled video networks has driven the need for video quality assessment (VQA) algorithms that better account for distortions on HDR content. In particular, standard VQA models may fail to capture conspicuous distortions at the extreme ends of the dynamic range, because the features that drive them may be dominated by distortions that pervade the mid-ranges of the signal. We introduce a new approach whereby a local expansive nonlinearity emphasizes distortions occurring at the higher and lower ends of the local luma range, allowing for the definition of additional quality-aware features that are computed along a separate path. These features are not HDR-specific, and also improve VQA on SDR video contents, albeit to a reduced degree. We show that this preprocessing step significantly boosts the power of distortion-sensitive natural video statistics (NVS) features when used to predict the quality of HDR content. In a similar manner, we separately compute novel wide-gamut color features using the same nonlinear processing steps. We have found that our model significantly outperforms SDR VQA algorithms on the only publicly available, comprehensive HDR database, while also attaining state-of-the-art performance on SDR content.

翻訳日:2023-04-27 16:24:57 公開日:2023-04-25

# LumiGAN:再照明可能な3D人間顔の無条件生成

LumiGAN: Unconditional Generation of Relightable 3D Human Faces ( http://arxiv.org/abs/2304.13153v1 )

ライセンス: Link先を確認

Boyang Deng, Yifan Wang, Gordon Wetzstein

(参考訳) 非構造化2D画像データからの3D人間顔の教師なし学習は、活発な研究領域である。近年の研究は印象的なレベルのフォトリアリズムを達成しているが、一般に照明の制御ができず、生成されたアセットを新しい環境に展開することを妨げている。そこで本稿では、推論時に新たな照明下での再照明(relighting)を可能にする物理ベースの照明モジュールを備えた、3D人間顔のための無条件生成的敵対ネットワーク(GAN)であるLumiGANを提案する。先行研究とは異なり、LumiGANは自己教師ありの方法で学習される効率的な可視性の定式化を用いて、現実的な影の効果を生成できる。LumiGANは、表面法線、拡散アルベド、鏡面反射の色合い(specular tint)など、再照明可能な顔のもっともらしい物理特性を、正解データなしで生成する。再照明可能性に加えて、最先端の再照明非対応3D GANと比較してジオメトリ生成が大幅に改善され、既存の再照明可能なGANよりもフォトリアリズムが顕著に優れていることを示す。

Unsupervised learning of 3D human faces from unstructured 2D image data is an active research area. While recent works have achieved an impressive level of photorealism, they commonly lack control of lighting, which prevents the generated assets from being deployed in novel environments. To this end, we introduce LumiGAN, an unconditional Generative Adversarial Network (GAN) for 3D human faces with a physically based lighting module that enables relighting under novel illumination at inference time. Unlike prior work, LumiGAN can create realistic shadow effects using an efficient visibility formulation that is learned in a self-supervised manner. LumiGAN generates plausible physical properties for relightable faces, including surface normals, diffuse albedo, and specular tint without any ground truth data. In addition to relightability, we demonstrate significantly improved geometry generation compared to state-of-the-art non-relightable 3D GANs and notably better photorealism than existing relightable GANs.

翻訳日:2023-04-27 16:24:34 公開日:2023-04-25

# ロールドロップ:単一パラメータによる観測ノイズの考慮

Roll-Drop: accounting for observation noise with a single parameter ( http://arxiv.org/abs/2304.13150v1 )

ライセンス: Link先を確認

Luigi Campanaro and Daniele De Martini and Siddhant Gangapurwala and Wolfgang Merkt and Ioannis Havoutis

(参考訳) 本稿では,シミュレーション中にドロップアウトすることで,各状態の分布を明示的にモデル化することなく,デプロイ時の観測ノイズを考慮し,drl (deep-reinforcement learning) におけるsim-to-real の簡易な戦略を提案する。 drlは、ロボットを高度にダイナミックでフィードバックベースの操作に制御するための有望なアプローチであり、正確なシミュレーターは、望ましい振る舞いを学ぶために安価で豊富なデータを提供するために不可欠である。それでもシミュレートされたデータはノイズがなく、一般的にはノイズの影響を受けやすい実機への展開に挑戦する分布シフトを示す。標準的な解決策は、後者をモデル化し、トレーニング中に注入することであり、完全なシステム識別が必要であるが、ロールドロップは単一のパラメータのみをチューニングすることで、センサノイズに対する堅牢性を高める。観測では,最大25%のノイズを注入した場合の80%の成功率を示し,ベースラインの2倍の堅牢性を示した。シミュレーションで訓練したコントローラをUnitree A1プラットフォーム上に展開し,この物理系におけるロバスト性の向上を評価した。

This paper proposes a simple strategy for sim-to-real in Deep-Reinforcement Learning (DRL) -- called Roll-Drop -- that uses dropout during simulation to account for observation noise during deployment without explicitly modelling its distribution for each state. DRL is a promising approach to control robots for highly dynamic and feedback-based manoeuvres, and accurate simulators are crucial to providing cheap and abundant data to learn the desired behaviour. Nevertheless, the simulated data are noiseless and generally show a distributional shift that challenges the deployment on real machines where sensor readings are affected by noise. The standard solution is modelling the latter and injecting it during training; while this requires a thorough system identification, Roll-Drop enhances the robustness to sensor noise by tuning only a single parameter. We demonstrate an 80% success rate when up to 25% noise is injected in the observations, with twice higher robustness than the baselines. We deploy the controller trained in simulation on a Unitree A1 platform and assess this improved robustness on the physical system.
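
The idea of applying dropout to simulated observations can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say whether dropped entries are zeroed, whether the remaining entries are rescaled, or at which point in the policy network the dropout is applied. The helper name `drop_observations` and the observation values are hypothetical; the drop probability plays the role of the single tunable parameter.

```python
import random

def drop_observations(observation, drop_prob, rng):
    """Return a copy of the observation with each entry zeroed with probability drop_prob.

    Assumption: dropped entries are set to 0.0 and kept entries are left unscaled.
    """
    return [0.0 if rng.random() < drop_prob else value for value in observation]

rng = random.Random(0)                    # seeded so runs are repeatable
clean_observation = [0.12, -0.40, 1.05, 0.33, -0.08, 0.91]  # hypothetical sensor readings
training_observation = drop_observations(clean_observation, drop_prob=0.25, rng=rng)
print(training_observation)
```

During training in simulation, the policy would receive `training_observation` instead of the clean reading, so that it learns not to rely on any single sensor value being exact.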

翻訳日:2023-04-27 16:24:15 公開日:2023-04-25

# 仮想アシスタントのための音声情報クエリのモデル化 : オープン問題,課題,機会

Modeling Spoken Information Queries for Virtual Assistants: Open Problems, Challenges and Opportunities ( http://arxiv.org/abs/2304.13149v1 )

ライセンス: Link先を確認

Christophe Van Gysel

(参考訳) バーチャルアシスタントは音声による情報検索プラットフォームとしてますます重要になりつつある。本稿では,仮想アシスタントのための音声情報クエリのモデル化に関するオープン問題と課題と,仮想アシスタント音声認識の品質向上のために情報検索手法と研究が適用できる機会の一覧について論じる。問合せドメイン分類,知識グラフ,ユーザインタラクションデータ,および問合せパーソナライゼーションが,音声情報ドメインクエリの正確な認識向上にどのように役立つかを論じる。最後に,音声認識における現状の問題点と課題について概説する。

Virtual assistants are becoming increasingly important speech-driven Information Retrieval platforms that assist users with various tasks. We discuss open problems and challenges with respect to modeling spoken information queries for virtual assistants, and list opportunities where Information Retrieval methods and research can be applied to improve the quality of virtual assistant speech recognition. We discuss how query domain classification, knowledge graphs and user interaction data, and query personalization can be helpful to improve the accurate recognition of spoken information domain queries. Finally, we also provide a brief overview of current problems and challenges in speech recognition.

翻訳日:2023-04-27 16:23:53 公開日:2023-04-25

# MBIB - 最初のメディアバイアス識別ベンチマークタスクとデータセットコレクション

Introducing MBIB -- the first Media Bias Identification Benchmark Task and Dataset Collection ( http://arxiv.org/abs/2304.13148v1 )

ライセンス: Link先を確認

Martin Wessel, Tomáš Horych, Terry Ruas, Akiko Aizawa, Bela Gipp and Timo Spinde

(参考訳) メディアバイアス検出は複雑なマルチタスク問題であるが、これまでこれらの評価タスクをグループ化する統一ベンチマークは存在しない。メディアバイアス識別ベンチマーク(MBIB)は,様々なタイプのメディアバイアス(言語,認知,政治など)を共通の枠組みの下でグループ化し,予測検出手法の一般化を検証するための総合ベンチマークである。 115のデータセットをレビューした後、9つのタスクを選択し、メディアバイアス検出技術を評価するための22の関連するデータセットを慎重に提案する。我々は,最先端トランスフォーマー技術(T5,BARTなど)を用いてMBIBを評価する。我々の結果は、ヘイトスピーチ、人種的偏見、性別的偏見は検出しやすいが、モデルが認知や政治的偏見といった特定のバイアスタイプを扱うのに苦労していることを示唆している。しかし,本研究の結果から,他の手法よりも優れた性能を発揮する技術はないことがわかった。また,メディアバイアスの個々のタスクに対する研究関心と資源配分の不均一な分布も見いだした。統一ベンチマークは、より堅牢なシステムの開発を奨励し、メディアバイアス検出評価の現在のパラダイムを、複数のメディアバイアスタイプを同時に取り組むソリューションへとシフトさせる。

Although media bias detection is a complex multi-task problem, there is, to date, no unified benchmark grouping these evaluation tasks. We introduce the Media Bias Identification Benchmark (MBIB), a comprehensive benchmark that groups different types of media bias (e.g., linguistic, cognitive, political) under a common framework to test how prospective detection techniques generalize. After reviewing 115 datasets, we select nine tasks and carefully propose 22 associated datasets for evaluating media bias detection techniques. We evaluate MBIB using state-of-the-art Transformer techniques (e.g., T5, BART). Our results suggest that while hate speech, racial bias, and gender bias are easier to detect, models struggle to handle certain bias types, e.g., cognitive and political bias. However, our results show that no single technique can outperform all the others significantly. We also find an uneven distribution of research interest and resource allocation to the individual tasks in media bias. A unified benchmark encourages the development of more robust systems and shifts the current paradigm in media bias detection evaluation towards solutions that tackle not one but multiple media bias types simultaneously.

翻訳日:2023-04-27 16:23:43 公開日:2023-04-25

# 時間スケール間の一貫性から自己教師付きマルチオブジェクトトラッキング

Self-Supervised Multi-Object Tracking From Consistency Across Timescales ( http://arxiv.org/abs/2304.13147v1 )

ライセンス: Link先を確認

Christopher Lang, Alexander Braun, Lars Schillingmann, Abhinav Valada

(参考訳) 自己教師ありのマルチオブジェクトトラッカーは、世界中で記録された膨大な生データを活用できる可能性を持つ。しかし、教師ありの手法と比べると、再同定の精度ではまだ及ばない。我々は、この不足は自己教師ありの目的関数を単一フレームまたはフレームペアに制限していることに起因すると仮定する。そのような設計では、一貫した再同定特徴を学習するのに十分な視覚的外観の変化が訓練中に得られない。そこで本研究では、短い時間スケールと長い時間スケールにわたって一貫したアソシエーションスコアを強制することにより、フレーム列上で再同定特徴を学習する訓練目的を提案する。BDD100KおよびMOT17ベンチマークでの広範な評価により、学習したReID特徴は他の自己教師あり手法と比べてIDスイッチを大幅に減らし、自己教師ありマルチオブジェクトトラッキングの新たな最先端を確立するとともに、BDD100Kベンチマークでは教師あり手法と同等の性能を示すことが実証された。

Self-supervised multi-object trackers have the potential to leverage the vast amounts of raw data recorded worldwide. However, they still fall short in re-identification accuracy compared to their supervised counterparts. We hypothesize that this deficiency results from restricting self-supervised objectives to single frames or frame pairs. Such designs lack sufficient visual appearance variations during training to learn consistent re-identification features. Therefore, we propose a training objective that learns re-identification features over a sequence of frames by enforcing consistent association scores across short and long timescales. Extensive evaluations on the BDD100K and MOT17 benchmarks demonstrate that our learned ReID features significantly reduce ID switches compared to other self-supervised methods, setting the new state of the art for self-supervised multi-object tracking and even performing on par with supervised methods on the BDD100K benchmark.

翻訳日:2023-04-27 16:23:23 公開日:2023-04-25

# T細胞受容体タンパク質配列とスパースコーディング : 癌分類への新しいアプローチ

T Cell Receptor Protein Sequences and Sparse Coding: A Novel Approach to Cancer Classification ( http://arxiv.org/abs/2304.13145v1 )

ライセンス: Link先を確認

Zahra Tayebi, Sarwan Ali, Prakash Chourasia, Taslim Murad and Murray Patterson

(参考訳) 癌は、制御不能な細胞の成長と増殖を特徴とする複雑な疾患である。 T細胞受容体(TCR)は、適応免疫系に必須のタンパク質であり、抗原の特異的認識は、がんを含む疾患に対する免疫応答において重要な役割を果たす。 TCRの多様性と特異性は、がん細胞をターゲットにするのに理想的であり、シークエンシング技術の最近の進歩は、TCRレパートリーの包括的なプロファイリングを可能にしている。これにより、強力な抗がん活性を持つTCRの発見とTCRベースの免疫療法の開発につながった。本研究では、がんのカテゴリを目的ラベルとするTCRタンパク質配列のマルチクラス分類におけるスパースコーディングの利用について検討した。スパースコーディングは、一連の情報的特徴を持つデータの表現を可能にし、アミノ酸間の複雑な関係を捉え、低次元の方法で見逃される可能性のあるシーケンス内の微妙なパターンを識別できる機械学習の一般的なテクニックである。まず、TCRシーケンスからk-merを計算し、次にスパースコーディングを適用してデータの本質的な特徴を捉える。最終埋め込みの予測性能を向上させるため、各種類のがん特性に関するドメイン知識を統合する。次に、教師あり解析のためにTCR配列の埋め込みについて、異なる機械学習(線形および非線形)分類器を訓練する。提案する埋め込み手法は、TCRシーケンスのベンチマークデータセットにおいて、予測性能においてベースラインを著しく上回り、99.8%の精度を実現する。本研究は癌研究や他の関連分野におけるTCRタンパク質配列の解析におけるスパースコーディングの可能性を明らかにするものである。

Cancer is a complex disease characterized by uncontrolled cell growth and proliferation. T cell receptors (TCRs) are essential proteins for the adaptive immune system, and their specific recognition of antigens plays a crucial role in the immune response against diseases, including cancer. The diversity and specificity of TCRs make them ideal for targeting cancer cells, and recent advancements in sequencing technologies have enabled the comprehensive profiling of TCR repertoires. This has led to the discovery of TCRs with potent anti-cancer activity and the development of TCR-based immunotherapies. In this study, we investigate the use of sparse coding for the multi-class classification of TCR protein sequences with cancer categories as target labels. Sparse coding is a popular technique in machine learning that enables the representation of data with a set of informative features and can capture complex relationships between amino acids and identify subtle patterns in the sequence that might be missed by low-dimensional methods. We first compute the k-mers from the TCR sequences and then apply sparse coding to capture the essential features of the data. To improve the predictive performance of the final embeddings, we integrate domain knowledge regarding different types of cancer properties. We then train different machine learning (linear and non-linear) classifiers on the embeddings of TCR sequences for the purpose of supervised analysis. Our proposed embedding method on a benchmark dataset of TCR sequences significantly outperforms the baselines in terms of predictive performance, achieving an accuracy of 99.8%. Our study highlights the potential of sparse coding for the analysis of TCR protein sequences in cancer research and other related fields.
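
The first step named in this abstract, computing k-mers from a sequence, can be illustrated with a short sketch. The code below only extracts overlapping k-mers and builds a plain count vector over the 20 standard amino acids; it is not the paper's sparse coding or its domain-knowledge integration, which the abstract does not specify. The function names and the example sequence are hypothetical.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids

def kmers(sequence, k):
    """All overlapping substrings of length k, in order of appearance."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]

def kmer_count_vector(sequence, k):
    """Counts of every possible k-mer over the standard alphabet (length 20**k).

    k-mers containing non-standard letters are ignored.
    """
    vocabulary = ["".join(letters) for letters in product(AMINO_ACIDS, repeat=k)]
    counts = Counter(kmers(sequence, k))
    return [counts[kmer] for kmer in vocabulary]

example_sequence = "CASSLGQAYEQYF"  # hypothetical TCR-like sequence, for illustration only
print(kmers(example_sequence, 3))
vector = kmer_count_vector(example_sequence, 3)
print(len(vector), sum(vector))
```

Most entries of such a vector are zero for any single sequence, which is one reason a sparse representation is a natural fit for k-mer features.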

翻訳日:2023-04-27 16:23:08 公開日:2023-04-25

# 時空間データの自己監督時間解析

Self-Supervised Temporal Analysis of Spatiotemporal Data ( http://arxiv.org/abs/2304.13143v1 )

ライセンス: Link先を確認

Yi Cao and Swetava Ganguli and Vipul Pandey

(参考訳) 地理空間活動の時間的パターンと土地利用の種類との間には相関がある。移動活動の時系列に基づいて景観を層別化する新しい自己教師あり手法を提案する。まず、時系列信号を周波数領域に変換し、時系列に見られる周期的な時間パターンを保存する収縮型オートエンコーダ(contractive autoencoder)によって、タスクに依存しない時間埋め込みへと圧縮する。ピクセル単位の埋め込みは画像状のチャネルに変換され、深層セマンティックセグメンテーションを用いた下流の地理空間タスクのタスクベースのマルチモーダルモデリングに利用できる。実験により、時間埋め込みは時系列データの意味的に有意な表現であり、住宅地域と商業地域の分類など、さまざまなタスクで有効であることが示された。

There exists a correlation between geospatial activity temporal patterns and the type of land use. A novel self-supervised approach is proposed to stratify the landscape based on mobility activity time series. First, the time series signal is transformed to the frequency domain and then compressed into task-agnostic temporal embeddings by a contractive autoencoder, which preserves cyclic temporal patterns observed in time series. The pixel-wise embeddings are converted to image-like channels that can be used for task-based, multimodal modeling of downstream geospatial tasks using deep semantic segmentation. Experiments show that temporal embeddings are semantically meaningful representations of time series data and are effective across different tasks such as classifying residential and commercial areas.
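
To make the frequency-domain step concrete, the sketch below computes the magnitude spectrum of a time series with a naive discrete Fourier transform and reads off its dominant period. It is an illustration only: the naive DFT is used to keep the example dependency-free (an FFT library would normally be used), the contractive autoencoder is not modeled, and the function names and the synthetic hourly signal are hypothetical.

```python
import cmath
import math

def dft_magnitudes(series):
    """Magnitudes of the non-negative-frequency DFT bins of a mean-centered series."""
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(centered)))
        for k in range(n // 2 + 1)
    ]

def dominant_period(series, sample_spacing=1.0):
    """Period (in units of sample_spacing) of the strongest non-constant component."""
    magnitudes = dft_magnitudes(series)
    peak = max(range(1, len(magnitudes)), key=lambda k: magnitudes[k])
    return len(series) * sample_spacing / peak

# Synthetic hourly activity for one week with a daily cycle.
hourly_activity = [10.0 + 3.0 * math.sin(2 * math.pi * t / 24) for t in range(168)]
print(dominant_period(hourly_activity))  # period in hours
```

Cyclic patterns such as daily or weekly rhythms show up as isolated peaks in the spectrum, which is why a frequency-domain representation is a compact input for a downstream embedding model.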

翻訳日:2023-04-27 16:22:42 公開日:2023-04-25

# チャーン・サイモンズ境界層とのカシミール・ポルダー相互作用

Casimir-Polder interaction with Chern-Simons boundary layers ( http://arxiv.org/abs/2304.13186v1 )

ライセンス: Link先を確認

Valery N. Marachevsky and Arseny A. Sidelnikov

(参考訳) グリーン関数散乱法を、異なる媒質間の平面境界での反射後に生じる電磁偏光の混合を考慮できるように一般化し、チャーン・サイモンズ平面境界層を持つ系におけるカシミール・ポルダーポテンシャルの導出に適用する。まず、誘電体半空間上にチャーン・サイモンズ平面境界層が存在する場合の異方性原子のカシミール・ポルダーポテンシャルを導出する。次に、チャーン・サイモンズ平面平行境界層を持つ2つの誘電体半空間の間にある異方性原子のカシミール・ポルダーポテンシャルについて、一般的な結果を導く。真空中の2つのチャーン・サイモンズ平面平行層の間にある異方性原子のカシミール・ポルダーポテンシャルは、特殊関数によって表される。2つのチャーン・サイモンズ平面平行層と、その間にある基底状態の中性原子からなる系において、新しいP-odd三体真空効果を発見し、解析した。注目すべきことに、チャーン・サイモンズ層の一方を180度回転させることで生じるP-odd三体真空効果は、電磁場とQED双極子相互作用をする中性原子を用いた実験で検証できる。

Green functions scattering method is generalized to consider mixing of electromagnetic polarizations after reflection from the plane boundary between different media and applied to derivation of the Casimir-Polder potential in systems with Chern-Simons plane boundary layers. The method is first applied to derive the Casimir-Polder potential of an anisotropic atom in the presence of a Chern-Simons plane boundary layer on a dielectric half-space. Then a general result for the Casimir-Polder potential of an anisotropic atom between two dielectric half-spaces with Chern-Simons plane parallel boundary layers is derived. The Casimir-Polder potential of an anisotropic atom between two Chern-Simons plane parallel layers in vacuum is expressed through special functions. Novel P-odd three-body vacuum effects are discovered and analyzed in the system of two Chern-Simons plane parallel layers and a neutral atom in its ground state between the layers. Remarkably, P-odd three-body vacuum effects arising due to 180 degree rotation of one of the Chern-Simons layers can be verified in experiments with neutral atoms having QED dipole interaction with an electromagnetic field.

翻訳日:2023-04-27 16:16:35 公開日:2023-04-25

# 画像テキストモデルのためのサンプル特異的デバイアス

Sample-Specific Debiasing for Better Image-Text Models ( http://arxiv.org/abs/2304.13181v1 )

ライセンス: Link先を確認

Peiqi Wang, Yingcheng Liu, Ching-Yun Ko, William M. Wells, Seth Berkowitz, Steven Horng, Polina Golland

(参考訳) 画像テキストデータに対する自己教師あり表現学習は、画像分類、視覚的グラウンディング、クロスモーダル検索といった重要な医療応用を促進する。一般的なアプローチの1つは、意味的に類似した(正の)データ点の対と類似しない(負の)データ点の対を対比することである。訓練データセットから負のサンプルを一様に抽出すると、偽陰性、すなわち類似しないものとして扱われるが実際には同じクラスに属するサンプルが生じる。医療データでは基礎となるクラス分布が一様でないため、偽陰性の発生率は大きく変動する。学習される表現の品質を向上させるために、偽陰性を補正する新しい手法を開発した。提案手法は、推定されたサンプル固有のクラス確率を用いる、デバイアス対照学習(debiased contrastive learning)の変種とみなすことができる。目的関数の理論的解析を行い、画像データセットと画像テキスト対データセットの両方で提案手法を実証する。実験により、サンプル固有のデバイアスの経験的な利点が示された。

Self-supervised representation learning on image-text data facilitates crucial medical applications, such as image classification, visual grounding, and cross-modal retrieval. One common approach involves contrasting semantically similar (positive) and dissimilar (negative) pairs of data points. Drawing negative samples uniformly from the training data set introduces false negatives, i.e., samples that are treated as dissimilar but belong to the same class. In healthcare data, the underlying class distribution is nonuniform, implying that false negatives occur at a highly variable rate. To improve the quality of learned representations, we develop a novel approach that corrects for false negatives. Our method can be viewed as a variant of debiased contrastive learning that uses estimated sample-specific class probabilities. We provide theoretical analysis of the objective function and demonstrate the proposed approach on both image and paired image-text data sets. Our experiments demonstrate empirical advantages of sample-specific debiasing.

翻訳日:2023-04-27 16:16:18 公開日:2023-04-25

# Sebis at SemEval-2023 Task 7: A Joint System for Natural Language Inference and Evidence Retrieval from Clinical Trial Reports

Sebis at SemEval-2023 Task 7: A Joint System for Natural Language Inference and Evidence Retrieval from Clinical Trial Reports ( http://arxiv.org/abs/2304.13180v1 )

ライセンス: Link先を確認

Juraj Vladika, Florian Matthes

(参考訳) 毎日生成される臨床試験報告の数が増えるにつれて、証拠に基づく医療勧告を知らせる新たな発見に追随することは難しくなってきている。このプロセスを自動化し、医療専門家を支援するため、NLPソリューションが開発されている。これは、エビデンス検索と臨床試験データからの自然言語推論の2つのタスクのためのnlpシステムの開発を目標としたsemeval-2023タスク7の動機となった。本稿では,2つのシステムについて述べる。 1つは2つのタスクを個別にモデル化するパイプラインシステムであり、2つ目は2つのタスクを共有表現とマルチタスク学習アプローチで同時に学習するジョイントシステムである。最終的なシステムは、その出力をアンサンブルシステムに結合する。モデルを形式化し,その特性と課題を提示し,得られた結果の分析を行う。

With the increasing number of clinical trial reports generated every day, it is becoming hard to keep up with novel discoveries that inform evidence-based healthcare recommendations. To help automate this process and assist medical experts, NLP solutions are being developed. This motivated the SemEval-2023 Task 7, where the goal was to develop an NLP system for two tasks: evidence retrieval and natural language inference from clinical trial data. In this paper, we describe our two developed systems. The first one is a pipeline system that models the two tasks separately, while the second one is a joint system that learns the two tasks simultaneously with a shared representation and a multi-task learning approach. The final system combines their outputs in an ensemble system. We formalize the models, present their characteristics and challenges, and provide an analysis of achieved results.

翻訳日:2023-04-27 16:16:04 公開日:2023-04-25

# 電力制約深層学習によるロバスト非線形フィードバック符号化

Robust Non-Linear Feedback Coding via Power-Constrained Deep Learning ( http://arxiv.org/abs/2304.13178v1 )

ライセンス: Link先を確認

Junghoon Kim, Taejoon Kim, David Love, Christopher Brinton

(参考訳) フィードバックを利用できる通信のための符号設計は、長年の未解決問題である。非線形の深層学習に基づく符号化方式に関する最近の研究は、線形符号よりも通信信頼性を大幅に向上させることを示しているが、チャネル上の順方向ノイズとフィードバックノイズの存在に対しては依然として脆弱である。本稿では、チャネルノイズに対するロバスト性を大幅に向上させる新しい非線形フィードバック符号のファミリーを開発する。我々のオートエンコーダに基づくアーキテクチャは、連続するビットのブロックに基づいて符号を学習するように設計されており、ビット単位の処理に比べてノイズ除去の利点が得られ、ノイズの多いチャネルを挟んだエンコーダとデコーダの物理的な分離を克服する助けとなる。さらに、ハードウェア制約を学習の最適化に明示的に組み込むために、エンコーダに電力制御層を設け、その結果得られる平均電力制約が漸近的に満たされることを証明する。数値実験により、本手法は実用的な順方向ノイズおよびフィードバックノイズの領域において最先端のフィードバック符号を大きく上回ることを示し、我々の非線形符号の挙動に関する情報理論的な洞察を与える。さらに、ブロック長が長い領域では、フィードバックノイズが大きくなると、標準的な誤り訂正符号のほうがフィードバック符号よりも依然として好ましいことが分かった。

The design of codes for feedback-enabled communications has been a long-standing open problem. Recent research on non-linear, deep learning-based coding schemes has demonstrated significant improvements in communication reliability over linear codes, but these schemes are still vulnerable to the presence of forward and feedback noise over the channel. In this paper, we develop a new family of non-linear feedback codes that greatly enhance robustness to channel noise. Our autoencoder-based architecture is designed to learn codes based on consecutive blocks of bits, which obtains de-noising advantages over bit-by-bit processing to help overcome the physical separation between the encoder and decoder over a noisy channel. Moreover, we develop a power control layer at the encoder to explicitly incorporate hardware constraints into the learning optimization, and prove that the resulting average power constraint is satisfied asymptotically. Numerical experiments demonstrate that our scheme outperforms state-of-the-art feedback codes by wide margins over practical forward and feedback noise regimes, and provide information-theoretic insights on the behavior of our non-linear codes. Moreover, we observe that, in a long blocklength regime, canonical error correction codes are still preferable to feedback codes when the feedback noise becomes high.

翻訳日:2023-04-27 16:15:46 公開日:2023-04-25

# 金融強化学習のための動的データセットと市場環境

Dynamic Datasets and Market Environments for Financial Reinforcement Learning ( http://arxiv.org/abs/2304.13174v1 )

ライセンス: Link先を確認

Xiao-Yang Liu, Ziyi Xia, Hongyang Yang, Jiechao Gao, Daochen Zha, Ming Zhu, Christina Dan Wang, Zhaoran Wang, Jian Guo

(参考訳) 金融市場は、動的データセットのユニークな特徴から、深層強化学習にとって特に困難な場である。金融強化学習(FinRL)エージェントを訓練するための高品質な市場環境の構築は、財務データの信号対雑音比の低さ、過去のデータの生存バイアス、モデルオーバーフィッティングなどの大きな要因により困難である。本稿では,実世界の市場からジム型の市場環境へ動的データセットを処理し,AI4Financeコミュニティによって積極的に維持されているデータ中心かつオープンアクセス可能なライブラリであるFinRL-Metaを提案する。まず、DataOpsパラダイムに従って、自動データキュレーションパイプラインを通じて、数百のマーケット環境を提供します。第二に、我々は自家製の事例を提供し、人気のある研究論文を、ユーザーが新しい取引戦略を設計するための足場として再現する。また、ライブラリをクラウドプラットフォームにデプロイすることで、ユーザは自身の結果を視覚化し、コミュニティによるコンペティションを通じて相対的なパフォーマンスを評価することができます。第3に、急速に成長するコミュニティにサービスを提供するために、カリキュラムとドキュメントウェブサイトにまとめられた数十のJupyter/Pythonデモを提供しています。データキュレーションパイプラインのオープンソースコードはhttps://github.com/AI4Finance-Foundation/FinRL-Metaで公開されている。

The financial market is a particularly challenging playground for deep reinforcement learning due to its unique feature of dynamic datasets. Building high-quality market environments for training financial reinforcement learning (FinRL) agents is difficult due to major factors such as the low signal-to-noise ratio of financial data, survivorship bias of historical data, and model overfitting. In this paper, we present FinRL-Meta, a data-centric and openly accessible library that processes dynamic datasets from real-world markets into gym-style market environments and has been actively maintained by the AI4Finance community. First, following a DataOps paradigm, we provide hundreds of market environments through an automatic data curation pipeline. Second, we provide homegrown examples and reproduce popular research papers as stepping stones for users to design new trading strategies. We also deploy the library on cloud platforms so that users can visualize their own results and assess the relative performance via community-wise competitions. Third, we provide dozens of Jupyter/Python demos organized into a curriculum and a documentation website to serve the rapidly growing community. The open-source codes for the data curation pipeline are available at https://github.com/AI4Finance-Foundation/FinRL-Meta

翻訳日:2023-04-27 16:15:24 公開日:2023-04-25

# SAFE: Shard Graphsを使った機械学習

SAFE: Machine Unlearning With Shard Graphs ( http://arxiv.org/abs/2304.13169v1 )

ライセンス: Link先を確認

Yonatan Dukler, Benjamin Bowman, Alessandro Achille, Aditya Golatkar, Ashwin Swaminathan, Stefano Soatto

(参考訳) 本稿では、訓練済みモデルから訓練サンプルの影響を除去するための期待コストを最小化しつつ、多様なデータ集合に大規模モデルを適応させる手法であるSynergy Aware Forgetting Ensemble (SAFE)を提案する。このプロセスは選択的忘却またはアンラーニングとも呼ばれ、多くの場合、データセットをシャードに分割し、それぞれで完全に独立したモデルを訓練し、得られたモデルをアンサンブルすることで行われる。シャード数を増やすと忘却の期待コストは下がるが、同時に推論コストが増加し、独立したモデル訓練の間にサンプル間の相乗的な情報が失われるため、モデルの最終的な精度が低下する。SAFEは各シャードを独立したものとして扱うのではなく、シャードグラフの概念を導入する。これにより、訓練中に他のシャードから限られた情報を取り込むことができ、忘却の期待コストのわずかな増加と引き換えに精度を大幅に向上させつつ、忘却後には残存する影響を完全に除去できる。SAFEは軽量なアダプタの仕組みを用いており、計算の大部分を再利用しながら訓練できる。これにより、SAFEは現在の最先端手法よりも1桁小さいシャードで訓練でき(したがって忘却コストを削減でき)、同時に高い精度を維持できる。このことを、細粒度(fine-grained)のコンピュータビジョンデータセットで実証的に示す。

We present Synergy Aware Forgetting Ensemble (SAFE), a method to adapt large models on a diverse collection of data while minimizing the expected cost to remove the influence of training samples from the trained model. This process, also known as selective forgetting or unlearning, is often conducted by partitioning a dataset into shards, training fully independent models on each, then ensembling the resulting models. Increasing the number of shards reduces the expected cost to forget but at the same time it increases inference cost and reduces the final accuracy of the model since synergistic information between samples is lost during the independent model training. Rather than treating each shard as independent, SAFE introduces the notion of a shard graph, which allows incorporating limited information from other shards during training, trading off a modest increase in expected forgetting cost with a significant increase in accuracy, all while still attaining complete removal of residual influence after forgetting. SAFE uses a lightweight system of adapters which can be trained while reusing most of the computations. This allows SAFE to be trained on shards an order-of-magnitude smaller than current state-of-the-art methods (thus reducing the forgetting costs) while also maintaining high accuracy, as we demonstrate empirically on fine-grained computer vision datasets.
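The trade-off between shard independence and forgetting cost can be illustrated with a toy model. The sketch below is not the SAFE implementation: the function names, the ring-shaped graph, and the cost model (every affected model is retrained from scratch on all of its source shards) are assumptions made for illustration, and SAFE itself reduces this cost further by reusing computation through adapters.

```python
from typing import Dict, Set


def models_to_retrain(shard_graph: Dict[int, Set[int]], forgotten_shard: int) -> Set[int]:
    """Return the models whose training data included `forgotten_shard`.

    shard_graph[i] is the set of shards that model i is trained on
    (hypothetical representation; it always contains shard i itself).
    """
    return {m for m, sources in shard_graph.items() if forgotten_shard in sources}


def expected_forgetting_cost(shard_graph: Dict[int, Set[int]],
                             shard_sizes: Dict[int, int]) -> float:
    """Average number of samples re-processed when one uniformly random
    training sample must be forgotten (toy cost model, an assumption)."""
    total = sum(shard_sizes.values())
    cost = 0.0
    for shard, size in shard_sizes.items():
        affected = models_to_retrain(shard_graph, shard)
        # Each affected model is retrained on all of its source shards.
        work = sum(sum(shard_sizes[s] for s in shard_graph[m]) for m in affected)
        cost += (size / total) * work
    return cost


sizes = {0: 100, 1: 100, 2: 100, 3: 100}
independent = {i: {i} for i in sizes}        # no edges: fully independent shards
ring = {i: {i, (i + 1) % 4} for i in sizes}  # each model also sees one neighbour

independent_cost = expected_forgetting_cost(independent, sizes)
ring_cost = expected_forgetting_cost(ring, sizes)
```

In this toy setting, adding edges to the graph raises the forgetting cost because more models depend on each shard, which is the quantity a shard graph has to balance against the accuracy gained from shared information.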

Translated: 2023-04-27 16:15:04 Published: 2023-04-25

# Positive definite nonparametric regression using an evolutionary algorithm with application to covariance function estimation

http://arxiv.org/abs/2304.13168v1

License: see the linked page

Myeongjong Kang

We propose a novel nonparametric regression framework subject to the positive definiteness constraint. It offers a highly modular approach for estimating covariance functions of stationary processes. Our method can impose positive definiteness, as well as isotropy and monotonicity, on the estimators, and its hyperparameters can be chosen using cross-validation. We define our estimators by taking integral transforms of kernel-based distribution surrogates. We then use the iterated density estimation evolutionary algorithm, a variant of estimation of distribution algorithms, to fit the estimators. We also extend our method to estimate covariance functions for point-referenced data. Compared to alternative approaches, our method provides more reliable estimates for long-range dependence. Several numerical studies are performed to demonstrate the efficacy and performance of our method. We also illustrate our method using precipitation data from the Spatial Interpolation Comparison 97 project.
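For readers unfamiliar with estimation of distribution algorithms, the following is a minimal sketch of the general idea: sample candidates from a probability model, keep the best ones, and refit the model to them. It is a generic univariate Gaussian variant written for illustration, not the iterated density estimation evolutionary algorithm used in the paper; the function name, parameters, and test objective are all assumptions.

```python
import random
import statistics


def gaussian_eda(objective, mean=0.0, std=5.0, pop_size=100, elite=20,
                 generations=30, seed=0):
    """Minimise `objective` over the reals with a univariate Gaussian model."""
    rng = random.Random(seed)
    for _ in range(generations):
        # 1. Sample a population from the current density estimate.
        population = [rng.gauss(mean, std) for _ in range(pop_size)]
        # 2. Select the candidates with the lowest objective value.
        population.sort(key=objective)
        best = population[:elite]
        # 3. Re-estimate the density from the selected candidates.
        mean = statistics.mean(best)
        std = max(statistics.pstdev(best), 1e-12)  # floor avoids a degenerate model
    return mean


# Hypothetical objective with its minimum at x = 3.
estimate = gaussian_eda(lambda x: (x - 3.0) ** 2)
```

The same loop applies when the candidates are parameters of an estimator and the objective is a fitting criterion; only the density model and the objective change.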

Translated: 2023-04-27 16:14:38 Published: 2023-04-25

# LEMaRT: Label-Efficient Masked Region Transform for Image Harmonization

http://arxiv.org/abs/2304.13166v1

License: see the linked page

Sheng Liu, Cong Phuoc Huynh, Cong Chen, Maxim Arap, Raffay Hamid

We present a simple yet effective self-supervised pre-training method for image harmonization which can leverage large-scale unannotated image datasets. To achieve this goal, we first generate pre-training data online with our Label-Efficient Masked Region Transform (LEMaRT) pipeline. Given an image, LEMaRT generates a foreground mask and then applies a set of transformations to perturb various visual attributes, e.g., defocus blur, contrast, and saturation, of the region specified by the generated mask. We then pre-train image harmonization models by recovering the original image from the perturbed image. Second, we introduce an image harmonization model, namely SwinIH, by retrofitting the Swin Transformer [27] with a combination of local and global self-attention mechanisms. Pre-training SwinIH with LEMaRT results in a new state of the art for image harmonization, while being label-efficient, i.e., consuming less annotated data for fine-tuning than existing methods. Notably, on the iHarmony4 dataset [8], SwinIH outperforms the state of the art, i.e., SCS-Co [16], by a margin of 0.4 dB when it is fine-tuned on only 50% of the training data, and by 1.0 dB when it is trained on the full training dataset.

Translated: 2023-04-27 16:14:21 Published: 2023-04-25

# Towards Compute-Optimal Transfer Learning

http://arxiv.org/abs/2304.13164v1

License: see the linked page

Massimo Caccia, Alexandre Galashov, Arthur Douillard, Amal Rannen-Triki, Dushyant Rao, Michela Paganini, Laurent Charlin, Marc'Aurelio Ranzato, Razvan Pascanu

The field of transfer learning is undergoing a significant shift with the introduction of large pretrained models, which have demonstrated strong adaptability to a variety of downstream tasks. However, the high computational and memory requirements to finetune or use these models can be a hindrance to their widespread use. In this study, we present a solution to this issue by proposing a simple yet effective way to trade computational efficiency for asymptotic performance, which we define as the performance a learning algorithm achieves as compute tends to infinity. Specifically, we argue that zero-shot structured pruning of pretrained models allows them to increase compute efficiency with minimal reduction in performance. We evaluate our method on the Nevis'22 continual learning benchmark, which offers a diverse set of transfer scenarios. Our results show that pruning convolutional filters of pretrained models can lead to more than 20% performance improvement in low computational regimes.
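Structured pruning removes whole convolutional filters rather than individual weights, and "zero-shot" means no retraining follows. The sketch below illustrates this with an L1-norm ranking. The abstract does not state which pruning criterion the authors use, so the criterion, the function names, and the toy weights are assumptions.

```python
def filter_l1_norms(filters):
    """L1 norm of each filter; every filter is a flat list of weights."""
    return [sum(abs(w) for w in f) for f in filters]


def prune_filters(filters, keep_ratio):
    """Keep the `keep_ratio` fraction of filters with the largest L1 norm.

    Returns the kept indices (in original order) and the kept filters.
    No fine-tuning is performed afterwards, hence "zero-shot".
    """
    n_keep = max(1, round(len(filters) * keep_ratio))
    norms = filter_l1_norms(filters)
    ranked = sorted(range(len(filters)), key=lambda i: norms[i], reverse=True)
    kept = sorted(ranked[:n_keep])
    return kept, [filters[i] for i in kept]


# Four toy filters with two weights each (hypothetical values).
filters = [[0.1, -0.1], [2.0, -1.0], [0.0, 0.05], [1.5, 0.5]]
kept, pruned = prune_filters(filters, keep_ratio=0.5)
```

Because entire filters disappear, the layer's output has fewer channels and the next layer's input shrinks accordingly, which is what turns the pruning into an actual compute saving.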

Translated: 2023-04-27 16:13:53 Published: 2023-04-25

# HDR or SDR? A Subjective and Objective Study of Scaled and Compressed Videos

http://arxiv.org/abs/2304.13162v1

License: see the linked page

Joshua P. Ebenezer, Zaixi Shang, Yixu Chen, Yongjun Wu, Hai Wei, Sriram Sethuraman, Alan C. Bovik

We conducted a large-scale study of human perceptual quality judgments of High Dynamic Range (HDR) and Standard Dynamic Range (SDR) videos subjected to scaling and compression levels and viewed on three different display devices. HDR videos are able to present wider color gamuts, better contrasts, and brighter whites and darker blacks than SDR videos. While conventional expectations are that HDR quality is better than SDR quality, we have found that subject preference of HDR versus SDR depends heavily on the display device, as well as on resolution scaling and bitrate. To study this question, we collected more than 23,000 quality ratings from 67 volunteers who watched 356 videos on OLED, QLED, and LCD televisions. Since it is of interest to be able to measure the quality of videos under these scenarios, e.g., to inform decisions regarding scaling, compression, and SDR vs. HDR, we tested several well-known full-reference and no-reference video quality models on the new database. Towards advancing progress on this problem, we also developed a novel no-reference model called HDRPatchMAX that uses both classical and bit-depth sensitive distortion statistics more accurately than existing metrics.

Translated: 2023-04-27 16:13:37 Published: 2023-04-25

# Continuous symmetry breaking in adaptive quantum dynamics

http://arxiv.org/abs/2304.13198v1

License: see the linked page

Jacob Hauser, Yaodong Li, Sagar Vijay, Matthew P. A. Fisher

Adaptive quantum circuits, in which unitary operations, measurements, and feedback are used to steer quantum many-body systems, provide an exciting opportunity to generate new dynamical steady states. We introduce an adaptive quantum dynamics with continuous symmetry where unitary operations, measurements, and local unitary feedback are used to drive ordering. In this setting, we find a pure steady state hosting symmetry-breaking order, which is the ground state of a gapless, local Hamiltonian. We explore the dynamical properties of the approach to this steady state. We find that this steady-state order is fragile to perturbations, even those that respect the continuous symmetry.

Translated: 2023-04-27 16:05:43 Published: 2023-04-25

# Connector 0.5: A unified framework for graph representation learning

http://arxiv.org/abs/2304.13195v1

License: see the linked page

Thanh Sang Nguyen, Jooho Lee, Van Thuy Hoang, O-Joun Lee

Graph representation learning models aim to represent the graph structure and its features as low-dimensional vectors in a latent space, which can benefit various downstream tasks, such as node classification and link prediction. Due to its powerful graph data modelling capabilities, various graph embedding models and libraries have been proposed to learn embeddings and help researchers conduct experiments more easily. In this paper, we introduce a novel graph representation framework covering various graph embedding models, ranging from shallow to state-of-the-art models, namely Connector. First, we consider graph generation by constructing various types of graphs with different structural relations, including homogeneous, signed, heterogeneous, and knowledge graphs. Second, we introduce various graph representation learning models, ranging from shallow to deep graph embedding models. Finally, we plan to build an efficient open-source framework that can provide deep graph embedding models to represent structural relations in graphs. The framework is available at https://github.com/NSLab-CUK/Connector.

Translated: 2023-04-27 16:05:31 Published: 2023-04-25

# Towards Reliable Colorectal Cancer Polyps Classification via Vision Based Tactile Sensing and Confidence-Calibrated Neural Networks

http://arxiv.org/abs/2304.13192v1

License: see the linked page

Siddhartha Kapuria, Tarunraj G. Mohanraj, Nethra Venkatayogi, Ozdemir Can Kara, Yuki Hirata, Patrick Minot, Ariel Kapusta, Naruhiko Ikoma, and Farshid Alambeigi

In this study, toward addressing the over-confident outputs of existing artificial intelligence-based colorectal cancer (CRC) polyp classification techniques, we propose a confidence-calibrated residual neural network. Utilizing a novel vision-based tactile sensing (VS-TS) system and unique CRC polyp phantoms, we demonstrate that traditional metrics such as accuracy and precision are not sufficient to encapsulate model performance for handling a sensitive CRC polyp diagnosis. To this end, we develop a residual neural network classifier and address its over-confident outputs for CRC polyp classification via the post-processing method of temperature scaling. To evaluate the proposed method, we introduce noise and blur to the textural images obtained from the VS-TS and test the model's reliability for non-ideal inputs through reliability diagrams and other statistical metrics.
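Temperature scaling divides a classifier's logits by a single scalar T before the softmax; T is chosen on held-out data, typically by minimizing the negative log-likelihood. The sketch below shows the mechanism on made-up logits. The grid search, function names, and validation data are assumptions for illustration and are not taken from the paper.

```python
import math


def softmax(logits, temperature=1.0):
    """Softmax of logits / temperature, computed in a numerically stable way."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def nll(logit_rows, labels, temperature):
    """Mean negative log-likelihood of the true labels at a given temperature."""
    losses = [-math.log(softmax(row, temperature)[y])
              for row, y in zip(logit_rows, labels)]
    return sum(losses) / len(losses)


def fit_temperature(logit_rows, labels, grid):
    """Pick the temperature from `grid` with the lowest validation NLL."""
    return min(grid, key=lambda t: nll(logit_rows, labels, t))


# Hypothetical validation set: very confident logits, but only 3 of 4 correct.
val_logits = [[5.0, 0.0], [5.0, 0.0], [5.0, 0.0], [0.0, 5.0]]
val_labels = [0, 0, 1, 1]
best_t = fit_temperature(val_logits, val_labels, grid=[0.5, 1.0, 2.0, 4.0, 8.0])

raw = softmax(val_logits[0])
calibrated = softmax(val_logits[0], best_t)
```

A temperature above 1 lowers the reported confidence without changing the predicted class, so accuracy is unaffected while the confidence moves closer to the observed error rate.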

Translated: 2023-04-27 16:05:15 Published: 2023-04-25

# Towards Explainable and Safe Conversational Agents for Mental Health: A Survey

http://arxiv.org/abs/2304.13191v1

License: see the linked page

Surjodeep Sarkar, Manas Gaur, L. Chen, Muskan Garg, Biplav Srivastava, Bhaktee Dongaonkar

Virtual Mental Health Assistants (VMHAs) are seeing continual advancements to support the overburdened global healthcare system, which receives 60 million primary care visits and 6 million Emergency Room (ER) visits annually. These systems are built by clinical psychologists, psychiatrists, and Artificial Intelligence (AI) researchers for Cognitive Behavioral Therapy (CBT). At present, the role of VMHAs is to provide emotional support through information, focusing less on developing a reflective conversation with the patient. A more comprehensive, safe, and explainable approach is required to build responsible VMHAs that ask follow-up questions or provide a well-informed response. This survey offers a systematic critical review of the existing conversational agents in mental health, followed by new insights into the improvement of VMHAs with contextual knowledge, datasets, and their emerging role in clinical decision support. We also provide new directions toward enriching the user experience of VMHAs with explainability, safety, and wholesome trustworthiness. Finally, we provide evaluation metrics and practical considerations for VMHAs beyond the current literature to build trust between VMHAs and patients in active communications.

Translated: 2023-04-27 16:04:58 Published: 2023-04-25

# A superradiant two-level laser with intrinsic light force generated gain

http://arxiv.org/abs/2304.13190v1

License: see the linked page

Anna Bychek, Helmut Ritsch

The implementation of a superradiant laser as an active frequency standard is predicted to provide better short-term stability and robustness to thermal and mechanical fluctuations when compared to standard passive optical clocks. However, despite significant recent progress, the experimental realization of continuous-wave superradiant lasing still remains an open challenge, as it requires continuous loading, cooling, and pumping of active atoms within an optical resonator. Here we propose a new scenario for creating continuous gain by using optical forces acting on the states of a two-level atom via bichromatic coherent pumping of a cold atomic gas trapped inside a single-mode cavity. Analogous to atomic maser setups, tailored state-dependent forces are used to gather and concentrate excited-state atoms in regions of strong atom-cavity coupling while ground-state atoms are repelled. To facilitate numerical simulations of a sufficiently large atomic ensemble, we rely on a second-order cumulant expansion and describe the atomic motion in a semi-classical point-particle approximation subject to position-dependent light shifts which induce optical gradient forces along the cavity axis. We study minimal conditions on pump laser intensities and detunings required for collective superradiant emission. Balancing Doppler cooling and gain-induced heating, we identify a parameter regime of continuous narrow-band laser operation close to the bare atomic frequency.

Translated: 2023-04-27 16:04:38 Published: 2023-04-25

# Onboard Science Instrument Autonomy for the Detection of Microscopy Biosignatures on the Ocean Worlds Life Surveyor

http://arxiv.org/abs/2304.13189v1

License: see the linked page

Mark Wronkiewicz, Jake Lee, Lukas Mandrake, Jack Lightholder, Gary Doran, Steffen Mauceri, Taewoo Kim, Nathan Oborny, Thomas Schibler, Jay Nadeau, James K. Wallace, Eshaan Moorjani, Chris Lindensmith

The quest to find extraterrestrial life is a critical scientific endeavor with civilization-level implications. Icy moons in our solar system are promising targets for exploration because their liquid oceans make them potential habitats for microscopic life. However, the lack of a precise definition of life poses a fundamental challenge to formulating detection strategies. To increase the chances of unambiguous detection, a suite of complementary instruments must sample multiple independent biosignatures (e.g., composition, motility/behavior, and visible structure). Such an instrument suite could generate 10,000x more raw data than is possible to transmit from distant ocean worlds like Enceladus or Europa. To address this bandwidth limitation, Onboard Science Instrument Autonomy (OSIA) is an emerging discipline of flight systems capable of evaluating, summarizing, and prioritizing observational instrument data to maximize science return. We describe two OSIA implementations developed as part of the Ocean Worlds Life Surveyor (OWLS) prototype instrument suite at the Jet Propulsion Laboratory. The first identifies life-like motion in digital holographic microscopy videos, and the second identifies cellular structure and composition via innate and dye-induced fluorescence. Flight-like requirements and computational constraints, similar to those available on the Mars helicopter "Ingenuity", were used to lower barriers to infusion. We evaluated the OSIA's performance using simulated and laboratory data and conducted a live field test at the hypersaline Mono Lake planetary analog site. Our study demonstrates the potential of OSIA for enabling biosignature detection and provides insights and lessons learned for future mission concepts aimed at exploring the outer solar system.

Translated: 2023-04-27 16:04:17 Published: 2023-04-25

# TABLET: Learning From Instructions For Tabular Data

http://arxiv.org/abs/2304.13188v1

License: see the linked page

Dylan Slack and Sameer Singh

Acquiring high-quality data is often a significant challenge in training machine learning (ML) models for tabular prediction, particularly in privacy-sensitive and costly domains like medicine and finance. Providing natural language instructions to large language models (LLMs) offers an alternative solution. However, it is unclear how effectively instructions leverage the knowledge in LLMs for solving tabular prediction problems. To address this gap, we introduce TABLET, a benchmark of 20 diverse tabular datasets annotated with instructions that vary in their phrasing, granularity, and technicality. Additionally, TABLET includes the instructions' logic and structured modifications to the instructions. We find that in-context instructions increase zero-shot F1 performance for Flan-T5 11b by 44% on average and for ChatGPT by 13% on TABLET. We also explore the limitations of using LLMs for tabular prediction in our benchmark by evaluating instruction faithfulness. We find that LLMs often ignore instructions and fail to predict specific instances correctly, even with examples. Our analysis on TABLET shows that, while instructions help LLM performance, learning from instructions for tabular data requires new capabilities.

Translated: 2023-04-27 16:03:47 Published: 2023-04-25

# AI-assisted coding: Experiments with GPT-4

http://arxiv.org/abs/2304.13187v1

License: see the linked page

Russell A Poldrack, Thomas Lu, and Gašper Beguš

Artificial intelligence (AI) tools based on large language models have achieved human-level performance on some computer programming tasks. We report several experiments using GPT-4 to generate computer code. These experiments demonstrate that AI code generation using the current generation of tools, while powerful, requires substantial human validation to ensure accurate performance. We also demonstrate that GPT-4 refactoring of existing code can significantly improve that code along several established metrics for code quality, and we show that GPT-4 can generate tests with substantial coverage, but that many of the tests fail when applied to the associated code. These findings suggest that while AI coding tools are very powerful, they still require humans in the loop to ensure validity and accuracy of the results.

Translated: 2023-04-27 16:03:26 Published: 2023-04-25

# KBody: Towards general, robust, and aligned monocular whole-body estimation

http://arxiv.org/abs/2304.11542v2

License: see the linked page

Nikolaos Zioulis and James F. O'Brien

KBody is a method for fitting a low-dimensional body model to an image. It follows a predict-and-optimize approach, relying on data-driven model estimates for the constraints that will be used to solve for the body's parameters. Acknowledging the importance of high-quality correspondences, it leverages "virtual joints" to improve fitting performance, disentangles the optimization between the pose and shape parameters, and integrates asymmetric distance fields to strike a balance in terms of pose and shape capturing capacity, as well as pixel alignment. We also show that generative model inversion offers a strong appearance prior that can be used to complete partial human images and as a building block for generalized and robust monocular body fitting. Project page: https://klothed.github.io/KBody

Translated: 2023-04-27 10:53:10 Published: 2023-04-25

# Single-photon cooling in microwave magneto-mechanics

http://arxiv.org/abs/1912.05489v2

License: see the linked page

D. Zoepfl, M. L. Juan, C. M. F. Schneider, G. Kirchmair

Cavity optomechanics, where photons are coupled to mechanical motion, provides the tools to control mechanical motion near the fundamental quantum limits. Reaching single-photon strong coupling would make it possible to prepare the mechanical resonator in non-Gaussian quantum states. Preparing massive mechanical resonators in such states is of particular interest for testing the boundaries of quantum mechanics. This goal, however, remains challenging due to the small optomechanical couplings usually achieved with massive devices. Here we demonstrate a novel approach where a mechanical resonator is magnetically coupled to a microwave cavity. We measure a single-photon coupling of $g_0/2 \pi \sim 3$ kHz, an improvement of one order of magnitude over current microwave optomechanical systems. At this coupling we measure a large single-photon cooperativity with $C_0 \gtrsim 10$, an important step toward reaching single-photon strong coupling. Such a strong interaction allows us to cool the massive mechanical resonator to a third of its steady-state phonon population with less than two photons in the microwave cavity. Beyond tests of quantum foundations, our approach is also well suited as a quantum sensor or a microwave-to-optical transducer.

Translated: 2023-04-27 04:19:05 Published: 2023-04-25

# Tutorial on the Quantikz Package

http://arxiv.org/abs/1809.03842v6

License: see the linked page

Alastair Kay

This tutorial introduces (and provides, via the document source) the Quantikz LaTeX package for typesetting quantum circuit diagrams. This takes advantage of tikz to give greater control over the circuit options. Those familiar with the excellent QCircuit package will recognise much of the notation, although it has evolved a bit (hopefully simplified!).

Translated: 2023-04-27 04:18:46 Published: 2023-04-25

# A Practical AoI Scheduler in IoT Networks with Relays

http://arxiv.org/abs/2203.04227v3

License: see the linked page

Biplav Choudhury, Prasenjit Karmakar, Vijay K. Shah, Jeffrey H. Reed

Internet of Things (IoT) networks have become ubiquitous as autonomous computing, communication, and collaboration among devices become popular for accomplishing various tasks. The use of relays further makes it convenient to deploy IoT networks, as relays provide a host of benefits, like increasing the communication range and minimizing power consumption. Existing literature on traditional Age of Information (AoI) schedulers for such two-hop relayed IoT networks is limited because the schedulers are designed assuming constant (non-changing) channel conditions and known (usually generate-at-will) packet generation patterns. Deep reinforcement learning (DRL) algorithms have been investigated for AoI scheduling in two-hop IoT networks with relays; however, they are only applicable to small-scale IoT networks due to the exponential rise in the action space as the networks become large. These limitations discourage the practical utilization of AoI schedulers for IoT network deployments. This paper presents a practical AoI scheduler for two-hop IoT networks with relays that addresses the above limitations. The proposed scheduler utilizes a novel voting-mechanism-based proximal policy optimization (v-PPO) algorithm that maintains a linear action space, enabling it to scale well with larger IoT networks. The proposed v-PPO based AoI scheduler adapts well to changing network conditions and accounts for unknown traffic generation patterns, making it practical for real-world IoT deployments. Simulation results show that the proposed v-PPO based AoI scheduler outperforms both ML and traditional (non-ML) AoI schedulers, such as the Deep Q Network (DQN)-based AoI scheduler, Maximal Age First-Maximal Age Difference (MAF-MAD), Maximal Age First (MAF), and round-robin, in all considered practical scenarios.
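To make the baseline concrete, the sketch below simulates the Maximal Age First (MAF) rule, which in each slot serves the source whose information is currently the oldest. It is a deliberately simplified single-hop model and not the paper's v-PPO scheduler: the assumptions (one transmission per slot, a perfectly reliable channel, a fresh packet always available, no relays) and all names are illustrative.

```python
def simulate_maf(num_sources, num_slots):
    """Simulate Maximal Age First scheduling and return (schedule, mean age).

    Each source's age grows by one per slot; the scheduled source delivers
    a fresh update and its age resets to 1 (toy model, see assumptions).
    """
    ages = [1] * num_sources
    total_age = 0
    schedule = []
    for _ in range(num_slots):
        # Serve the source with the largest age; ties go to the lowest index.
        chosen = max(range(num_sources), key=lambda i: ages[i])
        schedule.append(chosen)
        ages = [1 if i == chosen else a + 1 for i, a in enumerate(ages)]
        total_age += sum(ages)
    return schedule, total_age / (num_slots * num_sources)


schedule, mean_age = simulate_maf(num_sources=3, num_slots=6)
```

Under these idealized assumptions MAF reduces to round-robin; the two differ once transmissions can fail or packets arrive irregularly, which is the regime the learned scheduler targets.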

Translated: 2023-04-27 04:16:12 Published: 2023-04-25

# Cavity-induced bifurcation in classical rate theory

http://arxiv.org/abs/2202.12182v3

License: see the linked page

Kalle S. U. Kansanen and Tero T. Heikkilä

We show how coupling an ensemble of bistable systems to a common cavity field affects the collective stochastic behavior of this ensemble. In particular, the cavity provides an effective interaction between the systems, and parametrically modifies the transition rates between the metastable states. We predict that the cavity induces a collective phase transition at a critical temperature which depends linearly on the number of systems. It shows up as a spontaneous symmetry breaking where the stationary states of the bistable system bifurcate. We observe that the transition rates slow down independently of the phase transition, but the rate modification vanishes for alternating signs of the system-cavity couplings, corresponding to a disordered ensemble of dipoles. Our results are of particular relevance in polaritonic chemistry, where the presence of a cavity has been suggested to affect chemical reactions.

翻訳日:2023-04-27 04:15:34 公開日:2023-04-25

# 形式理論学習システムにおける単純気泡問題

A Simplicity Bubble Problem in Formal-Theoretic Learning Systems ( http://arxiv.org/abs/2112.12275v2 )

ライセンス: Link先を確認

Felipe S. Abrahão, Hector Zenil, Fabio Porto, Michael Winter, Klaus Wehmuth, Itala M. L. D'Ottaviano

(参考訳) 新しいデータを予測するために大規模なデータセットをマイニングする場合、統計機械学習の背後にある原則の限界は、ビッグデータの崩壊だけでなく、データ生成プロセスがアルゴリズムの複雑さの低さに偏っているという従来の仮定にも深刻な課題をもたらす。有限データセット生成器における単純さに対するアルゴリズム的情報バイアスを仮定しても、機械学習に対する現在のアプローチ(ディープラーニングや、トップダウンaiと統計的機械学習のあらゆる形式的ハイブリッドを含む)は、十分大きなデータセットによって、常に、自然に、あるいは人工的に、欺くことができる。特に、全ての学習アルゴリズム(形式理論にアクセスできるか否かに関わらず)に対して、予測不可能な十進法のアルゴリズム確率が、他の大きなデータセットのアルゴリズム確率の上限(学習アルゴリズムにのみ依存する乗算定数まで)であるような十分に大きなデータセットサイズが存在することを実証する。言い換えれば、非常に大きく複雑なデータセットは、学習アルゴリズムを他の特定の非知覚データセットと同様に「simplicity bubble」に認識することができる。これらの決定データセットは、学習アルゴリズムによって影響される予測が、学習アルゴリズムによってグローバルなものと見なされるにもかかわらず、低アルゴリズム-複雑度局所最適解に向かって収束しながら、高アルゴリズム-複雑度グローバルな最適解から予測不可能に分岐することを保証する。アルゴリズム情報理論と計算可能性理論の持つ本質的な力に基づく、より強力な機械学習へと、統計的な機械学習から脱却し、この偽りの現象を回避するために、満たすべき枠組みと追加の経験的条件について議論する。

When mining large datasets in order to predict new data, limitations of the principles behind statistical machine learning pose a serious challenge not only to the Big Data deluge, but also to the traditional assumptions that data generating processes are biased toward low algorithmic complexity. Even when one assumes an underlying algorithmic-informational bias toward simplicity in finite dataset generators, we show that current approaches to machine learning (including deep learning, or any formal-theoretic hybrid mix of top-down AI and statistical machine learning approaches), can always be deceived, naturally or artificially, by sufficiently large datasets. In particular, we demonstrate that, for every learning algorithm (with or without access to a formal theory), there is a sufficiently large dataset size above which the algorithmic probability of an unpredictable deceiver is an upper bound (up to a multiplicative constant that only depends on the learning algorithm) for the algorithmic probability of any other larger dataset. In other words, very large and complex datasets can deceive learning algorithms into a "simplicity bubble" as likely as any other particular non-deceiving dataset. These deceiving datasets guarantee that any prediction effected by the learning algorithm will unpredictably diverge from the high-algorithmic-complexity globally optimal solution while converging toward the low-algorithmic-complexity locally optimal solution, although the latter is deemed a global one by the learning algorithm. We discuss the framework and additional empirical conditions to be met in order to circumvent this deceptive phenomenon, moving away from statistical machine learning towards a stronger type of machine learning based on, and motivated by, the intrinsic power of algorithmic information theory and computability theory.

翻訳日:2023-04-27 04:15:21 公開日:2023-04-25

# データとデバイスの不均一性を考慮した半分散フェデレーションエッジ学習

Semi-Decentralized Federated Edge Learning with Data and Device Heterogeneity ( http://arxiv.org/abs/2112.10313v3 )

ライセンス: Link先を確認

Yuchang Sun and Jiawei Shao and Yuyi Mao and Jessie Hui Wang and Jun Zhang

(参考訳) feel(federated edge learning)は、ネットワークエッジに分散データを効果的に組み込んでディープラーニングモデルをトレーニングするための、プライバシ保護パラダイムとして注目されている。それでも、単一エッジサーバのカバー範囲が限られると、未参加のクライアントノードが不足し、学習性能が損なわれる可能性がある。本稿では,複数のエッジサーバを用いて多数のクライアントノードを協調的に調整する,半分散型フェデレーションエッジ学習(SD-FEEL)の新たなフレームワークについて検討する。効率的なモデル共有のためにエッジサーバ間の低レイテンシ通信を利用することで、SD-FEELは従来のフェデレート学習に比べてはるかにレイテンシの低いトレーニングデータを組み込むことができる。 SD-FEELのトレーニングアルゴリズムについて,ローカルモデル更新,クラスタ内モデルアグリゲーション,クラスタ間モデルアグリゲーションの3つのステップで詳述する。このアルゴリズムの収束は、非独立かつ同一分散(非iid)データで証明され、鍵パラメータがトレーニング効率に与える影響を明らかにし、実用的な設計ガイドラインを提供するのに役立つ。一方、エッジデバイスの不均一性はストラグラー効果を引き起こし、SD-FEELの収束速度を低下させる可能性がある。そこで本研究では,SD-FEELの安定化を意識したアグリゲーションスキームを用いた非同期トレーニングアルゴリズムを提案する。シミュレーションの結果,SD-FEELのための提案アルゴリズムの有効性と効率を実証し,解析結果を裏付ける。

Federated edge learning (FEEL) has attracted much attention as a privacy-preserving paradigm to effectively incorporate the distributed data at the network edge for training deep learning models. Nevertheless, the limited coverage of a single edge server results in an insufficient number of participating client nodes, which may impair the learning performance. In this paper, we investigate a novel framework of FEEL, namely semi-decentralized federated edge learning (SD-FEEL), where multiple edge servers are employed to collectively coordinate a large number of client nodes. By exploiting the low-latency communication among edge servers for efficient model sharing, SD-FEEL can incorporate more training data while enjoying much lower latency than conventional federated learning. We detail the training algorithm for SD-FEEL with three main steps: local model update, intra-cluster model aggregation, and inter-cluster model aggregation. The convergence of this algorithm is proved on non-independent and identically distributed (non-IID) data, which also helps to reveal the effects of key parameters on the training efficiency and provides practical design guidelines. Meanwhile, the heterogeneity of edge devices may cause the straggler effect and deteriorate the convergence speed of SD-FEEL. To resolve this issue, we propose an asynchronous training algorithm with a staleness-aware aggregation scheme for SD-FEEL, whose convergence performance is also analyzed. The simulation results demonstrate the effectiveness and efficiency of the proposed algorithms for SD-FEEL and corroborate our analysis.
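The three steps named in the abstract (local model update, intra-cluster aggregation, inter-cluster aggregation) can be sketched roughly as follows. This is only an illustration under strong assumptions that are not taken from the paper: the model is a single scalar weight fitted by least squares, each client takes one gradient step per round, and all edge servers are assumed to be directly connected, so the inter-cluster step collapses to one weighted average. The names `local_update`, `weighted_average`, and `sd_feel_round` are hypothetical.

```python
def local_update(w, data, lr):
    """Step 1: one gradient step of the loss mean((w*x - y)^2) on a client's data."""
    grad = sum(2.0 * (w * x - y) * x for x, y in data) / len(data)
    return w - lr * grad


def weighted_average(models, weights):
    """Average scalar models, weighting each by its number of samples."""
    return sum(m * k for m, k in zip(models, weights)) / sum(weights)


def sd_feel_round(server_models, clusters, lr=0.05):
    """One synchronous round; clusters[i] lists the datasets of server i's clients."""
    aggregated = []
    for w_server, clients in zip(server_models, clusters):
        # Step 1: every client starts from its edge server's model.
        local_models = [local_update(w_server, data, lr) for data in clients]
        # Step 2: intra-cluster aggregation at the edge server.
        aggregated.append(weighted_average(local_models, [len(d) for d in clients]))
    # Step 3: inter-cluster aggregation (assumption: fully connected servers).
    cluster_sizes = [sum(len(d) for d in clients) for clients in clusters]
    shared = weighted_average(aggregated, cluster_sizes)
    return [shared] * len(server_models)


# Toy data generated by y = 3x, split over two edge servers with three clients.
clusters = [
    [[(1.0, 3.0), (2.0, 6.0)], [(3.0, 9.0)]],
    [[(0.5, 1.5), (1.5, 4.5)]],
]
models = [0.0, 0.0]
for _ in range(100):
    models = sd_feel_round(models, clusters)
print(models)
```

On this toy problem both server models approach the generating slope of 3. The asynchronous, staleness-aware variant described in the abstract is not covered by this sketch.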

翻訳日:2023-04-27 04:14:52 公開日:2023-04-25

# HDR-NeRF:高ダイナミックレンジニューラル放射場

HDR-NeRF: High Dynamic Range Neural Radiance Fields ( http://arxiv.org/abs/2111.14451v4 )

ライセンス: Link先を確認

Xin Huang, Qi Zhang, Ying Feng, Hongdong Li, Xuan Wang, Qing Wang

(参考訳) 我々は、低ダイナミックレンジ(LDR)ビューのセットからHDR放射界を異なる露出で復元するために、HDR-NeRF(High Dynamic Range Neural Radiance Fields)を提案する。 HDR-NeRFを用いて、異なる露出下で、新しいHDRビューと新しいLDRビューの両方を生成することができる。この方法の鍵は物理イメージングの過程をモデル化することであり、シーンポイントの放射能が2つの暗黙的な機能を持つldr画像の画素値(放射能場とトーンマッパー)に変換されることを示す。放射場はシーンラディアンス(値が0から$+\infty$)を符号化し、対応する光の起源と光方向を与えることにより、光の密度と放射を出力する。トーンマッパーは、カメラセンサに照射された光が画素値になるマッピング過程をモデル化する。放射光と対応する露光時間とをトーンマッパーに供給することにより、光の色を予測する。我々は、古典的なボリュームレンダリング技術を用いて出力放射率、色、密度をHDRおよびLDR画像に投影し、入力されたLDR画像のみを監督する。提案手法を評価するために,新しい前方向きHDRデータセットを収集する。合成および実世界のシーンにおける実験結果は, 合成ビューの露光を正確に制御できるだけでなく, ダイナミックレンジの描画も可能であることを確認した。

We present High Dynamic Range Neural Radiance Fields (HDR-NeRF) to recover an HDR radiance field from a set of low dynamic range (LDR) views with different exposures. Using the HDR-NeRF, we are able to generate both novel HDR views and novel LDR views under different exposures. The key to our method is to model the physical imaging process, which dictates that the radiance of a scene point transforms to a pixel value in the LDR image with two implicit functions: a radiance field and a tone mapper. The radiance field encodes the scene radiance (values vary from 0 to $+\infty$) and outputs the density and radiance of a ray given the corresponding ray origin and ray direction. The tone mapper models the mapping process by which a ray hitting the camera sensor becomes a pixel value. The color of the ray is predicted by feeding the radiance and the corresponding exposure time into the tone mapper. We use the classic volume rendering technique to project the output radiance, colors, and densities into HDR and LDR images, while only the input LDR images are used as the supervision. We collect a new forward-facing HDR dataset to evaluate the proposed method. Experimental results on synthetic and real-world scenes validate that our method can not only accurately control the exposures of synthesized views but also render views with a high dynamic range.
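The role of the tone mapper can be illustrated with a small sketch. In HDR-NeRF the tone mapper is a learned implicit function; the fixed clip-and-gamma curve below is only a hypothetical stand-in chosen for illustration, and the function name `tone_map` and the `gamma` parameter are assumptions, not part of the paper.

```python
def tone_map(radiance, exposure_time, gamma=2.2):
    """Map unbounded scene radiance to an LDR pixel value in [0, 1].

    Hypothetical camera response: sensor exposure = radiance * exposure_time,
    clipped to the sensor range and then gamma-encoded.
    """
    exposure = radiance * exposure_time
    clipped = min(max(exposure, 0.0), 1.0)
    return clipped ** (1.0 / gamma)


# The same scene radiance rendered at three exposure times.
for dt in (0.25, 1.0, 2.0):
    print(dt, tone_map(0.5, dt))
```

Longer exposure times brighten the pixel until it saturates at 1, which is why a set of LDR views with different exposures carries information about radiance values outside the range of any single view.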

翻訳日:2023-04-27 04:14:26 公開日:2023-04-25

# HUMAP:階層的一様多様体近似と投影

HUMAP: Hierarchical Uniform Manifold Approximation and Projection ( http://arxiv.org/abs/2106.07718v3 )

ライセンス: Link先を確認

Wilson E. Marcílio-Jr and Danilo M. Eler and Fernando V. Paulovich and Rafael M. Martins

(参考訳) 次元減少(DR)技術は、高次元空間におけるパターンの理解を支援する。これらの手法は、しばしば散乱プロットによって表現され、様々な科学領域で採用され、クラスターとデータサンプル間の類似性分析を容易にする。多くの粒度を含むデータセットや、分析が情報視覚化マントラに従う場合、階層的なdrテクニックは、前もって主要な構造と需要の詳細を示すので、最も適したアプローチである。しかし、現在の階層型DR技術は、階層レベルのプロジェクションメンタルマップを保存せず、ほとんどのデータタイプに適さないため、文学的な問題に完全に対処することができない。 HUMAPは、局所的・大域的構造と階層的探索を通してのメンタルマップの保存に柔軟に設計された、新しい階層的次元削減技術である。本手法の優位性を示す実証的な証拠を,現在の階層的アプローチと比較し,その強みを示す2つのケーススタディを示す。

Dimensionality reduction (DR) techniques help analysts understand patterns in high-dimensional spaces. These techniques, often represented by scatter plots, are employed in diverse science domains and facilitate similarity analysis among clusters and data samples. For datasets containing many granularities or when analysis follows the information visualization mantra, hierarchical DR techniques are the most suitable approach since they present major structures beforehand and details on demand. However, current hierarchical DR techniques are not fully capable of addressing literature problems because they do not preserve the projection mental map across hierarchical levels or are not suitable for most data types. This work presents HUMAP, a novel hierarchical dimensionality reduction technique designed to be flexible in preserving local and global structures and the mental map throughout hierarchical exploration. We provide empirical evidence of our technique's superiority compared with current hierarchical approaches and show two case studies to demonstrate its strengths.

翻訳日:2023-04-27 04:13:40 公開日:2023-04-25

# ネットワークにおける階層的コミュニティ構造

Hierarchical community structure in networks ( http://arxiv.org/abs/2009.07196v2 )

ライセンス: Link先を確認

Michael T. Schaub, Jiaze Li and Leto Peel

(参考訳) モジュラーおよび階層的なコミュニティ構造は、現実世界の複雑なシステムに広く浸透している。これらの構造を検知し研究するために、多くの努力が費やされた。モジュラーの検出における重要な理論的進歩は、確率的生成モデルを用いてコミュニティ構造を形式的に定義することで検出可能性の基本的な限界を特定することである。階層型コミュニティ構造の検出は、コミュニティ検出から受け継いだものと並行して、さらなる課題をもたらす。本稿では,ネットワークにおける階層的コミュニティ構造に関する理論的研究について述べる。我々は以下の疑問に答える。 1)コミュニティの階層をどのように定義すべきか。 2)ネットワークに階層構造の十分な証拠があるかどうかをどうやって判断するか。そして 3)階層構造を効率的に検出する方法確率的外的同値分割の概念と確率的ブロックモデルのような確率的モデルとの関係に基づいて階層構造の定義を導入することにより,これらの疑問にアプローチする。階層構造の検出に関わる課題を列挙し,階層構造のスペクトル特性を調べることにより,効率的かつ原理的に検出する手法を提案する。

Modular and hierarchical community structures are pervasive in real-world complex systems. A great deal of effort has gone into trying to detect and study these structures. Important theoretical advances in the detection of modular structure have included identifying fundamental limits of detectability by formally defining community structure using probabilistic generative models. Detecting hierarchical community structure introduces additional challenges alongside those inherited from community detection. Here we present a theoretical study on hierarchical community structure in networks, which has thus far not received the same rigorous attention. We address the following questions: 1) How should we define a hierarchy of communities? 2) How do we determine if there is sufficient evidence of a hierarchical structure in a network? and 3) How can we detect hierarchical structure efficiently? We approach these questions by introducing a definition of hierarchy based on the concept of stochastic externally equitable partitions and their relation to probabilistic models, such as the popular stochastic block model. We enumerate the challenges involved in detecting hierarchies and, by studying the spectral properties of hierarchical structure, present an efficient and principled method for detecting them.
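To make the notion of an externally equitable partition concrete, here is a minimal sketch of the deterministic version of the definition: a partition is externally equitable if, for every pair of distinct groups, all nodes of one group have the same number of neighbours in the other. The stochastic variant used in the paper relaxes this to hold in expectation and is not implemented here; the function name and the toy graph are hypothetical.

```python
def is_externally_equitable(adj, groups):
    """Check the deterministic externally-equitable condition.

    adj: dict mapping each node to the set of its neighbours (undirected graph).
    groups: list of disjoint sets of nodes covering the graph.
    """
    for i, group_i in enumerate(groups):
        for j, group_j in enumerate(groups):
            if i == j:
                continue  # links inside a group are unconstrained
            # Number of neighbours in group j, for every node of group i.
            counts = {len(adj[v] & group_j) for v in group_i}
            if len(counts) > 1:
                return False
    return True


# Toy graph with edges 0-1, 0-2, 1-3.
adj = {0: {1, 2}, 1: {0, 3}, 2: {0}, 3: {1}}
print(is_externally_equitable(adj, [{0, 1}, {2, 3}]))
print(is_externally_equitable(adj, [{0, 2}, {1, 3}]))
```

In the first partition every node has exactly one neighbour in the other group, so the condition holds; in the second, node 0 has one neighbour in the other group while node 2 has none, so it fails.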

翻訳日:2023-04-27 04:12:38 公開日:2023-04-25

# 浮動小数点算術における音のランダム化

Sound Randomized Smoothing in Floating-Point Arithmetics ( http://arxiv.org/abs/2207.07209v2 )

ライセンス: Link先を確認

Václav Voráček and Matthias Hein

(参考訳) ランダム化平滑化は無限の精度で音を出す。しかし,無作為な平滑化は浮動小数点精度の限界に対してもはや健全ではないことを示す。 CIFAR10 の偽証明を提供するために、逆例が $0.8$ の距離にあるにもかかわらず、ランダム化された平滑化が 1 点あたり $1.26$ の半径を示す単純な例を示す。ランダム化平滑化の暗黙の仮定について議論し、平滑化バージョンが一般的に認証されている汎用画像分類モデルには適用されないことを示した。そこで本研究では,浮動小数点精度を本質的に同等の速度で使用する場合のランダム化平滑化のための音響的手法を提案する。唯一の前提は、公正なコインにアクセスできるということです。

Randomized smoothing is sound when using infinite precision. However, we show that randomized smoothing is no longer sound for limited floating-point precision. We present a simple example where randomized smoothing certifies a radius of $1.26$ around a point, even though there is an adversarial example in the distance $0.8$ and extend this example further to provide false certificates for CIFAR10. We discuss the implicit assumptions of randomized smoothing and show that they do not apply to generic image classification models whose smoothed versions are commonly certified. In order to overcome this problem, we propose a sound approach to randomized smoothing when using floating-point precision with essentially equal speed and matching the certificates of the standard, unsound practice for standard classifiers tested so far. Our only assumption is that we have access to a fair coin.
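For context, the certificate that randomized smoothing usually reports can be sketched as below. The formula is the standard Gaussian-smoothing bound of Cohen et al. (2019) in its binary form, radius $= \sigma\,\Phi^{-1}(p_A)$, and is an assumption about which certificate is meant, since the abstract does not state it. The last lines show a generic floating-point rounding effect; they are an illustration of why exact-arithmetic reasoning can fail in finite precision, not the paper's counterexample.

```python
from statistics import NormalDist


def certified_radius(p_a_lower, sigma):
    """L2 radius sigma * Phi^{-1}(p_A) certified under exact Gaussian noise.

    p_a_lower: lower confidence bound on the probability of the top class.
    sigma: standard deviation of the smoothing noise.
    Returns 0.0 when the top class is not a clear majority (abstain).
    """
    if p_a_lower <= 0.5:
        return 0.0
    return sigma * NormalDist().inv_cdf(p_a_lower)


print(certified_radius(0.975, sigma=0.5))

# Finite precision: a unit perturbation added to a large float64 is rounded away,
# so "input + noise" does not behave like the real-valued sum the proof assumes.
big = 1e16
print(big + 1.0 == big)
```

The second print yields `True` on IEEE-754 double precision, which is what Python floats use on mainstream platforms.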

翻訳日:2023-04-27 04:06:44 公開日:2023-04-25

# 分散分配回帰のための非線形十分次元削減

Nonlinear Sufficient Dimension Reduction for Distribution-on-Distribution Regression ( http://arxiv.org/abs/2207.04613v2 )

ライセンス: Link先を確認

Qi Zhang, Bing Li, and Lingzhou Xue

(参考訳) 距離空間の構成員としてモデル化された予測値と応答値の両方が分布データである場合の非線形十分次元減少に対する新しいアプローチを提案する。我々の重要なステップは、距離空間上に普遍的なカーネル(cc-ユニバーサル)を構築することであり、その結果、十分な次元の減少を決定する条件独立性を特徴付けるのに十分リッチな予測器と応答のためのカーネルヒルベルト空間を再現する。一変量分布ではワッサーシュタイン距離を用いて普遍核を構築するが、多変量分布ではスライスされたワッサーシュタイン距離を利用する。スライスされたワッサーシュタイン距離は、計量空間がワッサーシュタイン空間に類似した位相的性質を持つことを保証するとともに、重要な計算上の利点を提供する。合成データに基づく数値計算の結果,本手法は競合する手法よりも優れていた。この方法は、出生率、死亡率データ、カルガリー温度データを含むいくつかのデータセットにも適用される。

We introduce a new approach to nonlinear sufficient dimension reduction in cases where both the predictor and the response are distributional data, modeled as members of a metric space. Our key step is to build universal kernels (cc-universal) on the metric spaces, which results in reproducing kernel Hilbert spaces for the predictor and response that are rich enough to characterize the conditional independence that determines sufficient dimension reduction. For univariate distributions, we construct the universal kernel using the Wasserstein distance, while for multivariate distributions, we resort to the sliced Wasserstein distance. The sliced Wasserstein distance ensures that the metric space possesses similar topological properties to the Wasserstein space while also offering significant computation benefits. Numerical results based on synthetic data show that our method outperforms possible competing methods. The method is also applied to several data sets, including fertility and mortality data and Calgary temperature data.
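For univariate distributions represented by equal-size samples, the Wasserstein distance has a simple closed form: sort both samples and compare them order statistic by order statistic. The sketch below computes that distance and plugs it into a Gaussian-type kernel. The kernel form $\exp(-\gamma W_2^2)$ and the parameter `gamma` are assumptions made for illustration; the abstract does not specify which universal kernel the authors construct.

```python
import math


def wasserstein_1d(xs, ys, p=2):
    """p-Wasserstein distance between two equal-size empirical samples on the real line."""
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes samples of equal size")
    xs, ys = sorted(xs), sorted(ys)
    # In one dimension the optimal coupling matches sorted values in order.
    return (sum(abs(a - b) ** p for a, b in zip(xs, ys)) / len(xs)) ** (1.0 / p)


def wasserstein_gaussian_kernel(xs, ys, gamma=1.0):
    """Hypothetical kernel k(mu, nu) = exp(-gamma * W_2(mu, nu)^2)."""
    return math.exp(-gamma * wasserstein_1d(xs, ys, p=2) ** 2)


sample_a = [0.0, 1.0, 2.0]
sample_b = [3.0, 1.0, 2.0]  # sample_a shifted by 1, in shuffled order
print(wasserstein_1d(sample_a, sample_b))
print(wasserstein_gaussian_kernel(sample_a, sample_b))
```

A sliced Wasserstein distance for multivariate samples would average this one-dimensional computation over projections onto many directions; that step is omitted here.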

翻訳日:2023-04-27 04:06:29 公開日:2023-04-25

# BRExIt: エキスパートイテレーションにおける応答モデリングについて

BRExIt: On Opponent Modelling in Expert Iteration ( http://arxiv.org/abs/2206.00113v2 )

ライセンス: Link先を確認

Daniel Hernandez, Hendrik Baier, Michael Kaisers

(参考訳) 現代の人口ベースのトレーニングアプローチでは、強化学習アルゴリズムを最善の応答神託として採用し、候補者の対戦相手(主に以前に学習した政策)に対する遊びを改善する。本稿では,最先端学習アルゴリズムエキスパートイテレーション(exit)に敵モデルを組み込むことにより,ゲームにおける学習を加速するベストレスポンスエキスパートイテレーション(brexit)を提案する。ブレグジットの目的は、(1)対向政策を補助課題として予測する政策責任者、(2)与または学習した対向モデルに向かって計画中のバイアス相手を移動させ、最適な反応を近似する見習い対象を生成することである。 BRExItのアルゴリズム的変種と固定テストエージェントの集合に対する実証的アブレーションでは、BRExItがExItよりも優れたポリシーを学習しているという統計的証拠を提供する。

Finding a best response policy is a central objective in game theory and multi-agent learning, with modern population-based training approaches employing reinforcement learning algorithms as best-response oracles to improve play against candidate opponents (typically previously learnt policies). We propose Best Response Expert Iteration (BRExIt), which accelerates learning in games by incorporating opponent models into the state-of-the-art learning algorithm Expert Iteration (ExIt). BRExIt aims to (1) improve feature shaping in the apprentice, with a policy head predicting opponent policies as an auxiliary task, and (2) bias opponent moves in planning towards the given or learnt opponent model, to generate apprentice targets that better approximate a best response. In an empirical ablation on BRExIt's algorithmic variants against a set of fixed test agents, we provide statistical evidence that BRExIt learns better performing policies than ExIt.

翻訳日:2023-04-27 04:04:59 公開日:2023-04-25

# 可変密度雑音によるサブサンプリングを用いた自己教師型MR画像再構成のための理論的枠組み

A theoretical framework for self-supervised MR image reconstruction using sub-sampling via variable density Noisier2Noise ( http://arxiv.org/abs/2205.10278v4 )

ライセンス: Link先を確認

Charles Millard, Mark Chiew

(参考訳) 近年,サブサンプルMRI(Magnetic Resonance Imaging)データの再構成にニューラルネットワークの統計的モデリング機能を活用することに注目が集まっている。提案手法は, 代表的な完全サンプルデータセットの存在を前提として, 完全教師付きトレーニングを用いる。しかし、多くのアプリケーションでは、完全なサンプルトレーニングデータは利用できず、取得には非常に実用的でない可能性がある。したがって、訓練にサブサンプリングデータのみを使用する自己教師あり手法の開発と理解が極めて望ましい。この研究は、当初自己教師付き認知タスクのために構築されたNoisier2Noiseフレームワークを、可変密度サブサンプルMRIデータに拡張した。提案手法であるdata undersampling (ssdu) による自己教師付き学習の性能を解析的に説明するために,noisier2noiseフレームワークを用いた。さらに、理論的発展の結果として生じるSSDUの2つの修正を提案する。まず、サンプリングセットを分割して、サブセットが元のサンプリングマスクと同じタイプの分布を持つようにすることを提案する。次に, サンプル密度と分割密度を補償する損失重み付けを提案する。 fastMRIデータセットでは,これらの変化によりSSDUの画像復元精度が向上し,パーティショニングパラメータの堅牢性が向上した。

In recent years, there has been attention on leveraging the statistical modeling capabilities of neural networks for reconstructing sub-sampled Magnetic Resonance Imaging (MRI) data. Most proposed methods assume the existence of a representative fully-sampled dataset and use fully-supervised training. However, for many applications, fully sampled training data is not available, and may be highly impractical to acquire. The development and understanding of self-supervised methods, which use only sub-sampled data for training, are therefore highly desirable. This work extends the Noisier2Noise framework, which was originally constructed for self-supervised denoising tasks, to variable density sub-sampled MRI data. We use the Noisier2Noise framework to analytically explain the performance of Self-Supervised Learning via Data Undersampling (SSDU), a recently proposed method that performs well in practice but until now lacked theoretical justification. Further, we propose two modifications of SSDU that arise as a consequence of the theoretical developments. Firstly, we propose partitioning the sampling set so that the subsets have the same type of distribution as the original sampling mask. Secondly, we propose a loss weighting that compensates for the sampling and partitioning densities. On the fastMRI dataset we show that these changes significantly improve SSDU's image restoration quality and robustness to the partitioning parameters.
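The partitioning step can be sketched as follows: the set of acquired k-space locations is split into two disjoint subsets, one fed to the network and one held out for the loss. Making the keep probability depend on the location gives a variable-density split. The specific density used below (always keep a central band, keep other locations with probability 0.6) is a hypothetical choice for illustration and not the distribution proposed in the paper; the loss weighting is also omitted.

```python
import random


def partition_sampling_set(sampled, keep_prob, rng):
    """Split sampled k-space indices into a network-input set and a loss set.

    sampled: iterable of acquired indices.
    keep_prob: function index -> probability of assigning it to the input set.
    rng: random.Random instance, passed in for reproducibility.
    """
    input_set, loss_set = set(), set()
    for idx in sorted(sampled):
        if rng.random() < keep_prob(idx):
            input_set.add(idx)
        else:
            loss_set.add(idx)
    return input_set, loss_set


# 1-D toy mask over 64 phase-encode lines: every other line plus a dense centre.
sampled = set(range(0, 64, 2)) | set(range(28, 36))
keep_prob = lambda k: 1.0 if 28 <= k < 36 else 0.6  # hypothetical density
input_set, loss_set = partition_sampling_set(sampled, keep_prob, random.Random(0))
print(len(input_set), len(loss_set))
```

By construction the two subsets are disjoint and together recover the original sampling set.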

翻訳日:2023-04-27 04:04:39 公開日:2023-04-25

# 隠れた量子メモリ:誰かが見た時にメモリは存在するか?

Hidden Quantum Memory: Is Memory There When Somebody Looks? ( http://arxiv.org/abs/2204.08298v4 )

ライセンス: Link先を確認

Philip Taranto and Thomas J. Elliott and Simon Milz

(参考訳) 古典物理学では、メモリレス力学とマルコフ統計は同じである。これは量子力学には当てはまらない、なぜなら量子測定は侵入的だからである。ここでは、測定の侵襲性を超えて、古典的および量子的プロセス、すなわち隠れた量子メモリの可能性を区別する。 While Markovian statistics of classical processes can always be reproduced by a memoryless dynamical model, our main result shows that this is not true in quantum mechanics: We first provide an example of quantum non-Markovianity whose manifestation depends on whether or not a previous measurement is performed -- an impossible phenomenon for memoryless dynamics; we then strengthen this result by demonstrating statistics that are Markovian independent of how they are probed, but are nonetheless still incompatible with memoryless quantum dynamics. そこで我々は,その生成にメモリを必要とする量子過程を探究し,マルコフ統計の存在を立証する。

In classical physics, memoryless dynamics and Markovian statistics are one and the same. This is not true for quantum dynamics, first and foremost because quantum measurements are invasive. Going beyond measurement invasiveness, here we derive a novel distinction between classical and quantum processes, namely the possibility of hidden quantum memory. While Markovian statistics of classical processes can always be reproduced by a memoryless dynamical model, our main result shows that this is not true in quantum mechanics: We first provide an example of quantum non-Markovianity whose manifestation depends on whether or not a previous measurement is performed -- an impossible phenomenon for memoryless dynamics; we then strengthen this result by demonstrating statistics that are Markovian independent of how they are probed, but are nonetheless still incompatible with memoryless quantum dynamics. Thus, we establish the existence of Markovian statistics gathered by probing a quantum process that nevertheless fundamentally require memory for their creation.

翻訳日:2023-04-27 04:04:17 公開日:2023-04-25

# スピン回路量子力学を用いたJaynes-Cummings Ladderの提案

Probing the Jaynes-Cummings Ladder with Spin Circuit Quantum Electrodynamics ( http://arxiv.org/abs/2203.05668v2 )

ライセンス: Link先を確認

Tobias Bonsen (1), Patrick Harvey-Collard (1), Maximilian Russ (1), Jurgen Dijkema (1), Amir Sammak (2), Giordano Scappucci, Lieven M. K. Vandersypen (1) ((1) QuTech and Kavli Institute of Nanoscience, Delft University of Technology, (2) QuTech and Netherlands Organization for Applied Scientific Research (TNO))

(参考訳) 電子スピンを用いた回路量子力学(スピン回路QED)のJaynes-Cummingsはしごにおける励起状態間の遷移を報告する。本稿では,最近の実験研究における説明できない特徴がこのような遷移に対応することを示し,これらの効果を含む入力出力フレームワークを提案する。新しい実験では、まず以前の観測を再現し、プローブパワーを増大させ、2トーン分光を用いて励起状態遷移と多光子遷移の両方を明らかにする。このJaynes-Cummingsのはしごを探査する能力は、カップリング対デコヒーレンス比の改善によって実現され、量子現象を研究するための興味深いプラットフォームとしてスピン回路QEDの成熟度が増加することを示す。

We report observations of transitions between excited states in the Jaynes-Cummings ladder of circuit quantum electrodynamics with electron spins (spin circuit QED). We show that unexplained features in recent experimental work correspond to such transitions and present an input-output framework that includes these effects. In new experiments, we first reproduce previous observations and then reveal both excited-state transitions and multiphoton transitions by increasing the probe power and using two-tone spectroscopy. This ability to probe the Jaynes-Cummings ladder is enabled by improvements in the coupling-to-decoherence ratio, and shows an increase in the maturity of spin circuit QED as an interesting platform for studying quantum phenomena.

翻訳日:2023-04-27 04:03:39 公開日:2023-04-25

# 回転操作のトリガと制御のための深層強化学習による小型空中ロボットの逆着陸

Inverted Landing in a Small Aerial Robot via Deep Reinforcement Learning for Triggering and Control of Rotational Maneuvers ( http://arxiv.org/abs/2209.11043v2 )

ライセンス: Link先を確認

Bryan Habas, Jack W. Langelaan, Bo Cheng

(参考訳) 高速で堅牢な逆着陸は、特に船上でのセンシングと計算に完全に依存しながら、空中ロボットにとって難しい偉業である。それにもかかわらず、この偉業はコウモリ、ハエ、ミツバチなどの生物学的チラシによって定期的に行われる。これまでの研究では、一連の視覚手がかりと運動行動との直接的な因果関係を特定し、この挑戦的なエアロバティックな操作を小型の空中ロボットで信頼できる実行を可能にした。本研究では、まずDeep Reinforcement Learningと物理シミュレーションを用いて、任意のアプローチ条件から始まる頑健な逆着陸のための一般的な最適制御ポリシーを得る。この最適化された制御ポリシーは、システムの観測空間から回転操作のトリガーと制御を含む運動指令行動空間への計算効率のよいマッピングを提供する。これは、大きさや方向によって異なる幅広い接近飛行速度でシステムを訓練することで達成された。次に,シミュレーションにおけるロボットの慣性パラメータを変化させ,ドメインランダム化による学習方針のsim-to-real転送と実験的検証を行った。実験により, 着地堅牢性を大幅に向上させるいくつかの要因と, 逆着陸成功を決定づける主要なメカニズムを同定した。本研究で開発された学習フレームワークは, 騒音センサデータの利用, 様々な方向の面への着地, 動的に動く面への着地など, より困難な課題を解決するために一般化されることを期待している。

Inverted landing in a rapid and robust manner is a challenging feat for aerial robots, especially while depending entirely on onboard sensing and computation. In spite of this, this feat is routinely performed by biological fliers such as bats, flies, and bees. Our previous work has identified a direct causal connection between a series of onboard visual cues and kinematic actions that allow for reliable execution of this challenging aerobatic maneuver in small aerial robots. In this work, we first utilized Deep Reinforcement Learning and a physics-based simulation to obtain a general, optimal control policy for robust inverted landing starting from any arbitrary approach condition. This optimized control policy provides a computationally-efficient mapping from the system's observational space to its motor command action space, including both triggering and control of rotational maneuvers. This was done by training the system over a large range of approach flight velocities that varied with magnitude and direction. Next, we performed a sim-to-real transfer and experimental validation of the learned policy via domain randomization, by varying the robot's inertial parameters in the simulation. Through experimental trials, we identified several dominant factors which greatly improved landing robustness and the primary mechanisms that determined inverted landing success. We expect the learning framework developed in this study can be generalized to solve more challenging tasks, such as utilizing noisy onboard sensory data, landing on surfaces of various orientations, or landing on dynamically-moving surfaces.

翻訳日:2023-04-27 03:56:18 公開日:2023-04-25

# 最適化によるビット割り当て

Bit Allocation using Optimization ( http://arxiv.org/abs/2209.09422v4 )

ライセンス: Link先を確認

Tongda Xu, Han Gao, Chenjian Gao, Yuanyuan Wang, Dailan He, Jinyong Pi, Jixiang Luo, Ziyu Zhu, Mao Ye, Hongwei Qin, Yan Wang, Jingjing Liu, Ya-Qin Zhang

(参考訳) 本稿では,ニューラルビデオ圧縮(NVC)におけるビット割り当ての問題について考察する。まず,NVCにおけるビット割り当てと半補正変分推論(SAVI)の基本的な関係を明らかにする。具体的には、GoP(Group-of-Picture)レベルの確率を持つSAVIは、正確なレートおよび品質依存モデルを持つピクセルレベルのビット割り当てと等価であることを示す。この等価性に基づいて、SAVIを用いたビット割り当ての新しいパラダイムを確立する。従来のビット割当法とは異なり、この手法は経験モデルを必要としないため最適である。さらに, 勾配上昇を用いたオリジナルのSAVIは, 単一レベル潜水剤にのみ適用するため, 勾配上昇による逆伝播を再帰的に適用することにより, SAVIをNVCなどのマルチレベルに拡張する。最後に,実用的な実装のためのトラクタブル近似を提案する。提案手法は,ビット割り当てのR-D性能の実証的バウンダリとして機能し,性能超過が速度を符号化するシナリオに適用できる。実験結果から、現在の最先端ビット割り当てアルゴリズムは、我々のものと比較して改善するために、$\approx 0.5$ dB PSNRの空間を持つことがわかった。コードは https://github.com/tongdaxu/Bit-Allocation-Using-Optimization で利用可能である。

In this paper, we consider the problem of bit allocation in Neural Video Compression (NVC). First, we reveal a fundamental relationship between bit allocation in NVC and Semi-Amortized Variational Inference (SAVI). Specifically, we show that SAVI with GoP (Group-of-Picture)-level likelihood is equivalent to pixel-level bit allocation with a precise rate & quality dependency model. Based on this equivalence, we establish a new paradigm of bit allocation using SAVI. Different from previous bit allocation methods, our approach requires no empirical model and is thus optimal. Moreover, as the original SAVI using gradient ascent only applies to single-level latents, we extend SAVI to multi-level latent models such as NVC by recursively applying back-propagation through gradient ascent. Finally, we propose a tractable approximation for practical implementation. Our method can be applied to scenarios where performance outweighs encoding speed, and serves as an empirical bound on the R-D performance of bit allocation. Experimental results show that current state-of-the-art bit allocation algorithms still have room to improve by $\approx 0.5$ dB PSNR compared with ours. Code is available at https://github.com/tongdaxu/Bit-Allocation-Using-Optimization.

翻訳日:2023-04-27 03:55:52 公開日:2023-04-25

# 回折データの深層ニューラルネットワークによる弱信号抽出

Weak-signal extraction enabled by deep-neural-network denoising of diffraction data ( http://arxiv.org/abs/2209.09247v2 )

ライセンス: Link先を確認

Jens Oppliger, Michael M. Denner, Julia Küspert, Ruggero Frison, Qisi Wang, Alexander Morawietz, Oleh Ivashko, Ann-Christin Dippel, Martin von Zimmermann, Izabela Biało, Leonardo Martinelli, Benoît Fauqué, Jaewon Choi, Mirian Garcia-Fernandez, Kejin Zhou, Niels B. Christensen, Tohru Kurosawa, Naoki Momono, Migaku Oda, Fabian D. Natterer, Mark H. Fischer, Titus Neupert, Johan Chang

(参考訳) ノイズの除去やキャンセルは、画像や音響に広く応用されている。日常の応用では、デノナイジングには、根本的真実に反する生成的側面を含むこともある。しかし、科学的応用については、真理を正確に再現する必要がある。本稿では,弱い信号が定量的な精度で現れるように,深い畳み込みニューラルネットワークを用いてデータを分節化する方法を示す。特に結晶材料のX線回折について検討する。本研究では,ノイズデータにおける電荷秩序に起因する弱信号の可視性と正確性を示す。この成功は、測定された低ノイズデータと高ノイズデータのペアによるディープニューラルネットワークの教師付きトレーニングによって実現される。このようにして、ニューラルネットワークはノイズの統計的特性について学習する。人工雑音は, 定量的に正確な結果が得られないことを示す。提案手法は,難解な取得問題に適用可能なノイズフィルタリングの実践的戦略を示すものである。

Removal or cancellation of noise has wide-spread applications for imaging and acoustics. In every-day-life applications, denoising may even include generative aspects which are unfaithful to the ground truth. For scientific applications, however, denoising must reproduce the ground truth accurately. Here, we show how data can be denoised via a deep convolutional neural network such that weak signals appear with quantitative accuracy. In particular, we study X-ray diffraction on crystalline materials. We demonstrate that weak signals stemming from charge ordering, insignificant in the noisy data, become visible and accurate in the denoised data. This success is enabled by supervised training of a deep neural network with pairs of measured low- and high-noise data. This way, the neural network learns about the statistical properties of the noise. We demonstrate that using artificial noise does not yield such quantitatively accurate results. Our approach thus illustrates a practical strategy for noise filtering that can be applied to challenging acquisition problems.

翻訳日:2023-04-27 03:55:33 公開日:2023-04-25

# 半間接離散対数問題に対する部分指数量子アルゴリズム

A Subexponential Quantum Algorithm for the Semidirect Discrete Logarithm Problem ( http://arxiv.org/abs/2209.02814v4 )

ライセンス: Link先を確認

Christopher Battarbee, Delaram Kahrobaei, Ludovic Perret, and Siamak F. Shahandashti

(参考訳) グループベースの暗号は、量子後暗号における比較的未発見の家系であり、いわゆるセミダイレクト離散対数問題(Semidirect Discrete Logarithm Problem, SDLP)は最も中心的な問題の一つである。しかし、SDLPの複雑さと、特に量子敵に対するセキュリティに関して、よりよく知られた硬さ問題との関係はよく理解されておらず、この分野の研究者にとって重要なオープンな問題であった。本稿では,sdlpのセキュリティ解析を初めて実施する。特に、SDLPとグループアクションの間には、量子部分指数アルゴリズムを適用することが知られているコンテキストがある。したがって、SDLPを解くための部分指数量子アルゴリズムを構築することができ、SDLPの複雑さと既知の計算問題との関係を分類することができる。

Group-based cryptography is a relatively unexplored family in post-quantum cryptography, and the so-called Semidirect Discrete Logarithm Problem (SDLP) is one of its most central problems. However, the complexity of SDLP and its relationship to more well-known hardness problems, particularly with respect to its security against quantum adversaries, has not been well understood and was a significant open problem for researchers in this area. In this paper we give the first dedicated security analysis of SDLP. In particular, we provide a connection between SDLP and group actions, a context in which quantum subexponential algorithms are known to apply. We are therefore able to construct a subexponential quantum algorithm for solving SDLP, thereby classifying the complexity of SDLP and its relation to known computational problems.
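To make the problem statement concrete: in a semidirect product, the $x$-th power of a pair $(g, \phi)$ has first component $s(x) = g \cdot \phi(g) \cdots \phi^{x-1}(g)$, and SDLP asks for $x$ given $g$, $\phi$ and $s(x)$. The sketch below instantiates this in a deliberately tiny and insecure toy setting that is an assumption for illustration only: the additive group $\mathbb{Z}_n$ with $\phi(h) = a h \bmod n$. It solves the instance by classical exhaustive search and has nothing to do with the subexponential quantum algorithm of the paper; all names are hypothetical.

```python
def semidirect_power(g, a, x, n):
    """First component s(x) = g + a*g + ... + a^(x-1)*g (mod n), for x >= 1."""
    s = g % n
    for _ in range(x - 1):
        # s(k+1) = phi(s(k)) + g, with phi(h) = a*h mod n.
        s = (a * s + g) % n
    return s


def solve_sdlp_bruteforce(g, a, target, n):
    """Return the smallest x >= 1 with s(x) == target, or None if there is none.

    The sequence s(1), s(2), ... is generated by a map on Z_n, so every value
    it ever takes appears within the first n steps.
    """
    s = g % n
    for x in range(1, n + 1):
        if s == target:
            return x
        s = (a * s + g) % n
    return None


n, a, g, secret = 101, 3, 5, 17
target = semidirect_power(g, a, secret, n)
recovered = solve_sdlp_bruteforce(g, a, target, n)
print(recovered, semidirect_power(g, a, recovered, n) == target)
```

The recovered exponent is the smallest one that reproduces the target, so it may differ from the secret when the sequence repeats; the final check confirms that it is a valid solution either way.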

翻訳日:2023-04-27 03:54:43 公開日:2023-04-25

# ロバスト音響誘導画像マニピュレーション

Robust Sound-Guided Image Manipulation ( http://arxiv.org/abs/2208.14114v3 )

ライセンス: Link先を確認

Seung Hyun Lee, Gyeongrok Oh, Wonmin Byeon, Sang Ho Yoon, Jinkyu Kim, Sangpil Kim

(参考訳) 最近の成功は、例えば、晴れた日に風景シーンが、テキスト入力「レイニング」によって駆動される雨の日に同じシーンに操作されるように、テキストプロンプトで画像を操作できることを示唆している。これらのアプローチはしばしば、マルチモーダル(テキストとイメージ)埋め込み空間を利用するStyleCLIPベースのイメージジェネレータを利用する。しかし,このようなテキスト入力は,降雨時の豪雨と雷雨の区別など,リッチなセマンティック・キューの提供と合成においてしばしばボトルネックとなる。この問題に対処するために、テキストよりも多様な意味的手がかり(生き生きとした感情や自然界のダイナミックな表現)を伝達できるため、画像操作において顕著な優位性を持つ追加のモダリティ、音の活用を提唱する。本稿では,まず画像とテキストの組込み空間を音で拡張し,例えば雨音など,音声入力に基づいて画像を操作するための直接潜在最適化手法を提案する。当社の音響誘導画像操作手法は,最先端のテキストや音声誘導画像操作手法よりも,意味的かつ視覚的に正確な操作結果が得られることを示す。ダウンストリームタスク評価では,学習した画像-テキスト-音声統合埋め込み空間が音響入力を効果的に符号化することを示す。

Recent successes suggest that an image can be manipulated by a text prompt, e.g., a landscape scene on a sunny day is manipulated into the same scene on a rainy day driven by a text input "raining". These approaches often utilize a StyleCLIP-based image generator, which leverages multi-modal (text and image) embedding space. However, we observe that such text inputs are often bottlenecked in providing and synthesizing rich semantic cues, e.g., differentiating heavy rain from rain with thunderstorms. To address this issue, we advocate leveraging an additional modality, sound, which has notable advantages in image manipulation as it can convey more diverse semantic cues (vivid emotions or dynamic expressions of the natural world) than texts. In this paper, we propose a novel approach that first extends the image-text joint embedding space with sound and applies a direct latent optimization method to manipulate a given image based on audio input, e.g., the sound of rain. Our extensive experiments show that our sound-guided image manipulation approach produces semantically and visually more plausible manipulation results than the state-of-the-art text and sound-guided image manipulation methods, which are further confirmed by our human evaluations. Our downstream task evaluations also show that our learned image-text-sound joint embedding space effectively encodes sound inputs.

翻訳日:2023-04-27 03:54:30 公開日:2023-04-25

# 抽象的な会議要約:調査

Abstractive Meeting Summarization: A Survey ( http://arxiv.org/abs/2208.04163v2 )

ライセンス: Link先を確認

Virgile Rennard, Guokan Shang, Julie Hunter, Michalis Vazirgiannis

(参考訳) 会話の最も重要なポイントを確実に特定し、まとめることができるシステムは、ビジネス会議から医療相談、カスタマーサービスコールに至るまで、さまざまな現実世界のコンテキストにおいて有用である。ディープラーニングの最近の進歩、特にエンコーダ-デコーダアーキテクチャの発明は、言語生成システムを大幅に改善し、多人数会話に特に適した要約の形式である抽象的要約(abstractive summarization)の改善への扉を開く。本稿では,要約を抽象化する作業によって引き起こされた課題の概要と,この問題に対処するためのデータセット,モデル,評価指標について概説する。

A system that could reliably identify and sum up the most important points of a conversation would be valuable in a wide variety of real-world contexts, from business meetings to medical consultations to customer service calls. Recent advances in deep learning, and especially the invention of encoder-decoder architectures, have significantly improved language generation systems, opening the door to improved forms of abstractive summarization, a form of summarization particularly well-suited for multi-party conversation. In this paper, we provide an overview of the challenges raised by the task of abstractive meeting summarization and of the data sets, models and evaluation metrics that have been used to tackle the problems.

翻訳日:2023-04-27 03:54:03 公開日:2023-04-25

# 自由空間における非平衡超放射相転移

A non-equilibrium superradiant phase transition in free space ( http://arxiv.org/abs/2207.10361v2 )

ライセンス: Link先を確認

Giovanni Ferioli, Antoine Glicenstein, Igor Ferrier-Barbut, and Antoine Browaeys

(参考訳) 散逸、外部駆動、相互作用が競合し、駆動なしでは存在しない非平衡相を生じさせる系のクラスが存在する。ここでは、位相遷移は対称性を破ることなく起こりうるが、局所的な順序パラメータでは、平衡における位相遷移のランダウ理論とは対照的である。最も単純な散逸量子系の一つは、光場によって駆動される原子遷移立方体の波長よりも小さい体積で囲まれた2レベル原子である。原子の駆動場への集団結合と協調的崩壊の競合は、全ての原子双極子が位相ロックされている相と超ラジカル自発的放出によって制御される相の間の遷移に繋がるべきである。ここでは,自由空間におけるレーザー冷却原子の鉛筆型雲を主軸に沿って光学的に励起し,予測した位相を観察することにより,このモデルを実現する。我々の実証は、自由空間超放射光レーザーの取得や、新しいタイプの時間結晶の観測の観点から有望である。

A class of systems exists in which dissipation, external drive and interactions compete and give rise to non-equilibrium phases that would not exist without the drive. There, phase transitions could occur without the breaking of any symmetry, yet with a local order parameter, in contrast with the Landau theory of phase transitions at equilibrium. One of the simplest driven dissipative quantum systems consists of two-level atoms enclosed in a volume smaller than the wavelength of the atomic transition cubed, driven by a light field. The competition between collective coupling of the atoms to the driving field and their cooperative decay should lead to a transition between a phase where all the atomic dipoles are phase-locked and a phase governed by superradiant spontaneous emission. Here, we realize this model using a pencil-shaped cloud of laser-cooled atoms in free space, optically excited along its main axis, and observe the predicted phases. Our demonstration is promising in view of obtaining free-space superradiant lasers or observing new types of time crystals.

翻訳日:2023-04-27 03:53:53 公開日:2023-04-25

# Rokhsar-Kivelson-sign波動関数の絡み合い複雑性

Entanglement complexity of the Rokhsar-Kivelson-sign wavefunctions ( http://arxiv.org/abs/2211.01428v3 )

ライセンス: Link先を確認

Stefano Piemontese, Tommaso Roscilde, Alioscia Hamma

(参考訳) 本稿では,1つのパラメータによって絡み合いの度合いが制御される,模範状態であるロクサー・キベルソン符号波動関数(Rokhsar-Kivelson-sign wavefunctions)の絡み合い複雑性の遷移について検討する。この状態群は、エントロピーの体積則スケーリングを示す相と、エンタングルメントのサブ拡張スケーリングを持つ相の間の遷移を特徴とすることが知られており、乱れた量子ハミルトニアンの多体局所化遷移を想起させる[Physical Review B 92, 214204 (2015)]。我々は、ロクサー・キベルソン符号波動関数の特異点とその絡み合い複雑性を、量子情報理論のいくつかのツールを用いて研究する: 忠実度計量、絡み合いスペクトル統計、絡み合いエントロピーゆらぎ、安定化器Rényiエントロピー、および非絡みアルゴリズムの性能。体積則フェーズ全体を通して、状態は普遍的絡み合いスペクトル統計量を持つ。しかし、全てのメトリクスがパラメータ自身から独立になる制御パラメータの小さな値に「超ユニバーサル」の規則が現れる; 絡み合いエントロピーと安定化器Rényiエントロピーは理論的な最大値に近づく; 絡み合いのゆらぎはランダムな普遍回路の出力状態のようにゼロにスケールし、解離アルゴリズムは本質的にゼロ効率を持つ。これら全ての指標は、一貫して複雑な絡み合いのパターンを示す。一方、サブボリューム法相では、絡み合いスペクトル統計はもはや普遍的ではなく、絡み合いの変動はより大きく、非ユニバーサルスケーリングを示し、非絡み合いアルゴリズムの効率は有限となる。モデル波動関数に基づき, エンタングルメントスケーリング特性とエンタングルメント複雑性特性の類似の組み合わせが, 高エネルギーハミルトニアンの固有状態に見られることが示唆された。

In this paper we study the transitions of entanglement complexity in an exemplary family of states - the Rokhsar-Kivelson-sign wavefunctions - whose degree of entanglement is controlled by a single parameter. This family of states is known to feature a transition between a phase exhibiting volume-law scaling of entanglement entropy and a phase with sub-extensive scaling of entanglement, reminiscent of the many-body-localization transition of disordered quantum Hamiltonians [Physical Review B 92, 214204 (2015)]. We study the singularities of the Rokhsar-Kivelson-sign wavefunctions and their entanglement complexity across the transition using several tools from quantum information theory: fidelity metric; entanglement spectrum statistics; entanglement entropy fluctuations; stabilizer Rényi entropy; and the performance of a disentangling algorithm. Across the whole volume-law phase the states feature universal entanglement spectrum statistics. Yet a "super-universal" regime appears for small values of the control parameter in which all metrics become independent of the parameter itself; the entanglement entropy as well as the stabilizer Rényi entropy appear to approach their theoretical maximum; the entanglement fluctuations scale to zero as in output states of random universal circuits, and the disentangling algorithm has essentially null efficiency. All these indicators consistently reveal a complex pattern of entanglement. In the sub-volume-law phase, on the other hand, the entanglement spectrum statistics are no longer universal; entanglement fluctuations are larger and exhibit a non-universal scaling; and the efficiency of the disentangling algorithm becomes finite. Our results, based on model wavefunctions, suggest that a similar combination of entanglement scaling properties and of entanglement complexity features may be found in high-energy Hamiltonian eigenstates.

翻訳日:2023-04-27 03:47:22 公開日:2023-04-25

# 半金属および半伝導性グラフェン-hBN多層膜

Semimetallic and semiconducting graphene-hBN multilayers with parallel or reverse stacking ( http://arxiv.org/abs/2210.16393v2 )

ライセンス: Link先を確認

Xi Chen, Klaus Zollner, Christian Moulsdale, Vladimir I. Fal'ko, Angelika Knothe

(参考訳) 異なる対称性を有する交互グラフェンおよびhbn層の3次元層状結晶を理論的に検討した。グラフェン層間のホッピングパラメータによって、これらの合成3D材料は、半金属、ギャップ、またはワイル半金属相を特徴付けることができる。その結果, 個々の2次元材料から積み重ねられた3次元結晶は, 構成成分とは異なる創発性を有する合成材料クラスであることがわかった。

We theoretically investigate 3D layered crystals of alternating graphene and hBN layers with different symmetries. Depending on the hopping parameters between the graphene layers, we find that these synthetic 3D materials can feature semimetallic, gapped, or Weyl semimetal phases. Our results demonstrate that 3D crystals stacked from individual 2D materials represent a synthetic materials class with emergent properties different from their constituents.

翻訳日:2023-04-27 03:46:41 公開日:2023-04-25

# Masked Autoencodersは調音学習者である

Masked Autoencoders Are Articulatory Learners ( http://arxiv.org/abs/2210.15195v2 )

ライセンス: Link先を確認

Ahmed Adel Attia, Carol Espy-Wilson

(参考訳) 調音録音は声道に沿った異なる調音器の位置と動きを追跡し、音声生成の研究や調音ベースの音声合成装置や音声インバージョンシステムといった音声技術の開発に広く用いられている。ウィスコンシン大学x線マイクロビーム(xrmb)データセットは、音声録音と同期した調音記録を提供する様々なデータセットの1つである。 xrmbの調音録音では、マイクロビームで追跡できる多数の調音器にペレットが配置されている。しかし、録音のかなりの部分は誤トラックされており、これまでは使用不可能であった。本研究では,マスキングオートエンコーダを用いて,xrmbデータセットの話者47名中41名を対象に,誤追跡された調音録音を正確に再構成する深層学習手法を提案する。従来使用できなかった3.4時間のうち3.28時間程度を収集し,8つの調音器のうち3つが誤追跡された場合でも,実感に合致した調音軌跡を再現することができる。

Articulatory recordings track the positions and motion of different articulators along the vocal tract and are widely used to study speech production and to develop speech technologies such as articulatory-based speech synthesizers and speech inversion systems. The University of Wisconsin X-Ray microbeam (XRMB) dataset is one of various datasets that provide articulatory recordings synced with audio recordings. The XRMB articulatory recordings employ pellets placed on a number of articulators which can be tracked by the microbeam. However, a significant portion of the articulatory recordings are mistracked, and have so far been unusable. In this work, we present a deep-learning-based approach using Masked Autoencoders to accurately reconstruct the mistracked articulatory recordings for 41 out of 47 speakers of the XRMB dataset. Our model is able to reconstruct articulatory trajectories that closely match ground truth, even when three out of eight articulators are mistracked, and retrieve 3.28 out of 3.4 hours of previously unusable recordings.

翻訳日:2023-04-27 03:46:32 公開日:2023-04-25

# 知識強化関係抽出データセット

Knowledge-Enhanced Relation Extraction Dataset ( http://arxiv.org/abs/2210.11231v3 )

ライセンス: Link先を確認

Yucong Lin, Hongming Xiao, Jiani Liu, Zichao Lin, Keming Lu, Feifei Wang, Wei Wei

(参考訳) 近年,補助知識グラフを利用した知識強化手法が,従来のテキストベースアプローチを超越した関係抽出に現れている。しかし、我々の知る限り、現在、知識強化関係抽出のための証拠文と知識グラフの両方を含む公開データセットは存在しない。このギャップに対処するために、知識強化関係抽出データセット(KERED)を導入する。 KEREDは各文に関係事実を付加し、エンティティリンクを通じてエンティティの知識コンテキストを提供する。得られたデータセットを用いて,2つのタスク設定(文レベルとバッグレベル)で,同時代の関係抽出手法を比較した。実験の結果,KEREDが提供する知識グラフは,知識エンハンスド関係抽出法をサポートできることがわかった。我々は,KEREDが知識グラフを用いた良質な関係抽出データセットを提供し,知識拡張関係抽出手法の性能評価を行っていると考えている。データセットは以下で入手できる: https://figshare.com/projects/KERED/134459

Recently, knowledge-enhanced methods leveraging auxiliary knowledge graphs have emerged in relation extraction, surpassing traditional text-based approaches. However, to the best of our knowledge, there is currently no public dataset available that encompasses both evidence sentences and knowledge graphs for knowledge-enhanced relation extraction. To address this gap, we introduce the Knowledge-Enhanced Relation Extraction Dataset (KERED). KERED annotates each sentence with a relational fact, and it provides knowledge context for entities through entity linking. Using our curated dataset, we compare contemporary relation extraction methods under two prevalent task settings: sentence-level and bag-level. The experimental results show that the knowledge graphs provided by KERED can support knowledge-enhanced relation extraction methods. We believe that KERED offers high-quality relation extraction datasets with corresponding knowledge graphs for evaluating the performance of knowledge-enhanced relation extraction methods. Our dataset is available at: https://figshare.com/projects/KERED/134459

翻訳日:2023-04-27 03:46:16 公開日:2023-04-25

# 調和振動子の非ガウス状態に対するダイナミクスに基づく絡み合い証人

Dynamics-based entanglement witnesses for non-Gaussian states of harmonic oscillators ( http://arxiv.org/abs/2210.10357v3 )

ライセンス: Link先を確認

Pooja Jayachandran, Lin Htoo Zaw, Valerio Scarani

(参考訳) 連続変数系の絡み合い証人の族について紹介する。これはテスト時の結合調和発振器の動力学が結合調和振動子であるという唯一の仮定に依存する。絡み合いは、通常のモードの1つにおけるtsirelson nonclassicality testから推測され、他のモードの状態について何も知らない。各ラウンドにおいて、プロトコルは1つの座標(例えば位置)の符号のみを数回にわたって測定する必要がある。この動的ベースの絡み合いの証人は、不確実性の関係よりもベルの不等式に似ている:特に古典理論の偽陽性は認めない。我々の基準は非ガウス状態を検出するが、それらのいくつかは他の基準では見落としている。

We introduce a family of entanglement witnesses for continuous variable systems, which rely on the sole assumption that their dynamics is that of coupled harmonic oscillators at the time of the test. Entanglement is inferred from the Tsirelson nonclassicality test on one of the normal modes, without any knowledge about the state of the other mode. In each round, the protocol requires measuring only the sign of one coordinate (e.g. position) at one among several times. This dynamics-based entanglement witness is more akin to a Bell inequality than to an uncertainty relation: in particular, it does not admit false positives from classical theory. Our criterion detects non-Gaussian states, some of which are missed by other criteria.
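
As a rough illustration of why a sign-based test of this kind admits no classical false positives, the sketch below assumes the three-time variant of the precession protocol: the position is probed at one of the times 0, T/3 or 2T/3 of the oscillation period T. The abstract only says "one among several times", so the choice of three equally spaced times, as well as the function names `classical_score` and `max_classical_score`, are assumptions made for illustration. For a classical oscillator the three positions sum to zero, so at most two of them can be positive and the score cannot exceed 2/3.

```python
import math


def classical_score(amplitude, phase, period=1.0, n_times=3):
    """Fraction of probing times at which a classical oscillator has x > 0.

    The trajectory is x(t) = amplitude * cos(omega * t - phase), probed at
    n_times equally spaced times within one period (an assumed protocol).
    """
    omega = 2.0 * math.pi / period
    positives = 0
    for k in range(n_times):
        t = k * period / n_times
        x = amplitude * math.cos(omega * t - phase)
        if x > 0:
            positives += 1
    return positives / n_times


def max_classical_score(n_phases=3600):
    """Scan the initial phase on a grid and return the largest classical score."""
    return max(
        classical_score(1.0, 2.0 * math.pi * j / n_phases)
        for j in range(n_phases)
    )
```

Under these assumptions, scanning the initial phase never yields a score above 2/3; a measured score above that value is what such a test would read as nonclassical.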

翻訳日:2023-04-27 03:46:01 公開日:2023-04-25

# オープンソースソフトウェア開発者のためのコードレコメンデーション

Code Recommendation for Open Source Software Developers ( http://arxiv.org/abs/2210.08332v3 )

ライセンス: Link先を確認

Yiqiao Jin, Yunsheng Bai, Yanqiao Zhu, Yizhou Sun, Wei Wang

(参考訳) オープンソースソフトウェア(OSS)は、技術基盤の根幹を形成し、数百万人の人材を惹きつけている。特に、OSS開発者に適切な開発タスクを推奨するために、開発者の関心事とプロジェクトコードのセマンティックな特徴の両方を考慮するのは困難で重要なことです。本稿では,開発者のインタラクション履歴,ソースコードの意味的特徴,プロジェクトの階層的ファイル構造を考慮に入れて,今後の貢献行動を予測することを目的とした,新しいコード推薦問題を提案する。システム内の複数のパーティ間の複雑な相互作用を考慮し,オープンソースソフトウェア開発者のための新しいグラフベースのコードレコメンデーションフレームワークであるCODERを提案する。コーダーは、異種グラフを介して、ミクロなユーザ・コード間インタラクションとマクロなユーザ・プロジェクト間インタラクションを共同でモデル化し、さらに、プロジェクト階層を反映したファイル構造グラフの集約を通じて、2つのレベルの情報を橋渡しする。さらに,信頼性の高いベンチマークの欠如により,将来研究を促進するために3つの大規模データセットを構築した。大規模実験の結果,CODERフレームワークはプロジェクト内,クロスプロジェクト,コールドスタートレコメンデーションなど,様々な実験条件下で優れた性能を発揮することがわかった。この作業が受け入れられ次第、データ検索のためのすべてのデータセット、コード、ユーティリティをリリースします。

Open Source Software (OSS) is forming the spines of technology infrastructures, attracting millions of talents to contribute. Notably, it is challenging and critical to consider both the developers' interests and the semantic features of the project code to recommend appropriate development tasks to OSS developers. In this paper, we formulate the novel problem of code recommendation, whose purpose is to predict the future contribution behaviors of developers given their interaction history, the semantic features of source code, and the hierarchical file structures of projects. Considering the complex interactions among multiple parties within the system, we propose CODER, a novel graph-based code recommendation framework for open source software developers. CODER jointly models microscopic user-code interactions and macroscopic user-project interactions via a heterogeneous graph and further bridges the two levels of information through aggregation on file-structure graphs that reflect the project hierarchy. Moreover, due to the lack of reliable benchmarks, we construct three large-scale datasets to facilitate future research in this direction. Extensive experiments show that our CODER framework achieves superior performance under various experimental settings, including intra-project, cross-project, and cold-start recommendation. We will release all the datasets, code, and utilities for data retrieval upon the acceptance of this work.

翻訳日:2023-04-27 03:45:49 公開日:2023-04-25

# 適応バイアス量子近似最適化アルゴリズムによるSAT問題の解法

Solution of SAT Problems with the Adaptive-Bias Quantum Approximate Optimization Algorithm ( http://arxiv.org/abs/2210.02822v3 )

ライセンス: Link先を確認

Yunlong Yu, Chenfeng Cao, Xiang-Bin Wang, Nic Shannon, and Robert Joynt

(参考訳) 量子近似最適化アルゴリズム(QAOA)は、短期量子デバイスにおける古典的な組合せ最適化問題を解くための有望な方法である。 QAOA を 3-SAT および Max-3-SAT 問題に使用する場合、量子コストは、節密度が変化するにつれて、それぞれ easy-hard-easy または easy-hard のパターンを示す。ハードリージョン問題で必要とされる量子リソースは、現在のNISQデバイスの手の届く範囲を超えている。本稿では,最大14変数の数値シミュレーションと解析的議論により,適応バイアスQAOA(ab-QAOA)が3-SAT問題のハード領域とMax-3-SAT問題のハード領域の性能を大幅に向上させることを示す。同様の精度では、10変数の3-SAT問題に対して、ab-QAOAは平均で3レベルを必要とするのに対し、QAOAは22レベルを必要とする。 10変数のMax-3-SAT問題では、数値は7レベルと62レベルである。この改良は、進化の過程でより標的にされ、より限定された絡み合いの生成から生じる。本稿では,ab-QAOAでは局所場を用いて進化を導くため,古典最適化は必須ではないことを示す。これにより,元のab-QAOAに比べて量子ゲートが著しく少ないハードリージョン3-SATとMax-3-SATの問題を効果的に解くことができる最適化フリーなab-QAOAを提案する。我々の研究は、NISQデバイスにおける最適化問題に対する量子アドバンテージを実現するための道を開いた。

The quantum approximate optimization algorithm (QAOA) is a promising method for solving certain classical combinatorial optimization problems on near-term quantum devices. When applying the QAOA to 3-SAT and Max-3-SAT problems, the quantum cost exhibits an easy-hard-easy or easy-hard pattern, respectively, as the clause density is changed. The quantum resources needed in the hard-region problems are out of reach for current NISQ devices. We show by numerical simulations with up to 14 variables and analytical arguments that the adaptive-bias QAOA (ab-QAOA) greatly improves performance in the hard region of the 3-SAT problems and the hard region of the Max-3-SAT problems. For similar accuracy, on average, ab-QAOA needs 3 levels for 10-variable 3-SAT problems as compared to 22 for QAOA. For 10-variable Max-3-SAT problems, the numbers are 7 levels and 62 levels. The improvement comes from a more targeted and more limited generation of entanglement during the evolution. We demonstrate that classical optimization is not strictly necessary in the ab-QAOA since local fields are used to guide the evolution. This leads us to propose an optimization-free ab-QAOA that can solve the hard-region 3-SAT and Max-3-SAT problems effectively with significantly fewer quantum gates as compared to the original ab-QAOA. Our work paves the way for realizing quantum advantages for optimization problems on NISQ devices.
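
To make "clause density" and the classical objective behind Max-3-SAT concrete, here is a minimal sketch. It assumes a DIMACS-style encoding in which a literal is a signed integer (3 means variable 3, -3 its negation); the function names are hypothetical and nothing here is taken from the authors' code. The cost counts violated clauses, which is the kind of classical objective a QAOA-type algorithm would try to minimize; the brute-force search is only a reference for tiny instances.

```python
from itertools import product


def clause_density(clauses, n_vars):
    """Ratio of the number of clauses to the number of variables."""
    return len(clauses) / n_vars


def violated(clauses, assignment):
    """Count clauses with no satisfied literal.

    assignment maps a variable index (1-based) to True or False.
    """
    count = 0
    for clause in clauses:
        if not any(assignment[abs(lit)] == (lit > 0) for lit in clause):
            count += 1
    return count


def brute_force_min(clauses, n_vars):
    """Return (minimum number of violated clauses, one optimal assignment)."""
    best = None
    for bits in product([False, True], repeat=n_vars):
        assignment = {i + 1: b for i, b in enumerate(bits)}
        cost = violated(clauses, assignment)
        if best is None or cost < best[0]:
            best = (cost, assignment)
    return best
```

A minimum cost of zero means the 3-SAT instance is satisfiable; for Max-3-SAT the minimum itself is the quantity of interest.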

翻訳日:2023-04-27 03:45:27 公開日:2023-04-25

# 材料工学における人工知能: 材料工学におけるAIの応用に関するレビュー

Artificial Intelligence in Material Engineering: A review on applications of AI in Material Engineering ( http://arxiv.org/abs/2209.11234v2 )

ライセンス: Link先を確認

Lipichanda Goswami, Manoj Deka and Mohendra Roy

(参考訳) 物質科学と工学(MSE)における人工知能(AI)の役割は、AI技術の進歩とともにますます重要になりつつある。高性能コンピューティングの開発により、大きなパラメータを持つディープラーニング(DL)モデルをテストすることが可能となり、特性予測において密度汎関数理論(DFT)のような従来の計算手法の限界を克服する機会となった。機械学習(ML)ベースの手法は、DFTベースの手法よりも高速で正確である。さらに, 生成逆数ネットワーク(GAN)は, 結晶構造情報を使わずに無機材料の化学組成の生成を促進する。これらの開発は材料工学(ME)と研究に大きな影響を与えた。ここでは、MEにおけるAIの最新開発についてレビューする。まず, 材料加工, 構造と材料特性の研究, 各種面における材料性能の測定など, MEの重要領域におけるAIの開発について論じる。次に、グラフニューラルネットワーク、生成モデル、学習の伝達など、MSEにおけるAIの重要な方法とその利用について論じる。既存の分析機器からの結果を分析するためのAIの利用についても論じる。最後に、MEにおけるAIのアドバンテージ、デメリット、将来について論じる。

The role of artificial intelligence (AI) in material science and engineering (MSE) is becoming increasingly important as AI technology advances. The development of high-performance computing has made it possible to test deep learning (DL) models with a large number of parameters, providing an opportunity to overcome the limitations of traditional computational methods, such as density functional theory (DFT), in property prediction. Machine learning (ML)-based methods are faster and more accurate than DFT-based methods. Furthermore, generative adversarial networks (GANs) have facilitated the generation of chemical compositions of inorganic materials without using crystal structure information. These developments have significantly impacted material engineering (ME) and research. Herein, some of the latest developments in AI in ME are reviewed. First, the development of AI in the critical areas of ME, such as in material processing, the study of structure and material property, and measuring the performance of materials in various aspects, is discussed. Then, the significant methods of AI and their uses in MSE, such as graph neural networks, generative models, and transfer learning, are discussed. The use of AI to analyze the results from existing analytical instruments is also discussed. Finally, AI's advantages, disadvantages, and future in ME are discussed.

翻訳日:2023-04-27 03:44:46 公開日:2023-04-25

# KGML-xDTD: 薬物治療予測とメカニズム記述のための知識グラフベースの機械学習フレームワーク

KGML-xDTD: A Knowledge Graph-based Machine Learning Framework for Drug Treatment Prediction and Mechanism Description ( http://arxiv.org/abs/2212.01384v2 )

ライセンス: Link先を確認

Chunyu Ma, Zhihan Zhou, Han Liu, David Koslicki

(参考訳) 背景: 計算薬の再利用は、既存の薬物や化合物の新しい治療目標や疾患(指標)を特定することを目的とした、コストと時間効率のよいアプローチである。従来の湿式薬物発見法と比較して、投資が安く、研究サイクルが短いため、特に発病や孤児病にとって重要である。しかし、薬物と標的疾患との間の行動のメカニズム(moas)はほとんど不明であり、このことは依然として、臨床現場で広く採用される薬物再導入法の主要な障害となっている。結果: 本研究では, 薬物処理疾患の予測を行う知識グラフベースの機械学習フレームワークであるKGML-xDTDを提案する。薬物/化合物と疾患の間の治療の確率を予測するだけでなく、知識グラフ(KG)経路に基づくテスト可能な行動機構(MOAs)を介して生物学的にそれらを説明する2モジュールフレームワークである。グラフベース強化学習(GRL)パスの中間指導として,知識と公開に基づく情報を活用し,生物学的に意味のある「実証経路」を抽出する。包括的実験とケーススタディ分析により, 提案手法は, ヒトのmoa経路の薬物再導入と再認識の予測の両方において, 最先端のパフォーマンスを達成できることが示された。結論: KGML-xDTDは、予測結果と既存の生物学的知識と出版物の組み合わせを活用して、KGパスによる薬物再投薬予測を説明できる最初のモデルフレームワークである。我々は,「ブラックボックス」の懸念を効果的に軽減し,予測された経路に基づく説明に基づく薬物再資源化の予測信頼度を高め,新興疾患に対する薬物発見のプロセスをさらに促進できると考えている。

Background: Computational drug repurposing is a cost- and time-efficient approach that aims to identify new therapeutic targets or diseases (indications) of existing drugs/compounds. It is especially critical for emerging and/or orphan diseases due to its cheaper investment and shorter research cycle compared with traditional wet-lab drug discovery approaches. However, the underlying mechanisms of action (MOAs) between repurposed drugs and their target diseases remain largely unknown, which is still a main obstacle for computational drug repurposing methods to be widely adopted in clinical settings. Results: In this work, we propose KGML-xDTD: a Knowledge Graph-based Machine Learning framework for explainably predicting Drugs Treating Diseases. It is a two-module framework that not only predicts the treatment probabilities between drugs/compounds and diseases but also biologically explains them via knowledge graph (KG) path-based, testable mechanisms of action (MOAs). We leverage knowledge-and-publication based information to extract biologically meaningful "demonstration paths" as the intermediate guidance in the Graph-based Reinforcement Learning (GRL) path-finding process. Comprehensive experiments and case study analyses show that the proposed framework can achieve state-of-the-art performance in both predictions of drug repurposing and recapitulation of human-curated drug MOA paths. Conclusions: KGML-xDTD is the first model framework that can offer KG-path explanations for drug repurposing predictions by leveraging the combination of prediction outcomes and existing biological knowledge and publications. We believe it can effectively reduce "black-box" concerns and increase prediction confidence for drug repurposing based on predicted path-based explanations, and further accelerate the process of drug discovery for emerging diseases.

翻訳日:2023-04-27 03:38:25 公開日:2023-04-25

# GREAD: グラフニューラル反応拡散ネットワーク

GREAD: Graph Neural Reaction-Diffusion Networks ( http://arxiv.org/abs/2211.14208v2 )

ライセンス: Link先を確認

Jeongwhan Choi, Seoyoung Hong, Noseong Park, Sung-Bae Cho

(参考訳) グラフニューラルネットワーク(GNN)は、ディープラーニングに関する最も人気のある研究トピックの1つである。 GNN法は通常、グラフ信号処理理論に基づいて設計されている。特に、拡散方程式はGNNのコア処理層の設計に広く用いられており、悪名高い過密問題に対して必然的に脆弱である。最近、いくつかの論文が拡散方程式とともに反応方程式に注意を払っている。しかし、それらはすべて限定的な反応方程式である。そこで本研究では,我々が設計した1つの特殊反応方程式に加えて,一般的な反応方程式をすべて考慮した反応拡散式に基づくgnn法を提案する。本論文は,反応拡散式に基づくgnnに関する最も包括的な研究の1つである。 9つのデータセットと28のベースラインを用いた実験では、GREADと呼ばれる手法がほとんどのケースで優れています。さらなる合成データ実験により、オーバースムーシング問題を緩和し、様々なホモフィリー率でうまく機能することが示された。

Graph neural networks (GNNs) are one of the most popular research topics for deep learning. GNN methods have typically been designed on top of graph signal processing theory. In particular, diffusion equations have been widely used for designing the core processing layer of GNNs, and therefore they are inevitably vulnerable to the notorious oversmoothing problem. Recently, a couple of papers paid attention to reaction equations in conjunction with diffusion equations. However, they all consider limited forms of reaction equations. To this end, we present a reaction-diffusion equation-based GNN method that considers all popular types of reaction equations in addition to one special reaction equation designed by us. To our knowledge, our paper is one of the most comprehensive studies on reaction-diffusion equation-based GNNs. In our experiments with 9 datasets and 28 baselines, our method, called GREAD, outperforms them in a majority of cases. Further synthetic data experiments show that it mitigates the oversmoothing problem and works well for various homophily rates.
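
The link between diffusion and oversmoothing, and the role a reaction term can play, can be sketched on a toy graph. The code below is not the GREAD layer: it assumes scalar node features, explicit Euler steps, neighbor averaging as the diffusion operator, and a simple source-style reaction term `beta * (x0 - x)` chosen only for illustration (the paper's own reaction terms are not reproduced here). All names are hypothetical.

```python
def step(x, x0, adj, dt, beta):
    """One explicit Euler step of dx/dt = (mean of neighbors - x) + beta * (x0 - x)."""
    new = []
    for i, nbrs in enumerate(adj):
        mean_nb = sum(x[j] for j in nbrs) / len(nbrs)
        diffusion = mean_nb - x[i]          # smooths features across edges
        reaction = beta * (x0[i] - x[i])    # assumed source term, pulls back to x0
        new.append(x[i] + dt * (diffusion + reaction))
    return new


def evolve(x0, adj, steps, dt=0.5, beta=0.0):
    """Iterate the update; beta=0.0 gives pure diffusion."""
    x = list(x0)
    for _ in range(steps):
        x = step(x, x0, adj, dt, beta)
    return x


def spread(x):
    """Gap between the largest and smallest node feature."""
    return max(x) - min(x)
```

On a four-node path graph, pure diffusion drives all node features to the same value, which is the oversmoothing effect, while the version with the assumed reaction term settles at a state in which the nodes remain distinguishable.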

翻訳日:2023-04-27 03:37:58 公開日:2023-04-25

# エゴセントリック行動予測のためのインタラクションビジュアルトランスフォーマ

Interaction Visual Transformer for Egocentric Action Anticipation ( http://arxiv.org/abs/2211.14154v3 )

ライセンス: Link先を確認

Debaditya Roy, Ramanathan Rajendiran and Basura Fernando

(参考訳) ヒトと物体の相互作用は最も重要な視覚的手がかりの1つであり、人間と物体の相互作用をエゴセントリックな行動予測のために表現する方法を提案する。本稿では,アクションの実行による物体と人間の手の外観の変化を計算し,その変化を利用して映像表現を洗練することにより,インタラクションをモデル化するトランスフォーマーを提案する。具体的には,空間クロスアテンション(SCA)を用いて手と物体の相互作用をモデル化し,さらに軌道クロスアテンションを用いた文脈情報から環境改良されたインタラクショントークンを得る。これらのトークンを用いて,行動予測のためのインタラクション中心のビデオ表現を構築する。本稿では,EPIC-KITCHENS-100(EK100)とEGTEA Gaze+を用いて,最先端のアクション予測性能を実現するモデルInAViTを述べる。 InAViTは、オブジェクト中心のビデオ表現を含む他のビジュアルトランスフォーマーベースの手法より優れている。 EK100評価サーバでは、InAViTは公開リーダーボード上で(提出時点で)最高パフォーマンスの手法であり、mean-top5 recallにおいて2番目に良いモデルよりも3.3%上回っている。

Human-object interaction is one of the most important visual cues and we propose a novel way to represent human-object interactions for egocentric action anticipation. We propose a novel transformer variant to model interactions by computing the change in the appearance of objects and human hands due to the execution of the actions and use those changes to refine the video representation. Specifically, we model interactions between hands and objects using Spatial Cross-Attention (SCA) and further infuse contextual information using Trajectory Cross-Attention to obtain environment-refined interaction tokens. Using these tokens, we construct an interaction-centric video representation for action anticipation. We term our model InAViT; it achieves state-of-the-art action anticipation performance on the large-scale egocentric datasets EPIC-KITCHENS-100 (EK100) and EGTEA Gaze+. InAViT outperforms other visual transformer-based methods, including those using object-centric video representations. On the EK100 evaluation server, InAViT is the top-performing method on the public leaderboard (at the time of submission), where it outperforms the second-best model by 3.3% on mean-top5 recall.

翻訳日:2023-04-27 03:37:45 公開日:2023-04-25

# イベントカメラのためのデータ駆動型特徴追跡

Data-driven Feature Tracking for Event Cameras ( http://arxiv.org/abs/2211.12826v3 )

ライセンス: Link先を確認

Nico Messikommer, Carter Fang, Mathias Gehrig, Davide Scaramuzza

(参考訳) 高時間分解能、動きのぼかしに対するレジリエンスの増大、そして非常に少ない出力のため、イベントカメラは挑戦的なシナリオであっても低レイテンシで低帯域幅の特徴追跡に最適であることが示されている。既存のイベントカメラの特徴追跡手法は手作りか第一原理から派生しているが、広範なパラメータチューニングが必要であり、ノイズに敏感であり、非モデル化効果のために異なるシナリオに一般化しない。これらの欠陥に対処するために、グレースケールフレームで検出された特徴を追跡するために、低レイテンシイベントを活用するイベントカメラ用の最初のデータ駆動機能トラッカーを導入する。特徴トラック間で情報を共有する新しいフレームアテンションモジュールにより,ロバストな性能を実現する。合成データから実データへのゼロショットを直接転送することで、データ駆動トラッカーは、相対的特徴年齢における既存のアプローチを最大120%上回り、低レイテンシを実現する。この性能ギャップはさらに130%増加し、トラッカーを新たな自己超越戦略で実データに適用する。

Because of their high temporal resolution, increased resilience to motion blur, and very sparse output, event cameras have been shown to be ideal for low-latency and low-bandwidth feature tracking, even in challenging scenarios. Existing feature tracking methods for event cameras are either handcrafted or derived from first principles but require extensive parameter tuning, are sensitive to noise, and do not generalize to different scenarios due to unmodeled effects. To tackle these deficiencies, we introduce the first data-driven feature tracker for event cameras, which leverages low-latency events to track features detected in a grayscale frame. We achieve robust performance via a novel frame attention module, which shares information across feature tracks. By directly transferring zero-shot from synthetic to real data, our data-driven tracker outperforms existing approaches in relative feature age by up to 120% while also achieving the lowest latency. This performance gap is further increased to 130% by adapting our tracker to real data with a novel self-supervision strategy.

翻訳日:2023-04-27 03:37:01 公開日:2023-04-25

# 雑音極大絡み状態を持つ完全量子非局所ゲームの決定可能性

Decidability of fully quantum nonlocal games with noisy maximally entangled states ( http://arxiv.org/abs/2211.10613v4 )

ライセンス: Link先を確認

Minglong Qin, Penghui Yao

(参考訳) 本稿では、雑音の多い最大絡み合った状態を持つ完全量子非局所ゲームの決定可能性について考察する。完全量子非ローカルゲームは非ローカルゲームの一般化であり、質問と回答の両方が量子的であり、審判はプレイヤーから量子的回答を受けた後にゲームに勝つかどうかを決定するためにバイナリ povm 測定を行う。完全量子非局所ゲームの量子値 (quantum value) は、プレイヤーがゲームに勝つ確率の上限であり、プレイヤー間で共有される全ての可能な絡み合った状態と、プレイヤーが行うすべての有効な量子演算を超越する。セミナーワーク $\mathrm{MIP}^*=\mathrm{RE}$ は、完全非局所ゲームの量子値を近似することは決定不可能であることを意味する。これは、プレイヤーが最大に絡み合った状態を共有することしか許されていない場合でも継続される。本稿では,共有最大絡み合った状態がノイズである場合について検討する。我々は、プレイヤーが量子値に任意に近い確率で完全量子非局所ゲームに勝つために、ノイズの多い最大絡み合い状態のコピーに計算可能な上限が存在することを証明する。これは、これらのゲームの量子値の近似が決定可能であることを意味する。したがって、完全量子非局所ゲームにおける量子値の近似の難しさは共有状態のノイズに対して強固ではない。本稿では,協調分布の非対話的シミュレーションを決定可能とする枠組みを構築し,非局所ゲームに対する類似結果を一般化する。フーリエ解析の理論を超作用素の空間に拡張し、不変原理や超作用素の次元還元を含むいくつかの重要な結果を証明する。これらの結果は、それ自体が興味深いものであり、さらなる応用があると考えられている。

This paper considers the decidability of fully quantum nonlocal games with noisy maximally entangled states. Fully quantum nonlocal games are a generalization of nonlocal games, where both questions and answers are quantum and the referee performs a binary POVM measurement to decide whether the players win the game after receiving the quantum answers from them. The quantum value of a fully quantum nonlocal game is the supremum of the probability that the players win the game, where the supremum is taken over all the possible entangled states shared between the players and all the valid quantum operations performed by the players. The seminal work $\mathrm{MIP}^*=\mathrm{RE}$ implies that it is undecidable to approximate the quantum value of a fully nonlocal game. This still holds even if the players are only allowed to share (arbitrarily many copies of) maximally entangled states. This paper investigates the case that the shared maximally entangled states are noisy. We prove that there is a computable upper bound on the copies of noisy maximally entangled states for the players to win a fully quantum nonlocal game with a probability arbitrarily close to the quantum value. This implies that it is decidable to approximate the quantum values of these games. Hence, the hardness of approximating the quantum value of a fully quantum nonlocal game is not robust against the noise in the shared states. This paper is built on the framework for the decidability of non-interactive simulations of joint distributions and generalizes the analogous result for nonlocal games. We extend the theory of Fourier analysis to the space of super-operators and prove several key results including an invariance principle and a dimension reduction for super-operators. These results are interesting in their own right and are believed to have further applications.

翻訳日:2023-04-27 03:36:42 公開日:2023-04-25

# 量子化学習のための部分スクラッチオフロッキーチケットの爆発

Exploiting the Partly Scratch-off Lottery Ticket for Quantization-Aware Training ( http://arxiv.org/abs/2211.08544v3 )

ライセンス: Link先を確認

Yunshan Zhong, Mingbao Lin, Yuxin Zhang, Gongrui Nan, Fei Chao, Rongrong Ji

(参考訳) 量子化アウェアトレーニング(QAT)は、量子化ネットワークのパフォーマンスをよく保つため広く普及している。現代のQATでは、全ての量子化重量がトレーニングプロセス全体に対して更新される。本稿では,我々が観察した興味深い現象をもとに,この経験に挑戦する。具体的には、量子化された重みの大部分が、いくつかのトレーニング期間を経て最適な量子化レベルに達します。これを部分スクラッチオフロッキーチケット(partly scratch-off lottery ticket)と呼ぶ。この単純で価値の高い観測は、無意味な更新を避けるために、残りのトレーニング期間でこれらの重みの勾配計算をゼロにするきっかけとなりました。このチケットを効果的に見つけるために、フル精度の重みと量子化レベルの距離が制御可能な閾値よりも小さい場合、重量を凍結する「抽選チケットスクラッチャー」(LTS)と呼ばれるヒューリスティック手法を開発した。驚いたことに、提案されたLTSは一般的に、50%-70%の重量更新と25%-35%のFLOPを後方パスから排除するが、それでも比較したベースラインと同等またはそれ以上のパフォーマンスを達成している。例えば、LTSはベースラインと比較して2ビットのMobileNetV2を5.05%改善し、重量更新の46%と後方パスの23%のFLOPを排除した。コードは https://github.com/zysxmu/LTS にある。

Quantization-aware training (QAT) has gained extensive popularity as it retains the performance of quantized networks well. In QAT, the conventional practice is that all quantized weights are updated throughout the entire training process. In this paper, this practice is challenged based on an interesting phenomenon we observed. Specifically, a large portion of quantized weights reaches the optimal quantization level after a few training epochs, which we refer to as the partly scratch-off lottery ticket. This straightforward-yet-valuable observation naturally inspires us to zero out gradient calculations of these weights in the remaining training period to avoid meaningless updating. To effectively find the ticket, we develop a heuristic method, dubbed lottery ticket scratcher (LTS), which freezes a weight once the distance between the full-precision one and its quantization level is smaller than a controllable threshold. Surprisingly, the proposed LTS typically eliminates 50%-70% of weight updating and 25%-35% of the FLOPs of the backward pass, while still achieving performance on par with or even better than the compared baseline. For example, compared with the baseline, LTS improves 2-bit MobileNetV2 by 5.05%, eliminating 46% of weight updating and 23% of the FLOPs of the backward pass. Code is at https://github.com/zysxmu/LTS.
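
The freezing rule described above can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation: it assumes a uniform quantizer with a fixed grid spacing, plain SGD, and weights stored in a Python list, and the function names are hypothetical. A weight is frozen once its full-precision value lies within `threshold` of its quantization level, and frozen weights are skipped in later updates.

```python
def quantize(w, step):
    """Map a full-precision weight to the nearest level of a uniform grid (assumed quantizer)."""
    return round(w / step) * step


def update_frozen(weights, frozen, step, threshold):
    """Freeze a weight once it is closer than threshold to its quantization level.

    Weights that were already frozen stay frozen.
    """
    return [
        f or abs(w - quantize(w, step)) < threshold
        for w, f in zip(weights, frozen)
    ]


def sgd_step(weights, grads, frozen, lr):
    """Plain SGD on unfrozen weights; frozen weights receive no update."""
    return [w if f else w - lr * g for w, g, f in zip(weights, grads, frozen)]
```

In a real training loop the frozen mask would also let the backward pass skip the corresponding gradient computations, which is where the reported FLOP savings would come from; this sketch only shows the bookkeeping.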

翻訳日:2023-04-27 03:36:13 公開日:2023-04-25

# 量子スピン系のギャップをブートストラップする

Bootstrapping the gap in quantum spin systems ( http://arxiv.org/abs/2211.03819v2 )

ライセンス: Link先を確認

Colin Oscar Nancarrow, Yuan Xin

(参考訳) 本研究では,共形場理論(CFT)のセットアップを密接に反映した量子力学問題に対する新しいブートストラップ法について報告する。運動方程式を用いて、行列要素の共形ブロック展開のアナログを開発し、それらの値に境界を置くために交叉対称性を課す。本手法は,局所ハミルトニアンを持つ任意の量子力学系に適用可能であり,非調和振動子モデルと (1+1)-次元横場イジングモデル(TFIM)を用いて実験を行う。非調和振動子モデルについて、少数の交叉方程式がスペクトルと行列要素の正確な解を与えることを示した。 TFIM に対して、ハミルトン方程式、翻訳不変性、大域対称性選択規則は熱力学極限における TFIM のギャップと行列要素に厳密な境界を課すことを示す。境界は、交差方程式のより大きな系を考えると改善され、より有限体積の解を除外する。本手法は、ハミルトニアンから無限格子の低エネルギースペクトルを厳密かつ近似なしで探究する方法を提供する。

In this work we report on a new bootstrap method for quantum mechanical problems that closely mirrors the setup from conformal field theory (CFT). We use the equations of motion to develop an analogue of the conformal block expansion for matrix elements and impose crossing symmetry in order to place bounds on their values. The method can be applied to any quantum mechanical system with a local Hamiltonian, and we test it on an anharmonic oscillator model as well as the (1+1)-dimensional transverse field Ising model (TFIM). For the anharmonic oscillator model we show that a small number of crossing equations provides an accurate solution to the spectrum and matrix elements. For the TFIM we show that the Hamiltonian equations of motion, translational invariance and global symmetry selection rules impose a rigorous bound on the gap and the matrix elements of the TFIM in the thermodynamic limit. The bound improves as we consider larger systems of crossing equations, ruling out more finite-volume solutions. Our method provides a way to probe the low energy spectrum of an infinite lattice from the Hamiltonian rigorously and without approximation.

翻訳日:2023-04-27 03:35:33 公開日:2023-04-25

# 点雲の所望距離関係へのユークリッド空間の計量化

Metricizing the Euclidean Space towards Desired Distance Relations in Point Clouds ( http://arxiv.org/abs/2211.03674v2 )

ライセンス: Link先を確認

Stefan Rass, Sandra König, Shahzad Ahmad, Maksim Goman

(参考訳) ユークリッド空間 $\mathbb{R}^\ell$ with $\ell>1$ の点の集合が与えられると、それらの点の間の対距離は、その空間的位置と、$\mathbb{R}^\ell$ に与える計量 $d$ によって決定される。したがって、2つの点の間の距離 $d(\mathbf x,\mathbf y)=\delta$ は、$\mathbf x$ と $\mathbf y$ と $d$ の選択によって固定される。我々は、値 $\delta$ と点 $\mathbf x,\mathbf y$ を固定する関連する問題を研究し、所望距離 $\delta$ を計算する位相計量 $d$ が存在するかどうかを問う。この問題は、最大$O(\sqrt\ell)$ の点間の所望の対距離を$\mathbb{R}^\ell$ で同時に与えるメトリックを構築して解くことができることを示した。 We then introduce the notion of an $\varepsilon$-semimetric $\tilde{d}$ to formulate our main result: for all $\varepsilon>0$, for all $m\geq 1$, for any choice of $m$ points $\mathbf y_1,\ldots,\mathbf y_m\in\mathbb{R}^\ell$, and all chosen sets of values $\{\delta_{ij}\geq 0: 1\leq i<j\leq m\}$, there exists an $\varepsilon$-semimetric $\tilde{d}:\mathbb{R}^\ell\times \mathbb{R}^\ell\to\mathbb{R}$ such that $\tilde{d}(\mathbf y_i,\mathbf y_j)=\delta_{ij}$, i.e., the desired distances are accomplished, irrespective of the topology that the Euclidean or other norms would induce. 本稿では,教師なし学習アルゴリズム,具体的には$k$-Means and density-based clustering algorithm(DBSCAN)に対する攻撃効果を示す。これらには人工知能における多様体的応用があり、以下に示すように、外部から提供される距離測度で実行させることで、クラスタアルゴリズムが事前に決定され、従って可鍛性を持つ結果を生成することができる。このことはクラスタリングアルゴリズムの結果が、特定の距離関数を使用するための標準化された固定された処方令がない限り、一般的には信頼できないことを示している。

Given a set of points in the Euclidean space $\mathbb{R}^\ell$ with $\ell>1$, the pairwise distances between the points are determined by their spatial location and the metric $d$ that we endow $\mathbb{R}^\ell$ with. Hence, the distance $d(\mathbf x,\mathbf y)=\delta$ between two points is fixed by the choice of $\mathbf x$ and $\mathbf y$ and $d$. We study the related problem of fixing the value $\delta$, and the points $\mathbf x,\mathbf y$, and ask if there is a topological metric $d$ that computes the desired distance $\delta$. We demonstrate this problem to be solvable by constructing a metric to simultaneously give desired pairwise distances between up to $O(\sqrt\ell)$ many points in $\mathbb{R}^\ell$. We then introduce the notion of an $\varepsilon$-semimetric $\tilde{d}$ to formulate our main result: for all $\varepsilon>0$, for all $m\geq 1$, for any choice of $m$ points $\mathbf y_1,\ldots,\mathbf y_m\in\mathbb{R}^\ell$, and all chosen sets of values $\{\delta_{ij}\geq 0: 1\leq i<j\leq m\}$, there exists an $\varepsilon$-semimetric $\tilde{d}:\mathbb{R}^\ell\times \mathbb{R}^\ell\to\mathbb{R}$ such that $\tilde{d}(\mathbf y_i,\mathbf y_j)=\delta_{ij}$, i.e., the desired distances are accomplished, irrespective of the topology that the Euclidean or other norms would induce. We showcase our results by using them to attack unsupervised learning algorithms, specifically $k$-Means and density-based (DBSCAN) clustering algorithms. These have manifold applications in artificial intelligence, and letting them run with externally provided distance measures constructed in the way shown here can make clustering algorithms produce results that are pre-determined and hence malleable. This demonstrates that the results of clustering algorithms may not generally be trustworthy, unless there is a standardized and fixed prescription to use a specific distance function.

翻訳日:2023-04-27 03:35:10 公開日:2023-04-25
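
The closing claim of the abstract above, that a clustering result depends on who supplies the distance function, can be illustrated with a deliberately simple case. The sketch below is not the paper's metric construction. It only assumes a weighted Euclidean distance (a genuine metric for positive weights) and a single nearest-centroid assignment, the step that $k$-Means repeats. All names and numbers are hypothetical.

```python
import math


def weighted_distance(x, y, weights):
    """Weighted Euclidean distance; a valid metric whenever all weights are positive."""
    return math.sqrt(sum(w * (a - b) ** 2 for a, b, w in zip(x, y, weights)))


def nearest(point, centroids, weights):
    """Index of the centroid closest to `point` under the supplied distance."""
    return min(range(len(centroids)),
               key=lambda i: weighted_distance(point, centroids[i], weights))


centroids = [(0.0, 0.0), (2.0, 2.0)]
point = (0.5, 2.0)

plain = nearest(point, centroids, weights=(1.0, 1.0))     # ordinary Euclidean distance
skewed = nearest(point, centroids, weights=(10.0, 0.1))   # externally supplied weights
print(plain, skewed)
```

The same point changes cluster once the weights are changed, although both distances are legitimate metrics on the plane.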

# 貧乏者の品質推定:参照のない参照ベースのMTメトリクスの予測

Poor Man's Quality Estimation: Predicting Reference-Based MT Metrics Without the Reference ( http://arxiv.org/abs/2301.09008v3 )

ライセンス: Link先を確認

Vilém Zouhar, Shehzaad Dhuliawala, Wangchunshu Zhou, Nico Daheim, Tom Kocmi, Yuchen Eleanor Jiang, Mrinmaya Sachan

(参考訳) 機械翻訳品質推定(QE)は、参照を見ることなく翻訳仮説の人間の判断を予測する。事前訓練された言語モデルに基づく最先端のQEシステムは、人間の判断と顕著な相関を達成しているが、それらは計算的に重く、作成に時間がかかる人間のアノテーションを必要とする。これらの制約に対処するために、基準を使わずに自動測定値を予測する計量推定(ME)の問題を定義する。基準にアクセスしなくても、我々のモデルは自動メトリクス(BLEUは$\rho$=60%、他のメトリクスは$\rho$=51%)を文レベルで推定できることを示す。自動メトリクスは人間の判断と相関するため、QEモデルの事前トレーニングにMEタスクを利用することができます。 QEタスクの場合、TERの事前トレーニングは、スクラッチのトレーニング($\rho$=20%)より優れている($\rho$=23%)。

Machine translation quality estimation (QE) predicts human judgements of a translation hypothesis without seeing the reference. State-of-the-art QE systems based on pretrained language models have been achieving remarkable correlations with human judgements, yet they are computationally heavy and require human annotations, which are slow and expensive to create. To address these limitations, we define the problem of metric estimation (ME) where one predicts the automated metric scores also without the reference. We show that even without access to the reference, our model can estimate automated metrics ($\rho$=60% for BLEU, $\rho$=51% for other metrics) at the sentence level. Because automated metrics correlate with human judgements, we can leverage the ME task for pre-training a QE model. For the QE task, we find that pre-training on TER is better ($\rho$=23%) than training from scratch ($\rho$=20%).

翻訳日:2023-04-27 03:28:03 公開日:2023-04-25

# 量子コンピュータへのブートストラップ埋め込み

Bootstrap Embedding on a Quantum Computer ( http://arxiv.org/abs/2301.01457v2 )

ライセンス: Link先を確認

Yuan Liu, Oinam R. Meitei, Zachary E. Chin, Arkopal Dutt, Max Tao, Isaac L. Chuang, Troy Van Voorhis

(参考訳) 量子コンピュータの実装に適するように分子ブートストラップ埋め込みを拡張した。これにより、全体システムのフラグメントを管理する複合ラグランジアンに対する最適化問題として、大きな分子の電子構造問題の解法が実現され、フラグメント解は量子コンピュータの能力を利用することができる。量子SWAPテストや量子振幅増幅を含む最先端の量子サブルーチンを用いることで、古典的アルゴリズムよりも2次的なスピードアップが原理的に得られることを示す。量子計算の活用により、アルゴリズムは1-RDMに制限されるのではなく、わずかな追加計算コストでフラグメント境界の完全な密度行列を一致させることができる。現在の量子コンピュータは小さいが、量子ブートストラップの埋め込みは量子フラグメントマッチングを通じてそのような小さなマシンを利用するための潜在的に一般化可能な戦略を提供する。

We extend molecular bootstrap embedding to make it appropriate for implementation on a quantum computer. This enables the solution of the electronic structure problem of a large molecule as an optimization problem for a composite Lagrangian governing fragments of the total system, in such a way that fragment solutions can harness the capabilities of quantum computers. By employing state-of-the-art quantum subroutines including the quantum SWAP test and quantum amplitude amplification, we show how a quadratic speedup can be obtained over the classical algorithm, in principle. Utilization of quantum computation also allows the algorithm to match -- at little additional computational cost -- full density matrices at fragment boundaries, instead of being limited to 1-RDMs. Current quantum computers are small, but quantum bootstrap embedding provides a potentially generalizable strategy for harnessing such small machines through quantum fragment matching.

翻訳日:2023-04-27 03:27:37 公開日:2023-04-25
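
The SWAP test named in the abstract above has a standard textbook property: for two pure states, the ancilla qubit is measured in state 0 with probability $\frac{1}{2} + \frac{1}{2}|\langle\psi|\phi\rangle|^2$. The sketch below only evaluates that formula classically for small state vectors. It does not simulate a circuit and says nothing about how the paper embeds the test in fragment matching; the function names are hypothetical.

```python
import math


def inner(psi, phi):
    """Inner product <psi|phi> of two state vectors given as lists of complex amplitudes."""
    return sum(a.conjugate() * b for a, b in zip(psi, phi))


def swap_test_p0(psi, phi):
    """Probability of measuring the ancilla in |0> at the end of an ideal SWAP test."""
    return 0.5 + 0.5 * abs(inner(psi, phi)) ** 2


def overlap_from_p0(p0):
    """Invert the relation above to recover the squared overlap |<psi|phi>|^2."""
    return 2.0 * p0 - 1.0


zero = [1 + 0j, 0 + 0j]
one = [0 + 0j, 1 + 0j]
plus = [complex(1 / math.sqrt(2)), complex(1 / math.sqrt(2))]

print(swap_test_p0(zero, zero))  # identical states
print(swap_test_p0(zero, one))   # orthogonal states
print(overlap_from_p0(swap_test_p0(zero, plus)))
```

On hardware, the probability would be estimated from repeated measurements, so the recovered overlap carries sampling error that this sketch ignores.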

# アンダーサンプルデータからの非視線イメージングのための曲率正規化

Curvature regularization for Non-line-of-sight Imaging from Under-sampled Data ( http://arxiv.org/abs/2301.00406v2 )

ライセンス: Link先を確認

Rui Ding, Juntian Ye, Qifeng Gao, Feihu Xu, Yuping Duan

(参考訳) 非視線画像(NLOS)は、複数の回折反射の後に光で符号化された光子時間情報を用いて、視線で測定されたデータから3次元の隠れたシーンを再構築することを目的としている。サンプリング済みの走査データは、高速な撮像を容易にすることができる。しかし, 結果として生じる復元問題は, ノイズや歪みにより劣化する可能性が高く, 深刻な逆問題となる。本稿では,曲率正規化に基づく2つの新しいnlos再構成モデル,すなわち,オブジェクト領域曲率正規化モデルと,デュアル(信号およびオブジェクト)領域曲率正規化モデルを提案する。 gpu実装によりさらに加速されるバックトラックステップ化規則(backtracking stepsize rule)を伴う乗算器の交互方向法(admm)に基づいて高速数値最適化アルゴリズムを開発した。提案したアルゴリズムは, 合成データセットと実データセットの両方で評価し, 特に圧縮センシング環境で, 最先端性能を実現する。私たちのコードとデータは、https://github.com/Duanlab123/CurvNLOSで利用可能です。

Non-line-of-sight (NLOS) imaging aims to reconstruct three-dimensional hidden scenes from data measured in the line-of-sight, using photon time-of-flight information encoded in light after multiple diffuse reflections. Under-sampled scanning data can facilitate fast imaging. However, the resulting reconstruction problem becomes a seriously ill-posed inverse problem, whose solution is likely to be degraded by noise and distortions. In this paper, we propose two novel NLOS reconstruction models based on curvature regularization, i.e., the object-domain curvature regularization model and the dual (i.e., signal and object)-domain curvature regularization model. Fast numerical optimization algorithms are developed relying on the alternating direction method of multipliers (ADMM) with the backtracking stepsize rule, which are further accelerated by GPU implementation. We evaluate the proposed algorithms on both synthetic and real datasets, where they achieve state-of-the-art performance, especially in the compressed sensing setting. All our code and data are available at https://github.com/Duanlab123/CurvNLOS.

翻訳日:2023-04-27 03:27:21 公開日:2023-04-25

# HouseCat6D -- 現実的なシナリオで家庭用オブジェクトを使った大規模マルチモーダルカテゴリレベル6Dオブジェクトポーズデータセット

HouseCat6D -- A Large-Scale Multi-Modal Category Level 6D Object Pose Dataset with Household Objects in Realistic Scenarios ( http://arxiv.org/abs/2212.10428v3 )

ライセンス: Link先を確認

HyunJun Jung, Shun-Cheng Wu, Patrick Ruhkamp, Hannah Schieber, Pengyuan Wang, Giulia Rizzoli, Hongcheng Zhao, Sven Damian Meier, Daniel Roth, Nassir Navab, Benjamin Busam

(参考訳) オブジェクトの6Dポーズを推定することは、主要な3Dコンピュータビジョン問題である。インスタンスレベルのアプローチによる有望な結果から、研究責任者はより実用的なアプリケーションシナリオのためのカテゴリレベルのポーズ推定にも取り組んでいる。しかし、よく確立されたインスタンスレベルのポーズデータセットとは異なり、利用可能なカテゴリレベルのデータセットはアノテーションの品質やポーズ量に欠ける。新しいカテゴリーレベルの6DポーズデータセットHouseCat6Dを提案する。 1)ポラリメトリックRGBと深さ(RGBD+P)の多モード性 2)2つのフォトメトリックに挑戦するカテゴリを含む10の家庭用オブジェクトカテゴリの高度に多様な194のオブジェクト。 3) エラー範囲がわずか1.35mmから1.74mmの高品質ポーズアノテーション 4)広い視点と隠蔽を有する41の大規模シーン。 5)全シーンにおけるチェッカーボードのない環境 6) 同時に高密度6Dパラレルジャウグリップを付加した。さらに,最先端カテゴリレベルのポーズ推定ネットワークのベンチマーク結果も提供する。

Estimating the 6D pose of objects is a major 3D computer vision problem. Following the promising outcomes of instance-level approaches, research is also moving towards category-level pose estimation for more practical application scenarios. However, unlike well-established instance-level pose datasets, available category-level datasets are lacking in annotation quality and in the number of poses provided. We propose the new category-level 6D pose dataset HouseCat6D featuring 1) Multi-modality of Polarimetric RGB and Depth (RGBD+P), 2) Highly diverse 194 objects of 10 household object categories including 2 photometrically challenging categories, 3) High-quality pose annotation with an error range of only 1.35 mm to 1.74 mm, 4) 41 large-scale scenes with extensive viewpoint coverage and occlusions, 5) Checkerboard-free environment throughout the entire scene, and 6) Additionally annotated dense 6D parallel-jaw grasps. Furthermore, we also provide benchmark results of state-of-the-art category-level pose estimation networks.

翻訳日:2023-04-27 03:27:01 公開日:2023-04-25

# InferEM:共感的対話生成のための話者意図の推測

InferEM: Inferring the Speaker's Intention for Empathetic Dialogue Generation ( http://arxiv.org/abs/2212.06373v6 )

ライセンス: Link先を確認

Guoqing Lv, Jiang Li, Xiaoping Wang, Zhigang Zeng

(参考訳) 共感応答生成に対する現在のアプローチは、一般的に対話履歴全体をエンコードし、出力をデコーダに入れてフレンドリーなフィードバックを生成する。これらの手法は文脈情報のモデル化に焦点をあてるが、話者の直接の意図を捉えることは無視する。我々は,対話の最後の発声が話者の意図を実証的に伝えることを主張する。そこで本研究では,共感応答生成のための新しいモデルInferEMを提案する。我々は,最後の発話を別々に符号化し,多面的注意に基づく意図融合モジュールを通して対話全体と融合し,話者の意図を捉える。さらに,先行した発話を用いて最後の発話を予測し,人間の心理をシミュレートし,対話者が事前に何を話すのかを推測する。発話予測と応答生成の最適化率のバランスをとるために,InferEMのためのマルチタスク学習戦略を設計する。実験の結果,inferemの共感性発現改善における可能性と妥当性が示された。

Current approaches to empathetic response generation typically encode the entire dialogue history directly and put the output into a decoder to generate friendly feedback. These methods focus on modelling contextual information but neglect capturing the direct intention of the speaker. We argue that the last utterance in the dialogue empirically conveys the intention of the speaker. Consequently, we propose a novel model named InferEM for empathetic response generation. We separately encode the last utterance and fuse it with the entire dialogue through a multi-head attention based intention fusion module to capture the speaker's intention. Besides, we utilize previous utterances to predict the last utterance, which simulates the human tendency to guess in advance what the interlocutor may say. To balance the optimizing rates of the utterance prediction and response generation, a multi-task learning strategy is designed for InferEM. Experimental results demonstrate the plausibility and validity of InferEM in improving empathetic expression.

翻訳日:2023-04-27 03:26:30 公開日:2023-04-25

# HACA3:マルチサイトMR画像調和のための統一的アプローチ

HACA3: A Unified Approach for Multi-site MR Image Harmonization ( http://arxiv.org/abs/2212.06065v2 )

ライセンス: Link先を確認

Lianrui Zuo, Yihao Liu, Yuan Xue, Blake E. Dewey, Samuel W. Remedios, Savannah P. Hays, Murat Bilgel, Ellen M. Mowry, Scott D. Newsome, Peter A. Calabresi, Susan M. Resnick, Jerry L. Prince, Aaron Carass

(参考訳) 標準化の欠如は磁気共鳴(MR)イメージングにおいて顕著な問題である。これはしばしば、ハードウェアと取得パラメータの違いのため、取得した画像に望ましくないコントラスト変動を引き起こす。近年,非所望のコントラスト変動を補うため,画像合成に基づくMRハーモニゼーションが提案されている。既存の方法の成功にもかかわらず、私たちは3つの大きな改善ができると主張している。第一に、既存のほとんどの手法は、同一対象のマルチコントラストMR画像が同じ解剖学を共有するという仮定に基づいて構築されている。この仮定は、異なるMRコントラストが異なる解剖学的特徴の強調に特化しているため、疑わしい。第二に、これらの方法は訓練のために固定されたMRコントラスト(例えば、T1強調画像とT2強調画像の両方)を必要とし、適用性を制限する。最後に、既存の手法は一般的にイメージングアーティファクトに敏感である。本稿では,これらの3つの問題に対処するための新しいアプローチである,注意に基づくコントラスト,解剖,アーティファクト認識(HACA3)を提案する。 HACA3は、MRコントラスト間の固有の解剖学的差異を説明する解剖学的融合モジュールを組み込んでいる。さらに、HACA3はイメージングアーティファクトにも堅牢であり、MRコントラストの任意のセットにトレーニングおよび適用することができる。 HACA3は、フィールド強度、スキャナープラットフォーム、取得プロトコルの異なる21のサイトから取得した多様なMRデータセット上で開発・評価されている。実験により、HACA3は複数の画像品質指標の下で最先端のパフォーマンスを達成することが示された。また,白質病変の分節化や縦断的体積分析を含む下流課題に対するhaca3の適用性と汎用性を示す。

The lack of standardization is a prominent issue in magnetic resonance (MR) imaging. This often causes undesired contrast variations in the acquired images due to differences in hardware and acquisition parameters. In recent years, image synthesis-based MR harmonization with disentanglement has been proposed to compensate for the undesired contrast variations. Despite the success of existing methods, we argue that three major improvements can be made. First, most existing methods are built upon the assumption that multi-contrast MR images of the same subject share the same anatomy. This assumption is questionable, since different MR contrasts are specialized to highlight different anatomical features. Second, these methods often require a fixed set of MR contrasts for training (e.g., both T1-weighted and T2-weighted images), limiting their applicability. Lastly, existing methods are generally sensitive to imaging artifacts. In this paper, we present Harmonization with Attention-based Contrast, Anatomy, and Artifact Awareness (HACA3), a novel approach to address these three issues. HACA3 incorporates an anatomy fusion module that accounts for the inherent anatomical differences between MR contrasts. Furthermore, HACA3 is also robust to imaging artifacts and can be trained and applied to any set of MR contrasts. HACA3 is developed and evaluated on diverse MR datasets acquired from 21 sites with varying field strengths, scanner platforms, and acquisition protocols. Experiments show that HACA3 achieves state-of-the-art performance under multiple image quality metrics. We also demonstrate the applicability and versatility of HACA3 on downstream tasks including white matter lesion segmentation and longitudinal volumetric analyses.

翻訳日:2023-04-27 03:26:14 公開日:2023-04-25

# 量子カオスと時間の矢印

Quantum chaos and the arrow of time ( http://arxiv.org/abs/2212.03914v4 )

ライセンス: Link先を確認

Nilakash Sorokhaibam

(参考訳) 私たちの周りの世界は明らかに時間の矢を持っている。古典的熱力学は、美しい統計学的解釈を持つ熱力学の第2法則の形で時間的矢印を与える。しかし、時空の矢印の量子的起源の明確な写真は今のところ不足している。ここでは、量子カオス系において時間矢印が生じることを示す。カオス的でもある孤立量子系の場合、エントロピーの変化は、系が全般的に摂動しているときに非負であることを示す。物理系は一般に高度に相互作用し、カオスシステムの良い例である。我々は,システムの摂動時のエネルギー変化を追跡することで,この結果を示す。非常に微調整された摂動を用いて、エントロピーを下げることができる。しかし、摂動を微調整するには、システムの高精度なエネルギー準位を測定する必要がある。これは古典的熱力学におけるマクスウェルのデーモン問題とそのその後の解法を想起させる。

The world around us distinctly possesses an arrow of time. Classical thermodynamics provides an arrow of time in the form of the second law of thermodynamics, which also has a beautiful statistical interpretation. But a clear picture of the quantum origin of the arrow of time has been lacking so far. Here we show that an arrow of time arises in quantum chaotic systems. We show that, for an isolated quantum system which is also chaotic, the change in entropy is non-negative when the system is generically perturbed. Physical systems are, in general, highly interacting and are good examples of chaotic systems. We show our result by keeping track of the change in energy when the system is perturbed. Using an extremely fine-tuned perturbation, we can still lower the entropy. But fine-tuning the perturbation requires measurement of highly precise energy levels of the system. This is reminiscent of Maxwell's demon problem in classical thermodynamics and its subsequent resolution.

翻訳日:2023-04-27 03:25:49 公開日:2023-04-25

# 長距離2次リンドブラディアンにおける絡み合いと局在

Entanglement and localization in long-range quadratic Lindbladians ( http://arxiv.org/abs/2303.07070v2 )

ライセンス: Link先を確認

Alejandro Cros Carrillo de Albornoz, Dominic C. Rose and Arijeet Pal

(参考訳) アンダーソン局在の存在は、無秩序系における古典波と量子波のコヒーレンスを示すものと考えられている。環境への結合が著しく抑制されるが排除されない凝縮物や低温原子系では、局在のシグネチャが観察されている。本研究では,開量子系を記述するランダム・リンドブラッド力学における局在現象を考察する。浴槽の局所的なアンサンブルに結合した非相互作用性スピンレスフェルミオンの1次元連鎖モデルを提案する。各サイトにリンクされた浴槽との相互作用を媒介するジャンプ演算子は、指数$p$のパワーローテールを有する。系の定常状態は,コヒーレントホッピングの有無で安定な$p$をチューニングすることにより,局所的絡み合い相転移が進行することを示す。開系の量子軌道における絡み合い遷移とは異なり、この遷移はリンドブレディアンの平均定常状態密度行列によって表される。局所化相の定常状態は、局所的な人口不均衡の不均一性によって特徴づけられる一方、ジャンプ演算子は影響する部位の一定の参加率を示す。我々の研究は、オープン量子システムにおける局在物理学の新たな実現を提供する。

The existence of Anderson localization is considered a manifestation of coherence of classical and quantum waves in disordered systems. Signatures of localization have been observed in condensed matter and cold atomic systems where the coupling to the environment can be significantly suppressed but not eliminated. In this work we explore the phenomenon of localization in random Lindbladian dynamics describing open quantum systems. We propose a model of a one-dimensional chain of non-interacting, spinless fermions coupled to a local ensemble of baths. The jump operator mediating the interaction with the bath linked to each site has a power-law tail with an exponent $p$. We show that the steady state of the system undergoes a localization entanglement phase transition by tuning $p$ which remains stable in the presence of coherent hopping. Unlike the entanglement transition in the quantum trajectories of open systems, this transition is exhibited by the averaged steady state density matrix of the Lindbladian. The steady state in the localized phase is characterised by a heterogeneity in local population imbalance, while the jump operators exhibit a constant participation ratio of the sites they affect. Our work provides a novel realisation of localization physics in open quantum systems.

翻訳日:2023-04-27 03:18:51 公開日:2023-04-25

# システム環境量子モデルの完全ダイナミクスに対する複素離散化近似

Complex Discretization approximation for the full dynamics of system-environment quantum models ( http://arxiv.org/abs/2303.06584v3 )

ライセンス: Link先を確認

H. T. Cui, Y. A. Yan, M. Qin, and X. X. Yi

(参考訳) 連続体における環境のオープンダイナミクスをシミュレートするために用いられる離散化近似法は、しばしば再発に悩まされ、効率が低下する。本稿では,複素ガウス二次数を用いた複素平面における離散化近似法の一般化を提案する。結果として得られる効果的ハミルトニアンは、系の散逸ダイナミクスのために非エルミート的である。提案手法は2つの可解モデル,すなわち,一般化 Aubry-André-Harper モデルにおけるdephasing モデルと単一励起開力学に適用される。その結果, 複雑な離散モードの発生が再現性を著しく低減し, 両モデルにおける開放力学の効率的かつ正確なシミュレーションを可能にした。

The discretization approximation method used to simulate the open dynamics of the environment in continuum often suffers from recurrence, which results in inefficiency. To address this issue, this paper proposes a generalization of the discretization approximation method in the complex plane using complex Gauss quadratures. The resulting effective Hamiltonian is non-Hermitian due to the dissipative dynamics of the system. The proposed method is applied to two solvable models, namely the dephasing model and the single-excitation open dynamics in the generalized Aubry-André-Harper model. The results demonstrate that the occurrence of complex discrete modes in the environment can significantly reduce recurrence, thereby enabling the efficient and accurate simulation of open dynamics in both models.

翻訳日:2023-04-27 03:18:33 公開日:2023-04-25
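
The recurrence problem that motivates the abstract above can be seen in a toy calculation. When a continuum bath is replaced by finitely many real, equally spaced modes, the bath correlation function $C(t)=\sum_k g_k^2 e^{-i\omega_k t}$ returns to its initial magnitude after $T = 2\pi/\Delta\omega$. The sketch below checks this numerically. The uniform couplings and mode spacing are assumptions made for illustration, and the paper's complex Gauss quadrature remedy is not implemented here.

```python
import cmath
import math


def bath_correlation(t, freqs, couplings):
    """C(t) = sum_k g_k^2 exp(-i w_k t) for a bath of discrete real modes."""
    return sum(g * g * cmath.exp(-1j * w * t) for w, g in zip(freqs, couplings))


n_modes = 20
spacing = 0.1                                      # assumed frequency spacing between modes
freqs = [k * spacing for k in range(n_modes)]
couplings = [1.0 / math.sqrt(n_modes)] * n_modes   # uniform couplings, normalised so C(0) = 1

recurrence_time = 2 * math.pi / spacing
for t in (0.0, recurrence_time / 2, recurrence_time):
    print(t, abs(bath_correlation(t, freqs, couplings)))
```

The correlation decays at intermediate times but comes back in full at the recurrence time, which is the unphysical revival that limits how long such a discretized simulation can be trusted.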

# 高温微細化と背景抑制によるきめ細かい視覚分類

Fine-grained Visual Classification with High-temperature Refinement and Background Suppression ( http://arxiv.org/abs/2303.06442v2 )

ライセンス: Link先を確認

Po-Yung Chou, Yu-Yung Kao, Cheng-Hung Lin

(参考訳) 細粒度の視覚的分類は、カテゴリ間の高い類似性と1つのカテゴリ内のデータ間の相違により難しい課題である。これらの課題に対処するため、従来の戦略では、カテゴリ間の微妙な相違点のローカライズと、それらにおける差別的特徴の集中に重点を置いてきた。しかし、背景には、分類に不必要であるか、あるいは有害であるかをモデルに伝える重要な情報もあり、微妙な特徴に強く依存するモデルは、グローバルな特徴や文脈的な情報を見落としてしまう可能性がある。本稿では,2つのモジュール,すなわち高温リファインメントモジュールと背景抑圧モジュールから構成される「高温リファインメント」と「背景抑圧」という,識別特性の抽出と背景雑音の抑制を行う新しいネットワークを提案する。高温改良モジュールは、異なるスケールで特徴マップを精製し、多様な特徴の学習を改善することにより、適切な特徴スケールを学習することを可能にする。そして、背景抑圧モジュールは、まず、分類信頼度スコアを用いて、特徴マップを前景と背景に分割し、識別的特徴を高めながら、低信頼領域の特徴値を抑制する。 CUB-200-2011 と NABirds のベンチマークにおいて, HERBS は様々なスケールの特徴を効果的に融合し, 背景雑音, 識別的特徴を微粒化のための適切なスケールで抑制し, CUB-200-2011 と NABirds のベンチマークにおける最先端性能を 93% を超える精度で達成した。このように、HERBSは、きめ細かい視覚分類タスクの性能を向上させるための有望なソリューションを提供する。コード:https://github.com/chou141253/FGVC-HERBS

Fine-grained visual classification is a challenging task due to the high similarity between categories and distinct differences among data within one single category. To address the challenges, previous strategies have focused on localizing subtle discrepancies between categories and enhancing the discriminative features in them. However, the background also provides important information that can tell the model which features are unnecessary or even harmful for classification, and models that rely too heavily on subtle features may overlook global features and contextual information. In this paper, we propose a novel network called "High-temperaturE Refinement and Background Suppression" (HERBS), which consists of two modules, namely, the high-temperature refinement module and the background suppression module, for extracting discriminative features and suppressing background noise, respectively. The high-temperature refinement module allows the model to learn the appropriate feature scales by refining the feature maps at different scales and improving the learning of diverse features. The background suppression module first splits the feature maps into foreground and background using classification confidence scores and suppresses feature values in low-confidence areas while enhancing discriminative features. The experimental results show that the proposed HERBS effectively fuses features of varying scales, suppresses background noise, and extracts discriminative features at appropriate scales for fine-grained visual classification. The proposed method achieves state-of-the-art performance on the CUB-200-2011 and NABirds benchmarks, surpassing 93% accuracy on both datasets. Thus, HERBS presents a promising solution for improving the performance of fine-grained visual classification tasks. Code: https://github.com/chou141253/FGVC-HERBS

翻訳日:2023-04-27 03:18:19 公開日:2023-04-25
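
The abstract above does not spell out how temperature enters the HERBS refinement module. As background only, the sketch below shows the generic softmax-with-temperature operation, on the assumption that "high-temperature" refers to it: dividing the logits by a temperature greater than 1 flattens the resulting distribution. The function name and logits are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Softmax over `logits` after dividing by `temperature`; larger values give a flatter output."""
    scaled = [z / temperature for z in logits]
    shift = max(scaled)                      # subtract the maximum for numerical stability
    exps = [math.exp(s - shift) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [4.0, 1.0, 0.0]
sharp = softmax(logits, temperature=1.0)
smooth = softmax(logits, temperature=4.0)
print(sharp)
print(smooth)
```

A flatter distribution keeps more probability on the non-maximal classes, which is one common reason for raising the temperature when a broader set of features should receive gradient signal.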

# シグモイドネットワークのための複合最適化アルゴリズム

Composite Optimization Algorithms for Sigmoid Networks ( http://arxiv.org/abs/2303.00589v2 )

ライセンス: Link先を確認

Huixiong Chen, Qi Ye

(参考訳) 本稿では,合成最適化アルゴリズムを用いてシグモイドネットワークを解く。我々は,シグモイドネットワークを凸合成最適化に等価に転送し,線形近位アルゴリズムと乗算器の交互方向法に基づく合成最適化アルゴリズムを提案する。弱鋭極小と正則性条件の仮定の下では、非凸問題や非滑らか問題の場合であっても、アルゴリズムは対象関数のグローバル最適解に収束することが保証される。さらに、収束結果をトレーニングデータの量に直接関連付けることができ、シグモイドネットワークのサイズを設定するための一般的なガイドを提供する。フランクの関数フィッティングと手書き数字認識に関する数値実験により,提案アルゴリズムは良好かつ堅牢に機能することを示した。

In this paper, we use composite optimization algorithms to solve sigmoid networks. We equivalently transform the sigmoid networks into a convex composite optimization problem and propose composite optimization algorithms based on the linearized proximal algorithms and the alternating direction method of multipliers. Under the assumptions of the weak sharp minima and the regularity condition, the algorithm is guaranteed to converge to a globally optimal solution of the objective function even in the case of non-convex and non-smooth problems. Furthermore, the convergence results can be directly related to the amount of training data and provide a general guide for setting the size of sigmoid networks. Numerical experiments on Franke's function fitting and handwritten digit recognition show that the proposed algorithms perform satisfactorily and robustly.

翻訳日:2023-04-27 03:17:33 公開日:2023-04-25

# 創発的因果性と意識の基礎

Emergent Causality & the Foundation of Consciousness ( http://arxiv.org/abs/2302.03189v3 )

ライセンス: Link先を確認

Michael Timothy Bennett

(参考訳) 対話的な環境で正確な推論を行うためには、エージェントは、イベントの受動的観察と、それらを引き起こすための介入を混同してはならない。 do$オペレータは介入を形式化し、その効果を判断します。しかし、対話的な環境では、介入の明示的な表現を前提とせず、最大精度の推論を行う汎用知能の最適数学的形式主義が存在する。我々はそのような形式主義を一つ検討する。我々は$do$演算子がない場合、介入は変数で表現できることを示した。変数は抽象化であり、事前に介入を明示的に表現する必要があるのは、そのような抽象化を前提にしているからである。上記の形式主義は、これを避けるため、初期条件は、誘導を通じて関連する因果的介入の表現が現れる。これらの創発的抽象化は、自己と他のオブジェクトの表現として機能し、それらのオブジェクトの介入が目標の満足度に影響を与えると判断される。このことは、他人のアイデンティティや意図、他人のアイデンティティや意図を、他人が認識するものとして、どのように考えるかを説明するものだ、と我々は主張する。狭義では、それは何を知るべきかを記述し、意識の側面の機械的な説明である。

To make accurate inferences in an interactive setting, an agent must not confuse passive observation of events with having intervened to cause them. The $do$ operator formalises interventions so that we may reason about their effect. Yet there exist Pareto optimal mathematical formalisms of general intelligence in an interactive setting which, presupposing no explicit representation of intervention, make maximally accurate inferences. We examine one such formalism. We show that in the absence of a $do$ operator, an intervention can be represented by a variable. We then argue that variables are abstractions, and that the need to explicitly represent interventions in advance arises only because we presuppose these sorts of abstractions. The aforementioned formalism avoids this and so, initial conditions permitting, representations of relevant causal interventions will emerge through induction. These emergent abstractions function as representations of one's self and of any other object, inasmuch as the interventions of those objects impact the satisfaction of goals. We argue that this explains how one might reason about one's own identity and intent, those of others, of one's own as perceived by others and so on. In a narrow sense this describes what it is to be aware, and is a mechanistic explanation of aspects of consciousness.

翻訳日:2023-04-27 03:16:45 公開日:2023-04-25
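
The distinction drawn in the first sentences of the abstract above, between passively observing $X=1$ and intervening with $do(X=1)$, can be made concrete with a small confounded model. The example below is a standard back-door adjustment on made-up numbers and is not the formalism examined in the paper. The binary variables `Z` (confounder), `X` (action), and `Y` (outcome) and all probabilities are hypothetical.

```python
P_Z1 = 0.5                          # P(Z = 1)
P_X1_GIVEN_Z = {0: 0.1, 1: 0.9}     # P(X = 1 | Z = z): Z strongly influences X


def p_y1(x, z):
    """P(Y = 1 | X = x, Z = z): Y depends weakly on X and strongly on Z."""
    return 0.2 + 0.1 * x + 0.6 * z


def p_z(z):
    """Marginal distribution of the confounder Z."""
    return P_Z1 if z == 1 else 1.0 - P_Z1


def observational_y1_given_x1():
    """P(Y = 1 | X = 1): conditioning on X = 1 also shifts belief about Z."""
    joint = {z: p_z(z) * P_X1_GIVEN_Z[z] for z in (0, 1)}
    p_x1 = sum(joint.values())
    return sum(p_y1(1, z) * joint[z] / p_x1 for z in (0, 1))


def interventional_y1_do_x1():
    """P(Y = 1 | do(X = 1)): setting X by fiat leaves the distribution of Z untouched."""
    return sum(p_y1(1, z) * p_z(z) for z in (0, 1))


print(observational_y1_given_x1())
print(interventional_y1_do_x1())
```

An agent that treats the observational quantity as the effect of its own action would overestimate that effect here, because seeing $X=1$ is evidence for $Z=1$ while causing $X=1$ is not.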

# トポロジカル非エルミート皮膚効果

Topological Non-Hermitian skin effect ( http://arxiv.org/abs/2302.03057v2 )

ライセンス: Link先を確認

Rijia Lin, Tommy Tai, Linhu Li, Ching Hua Lee

(参考訳) 本稿では,非エルミート皮膚効果(NHSE)の最近の進展,特にトポロジーとの豊かな相互作用について概説する。レビューは、修正されたバルク境界対応、より高次元のnhseとバンドトポロジーの相乗的およびハイブリダイゼーション、スペクトル巻線トポロジーやスペクトルグラフトポロジーのような複素エネルギー平面上の関連するトポロジーに関する教育的紹介から始まります。その後、非エルミート臨界性、動的NHSE現象、および従来の線形非相互作用結晶格子、特に量子多体相互作用との相互作用を超えたNHSEの顕在化など、新たなトピックが導入される。最後に、NHSEの最近の実演と実験的提案について調査する。

This article reviews recent developments in the non-Hermitian skin effect (NHSE), particularly its rich interplay with topology. The review starts with a pedagogical introduction to the modified bulk-boundary correspondence, the synergy and hybridization of NHSE and band topology in higher dimensions, and the associated topology on the complex energy plane, such as spectral winding topology and spectral graph topology. Emerging topics are then introduced, such as non-Hermitian criticality, dynamical NHSE phenomena, and the manifestation of NHSE beyond the traditional linear non-interacting crystal lattices, particularly its interplay with quantum many-body interactions. Finally, we survey the recent demonstrations and experimental proposals of NHSE.

翻訳日:2023-04-27 03:16:25 公開日:2023-04-25

# Listen2Scene:インタラクティブな素材を意識したバイノーラルサウンドプロパゲーション

Listen2Scene: Interactive material-aware binaural sound propagation for reconstructed 3D scenes ( http://arxiv.org/abs/2302.02809v2 )

ライセンス: Link先を確認

Anton Ratnarajah, Dinesh Manocha

(参考訳) 本稿では、仮想現実(vr)および拡張現実(ar)アプリケーションのためのエンドツーエンドバイノーラルオーディオレンダリングアプローチ(listen2scene)を提案する。実環境の3次元モデルに対する音響効果を生成するニューラルネットを用いたバイノーラル音響伝搬法を提案する。クリーンオーディオやドライオーディオは、生成された音響効果と畳み込み、実際の環境に対応するオーディオをレンダリングすることができる。本稿では,3次元シーンの材料情報とトポロジー情報の両方を用いて,シーン潜在ベクトルを生成するグラフニューラルネットワークを提案する。さらに,現場潜伏ベクトルから音響効果を生成するために,条件付き生成対向ネットワーク(CGAN)を用いる。我々のネットワークは、再構成された3Dメッシュモデルでホールや他のアーティファクトを処理できる。空間音響効果を組み込むために,ジェネレータネットワークに効率的なコスト関数を提案する。ソースとリスナーの位置を考えると、学習に基づくバイノーラル音伝搬アプローチは、nvidia geforce rtx 2080 ti gpu上で0.1ミリ秒で音響効果を生成し、複数のソースを容易に処理できる。本研究では,インタラクティブな幾何音響伝搬アルゴリズムを用いて,バイノーラル音響効果を用いたアプローチの精度を評価し,実際の音響効果を捉えた。また, 従来の学習に基づく音声伝搬アルゴリズムを用いた音声に比べて, 提案手法により得られた音声が, より妥当であることが確認された。

We present an end-to-end binaural audio rendering approach (Listen2Scene) for virtual reality (VR) and augmented reality (AR) applications. We propose a novel neural-network-based binaural sound propagation method to generate acoustic effects for 3D models of real environments. Any clean audio or dry audio can be convolved with the generated acoustic effects to render audio corresponding to the real environment. We propose a graph neural network that uses both the material and the topology information of the 3D scenes and generates a scene latent vector. Moreover, we use a conditional generative adversarial network (CGAN) to generate acoustic effects from the scene latent vector. Our network is able to handle holes or other artifacts in the reconstructed 3D mesh model. We present an efficient cost function to the generator network to incorporate spatial audio effects. Given the source and the listener position, our learning-based binaural sound propagation approach can generate an acoustic effect in 0.1 milliseconds on an NVIDIA GeForce RTX 2080 Ti GPU and can easily handle multiple sources. We have evaluated the accuracy of our approach with binaural acoustic effects generated using an interactive geometric sound propagation algorithm and captured real acoustic effects. We also performed a perceptual evaluation and observed that the audio rendered by our approach is more plausible as compared to audio rendered using prior learning-based sound propagation algorithms.

翻訳日:2023-04-27 03:16:11 公開日:2023-04-25
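
The rendering step mentioned in the abstract above, convolving dry audio with generated acoustic effects, amounts to a discrete convolution with one impulse response per ear. The sketch below is a minimal pure-Python illustration with made-up impulse responses. It stands in for the final rendering step only, not for the neural network that generates the responses, and the function names are hypothetical.

```python
def convolve(signal, impulse_response):
    """Full discrete convolution of a dry signal with an impulse response."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def render_binaural(dry, ir_left, ir_right):
    """Render left and right channels by convolving the same dry audio with each ear's response."""
    return convolve(dry, ir_left), convolve(dry, ir_right)


dry = [1.0, 0.0, 0.0]        # a single click as dry audio
ir_left = [0.0, 1.0]         # toy response: the click arrives one sample late on the left
ir_right = [0.5]             # toy response: the click arrives immediately but attenuated on the right
left, right = render_binaural(dry, ir_left, ir_right)
print(left)
print(right)
```

Real impulse responses contain thousands of samples, so a practical renderer would normally use FFT-based convolution instead of this quadratic loop.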

# RobCaps: アフィン変換と敵攻撃に対するカプセルネットワークのロバスト性の評価

RobCaps: Evaluating the Robustness of Capsule Networks against Affine Transformations and Adversarial Attacks ( http://arxiv.org/abs/2304.03973v2 )

ライセンス: Link先を確認

Alberto Marchisio and Antonio De Marco and Alessio Colucci and Maurizio Martina and Muhammad Shafique

(参考訳) Capsule Networks(CapsNets)は、画像分類タスクのための複数のオブジェクト間のポーズ関係を階層的に保存することができる。安全性クリティカルなアプリケーションにCapsNetをデプロイする際のもうひとつの重要な要因は、入力変換や悪意のある敵攻撃に対する堅牢性である。本稿では,従来の畳み込みニューラルネットワーク(cnns)と比較して,capsnetのロバスト性に影響する要因を体系的に分析し,評価する。包括的な比較のために、MNIST, GTSRB, CIFAR10データセットの2つのCapsNetモデルと2つのCNNモデル、およびこれらのデータセットのアフィン変換バージョンをテストする。詳細な分析により,これらのアーキテクチャの特性がロバスト性の向上と制約に寄与することを示す。全体として、CapsNetsは、同じ数のパラメータを持つ従来のCNNと比較して、敵の例やアフィン変換に対する堅牢性を改善する。同様の結論はCapsNetsとCNNのより深いバージョンに導出されている。さらに,この結果から,動的ルーティングがcapsnetsの堅牢性向上に大きく寄与しないことが判明した。実際、主な一般化の貢献はカプセルによる階層的特徴学習によるものである。

Capsule Networks (CapsNets) are able to hierarchically preserve the pose relationships between multiple objects for image classification tasks. Other than achieving high accuracy, another relevant factor in deploying CapsNets in safety-critical applications is the robustness against input transformations and malicious adversarial attacks. In this paper, we systematically analyze and evaluate different factors affecting the robustness of CapsNets, compared to traditional Convolutional Neural Networks (CNNs). Towards a comprehensive comparison, we test two CapsNet models and two CNN models on the MNIST, GTSRB, and CIFAR10 datasets, as well as on the affine-transformed versions of such datasets. With a thorough analysis, we show which properties of these architectures better contribute to increasing the robustness, as well as their limitations. Overall, CapsNets achieve better robustness against adversarial examples and affine transformations, compared to a traditional CNN with a similar number of parameters. Similar conclusions have been derived for deeper versions of CapsNets and CNNs. Moreover, our results reveal a key finding: dynamic routing does not contribute much to improving the CapsNets' robustness. Indeed, the main generalization contribution is due to the hierarchical feature learning through capsules.

翻訳日:2023-04-27 03:10:25 公開日:2023-04-25

# ロバスト音声翻訳のための選択的データ拡張

Selective Data Augmentation for Robust Speech Translation ( http://arxiv.org/abs/2304.03169v2 )

ライセンス: Link先を確認

Rajul Acharya, Ashish Panda, Sunil Kumar Kopparapu

(参考訳) 音声翻訳(st)システムは、ある言語でスピーチを他の言語でテキストに変換する。終端STシステム(e2e-ST)は、待ち時間と計算コストの削減により性能が向上したため、カスケードシステムで人気を博している。資源集約的なe2e-stシステムは、カスケードシステムとは異なり、パラ言語的および非言語的特徴を保持できる固有の能力を持っている。本稿では,英語ヒンディー語(en-hi)STにおけるe2eアーキテクチャを提案する。2つの不完全な機械翻訳(MT)サービスを用いて,Libri-transのテキストをハイテキストに変換する。本稿では,各サービスが並列STデータを生成するためにMTデータを個別に提供しながら,頑健なSTを支援するため,ノイズの多いMTデータのデータ拡張戦略を提案する。その結果, MTデータの鈍力増強よりもST(BLEUスコア)がよいことがわかった。我々はアプローチで1.59 bleuスコアの絶対的な改善を観察した。

Speech translation (ST) systems translate speech in one language to text in another language. End-to-end ST systems (e2e-ST) have gained popularity over cascade systems because of their enhanced performance due to reduced latency and computational cost. Though resource intensive, e2e-ST systems have the inherent ability to retain paralinguistic and non-linguistic characteristics of the speech, unlike cascade systems. In this paper, we propose to use an e2e architecture for English-Hindi (en-hi) ST. We use two imperfect machine translation (MT) services to translate Libri-trans en text into hi text. While each service gives MT data individually to generate parallel ST data, we propose a data augmentation strategy of noisy MT data to aid robust ST. The main contribution of this paper is the proposal of a data augmentation strategy. We show that this results in better ST (BLEU score) compared to brute force augmentation of MT data. We observed an absolute improvement of 1.59 BLEU score with our approach.

翻訳日:2023-04-27 03:10:03 公開日:2023-04-25

# 法律文書ページの文脈対応分類

Context-Aware Classification of Legal Document Pages ( http://arxiv.org/abs/2304.02787v2 )

ライセンス: Link先を確認

Pavlos Fragkogiannis, Martina Forster, Grace E. Lee, Dell Zhang

(参考訳) 法律文書(PDFフォーマットなど)などの専門文書の処理、索引付け、検索を必要とする多くのビジネスアプリケーションにとって、任意の文書のページを、事前に対応するタイプに分類することが不可欠である。文書画像分類の分野における既存の研究のほとんどは、単ページ文書にフォーカスするか、文書内の複数のページを独立して扱うかのどちらかである。近年,文書ページ分類の強化のために隣接するページの文脈情報を活用する手法が提案されているが,入力長の制約により,大規模な事前学習言語モデルでは利用できないことが多い。本稿では,上記の限界を克服する単純かつ効果的なアプローチを提案する。具体的には、bertのような事前学習されたトランスフォーマーモデルをコンテキスト認識ページ分類に使用できる、以前のページに関するシーケンシャルな情報を含む追加のトークンで入力を強化する。英語とポルトガル語の2つの法定データセットを用いた実験により,提案手法は,非帰納的設定と他の文脈対応ベースラインと比較して,文書ページ分類の性能を著しく向上することが示された。

For many business applications that require the processing, indexing, and retrieval of professional documents such as legal briefs (in PDF format etc.), it is often essential to classify the pages of any given document into their corresponding types beforehand. Most existing studies in the field of document image classification either focus on single-page documents or treat multiple pages in a document independently. Although in recent years a few techniques have been proposed to exploit the context information from neighboring pages to enhance document page classification, they typically cannot be utilized with large pre-trained language models due to the constraint on input length. In this paper, we present a simple but effective approach that overcomes the above limitation. Specifically, we enhance the input with extra tokens carrying sequential information about previous pages - introducing recurrence - which enables the usage of pre-trained Transformer models like BERT for context-aware page classification. Our experiments conducted on two legal datasets in English and Portuguese respectively show that the proposed approach can significantly improve the performance of document page classification compared to the non-recurrent setup as well as the other context-aware baselines.

翻訳日:2023-04-27 03:09:48 公開日:2023-04-25

# 任意のクトリット(3準位)状態の直観的可視化法

An Intuitive Visualisation Method for Arbitrary Qutrit (Three Level) States ( http://arxiv.org/abs/2304.01741v2 )

ライセンス: Link先を確認

Max Z. Festenstein

(参考訳) 視覚的な手法は、あらゆるレベルの理解において量子力学の理解と解釈に非常に有用である。例えば、ブロッホ球面は、2準位量子ビット系の量子ダイナミクスを視覚化するための重要かつ広く用いられるツールである。本研究では,3準位状態を完全に記述するために必要な8自由度をすべて含みながら,直感的に解釈できるような,ブロッホ球に類似したクトリットの「オクタント」可視化手法を提案する。このフレームワークを使用して、典型的な3準位過程のセットをモデル化し、記述し、表示する。

Visual methods are of great utility in understanding and interpreting quantum mechanics at all levels of understanding. The Bloch sphere, for example, is an invaluable and widely used tool for visualising quantum dynamics of a two level qubit system. In this work we present an `octant' visualisation method for qutrits bearing similarity to the Bloch sphere, that encompasses all eight degrees of freedom necessary to fully describe a three level state whilst remaining intuitive to interpret. Using this framework, a set of typical three level processes are modelled, described and displayed.

翻訳日:2023-04-27 03:09:31 公開日:2023-04-25

# 教師のいないプライバシ保全連系蒸留における選択的知識共有

Selective Knowledge Sharing for Privacy-Preserving Federated Distillation without A Good Teacher ( http://arxiv.org/abs/2304.01731v3 )

ライセンス: Link先を確認

Jiawei Shao, Fangzhao Wu, Jun Zhang

(参考訳) フェデレーション学習は、ローカルデータを公開せずに、プライバシー保護による協調学習を約束する一方で、ホワイトボックス攻撃に弱いままであり、異種クライアントへの適応に苦慮している。 fd(federated distillation)は、教師モデルから生徒モデルへ知識を移す効果的な技術であり、プライバシー保証を強化し、モデルの不均一性に対処するためのパラダイムである。それでも、ローカルなデータ分布の変化と、よく訓練された教師モデルの欠如によって生じる課題は、モデル性能を著しく低下させる誤解を招きあい、曖昧な知識共有につながる。この問題に対処するため,本稿では,fdのための選択的知識共有機構を提案する。クライアント側セレクタとサーバ側セレクタを含み、それぞれローカルとアンサンブルの予測から知識を正確かつ正確に識別する。理論的洞察に裏付けられた実証研究は、このアプローチがfdフレームワークの一般化能力を高め、ベースラインメソッドを一貫して上回っていることを証明している。

While federated learning is promising for privacy-preserving collaborative learning without revealing local data, it remains vulnerable to white-box attacks and struggles to adapt to heterogeneous clients. Federated distillation (FD), built upon knowledge distillation--an effective technique for transferring knowledge from a teacher model to student models--emerges as an alternative paradigm, which provides enhanced privacy guarantees and addresses model heterogeneity. Nevertheless, challenges arise due to variations in local data distributions and the absence of a well-trained teacher model, which leads to misleading and ambiguous knowledge sharing that significantly degrades model performance. To address these issues, this paper proposes a selective knowledge sharing mechanism for FD, termed Selective-FD. It includes client-side selectors and a server-side selector to accurately and precisely identify knowledge from local and ensemble predictions, respectively. Empirical studies, backed by theoretical insights, demonstrate that our approach enhances the generalization capabilities of the FD framework and consistently outperforms baseline methods.

翻訳日:2023-04-27 03:09:20 公開日:2023-04-25

# RPTQ:大規模言語モデルのためのリオーダーベースポストトレーニング量子化

RPTQ: Reorder-based Post-training Quantization for Large Language Models ( http://arxiv.org/abs/2304.01089v3 )

ライセンス: Link先を確認

Zhihang Yuan, Lin Niu, Jiawei Liu, Wenyu Liu, Xinggang Wang, Yuzhang Shang, Guangyu Sun, Qiang Wu, Jiaxiang Wu, Bingzhe Wu

(参考訳) 大規模言語モデル(LLM)は様々なタスクにおいて優れた性能を示しているが、そのデプロイは、その巨大なモデルサイズのために困難をもたらす。本稿では,LLMの量子化における主な課題は,外れ値の問題だけでなく,チャネル間のアクティベーション範囲の違いに起因することを明らかにし,LLMのアクティベーションの量子化の問題に対処する,新しいリオーダーベースの量子化手法であるRPTQを提案する。 RPTQはアクティベーション中のチャネルを並べ替え、クラスタ単位で量子化することで、チャネル間の範囲差の影響を低減する。さらに,明示的な並べ替えを回避することで,ストレージと計算のオーバーヘッドを削減する。このアプローチを実装することで,LLMを初めて3ビットアクティベーションにまで押し進めるという大きなブレークスルーを達成した。

Large-scale language models (LLMs) have demonstrated outstanding performance on various tasks, but their deployment poses challenges due to their enormous model size. In this paper, we identify that the main challenge in quantizing LLMs stems from the different activation ranges between the channels, rather than just the issue of outliers. We propose a novel reorder-based quantization approach, RPTQ, that addresses the issue of quantizing the activations of LLMs. RPTQ rearranges the channels in the activations and then quantizes them in clusters, thereby reducing the impact of the range difference between channels. In addition, we reduce the storage and computation overhead by avoiding explicit reordering. By implementing this approach, we achieved a significant breakthrough by pushing LLM models to 3-bit activation for the first time.
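
The idea of grouping channels with similar activation ranges before quantizing can be sketched as below. This is a simplified illustration under stated assumptions, not the RPTQ algorithm itself: channels are grouped by sorting on their maximum absolute value and splitting into equal-sized groups (a stand-in for the clustering in the abstract), the reordering is done explicitly (which the abstract says the method avoids), and all function names and the synthetic activations are hypothetical.

```python
import numpy as np

def quantize_uniform(x, n_bits):
    """Asymmetric uniform quantize-dequantize with one (scale, offset) pair."""
    lo, hi = float(x.min()), float(x.max())
    levels = 2 ** n_bits - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    q = np.clip(np.round((x - lo) / scale), 0, levels)
    return q * scale + lo

def reorder_and_quantize(acts, n_clusters, n_bits):
    """acts: (tokens, channels). Quantize each group of similar-range channels."""
    # Sort channels by their largest magnitude (simplified stand-in for clustering).
    order = np.argsort(np.abs(acts).max(axis=0))
    out = np.empty_like(acts)
    for group in np.array_split(order, n_clusters):
        out[:, group] = quantize_uniform(acts[:, group], n_bits)
    return out

rng = np.random.default_rng(0)
# Synthetic activations (assumption): channel scales differ by orders of magnitude.
channel_scales = np.array([0.1, 0.1, 1.0, 1.0, 10.0, 10.0, 100.0, 100.0])
acts = rng.standard_normal((256, 8)) * channel_scales

# Mean squared error, normalized per channel so small channels are not ignored.
err_single = np.mean((acts - quantize_uniform(acts, 4)) ** 2 / channel_scales ** 2)
err_grouped = np.mean((acts - reorder_and_quantize(acts, 4, 4)) ** 2 / channel_scales ** 2)
```

With a single quantizer, the widest channels dictate the step size and the narrow channels collapse onto one or two levels; per-group quantization gives each range its own step size, so `err_grouped` comes out far smaller than `err_single` on this synthetic data.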

翻訳日:2023-04-27 03:09:01 公開日:2023-04-25

# $n$階Schrödinger方程式における束縛状態の量子化条件

Quantization Condition of the Bound States in $n$th-order Schrödinger equations ( http://arxiv.org/abs/2304.00914v2 )

ライセンス: Link先を確認

Xiong Fan

(参考訳) 方程式 $e^{-i\pi n/2}\nabla_x^{n}\Psi = [E-\Delta(x)]\Psi$ のポテンシャル井戸における束縛状態に対して,一般的な近似量子化規則 $\int_{L_E}^{R_E} k_0\,dx = (N+\frac{1}{2})\pi$ を証明する。ここで $k_0=(E-\Delta)^{1/n}$, $N\in\mathbb{N}_{0}$ であり,$n$ は偶数の自然数,$L_E$ と $R_E$ は古典的禁止領域と許容領域の境界点である。唯一の仮定は、指数的に増大する成分がすべて無視できることであり,これは狭くない井戸に対して適切である。 Schrödinger 方程式や Bogoliubov-de Gennes 方程式を含む応用について論じる。

We will prove a general approximate quantization rule $\int_{L_E}^{R_E} k_0\,dx = (N+\frac{1}{2})\pi$ for the bound states in the potential well of the equations $e^{-i\pi n/2}\nabla_x^{n}\Psi = [E-\Delta(x)]\Psi$, where $k_0=(E-\Delta)^{1/n}$ with $N\in\mathbb{N}_{0}$, $n$ is an even natural number, and $L_E$ and $R_E$ are the boundary points between the classically forbidden regions and the allowed region. The only hypothesis is that all exponentially growing components are negligible, which is appropriate for wells that are not narrow. Applications including the Schrödinger equation and the Bogoliubov-de Gennes equation will be discussed.
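
As a sanity check of the rule stated above, consider the familiar case $n=2$ with a quadratic well. The choice $\Delta(x)=x^2$ is an assumption made here for illustration and is not taken from the abstract.

```latex
% n = 2: e^{-i\pi}\nabla_x^2 \Psi = [E - \Delta(x)]\Psi, i.e. -\Psi'' + x^2\Psi = E\Psi
% with \Delta(x) = x^2 (illustrative choice), so k_0 = (E - x^2)^{1/2}.
% Turning points: L_E = -\sqrt{E}, \quad R_E = +\sqrt{E}.
\begin{align}
  \int_{L_E}^{R_E} k_0 \, dx
    &= \int_{-\sqrt{E}}^{\sqrt{E}} \sqrt{E - x^2}\, dx
     = \frac{\pi E}{2}, \\
  \frac{\pi E}{2} &= \left(N + \tfrac{1}{2}\right)\pi
    \quad\Longrightarrow\quad E_N = 2N + 1, \qquad N \in \mathbb{N}_0 .
\end{align}
```

The integral is the area of a half-disc of radius $\sqrt{E}$. The resulting levels $E_N = 2N+1$ coincide with the exact spectrum of $-d^2/dx^2 + x^2$, so in this special case the approximate rule happens to be exact.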

翻訳日:2023-04-27 03:08:47 公開日:2023-04-25

# AutoKary2022: 染色体インスタンスセグメンテーションのための大規模アノテーション付きデータセット

AutoKary2022: A Large-Scale Densely Annotated Dataset for Chromosome Instance Segmentation ( http://arxiv.org/abs/2303.15839v3 )

ライセンス: Link先を確認

Dan You, Pengcheng Xia, Qiuzhu Chen, Minghui Wu, Suncheng Xiang, Jun Wang

(参考訳) 染色体異常 (karyotype analysis) の診断には, 異相細胞顕微鏡画像からの染色体インスタンスの自動分割が重要である。しかし、高い注釈付きデータセットの欠如や染色体の複雑な形態、例えば、密度分布、任意の方向、幅広い長さがあるため、依然として困難な課題である。この領域の開発を容易にするために、我々は、50人の患者から612の顕微鏡画像に27,000以上の染色体インスタンスを含むautokary2022という、大規模な密注釈付きデータセットを手作業で構築する。具体的には、各インスタンスにポリゴンマスクとクラスラベルをアノテートして、正確な染色体の検出とセグメンテーションを支援する。その上で,本データセットの代表的な手法を体系的に検討し,多くの興味深い知見を得た。このデータセットが医学的理解に向けて研究を進めることを願っている。データセットは、https://github.com/wangjuncongyu/chromosome-instance-segmentation-datasetで利用できる。

Automated chromosome instance segmentation from metaphase cell microscopic images is critical for the diagnosis of chromosomal disorders (i.e., karyotype analysis). However, it is still a challenging task due to the lack of densely annotated datasets and the complicated morphologies of chromosomes, e.g., dense distribution, arbitrary orientations, and a wide range of lengths. To facilitate the development of this area, we take a big step forward and manually construct a large-scale densely annotated dataset named AutoKary2022, which contains over 27,000 chromosome instances in 612 microscopic images from 50 patients. Specifically, each instance is annotated with a polygonal mask and a class label to assist in precise chromosome detection and segmentation. On top of it, we systematically investigate representative methods on this dataset and obtain a number of interesting findings, which help us gain a deeper understanding of the fundamental problems in chromosome instance segmentation. We hope this dataset could advance research towards medical understanding. The dataset is available at: https://github.com/wangjuncongyu/chromosome-instance-segmentation-dataset.

翻訳日:2023-04-27 03:08:10 公開日:2023-04-25

# Align and Attend: Dual Contrastive Lossesを用いたマルチモーダル要約

Align and Attend: Multimodal Summarization with Dual Contrastive Losses ( http://arxiv.org/abs/2303.07284v2 )

ライセンス: Link先を確認

Bo He, Jun Wang, Jielin Qiu, Trung Bui, Abhinav Shrivastava, Zhaowen Wang

(参考訳) マルチモーダル要約の目標は、異なるモダリティから最も重要な情報を抽出して要約を形成することである。単一モーダル要約とは異なり、マルチモーダル要約タスクはクロスモーダル情報を明示的に活用し、より信頼性が高く高品質な要約を生成する。しかし、既存の手法では、異なるモダリティ間の時間的対応を活用できず、異なるサンプル間の固有の相関を無視する。そこで本研究では,マルチモーダル入力を効果的に調整し,対応できる統一マルチモーダルトランスフォーマーモデルであるAlign and Attend Multimodal Summarization (A2Summ)を提案する。さらに,試料間相関と試料内相関の両方をモデル化する2つの新しいコントラスト損失を提案する。 2つの標準ビデオ要約データセット(TVSumとSumMe)と2つのマルチモーダル要約データセット(Daily MailとCNN)に対する大規模な実験は、A2Summの優位性を示し、すべてのデータセットで最先端のパフォーマンスを達成する。さらに,ライブストリームビデオと注釈付き要約文を含む大規模マルチモーダル要約データセットBLiSSを収集した。私たちのコードとデータセットは、https://boheumd.github.io/A2Summ/ で公開されています。

The goal of multimodal summarization is to extract the most important information from different modalities to form summaries. Unlike unimodal summarization, the multimodal summarization task explicitly leverages cross-modal information to help generate more reliable and high-quality summaries. However, existing methods fail to leverage the temporal correspondence between different modalities and ignore the intrinsic correlation between different samples. To address this issue, we introduce Align and Attend Multimodal Summarization (A2Summ), a unified multimodal transformer-based model which can effectively align and attend to the multimodal input. In addition, we propose two novel contrastive losses to model both inter-sample and intra-sample correlations. Extensive experiments on two standard video summarization datasets (TVSum and SumMe) and two multimodal summarization datasets (Daily Mail and CNN) demonstrate the superiority of A2Summ, achieving state-of-the-art performances on all datasets. Moreover, we collected a large-scale multimodal summarization dataset BLiSS, which contains livestream videos and transcribed texts with annotated summaries. Our code and dataset are publicly available at https://boheumd.github.io/A2Summ/.

翻訳日:2023-04-27 03:07:21 公開日:2023-04-25

# 量子メモリのデバイス非依存認証に向けて

Towards the device-independent certification of a quantum memory ( http://arxiv.org/abs/2304.10408v2 )

ライセンス: Link先を確認

Pavel Sekatski, Jean-Daniel Bancal, Marie Ioannou, Mikael Afzelius, Nicolas Brunner

(参考訳) 量子記憶は将来の量子通信ネットワークの主要な要素の1つである。そのため、彼らの認証は重要な課題である。ここでは,量子記憶の効率的な認証手法を提案する。ソースや測定装置の事前特徴化が不要なデバイス非依存的なアプローチを考えることで,量子記憶のためのロバストな自己テスト手法を開発した。次に、最近の固体アンサンブル量子メモリ実験において、0.87の忠実性を確認し、緩和されたシナリオでこの技術の実際的妥当性を示す。より一般的に,本手法は量子チャネルを実装した任意のデバイスの特徴付けに適用される。

Quantum memories represent one of the main ingredients of future quantum communication networks. Their certification is therefore a key challenge. Here we develop efficient certification methods for quantum memories. Considering a device-independent approach, where no a priori characterisation of sources or measurement devices is required, we develop a robust self-testing method for quantum memories. We then illustrate the practical relevance of our technique in a relaxed scenario by certifying a fidelity of 0.87 in a recent solid-state ensemble quantum memory experiment. More generally, our methods apply to the characterisation of any device implementing a qubit identity quantum channel.

翻訳日:2023-04-27 03:00:05 公開日:2023-04-25

# task loss-guided lpメトリックによるオブジェクト検出におけるトレーニング後の量子化の改善

Improving Post-Training Quantization on Object Detection with Task Loss-Guided Lp Metric ( http://arxiv.org/abs/2304.09785v2 )

ライセンス: Link先を確認

Lin Niu, Jiawei Liu, Zhihang Yuan, Dawei Yang, Xinggang Wang, Wenyu Liu

(参考訳) オブジェクト検出ネットワークの効率的な推論は、エッジデバイスにおいて大きな課題である。完全精度モデルを直接低ビット幅に変換するPTQ(Post-Training Quantization)は、モデル推論の複雑さを減らすための効果的で便利なアプローチである。しかし、オブジェクト検出などの複雑なタスクに適用すると、かなり精度が低下する。 PTQは量子化パラメータを異なるメトリクスで最適化し、量子化の摂動を最小化する。量子化前後の特徴写像のp-ノルム距離 Lp は摂動を評価する計量として広く用いられている。対象検出ネットワークの特殊性について,lpメトリックのパラメータpが量子化性能に大きく影響することを示す。固定ハイパーパラメータpは最適量子化性能を達成できないことを示す。この問題を軽減するため,我々は,オブジェクト検出のタスク損失を表す object detection output loss (odol) を用いて,異なるレイヤを定量化するための異なる p 値を割り当てるフレームワーク detptq を提案する。 DetPTQは最適な量子化パラメータを選択するためにODOLベースの適応Lpメトリックを使用する。実験の結果,DetPTQは2次元と3次元の両方の物体検出器において,最先端のPTQ法よりも優れていた。例えば、RetinaNet-ResNet18上では、31.1/31.7(量子化/フル精度)のmAPを4ビットの重みと4ビットの活性化で達成する。

Efficient inference for object detection networks is a major challenge on edge devices. Post-Training Quantization (PTQ), which transforms a full-precision model into low bit-width directly, is an effective and convenient approach to reduce model inference complexity. However, it suffers a severe accuracy drop when applied to complex tasks such as object detection. PTQ optimizes the quantization parameters by different metrics to minimize the perturbation of quantization. The p-norm distance of feature maps before and after quantization, Lp, is widely used as the metric to evaluate perturbation. Owing to the particular nature of object detection networks, we observe that the parameter p in the Lp metric significantly influences quantization performance. We show that using a fixed hyper-parameter p does not achieve optimal quantization performance. To mitigate this problem, we propose a framework, DetPTQ, to assign different p values for quantizing different layers using an Object Detection Output Loss (ODOL), which represents the task loss of object detection. DetPTQ employs the ODOL-based adaptive Lp metric to select the optimal quantization parameters. Experiments show that our DetPTQ outperforms the state-of-the-art PTQ methods by a significant margin on both 2D and 3D object detectors. For example, we achieve 31.1/31.7 (quantization/full-precision) mAP on RetinaNet-ResNet18 with 4-bit weight and 4-bit activation.

翻訳日:2023-04-27 02:59:38 公開日:2023-04-25

# CB-Conformer: バイアス付き単語認識のためのコンテキストバイアス変換器

CB-Conformer: Contextual biasing Conformer for biased word recognition ( http://arxiv.org/abs/2304.09607v2 )

ライセンス: Link先を確認

Yaoxun Xu and Baiji Liu and Qiaochu Huang and Xingchen Song and Zhiyong Wu and Shiyin Kang and Helen Meng

(参考訳) ソース領域とターゲット領域のミスマッチにより、偏りのある単語情報をうまく利用して、ターゲット領域における自動音声認識モデルの性能を向上させる方法が、ホットな研究テーマとなる。以前のアプローチでは、固定された外部言語モデルでデコードするか、サイズの大きいバイアスモジュールを導入していた。本研究では,文脈バイアスモジュールと自己適応型言語モデルを導入してバイアス付き単語認識を改善するcb-conformerを提案する。コンテキストバイアスモジュールは、オーディオフラグメントとコンテキスト情報を組み合わせたもので、オリジナルのコンフォーメータのモデルパラメータはわずか0.2%である。自己適応言語モデル(Self-Adaptive Language Model)は、そのリコールと精度に基づいてバイアス付き単語の内部重みを修正し、バイアス付き単語に焦点を合わせ、標準の固定言語モデルよりも自動音声認識モデルとの統合を成功させる。さらに,wenetspeechに基づくオープンソースmandarinbiased-wordデータセットを構築し,公開する。実験の結果,提案手法では文字誤り率を15.34%削減し,14.13%の単語リコール,6.80%の単語F1スコアがベースコンバータに比べて増加した。

Due to the mismatch between the source and target domains, how to better utilize the biased word information to improve the performance of the automatic speech recognition model in the target domain becomes a hot research topic. Previous approaches either decode with a fixed external language model or introduce a sizeable biasing module, which leads to poor adaptability and slow inference. In this work, we propose CB-Conformer to improve biased word recognition by introducing the Contextual Biasing Module and the Self-Adaptive Language Model to vanilla Conformer. The Contextual Biasing Module combines audio fragments and contextual information, with only 0.2% model parameters of the original Conformer. The Self-Adaptive Language Model modifies the internal weights of biased words based on their recall and precision, resulting in a greater focus on biased words and more successful integration with the automatic speech recognition model than the standard fixed language model. In addition, we construct and release an open-source Mandarin biased-word dataset based on WenetSpeech. Experiments indicate that our proposed method brings a 15.34% character error rate reduction, a 14.13% biased word recall increase, and a 6.80% biased word F1-score increase compared with the base Conformer.

翻訳日:2023-04-27 02:59:18 公開日:2023-04-25

# 単語から音楽へ:シンボリック音楽生成におけるサブワードトークン化手法の研究

From Words to Music: A Study of Subword Tokenization Techniques in Symbolic Music Generation ( http://arxiv.org/abs/2304.08953v2 )

ライセンス: Link先を確認

Adarsh Kumar and Pedro Sarmento

(参考訳) サブワードのトークン化は、トランスフォーマーベースのモデルでテキストベースの自然言語処理(nlp)タスクで広く成功している。シンボリック音楽研究においてトランスフォーマーモデルがますます普及するにつれて、シンボリック音楽領域におけるサブワードトークン化の有効性を検討することが重要である。本稿では,シンボリック音楽生成におけるバイトペア符号化(bpe)などのサブワードトークン化手法と,その全体的な構造への影響について検討する。実験は、シングルトラックメロディのみ、シングル楽器付きマルチトラック、マルチトラックとマルチストラクチャの3種類のMIDIデータセットに基づいている。サブワードのトークン化をポスト・ミュージックのトークン化スキームに適用し,同時に長曲の生成を可能にし,構造指標 (si) やピッチクラスエントロピーなどの客観的指標を用いて,生成された楽曲全体の構造を改善する。また,bpeとunigramという2つのサブワードトークン化手法を比較し,両手法が一貫した改善をもたらすことを確認した。本研究は,サブワードのトークン化が記号的音楽生成に有望な手法であることを示唆し,特にマルチトラック曲などの複雑なデータを含む場合において,楽曲構成に広範な影響を及ぼす可能性があることを示唆する。

Subword tokenization has been widely successful in text-based natural language processing (NLP) tasks with Transformer-based models. As Transformer models become increasingly popular in symbolic music-related studies, it is imperative to investigate the efficacy of subword tokenization in the symbolic music domain. In this paper, we explore subword tokenization techniques, such as byte-pair encoding (BPE), in symbolic music generation and its impact on the overall structure of generated songs. Our experiments are based on three types of MIDI datasets: single track-melody only, multi-track with a single instrument, and multi-track and multi-instrument. We apply subword tokenization on post-musical tokenization schemes and find that it enables the generation of longer songs at the same time and improves the overall structure of the generated music in terms of objective metrics like structure indicator (SI), Pitch Class Entropy, etc. We also compare two subword tokenization methods, BPE and Unigram, and observe that both methods lead to consistent improvements. Our study suggests that subword tokenization is a promising technique for symbolic music generation and may have broader implications for music composition, particularly in cases involving complex data such as multi-track songs.
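
To make the idea of applying byte-pair encoding on top of a musical tokenization concrete, here is a minimal sketch of BPE merge learning over event tokens. It is an illustration only: the token names (`NOTE_60`, `DUR_4`, ...) and the `+` joiner are hypothetical and are not the tokenization scheme used in the paper.

```python
from collections import Counter

def merge_pair(seq, a, b):
    """Replace every adjacent occurrence of (a, b) in seq with one merged token."""
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and seq[i] == a and seq[i + 1] == b:
            out.append(a + "+" + b)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out

def learn_bpe(sequences, n_merges):
    """Greedily learn up to n_merges merges of the most frequent adjacent pair."""
    seqs = [list(s) for s in sequences]
    merges = []
    for _ in range(n_merges):
        pairs = Counter()
        for s in seqs:
            pairs.update(zip(s, s[1:]))
        if not pairs:
            break
        (a, b), count = pairs.most_common(1)[0]
        if count < 2:          # nothing repeats, so merging would not compress
            break
        merges.append((a, b))
        seqs = [merge_pair(s, a, b) for s in seqs]
    return merges, seqs

# Hypothetical event tokens for a short melody (pitch token, then duration token).
song = ["NOTE_60", "DUR_4", "NOTE_64", "DUR_4",
        "NOTE_60", "DUR_4", "NOTE_67", "DUR_2"]
merges, encoded = learn_bpe([song], n_merges=1)
```

After one merge, the repeated pair `NOTE_60`, `DUR_4` becomes a single token, so the same melody is represented with 6 tokens instead of 8. Shorter sequences for the same content are what allow a fixed-length context to cover longer pieces.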

翻訳日:2023-04-27 02:58:55 公開日:2023-04-25

# DiffFit: 簡単なパラメータ効率の良い微調整による大拡散モデルの解錠性

DiffFit: Unlocking Transferability of Large Diffusion Models via Simple Parameter-Efficient Fine-Tuning ( http://arxiv.org/abs/2304.06648v3 )

ライセンス: Link先を確認

Enze Xie, Lewei Yao, Han Shi, Zhili Liu, Daquan Zhou, Zhaoqiang Liu, Jiawei Li, Zhenguo Li

(参考訳) 拡散モデルは高品質な画像の生成に非常に有効であることが証明されている。しかし、大規模な事前学習拡散モデルを新しい領域に適用することは、現実世界のアプリケーションにとって重要な課題である。本稿では,新しい領域への高速適応を可能にする大規模事前学習拡散モデルを微調整するパラメータ効率の高い手法であるdifffitを提案する。 DiffFitは、特定のレイヤでバイアス項と新たに追加されたスケーリング要素のみを微調整するが、トレーニングのスピードアップとモデルストレージコストの削減をもたらす、恥ずかしいほど単純である。完全な微調整と比較すると、DiffFitは2$\times$トレーニングスピードアップを実現しており、全体のモデルパラメータの約0.12\%を格納する必要がある。高速適応におけるスケーリング因子の有効性を正当化する直観的理論解析が提案されている。下流の8つのデータセットでは、DiffFitはより効率的でありながら、完全な微調整よりも優れた、あるいは競争的なパフォーマンスを達成する。注目すべきは、DiffFitが最小のコストを加えることで、訓練済みの低解像度生成モデルを高解像度に適応できることである。拡散ベースの手法の中で、DiffFitはImageNet 512$\times$512ベンチマークで3.02の最先端FIDを新たに設定し、公開前のImageNet 256$\times$256チェックポイントから25エポックだけを微調整した。

Diffusion models have proven to be highly effective in generating high-quality images. However, adapting large pre-trained diffusion models to new domains remains an open challenge, which is critical for real-world applications. This paper proposes DiffFit, a parameter-efficient strategy to fine-tune large pre-trained diffusion models that enables fast adaptation to new domains. DiffFit is embarrassingly simple: it fine-tunes only the bias terms and newly added scaling factors in specific layers, yet yields a significant training speed-up and reduced model storage costs. Compared with full fine-tuning, DiffFit achieves a 2$\times$ training speed-up and only needs to store approximately 0.12% of the total model parameters. Intuitive theoretical analysis has been provided to justify the efficacy of scaling factors on fast adaptation. On 8 downstream datasets, DiffFit achieves superior or competitive performances compared to the full fine-tuning while being more efficient. Remarkably, we show that DiffFit can adapt a pre-trained low-resolution generative model to a high-resolution one by adding minimal cost. Among diffusion-based methods, DiffFit sets a new state-of-the-art FID of 3.02 on ImageNet 512$\times$512 benchmark by fine-tuning only 25 epochs from a public pre-trained ImageNet 256$\times$256 checkpoint while being 30$\times$ more training efficient than the closest competitor.

翻訳日:2023-04-27 02:57:59 公開日:2023-04-25

# SwiftTron: 量子トランスフォーマーのための効率的なハードウェアアクセラレータ

SwiftTron: An Efficient Hardware Accelerator for Quantized Transformers ( http://arxiv.org/abs/2304.03986v2 )

ライセンス: Link先を確認

Alberto Marchisio and Davide Dura and Maurizio Capra and Maurizio Martina and Guido Masera and Muhammad Shafique

(参考訳) Transformerの計算集約操作は、リソースに制約のあるEdgeAI / tinyMLデバイスへのデプロイにおいて、大きな課題となる。確立されたニューラルネットワーク圧縮技術として、量子化はハードウェア計算とメモリ資源を減らす。特に、固定点量子化は、基礎となるハードウェアの加算器や乗算器のような軽量ブロックを使った計算を容易にするために望ましい。しかし、既存の汎用ハードウェアや汎用AIアクセラレータ、あるいは浮動小数点ユニットを備えたトランスフォーマー専用のアーキテクチャに完全に量子化されたトランスフォーマーをデプロイすることは、実現不可能または/または非効率である。そこで我々は,量子化トランスフォーマー用に設計された,効率的なハードウェアアクセラレータSwiftTronを提案する。 SwiftTronは、さまざまなタイプのTransformer操作(Attention、Softmax、GELU、Layer Normalizationなど)の実行をサポートし、正しい計算を行うためのさまざまなスケーリング要因を説明できる。 ASIC設計フローを用いて,完全なSwiftTronアーキテクチャを65 nm CMOS 技術で合成する。我々の加速器はRoBERTaベースモデルを1.83 nsで実行し、33.64 mWの電力を消費し、面積は273 mm^2である。再現性を容易にするため、SwiftTronアーキテクチャのRTLはhttps://github.com/albertomarchisio/SwiftTronでリリースされています。

Transformers' compute-intensive operations pose enormous challenges for their deployment in resource-constrained EdgeAI / tinyML devices. As an established neural network compression technique, quantization reduces the hardware computational and memory resources. In particular, fixed-point quantization is desirable to ease the computations using lightweight blocks, like adders and multipliers, of the underlying hardware. However, deploying fully-quantized Transformers on existing general-purpose hardware, generic AI accelerators, or specialized architectures for Transformers with floating-point units might be infeasible and/or inefficient. Towards this, we propose SwiftTron, an efficient specialized hardware accelerator designed for Quantized Transformers. SwiftTron supports the execution of different types of Transformers' operations (like Attention, Softmax, GELU, and Layer Normalization) and accounts for diverse scaling factors to perform correct computations. We synthesize the complete SwiftTron architecture in a $65$ nm CMOS technology with the ASIC design flow. Our Accelerator executes the RoBERTa-base model in 1.83 ns, while consuming 33.64 mW power, and occupying an area of 273 mm^2. To ease the reproducibility, the RTL of our SwiftTron architecture is released at https://github.com/albertomarchisio/SwiftTron.

翻訳日:2023-04-27 02:57:12 公開日:2023-04-25

# MAP-MRF問題の緩和の解法: 組合せ的in-face Frank-Wolfe方向

Solving relaxations of MAP-MRF problems: Combinatorial in-face Frank-Wolfe directions ( http://arxiv.org/abs/2010.09567v5 )

ライセンス: Link先を確認

Vladimir Kolmogorov

(参考訳) MAP-MRF推論問題のLP緩和を解く問題,特に最近提案された手法(Swoboda, Kolmogorov 2019; Kolmogorov, Pock 2021)について考察する。この手法は重要な計算サブルーチンとして、FW(Frank-Wolfe)法の変種を用いて、組合せポリトープ上の滑らかな凸関数を最小化する。本稿では, (Freund et al. 2017) において異なる文脈で導入された, in-face Frank-Wolfe方向に基づくこのサブルーチンの効率的な実装を提案する。より一般的には、in-face FW方向を可能にする組合せ部分問題の抽象データ構造を定義し、木構造MAP-MRF推論部分問題への特殊化を記述する。実験の結果,得られた手法は一部の問題クラスに対する現在の最先端のLPソルバであることが示された。私たちのコードはhttps://pub.ist.ac.at/~vnk/papers/IN-FACE-FW.htmlで利用可能です。

We consider the problem of solving LP relaxations of MAP-MRF inference problems, and in particular the method proposed recently in (Swoboda, Kolmogorov 2019; Kolmogorov, Pock 2021). As a key computational subroutine, it uses a variant of the Frank-Wolfe (FW) method to minimize a smooth convex function over a combinatorial polytope. We propose an efficient implementation of this subroutine based on in-face Frank-Wolfe directions, introduced in (Freund et al. 2017) in a different context. More generally, we define an abstract data structure for a combinatorial subproblem that enables in-face FW directions, and describe its specialization for tree-structured MAP-MRF inference subproblems. Experimental results indicate that the resulting method is the current state-of-the-art LP solver for some classes of problems. Our code is available at https://pub.ist.ac.at/~vnk/papers/IN-FACE-FW.html.
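
For readers unfamiliar with the Frank-Wolfe method mentioned above, the following sketch shows the plain (vanilla) algorithm on a very simple polytope, the probability simplex. It is not the in-face variant and not the MAP-MRF subroutine from the paper; the objective, the step-size rule $2/(t+2)$, and the function name are assumptions chosen for illustration.

```python
import numpy as np

def frank_wolfe_simplex(grad, dim, n_iters):
    """Vanilla Frank-Wolfe over the probability simplex {x >= 0, sum(x) = 1}."""
    x = np.full(dim, 1.0 / dim)            # start at the barycenter
    for t in range(n_iters):
        g = grad(x)
        # Linear minimization oracle: over the simplex, the minimizer of <g, s>
        # is the vertex at the coordinate with the smallest gradient entry.
        s = np.zeros(dim)
        s[np.argmin(g)] = 1.0
        gamma = 2.0 / (t + 2.0)            # standard open-loop step size
        x = (1.0 - gamma) * x + gamma * s  # convex combination stays feasible
    return x

# Illustrative objective: f(x) = 0.5 * ||x - target||^2, whose gradient is x - target.
target = np.array([0.2, 0.5, 0.3])
x = frank_wolfe_simplex(lambda v: v - target, dim=3, n_iters=2000)
```

Each iterate is a convex combination of vertices, so no projection is ever needed; the only problem-specific piece is the linear minimization oracle. In the setting of the abstract, that oracle would be a combinatorial subproblem rather than a simple `argmin`.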

翻訳日:2023-04-27 01:12:36 公開日:2023-04-25

# 適度に監督された学習:定義、枠組み、一般性

Moderately Supervised Learning: Definition, Framework and Generality ( http://arxiv.org/abs/2008.11945v4 )

ライセンス: Link先を確認

Yongquan Yang

(参考訳) 教師付き学習は多くの人工知能(AI)アプリケーションで顕著な成功を収めた。現在の文献では、トレーニングデータセットに用意されたラベルの特性を参照することにより、教師あり学習(SL)と弱教師あり学習(WSL)に分類される。 SLは、トレーニングデータセットが理想的な(完全で正確な)ラベルで割り当てられている状況、WSLはトレーニングデータセットが非理想的(不完全、不正確な、不正確な)ラベルで割り当てられている状況に関する。しかし、SLタスクに対する様々なソリューションは、与えられたラベルが必ずしも習得しやすいとは限らないことを示しており、与えられたラベルから学習が容易なターゲットへの変換は最終SLソリューションの性能に大きな影響を及ぼす可能性がある。 SLの定義は、与えられたラベルから簡単に学習できるターゲットへの変換の性質を考慮せずに、特定のSLタスクの適切なソリューションを構築する上で重要ないくつかの詳細を隠蔽する。したがって、AIアプリケーション分野のエンジニアには、これらの詳細を体系的に明らかにすることが望ましい。本稿では、SLの分類を拡大し、与えられたラベルが理想である状況に関するサブタイプの中等教育学習(MSL)を調査することにより、この目標を達成することを試みるが、アノテーションの単純さにより、与えられたラベルを学習しやすいターゲットに変換するには、注意深い設計が必要である。定義, フレームワーク, 一般性の観点から, MSL を概念化し, MSL タスクを体系的に解析するための基本的基礎を提供する。その間、mslの概念化と数学者のビジョンの関係を明らかにするとともに、この論文は、数学者のビジョンから解決すべき問題を見るためのaiアプリケーションエンジニアのためのチュートリアルを確立する。

Learning with supervision has achieved remarkable success in numerous artificial intelligence (AI) applications. In the current literature, by referring to the properties of the labels prepared for the training dataset, learning with supervision is categorized as supervised learning (SL) and weakly supervised learning (WSL). SL concerns the situation where the training dataset is assigned ideal (complete, exact and accurate) labels, while WSL concerns the situation where the training dataset is assigned non-ideal (incomplete, inexact or inaccurate) labels. However, various solutions for SL tasks have shown that the given labels are not always easy to learn, and the transformation from the given labels to easy-to-learn targets can significantly affect the performance of the final SL solutions. Without considering the properties of the transformation from the given labels to easy-to-learn targets, the definition of SL conceals some details that can be critical to building the appropriate solutions for specific SL tasks. Thus, for engineers in the AI application field, it is desirable to reveal these details systematically. This article attempts to achieve this goal by expanding the categorization of SL and investigating the sub-type moderately supervised learning (MSL), which concerns the situation where the given labels are ideal but, due to the simplicity of the annotation, careful designs are required to transform the given labels into easy-to-learn targets. From the perspectives of the definition, framework and generality, we conceptualize MSL to present a complete fundamental basis to systematically analyse MSL tasks. Meanwhile, by revealing the relation between the conceptualization of MSL and the mathematicians' vision, this paper also establishes a tutorial for AI application engineers on viewing a problem to be solved from the mathematicians' vision.

翻訳日:2023-04-27 01:12:16 公開日:2023-04-25

# マルチエンベディングインタラクションの観点からの知識グラフ埋め込み手法の解析

Analyzing Knowledge Graph Embedding Methods from a Multi-Embedding Interaction Perspective ( http://arxiv.org/abs/1903.11406v4 )

ライセンス: Link先を確認

Hung Nghiep Tran, Atsuhiro Takasu

(参考訳) 知識グラフは知識を表現するための一般的なフォーマットであり、セマンティック検索エンジン、質問応答システム、レコメンデーターシステムに多くの応用がある。実世界の知識グラフは通常不完全であるため、この問題を解決するためにCanonical decomposition/Parallel factorization (CP)、DistMult、ComplExなどの知識グラフ埋め込み法が提案されている。これらの手法は、実体と関係を意味空間への埋め込みベクトルとして表現し、それらの間の関係を予測する。埋め込みベクトル自体は、リッチなセマンティック情報を含み、データ分析のような他のアプリケーションで使用することができる。しかし、これらのモデルと埋め込みベクトル自体のメカニズムは大きく異なり、それらの理解と比較が困難である。このような理解の欠如を考えると、特にCPのような2つのロールベースの埋め込みベクトルを持つ複雑なモデルや、複雑な値の埋め込みベクトルを持つ最先端のComplExモデルに対して、それらを用いることは効果的または正しくない。本稿では,これらのモデルの統合と一般化のための新しいアプローチとして,マルチエンベディングインタラクション機構を提案する。理論的にこのメカニズムを導出し、実験的な分析と比較を行う。また,四元数代数に基づく新しいマルチエンベディングモデルを提案し,人気のあるベンチマークを用いて有望な結果が得られることを示す。ソースコードはgithubのhttps://github.com/tranhungnghiep/analyzekgeで入手できる。

A knowledge graph is a popular format for representing knowledge, with many applications in semantic search engines, question-answering systems, and recommender systems. Real-world knowledge graphs are usually incomplete, so knowledge graph embedding methods, such as Canonical decomposition/Parallel factorization (CP), DistMult, and ComplEx, have been proposed to address this issue. These methods represent entities and relations as embedding vectors in semantic space and predict the links between them. The embedding vectors themselves contain rich semantic information and can be used in other applications such as data analysis. However, the mechanisms in these models and the embedding vectors themselves vary greatly, making it difficult to understand and compare them. Given this lack of understanding, we risk using them ineffectively or incorrectly, particularly for complicated models, such as CP, with two role-based embedding vectors, or the state-of-the-art ComplEx model, with complex-valued embedding vectors. In this paper, we propose a multi-embedding interaction mechanism as a new approach to uniting and generalizing these models. We derive them theoretically via this mechanism and provide empirical analyses and comparisons between them. We also propose a new multi-embedding model based on quaternion algebra and show that it achieves promising results using popular benchmarks. Source code is available on GitHub at https://github.com/tranhungnghiep/AnalyzeKGE.
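
To make the comparison between CP, DistMult, and ComplEx more concrete, the following minimal sketch writes the three score functions as trilinear products in plain Python. It is an illustration under stated assumptions, not the authors' code: the function names and the list-based embeddings are hypothetical, and the multi-embedding and quaternion models proposed in the paper are not reproduced here.

```python
from typing import Sequence


def distmult_score(h: Sequence[float], r: Sequence[float], t: Sequence[float]) -> float:
    """DistMult: trilinear product with one real embedding per entity and relation."""
    return sum(hi * ri * ti for hi, ri, ti in zip(h, r, t))


def cp_score(h_as_head: Sequence[float], r: Sequence[float], t_as_tail: Sequence[float]) -> float:
    """CP: the same trilinear product, but each entity owns two role-based
    embeddings; the head-role vector of h meets the tail-role vector of t."""
    return distmult_score(h_as_head, r, t_as_tail)


def complex_score(h: Sequence[complex], r: Sequence[complex], t: Sequence[complex]) -> float:
    """ComplEx: real part of the trilinear product with the tail embedding conjugated."""
    total = sum(
        complex(hi) * complex(ri) * complex(ti).conjugate()
        for hi, ri, ti in zip(h, r, t)
    )
    return total.real
```

Because DistMult uses a single real vector per entity, swapping head and tail leaves its score unchanged, whereas the conjugation in ComplEx lets the score differ between (h, r, t) and (t, r, h).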

翻訳日:2023-04-27 01:11:47 公開日:2023-04-25

# アンサンブルサンプリング

Ensemble Sampling ( http://arxiv.org/abs/1705.07347v4 )

ライセンス: Link先を確認

Xiuyuan Lu, Benjamin Van Roy

(参考訳) トンプソンサンプリングは、幅広いオンライン決定問題に対して効果的なヒューリスティックとして現れた。その基本的な形式では、アルゴリズムはモデル上の後方分布から計算とサンプリングを必要とし、単純な特別な場合のみ扱いやすい。本稿では,ニューラルネットワークのような複雑なモデルに直面した場合でもトラクタビリティを維持しつつ,トンプソンサンプリングを近似するアンサンブルサンプリングを開発する。アンサンブルサンプリングは、トンプソンサンプリングが実現可能なアプリケーションの範囲を劇的に拡大する。我々は、このアプローチを支持する理論的基盤を確立し、さらなる洞察を提供する計算結果を示す。

Thompson sampling has emerged as an effective heuristic for a broad range of online decision problems. In its basic form, the algorithm requires computing and sampling from a posterior distribution over models, which is tractable only for simple special cases. This paper develops ensemble sampling, which aims to approximate Thompson sampling while maintaining tractability even in the face of complex models such as neural networks. Ensemble sampling dramatically expands on the range of applications for which Thompson sampling is viable. We establish a theoretical basis that supports the approach and present computational results that offer further insight.
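
The idea of approximating Thompson sampling with an ensemble can be sketched for a toy Gaussian bandit with independent arms. This is a simplified illustration, not the paper's implementation: the function name, the default parameters, and the per-arm conjugate update are assumptions chosen to keep the example short. Each model starts from a prior draw, every observed reward is perturbed independently for every model, and at each step one model is picked uniformly at random and followed greedily.

```python
import random


def ensemble_sampling(true_means, num_models=30, steps=200,
                      noise_sd=0.1, prior_sd=1.0, seed=0):
    """Run a toy ensemble-sampling loop and return the pull count per arm."""
    rng = random.Random(seed)
    k = len(true_means)
    noise_var = noise_sd ** 2
    # Each model holds one mean estimate per arm, initialised from the prior N(0, prior_sd^2).
    models = [[rng.gauss(0.0, prior_sd) for _ in range(k)] for _ in range(num_models)]
    precision = [1.0 / prior_sd ** 2] * k  # shared posterior precision per arm
    pulls = [0] * k
    for _ in range(steps):
        model = rng.choice(models)                       # sample one model uniformly
        arm = max(range(k), key=lambda a: model[a])      # act greedily under that model
        reward = true_means[arm] + rng.gauss(0.0, noise_sd)
        new_precision = precision[arm] + 1.0 / noise_var
        for m in models:
            # Every model is updated with its own independently perturbed copy of the reward.
            perturbed = reward + rng.gauss(0.0, noise_sd)
            m[arm] = (precision[arm] * m[arm] + perturbed / noise_var) / new_precision
        precision[arm] = new_precision
        pulls[arm] += 1
    return pulls
```

Calling `ensemble_sampling([0.0, 5.0])` returns how often each of the two arms was pulled; the spread of the models' estimates plays the role that posterior sampling plays in exact Thompson sampling.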

翻訳日:2023-04-27 01:11:23 公開日:2023-04-25

# ガウス過程回帰における最尤推定は不良設定である

Maximum Likelihood Estimation in Gaussian Process Regression is Ill-Posed ( http://arxiv.org/abs/2203.09179v3 )

ライセンス: Link先を確認

Toni Karvonen and Chris J. Oates

(参考訳) ガウス過程回帰は、機械学習と統計学の無数の学術的および産業的応用を支えており、共分散カーネルの適切なパラメータを選択するために最尤推定が日常的に使用される。しかしながら、最尤推定が良設定となる状況、すなわち回帰モデルの予測がデータの小さな摂動に影響されにくい状況を確立することは、未解決の問題である。本稿は、予測分布がヘリンガー距離に関してデータについてリプシッツ連続でないという意味で、最尤推定量が良設定にならないシナリオを明らかにする。これらの失敗ケースは、ノイズのないデータ設定において、長さスケールパラメータを最尤推定で推定する定常共分散関数を持つ任意のガウス過程で発生する。最尤推定の失敗はガウス過程に関する通説の一部ではあるが、これらの厳密な理論的結果はこの種のものとして最初のものと思われる。これらの否定的な結果は、ガウス過程モデルのトレーニングに最尤推定を用いる場合、良設定性を事後的に、ケース・バイ・ケースで評価する必要がありうることを示唆している。

Gaussian process regression underpins countless academic and industrial applications of machine learning and statistics, with maximum likelihood estimation routinely used to select appropriate parameters for the covariance kernel. However, it remains an open problem to establish the circumstances in which maximum likelihood estimation is well-posed, that is, when the predictions of the regression model are insensitive to small perturbations of the data. This article identifies scenarios where the maximum likelihood estimator fails to be well-posed, in that the predictive distributions are not Lipschitz in the data with respect to the Hellinger distance. These failure cases occur in the noiseless data setting, for any Gaussian process with a stationary covariance function whose lengthscale parameter is estimated using maximum likelihood. Although the failure of maximum likelihood estimation is part of Gaussian process folklore, these rigorous theoretical results appear to be the first of their kind. The implication of these negative results is that well-posedness may need to be assessed post-hoc, on a case-by-case basis, when maximum likelihood estimation is used to train a Gaussian process model.

翻訳日:2023-04-27 00:21:40 公開日:2023-04-25

# 強化学習におけるスキル伝達の事前, 階層, 情報非対称性

Priors, Hierarchy, and Information Asymmetry for Skill Transfer in Reinforcement Learning ( http://arxiv.org/abs/2201.08115v2 )

ライセンス: Link先を確認

Sasha Salter, Kristian Hartikainen, Walter Goodwin, Ingmar Posner

(参考訳) 過去の経験から行動を発見し、それらを新しいタスクに移す能力は、現実世界でサンプル効率よく行動するインテリジェントエージェントの目印である。具体化された強化学習者を同じ能力で獲得することは、ロボット工学への展開を成功させる上で重要である。階層的およびKL規則化された強化学習は、ここでは個別に約束するが、おそらくハイブリッドアプローチはそれぞれの利点を組み合わせることができるだろう。これらの分野の鍵となるのは、学習するスキルをバイアスするために、アーキテクチャモジュール間で情報非対称性を使用することである。非対称性の選択は転送可能性に大きな影響を及ぼすが、既存の方法は主にドメインに依存しない、潜在的に最適でない方法での直観に基づく。本稿では,情報非対称性によって制御された逐次的タスク間のスキルの重要表現性と伝達可能性のトレードオフを理論的かつ実証的に示す。この知見を生かして,階層的kl正規化手法である表現可能・伝達可能スキル(apes)に対する注意的優先事項について紹介する。既存のアプローチとは異なり、APESはデータ駆動の領域依存的な方法で、表現性-伝達可能性定理に基づいて非対称性の選択を自動化する。ロボットブロックの積み重ねなど、様々なレベルの外挿と疎結合の複雑な転写領域に対する実験は、APESが以前の手法を大幅に上回って、正しい非対称選択の臨界度を示す。

The ability to discover behaviours from past experience and transfer them to new tasks is a hallmark of intelligent agents acting sample-efficiently in the real world. Equipping embodied reinforcement learners with the same ability may be crucial for their successful deployment in robotics. While hierarchical and KL-regularized reinforcement learning individually hold promise here, arguably a hybrid approach could combine their respective benefits. Key to these fields is the use of information asymmetry across architectural modules to bias which skills are learnt. While asymmetry choice has a large influence on transferability, existing methods base their choice primarily on intuition in a domain-independent, potentially sub-optimal, manner. In this paper, we theoretically and empirically show the crucial expressivity-transferability trade-off of skills across sequential tasks, controlled by information asymmetry. Given this insight, we introduce Attentive Priors for Expressive and Transferable Skills (APES), a hierarchical KL-regularized method, heavily benefiting from both priors and hierarchy. Unlike existing approaches, APES automates the choice of asymmetry by learning it in a data-driven, domain-dependent, way based on our expressivity-transferability theorems. Experiments over complex transfer domains of varying levels of extrapolation and sparsity, such as robot block stacking, demonstrate the criticality of the correct asymmetric choice, with APES drastically outperforming previous methods.

翻訳日:2023-04-27 00:20:42 公開日:2023-04-25

# 適応重み付きマルチビュークラスタリング

Adaptive Weighted Multi-View Clustering ( http://arxiv.org/abs/2110.13240v2 )

ライセンス: Link先を確認

Shuo Shuo Liu and Lin Lin

(参考訳) マルチビューデータの学習は機械学習研究において新たな問題であり、非負行列分解(NMF)は複数のビューから情報を統合するための一般的な次元性還元法である。これらの見解はしばしばコンセンサスだけでなく補完的な情報を提供する。しかし、多くのマルチビューnmfアルゴリズムは、各ビューに等しい重みを割り当てたり、ラインサーチを通じて経験的に重みを調整したりする。本稿では,重み付きマルチビューNMF(WM-NMF)アルゴリズムを提案する。特に,各視点の情報内容の定量化のために,視点固有の重みと観測固有の再構築重みの両方を学ぶことを目的とした。導入された重み付けスキームは不要なビューの悪影響を緩和し、より小さいビューとより大きなビューを割り当てることで重要なビューのポジティブな効果を増大させることができる。提案手法の有効性と利点は,既存のアルゴリズムと比較して,クラスタリング性能の向上とノイズデータへの対処について検証した。

Learning multi-view data is an emerging problem in machine learning research, and nonnegative matrix factorization (NMF) is a popular dimensionality-reduction method for integrating information from multiple views. These views often provide not only consensus but also complementary information. However, most multi-view NMF algorithms assign equal weight to each view or tune the weight via line search empirically, which can be infeasible without any prior knowledge of the views or computationally expensive. In this paper, we propose a weighted multi-view NMF (WM-NMF) algorithm. In particular, we aim to address the critical technical gap, which is to learn both view-specific weight and observation-specific reconstruction weight to quantify each view's information content. The introduced weighting scheme can alleviate unnecessary views' adverse effects and enlarge the positive effects of the important views by assigning smaller and larger weights, respectively. Experimental results confirm the effectiveness and advantages of the proposed algorithm in terms of achieving better clustering performance and dealing with the noisy data compared to the existing algorithms.

翻訳日:2023-04-27 00:20:21 公開日:2023-04-25

# エージェントにマップの仕方を教える:マルチオブジェクトナビゲーションのための空間推論

Teaching Agents how to Map: Spatial Reasoning for Multi-Object Navigation ( http://arxiv.org/abs/2107.06011v4 )

ライセンス: Link先を確認

Pierre Marza, Laetitia Matignon, Olivier Simonin, Christian Wolf

(参考訳) 視覚ナビゲーションの文脈では,エージェントがその観測履歴を考慮した場所で活用し,既知の目標を効率的に達成するためには,新たな環境をマップする能力が必要である。この能力は、エージェントが空間的関係や規則性を知覚し、対象の特性を発見できる空間的推論と関連付けられる。最近の研究は、ディープニューラルネットワークによってパラメータ化され、強化学習(RL)でトレーニングされた学習可能なポリシーを導入している。古典的なRLセットアップでは、報酬のみから、空間的にマッピングと推論の能力がエンドツーエンドで学習される。そこで本研究では,目標達成目標達成のために訓練されたエージェントにおける空間認識能力の出現を指向した補助的タスクの形で補足的監視を導入する。与えられた位置におけるエージェントと到達目標の間の空間的関係を定量化する指標を推定する学習は、多目的ナビゲーション設定において高い正の影響を及ぼすことを示す。提案手法は,環境の明示的あるいは暗黙的な表現を構築する,異なるベースラインエージェントの性能を著しく向上させる。提案された補助的損失で訓練された文献の学習ベースのエージェントは、CVPR 2021 Embodied AI Workshopの一部であるMulti-Object Navigation Challengeに優勝した。

In the context of visual navigation, the capacity to map a novel environment is necessary for an agent to exploit its observation history in the considered place and efficiently reach known goals. This ability can be associated with spatial reasoning, where an agent is able to perceive spatial relationships and regularities, and discover object characteristics. Recent work introduces learnable policies parametrized by deep neural networks and trained with Reinforcement Learning (RL). In classical RL setups, the capacity to map and reason spatially is learned end-to-end, from reward alone. In this setting, we introduce supplementary supervision in the form of auxiliary tasks designed to favor the emergence of spatial perception capabilities in agents trained for a goal-reaching downstream objective. We show that learning to estimate metrics quantifying the spatial relationships between an agent at a given location and a goal to reach has a high positive impact in Multi-Object Navigation settings. Our method significantly improves the performance of different baseline agents that build either an explicit or an implicit representation of the environment, even matching the performance of incomparable oracle agents taking ground-truth maps as input. A learning-based agent from the literature trained with the proposed auxiliary losses was the winning entry to the Multi-Object Navigation Challenge, part of the CVPR 2021 Embodied AI Workshop.

翻訳日:2023-04-27 00:20:03 公開日:2023-04-25

# Fed-FSNet:ファジィ合成ネットワークによる非I.I.D.フェデレーション学習の緩和

Fed-FSNet: Mitigating Non-I.I.D. Federated Learning via Fuzzy Synthesizing Network ( http://arxiv.org/abs/2208.12044v2 )

ライセンス: Link先を確認

Jingcai Guo, Song Guo, Jie Zhang, Ziming Liu

(参考訳) フェデレーテッド・ラーニング(FL)は、最近、有望なプライバシー保護分散機械学習フレームワークとして登場した。エッジデバイス上でローカルに分散トレーニングを実行し、クラウドサーバに生のデータ共有を集中せずにグローバルモデルに集約することで、共有グローバルモデルを共同学習することを目指している。しかし、エッジデバイス間の大きなローカルデータ不均一性(Non-I.D.データ)のため、FLはローカルデータセットによりシフトした勾配を生成できるグローバルモデルを容易に得ることができ、それによってモデルの性能が低下したり、トレーニング中に非収束に苦しむことさえできる。本稿では,Fed-FSNet(Fed-FSNet)と呼ばれる新しいFLトレーニングフレームワークを提案する。具体的には、クラウドサーバにエッジに依存しない隠れモデルを保持し、グローバルモデルの方向対応インバージョンを推定する。隠れたモデルは、グローバルモデルのみに条件付きI.I.D.データサンプル(サンプル特徴)をファジィに合成し、エッジデバイスで共有することで、FLトレーニングを高速でよりよく収束させる。さらに、この合成プロセスは、ローカルモデルのパラメータや更新情報へのアクセスや、個々のローカルモデル出力の分析を伴わないため、FLのプライバシを保証できる。いくつかのFLベンチマークによる実験結果から,本手法は非I.D.問題を大幅に軽減し,他の代表手法よりも優れた性能が得られることが示された。

Federated learning (FL) has recently emerged as a promising privacy-preserving distributed machine learning framework. It aims at collaboratively learning a shared global model by performing distributed training locally on edge devices and aggregating local models into a global one without centralized raw data sharing in the cloud server. However, due to the large local data heterogeneities (Non-I.I.D. data) across edge devices, FL may easily obtain a global model that can produce more shifted gradients on local datasets, thereby degrading the model performance or even suffering from non-convergence during training. In this paper, we propose a novel FL training framework, dubbed Fed-FSNet, using a properly designed Fuzzy Synthesizing Network (FSNet) to mitigate Non-I.I.D. FL at the source. Concretely, we maintain an edge-agnostic hidden model in the cloud server to estimate a less accurate but direction-aware inversion of the global model. The hidden model can then fuzzily synthesize several mimic I.I.D. data samples (sample features) conditioned on only the global model, which can be shared by edge devices to facilitate FL training towards faster and better convergence. Moreover, since the synthesizing process involves neither access to the parameters/updates of local models nor analysis of individual local model outputs, our framework can still ensure the privacy of FL. Experimental results on several FL benchmarks demonstrate that our method can significantly mitigate the Non-I.I.D. issue and obtain better performance than other representative methods.
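
The step of "aggregating local models into a global one" can be illustrated with a generic, FedAvg-style weighted average. This sketch is an assumption for illustration only: it is not the Fed-FSNet procedure, the function name is hypothetical, and models are flattened to plain lists of floats.

```python
def aggregate_local_models(local_weights, num_samples):
    """Average local parameter vectors, weighting each client by its sample count."""
    if len(local_weights) != len(num_samples) or not local_weights:
        raise ValueError("need one sample count per non-empty list of local models")
    total = float(sum(num_samples))
    dim = len(local_weights[0])
    global_weights = [0.0] * dim
    for weights, n in zip(local_weights, num_samples):
        for i in range(dim):
            # Clients with more local data contribute proportionally more.
            global_weights[i] += weights[i] * (n / total)
    return global_weights
```

For example, two clients with parameters `[1.0, 2.0]` and `[3.0, 4.0]` holding 1 and 3 samples yield `[2.5, 3.5]`. With Non-I.I.D. local data, the averaged model can drift away from what any single client would learn, which is the issue the abstract targets.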

翻訳日:2023-04-27 00:12:21 公開日:2023-04-25

# 事前知識を用いた多目的パラメータ最適化のための効率的なユーティリティ関数学習

Efficient Utility Function Learning for Multi-Objective Parameter Optimization with Prior Knowledge ( http://arxiv.org/abs/2208.10300v2 )

ライセンス: Link先を確認

Farha A. Khan, J\"org P. Dietrich, Christian Wirth

(参考訳) マルチオブジェクト最適化における現在の最先端は、与えられたユーティリティ関数を仮定し、インタラクティブにユーティリティ関数を学習するか、または完全なParetoフロントを決定しようとする。しかしながら、実世界の問題における結果誘発は、しばしば暗黙的かつ明示的な専門家の知識に基づいているため、ユーティリティ関数の定義が困難である。これを軽減するため、好み学習によって専門家の知識を用いて、オフラインでユーティリティ関数を学習する。他の作品とは対照的に、結果の選好(pairwise)だけでなく、ユーティリティ関数空間に関する粗い情報も使用しています。これにより、特に非常に少ない結果を使用する場合、ユーティリティ関数の推定を改善することができる。さらに,ユーティリティ関数学習タスクにおける不確かさをモデル化し,最適化チェーン全体を通して伝達する。ユーティリティ関数を学習する手法は,高品質な結果をもたらす一方で,専門家の関与を繰り返す必要をなくす。本稿では,提案手法のサンプル効率と品質向上を4つの領域で示し,特にサーロゲートユーティリティ関数が真のエキスパートユーティリティ関数を正確に捉えることができない場合について述べる。また, 良好な結果を得るには, 誘導不確実性を検討し, 実世界領域で一般的な問題であるバイアスドサンプルの効果を分析することが重要であることを示した。

The current state of the art in multi-objective optimization either assumes a given utility function, learns a utility function interactively, or tries to determine the complete Pareto front, requiring a post elicitation of the preferred result. However, result elicitation in real-world problems is often based on implicit and explicit expert knowledge, making it difficult to define a utility function, whereas interactive learning or post elicitation requires repeated and expensive expert involvement. To mitigate this, we learn a utility function offline, using expert knowledge by means of preference learning. In contrast to other works, we use not only (pairwise) result preferences, but also coarse information about the utility function space. This enables us to improve the utility function estimate, especially when using very few results. Additionally, we model the occurring uncertainties in the utility function learning task and propagate them through the whole optimization chain. Our method to learn a utility function eliminates the need for repeated expert involvement while still leading to high-quality results. We show the sample efficiency and quality gains of the proposed method in 4 domains, especially in cases where the surrogate utility function is not able to exactly capture the true expert utility function. We also show that to obtain good results, it is important to consider the induced uncertainties, and we analyze the effect of biased samples, which is a common problem in real-world domains.

翻訳日:2023-04-27 00:11:53 公開日:2023-04-25

# ディープニューラルネットワークを用いたアイソフォーム機能予測

Isoform Function Prediction Using a Deep Neural Network ( http://arxiv.org/abs/2208.03325v3 )

ライセンス: Link先を確認

Sara Ghazanfari, Ali Rasteh, Seyed Abolfazl Motahari, Mahdieh Soleymani Baghshah

(参考訳) アイソフォームは、オルタナティブスプライシングと呼ばれる現象において同じ遺伝子部位から生成されるmRNAである。ヒトマルチエクソン遺伝子の95%以上が代替スプライシングを受けていることが研究で示されている。 mRNA配列にはほとんど変化はないが、細胞機能や調節に系統的な影響を及ぼす可能性がある。遺伝子のアイソフォームは異なる、あるいは対照的な機能を持っていると広く報告されている。多くの研究は、代替スプライシングが人間の健康と病気に重要な役割を果たすことを示した。幅広い遺伝子機能研究にもかかわらず、アイソフォームの機能についてはほとんど情報がない。近年,遺伝子機能と遺伝子発現プロファイルを用いてアイソフォーム関数を予測するために,複数インスタンス学習に基づく計算手法が提案されている。しかし、ラベル付きトレーニングデータがないため、それらのパフォーマンスは望ましいものではない。さらに、条件ランダム場(CRF)のような確率モデルを用いてアイソフォームの関係をモデル化している。本研究は, アイソフォーム配列, 発現プロファイル, 遺伝子オントロジーグラフなどのデータと貴重な情報を全て利用し, ディープニューラルネットワークに基づく包括的モデルを提案する。 UniProt Gene Ontology (GO)データベースは、遺伝子機能の標準参照として使用される。 NCBI RefSeqデータベースは遺伝子およびアイソフォーム配列の抽出に使用され、NCBI SRAデータベースは発現プロファイルデータに使用される。予測精度の測定には、曲線下の受信機動作特性領域(roc auc)や曲線下の精度リコール(pr auc)などの指標を用いる。

Isoforms are mRNAs produced from the same gene site in the phenomenon called alternative splicing. Studies have shown that more than 95% of human multi-exon genes have undergone alternative splicing. Although there are few changes in the mRNA sequence, they may have a systematic effect on cell function and regulation. It is widely reported that isoforms of a gene have distinct or even contrasting functions. Most studies have shown that alternative splicing plays a significant role in human health and disease. Despite the wide range of gene function studies, there is little information about isoforms' functionalities. Recently, some computational methods based on Multiple Instance Learning have been proposed to predict isoform function using gene function and gene expression profile. However, their performance is not desirable due to the lack of labeled training data. In addition, probabilistic models such as Conditional Random Field (CRF) have been used to model the relation between isoforms. This project uses all the data and valuable information such as isoform sequences, expression profiles, and gene ontology graphs and proposes a comprehensive model based on Deep Neural Networks. The UniProt Gene Ontology (GO) database is used as a standard reference for gene functions. The NCBI RefSeq database is used for extracting gene and isoform sequences, and the NCBI SRA database is used for expression profile data. Metrics such as the area under the receiver operating characteristic curve (ROC AUC) and the area under the precision-recall curve (PR AUC) are used to measure the prediction accuracy.
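
As a reminder of what the ROC AUC metric mentioned above measures, here is a small pairwise sketch in plain Python. It is not taken from the paper: the function name is hypothetical, and the quadratic pairwise loop is chosen for readability rather than speed. The value equals the probability that a randomly chosen positive example is scored above a randomly chosen negative one, with ties counted as one half.

```python
def roc_auc(labels, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("ROC AUC needs at least one positive and one negative label")
    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0   # positive ranked above negative
            elif p == n:
                correct += 0.5   # ties count as half
    return correct / (len(positives) * len(negatives))
```

With labels `[0, 0, 1, 1]` and scores `[0.1, 0.4, 0.35, 0.8]`, three of the four positive-negative pairs are ordered correctly, so the function returns 0.75.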

翻訳日:2023-04-27 00:11:30 公開日:2023-04-25

# イベントレベルの視覚的質問応答に対するクロスモーダル因果関係推論

Cross-Modal Causal Relational Reasoning for Event-Level Visual Question Answering ( http://arxiv.org/abs/2207.12647v7 )

ライセンス: Link先を確認

Yang Liu, Guanbin Li, Liang Lin

(参考訳) 既存の視覚的質問応答手法は、しばしばクロスモーダルなスプリアス相関と、ビデオにまたがる事象の時間性、因果性、ダイナミクスを捉えられない過度に単純化されたイベントレベルの推論プロセスに悩まされる。本稿では,イベントレベルの視覚的質問応答のタスクに対処するため,クロスモーダル因果関係推論のためのフレームワークを提案する。特に、視覚的および言語的モダリティにまたがる因果構造を発見するために、一連の因果的介入操作を導入する。私たちのフレームワークは、Cross-Modal Causal RelatIonal Reasoning (CMCIR)と呼ばれ、3つのモジュールを含んでいる。 (i) フロントドアおよびバックドアの因果介入によって視覚的および言語的なスプリアス相関を協調的に分離する因果性認識視覚言語推論(CVLR)モジュール、 (ii) 視覚的・言語的意味論のきめ細かい相互作用を捉えるための時空間変換器(STT)モジュール、 (iii) グローバルな意味認識視覚言語表現を適応的に学習するための視覚言語特徴融合(VLFF)モジュール。 4つのイベントレベルのデータセットに対する大規模な実験は、視覚言語学的因果構造を発見し、堅牢なイベントレベルの視覚的質問応答を実現する上で、CMCIRの優位性を示している。データセット、コード、モデルはhttps://github.com/HCPLab-SYSU/CMCIRで公開されている。

Existing visual question answering methods often suffer from cross-modal spurious correlations and oversimplified event-level reasoning processes that fail to capture event temporality, causality, and dynamics spanning over the video. In this work, to address the task of event-level visual question answering, we propose a framework for cross-modal causal relational reasoning. In particular, a set of causal intervention operations is introduced to discover the underlying causal structures across visual and linguistic modalities. Our framework, named Cross-Modal Causal RelatIonal Reasoning (CMCIR), involves three modules: i) Causality-aware Visual-Linguistic Reasoning (CVLR) module for collaboratively disentangling the visual and linguistic spurious correlations via front-door and back-door causal interventions; ii) Spatial-Temporal Transformer (STT) module for capturing the fine-grained interactions between visual and linguistic semantics; iii) Visual-Linguistic Feature Fusion (VLFF) module for learning the global semantic-aware visual-linguistic representations adaptively. Extensive experiments on four event-level datasets demonstrate the superiority of our CMCIR in discovering visual-linguistic causal structures and achieving robust event-level visual question answering. The datasets, code, and models are available at https://github.com/HCPLab-SYSU/CMCIR.

翻訳日:2023-04-27 00:11:07 公開日:2023-04-25

# バイオメトリックブレンダー:生体特徴空間を模倣する超高次元多クラス合成データジェネレータ

BiometricBlender: Ultra-high dimensional, multi-class synthetic data generator to imitate biometric feature space ( http://arxiv.org/abs/2206.10747v2 )

ライセンス: Link先を確認

Marcell Stippinger, D\'avid Han\'ak, Marcell T. Kurbucz, Gergely Hancz\'ar, Oliv\'er M. T\"orteli, Zolt\'an Somogyv\'ari

(参考訳) 自由に利用可能な(実物または合成物)高次元または超高次元のマルチクラスデータセットの欠如は、特徴スクリーニングの研究、特にバイオメトリックスの分野では、このようなデータセットの使用が一般的である。本稿では,超高次元多クラス合成データ生成器であるbiometricblenderと呼ばれるpythonパッケージについて報告する。データ生成プロセスにおいて、ブレンドされた特徴の全体的な有用性と相互関係をユーザによって制御することができ、合成特徴空間は実際のバイオメトリックデータセットの重要な特性を模倣することができる。

The lack of freely available (real-life or synthetic) high or ultra-high dimensional, multi-class datasets may hamper the rapidly growing research on feature screening, especially in the field of biometrics, where the usage of such datasets is common. This paper reports a Python package called BiometricBlender, which is an ultra-high dimensional, multi-class synthetic data generator to benchmark a wide range of feature screening methods. During the data generation process, the user can control the overall usefulness and the intercorrelations of the blended features, so that the synthetic feature space can imitate the key properties of a real biometric dataset.

翻訳日:2023-04-27 00:10:43 公開日:2023-04-25

# Twitter会話スレッドのヘイトインテンシティ予測

Predicting Hate Intensity of Twitter Conversation Threads ( http://arxiv.org/abs/2206.08406v3 )

ライセンス: Link先を確認

Qing Meng and Tharun Suresh, Roy Ka-Wei Lee, Tanmoy Chakraborty

(参考訳) ツイートは、オンラインのソーシャルメディアにおける最も簡潔なコミュニケーション形態であり、一つのツイートが会話の流れを左右する可能性を秘めている。オンラインヘイトスピーチはかつてないほどアクセスしやすくなっており、その拡散を抑制することは、ソーシャルメディア企業やユーザーにとって、良好なコミュニケーションのために最も重要である。最近の少数の研究を除き、ほとんどの研究は、そこに至るまでのツイートスレッドやコンテキストを考慮せずに、個々のツイートを分類することに重点を置いている。ヘイトスピーチを抑制する古典的なアプローチの1つは、ヘイトスピーチの投稿後にリアクティブ戦略を採用することである。この事後的な戦略は、単独ではヘイトスピーチを扇動する可能性を示さないものの、返信で続く議論においてヘイトスピーチの前兆となりうる微妙な投稿を見逃してしまう。本稿では,ツイートが将来その返信チェーンを通じてもたらす憎悪の強さを予測することを目的としたDRAGNET++を提案する。ツイートスレッドのセマンティックな構造と伝播構造を利用して、後続の各ツイートにおけるヘイト強度の上昇と低下に至るコンテキスト情報を最大化する。公開されている3つのTwitterデータセットを調査する。Anti-Racismは、米国の政治および新型コロナウイルス(COVID-19)を背景とした人種差別的発言に関するソーシャルメディア談話の返信ツイートを含み、Anti-Socialは、COVID-19のパンデミック中の反社会的行動に関する4000万ツイートのデータセットであり、Anti-Asianは、COVID-19のパンデミック中の反アジア的行動に基づいて収集されたTwitterデータセットである。キュレートされたデータセットはすべて、ツイートスレッドの構造グラフ情報で構成されている。 DRAGNET++は最先端のすべてのベースラインを大幅に上回ることを示す。Anti-Racismデータセットでは、Pearson相関係数で最良のベースラインを11%上回り、RMSEを25%低減し、他の2つのデータセットでも同様の性能を示す。

Tweets are the most concise form of communication in online social media, wherein a single tweet has the potential to make or break the discourse of the conversation. Online hate speech is more accessible than ever, and stifling its propagation is of utmost importance for social media companies and users for congenial communication. Most of the research, barring a few recent studies, has focused on classifying an individual tweet regardless of the tweet thread/context leading up to that point. One of the classical approaches to curb hate speech is to adopt a reactive strategy after the hate speech has been posted. This ex-post facto strategy neglects subtle posts that do not show the potential to instigate hate speech on their own but may foreshadow it in the discussion that ensues in the post's replies. In this paper, we propose DRAGNET++, which aims to predict the intensity of hatred that a tweet can bring in through its reply chain in the future. It uses the semantic and propagating structure of the tweet threads to maximize the contextual information leading up to and the fall of hate intensity at each subsequent tweet. We explore three publicly available Twitter datasets -- Anti-Racism contains the reply tweets of a collection of social media discourse on racist remarks during US political and Covid-19 background; Anti-Social presents a dataset of 40 million tweets amidst the COVID-19 pandemic on anti-social behaviours; and Anti-Asian presents Twitter datasets collated based on anti-Asian behaviours during COVID-19 pandemic. All the curated datasets consist of structural graph information of the Tweet threads. We show that DRAGNET++ outperforms all the state-of-the-art baselines significantly. It beats the best baseline by an 11% margin on the Pearson correlation coefficient and achieves a 25% decrease in RMSE on the Anti-Racism dataset, with similar performance on the other two datasets.
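
The two evaluation metrics named in the abstract, the Pearson correlation coefficient and RMSE, can be computed as in the following minimal sketch. The helper names are hypothetical and the code is not from the DRAGNET++ repository; it only shows how a predicted hate-intensity sequence would be compared with a reference sequence.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def rmse(y_true, y_pred):
    """Root mean squared error between reference and predicted values."""
    squared = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))
```

A higher Pearson coefficient means the predicted intensity profile rises and falls with the reference profile, while a lower RMSE means the predicted values are closer in absolute terms; the two can disagree, which is why both are reported.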

翻訳日:2023-04-27 00:10:31 公開日:2023-04-25

# 自己教師付き視覚前訓練のためのマスク周波数モデリング

Masked Frequency Modeling for Self-Supervised Visual Pre-Training ( http://arxiv.org/abs/2206.07706v2 )

ライセンス: Link先を確認

Jiahao Xie, Wei Li, Xiaohang Zhan, Ziwei Liu, Yew Soon Ong, Chen Change Loy

(参考訳) MFM(Masked Frequency Modeling)は、視覚モデルの自己教師付き事前学習のための統合周波数領域に基づくアプローチである。本稿では,空間領域の入力埋め込みにマスクトークンをランダムに挿入する代わりに,その視点を周波数領域にシフトする。具体的には、まずMFMが入力画像の周波数成分の一部をマスクし、周波数スペクトルの欠落周波数を予測する。我々の重要な洞察は、周波数領域におけるマスキング成分の予測は、空間領域におけるマスキングパッチの予測よりも、空間領域におけるマスキングパターンを明らかにすることがより理想的なことである。その結果,マスク・アンド・予測戦略の適切な構成では,高周波数成分の構造情報と低周波数成分間の低レベル統計の両方が優れた表現の学習に有用であることが示唆された。 MFMは初めて、ViTとCNNの両方で、単純な非シームフレームワークが、以下のものを使って意味のある表現を学習できることを示した。 (i)余分なデータ (ii)余分なモデル (iii)マスクトークン。画像分類と意味セグメンテーションの実験結果およびいくつかのロバスト性ベンチマークは、最近のマスク画像モデリングアプローチと比較して、mfmの競争力と高度なロバスト性を示している。さらに,従来の画像復元作業の有効性を,統合周波数の観点から総合的に検討し,MFM手法との興味深い関係を明らかにする。

We present Masked Frequency Modeling (MFM), a unified frequency-domain-based approach for self-supervised pre-training of visual models. Instead of randomly inserting mask tokens to the input embeddings in the spatial domain, in this paper, we shift the perspective to the frequency domain. Specifically, MFM first masks out a portion of frequency components of the input image and then predicts the missing frequencies on the frequency spectrum. Our key insight is that predicting masked components in the frequency domain is more ideal to reveal underlying image patterns rather than predicting masked patches in the spatial domain, due to the heavy spatial redundancy. Our findings suggest that with the right configuration of mask-and-predict strategy, both the structural information within high-frequency components and the low-level statistics among low-frequency counterparts are useful in learning good representations. For the first time, MFM demonstrates that, for both ViT and CNN, a simple non-Siamese framework can learn meaningful representations even using none of the following: (i) extra data, (ii) extra model, (iii) mask token. Experimental results on image classification and semantic segmentation, as well as several robustness benchmarks show the competitive performance and advanced robustness of MFM compared with recent masked image modeling approaches. Furthermore, we also comprehensively investigate the effectiveness of classical image restoration tasks for representation learning from a unified frequency perspective and reveal their intriguing relations with our MFM approach.
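
The mask-and-predict step in the frequency domain can be sketched on a one-dimensional signal. This is an assumption-laden toy: the paper works on 2D images with a learned model, while this sketch uses a naive 1D discrete Fourier transform in plain Python, and the function names and the `cutoff`/`keep` parameters are hypothetical. It produces the corrupted input and the masked coefficients a model would be trained to predict.

```python
import cmath


def dft(signal):
    """Naive discrete Fourier transform (O(n^2)), sufficient for a toy example."""
    n = len(signal)
    return [sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse of dft."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def mask_frequencies(signal, cutoff, keep="low"):
    """Zero out either the high or the low frequencies of a real signal.

    Returns the corrupted signal (what the model sees) and a dict mapping each
    masked frequency index to its original coefficient (the prediction target).
    """
    n = len(signal)
    spectrum = dft(signal)
    masked, targets = [], {}
    for k, coeff in enumerate(spectrum):
        is_low = min(k, n - k) <= cutoff          # distance from the zero frequency
        kept = is_low if keep == "low" else not is_low
        if kept:
            masked.append(coeff)
        else:
            masked.append(0j)
            targets[k] = coeff
    corrupted = [value.real for value in idft(masked)]
    return corrupted, targets
```

Keeping only low frequencies yields a blurred version of the signal, and keeping only high frequencies yields its fine detail; by linearity, the two corrupted signals add back up to the original.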

翻訳日:2023-04-27 00:09:55 公開日:2023-04-25

# 分散多重ネットワークモデルにおけるスパース部分空間クラスタリング

Sparse Subspace Clustering in Diverse Multiplex Network Model ( http://arxiv.org/abs/2206.07602v2 )

ライセンス: Link先を確認

Majid Noroozi and Marianna Pensky

(参考訳) 本論文は,pensky と wang (2021) で導入された多元的多重化(dimple)ネットワークモデルについて考察する。さらに、すべての層を同じコミュニティ構造を持つグループに分割することができるが、同じグループの層はブロック接続確率の異なる行列を持つかもしれない。 DIMPLEモデルは、すべての層で同じコミュニティ構造を持つ多層ネットワークを研究する複数の論文と、同じグループの層がブロック接続確率の同じ行列を持つMixture Multilayer Stochastic Block Model (MMLSBM)を一般化する。ペンスキーとwang (2021) は隣接テンソルのプロキシにスペクトルクラスタリングを適用したが、本論文は同一のコミュニティ構造を持つ層群を識別するためにスパース部分空間クラスタリング (ssc) を用いる。穏やかな条件下では、後者は層間クラスタリングに強い一貫性をもたらす。さらに、SSC は Pensky や Wang (2021) の方法論よりもはるかに大きなネットワークを扱うことができ、並列コンピューティングの応用に完全に適している。

The paper considers the DIverse MultiPLEx (DIMPLE) network model, introduced in Pensky and Wang (2021), where all layers of the network have the same collection of nodes and are equipped with the Stochastic Block Models. In addition, all layers can be partitioned into groups with the same community structures, although the layers in the same group may have different matrices of block connection probabilities. The DIMPLE model generalizes a multitude of papers that study multilayer networks with the same community structures in all layers, as well as the Mixture Multilayer Stochastic Block Model (MMLSBM), where the layers in the same group have identical matrices of block connection probabilities. While Pensky and Wang (2021) applied spectral clustering to the proxy of the adjacency tensor, the present paper uses Sparse Subspace Clustering (SSC) for identifying groups of layers with identical community structures. Under mild conditions, the latter leads to strongly consistent between-layer clustering. In addition, SSC can handle much larger networks than the methodology of Pensky and Wang (2021), and is perfectly suited to parallel computing.

翻訳日:2023-04-27 00:09:33 公開日:2023-04-25

# nash平衡としての対称一般化固有値問題

The Symmetric Generalized Eigenvalue Problem as a Nash Equilibrium ( http://arxiv.org/abs/2206.04993v2 )

ライセンス: Link先を確認

Ian Gemp, Charlie Chen, Brian McWilliams

(参考訳) 対称一般化固有値問題(SGEP)は、数値線型代数の基本概念である。標準相関解析、独立成分分析、部分最小二乗法、線形判別分析、主成分など、多くの古典的機械学習問題の解を捉えている。それにもかかわらず、ほとんどの一般的なソルバは、ストリーミングデータセット(ミニバッチ)を扱う場合、制限的に高価であり、研究は、特定の問題インスタンスに対する効率的なソリューションを見つけることに集中している。本研究では,nash 平衡が一般化固有ベクトルの集合であるトップ-$k$ sgep のゲーム理論的定式化を考案する。また,Nashへの漸近収束を保証した並列化可能なアルゴリズムを提案する。現在の最先端のメソッドは、1回のイテレーションで$o(d^2k)$ランタイムの複雑さを必要とします。私たちは、この並列アプローチを$O(dk)$ランタイムの複雑さを達成する方法を示します。実験の結果,ニューラルネットワークのアクティベーションの大規模解析を含む様々なsgep問題に対して,このアルゴリズムが解決できることを実証する。

The symmetric generalized eigenvalue problem (SGEP) is a fundamental concept in numerical linear algebra. It captures the solution of many classical machine learning problems such as canonical correlation analysis, independent components analysis, partial least squares, linear discriminant analysis, principal components and others. Despite this, most general solvers are prohibitively expensive when dealing with streaming data sets (i.e., minibatches) and research has instead concentrated on finding efficient solutions to specific problem instances. In this work, we develop a game-theoretic formulation of the top-$k$ SGEP whose Nash equilibrium is the set of generalized eigenvectors. We also present a parallelizable algorithm with guaranteed asymptotic convergence to the Nash. Current state-of-the-art methods require $O(d^2k)$ runtime complexity per iteration which is prohibitively expensive when the number of dimensions ($d$) is large. We show how to modify this parallel approach to achieve $O(dk)$ runtime complexity. Empirically we demonstrate that this resulting algorithm is able to solve a variety of SGEP problem instances including a large-scale analysis of neural network activations.
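
For readers unfamiliar with the SGEP, the problem is to find pairs (λ, v) with A v = λ B v for symmetric A and symmetric positive definite B. The sketch below finds the top pair with plain power iteration on B⁻¹A in pure Python. It is a baseline illustration only, not the game-theoretic algorithm from the paper: the helper names are hypothetical, and the dense solve is only reasonable for very small matrices.

```python
import math


def matvec(matrix, vector):
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def top_generalized_eigenpair(a, b, iters=500):
    """Power iteration on B^{-1} A; returns (eigenvalue, B-normalised eigenvector)."""
    v = [1.0] * len(a)
    for _ in range(iters):
        w = solve(b, matvec(a, v))                 # w = B^{-1} A v
        norm = math.sqrt(dot(w, matvec(b, w)))     # normalise so that v^T B v = 1
        v = [x / norm for x in w]
    eigenvalue = dot(v, matvec(a, v)) / dot(v, matvec(b, v))  # generalized Rayleigh quotient
    return eigenvalue, v
```

With `a = [[2.0, 0.0], [0.0, 1.0]]` and `b = [[1.0, 0.0], [0.0, 2.0]]` the generalized eigenvalues are 2 and 0.5, and the iteration converges to the pair with eigenvalue 2. Each step here needs a full linear solve, which is the kind of per-iteration cost that streaming, minibatch-friendly methods aim to avoid.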

翻訳日:2023-04-27 00:09:13 公開日:2023-04-25

# ターゲット適応設計

Targeted Adaptive Design ( http://arxiv.org/abs/2205.14208v2 )

ライセンス: Link先を確認

Carlo Graziani and Marieme Ngom

(参考訳) 現代の先進的製造と先端材料設計では、最適な構造、特性、性能パラメータをもたらす設定を求めて、比較的高次元のプロセス制御パラメータ空間を探索する必要があることが多い。前者から後者への写像は、ノイズを含む実験または高コストなシミュレーションから決定しなければならない。本稿ではこの問題を、制御空間から設計空間への未知の関数を高コストでノイズを含む測定によって確定しなければならない数学的枠組みに抽象化する。この測定は、所望の設計特徴を指定された許容範囲内で生成する最適な制御設定を、不確実性を定量化したうえで特定するものである。このサンプリング作業を効率的に行う新しいアルゴリズム targeted adaptive design (TAD) について述べる。TADは各反復段階で未知の写像のガウス過程サロゲートモデルを作成し、実験的にサンプリングする新しい制御設定のバッチを提案し、目標設計の更新後の対数予測尤度を最適化する。TADは、不確実性が許容ボックス内に収まる解を見つけた時点で停止するか、期待される将来の情報の尺度を用いて、解が得られないまま探索空間が尽きたと判定する。このようにTADは、ベイズ最適化や最適実験計画を想起させるが本質的には異なる形で、探索と活用の緊張関係を体現している。

Modern advanced manufacturing and advanced materials design often require searches of relatively high-dimensional process control parameter spaces for settings that result in optimal structure, property, and performance parameters. The mapping from the former to the latter must be determined from noisy experiments or from expensive simulations. We abstract this problem to a mathematical framework in which an unknown function from a control space to a design space must be ascertained by means of expensive noisy measurements, which locate optimal control settings generating desired design features within specified tolerances, with quantified uncertainty. We describe targeted adaptive design (TAD), a new algorithm that performs this sampling task efficiently. TAD creates a Gaussian process surrogate model of the unknown mapping at each iterative stage, proposing a new batch of control settings to sample experimentally and optimizing the updated log-predictive likelihood of the target design. TAD either stops upon locating a solution with uncertainties that fit inside the tolerance box or uses a measure of expected future information to determine that the search space has been exhausted with no solution. TAD thus embodies the exploration-exploitation tension in a manner that recalls, but is essentially different from, Bayesian optimization and optimal experimental design.

翻訳日:2023-04-27 00:08:56 公開日:2023-04-25

# 図面における顔・身体検出のためのドメイン適応型自己監督型事前訓練

Domain-Adaptive Self-Supervised Pre-Training for Face & Body Detection in Drawings ( http://arxiv.org/abs/2211.10641v2 )

ライセンス: Link先を確認

Barış Batuhan Topal, Deniz Yuret, Tevfik Metin Sezgin

(参考訳) 図面(ドローイング)は、絵による抽象化とコミュニケーションの強力な手段である。デジタルアート、カートゥーン、コミックなど多様な形態の図面を理解することは、コンピュータビジョンとコンピュータグラフィックスのコミュニティにとって主要な関心事となってきた。コミックやカートゥーンのデジタル化された図面は大量に存在するが、スタイルのばらつきが非常に大きく、ドメイン固有の認識器を訓練するには高コストな手作業のラベル付けが必要になる。本研究では、生徒ネットワークの更新設計を修正した教師・生徒ネットワークに基づく自己教師あり学習を、顔と身体の検出器の構築にどのように利用できるかを示す。我々の設定では、ごく一部のサブセットにのみラベルが与えられている場合に、ターゲットドメインの大量のラベルなしデータを活用できる。さらに、スタイル変換を学習パイプラインに組み込むことで、自然画像(すなわち現実世界の画像)由来の大量のドメイン外ラベル付き画像を用いて検出器をブートストラップできることを示す。組み合わせたアーキテクチャは、最小限のアノテーション作業で最先端(SOTA)およびSOTAに近い性能の検出器をもたらす。コードは https://github.com/barisbatuhan/DASS_Detector から入手できる。

Drawings are powerful means of pictorial abstraction and communication. Understanding diverse forms of drawings, including digital arts, cartoons, and comics, has been a major problem of interest for the computer vision and computer graphics communities. Although there are large amounts of digitized drawings from comic books and cartoons, they contain vast stylistic variations, which necessitate expensive manual labeling for training domain-specific recognizers. In this work, we show how self-supervised learning, based on a teacher-student network with a modified student network update design, can be used to build face and body detectors. Our setup allows exploiting large amounts of unlabeled data from the target domain when labels are provided for only a small subset of it. We further demonstrate that style transfer can be incorporated into our learning pipeline to bootstrap detectors using a vast amount of out-of-domain labeled images from natural images (i.e., images from the real world). Our combined architecture yields detectors with state-of-the-art (SOTA) and near-SOTA performance using minimal annotation effort. Our code can be accessed from https://github.com/barisbatuhan/DASS_Detector.

翻訳日:2023-04-27 00:02:57 公開日:2023-04-25

# MMD-B-Fair:統計的テストによる公正表現の学習

MMD-B-Fair: Learning Fair Representations with Statistical Testing ( http://arxiv.org/abs/2211.07907v3 )

ライセンス: Link先を確認

Namrata Deka and Danica J. Sutherland

(参考訳) 本稿では、カーネル二標本検定を用いてデータの公平な表現を学習する手法 MMD-B-Fair を提案する。目標属性に関する情報を保持しつつ、最大平均不一致(MMD)検定が異なるセンシティブグループの表現を区別できないような、データのニューラル特徴を求める。MMD検定の検出力を最小化することは、検定の閾値の複雑な振る舞いを単純に無視できないため、(先行研究のように)最大化するよりも難しい。本手法はブロック検定方式の単純な漸近的性質を利用し、公平な表現学習の既存研究で広く使われている複雑な敵対的最適化や生成モデリング方式を必要とせずに、公平な表現を効率的に見つける。提案手法をさまざまなデータセットで評価し、センシティブ属性に関する情報を「隠す」能力と、下流の転移タスクにおける有効性を示す。

We introduce a method, MMD-B-Fair, to learn fair representations of data via kernel two-sample testing. We find neural features of our data where a maximum mean discrepancy (MMD) test cannot distinguish between representations of different sensitive groups, while preserving information about the target attributes. Minimizing the power of an MMD test is more difficult than maximizing it (as done in previous work), because the test threshold's complex behavior cannot be simply ignored. Our method exploits the simple asymptotics of block testing schemes to efficiently find fair representations without requiring complex adversarial optimization or generative modelling schemes widely used by existing work on fair representation learning. We evaluate our approach on various datasets, showing its ability to "hide" information about sensitive attributes, and its effectiveness in downstream transfer tasks.
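As a rough illustration of the block testing idea mentioned above, the sketch below computes an unbiased MMD² estimate on disjoint blocks of two 1-D samples and averages the block statistics. It assumes a Gaussian kernel, scalar inputs, and a block size of 20, none of which come from the abstract. It shows only the test statistic, not the representation-learning method itself. Because the result is an average of independent block statistics, it is approximately normal, which is what keeps the threshold simple.

```python
import math
import random

def gaussian_kernel(a, b, bandwidth=1.0):
    return math.exp(-((a - b) ** 2) / (2.0 * bandwidth ** 2))

def mmd2_unbiased(x, y, bandwidth=1.0):
    """Unbiased estimate of squared MMD between equal-length 1-D samples."""
    n = len(x)
    kxx = sum(gaussian_kernel(x[i], x[j], bandwidth)
              for i in range(n) for j in range(n) if i != j) / (n * (n - 1))
    kyy = sum(gaussian_kernel(y[i], y[j], bandwidth)
              for i in range(n) for j in range(n) if i != j) / (n * (n - 1))
    kxy = sum(gaussian_kernel(a, b, bandwidth) for a in x for b in y) / (n * n)
    return kxx + kyy - 2.0 * kxy

def block_mmd(x, y, block_size=20, bandwidth=1.0):
    """Average MMD^2 over disjoint blocks; returns (mean, z_score)."""
    stats = [mmd2_unbiased(x[s:s + block_size], y[s:s + block_size], bandwidth)
             for s in range(0, len(x) - block_size + 1, block_size)]
    m = len(stats)
    mean = sum(stats) / m
    var = sum((s - mean) ** 2 for s in stats) / (m - 1)
    return mean, mean / math.sqrt(var / m)

rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(200)]
y = [rng.gauss(0, 1) for _ in range(200)]          # same distribution as x
shifted = [rng.gauss(2, 1) for _ in range(200)]    # mean shifted by 2
same_mean, same_z = block_mmd(x, y)
diff_mean, diff_z = block_mmd(x, shifted)
```

A fair representation in the sense of the abstract would aim for the `same_*` regime between sensitive groups: a statistic near zero that a test cannot reject.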

翻訳日:2023-04-27 00:02:38 公開日:2023-04-25

# 安全な潜在拡散:拡散モデルにおける不適切な劣化の緩和

Safe Latent Diffusion: Mitigating Inappropriate Degeneration in Diffusion Models ( http://arxiv.org/abs/2211.05105v3 )

ライセンス: Link先を確認

Patrick Schramowski, Manuel Brack, Björn Deiseroth, Kristian Kersting

(参考訳) テキスト条件付き画像生成モデルは近年、画像品質とテキストとの整合性において驚くべき結果を達成し、その結果、急速に増加するアプリケーションで利用されている。これらのモデルは高度にデータ駆動であり、インターネットから無作為に収集された数十億規模のデータセットに依存しているため、本研究で示すように、劣化した偏りのある人間の行動の影響も受ける。さらに、そうしたバイアスを強化することさえある。こうした望ましくない副作用に対処するため、安全な潜在拡散(SLD)を提案する。具体的には、フィルタリングされていない不均衡な訓練セットに起因する不適切な劣化を測定するため、ヌードや暴力などの概念を網羅する実世界の専用の image-to-text プロンプトを含む、新しい画像生成テストベッド inappropriate image prompts (I2P) を構築する。網羅的な実験評価が示すように、SLDは拡散過程において不適切な画像部分を除去・抑制し、追加の訓練を必要とせず、全体的な画像品質やテキストとの整合性に悪影響を及ぼさない。

Text-conditioned image generation models have recently achieved astonishing results in image quality and text alignment and are consequently employed in a fast-growing number of applications. Since they are highly data-driven, relying on billion-sized datasets randomly scraped from the internet, they also suffer, as we demonstrate, from degenerated and biased human behavior. In turn, they may even reinforce such biases. To help combat these undesired side effects, we present safe latent diffusion (SLD). Specifically, to measure the inappropriate degeneration due to unfiltered and imbalanced training sets, we establish a novel image generation test bed, inappropriate image prompts (I2P), containing dedicated, real-world image-to-text prompts covering concepts such as nudity and violence. As our exhaustive empirical evaluation demonstrates, the introduced SLD removes and suppresses inappropriate image parts during the diffusion process, with no additional training required and no adverse effect on overall image quality or text alignment.

翻訳日:2023-04-27 00:02:18 公開日:2023-04-25

# StructDiffusion: 未知の物体を用いた物理的に妥当な構造の言語誘導による生成

StructDiffusion: Language-Guided Creation of Physically-Valid Structures using Unseen Objects ( http://arxiv.org/abs/2211.04604v2 )

ライセンス: Link先を確認

Weiyu Liu, Yilun Du, Tucker Hermans, Sonia Chernova, Chris Paxton

(参考訳) 人間の環境で動作するロボットは、たとえ初めて見る物体であっても、それらを意味的に意味のある配置に並べ替えられなければならない。本研究では、ステップバイステップの指示なしに物理的に妥当な構造を構築する問題に焦点を当てる。我々は StructDiffusion を提案する。これは拡散モデルとオブジェクト中心のトランスフォーマーを組み合わせ、部分視点の点群と「テーブルをセットする」といった高レベルの言語目標から構造を構築する。本手法は、言語で条件付けられた困難な多段階の3次元計画タスクを、1つのモデルで複数実行できる。StructDiffusion は、特定の構造で訓練された既存のマルチモーダルトランスフォーマーモデルと比べて、未知の物体から物理的に妥当な構造を組み立てる成功率を平均16%向上させる。シミュレーションと実世界の再配置タスクの両方で、ホールドアウトした物体に対する実験を示す。重要な点として、拡散モデルと衝突識別器モデルの両方を統合することで、初めて見る物体を再配置する際に他の手法よりも汎化性能が向上することを示す。動画と追加の結果はウェブサイト https://structdiffusion.github.io/ を参照。

Robots operating in human environments must be able to rearrange objects into semantically-meaningful configurations, even if these objects are previously unseen. In this work, we focus on the problem of building physically-valid structures without step-by-step instructions. We propose StructDiffusion, which combines a diffusion model and an object-centric transformer to construct structures given partial-view point clouds and high-level language goals, such as "set the table". Our method can perform multiple challenging language-conditioned multi-step 3D planning tasks using one model. StructDiffusion even improves the success rate of assembling physically-valid structures out of unseen objects by on average 16% over an existing multi-modal transformer model trained on specific structures. We show experiments on held-out objects in both simulation and on real-world rearrangement tasks. Importantly, we show how integrating both a diffusion model and a collision-discriminator model allows for improved generalization over other methods when rearranging previously-unseen objects. For videos and additional results, see our website: https://structdiffusion.github.io/.

翻訳日:2023-04-27 00:01:58 公開日:2023-04-25

# Vision Transformer のためのデータレベル宝くじ仮説

Data Level Lottery Ticket Hypothesis for Vision Transformers ( http://arxiv.org/abs/2211.01484v2 )

ライセンス: Link先を確認

Xuan Shen, Zhenglun Kong, Minghai Qin, Peiyan Dong, Geng Yuan, Xin Meng, Hao Tang, Xiaolong Ma, Yanzhi Wang

(参考訳) 従来の宝くじ仮説(LTH)は、密なニューラルネットワークの中に、疎なサブネットワークと適切なランダム初期化(当たりくじ、winning ticket と呼ばれる)が存在し、それをゼロから訓練すると密なネットワークとほぼ同等の性能に達すると主張する。一方、Vision Transformer(ViT)におけるLTHはほとんど検証されていない。本稿ではまず、既存手法ではViTの重みレベルで従来の当たりくじを見つけることが難しいことを示す。次に、ViTの入力依存性に着想を得て、ViTのLTHを画像パッチからなる入力データへ一般化する。すなわち、入力画像パッチのあるサブセットが存在し、そのサブセットだけを用いてViTをゼロから訓練しても、すべての画像パッチを用いて訓練したViTと同程度の精度を達成できる。この入力パッチのサブセットを当たりくじと呼び、これは入力データ中のかなりの量の情報を表す。チケットセレクタを用いて、DeiT、LV-ViT、Swin Transformer などさまざまな種類のViTに対し、パッチの情報量に基づいて当たりくじを生成する。実験の結果、当たりくじで訓練したモデルとランダムに選択したサブセットで訓練したモデルの性能には明確な差があり、提案する理論が検証された。提案する Data-LTH-ViTs と従来のLTHとの類似性を詳しく論じ、理論の整合性をさらに検証する。コードは補足資料で提供される。

The conventional lottery ticket hypothesis (LTH) claims that within a dense neural network there exists a sparse subnetwork with a proper random initialization, called the winning ticket, that can be trained from scratch to perform almost as well as the dense counterpart. Meanwhile, the LTH has scarcely been evaluated in vision transformers (ViTs). In this paper, we first show that the conventional winning ticket is hard to find at the weight level of ViTs by existing methods. Then, inspired by the input dependence of ViTs, we generalize the LTH for ViTs to input data consisting of image patches. That is, there exists a subset of input image patches such that a ViT can be trained from scratch by using only this subset of patches and achieve similar accuracy to the ViTs trained by using all image patches. We call this subset of input patches the winning tickets, which represent a significant amount of information in the input data. We use a ticket selector to generate the winning tickets based on the informativeness of patches for various types of ViT, including DeiT, LV-ViT, and Swin Transformers. The experiments show that there is a clear difference between the performance of models trained with winning tickets and randomly selected subsets, which verifies our proposed theory. We elaborate on the analogical similarity between our proposed Data-LTH-ViTs and the conventional LTH to further verify the integrity of our theory. The code is provided in the supplementary.

翻訳日:2023-04-27 00:01:40 公開日:2023-04-25

# マルチビューデータにおける欠損値の補完

Imputation of missing values in multi-view data ( http://arxiv.org/abs/2210.14484v2 )

ライセンス: Link先を確認

Wouter van Loon, Marjolein Fokkema, Mark de Rooij

(参考訳) オブジェクトの集合が複数の異なる特徴集合(ビューと呼ばれる)によって記述されるデータは、マルチビューデータと呼ばれる。マルチビューデータで欠損値が生じる場合、1つのビュー内のすべての特徴が同時に欠損する可能性が高い。その結果、欠損データは非常に大量になり、特に高次元性と組み合わさると、条件付き補完法の適用が計算上不可能になる。本稿では、マルチビュー学習のための既存のスタック型罰則付きロジスティック回帰(StaPLR)アルゴリズムに基づく新しい補完法を提案する。この手法は、マルチビューの文脈に固有の計算上の課題に対処するため、次元削減された空間で補完を行う。シミュレーションデータセットにおいて、新しい補完法の性能を既存のいくつかの補完アルゴリズムと比較する。その結果、新しい補完法ははるかに低い計算コストで競争力のある結果をもたらし、missForest や予測平均マッチングといった高度な補完アルゴリズムを、本来なら計算上不可能な設定でも利用可能にすることが示された。

Data for which a set of objects is described by multiple distinct feature sets (called views) is known as multi-view data. When missing values occur in multi-view data, all features in a view are likely to be missing simultaneously. This leads to very large quantities of missing data which, especially when combined with high-dimensionality, makes the application of conditional imputation methods computationally infeasible. We introduce a new imputation method based on the existing stacked penalized logistic regression (StaPLR) algorithm for multi-view learning. It performs imputation in a dimension-reduced space to address computational challenges inherent to the multi-view context. We compare the performance of the new imputation method with several existing imputation algorithms in simulated data sets. The results show that the new imputation method leads to competitive results at a much lower computational cost, and makes the use of advanced imputation algorithms such as missForest and predictive mean matching possible in settings where they would otherwise be computationally infeasible.

翻訳日:2023-04-27 00:01:15 公開日:2023-04-25

# 特徴選択のための逐次注意

Sequential Attention for Feature Selection ( http://arxiv.org/abs/2209.14881v3 )

ライセンス: Link先を確認

Taisuke Yasuda, MohammadHossein Bateni, Lin Chen, Matthew Fahrbach, Gang Fu, Vahab Mirrokni

(参考訳) 特徴選択とは、予算制約のもとでモデル品質を最大化するような、機械学習モデルの特徴のサブセットを選択する問題である。ニューラルネットワークでは、$\ell_1$ 正則化や注意機構などに基づく先行手法は、通常1回の評価ラウンドで特徴サブセット全体を選択するため、選択中の特徴の残余価値、すなわち他の特徴がすでに選択されているという条件のもとでの特徴の限界寄与を無視している。本稿では、ニューラルネットワークに対して最先端の実験結果を達成する Sequential Attention という特徴選択アルゴリズムを提案する。このアルゴリズムは貪欲な前向き選択の効率的なワンパス実装に基づいており、各ステップの注意重みを特徴の重要度の代理として用いる。線形回帰について、この設定への適応が古典的な直交マッチング追跡(OMP)アルゴリズムと等価であり、したがってその証明可能な保証をすべて継承することを示すことで、アルゴリズムに対する理論的洞察を与える。我々の理論的・実験的分析は、注意機構の有効性と過剰パラメータ化との関連について新しい説明を与えるものであり、それ自体が独立した関心の対象となりうる。

Feature selection is the problem of selecting a subset of features for a machine learning model that maximizes model quality subject to a budget constraint. For neural networks, prior methods, including those based on $\ell_1$ regularization, attention, and other techniques, typically select the entire feature subset in one evaluation round, ignoring the residual value of features during selection, i.e., the marginal contribution of a feature given that other features have already been selected. We propose a feature selection algorithm called Sequential Attention that achieves state-of-the-art empirical results for neural networks. This algorithm is based on an efficient one-pass implementation of greedy forward selection and uses attention weights at each step as a proxy for feature importance. We give theoretical insights into our algorithm for linear regression by showing that an adaptation to this setting is equivalent to the classical Orthogonal Matching Pursuit (OMP) algorithm, and thus inherits all of its provable guarantees. Our theoretical and empirical analyses offer new explanations towards the effectiveness of attention and its connections to overparameterization, which may be of independent interest.
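The abstract relates the linear-regression case to Orthogonal Matching Pursuit. As a reference point, here is a minimal sketch of classical OMP itself, not of Sequential Attention. At each step it picks the feature most correlated with the current residual, then removes everything the selected features can explain. This is the "marginal contribution given what is already selected" idea. The function names and toy data are illustrative assumptions, and the columns are assumed nonzero and linearly independent.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def omp_select(columns, y, k):
    """Orthogonal Matching Pursuit: greedily pick k column indices.

    columns: list of feature columns (lists of floats); y: target vector.
    Returns the selected indices and the final residual.
    """
    selected, basis = [], []   # basis: orthonormal vectors spanning selected columns
    residual = list(y)
    for _ in range(k):
        # Score each unselected feature by its normalized correlation with the residual.
        best, best_score = None, -1.0
        for j, col in enumerate(columns):
            if j in selected:
                continue
            score = abs(dot(col, residual)) / math.sqrt(dot(col, col))
            if score > best_score:
                best, best_score = j, score
        selected.append(best)
        # Gram-Schmidt: orthogonalize the new column against the current basis.
        q = list(columns[best])
        for b in basis:
            c = dot(q, b)
            q = [qi - c * bi for qi, bi in zip(q, b)]
        norm = math.sqrt(dot(q, q))
        q = [qi / norm for qi in q]
        basis.append(q)
        # Projecting out q leaves the least-squares residual on all selected columns.
        c = dot(residual, q)
        residual = [ri - c * qi for ri, qi in zip(residual, q)]
    return selected, residual

columns = [[1.0, 1.0, 0.0, 0.0],
           [1.0, 0.0, 1.0, 0.0],
           [0.0, 0.0, 1.0, 1.0]]
y = [3.0, 3.0, 1.0, 1.0]       # equals 3 * columns[0] + 1 * columns[2]
selected, residual = omp_select(columns, y, k=2)
```

In this toy case feature 1 looks better than feature 2 in the first round, but once feature 0 is selected its residual value drops and feature 2 is chosen instead. A one-shot ranking would miss this.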

翻訳日:2023-04-27 00:00:47 公開日:2023-04-25

# 訓練時のLLMによる解釈可能モデルの拡張

Augmenting Interpretable Models with LLMs during Training ( http://arxiv.org/abs/2209.11799v3 )

ライセンス: Link先を確認

Chandan Singh, Armin Askari, Rich Caruana, Jianfeng Gao

(参考訳) 近年の大規模言語モデル(LLM)は、ますます多くのタスクで顕著な予測性能を示している。しかし、ハイステークスな領域(医療など)や計算資源の限られた環境への普及により、解釈可能性と効率性の必要性が急速に高まっている。我々は、LLMが学習した知識を活用して極めて効率的かつ解釈可能なモデルを構築するフレームワーク Augmented Interpretable Models(Aug-imodels)を提案することで、この必要性に応える。Aug-imodels はフィッティング時にはLLMを使用するが推論時には使用しないため、完全な透明性が得られ、多くの場合LLMと比べて推論の速度・メモリが1,000倍以上改善する。自然言語処理における Aug-imodels の2つの具体化を検討する。(i) LLMから得た分離された埋め込みで一般化加法モデルを拡張する Aug-GAM と、(ii) LLMによる特徴拡張で決定木を拡張する Aug-Tree である。さまざまなテキスト分類データセットにおいて、どちらも拡張なしのモデルを上回る。Aug-GAM は、パラメータ数が1万分の1で完全に透明であるにもかかわらず、はるかに大きなモデル(例えば60億パラメータの GPT-J モデル)を上回ることさえある。さらに、自然言語fMRI研究で Aug-imodels を検討し、科学データから興味深い解釈を生成する。Aug-imodels の使用と結果の再現のためのすべてのコードは Github で公開されている。

Recent large language models (LLMs) have demonstrated remarkable prediction performance for a growing array of tasks. However, their proliferation into high-stakes domains (e.g. medicine) and compute-limited settings has created a burgeoning need for interpretability and efficiency. We address this need by proposing Augmented Interpretable Models (Aug-imodels), a framework for leveraging the knowledge learned by LLMs to build extremely efficient and interpretable models. Aug-imodels use LLMs during fitting but not during inference, allowing complete transparency and often a speed/memory improvement of greater than 1,000x for inference compared to LLMs. We explore two instantiations of Aug-imodels in natural-language processing: (i) Aug-GAM, which augments a generalized additive model with decoupled embeddings from an LLM and (ii) Aug-Tree, which augments a decision tree with LLM feature expansions. Across a variety of text-classification datasets, both outperform their non-augmented counterparts. Aug-GAM can even outperform much larger models (e.g. a 6-billion parameter GPT-J model), despite having 10,000x fewer parameters and being fully transparent. We further explore Aug-imodels in a natural-language fMRI study, where they generate interesting interpretations from scientific data. All code for using Aug-imodels and reproducing results is made available on Github.
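One way to read "use LLMs during fitting but not during inference" for an additive model is sketched below. This is an assumption about how such a model could be organized, not a description taken from the abstract. If each word is embedded separately and the embeddings are summed before a linear layer, then after fitting every word's contribution collapses to one scalar that can be cached. The `toy_embed` function is a hypothetical stand-in for an LLM embedding, and the weights are made up.

```python
def toy_embed(ngram, dim=4):
    """Hypothetical stand-in for an LLM embedding: a deterministic toy vector."""
    return [((sum(ord(ch) for ch in ngram) * (i + 3)) % 7) / 7.0 for i in range(dim)]

def featurize(text, dim=4):
    """Fitting-time features: the sum of the embeddings of each word."""
    total = [0.0] * dim
    for word in text.split():
        total = [t + e for t, e in zip(total, toy_embed(word, dim))]
    return total

def to_lookup(weights, vocabulary, dim=4):
    """Collapse a fitted linear model into one scalar coefficient per word."""
    return {w: sum(wi * ei for wi, ei in zip(weights, toy_embed(w, dim)))
            for w in vocabulary}

def predict_with_lookup(text, lookup, bias=0.0):
    """Inference: sum cached coefficients; no embedding model is called."""
    return bias + sum(lookup.get(word, 0.0) for word in text.split())

weights = [0.5, -1.0, 2.0, 0.25]   # pretend these came from fitting a linear model
text = "good movie good cast"
lookup = to_lookup(weights, set(text.split()))
score_lookup = predict_with_lookup(text, lookup)
score_direct = sum(w * f for w, f in zip(weights, featurize(text)))
```

The two scores agree by linearity. The lookup table is also the interpretable artifact: each entry is the additive contribution of one word to the prediction.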

翻訳日:2023-04-27 00:00:26 公開日:2023-04-25

# 信号時相論理述語のモデル予測ロバスト性

Model Predictive Robustness of Signal Temporal Logic Predicates ( http://arxiv.org/abs/2209.07881v2 )

ライセンス: Link先を確認

Yuanfei Lin, Haoxuan Li, Matthias Althoff

(参考訳) 信号時相論理のロバスト性は、信号が仕様に従っているかどうかを評価するだけでなく、式がどの程度満たされているか、あるいは違反しているかの尺度も与える。ロバスト性の計算は、基礎となる述語のロバスト性の評価に基づいている。しかし、述語のロバスト性は通常、モデルフリーな方法で、すなわちシステムダイナミクスを含めずに定義される。さらに、複雑な述語のロバスト性を正確に定義することは自明でないことが多い。これらの問題に対処するため、モデルに基づく予測を考慮することで、従来の手法よりも体系的にロバスト性を評価できるモデル予測ロバスト性の概念を提案する。具体的には、事前計算された予測に基づいてガウス過程回帰でロバスト性を学習し、ロバスト性の値をオンラインで効率的に計算できるようにする。記録されたデータセット上で、形式化された交通ルールに用いられる述語を対象に、自動運転のユースケースで本手法を評価し、表現力の点で従来手法に対する利点を示す。我々のロバスト性の定義を軌道プランナーに組み込むことで、自動運転車はデータセット中の人間のドライバーよりもロバストに交通ルールを順守する。

The robustness of signal temporal logic not only assesses whether a signal adheres to a specification but also provides a measure of how much a formula is fulfilled or violated. The calculation of robustness is based on evaluating the robustness of underlying predicates. However, the robustness of predicates is usually defined in a model-free way, i.e., without including the system dynamics. Moreover, it is often nontrivial to define the robustness of complicated predicates precisely. To address these issues, we propose a notion of model predictive robustness, which provides a more systematic way of evaluating robustness compared to previous approaches by considering model-based predictions. In particular, we use Gaussian process regression to learn the robustness based on precomputed predictions so that robustness values can be efficiently computed online. We evaluate our approach for the use case of autonomous driving with predicates used in formalized traffic rules on a recorded dataset, which highlights the advantage of our approach compared to traditional approaches in terms of expressiveness. By incorporating our robustness definitions into a trajectory planner, autonomous vehicles obey traffic rules more robustly than human drivers in the dataset.
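For orientation, the conventional model-free robustness that the abstract contrasts with can be sketched in a few lines. A predicate `x > c` has robustness `x - c` at each time step, "always" takes the minimum over time, and "eventually" takes the maximum. The signal values and the threshold below are made-up illustrations. The model predictive variant proposed in the paper is not reproduced here.

```python
def predicate_robustness(signal, threshold):
    """Model-free robustness of the predicate x > threshold at each time step."""
    return [x - threshold for x in signal]

def always(rho):
    """G (globally): the worst case over the horizon."""
    return min(rho)

def eventually(rho):
    """F (finally): the best case over the horizon."""
    return max(rho)

distance = [12.0, 9.5, 8.0, 10.5]            # hypothetical gap to a lead vehicle, in metres
rho = predicate_robustness(distance, 8.5)    # predicate: distance > 8.5
always_safe = always(rho)                    # negative means the formula is violated
eventually_safe = eventually(rho)            # positive means it holds at some step
```

Here `always_safe` is negative because the gap drops to 8.0 at one step. Its magnitude says by how much the rule was violated, but nothing about what the vehicle could still do about it. That missing dependence on dynamics is what the paper addresses.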

翻訳日:2023-04-27 00:00:03 公開日:2023-04-25

# マスク付きオートエンコーディングは大規模な自然言語による教師あり学習の助けにならない

Masked Autoencoding Does Not Help Natural Language Supervision at Scale ( http://arxiv.org/abs/2301.07836v3 )

ライセンス: Link先を確認

Floris Weers, Vaishaal Shankar, Angelos Katharopoulos, Yinfei Yang, Tom Gunter

(参考訳) 自己教師あり学習と自然言語による教師あり学習は、さまざまな下流タスクに優れた汎用画像エンコーダを訓練する2つの有望な方法として登場した。M3AE や SLIP といった最近の研究は、これらのアプローチを効果的に組み合わせられることを示唆しているが、注目すべきことに、それらの結果は小規模な事前学習データセット(5,000万サンプル未満)を用いており、これらのアプローチで一般的に使われる大規模な領域(1億サンプル超)を適切に反映していない。本稿では、はるかに大量のデータで訓練した場合にも同様のアプローチが有効かどうかを調べる。マスク付きオートエンコーダ(MAE)と対照的言語画像事前学習(CLIP)という2つの最先端手法の組み合わせは、1,130万の画像・テキストペアのコーパスで訓練した場合にはCLIPに対して利点をもたらすが、14億画像の大規模コーパスで訓練した場合には、(一般的な視覚タスク群で評価すると)CLIPに対する利点はほとんど、あるいはまったくないことがわかった。本研究は、大規模な画像・テキスト学習における自己教師あり学習の有効性(あるいはその欠如)について、待望されていた明確さをもたらす。

Self supervision and natural language supervision have emerged as two exciting ways to train general purpose image encoders which excel at a variety of downstream tasks. Recent works such as M3AE and SLIP have suggested that these approaches can be effectively combined, but most notably their results use small pre-training datasets (<50M samples) and don't effectively reflect the large-scale regime (>100M examples) that is commonly used for these approaches. Here we investigate whether a similar approach can be effective when trained with a much larger amount of data. We find that a combination of two state of the art approaches: masked auto-encoders, MAE and contrastive language image pre-training, CLIP provides a benefit over CLIP when trained on a corpus of 11.3M image-text pairs, but little to no benefit (as evaluated on a suite of common vision tasks) over CLIP when trained on a large corpus of 1.4B images. Our work provides some much needed clarity into the effectiveness (or lack thereof) of self supervision for large-scale image-text training.

翻訳日:2023-04-26 23:53:48 公開日:2023-04-25

# ディープニューラルネットワークは2年生よりスマートか?

Are Deep Neural Networks SMARTer than Second Graders? ( http://arxiv.org/abs/2212.09993v3 )

ライセンス: Link先を確認

Anoop Cherian, Kuan-Chuan Peng, Suhas Lohit, Kevin A. Smith, Joshua B. Tenenbaum

(参考訳) 近年、高度な認知能力を必要とするタスク(例えば囲碁の対局、アートの生成、ChatGPT など)の解決にディープニューラルネットワークを応用する例が増えている。こうした劇的な進歩は、幅広いスキルを要する問題の解決において、ニューラルネットワークはどの程度汎化できるのかという問いを提起する。この問いに答えるため、6〜8歳の子ども向けに設計された視覚・言語パズルを解く際のニューラルネットワークの抽象化、演繹、汎化の能力を評価する SMART(Simple Multimodal Algorithmic Reasoning Task)と、それに対応する SMART-101 データセットを提案する。データセットは101種類の固有のパズルからなり、各パズルは絵と質問で構成され、その解法には算術、代数、空間的推論などいくつかの基本的なスキルの組み合わせが必要となる。ディープニューラルネットワークの訓練に向けてデータセットを拡大するため、解法アルゴリズムを保ったまま、パズルごとにまったく新しいインスタンスをプログラムで生成する。SMART-101 での性能をベンチマークするため、さまざまな最先端のバックボーンを用いた視覚・言語メタ学習モデルを提案する。実験の結果、強力な深層モデルは教師あり設定ではパズルに対して妥当な性能を示すものの、汎化について分析するとランダムな精度を上回らないことがわかった。また、最近の ChatGPT などの大規模言語モデルを SMART-101 の一部で評価したところ、これらのモデルは説得力のある推論能力を示すものの、答えはしばしば誤っていることがわかった。

Recent times have witnessed an increasing number of applications of deep neural networks towards solving tasks that require superior cognitive abilities, e.g., playing Go, generating art, ChatGPT, etc. Such dramatic progress raises the question: how generalizable are neural networks in solving problems that demand broad skills? To answer this question, we propose SMART: a Simple Multimodal Algorithmic Reasoning Task and the associated SMART-101 dataset, for evaluating the abstraction, deduction, and generalization abilities of neural networks in solving visuo-linguistic puzzles designed specifically for children in the 6-8 age group. Our dataset consists of 101 unique puzzles; each puzzle comprises a picture and a question, and their solution needs a mix of several elementary skills, including arithmetic, algebra, and spatial reasoning, among others. To scale our dataset towards training deep neural networks, we programmatically generate entirely new instances for each puzzle, while retaining their solution algorithm. To benchmark performances on SMART-101, we propose a vision and language meta-learning model using varied state-of-the-art backbones. Our experiments reveal that while powerful deep models offer reasonable performances on puzzles in a supervised setting, they are not better than random accuracy when analyzed for generalization. We also evaluate the recent ChatGPT and other large language models on a part of SMART-101 and find that while these models show convincing reasoning abilities, the answers are often incorrect.

翻訳日:2023-04-26 23:53:22 公開日:2023-04-25

# Invariant Lipschitz Bandits: A Side Observation Approach

Invariant Lipschitz Bandits: A Side Observation Approach ( http://arxiv.org/abs/2212.07524v2 )

ライセンス: Link先を確認

Nam Phuong Tran, Long Tran-Thanh

(参考訳) 対称性は多くの最適化問題や意思決定問題に現れ、最適化コミュニティから大きな注目を集めてきた。こうした対称性の存在を利用することで、最適解の探索過程を大幅に改善できる。(オフライン)最適化での成功にもかかわらず、オンライン最適化の設定、特にバンディットの文献では、対称性の活用は十分に検討されていない。そこで本稿では、報酬関数とアームの集合が変換群のもとで保存される、リプシッツバンディットのサブクラスである不変リプシッツバンディットの設定を研究する。アームの集合を一様に離散化する UniformMesh アルゴリズム (Kleinberg, 2005) に、群軌道を用いたサイド観測を自然に統合した UniformMesh-N というアルゴリズムを導入する。サイド観測のアプローチを用いて、群が有限である場合に、群の濃度に依存する改善されたリグレット上界を証明する。また、不変リプシッツバンディットのクラスに対して、(対数因子を除いて)一致するリグレット下界も証明する。本研究が、バンディット理論および逐次的意思決定理論全般における対称性のさらなる研究のきっかけとなることを期待する。

Symmetry arises in many optimization and decision-making problems, and has attracted considerable attention from the optimization community: by exploiting such symmetries, the process of searching for optimal solutions can be improved significantly. Despite its success in (offline) optimization, the utilization of symmetries has not been well examined within the online optimization settings, especially in the bandit literature. As such, in this paper we study the invariant Lipschitz bandit setting, a subclass of the Lipschitz bandits where the reward function and the set of arms are preserved under a group of transformations. We introduce an algorithm named UniformMesh-N, which naturally integrates side observations using group orbits into the UniformMesh algorithm (Kleinberg, 2005), which uniformly discretizes the set of arms. Using the side-observation approach, we prove an improved regret upper bound, which depends on the cardinality of the group, given that the group is finite. We also prove a matching regret lower bound for the invariant Lipschitz bandit class (up to logarithmic factors). We hope that our work will ignite further investigation of symmetry in bandit theory and sequential decision-making theory in general.
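The side-observation idea can be illustrated with a toy setup. Arms form a uniform mesh on [0, 1], the group is {identity, reflection x -> 1 - x}, and a reward observed for one arm is credited to every arm in its orbit. This is a generic UCB loop written for illustration. The reward function, noise level, and all names are assumptions, and it should not be read as the UniformMesh-N algorithm, whose details the abstract does not give.

```python
import math
import random

def orbit(index, n_arms):
    """Orbit of a mesh arm under the group {identity, reflection x -> 1 - x}."""
    return {index, n_arms - 1 - index}

def ucb_score(total, count, t):
    if count == 0:
        return float("inf")          # force every arm to be tried once
    return total / count + math.sqrt(2.0 * math.log(t) / count)

def run_ucb_with_orbits(reward_fn, n_arms, horizon, rng):
    mesh = [i / (n_arms - 1) for i in range(n_arms)]   # uniform discretization of [0, 1]
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    for t in range(1, horizon + 1):
        arm = max(range(n_arms), key=lambda i: ucb_score(sums[i], counts[i], t))
        reward = reward_fn(mesh[arm]) + rng.gauss(0.0, 0.1)
        # Side observation: the reward function is invariant under the group,
        # so this sample is informative for every arm in the pulled arm's orbit.
        for j in orbit(arm, n_arms):
            counts[j] += 1
            sums[j] += reward
    return mesh, counts

def two_peaks(x):
    """A Lipschitz reward that is invariant under x -> 1 - x."""
    return 1.0 - abs(abs(x - 0.5) - 0.25)

mesh, counts = run_ucb_with_orbits(two_peaks, n_arms=9, horizon=300, rng=random.Random(0))
```

With a group of size two, each pull updates up to two arms, so the learner effectively explores half of the mesh. This is consistent with the abstract's statement that the regret bound improves with the cardinality of the group.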

翻訳日:2023-04-26 23:52:56 公開日:2023-04-25

# GPViT:グループ伝搬を用いた高分解能非階層視覚変換器

GPViT: A High Resolution Non-Hierarchical Vision Transformer with Group Propagation ( http://arxiv.org/abs/2212.06795v2 )

ライセンス: Link先を確認

Chenhongyi Yang, Jiarui Xu, Shalini De Mello, Elliot J. Crowley, Xiaolong Wang

(参考訳) 本稿では Group Propagation Vision Transformer(GPViT)を提案する。これは、高解像度の特徴を用いた汎用的な視覚認識のために設計された、新しい非階層的(すなわち非ピラミッド型)トランスフォーマーモデルである。高解像度の特徴(トークン)は、検出やセグメンテーションのように細かな詳細の知覚を伴うタスクに自然に適しているが、自己注意の計算量のスケーリングのため、これらの特徴間でグローバル情報を交換することはメモリと計算の面で高コストである。我々は、グローバル情報を交換するための高効率な代替手段として Group Propagation Block(GPブロック)を提供する。各GPブロックでは、まず固定数の学習可能なグループトークンによって特徴をグループ化し、次にグループ化された特徴の間でグローバル情報を交換する Group Propagation を行い、最後に、更新されたグループ化特徴のグローバル情報をトランスフォーマーデコーダを通じて画像特徴に戻す。画像分類、セマンティックセグメンテーション、物体検出、インスタンスセグメンテーションなど、さまざまな視覚認識タスクでGPViTを評価する。本手法はすべてのタスクで先行研究を大きく上回る性能向上を達成し、特に高解像度の出力を必要とするタスクで顕著である。例えば GPViT-L3 は、ADE20K のセマンティックセグメンテーションにおいて、半分のパラメータ数で Swin Transformer-B を 2.0 mIoU 上回る。プロジェクトページ: chenhongyiyang.com/projects/GPViT/GPViT

We present the Group Propagation Vision Transformer (GPViT): a novel non-hierarchical (i.e., non-pyramidal) transformer model designed for general visual recognition with high-resolution features. High-resolution features (or tokens) are a natural fit for tasks that involve perceiving fine-grained details such as detection and segmentation, but exchanging global information between these features is expensive in memory and computation because of the way self-attention scales. We provide a highly efficient alternative, the Group Propagation Block (GP Block), to exchange global information. In each GP Block, features are first grouped together by a fixed number of learnable group tokens; we then perform Group Propagation where global information is exchanged between the grouped features; finally, global information in the updated grouped features is returned to the image features through a transformer decoder. We evaluate GPViT on a variety of visual recognition tasks including image classification, semantic segmentation, object detection, and instance segmentation. Our method achieves significant performance gains over previous works across all tasks, especially on tasks that require high-resolution outputs. For example, our GPViT-L3 outperforms Swin Transformer-B by 2.0 mIoU on ADE20K semantic segmentation with only half as many parameters. Project page: chenhongyiyang.com/projects/GPViT/GPViT

翻訳日:2023-04-26 23:52:32 公開日:2023-04-25

# ヒト精子追跡データセットVISEM-Tracking

VISEM-Tracking, a human spermatozoa tracking dataset ( http://arxiv.org/abs/2212.02842v3 )

ライセンス: Link先を確認

Vajira Thambawita, Steven A. Hicks, Andrea M. Storås, Thu Nguyen, Jorunn M. Andersen, Oliwia Witczak, Trine B. Haugen, Hugo L. Hammer, Pål Halvorsen, Michael A. Riegler

(参考訳) 精子の運動性を手作業で評価するには顕微鏡観察が必要だが、視野内を高速に動く精子のため、観察は困難である。正しい結果を得るには、手作業による評価に広範な訓練が必要となる。そのため、コンピュータ支援精子分析(CASA)が臨床現場でますます利用されるようになっている。それにもかかわらず、精子の運動性と運動学の評価の精度と信頼性を向上させるには、教師あり機械学習手法を訓練するためのより多くのデータが必要である。そこで本研究では、精子のウェットプレパラートを撮影した30秒のビデオ記録20本(計29,196フレーム)に、手作業でアノテーションしたバウンディングボックス座標と、当該分野の専門家が分析した精子特性のセットを付与したデータセット VISEM-Tracking を提供する。アノテーション付きデータに加えて、自己教師あり学習や教師なし学習などの手法でデータに容易にアクセスして分析できるよう、ラベルなしのビデオクリップも提供する。本稿の一部として、VISEM-Tracking データセットで訓練した YOLOv5 深層学習(DL)モデルによる精子検出のベースライン性能を示す。その結果、このデータセットが精子を分析する複雑なDLモデルの訓練に利用できることを示す。

A manual assessment of sperm motility requires microscopy observation, which is challenging due to the fast-moving spermatozoa in the field of view. To obtain correct results, manual evaluation requires extensive training. Therefore, computer-assisted sperm analysis (CASA) has become increasingly used in clinics. Despite this, more data is needed to train supervised machine learning approaches in order to improve accuracy and reliability in the assessment of sperm motility and kinematics. In this regard, we provide a dataset called VISEM-Tracking with 20 video recordings of 30 seconds (comprising 29,196 frames) of wet sperm preparations with manually annotated bounding-box coordinates and a set of sperm characteristics analyzed by experts in the domain. In addition to the annotated data, we provide unlabeled video clips for easy-to-use access and analysis of the data via methods such as self- or unsupervised learning. As part of this paper, we present baseline sperm detection performances using the YOLOv5 deep learning (DL) model trained on the VISEM-Tracking dataset. As a result, we show that the dataset can be used to train complex DL models to analyze spermatozoa.

翻訳日:2023-04-26 23:52:06 公開日:2023-04-25

# 質問応答のための関係認識言語グラフ変換

Relation-Aware Language-Graph Transformer for Question Answering ( http://arxiv.org/abs/2212.00975v2 )

ライセンス: Link先を確認

Jinyoung Park, Hyeong Kyu Choi, Juyeon Ko, Hyeonjin Park, Ji-Hoon Kim, Jisu Jeong, Kyungmin Kim, Hyunwoo J. Kim

(参考訳) 質問応答(QA)は自然言語の文脈に対する推論を伴うタスクであり、多くの関連研究は言語モデル(LM)をグラフニューラルネットワーク(GNN)で拡張して知識グラフ(KG)の情報を符号化している。しかし、QA向けの既存のGNNベースのモジュールの多くは、KGの豊富な関係情報を活用しておらず、LMとKGの間の限られた情報交換に依存している。これらの問題に対処するため、エンティティ間の関係に関して言語とグラフを統一的に同時推論するよう設計された Question Answering Transformer(QAT)を提案する。具体的には、QATは Meta-Path トークンを構築し、多様な構造的・意味的関係に基づいて関係中心の埋め込みを学習する。次に、Relation-Aware Self-Attention モジュールが、異なるモダリティの関連するエンティティ間の情報交換を導く Cross-Modal Relative Position Bias を通じて、異なるモダリティを包括的に統合する。CommonsenseQA や OpenBookQA といった常識質問応答データセットと、医療質問応答データセット MedQA-USMLE でQATの有効性を検証する。すべてのデータセットで、本手法は最先端の性能を達成する。コードは http://github.com/mlvlab/QAT で入手できる。

Question Answering (QA) is a task that entails reasoning over natural language contexts, and many relevant works augment language models (LMs) with graph neural networks (GNNs) to encode the Knowledge Graph (KG) information. However, most existing GNN-based modules for QA do not take advantage of rich relational information of KGs and depend on limited information interaction between the LM and the KG. To address these issues, we propose Question Answering Transformer (QAT), which is designed to jointly reason over language and graphs with respect to entity relations in a unified manner. Specifically, QAT constructs Meta-Path tokens, which learn relation-centric embeddings based on diverse structural and semantic relations. Then, our Relation-Aware Self-Attention module comprehensively integrates different modalities via the Cross-Modal Relative Position Bias, which guides information exchange between relevant entities of different modalities. We validate the effectiveness of QAT on commonsense question answering datasets like CommonsenseQA and OpenBookQA, and on a medical question answering dataset, MedQA-USMLE. On all the datasets, our method achieves state-of-the-art performance. Our code is available at http://github.com/mlvlab/QAT.

翻訳日:2023-04-26 23:51:47 公開日:2023-04-25

# 頂点間の相互作用をモデル化するグラフニューラルネットワークの能力について

On the Ability of Graph Neural Networks to Model Interactions Between Vertices ( http://arxiv.org/abs/2211.16494v4 )

ライセンス: Link先を確認

Noam Razin, Tom Verbin, Nadav Cohen

(参考訳) グラフニューラルネットワーク(GNN)は、グラフの頂点として表されるエンティティ間の複雑な相互作用をモデル化するために広く使われている。近年のGNNの表現力を理論的に分析する試みにもかかわらず、相互作用をモデル化する能力の形式的特徴は欠如している。現在の論文は、このギャップに対処することを目的としている。分離ランクと呼ばれる確立された尺度による相互作用の形式化強度は、与えられた頂点の部分集合とその補集合の間の相互作用をモデル化する特定のGNNの能力を定量化する。この結果から, 相互作用をモデル化する能力は, 分割の境界から得られるウォーク数によって定義されるグラフ理論特性であるウォーク指数によって決定されることがわかった。一般的なgnnアーキテクチャを用いた実験はこの発見を裏付ける。本理論の実用的応用として,入力エッジの除去時にGNNが相互作用をモデル化する能力を保持するWIS(Walk Index Sparsification)というエッジスペーシフィケーションアルゴリズムを設計する。 wisは単純で計算効率が良く,本実験では誘導予測の精度で代替手法を著しく上回っている。より広義には、モデリング可能な相互作用を理論的に分析することで、GNNを改善する可能性を示している。

Graph neural networks (GNNs) are widely used for modeling complex interactions between entities represented as vertices of a graph. Despite recent efforts to theoretically analyze the expressive power of GNNs, a formal characterization of their ability to model interactions is lacking. The current paper aims to address this gap. Formalizing strength of interactions through an established measure known as separation rank, we quantify the ability of certain GNNs to model interaction between a given subset of vertices and its complement, i.e. between the sides of a given partition of input vertices. Our results reveal that the ability to model interaction is primarily determined by the partition's walk index -- a graph-theoretical characteristic defined by the number of walks originating from the boundary of the partition. Experiments with common GNN architectures corroborate this finding. As a practical application of our theory, we design an edge sparsification algorithm named Walk Index Sparsification (WIS), which preserves the ability of a GNN to model interactions when input edges are removed. WIS is simple, computationally efficient, and in our experiments has markedly outperformed alternative methods in terms of induced prediction accuracy. More broadly, it showcases the potential of improving GNNs by theoretically analyzing the interactions they can model.
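
The walk index mentioned in the abstract can be illustrated with a small sketch. It assumes, based only on the abstract's wording, that the boundary of a partition is the set of vertices with at least one neighbour on the other side, and that the index counts walks of a fixed length starting from those vertices. The function names (`boundary`, `count_walks_from`, `walk_index`) are hypothetical, and the paper's exact definition (for example its treatment of self-loops) may differ.

```python
def boundary(adj, subset):
    """Vertices with at least one neighbour on the other side of the partition."""
    subset = set(subset)
    return {u for u in range(len(adj)) for v in adj[u] if (u in subset) != (v in subset)}


def count_walks_from(adj, sources, length):
    """Count walks with exactly `length` edges that start at a vertex in `sources`."""
    n = len(adj)
    # counts[v] = number of walks of the current length that start in `sources` and end at v
    counts = [1 if v in sources else 0 for v in range(n)]
    for _ in range(length):
        nxt = [0] * n
        for u in range(n):
            if counts[u]:
                for v in adj[u]:
                    nxt[v] += counts[u]
        counts = nxt
    return sum(counts)


def walk_index(adj, subset, length):
    """Assumed definition: number of length-`length` walks originating from the boundary."""
    return count_walks_from(adj, boundary(adj, subset), length)


# Path graph 0 - 1 - 2 - 3 given as adjacency lists.
path = [[1], [0, 2], [1, 3], [2]]
print(walk_index(path, {0, 1}, 2))  # partition cut in the middle of the path
print(walk_index(path, {0}, 2))     # partition that isolates an end vertex
```

Under this reading, a partition cut through the middle of the path has more walks leaving its boundary than one that isolates an end vertex, which matches the intuition that better-connected boundaries support stronger modeled interaction.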

翻訳日:2023-04-26 23:50:25 公開日:2023-04-25

# プライバシ・イン・プラクティス:X線画像におけるプライベート新型コロナウイルス検出(拡張版)

Privacy in Practice: Private COVID-19 Detection in X-Ray Images (Extended Version) ( http://arxiv.org/abs/2211.11434v3 )

ライセンス: Link先を確認

Lucas Lange, Maja Schneider, Peter Christen, Erhard Rahm

(参考訳) 機械学習(ML)は、大量の画像の迅速なスクリーニングを可能にすることで、新型コロナウイルスなどのパンデミックに対抗するのに役立つ。患者のプライバシを維持しながらデータ分析を行うため,差分プライバシー(DP)を満たすMLモデルを作成する。新型コロナウイルス(COVID-19)のプライベートモデルを探索する以前の研究は、部分的には小さなデータセットに基づいており、より弱いか不明確なプライバシー保証を提供し、実用的なプライバシーを調査していない。これらのオープンギャップに対処するための改善を提案する。我々は、固有の階級不均衡を考慮し、ユーティリティとプライバシのトレードオフをより広範囲に、より厳格なプライバシー予算よりも評価する。我々の評価は、ブラックボックスメンバーシップ推論攻撃(MIA)による実践的プライバシを実証的に推定することで支持される。導入されたdpは,miasによる漏洩脅威の抑制に役立ち,この仮説をcovid-19分類タスクで最初に検証する実践的な分析を行う。以上の結果から,MIAの課題依存的実践的脅威によって,必要なプライバシーレベルが異なる可能性が示唆された。以上の結果から, DP保証の増加に伴い, 経験的プライバシー漏洩はわずかに改善し, DPがMIA防衛に限られた影響を及ぼす可能性が示唆された。本研究は, 実用プライバシトレードオフの改善の可能性を明らかにし, 実用プライバシのチューニングにおいて, 経験的攻撃特異的プライバシ推定が重要な役割を果たすと考えている。

Machine learning (ML) can help fight pandemics like COVID-19 by enabling rapid screening of large volumes of images. To perform data analysis while maintaining patient privacy, we create ML models that satisfy Differential Privacy (DP). Previous works exploring private COVID-19 models are in part based on small datasets, provide weaker or unclear privacy guarantees, and do not investigate practical privacy. We suggest improvements to address these open gaps. We account for inherent class imbalances and evaluate the utility-privacy trade-off more extensively and over stricter privacy budgets. Our evaluation is supported by empirically estimating practical privacy through black-box Membership Inference Attacks (MIAs). The introduced DP should help limit leakage threats posed by MIAs, and our practical analysis is the first to test this hypothesis on the COVID-19 classification task. Our results indicate that needed privacy levels might differ based on the task-dependent practical threat from MIAs. The results further suggest that with increasing DP guarantees, empirical privacy leakage only improves marginally, and DP therefore appears to have a limited impact on practical MIA defense. Our findings identify possibilities for better utility-privacy trade-offs, and we believe that empirical attack-specific privacy estimation can play a vital role in tuning for practical privacy.

翻訳日:2023-04-26 23:50:04 公開日:2023-04-25

# ChatGPTは優れたNLG評価器か? 予備的研究

Is ChatGPT a Good NLG Evaluator? A Preliminary Study ( http://arxiv.org/abs/2303.04048v2 )

ライセンス: Link先を確認

Jiaan Wang, Yunlong Liang, Fandong Meng, Zengkui Sun, Haoxiang Shi, Zhixu Li, Jinan Xu, Jianfeng Qu, Jie Zhou

(参考訳) 近年、ChatGPTの出現は、計算言語学コミュニティから広く注目を集めている。多くの先行研究により、ChatGPTは自動評価指標を用いて様々なNLPタスクにおいて顕著な性能を発揮することが示されている。しかし、ChatGPTが評価指標として機能する能力はまだ未定である。自然言語生成モデル(NLG)の質を評価することは困難な作業であり、NLGの指標は人間の判断と相関が低いことで悪名高いことから、ChatGPTは優れたNLG評価指標であるのだろうか。本稿では,その信頼性を NLG 測定値として示すため,ChatGPT の予備メタ評価を行う。より詳しくは、ChatGPTを人間評価器とみなし、タスク固有(例えば、要約)とアスペクト固有(例えば、関連)の指示を与えて、ChatGPTにNLGモデルの生成された結果を評価する。我々は5つのNLGメタ評価データセット(要約、ストーリー生成、データ・トゥ・テキストタスクを含む)について実験を行った。実験の結果,ChatGPTは従来の自動測定値と比較すると,ほとんどの場合,人間の判断と最先端あるいは競合的な相関が得られた。さらに,ChatGPT評価器の有効性は,メタ評価データセットの作成方法の影響を受けている可能性が示唆された。参照に大きく依存して生成されるメタ評価データセットに対して、ChatGPT評価器は効果を失う可能性がある。我々の予備研究は、汎用的な信頼性NLGメトリックの出現を促すことを願っている。

Recently, the emergence of ChatGPT has attracted wide attention from the computational linguistics community. Many prior studies have shown that ChatGPT achieves remarkable performance on various NLP tasks in terms of automatic evaluation metrics. However, the ability of ChatGPT to serve as an evaluation metric is still underexplored. Since assessing the quality of natural language generation (NLG) models is an arduous task and NLG metrics notoriously correlate poorly with human judgments, we ask whether ChatGPT is a good NLG evaluation metric. In this report, we provide a preliminary meta-evaluation of ChatGPT to show its reliability as an NLG metric. In detail, we regard ChatGPT as a human evaluator and give task-specific (e.g., summarization) and aspect-specific (e.g., relevance) instructions to prompt ChatGPT to evaluate the generated results of NLG models. We conduct experiments on five NLG meta-evaluation datasets (including summarization, story generation and data-to-text tasks). Experimental results show that, compared with previous automatic metrics, ChatGPT achieves state-of-the-art or competitive correlation with human judgments in most cases. In addition, we find that the effectiveness of the ChatGPT evaluator might be influenced by the creation method of the meta-evaluation datasets. For meta-evaluation datasets whose creation depends heavily on the reference and which are therefore biased, the ChatGPT evaluator might lose its effectiveness. We hope our preliminary study can prompt the emergence of a general-purpose, reliable NLG metric.
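
Meta-evaluation of an NLG metric usually comes down to a correlation between the metric's scores and human ratings on the same outputs. The sketch below computes Spearman rank correlation in plain Python. It is an illustration only: the abstract does not say which correlation coefficients were used, and the scores in the example are made-up placeholder values, not data from the paper.

```python
def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average 1-based rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Placeholder scores for four generated summaries (illustrative values only).
metric_scores = [0.2, 0.9, 0.5, 0.7]
human_ratings = [1, 5, 3, 4]
print(spearman(metric_scores, human_ratings))
```

A metric whose ranking of outputs agrees with the human ranking yields a value near 1, while an inverted ranking yields a value near -1.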

翻訳日:2023-04-26 23:44:10 公開日:2023-04-25

# 変分オートエンコーダを用いた安全監視型自律ロボットのラストマイル配送における有効探索空間の学習

Using a Variational Autoencoder to Learn Valid Search Spaces of Safely Monitored Autonomous Robots for Last-Mile Delivery ( http://arxiv.org/abs/2303.03211v2 )

ライセンス: Link先を確認

Peter J. Bentley, Soo Ling Lim, Paolo Arcaini, Fuyuki Ishikawa

(参考訳) 顧客に商品を届けるための自律ロボットの利用は、信頼性と持続可能なサービスを提供するためのエキサイティングな新しい方法だ。しかし、現実の世界では、自律ロボットは安全のために人間の監督を必要とする。我々は、自律ロボットのタイミングを最適化して配達を最大化する現実的な問題に取り組み、安全に監視できるように、同時に走るロボットが多すぎることを保証する。我々は,最近のハイブリッド機械学習最適化手法であるCOIL (Constrained optimization in learn latent space) を用いて,この問題のバリエーションを探索するためのベースライン遺伝的アルゴリズムと比較した。また,COILの高速化と効率向上のための新しい手法についても検討した。テストされた全ての問題に対して,適切な数のロボットが同時に動作するような有効な解はCOILでのみ見つかることを示す。また,COILが遅延表現を学習した場合には,GAよりも10%高速に最適化できることが示され,毎日の配達要求をロボットに割り当てるロボットの再最適化において,同時に走るロボットの安全数を確保できる。

The use of autonomous robots for delivery of goods to customers is an exciting new way to provide a reliable and sustainable service. However, in the real world, autonomous robots still require human supervision for safety reasons. We tackle the real-world problem of optimizing autonomous robot timings to maximize deliveries, while ensuring that there are never too many robots running simultaneously so that they can be monitored safely. We assess the use of a recent hybrid machine-learning optimization approach COIL (constrained optimization in learned latent space) and compare it with a baseline genetic algorithm for the purposes of exploring variations of this problem. We also investigate new methods for improving the speed and efficiency of COIL. We show that only COIL can find valid solutions where appropriate numbers of robots run simultaneously for all problem variations tested. We also show that when COIL has learned its latent representation, it can optimize 10% faster than the GA, making it a good choice for daily re-optimization of robots where delivery requests for each day are allocated to robots while maintaining safe numbers of robots running at once.

翻訳日:2023-04-26 23:43:45 公開日:2023-04-25

# SemEval-2023 Task 3における日立: オンラインニュースのジャンルおよびフレーミング検出のための言語横断マルチタスク戦略の探索

Hitachi at SemEval-2023 Task 3: Exploring Cross-lingual Multi-task Strategies for Genre and Framing Detection in Online News ( http://arxiv.org/abs/2303.01794v2 )

ライセンス: Link先を確認

Yuta Koreeda, Ken-ichi Yokote, Hiroaki Ozaki, Atsuki Yamaguchi, Masaya Tsunokake, Yasuhiro Sogawa

(参考訳) 本稿では、日立チームによるSemEval-2023 Task 3「多言語環境におけるオンラインニュースのジャンル、フレーミング、説得技術の検出」への参加について述べる。タスクの多言語・マルチタスク特性と低リソース設定に基づき,事前学習された言語モデルの訓練のための異なるクロスリンガル・マルチタスク戦略を検討した。広範な実験を通して、(a)クロスリンガル/マルチタスク学習と (b)外部のバランスの取れたデータセットの収集が、ジャンルおよびフレーミング検出に役立つことを見出した。結果からアンサンブルモデルを構築し,イタリア語およびロシア語のジャンル分類サブタスクにおいて最高のマクロ平均F1スコアを達成した。

This paper explains the participation of team Hitachi in SemEval-2023 Task 3 "Detecting the genre, the framing, and the persuasion techniques in online news in a multi-lingual setup." Based on the multilingual, multi-task nature of the task and the low-resource setting, we investigated different cross-lingual and multi-task strategies for training the pretrained language models. Through extensive experiments, we found that (a) cross-lingual/multi-task training, and (b) collecting an external balanced dataset, can benefit the genre and framing detection. We constructed ensemble models from the results and achieved the highest macro-averaged F1 scores in Italian and Russian genre categorization subtasks.

翻訳日:2023-04-26 23:43:27 公開日:2023-04-25

# モーメントベース正定値部分多様体最適化の簡易化とディープラーニングへの応用

Simplifying Momentum-based Positive-definite Submanifold Optimization with Applications to Deep Learning ( http://arxiv.org/abs/2302.09738v3 )

ライセンス: Link先を確認

Wu Lin, Valentin Duruisseaux, Melvin Leok, Frank Nielsen, Mohammad Emtiyaz Khan, Mark Schmidt

(参考訳) 運動量を伴うリーマン部分多様体の最適化は、しばしば難しい微分方程式の解法や近似を必要とするため、計算的に困難である。我々は、アフィン不変距離を持つ構造化対称正定値行列のクラスに対するそのような最適化アルゴリズムを単純化する。我々は、計量を保存し、問題をユークリッド非制約問題に動的に自明化するリーマン正規座標の一般化版を提案する。提案手法は,構造化共分散の既存手法の説明と単純化に利用し,行列逆数のない大規模NNの学習に有効な2次最適化器を開発した。

Riemannian submanifold optimization with momentum is computationally challenging because ensuring iterates remain on the submanifold often requires solving or approximating difficult differential equations. We simplify such optimization algorithms for a class of structured symmetric positive-definite matrices with the affine invariant metric. We propose a generalized version of the Riemannian normal coordinates which preserves the metric and dynamically trivializes the problem into a Euclidean unconstrained problem. We use our approach to explain and simplify existing approaches for structured covariances and develop efficient second-order optimizers for training large-scale NNs without matrix inverses.

翻訳日:2023-04-26 23:43:14 公開日:2023-04-25

# ゲーミフィケーションはmHealthアプリケーションにおける自己申告の負担を軽減するか? スマートウォッチデータからの機械学習による認知負荷推定の実現可能性の検討

Can gamification reduce the burden of self-reporting in mHealth applications? A feasibility study using machine learning from smartwatch data to estimate cognitive load ( http://arxiv.org/abs/2302.03616v2 )

ライセンス: Link先を確認

Michal K. Grzeszczyk, Paulina Adamczyk, Sylwia Marek, Ryszard Pręcikowski, Maciej Kuś, M. Patrycja Lelujko, Rosmary Blanco, Tomasz Trzciński, Arkadiusz Sitek, Maciej Malawski, Aneta Lisowska

(参考訳) デジタル治療の有効性は、患者にアプリケーションを通じて自身の状態を自己報告するよう要求することで測定できるが、圧倒的であり、離脱を引き起こす可能性がある。我々は,ゲーミフィケーションが自己報告に与える影響を調査する。本研究のアプローチは,光胸腺造影(PPG)信号の解析を通じて認知負荷(CL)を評価するシステムの構築である。 11人の参加者のデータを機械学習モデルにトレーニングしてCLを検出する。その後、ゲーミフィケーションと従来の調査の2つのバージョンを作成します。調査終了後に他の参加者(13)が経験したclを推定した。 CL検出器の性能は,ストレス検出タスクの事前学習により向上できることがわかった。 13人中10人に対して、パーソナライズされたCL検出器は0.7以上のF1スコアを達成できる。 CLでは,ゲーミフィケーション版と非ゲーミフィケーション版の違いは認められなかったが,参加者はゲーミフィケーション版を好んだ。

The effectiveness of digital treatments can be measured by requiring patients to self-report their state through applications; however, self-reporting can be overwhelming and may cause disengagement. We conduct a study to explore the impact of gamification on self-reporting. Our approach involves the creation of a system to assess cognitive load (CL) through the analysis of photoplethysmography (PPG) signals. Data from 11 participants are used to train a machine learning model to detect CL. Subsequently, we create two versions of surveys: a gamified and a traditional one. We estimate the CL experienced by other participants (13) while completing surveys. We find that CL detector performance can be enhanced via pre-training on stress detection tasks. For 10 out of 13 participants, a personalized CL detector can achieve an F1 score above 0.7. We find no difference between the gamified and non-gamified surveys in terms of CL, but participants prefer the gamified version.

翻訳日:2023-04-26 23:43:02 公開日:2023-04-25

# Ricci流下における離散化ニューラルネットワークの学習

Learning Discretized Neural Networks under Ricci Flow ( http://arxiv.org/abs/2302.03390v3 )

ライセンス: Link先を確認

Jun Chen, Hanwen Chen, Mengmeng Wang, Guang Dai, Ivor W. Tsang, Yong Liu

(参考訳) 本稿では,非微分的離散関数による無限勾配あるいはゼロ勾配に苦しむ低精度重みとアクティベーションからなる離散化ニューラルネットワーク(dnn)について検討する。この場合、ほとんどのトレーニングベースのDNNは、勾配w.r.t.離散値の近似に標準のSTE(Straight-Through Estimator)を使用する。しかし、STEは近似勾配の摂動により勾配ミスマッチの問題を引き起こす。この問題に対処するため、本論文ではこのミスマッチを双対性理論のレンズを通してリーマン多様体の計量摂動と見なすことができる。さらに,情報幾何学に基づいて,DNNに対する線形近傍ユークリッド多様体(LNE)を構築し,摂動に対処する。計量に偏微分方程式、すなわちリッチフローを導入することで、LNE計量の動的安定性と収束を$L^2$-norm摂動で証明する。収束率が分数である以前の摂動理論とは異なり、リッチフロー下の計量摂動はlne多様体において指数関数的に減衰することができる。各種データセットに対する実験結果から,本手法はDNNに対して,他の代表的なトレーニングベース手法よりも優れた,より安定した性能を示すことが示された。

In this paper, we consider Discretized Neural Networks (DNNs) consisting of low-precision weights and activations, which suffer from either infinite or zero gradients due to the non-differentiable discrete function in the training process. In this case, most training-based DNNs employ the standard Straight-Through Estimator (STE) to approximate the gradient w.r.t. discrete values. However, the STE gives rise to the problem of gradient mismatch, due to the perturbations of the approximated gradient. To address this problem, this paper reveals that this mismatch can be viewed as a metric perturbation in a Riemannian manifold through the lens of duality theory. Further, on the basis of information geometry, we construct the Linearly Nearly Euclidean (LNE) manifold for DNNs as a background to deal with perturbations. By introducing a partial differential equation on metrics, i.e., the Ricci flow, we prove the dynamical stability and convergence of the LNE metric with the $L^2$-norm perturbation. Unlike the previous perturbation theory, whose convergence rate is a fractional power, the metric perturbation under the Ricci flow decays exponentially in the LNE manifold. The experimental results on various datasets demonstrate that our method achieves better and more stable performance for DNNs than other representative training-based methods.
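
For readers unfamiliar with the Straight-Through Estimator that the abstract takes as its starting point, here is a minimal sketch in plain Python. It shows only the baseline STE, not the paper's Ricci-flow method. The binarization to {-1, +1}, the squared-error loss, the clipping threshold, and all function names are assumptions chosen for illustration.

```python
def sign(x):
    """Binarize to {-1, +1}; the true derivative is zero almost everywhere."""
    return 1.0 if x >= 0 else -1.0


def forward(w, x):
    """Linear unit whose latent full-precision weights are quantized before use."""
    wq = [sign(wi) for wi in w]
    return sum(q * xi for q, xi in zip(wq, x))


def ste_grad(w, x, target, clip=1.0):
    """Gradient of L = (y - target)^2 w.r.t. the latent weights using the STE.

    The exact derivative d sign(w)/dw is zero almost everywhere, so the STE
    replaces it by 1 inside [-clip, clip] and by 0 outside (a common clipped
    variant, assumed here).
    """
    y = forward(w, x)
    dL_dy = 2.0 * (y - target)
    return [dL_dy * xi if abs(wi) <= clip else 0.0 for wi, xi in zip(w, x)]


w = [0.3, -0.2, 1.5]   # latent weights (illustrative values)
x = [1.0, 2.0, 3.0]    # input
print(forward(w, x))
print(ste_grad(w, x, target=0.0))
```

The gradient returned here is an approximation of a quantity that is zero or undefined for the true quantizer, which is the source of the gradient mismatch the abstract refers to.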

翻訳日:2023-04-26 23:42:47 公開日:2023-04-25

# コンフォーマルコスト制御による高速オンライン価値最大化予測セット

Fast Online Value-Maximizing Prediction Sets with Conformal Cost Control ( http://arxiv.org/abs/2302.00839v3 )

ライセンス: Link先を確認

Zhen Lin, Shubhendu Trivedi, Cao Xiao, Jimeng Sun

(参考訳) 実世界のマルチラベル予測問題の多くは、下流の使用によって引き起こされる特定の要件を満たさなければならない集合値予測を伴う。我々は、このような要件を個別に$\textit{value}$と$\textit{cost}$をエンコードし、互いに競合する典型的なシナリオに焦点を当てる。例えば、病院はスマート診断システムによって、重篤で、しばしば共死的な病気(その価値)をできるだけ多く捉え、誤った予測(コスト)を厳格にコントロールすることを期待しているかもしれない。このようなシナリオのコストを制御しながら、価値を最大化するために、FavMacと呼ばれる一般的なパイプラインを提案する。 FavMacは、ほとんどすべてのマルチラベル分類器と組み合わせて、コスト管理における分布のない理論的保証を提供する。さらに、従来の作業とは異なり、慎重に設計されたオンライン更新メカニズムを通じて、現実世界の大規模アプリケーションを扱うことができる。 FavMacは、厳格なコスト管理を維持しつつ、いくつかの変種やベースラインよりも高い価値を提供する。私たちのコードはhttps://github.com/zlin7/FavMacで利用可能です。

Many real-world multi-label prediction problems involve set-valued predictions that must satisfy specific requirements dictated by downstream usage. We focus on a typical scenario where such requirements, separately encoding $\textit{value}$ and $\textit{cost}$, compete with each other. For instance, a hospital might expect a smart diagnosis system to capture as many severe, often co-morbid, diseases as possible (the value), while maintaining strict control over incorrect predictions (the cost). We present a general pipeline, dubbed FavMac, to maximize the value while controlling the cost in such scenarios. FavMac can be combined with almost any multi-label classifier, affording distribution-free theoretical guarantees on cost control. Moreover, unlike prior works, it can handle real-world large-scale applications via a carefully designed online update mechanism, which is of independent interest. Our methodological and theoretical contributions are supported by experiments on several healthcare tasks and synthetic datasets: FavMac furnishes higher value compared with several variants and baselines while maintaining strict cost control. Our code is available at https://github.com/zlin7/FavMac.

翻訳日:2023-04-26 23:42:25 公開日:2023-04-25

# 仮説の最適選択は最も弱く、最短ではない

The Optimal Choice of Hypothesis Is the Weakest, Not the Shortest ( http://arxiv.org/abs/2301.12987v3 )

ライセンス: Link先を確認

Michael Timothy Bennett

(参考訳) もし$A$と$B$が$A \subset B$であるような集合であれば、一般化は$B$を構成するのに十分な仮説の$A$からの推論として理解することができる。 $A$から任意の数の仮説を推測できるが、それらのいくつかだけが$B$に一般化できる。どちらが一般化しそうなのか、どうしてわかるのか? 一つの戦略は最も短いものを選び、情報を圧縮する能力と一般化する能力(知能の代理指標)を同等にすることである。我々は、エンアクティブ認知の数学的形式論の文脈でこれを調べる。圧縮は性能を最大化するのに必要でも十分でもないことを示す(仮説の一般化の確率の観点から測る)。長さや単純さに関係のない、弱点(weakness)と呼ばれるプロキシを定式化する。タスクが一様に分布している場合、すべてのタスクで弱点最大化と同等以上の性能を示し、かつ少なくとも1つのタスクで厳密に上回るプロキシは存在しないことを示す。 2進算術の文脈で最大弱点と最小記述長を比較する実験では、前者は後者の$1.1$倍から$5$倍の割合で一般化した。これは弱点がはるかに優れたプロキシであることを示し、DeepMindのApperception Engineが効果的に一般化できる理由を説明すると我々は主張する。

If $A$ and $B$ are sets such that $A \subset B$, generalisation may be understood as the inference from $A$ of a hypothesis sufficient to construct $B$. One might infer any number of hypotheses from $A$, yet only some of those may generalise to $B$. How can one know which are likely to generalise? One strategy is to choose the shortest, equating the ability to compress information with the ability to generalise (a proxy for intelligence). We examine this in the context of a mathematical formalism of enactive cognition. We show that compression is neither necessary nor sufficient to maximise performance (measured in terms of the probability of a hypothesis generalising). We formulate a proxy unrelated to length or simplicity, called weakness. We show that if tasks are uniformly distributed, then there is no choice of proxy that performs at least as well as weakness maximisation in all tasks while performing strictly better in at least one. In experiments comparing maximum weakness and minimum description length in the context of binary arithmetic, the former generalised at between $1.1$ and $5$ times the rate of the latter. We argue this demonstrates that weakness is a far better proxy, and explains why Deepmind's Apperception Engine is able to generalise effectively.

翻訳日:2023-04-26 23:42:05 公開日:2023-04-25

# 分子結晶構造サンプリングのための剛体流

Rigid body flows for sampling molecular crystal structures ( http://arxiv.org/abs/2301.11355v2 )

ライセンス: Link先を確認

Jonas Köhler, Michele Invernizzi, Pim de Haan, Frank Noé

(参考訳) 正規化フロー(NF)は、高い柔軟性と表現力を持つ複雑な分布をモデル化する能力によって近年人気を集めている強力な生成モデルである。本研究では,結晶中の分子などの3次元空間における複数の物体の位置と向きをモデル化するために調整された新しい正規化フローを導入する。第一に、単位四元数の群上の滑らかで表現的な流れを定義し、剛体の連続的な回転運動を捉えること、第二に、単位四元数の二重被覆性を用いて回転群の適切な密度を定義することである。これにより,本モデルは,熱力学的対象密度に対する標準確率法や変分推論を用いてトレーニングすることができる。 TIP4P-Ew水モデルでは,外部磁場における四面体系の多モード密度と氷XI相の2つの分子例に対してボルツマン発生器を訓練して評価した。我々の流れは分子の内部自由度に作用する流れと組み合わせることができ、多くの相互作用する分子の分布のモデリングへの重要なステップとなる。

Normalizing flows (NF) are a class of powerful generative models that have gained popularity in recent years due to their ability to model complex distributions with high flexibility and expressiveness. In this work, we introduce a new type of normalizing flow that is tailored for modeling positions and orientations of multiple objects in three-dimensional space, such as molecules in a crystal. Our approach is based on two key ideas: first, we define smooth and expressive flows on the group of unit quaternions, which allows us to capture the continuous rotational motion of rigid bodies; second, we use the double cover property of unit quaternions to define a proper density on the rotation group. This ensures that our model can be trained using standard likelihood-based methods or variational inference with respect to a thermodynamic target density. We evaluate the method by training Boltzmann generators for two molecular examples, namely the multi-modal density of a tetrahedral system in an external field and the ice XI phase in the TIP4P-Ew water model. Our flows can be combined with flows operating on the internal degrees of freedom of molecules, and constitute an important step towards the modeling of distributions of many interacting molecules.
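
The double cover property mentioned in the abstract means that the unit quaternions $q$ and $-q$ describe the same rotation. The sketch below checks this numerically and shows one simple way to turn a density over unit quaternions into a function that is well defined on rotations, by summing over both representatives. The symmetrization is an assumption made for illustration and not necessarily the construction used in the paper; the toy density and all function names are hypothetical.

```python
import math


def quat_mul(a, b):
    """Hamilton product of quaternions given as (w, x, y, z)."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (
        aw * bw - ax * bx - ay * by - az * bz,
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
    )


def rotate(q, v):
    """Rotate the 3-vector v by the unit quaternion q via q * v * conj(q)."""
    w, x, y, z = q
    conj = (w, -x, -y, -z)
    _, rx, ry, rz = quat_mul(quat_mul(q, (0.0, *v)), conj)
    return (rx, ry, rz)


def neg(q):
    """The antipodal quaternion -q, which encodes the same rotation as q."""
    return tuple(-c for c in q)


def density_on_rotations(p_quat, q):
    """Sum a quaternion density over both representatives of the rotation."""
    return p_quat(q) + p_quat(neg(q))


# 90-degree rotation about the z-axis.
half = math.pi / 4
q = (math.cos(half), 0.0, 0.0, math.sin(half))
print(rotate(q, (1.0, 0.0, 0.0)))
print(rotate(neg(q), (1.0, 0.0, 0.0)))

# Toy unnormalized density on unit quaternions (illustrative only).
toy = lambda quat: math.exp(quat[0])
print(density_on_rotations(toy, q) == density_on_rotations(toy, neg(q)))
```

Because the rotation formula is quadratic in $q$, flipping the sign of $q$ leaves the rotated vector unchanged, so any density meant to live on the rotation group has to assign the same value to both.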

翻訳日:2023-04-26 23:41:43 公開日:2023-04-25

# RangeViT:自動運転における3次元意味セグメンテーションのための視覚トランスフォーマ

RangeViT: Towards Vision Transformers for 3D Semantic Segmentation in Autonomous Driving ( http://arxiv.org/abs/2301.10222v2 )

ライセンス: Link先を確認

Angelika Ando, Spyros Gidaris, Andrei Bursuc, Gilles Puy, Alexandre Boulch, Renaud Marlet

(参考訳) 外部LiDAR点雲のキャスティングセマンティックセマンティックセグメンテーションは、例えばレンジプロジェクションによる2次元問題として、効果的で一般的なアプローチである。これらのプロジェクションベースの手法は、通常は高速計算の恩恵を受け、他のポイントクラウド表現を使用する技術と組み合わせると、最先端の結果が得られる。今日、投影ベースの手法は2d cnnを利用するが、コンピュータビジョンの最近の進歩により、視覚トランスフォーマー(vits)は多くの画像ベースのベンチマークで最先端の結果を得た。本研究では,3次元セマンティックセグメンテーションのプロジェクションに基づく手法が,ViTの最近の改良の恩恵を受けるかどうかを問う。私たちは正に答えるが、それらと3つの主要な材料を組み合わせることでのみ答える。 (a)ViTはトレーニングが難しいことで知られており、強力な表現を学ぶために多くのトレーニングデータが必要です。 RGBイメージと同じバックボーンアーキテクチャを保存することで、ポイントクラウドよりもはるかに安価でアノテート可能な大規模なイメージコレクションの長いトレーニングから知識を活用できます。大規模な画像データセット上で、トレーニング済みのViTで最高の結果を得る。 b) 古典的な線形埋込み層に対して, 適合した畳み込み茎を置換することにより, ViTsの誘導バイアスの欠如を補う。 c)畳み込みデコーダと畳み込みステムからのスキップ接続により,畳み込みステムの低レベルだが細粒度の特徴とvitエンコーダの高レベルだが粗い予測を組み合わせることにより,画素単位での予測を洗練する。これらの材料を用いて,本手法はRangeViTと呼ばれ,nuScenes や SemanticKITTI の既存のプロジェクションベース手法よりも優れていることを示す。コードはhttps://github.com/valeoai/rangevitで入手できる。

Casting semantic segmentation of outdoor LiDAR point clouds as a 2D problem, e.g., via range projection, is an effective and popular approach. These projection-based methods usually benefit from fast computations and, when combined with techniques which use other point cloud representations, achieve state-of-the-art results. Today, projection-based methods leverage 2D CNNs but recent advances in computer vision show that vision transformers (ViTs) have achieved state-of-the-art results in many image-based benchmarks. In this work, we question if projection-based methods for 3D semantic segmentation can benefit from these latest improvements on ViTs. We answer positively but only after combining them with three key ingredients: (a) ViTs are notoriously hard to train and require a lot of training data to learn powerful representations. By preserving the same backbone architecture as for RGB images, we can exploit the knowledge from long training on large image collections that are much cheaper to acquire and annotate than point clouds. We reach our best results with pre-trained ViTs on large image datasets. (b) We compensate ViTs' lack of inductive bias by substituting a tailored convolutional stem for the classical linear embedding layer. (c) We refine pixel-wise predictions with a convolutional decoder and a skip connection from the convolutional stem to combine low-level but fine-grained features of the convolutional stem with the high-level but coarse predictions of the ViT encoder. With these ingredients, we show that our method, called RangeViT, outperforms existing projection-based methods on nuScenes and SemanticKITTI. The code is available at https://github.com/valeoai/rangevit.

翻訳日:2023-04-26 23:41:25 公開日:2023-04-25

# チャットボットにおける生成的安全性を目指して

Learn What NOT to Learn: Towards Generative Safety in Chatbots ( http://arxiv.org/abs/2304.11220v2 )

ライセンス: Link先を確認

Leila Khalatbari, Yejin Bang, Dan Su, Willy Chung, Saeed Ghadimi, Hossein Sameti, Pascale Fung

(参考訳) 生成的かつオープンドメインな会話モデルは、Webベースのソーシャルデータで訓練されているため、特に安全でないコンテンツを生成する可能性がある。この問題を軽減する以前のアプローチには、会話の流れを乱す、有害な入力コンテキストを認識できないような一般化を制限する、安全性のために対話の品質を犠牲にするといった欠点がある。本稿では,正と負の両方のトレーニング信号から学習することで一般化を促進するために,対照的な損失を生かした「LOT(Learn NOT to)」という新しいフレームワークを提案する。本手法は,従来学習されてきた安全で安全でない言語分布から,正負の信号を自動的に得るという点で,標準のコントラスト学習フレームワークと異なる。 LOTフレームワークは、会話の流れを保ちながら、安全でない部分空間から安全な部分空間へ世代を誘導するために分岐を利用する。提案手法は, 復号時の記憶効率と時間効率が向上し, 関与性と流動性を維持しつつ毒性を効果的に低減する。実験の結果,LOTは基準モデルに比べて4倍から6倍のエンゲージネスとフラエンシを達成し,毒性を最大4倍に低下させることがわかった。我々の発見は人間の評価によってさらに裏付けられている。

Conversational models that are generative and open-domain are particularly susceptible to generating unsafe content since they are trained on web-based social data. Prior approaches to mitigating this issue have drawbacks, such as disrupting the flow of conversation, limited generalization to unseen toxic input contexts, and sacrificing the quality of the dialogue for the sake of safety. In this paper, we present a novel framework, named "LOT" (Learn NOT to), that employs a contrastive loss to enhance generalization by learning from both positive and negative training signals. Our approach differs from the standard contrastive learning framework in that it automatically obtains positive and negative signals from the safe and unsafe language distributions that have been learned beforehand. The LOT framework utilizes divergence to steer the generations away from the unsafe subspace and towards the safe subspace while sustaining the flow of conversation. Our approach is memory and time-efficient during decoding and effectively reduces toxicity while preserving engagingness and fluency. Empirical results indicate that LOT reduces toxicity by up to four-fold while achieving four to six-fold higher rates of engagingness and fluency compared to baseline models. Our findings are further corroborated by human evaluation.

翻訳日:2023-04-26 23:36:04 公開日:2023-04-25

# グラニュラ・ボール・コンピューティング : 効率的で堅牢で解釈可能な適応型多粒度表現と計算法

Granular-ball computing: an efficient, robust, and interpretable adaptive multi-granularity representation and computation method ( http://arxiv.org/abs/2304.11171v2 )

ライセンス: Link先を確認

Shuyin Xia, Guoyin Wang, Xinbo Gao

(参考訳) 人間の認知には「大規模ファースト」認知機構があり、適応的な多粒性記述能力を有する。これにより、効率、堅牢性、解釈可能性などの計算特性が得られる。既存の人工知能学習手法の多くは、特定の多粒度特徴を持つが、「大規模ファースト」認知機構と完全に一致していない。マルチグラニュラー性粒球計算は近年開発された重要なモデル手法である。この方法は、異なる大きさの粒状球を用いてサンプル空間を適応的に表現し、粒状球に基づいて学習することができる。粒度が粗い「粒度」の数はサンプル点数より小さいため、粒度計算はより効率的であり、粒度が粗い粒度の特徴は細かい試料点の影響を受けにくく、より堅牢になり、粒度の多粒度構造はトポロジカルな構造と粗い粒度記述を生成でき、自然な解釈性を提供する。グラニュラ・ボール・コンピューティングは人工知能の様々な分野に効果的に拡張され、グラニュラ・ボール分類器、グラニュラ・ボール・クラスタリング法、グラニュラ・ボール・ニューラルネットワーク、グラニュラ・ボール・ラフ・セット、グラニュラ・ボールの進化計算などの理論的手法を開発し、効率、ノイズの堅牢性、既存手法の解釈可能性を大幅に向上させた。優れたイノベーション、実用性、そして開発の可能性を持っている。本稿では,これらの手法を体系的に紹介し,グラニュラーボールコンピューティングが現在直面している主な問題を解析し,グラニュラーボールコンピューティングの主要なシナリオについて論じるとともに,将来の研究者がこの理論を改善するための参照と提案を提供する。

Human cognition has a "large-scale first" cognitive mechanism and therefore possesses adaptive multi-granularity description capabilities. This results in computational characteristics such as efficiency, robustness, and interpretability. Although most existing artificial intelligence learning methods have certain multi-granularity features, they do not fully align with the "large-scale first" cognitive mechanism. Multi-granularity granular-ball computing is an important model method developed in recent years. This method can use granular-balls of different sizes to adaptively represent and cover the sample space, and perform learning based on granular-balls. Since the number of coarse-grained granular-balls is smaller than the number of sample points, granular-ball computing is more efficient; the coarse-grained characteristics of granular-balls are less likely to be affected by fine-grained sample points, making them more robust; the multi-granularity structure of granular-balls can produce topological structures and coarse-grained descriptions, providing natural interpretability. Granular-ball computing has now been effectively extended to various fields of artificial intelligence, developing theoretical methods such as granular-ball classifiers, granular-ball clustering methods, granular-ball neural networks, granular-ball rough sets, and granular-ball evolutionary computation, significantly improving the efficiency, noise robustness, and interpretability of existing methods. The approach is innovative and practical, and has good development potential. This article provides a systematic introduction to these methods and analyzes the main problems currently faced by granular-ball computing, discussing the primary applicable scenarios for granular-ball computing and offering references and suggestions for future researchers to improve this theory.

翻訳日:2023-04-26 23:35:45 公開日:2023-04-25

# 量子輸送における多体コヒーレンス

Many-Body Coherence in Quantum Transport ( http://arxiv.org/abs/2304.11151v2 )

ライセンス: Link先を確認

Ching-Chi Hang, Liang-Yan Hsu

(参考訳) 本研究では,多体系における電子輸送を制御するために,量子コヒーレンスを利用する概念を提案する。ハバード作用素に基づくオープン量子システム手法を組み合わせることで,多体コヒーレンスが有名なクーロン階段を取り除き,強い負の差動抵抗を引き起こすことを示した。この機構を解明するため、ゼロ電子-フォノンカップリング限界における電流-コヒーレンス関係を解析的に導出する。さらに,ゲートフィールドを組み込むことで,コヒーレンス制御トランジスタ構築の可能性を示す。この開発は、多体コヒーレンスに基づく量子電子デバイス探索のための新しい方向を開く。

In this study, we propose the concept of harnessing quantum coherence to control electron transport in a many-body system. Using an open quantum system technique based on Hubbard operators, we show that many-body coherence can eliminate the well-known Coulomb staircase and cause strong negative differential resistance. To explore the mechanism, we analytically derive the current-coherence relationship in the zero electron-phonon coupling limit. Furthermore, by incorporating a gate field, we demonstrate the possibility of constructing a coherence-controlled transistor. This development opens up a new direction for exploring quantum electronic devices based on many-body coherence.

翻訳日:2023-04-26 23:34:35 公開日:2023-04-25

# 欠落データに基づく交通信号制御のための強化学習手法

Reinforcement Learning Approaches for Traffic Signal Control under Missing Data ( http://arxiv.org/abs/2304.10722v2 )

ライセンス: Link先を確認

Hao Mei, Junxian Li, Bin Shi, Hua Wei

(参考訳) 信号制御タスクにおける強化学習(RL)手法の出現は,従来のルールベース手法よりも優れた性能を実現している。ほとんどのRLアプローチでは、エージェントが長期的な報酬に最適なアクションを決定するために環境を観察する必要がある。しかし、現実の都市では、センサの欠如により交通状態の観察が欠如することがあるため、既存のRL法を道路網に適用できず、観測が欠如している。本研究では,道路網の交差点の一部にセンサを装着せず,その周辺を直接観測することなく,実環境において交通信号を制御することを目的とする。我々の知る限りでは、実世界の交通信号制御問題に対処するためにRL法を最初に利用した人物である。具体的には,第1に適応制御を実現するために交通状態をインプットし,第2に適応制御とRLエージェントのトレーニングを可能にするために,状態と報酬の両方をインプットする。本手法は,合成と実世界の道路網トラフィックの両方について広範な実験を行い,従来の手法よりも優れており,異なる欠落率で一貫した性能を示す。また,データの欠落がモデルの性能に与える影響についてもさらなる調査を行う。

The emergence of reinforcement learning (RL) methods in traffic signal control tasks has achieved better performance than conventional rule-based approaches. Most RL approaches require the observation of the environment for the agent to decide which action is optimal for a long-term reward. However, in real-world urban scenarios, missing observation of traffic states may frequently occur due to the lack of sensors, which makes existing RL methods inapplicable on road networks with missing observation. In this work, we aim to control the traffic signals in a real-world setting, where some of the intersections in the road network are not installed with sensors and thus with no direct observations around them. To the best of our knowledge, we are the first to use RL methods to tackle the traffic signal control problem in this real-world setting. Specifically, we propose two solutions: the first one imputes the traffic states to enable adaptive control, and the second one imputes both states and rewards to enable adaptive control and the training of RL agents. Through extensive experiments on both synthetic and real-world road network traffic, we reveal that our method outperforms conventional approaches and performs consistently with different missing rates. We also provide further investigations on how missing data influences the performance of our model.

翻訳日:2023-04-26 23:34:23 公開日:2023-04-25

# 大規模機械学習におけるアダム不安定性の理論

A Theory on Adam Instability in Large-Scale Machine Learning ( http://arxiv.org/abs/2304.09871v2 )

ライセンス: Link先を確認

Igor Molybog, Peter Albert, Moya Chen, Zachary DeVito, David Esiobu, Naman Goyal, Punit Singh Koura, Sharan Narang, Andrew Poulton, Ruan Silva, Binh Tang, Diana Liskovich, Puxin Xu, Yuchen Zhang, Melanie Kambadur, Stephen Roller, Susan Zhang

(参考訳) 本稿では,大規模言語モデルの訓練において観測される,これまで説明されていなかった発散挙動の理論について述べる。我々は、この現象はトレーニングに用いられる支配的な最適化アルゴリズムであるAdamのアーティファクトであると主張する。我々は、Adamが、パラメータ更新ベクトルが比較的大きなノルムを持ち、トレーニング損失ランドスケープにおける降下方向と本質的に無相関である状態に入りうることを観測し、これが発散を引き起こす。このアーティファクトは、大規模言語モデルトレーニングの典型的な設定である、大きなバッチサイズでのディープモデルのトレーニングにおいて、より観察されやすい。この理論を論じるために、70億、300億、650億、5460億パラメータという異なるスケールの言語モデルのトレーニング実行から得られた観察結果を示す。

We present a theory for the previously unexplained divergent behavior noticed in the training of large language models. We argue that the phenomenon is an artifact of the dominant optimization algorithm used for training, called Adam. We observe that Adam can enter a state in which the parameter update vector has a relatively large norm and is essentially uncorrelated with the direction of descent on the training loss landscape, leading to divergence. This artifact is more likely to be observed in the training of a deep model with a large batch size, which is the typical setting of large-scale language model training. To argue the theory, we present observations from the training runs of the language models of different scales: 7 billion, 30 billion, 65 billion, and 546 billion parameters.
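The two quantities the abstract refers to, the norm of the Adam update vector and its correlation with the descent direction, can be computed as in the following minimal sketch. It uses the standard Adam update rule with bias correction; the helper names (`adam_step`, `norm`, `cosine`) and the toy gradient are assumptions for illustration and are not taken from the paper.

```python
import math


def adam_step(grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One standard Adam step on a flat list of parameters.

    Returns the parameter update vector and the new moment estimates.
    """
    m = [beta1 * mi + (1 - beta1) * g for mi, g in zip(m, grad)]
    v = [beta2 * vi + (1 - beta2) * g * g for vi, g in zip(v, grad)]
    m_hat = [mi / (1 - beta1 ** t) for mi in m]  # bias-corrected first moment
    v_hat = [vi / (1 - beta2 ** t) for vi in v]  # bias-corrected second moment
    update = [-lr * mh / (math.sqrt(vh) + eps) for mh, vh in zip(m_hat, v_hat)]
    return update, m, v


def norm(x):
    return math.sqrt(sum(xi * xi for xi in x))


def cosine(a, b):
    """Cosine similarity; 0.0 if either vector is zero."""
    na, nb = norm(a), norm(b)
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


# Toy gradient (hypothetical values), first step from zero-initialised moments.
grad = [0.5, -2.0, 1.0]
update, m, v = adam_step(grad, [0.0] * 3, [0.0] * 3, t=1)
descent = [-g for g in grad]
update_norm = norm(update)            # size of the parameter update
alignment = cosine(update, descent)   # near 0 would mean "uncorrelated"
```

On the first step the update is approximately `-lr * sign(grad)`, so its norm is about `lr * sqrt(3)` regardless of the gradient's magnitude. Logging `update_norm` and `alignment` over a training run is one way to look for the regime the abstract describes, where the norm stays large while the alignment drops toward zero.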

翻訳日:2023-04-26 23:33:47 公開日:2023-04-25

# 中国のオープンインストラクションジェネラリスト:予備リリース

Chinese Open Instruction Generalist: A Preliminary Release ( http://arxiv.org/abs/2304.07987v4 )

ライセンス: Link先を確認

Ge Zhang, Yemin Shi, Ruibo Liu, Ruibin Yuan, Yizhi Li, Siwei Dong, Yu Shu, Zhaoqun Li, Zekun Wang, Chenghua Lin, Wenhao Huang, Jie Fu

(参考訳) 命令チューニングは汎用言語モデルを構築するための重要な技術として広く認識されており、InstructGPT (Ouyang et al., 2022) と ChatGPT (https://chat.openai.com/) のリリースにより研究者や一般の注目を集めている。英語指向の大規模言語モデル (LLM) は目覚ましい進歩を遂げているが, 英語ベースの基盤LLMが, 適切に設計された命令チューニングによって多言語タスクでも英語タスクと同様の性能を発揮できるか, また, そのチューニングに必要なコーパスをどのように構築できるかは, いまだ十分に検討されていない。このギャップを解消するために,4つのサブタスクの固有の特徴に適応した様々な手法により中国語命令データセットを作成する試みとして,本プロジェクトを提案する。我々は、高い品質を保証するために手作業でチェックされた約20万の中国語命令チューニングサンプルを収集した。また、既存の英語と中国語の命令コーパスを要約し、新たに構築された中国語の命令コーパスの潜在的な応用を簡潔に述べる。得られた Chinese Open Instruction Generalist (COIG) コーパスは Huggingface (https://huggingface.co/datasets/BAAI/COIG) と Github (https://github.com/BAAI-Zlab/COIG) で利用可能で、継続的に更新される。

Instruction tuning is widely recognized as a key technique for building generalist language models, which has attracted the attention of researchers and the public with the release of InstructGPT (Ouyang et al., 2022) and ChatGPT (https://chat.openai.com/). Despite impressive progress in English-oriented large-scale language models (LLMs), it is still under-explored whether English-based foundation LLMs can perform similarly on multilingual tasks compared to English tasks with well-designed instruction tuning and how we can construct the corpora needed for the tuning. To remedy this gap, we propose the project as an attempt to create a Chinese instruction dataset by various methods adapted to the intrinsic characteristics of 4 sub-tasks. We collect around 200k Chinese instruction tuning samples, which have been manually checked to guarantee high quality. We also summarize the existing English and Chinese instruction corpora and briefly describe some potential applications of the newly constructed Chinese instruction corpora. The resulting Chinese Open Instruction Generalist (COIG) corpora are available in Huggingface (https://huggingface.co/datasets/BAAI/COIG) and Github (https://github.com/BAAI-Zlab/COIG), and will be continuously updated.

翻訳日:2023-04-26 23:33:09 公開日:2023-04-25

# 大規模言語モデルに関する調査

A Survey of Large Language Models ( http://arxiv.org/abs/2303.18223v7 )

ライセンス: Link先を確認

Wayne Xin Zhao, Kun Zhou, Junyi Li, Tianyi Tang, Xiaolei Wang, Yupeng Hou, Yingqian Min, Beichen Zhang, Junjie Zhang, Zican Dong, Yifan Du, Chen Yang, Yushuo Chen, Zhipeng Chen, Jinhao Jiang, Ruiyang Ren, Yifan Li, Xinyu Tang, Zikang Liu, Peiyu Liu, Jian-Yun Nie and Ji-Rong Wen

(参考訳) 言語は基本的に、文法規則によって支配される人間の表現の複雑な複雑な体系である。言語を理解・把握するための有能なaiアルゴリズムを開発することは大きな課題となる。主要なアプローチとして、言語モデリングは過去20年間、言語理解と生成のために広く研究され、統計的言語モデルから神経言語モデルへと進化してきた。近年,大規模コーパス上でのトランスフォーマモデルによる事前学習言語モデル (plms) が提案されている。モデルスケーリングがパフォーマンス改善につながることを研究者は発見しているので、モデルサイズをさらに大きくすることで、スケーリング効果をさらに研究している。興味深いことに、パラメータスケールが一定のレベルを超えると、これらの拡張言語モデルは大幅な性能向上を達成するだけでなく、小規模な言語モデルには存在しない特別な能力を示す。パラメータスケールの違いを識別するために、研究コミュニティは、大きなサイズのplmに対して、大言語モデル(llm)という用語を生み出した。近年、LLMの研究は学術と産業の両方で大きく進歩しており、ChatGPTの立ち上げが目覚ましい進歩であり、社会から広く注目を集めている。 LLMの技術的な進化は、AIアルゴリズムの開発と使用方法に革命をもたらすような、AIコミュニティ全体に重要な影響を与えています。本稿では, LLMの最近の進歩について, 背景, 重要な発見, 主流技術を紹介して概観する。特に,事前トレーニング,適応チューニング,利用,キャパシティ評価という,llmの主な4つの側面に注目した。さらに,llm開発のための利用可能なリソースを要約するとともに,今後の課題についても論じる。

Language is essentially a complex, intricate system of human expressions governed by grammatical rules. It poses a significant challenge to develop capable AI algorithms for comprehending and grasping a language. As a major approach, language modeling has been widely studied for language understanding and generation in the past two decades, evolving from statistical language models to neural language models. Recently, pre-trained language models (PLMs) have been proposed by pre-training Transformer models over large-scale corpora, showing strong capabilities in solving various NLP tasks. Since researchers have found that model scaling can lead to performance improvement, they further study the scaling effect by increasing the model size to an even larger size. Interestingly, when the parameter scale exceeds a certain level, these enlarged language models not only achieve a significant performance improvement but also show some special abilities that are not present in small-scale language models. To discriminate the difference in parameter scale, the research community has coined the term large language models (LLMs) for the PLMs of significant size. Recently, the research on LLMs has been largely advanced by both academia and industry, and one remarkable advance is the launch of ChatGPT, which has attracted widespread attention from society. The technical evolution of LLMs has been making an important impact on the entire AI community, which would revolutionize the way we develop and use AI algorithms. In this survey, we review the recent advances of LLMs by introducing the background, key findings, and mainstream techniques. In particular, we focus on four major aspects of LLMs, namely pre-training, adaptation tuning, utilization, and capacity evaluation. Besides, we also summarize the available resources for developing LLMs and discuss the remaining issues for future directions.

翻訳日:2023-04-26 23:32:25 公開日:2023-04-25

# カラムローアンタングル型画素合成による高効率スケール不変発電機

Efficient Scale-Invariant Generator with Column-Row Entangled Pixel Synthesis ( http://arxiv.org/abs/2303.14157v3 )

ライセンス: Link先を確認

Thuan Hoang Nguyen, Thanh Van Le, Anh Tran

(参考訳) 任意スケールの画像合成は、2K解像度を超える場合も含め、任意のスケールでフォトリアリスティックな画像を合成する、効率的でスケーラブルなソリューションを提供する。しかし、既存のGANベースのソリューションは畳み込みと階層的アーキテクチャに過度に依存しており、出力解像度をスケールする際に不整合と「texture sticking」問題を引き起こす。別の観点では、INRベースのジェネレータは設計上スケール同変であるが、その巨大なメモリフットプリントと遅い推論は、大規模またはリアルタイムシステムでこれらのネットワークを採用することを妨げている。本研究では,空間的畳み込みや粗から細への(coarse-to-fine)設計を使わずに,効率的かつスケール同変な新しい生成モデルである Column-Row Entangled Pixel Synthesis (CREPS) を提案する。メモリフットプリントを節約し、システムをスケーラブルにするために、レイヤごとの特徴マップを「厚い」列エンコーディングと行エンコーディングに分解する、新しいbi-line表現を採用した。 FFHQ、LSUN-Church、MetFaces、Flickr-Sceneryといったさまざまなデータセットでの実験により、CREPSが適切なトレーニングおよび推論速度で、任意の解像度においてスケール一貫性がありエイリアスのない画像を合成できることを確認した。コードはhttps://github.com/VinAIResearch/CREPSから入手できる。

Any-scale image synthesis offers an efficient and scalable solution to synthesize photo-realistic images at any scale, even going beyond 2K resolution. However, existing GAN-based solutions depend excessively on convolutions and a hierarchical architecture, which introduce inconsistency and the "texture sticking" issue when scaling the output resolution. From another perspective, INR-based generators are scale-equivariant by design, but their huge memory footprint and slow inference hinder these networks from being adopted in large-scale or real-time systems. In this work, we propose Column-Row Entangled Pixel Synthesis (CREPS), a new generative model that is both efficient and scale-equivariant without using any spatial convolutions or coarse-to-fine design. To save memory footprint and make the system scalable, we employ a novel bi-line representation that decomposes layer-wise feature maps into separate "thick" column and row encodings. Experiments on various datasets, including FFHQ, LSUN-Church, MetFaces, and Flickr-Scenery, confirm CREPS' ability to synthesize scale-consistent and alias-free images at any arbitrary resolution with proper training and inference speed. Code is available at https://github.com/VinAIResearch/CREPS.

翻訳日:2023-04-26 23:31:56 公開日:2023-04-25

# プライバシーラベルの概要とプライバシーポリシーとの互換性

The Overview of Privacy Labels and their Compatibility with Privacy Policies ( http://arxiv.org/abs/2303.08213v2 )

ライセンス: Link先を確認

Rishabh Khandelwal, Asmit Nayak, Paul Chung and Kassem Fawaz

(参考訳) プライバシー栄養ラベルは、長く読みにくいプライバシーポリシーを読むことなく、アプリの重要なデータプラクティスを理解する方法を提供する。最近、ios(apple)とandroid(google)のアプリ配布プラットフォームは、アプリ開発者にデータ収集、データ共有、セキュリティプラクティスなどのプライバシプラクティスを強調するプライバシー栄養ラベルを満たさなければならないという義務を課している。これらのプライバシラベルには、各データタイプに関連するデータタイプや目的など、アプリのデータプラクティスに関する非常に詳細な情報が含まれている。これにより、アプリケーションのデータプラクティスを大規模に理解するための、ユニークなヴァンテージポイントが得られます。

Privacy nutrition labels provide a way to understand an app's key data practices without reading the long and hard-to-read privacy policies. Recently, the app distribution platforms for iOS (Apple) and Android (Google) have implemented mandates requiring app developers to fill out privacy nutrition labels highlighting their privacy practices such as data collection, data sharing, and security practices. These privacy labels contain very fine-grained information about the apps' data practices such as the data types and purposes associated with each data type. This provides us with a unique vantage point from which we can understand apps' data practices at scale.

翻訳日:2023-04-26 23:31:18 公開日:2023-04-25

# 融合型グラフ状態生成のグラフ理論的最適化

Graph-theoretical optimization of fusion-based graph state generation ( http://arxiv.org/abs/2304.11988v2 )

ライセンス: Link先を確認

Seok-Hyung Lee and Hyunseok Jeong

(参考訳) グラフ状態は、測定ベースの量子コンピューティングや量子リピータなど、様々な量子情報処理タスクのための汎用的なリソースである。タイプII融合ゲートは、小さなグラフ状態を組み合わせることで全光学的なグラフ状態の生成を可能にするが、その非決定論的性質は大きなグラフ状態の効率的な生成を妨げる。本稿では,Python パッケージ OptGraphState とともに,任意のグラフ状態の融合ベースの生成を効果的に最適化するグラフ理論戦略を提案する。我々の戦略は、対象のグラフ状態を単純化し、融合ネットワークを構築し、融合の順序を決定する3つの段階からなる。提案手法を用いることで,ランダムグラフとよく知られたグラフの資源オーバーヘッドを評価する。われわれの戦略とソフトウェアは、フォトニックグラフ状態を用いた実験可能なスキームの開発と評価を支援することを期待している。

Graph states are versatile resources for various quantum information processing tasks, including measurement-based quantum computing and quantum repeaters. Although the type-II fusion gate enables all-optical generation of graph states by combining small graph states, its non-deterministic nature hinders the efficient generation of large graph states. In this work, we present a graph-theoretical strategy to effectively optimize fusion-based generation of any given graph state, along with a Python package OptGraphState. Our strategy comprises three stages: simplifying the target graph state, building a fusion network, and determining the order of fusions. Utilizing this proposed method, we evaluate the resource overheads of random graphs and various well-known graphs. We expect that our strategy and software will assist researchers in developing and assessing experimentally viable schemes that use photonic graph states.

翻訳日:2023-04-26 23:25:50 公開日:2023-04-25

# 説明可能なAIにおける異文化倫理の実践に向けて

Towards a Praxis for Intercultural Ethics in Explainable AI ( http://arxiv.org/abs/2304.11861v2 )

ライセンス: Link先を確認

Chinasa T. Okolo

(参考訳) 説明可能なAI(XAI)は、機械学習モデルがどのように機能し、予測を生成するかを理解するのに役立つというアイデアで、しばしば推奨される。それでも、これらのメリットのほとんどは、マシンラーニング開発者など、専門的なドメイン知識を持つ人たちに限られています。最近の研究は、AIを説明可能なものにすることは、特にグローバル・サウスの低リソース領域において、現実の文脈でAIをより便利にするための実行可能な方法である、と論じている。 AIは国境を越えたが、説明可能なAIの概念を「マジョリティ・ワールド(majority world)」に向けて民主化することに焦点を当てた研究は限られており、文化的、社会的に多様な地域のユーザーに固有のニーズを満たす新しいアプローチを探求し開発する余地が多く残っている。本稿では,AIの説明可能性に対する異文化間倫理アプローチの概念を紹介する。文化的ニュアンスがテクノロジの採用と利用にどのように影響するか、AIのような技術的概念の説明を妨げる要因、そしてXAIの開発に異文化間倫理アプローチを統合することでユーザ理解をどのように改善し、これらの手法の効率的な利用を促進できるかを検討する。

Explainable AI (XAI) is often promoted with the idea of helping users understand how machine learning models function and produce predictions. Still, most of these benefits are reserved for those with specialized domain knowledge, such as machine learning developers. Recent research has argued that making AI explainable can be a viable way of making AI more useful in real-world contexts, especially within low-resource domains in the Global South. While AI has transcended borders, a limited amount of work focuses on democratizing the concept of explainable AI to the "majority world", leaving much room to explore and develop new approaches within this space that cater to the distinct needs of users within culturally and socially-diverse regions. This article introduces the concept of an intercultural ethics approach to AI explainability. It examines how cultural nuances impact the adoption and use of technology, the factors that impede how technical concepts such as AI are explained, and how integrating an intercultural ethics approach in the development of XAI can improve user understanding and facilitate efficient usage of these methods.

翻訳日:2023-04-26 23:25:37 公開日:2023-04-25

# Gen-NeRF:アルゴリズム・ハードウエア共同設計による効率的で一般化可能なニューラルラジアンス場

Gen-NeRF: Efficient and Generalizable Neural Radiance Fields via Algorithm-Hardware Co-Design ( http://arxiv.org/abs/2304.11842v2 )

ライセンス: Link先を確認

Yonggan Fu, Zhifan Ye, Jiayi Yuan, Shunyao Zhang, Sixu Li, Haoran You, Yingyan Lin

(参考訳) 新しいビュー合成は、様々な拡張現実および仮想現実(AR/VR)アプリケーションにおいて没入型体験を可能にするために不可欠な機能であり、そのクロスシーンの一般化能力により、一般化可能なニューラルレイディアンス場(NeRF)が人気を博している。それらの約束にもかかわらず、一般化可能なNeRFの実際のデバイス展開は、シーン機能を取得するために大量のメモリアクセスを必要とするため、その禁止的な複雑さによってボトルネックになり、レイマーチングプロセスはメモリバウンドになる。この目的のために,提案するGen-NeRFは,リアルタイムに一般化可能なNeRFを初めて実現可能な,一般化可能なNeRFアクセラレーション専用のアルゴリズムハードウェアの共同設計フレームワークである。アルゴリズム側では、gen-nerfは3dシーンの異なる領域がレンダリングされたピクセルに異なる貢献をするという事実を利用して、粗く効果的なサンプリング戦略を統合する。ハードウェア面では、Gen-NeRFは、そのエピポーラ幾何学的関係を利用して、異なる光線間でのデータ再利用機会を最大化するアクセラレーターマイクロアーキテクチャを強調している。さらに、Gen-NeRFアクセラレータは、ポイント・ツー・ハードウエアマッピング時のデータの局所性を向上するカスタマイズされたデータフローと、メモリバンク競合を最小限に抑える最適化されたシーン特徴記憶戦略を備えている。提案するGen-NeRFフレームワークがリアルタイムかつ一般化可能な新規ビュー合成に有効であることを示す。

Novel view synthesis is an essential functionality for enabling immersive experiences in various Augmented- and Virtual-Reality (AR/VR) applications, for which generalizable Neural Radiance Fields (NeRFs) have gained increasing popularity thanks to their cross-scene generalization capability. Despite their promise, the real-device deployment of generalizable NeRFs is bottlenecked by their prohibitive complexity due to the required massive memory accesses to acquire scene features, causing their ray marching process to be memory-bounded. To this end, we propose Gen-NeRF, an algorithm-hardware co-design framework dedicated to generalizable NeRF acceleration, which for the first time enables real-time generalizable NeRFs. On the algorithm side, Gen-NeRF integrates a coarse-then-focus sampling strategy, leveraging the fact that different regions of a 3D scene contribute differently to the rendered pixel, to enable sparse yet effective sampling. On the hardware side, Gen-NeRF highlights an accelerator micro-architecture to maximize the data reuse opportunities among different rays by making use of their epipolar geometric relationship. Furthermore, our Gen-NeRF accelerator features a customized dataflow to enhance data locality during point-to-hardware mapping and an optimized scene feature storage strategy to minimize memory bank conflicts. Extensive experiments validate the effectiveness of our proposed Gen-NeRF framework in enabling real-time and generalizable novel view synthesis.

翻訳日:2023-04-26 23:25:18 公開日:2023-04-25

# 階層拡散オートエンコーダと異方性画像操作

Hierarchical Diffusion Autoencoders and Disentangled Image Manipulation ( http://arxiv.org/abs/2304.11829v2 )

ライセンス: Link先を確認

Zeyu Lu, Chengyue Wu, Xinyuan Chen, Yaohui Wang, Lei Bai, Yu Qiao, Xihui Liu

(参考訳) 拡散モデルは画像合成のための印象的な視覚品質を達成している。しかし、拡散モデルの潜在空間を解釈し、操作する方法は広く研究されていない。以前の作業の拡散オートエンコーダは、セマンティック表現をセマンティックな潜在コードにエンコードする。これらの制限を緩和するために,拡散モデルの潜在空間に対して,細粒度と低レベルの特徴階層を利用する階層型拡散オートエンコーダ(HDAE)を提案する。 HDAEの階層的潜在空間は本質的に異なる抽象的な意味論のレベルを符号化し、より包括的な意味表現を提供する。さらに,不連続画像操作のための切断特徴に基づくアプローチを提案する。提案手法の有効性を,画像再構成,スタイル混合,制御可能な補間,ディテール保存・アンタングル画像操作,マルチモーダル・セマンティック画像合成に応用して検証した。

Diffusion models have attained impressive visual quality for image synthesis. However, how to interpret and manipulate the latent space of diffusion models has not been extensively explored. Prior work on diffusion autoencoders encodes the semantic representations into a single semantic latent code, which fails to reflect the rich information of details and the intrinsic feature hierarchy. To mitigate those limitations, we propose Hierarchical Diffusion Autoencoders (HDAE) that exploit the fine-grained-to-abstract and low-level-to-high-level feature hierarchy for the latent space of diffusion models. The hierarchical latent space of HDAE inherently encodes different abstract levels of semantics and provides more comprehensive semantic representations. In addition, we propose a truncated-feature-based approach for disentangled image manipulation. We demonstrate the effectiveness of our proposed approach with extensive experiments and applications on image reconstruction, style mixing, controllable interpolation, detail-preserving and disentangled image manipulation, and multi-modal semantic image synthesis.

翻訳日:2023-04-26 23:24:51 公開日:2023-04-25

# Now You See Me: 部分的オクルージョンに対するロバストなアプローチ

Now You See Me: Robust approach to Partial Occlusions ( http://arxiv.org/abs/2304.11779v2 )

ライセンス: Link先を確認

Karthick Prasad Gunasekaran, Nikita Jaiman

(参考訳) オブジェクトの排除はコンピュータビジョンにおいて不可欠である問題の1つである。畳み込みニューラルネットワークス(CNN)は、正規画像分類のための様々な手法を提供するが、部分閉塞画像の分類には効果がないことが証明されている。部分閉塞(partial occlusion)は、オブジェクトが他のオブジェクト/スペースによって部分的に閉塞されるシナリオである。この問題が解決されると、さまざまなシナリオを促進する大きな可能性を秘めます。特に私たちは、自動運転のシナリオとその影響に関心を持っています。自動運転車の研究は、この10年でもっともホットな話題の1つであり、運転標識や人や物体を異なる角度で隠蔽する状況が数多くある。犯罪の処理、様々なグループの所得水準の予測など、交通データのビデオ分析にさらに拡張できる状況において、その重要さを考えると、多くの面で活用される可能性がある。本稿では,Stanford Car Datasetを応用し,さまざまなサイズと性質のオクルージョンを付加することで,私たち独自の合成データセットを導入する。作成したデータセットでは,VGG-19,ResNet 50/101,GoogleNet,DenseNet 121などのアートCNNモデルのさまざまな状態を用いて総合解析を行った。さらに,これらをスクラッチから微調整し,データセットにトレーニングすることにより,これらのモデルの性能に及ぼす咬合比率と性質の変化の影響を深く研究し,異なるシナリオでトレーニングした場合,すなわち,オクルード画像と未オクルード画像を用いたトレーニング時のパフォーマンスが,部分的オクルージョンに対してより頑健なものになるかについて検討した。

Occlusion of objects is one of the indispensable problems in computer vision. While Convolutional Neural Networks (CNNs) provide various state-of-the-art approaches for regular image classification, they prove to be less effective for the classification of images with partial occlusions. Partial occlusion is a scenario where an object is partially occluded by some other object or space. This problem, when solved, holds tremendous potential to facilitate various scenarios. We are particularly interested in the autonomous driving scenario and its implications. Autonomous vehicle research is one of the hot topics of this decade, and there are ample situations of partial occlusion of a driving sign, a person, or other objects at different angles. Considering its prime importance in situations which can be further extended to video analytics of traffic data to handle crimes, anticipate income levels of various groups, etc., this holds the potential to be exploited in many ways. In this paper, we introduce our own synthetically created dataset by utilising the Stanford Car Dataset and adding occlusions of various sizes and nature to it. On this created dataset, we conducted a comprehensive analysis using various state-of-the-art CNN models such as VGG-19, ResNet 50/101, GoogleNet, and DenseNet 121. We further study in depth the effect of varying occlusion proportions and nature on the performance of these models by fine-tuning them and training them from scratch on the dataset, how they are likely to perform when trained in different scenarios (i.e., when training with occluded images versus unoccluded images), which model is more robust to partial occlusions, and so on.

翻訳日:2023-04-26 23:24:32 公開日:2023-04-25

# NAIST-SIC対応英語・日本語同時翻訳コーパス

NAIST-SIC-Aligned: Automatically-Aligned English-Japanese Simultaneous Interpretation Corpus ( http://arxiv.org/abs/2304.11766v2 )

ライセンス: Link先を確認

Jinming Zhao, Yuka Ko, Kosuke Doi, Ryo Fukuda, Katsuhito Sudoh, Satoshi Nakamura

(参考訳) 同時解釈(si)データが同時機械翻訳(simt)にどのように影響するかは疑問である。大規模なトレーニングコーパスがないため、研究は限られている。本稿では,自動アライメントされた英日siデータセットであるnaist-sic-alignedを導入することで,このギャップを埋めることを目的とする。非整合コーパスNAIST-SIC から,コーパスを並列化してモデルトレーニングに適した2段階アライメント手法を提案する。第1段階は、ソース文とターゲット文の多対多マッピングを行う粗いアライメントであり、第2段階は、アライメントペアの品質を向上させるために、イントラ・インター・センテンスフィルタリングを行う細粒度のアライメントである。コーパスの品質を確保するため、各ステップは定量的または質的に検証されている。これは文献における最初のオープンソースの大規模並列SIデータセットである。評価目的の小さなテストセットも手作業でキュレートしました。 SIコーパスの構築とSiMTの研究が進むことを願っている。データは \url{https://github.com/mingzi151/ahc-si} にある。

It remains an open question how simultaneous interpretation (SI) data affects simultaneous machine translation (SiMT). Research has been limited due to the lack of a large-scale training corpus. In this work, we aim to fill in the gap by introducing NAIST-SIC-Aligned, which is an automatically-aligned parallel English-Japanese SI dataset. Starting with a non-aligned corpus NAIST-SIC, we propose a two-stage alignment approach to make the corpus parallel and thus suitable for model training. The first stage is coarse alignment where we perform a many-to-many mapping between source and target sentences, and the second stage is fine-grained alignment where we perform intra- and inter-sentence filtering to improve the quality of aligned pairs. To ensure the quality of the corpus, each step has been validated either quantitatively or qualitatively. This is the first open-sourced large-scale parallel SI dataset in the literature. We also manually curated a small test set for evaluation purposes. We hope our work advances research on SI corpora construction and SiMT. Please find our data at https://github.com/mingzi151/AHC-SI.

翻訳日:2023-04-26 23:24:04 公開日:2023-04-25

# 倫理的・哲学的原理による信頼できる医療人工知能の確立

Ensuring Trustworthy Medical Artificial Intelligence through Ethical and Philosophical Principles ( http://arxiv.org/abs/2304.11530v2 )

ライセンス: Link先を確認

Debesh Jha, Ashish Rauniyar, Abhiskek Srivastava, Desta Haileselassie Hagos, Nikhil Kumar Tomar, Vanshali Sharma, Elif Keles, Zheyuan Zhang, Ugur Demir, Ahmet Topcu, Anis Yazidi, Jan Erik H{\aa}akeg{\aa}rd, and Ulas Bagci

(参考訳) 人工知能(AI)手法は、医療専門家や患者の経験を高めることで、多くの医療に革命をもたらす可能性がある。 aiベースのコンピュータ支援診断ツールは、臨床専門家のレベルに匹敵する能力や性能を発揮できれば、非常に有益である。その結果、先進的な医療サービスは発展途上国では手頃な価格で提供でき、専門医の欠如の問題にも対処できる。 AIベースのツールは、患者の治療の時間、リソース、全体的なコストを節約できる。さらに、人間とは対照的に、AIは大量の入力からデータの複雑な関係を明らかにし、医学における新たなエビデンスベースの知識へと導くことができる。しかし、医療におけるAIの統合は、バイアス、透明性、自律性、責任、説明責任など、いくつかの倫理的および哲学的な懸念を提起する。本稿では、AIを用いた医療画像分析の最近の進歩、既存の標準、および臨床現場におけるAIの応用のための倫理的問題やベストプラクティスを理解することの重要性を強調する。我々は、AIの技術的および倫理的課題と、病院や公共機関にAIを配置することの意味について取り上げる。また、倫理的課題、データ不足、人種的バイアス、透明性の欠如、アルゴリズム的バイアスに対処するための重要な手段と手法についても論じる。最後に、私たちは、医療アプリケーションにおけるAIに関連する倫理的課題に対処するための推奨事項と今後の方向性を提供し、このワークフローをより効率的に、正確で、アクセス可能で、透明で、世界中の患者に信頼できるものにするために、AIを臨床環境にデプロイすることを目的としています。

Artificial intelligence (AI) methods have great potential to revolutionize numerous medical care by enhancing the experience of medical experts and patients. AI based computer-assisted diagnosis tools can have a tremendous benefit if they can outperform or perform similarly to the level of a clinical expert. As a result, advanced healthcare services can be affordable in developing nations, and the problem of a lack of expert medical practitioners can be addressed. AI based tools can save time, resources, and overall cost for patient treatment. Furthermore, in contrast to humans, AI can uncover complex relations in the data from a large set of inputs and even lead to new evidence-based knowledge in medicine. However, integrating AI in healthcare raises several ethical and philosophical concerns, such as bias, transparency, autonomy, responsibility and accountability, which must be addressed before integrating such tools into clinical settings. In this article, we emphasize recent advances in AI-assisted medical image analysis, existing standards, and the significance of comprehending ethical issues and best practices for the applications of AI in clinical settings. We cover the technical and ethical challenges of AI and the implications of deploying AI in hospitals and public organizations. We also discuss promising key measures and techniques to address the ethical challenges, data scarcity, racial bias, lack of transparency, and algorithmic bias. Finally, we provide our recommendation and future directions for addressing the ethical challenges associated with AI in healthcare applications, with the goal of deploying AI into the clinical settings to make the workflow more efficient, accurate, accessible, transparent, and reliable for the patient worldwide.

翻訳日:2023-04-26 23:23:46 公開日:2023-04-25

# プロンプティングによる大規模言語モデルの心の理論(ToM)性能の向上

Boosting Theory-of-Mind Performance in Large Language Models via Prompting ( http://arxiv.org/abs/2304.11490v2 )

ライセンス: Link先を確認

Shima Rahimi Moghaddam, Christopher J. Honey

(参考訳) 大規模言語モデル(LLM)は2023年に多くのタスクで優れているが、複雑な推論では依然として課題に直面している。エージェントの信念、目標、精神状態を理解することを必要とする心の理論(Theory-of-Mind, ToM)タスクは、人間が関わる常識的推論に不可欠であり、この分野におけるLLMのパフォーマンスを高めることが重要である。本研究では, GPT-4 と 3 つの GPT-3.5 変種 (Davinci-2, Davinci-3, GPT-3.5-Turbo) のToM 性能を測定し, ToM理解の向上における文脈内学習の有効性を検討した。2ショットの思考連鎖(chain of thought)推論とステップバイステップ思考の指示を特徴とするプロンプトを評価した。人間のフィードバックからの強化学習(RLHF)で訓練したLLM(Davinci-2を除く全てのモデル)は、文脈内学習によりToMの精度を向上させた。 GPT-4はゼロショット設定で最高の性能を示し、ほぼ80%のToM精度に達したが、それでもテストセットにおける人間の精度87%には届かなかった。しかし、文脈内学習のためのプロンプトを与えた場合、全てのRLHF学習LLMは80%を超えるToM精度を達成し、GPT-4は100%に達した。これらの結果は、適切なプロンプトがLLMのToM推論を強化することを示し、LLMの認知能力の文脈依存性を強調している。

Large language models (LLMs) excel in many tasks in 2023, but they still face challenges in complex reasoning. Theory-of-mind (ToM) tasks, which require understanding agents' beliefs, goals, and mental states, are essential for common-sense reasoning involving humans, making it crucial to enhance LLM performance in this area. This study measures the ToM performance of GPT-4 and three GPT-3.5 variants (Davinci-2, Davinci-3, GPT-3.5-Turbo), and investigates the effectiveness of in-context learning in improving their ToM comprehension. We evaluated prompts featuring two-shot chain of thought reasoning and step-by-step thinking instructions. We found that LLMs trained with Reinforcement Learning from Human Feedback (RLHF) (all models excluding Davinci-2) improved their ToM accuracy via in-context learning. GPT-4 performed best in zero-shot settings, reaching nearly 80% ToM accuracy, but still fell short of the 87% human accuracy on the test set. However, when supplied with prompts for in-context learning, all RLHF-trained LLMs exceeded 80% ToM accuracy, with GPT-4 reaching 100%. These results demonstrate that appropriate prompting enhances LLM ToM reasoning, and they underscore the context-dependent nature of LLM cognitive capacities.
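A prompt that combines two-shot chain-of-thought examples with a step-by-step thinking instruction could be assembled roughly as follows. This is a sketch under stated assumptions: the field labels, the example scenarios, and the exact instruction wording ("Let's think step by step.") are illustrative and are not taken from the paper's materials.

```python
def build_tom_prompt(examples, scenario, question, step_by_step=True):
    """Assemble a few-shot chain-of-thought prompt for a ToM question.

    examples: list of dicts with keys 'scenario', 'question', 'reasoning', 'answer'
              (two of them gives the two-shot setting).
    """
    parts = []
    for ex in examples:
        parts.append(
            f"Scenario: {ex['scenario']}\n"
            f"Question: {ex['question']}\n"
            f"Reasoning: {ex['reasoning']}\n"
            f"Answer: {ex['answer']}"
        )
    tail = f"Scenario: {scenario}\nQuestion: {question}\n"
    # Assumed wording for the step-by-step thinking instruction.
    tail += "Let's think step by step." if step_by_step else "Answer:"
    parts.append(tail)
    return "\n\n".join(parts)


# Hypothetical worked examples (not from the paper's test set).
examples = [
    {
        "scenario": "Ana puts her keys in the drawer and leaves. Ben moves them to the shelf.",
        "question": "Where will Ana look for her keys?",
        "reasoning": "Ana did not see Ben move the keys, so she still believes they are in the drawer.",
        "answer": "In the drawer.",
    },
    {
        "scenario": "A box labelled 'cookies' actually contains pencils. Li has never opened it.",
        "question": "What does Li think is in the box?",
        "reasoning": "Li only sees the label, so Li believes the label is accurate.",
        "answer": "Cookies.",
    },
]
prompt = build_tom_prompt(
    examples,
    scenario="Sam hides a ball under the bed while Kim is outside.",
    question="Does Kim know where the ball is?",
)
```

Setting `examples=[]` and `step_by_step=False` yields a plain zero-shot prompt, which makes it straightforward to compare the prompting conditions on the same questions.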

翻訳日:2023-04-26 23:23:17 公開日:2023-04-25

# 積分近似の改良による拡散型サンプリングプロセスの高速化について

On Accelerating Diffusion-Based Sampling Process via Improved Integration Approximation ( http://arxiv.org/abs/2304.11328v2 )

ライセンス: Link先を確認

Guoqiang Zhang, Niwa Kenta, W. Bastiaan Kleijn

(参考訳) 1つの一般的な拡散に基づくサンプリング戦略は、逆常微分方程式(ODE)を効果的に解こうとするものである。得られたODEソルバの係数は、ODE定式化、逆離散時間ステップ、および使用されるODE法により予め決定される。本稿では,改良された積分近似(IIA)により,特定の係数を最適化することにより,人気のあるODEベースのサンプリングプロセスの高速化を検討する。各逆時間ステップにおいて、選択された係数に対して平均二乗誤差(MSE)関数を最小化する。 MSEは、元のODEソルバを一連の微細な時間ステップに適用し、原理的には次の拡散隠れ状態を予測するためのより正確な積分近似を与える。事前学習された拡散モデルが与えられた場合、特定の数の神経機能評価(nfes)のためのiaaの手順は、サンプルのバッチで1回だけ行う必要がある。選択された係数に対する最小MSE (MMSE) による最適解は、後に復元され再利用され、サンプリングプロセスが高速化される。 EDMおよびDDIMの広範囲にわたる実験により、IIA法はNFEの数が小さい場合に顕著な性能向上をもたらすことが示された。

One popular diffusion-based sampling strategy attempts to solve the reverse ordinary differential equations (ODEs) effectively. The coefficients of the obtained ODE solvers are pre-determined by the ODE formulation, the reverse discrete timesteps, and the employed ODE methods. In this paper, we consider accelerating several popular ODE-based sampling processes by optimizing certain coefficients via improved integration approximation (IIA). At each reverse timestep, we propose to minimize a mean squared error (MSE) function with respect to certain selected coefficients. The MSE is constructed by applying the original ODE solver for a set of fine-grained timesteps which in principle provides a more accurate integration approximation in predicting the next diffusion hidden state. Given a pre-trained diffusion model, the procedure for IIA for a particular number of neural functional evaluations (NFEs) only needs to be conducted once over a batch of samples. The obtained optimal solutions for those selected coefficients via minimum MSE (MMSE) can be restored and reused later on to accelerate the sampling process. Extensive experiments on EDM and DDIM show the IIA technique leads to significant performance gain when the numbers of NFEs are small.
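The core idea, choosing a solver coefficient by minimizing the MSE against a fine-grained reference solution over a batch of samples, can be illustrated on a toy problem. The sketch below is not the paper's procedure: it uses a scalar linear ODE instead of a diffusion model, plain Euler instead of EDM or DDIM, and a single coefficient `coeff`; all names are assumptions for this example.

```python
def f(x, t):
    """Toy ODE right-hand side dx/dt = -x (stand-in for the learned dynamics)."""
    return -x


def euler(x, t0, t1, n_steps):
    """Integrate from t0 to t1 with n_steps explicit Euler steps."""
    h = (t1 - t0) / n_steps
    t = t0
    for _ in range(n_steps):
        x = x + h * f(x, t)
        t += h
    return x


# Batch of starting states (hypothetical values) and one coarse step [t0, t1].
batch = [0.5, 1.0, -2.0, 3.0]
t0, t1 = 0.0, 1.0

# Coarse update direction and fine-grained reference increment per sample.
directions = [(t1 - t0) * f(x, t0) for x in batch]
targets = [euler(x, t0, t1, 100) - x for x in batch]

# Closed-form MMSE solution for a single coefficient c in x + c * direction.
coeff = sum(d * y for d, y in zip(directions, targets)) / sum(d * d for d in directions)


def mse(c):
    """Mean squared error of the coarse step with coefficient c."""
    return sum((c * d - y) ** 2 for d, y in zip(directions, targets)) / len(batch)


mse_plain = mse(1.0)   # unmodified coarse Euler step
mse_opt = mse(coeff)   # coarse step with the fitted coefficient
```

Because the coefficient is fitted once on a batch and then stored, later sampling runs can reuse it at no extra cost per step, which mirrors the one-off optimization described in the abstract.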

翻訳日:2023-04-26 23:22:51 公開日:2023-04-25

# UBC-DLNLP at SemEval-2023 Task 12:Transfer Learning がアフリカ感情分析に及ぼす影響

UBC-DLNLP at SemEval-2023 Task 12: Impact of Transfer Learning on African Sentiment Analysis ( http://arxiv.org/abs/2304.11256v2 )

ライセンス: Link先を確認

Gagan Bhatia, Ife Adebara, AbdelRahim Elmadany, Muhammad Abdul-Mageed

(参考訳) 我々は2023afrisenti-semeval共有タスクへの我々の貢献について述べ、そこでは14の異なるアフリカの言語における感情分析のタスクに取り組む。完全教師付き設定(サブタスクAとB)の下で単言語モデルと多言語モデルの両方を開発する。また、ゼロショット設定(サブタスクC)のモデルも開発する。私たちのアプローチでは、6つの言語モデルを使って転送学習を実験します。開発データではf1-scoreが70.36、テストデータではf1-scoreが66.13である。当然のことながら、複数の言語にわたる感情分析のための伝達学習と微調整技術の有効性を示した。我々のアプローチは、異なる言語やドメインにおける他の感情分析タスクに適用できる。

We describe our contribution to the SemEval 2023 AfriSenti-SemEval shared task, where we tackle the task of sentiment analysis in 14 different African languages. We develop both monolingual and multilingual models under a fully supervised setting (subtasks A and B). We also develop models for the zero-shot setting (subtask C). Our approach involves experimenting with transfer learning using six language models, including further pretraining of some of these models as well as a final finetuning stage. Our best performing models achieve an F1-score of 70.36 on development data and an F1-score of 66.13 on test data. Unsurprisingly, our results demonstrate the effectiveness of transfer learning and fine-tuning techniques for sentiment analysis across multiple languages. Our approach can be applied to other sentiment analysis tasks in different languages and domains.

翻訳日:2023-04-26 23:22:29 公開日:2023-04-25

# 3次元物体検出のための完全スパース融合

Fully Sparse Fusion for 3D Object Detection ( http://arxiv.org/abs/2304.12310v2 )

ライセンス: Link先を確認

Yingyan Li, Lue Fan, Yang Liu, Zehao Huang, Yuntao Chen, Naiyan Wang, Zhaoxiang Zhang and Tieniu Tan

(参考訳) 現在一般的なマルチモーダル3D検出手法は、通常高密度のBird's-Eye-View (BEV) 特徴マップを使用するLiDARベースの検出器上に構築されている。しかし、このようなBEV特徴マップのコストは検出範囲に対して2次的に増加するため、長距離検出には適さない。完全スパースなアーキテクチャは、長距離知覚において非常に効率的であるため注目されている。本稿では,新たに出現した完全スパースアーキテクチャにおいて,画像モダリティを効果的に活用する方法を検討する。特にインスタンスクエリを利用することで,十分に研究された2DインスタンスセグメンテーションをLiDAR側に統合し,完全スパース検出器内の3Dインスタンスセグメンテーション部分と並列化する。この設計は,完全スパース特性を維持しつつ,2Dと3Dの両側で統一されたクエリベースの融合フレームワークを実現する。広範な実験により、広く使われているnuScenesデータセットと長距離のArgoverse 2データセットで最先端の結果が示されている。特に、長距離LiDAR認識設定における提案手法の推論速度は、他の最先端マルチモーダル3D検出手法よりも2.7倍高速である。コードは https://github.com/BraveGroup/FullySparseFusion でリリースされる。

Currently prevalent multimodal 3D detection methods are built upon LiDAR-based detectors that usually use dense Bird's-Eye-View (BEV) feature maps. However, the cost of such BEV feature maps is quadratic in the detection range, making them unsuitable for long-range detection. Fully sparse architectures are gaining attention as they are highly efficient in long-range perception. In this paper, we study how to effectively leverage image modality in the emerging fully sparse architecture. Particularly, utilizing instance queries, our framework integrates the well-studied 2D instance segmentation into the LiDAR side, which is parallel to the 3D instance segmentation part in the fully sparse detector. This design achieves a uniform query-based fusion framework in both the 2D and 3D sides while maintaining the fully sparse characteristic. Extensive experiments showcase state-of-the-art results on the widely used nuScenes dataset and the long-range Argoverse 2 dataset. Notably, the inference speed of the proposed method under the long-range LiDAR perception setting is 2.7 $\times$ faster than that of other state-of-the-art multimodal 3D detection methods. Code will be released at https://github.com/BraveGroup/FullySparseFusion.
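
The quadratic cost mentioned in the abstract can be illustrated with a small calculation. This is only a sketch, not the authors' code: the helper `dense_bev_cells` and the 0.5 m cell size are assumptions made for illustration, and the grid is assumed to be a square covering [-R, R] in both directions.

```python
def dense_bev_cells(detection_range_m: float, cell_size_m: float = 0.5) -> int:
    """Count the cells of a square dense BEV grid covering [-R, R] x [-R, R].

    Both the square layout and the default cell size are illustrative
    assumptions; they are not taken from the paper.
    """
    cells_per_side = int(round(2 * detection_range_m / cell_size_m))
    return cells_per_side * cells_per_side


# Doubling the range multiplies the number of dense cells by four.
for detection_range in (50.0, 100.0, 200.0):
    print(detection_range, dense_bev_cells(detection_range))
```

A fully sparse detector avoids materializing this grid, which is why its cost does not have to grow with the square of the range.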

翻訳日:2023-04-26 23:14:01 公開日:2023-04-25

# 3次元偏波空間モードを持つ高次元量子鍵分布の逆設計による可変ベクトルビームデコーダ

Tunable vector beam decoder by inverse design for high-dimensional quantum key distribution with 3D polarized spatial modes ( http://arxiv.org/abs/2304.12296v2 )

ライセンス: Link先を確認

Eileen Otte (1), Alexander D. White (2), Nicholas A. Güsken (1), Jelena Vučković (2), Mark L. Brongersma (1) ((1) Geballe Laboratory for Advanced Materials, Stanford University, Stanford, CA, USA, (2) E. L. Ginzton Laboratory, Stanford University, Stanford, CA, USA)

(参考訳) 光の空間モードは、量子鍵分布(QKD)における次元を増やし、それによってセキュリティと情報容量を高める手段として非常に魅力的になっている。これまでは横電界成分のみが検討され、縦偏光成分は無視されてきた。本稿では,可変なオンチップベクトルビームデコーダ(VBD)を実装することで,電界振動の3つの空間次元すべてをQKDに含める手法を提案する。この逆設計されたデバイスは、高次元(HD)QKDに対する3次元偏光の相互不偏基底状態の「準備」と「測定」を開拓し、多機能オンチップフォトニクスプラットフォームにおける空間モードを用いたHD QKDの統合への道を開く。

Spatial modes of light have become highly attractive to increase the dimension and, thereby, security and information capacity in quantum key distribution (QKD). So far, only transverse electric field components have been considered, while longitudinal polarization components have remained neglected. Here, we present an approach to include all three spatial dimensions of electric field oscillation in QKD by implementing our tunable, on-a-chip vector beam decoder (VBD). This inversely designed device pioneers the "preparation" and "measurement" of three-dimensionally polarized mutually unbiased basis states for high-dimensional (HD) QKD and paves the way for the integration of HD QKD with spatial modes in multifunctional on-a-chip photonics platforms.

翻訳日:2023-04-26 23:13:41 公開日:2023-04-25

# USA-Net: ロボットメモリのための統一意味表現とアフォーダンス表現

USA-Net: Unified Semantic and Affordance Representations for Robot Memory ( http://arxiv.org/abs/2304.12164v2 )

ライセンス: Link先を確認

Benjamin Bolte, Austin Wang, Jimmy Yang, Mustafa Mukadam, Mrinal Kalakrishnan, Chris Paxton

(参考訳) ロボットが「シンクの上の茶色のキャビネットを開けに行く」といったオープンエンドの指示に従うためには、シーンの幾何学と環境の意味の両方を理解する必要がある。ロボットシステムは、しばしばこれらを別々のパイプラインで処理し、ときに非常に異なる表現空間を使用するが、2つの目的が衝突する場合には最適でない可能性がある。本研究では,シーンのセマンティクスと空間的アフォーダンスの両方を微分可能な地図にエンコードする世界表現を構築するための簡易な方法であるUSA-Netを提案する。これにより、オープンエンドの語彙を用いて指定されたシーン内の場所へナビゲートできる勾配ベースのプランナーを構築することができる。このプランナーを使うと、勾配情報を利用しない同等のグリッドベースのプランナーの経路と比べて、5-10%短く、かつCLIP埋め込み空間でゴールクエリに10-30%近い軌道を一貫して生成できる。私たちの知る限り、これは1つの暗黙的マップで意味論とアフォーダンスの両方を最適化する最初のエンドツーエンドの微分可能なプランナーである。コードとビジュアルは、私たちのウェブサイトで利用可能である: https://usa.bolte.cc/

In order for robots to follow open-ended instructions like "go open the brown cabinet over the sink", they require an understanding of both the scene geometry and the semantics of their environment. Robotic systems often handle these through separate pipelines, sometimes using very different representation spaces, which can be suboptimal when the two objectives conflict. In this work, we present USA-Net, a simple method for constructing a world representation that encodes both the semantics and spatial affordances of a scene in a differentiable map. This allows us to build a gradient-based planner which can navigate to locations in the scene specified using open-ended vocabulary. We use this planner to consistently generate trajectories which are both 5-10% shorter and 10-30% closer to our goal query in CLIP embedding space than paths from comparable grid-based planners which don't leverage gradient information. To our knowledge, this is the first end-to-end differentiable planner that optimizes for both semantics and affordance in a single implicit map. Code and visuals are available at our website: https://usa.bolte.cc/

翻訳日:2023-04-26 23:13:23 公開日:2023-04-25

# 重要ノードのブリッジネス同定によるスキップグラムに基づくノード埋め込みのポストホック説明の生成

Generating Post-hoc Explanations for Skip-gram-based Node Embeddings by Identifying Important Nodes with Bridgeness ( http://arxiv.org/abs/2304.12036v2 )

ライセンス: Link先を確認

Hogun Park and Jennifer Neville

(参考訳) ネットワーク内のノード表現学習は、ネットワーク固有の特性と構造を保持しながら、連続ベクトル空間内に関係情報を符号化する重要な機械学習技術である。近年,DeepWalk,LINE,struc2vec,PTE,UserItem2vec,RWJBGなどの教師なしノード埋め込み手法がSkip-gramモデルから登場し,ノード分類やリンク予測などのいくつかの下流タスクで既存のリレーショナルモデルよりも高い性能を示している。しかし, 説明手法や埋め込みに適用可能な理論研究が不足していることから, Skip-gramベースの埋め込みに対するポストホックな説明の提供は依然として難しい問題である。本稿ではまず,Skip-gramベースの埋め込みに対するグローバルな説明が,スペクトルクラスタを意識した局所摂動の下でブリッジネスを計算することによって得られることを示す。さらに, 学習済みグラフ埋め込みベクトルに関するトップqのグローバルな説明をより効率的に得られる, GRAPH-wGD と呼ぶ新しい勾配ベースの説明法を提案する。実験により, GRAPH-wGD のスコアによるノードのランキングは, 真のブリッジネススコアと高い相関を持つことが示された。また, 5つの実世界のグラフを用いて, GRAPH-wGD が選択したトップqのノードレベルの説明は,近年の代替手法で選択されたノードと比較して重要度スコアが高く,摂動を加えたときにクラスラベルの予測により大きな変化をもたらすことも確認した。

Node representation learning in a network is an important machine learning technique for encoding relational information in a continuous vector space while preserving the inherent properties and structures of the network. Recently, unsupervised node embedding methods such as DeepWalk, LINE, struc2vec, PTE, UserItem2vec, and RWJBG have emerged from the Skip-gram model and achieve better performance than the existing relational models in several downstream tasks such as node classification and link prediction. However, providing post-hoc explanations of Skip-gram-based embeddings remains a challenging problem because of the lack of explanation methods and theoretical studies applicable to embeddings. In this paper, we first show that global explanations of the Skip-gram-based embeddings can be found by computing bridgeness under a spectral cluster-aware local perturbation. Moreover, a novel gradient-based explanation method, which we call GRAPH-wGD, is proposed that finds the top-q global explanations about learned graph embedding vectors more efficiently. Experiments show that the ranking of nodes by scores using GRAPH-wGD is highly correlated with true bridgeness scores. We also observe that the top-q node-level explanations selected by GRAPH-wGD have higher importance scores and produce more changes in class label prediction when perturbed, compared with the nodes selected by recent alternatives, using five real-world graphs.

翻訳日:2023-04-26 23:13:06 公開日:2023-04-25

# 2次元 $\pm J$ Ising モデルの非平衡臨界ダイナミクス

Nonequilibrium critical dynamics of the bi-dimensional $\pm J$ Ising model ( http://arxiv.org/abs/2304.11997v2 )

ライセンス: Link先を確認

Ramgopal Agrawal, Leticia F. Cugliandolo, and Marco Picco

(参考訳) $\pm J$ Ising モデルは単純なフラストレートスピンモデルであり、交換結合は独立に確率 $p$ で離散値 $-J$ を、確率 $1-p$ で $+J$ を取る。量子誤り訂正符号との関連から特に魅力的である。本稿では,2次元 $\pm J$ Ising モデルの非平衡臨界挙動を,異なる初期条件から常磁性-強磁性(PF)転移線上の臨界点 $T_c(p)$ へのクエンチ後について,特に多重臨界西森点(NP)の上、下、およびNP上で検討する。動的臨界指数 $z_c$ は、NP の上下へのクエンチでは非普遍的な挙動を示すように見えるが、これは NP の斥力的固定点に起因する前漸近的な特徴として同定される。一方、NP に直接クエンチすると、ダイナミクスは $z_c \simeq 6.02(6)$ で漸近領域に達する。また、臨界ダイナミクス中の(同符号スピンからなる)幾何学的スピンクラスターも考察する。PF 線上の各普遍性クラスは、対応するパラメータ $\kappa$ を持つ確率的レヴナー発展(SLE)によって一意に特徴付けられる。さらに, 常磁性相からの臨界クエンチに対しては, このモデルはフラストレーションによらず, 大きな長さスケールにおいて創発的な臨界パーコレーショントポロジーを示す。

The $\pm J$ Ising model is a simple frustrated spin model, where the exchange couplings independently take the discrete value $-J$ with probability $p$ and $+J$ with probability $1-p$. It is especially appealing due to its connection to quantum error correcting codes. Here, we investigate the nonequilibrium critical behavior of the bi-dimensional $\pm J$ Ising model, after a quench from different initial conditions to a critical point $T_c(p)$ on the paramagnetic-ferromagnetic (PF) transition line, in particular above, below and at the multicritical Nishimori point (NP). The dynamical critical exponent $z_c$ seems to exhibit non-universal behavior for quenches above and below the NP, which is identified as a pre-asymptotic feature due to the repulsive fixed point at the NP. For a quench directly to the NP, in contrast, the dynamics reaches the asymptotic regime with $z_c \simeq 6.02(6)$. We also consider the geometrical spin clusters (of like spin signs) during the critical dynamics. Each universality class on the PF line is uniquely characterized by the stochastic Loewner evolution (SLE) with corresponding parameter $\kappa$. Moreover, for the critical quenches from the paramagnetic phase, the model, irrespective of the frustration, exhibits an emergent critical percolation topology at large length scales.
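
The coupling rule stated in the abstract (each bond is $-J$ with probability $p$ and $+J$ with probability $1-p$) can be sketched as follows. Everything beyond that rule is an assumption for illustration: the function names, the square lattice with periodic boundaries, and the sign convention $H = -\sum_{\langle ij \rangle} J_{ij} s_i s_j$ are not taken from the paper.

```python
import random


def sample_couplings(L, p, J=1.0, seed=0):
    """Draw quenched +/-J bonds for an L x L periodic square lattice.

    Each bond is -J with probability p and +J with probability 1 - p.
    horizontal[i][j] links site (i, j) to (i, j+1); vertical[i][j] links
    site (i, j) to (i+1, j). The lattice layout is an assumption.
    """
    rng = random.Random(seed)

    def draw():
        return -J if rng.random() < p else J

    horizontal = [[draw() for _ in range(L)] for _ in range(L)]
    vertical = [[draw() for _ in range(L)] for _ in range(L)]
    return horizontal, vertical


def energy(spins, horizontal, vertical):
    """Energy under the assumed convention H = -sum_<ij> J_ij s_i s_j."""
    L = len(spins)
    total = 0.0
    for i in range(L):
        for j in range(L):
            total -= horizontal[i][j] * spins[i][j] * spins[i][(j + 1) % L]
            total -= vertical[i][j] * spins[i][j] * spins[(i + 1) % L][j]
    return total
```

With `p = 0` every bond is ferromagnetic and the model reduces to the pure Ising model; increasing `p` adds antiferromagnetic bonds and hence frustration.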

翻訳日:2023-04-26 23:12:37 公開日:2023-04-25

# MMC:テキスト記述を用いた画像のマルチモーダルカラー化

MMC: Multi-Modal Colorization of Images using Textual Descriptions ( http://arxiv.org/abs/2304.11993v2 )

ライセンス: Link先を確認

Subhankar Ghosh, Saumik Bhattacharya, Prasun Roy, Umapada Pal, and Michael Blumenstein

(参考訳) 異なる色でさまざまなオブジェクトを扱うことは、画像のカラー化技術にとって大きな課題である。したがって、複雑な現実世界のシーンでは、既存のカラー化アルゴリズムは色の一貫性を保たないことが多い。本研究では,カラー化されるグレースケール画像とともに,補助条件としてテキスト記述を統合することにより,カラー化プロセスの忠実性を向上させる。そこで我々は,2つの入力(grayscale imageと各エンコードされたテキスト記述)を取り込んで,関連する色成分の予測を試みるディープネットワークを提案する。また、画像内の各オブジェクトを予測し、それぞれの記述で色付けし、それぞれの属性を色化プロセスに組み込む。その後、融合モデルがすべての画像オブジェクト(セグメント)を融合して最終的な色付け画像を生成する。各テキスト記述には画像に存在するオブジェクトの色情報が含まれているため、テキストエンコーディングは予測された色の全体的な品質を改善するのに役立つ。提案手法は,LPIPS,PSNR,SSIMの指標を用いて,既存のカラー化手法よりも優れた性能を示す。

Handling various objects with different colors is a significant challenge for image colorization techniques. Thus, for complex real-world scenes, the existing image colorization algorithms often fail to maintain color consistency. In this work, we attempt to integrate textual descriptions as an auxiliary condition, along with the grayscale image that is to be colorized, to improve the fidelity of the colorization process. To do so, we have proposed a deep network that takes two inputs (grayscale image and the respective encoded text description) and tries to predict the relevant color components. Also, we have predicted each object in the image and have colorized them with their individual description to incorporate their specific attributes in the colorization process. After that, a fusion model fuses all the image objects (segments) to generate the final colorized image. As the respective textual descriptions contain color information of the objects present in the image, text encoding helps to improve the overall quality of predicted colors. The proposed method outperforms existing colorization techniques in terms of LPIPS, PSNR and SSIM metrics.

翻訳日:2023-04-26 23:12:12 公開日:2023-04-25

# クディット・クリフォード階層におけるW状態回路のスケーリング

Scaling W state circuits in the qudit Clifford hierarchy ( http://arxiv.org/abs/2304.12504v1 )

ライセンス: Link先を確認

Lia Yeh

(参考訳) 我々は $\sqrt[d]{Z}$ ゲートと呼ぶ新しいquditゲートを同定する。これは、クリフォード階層の第 $d$ レベルにある、任意の奇素数次元 $d$ への qutrit $T$ ゲートの別の一般化である。ある予想が成り立つならばフォールトトレラントに効率よく実現可能なこのゲートを用いて、Clifford+$\sqrt[d]{Z}$ ゲートセットにおいて、qudit の $\{ |0\rangle , |1\rangle \}$ 部分空間内の $d$-qubit $W$ 状態を決定論的に構成する。qutritの場合、これはサイズ3($T$ カウント3)、6、および3のべきの qubit $W$ 状態に対する決定論的かつフォールトトレラントな構成を与える。さらに、これらの構成を適用して、$W$ 状態のサイズを任意のサイズ $N$ へ、$O(N)$ のゲート数と $O(\text{log }N)$ の深さで再帰的にスケールする。これは任意のサイズの qubit $W$ 状態に対して、また素数 $d$ 次元の qudit $W$ 状態では $d$ のべきのサイズに対して決定論的である。これらの目的のために、任意の素数qudit次元における $|0\rangle$ 制御パウリ $X$ ゲートと制御アダマールゲートの構成を考案する。これらの分解は、$d > 3$ の Clifford+$T$ では厳密な合成が知られておらず、独立した興味の対象となりうる。

We identify a novel qudit gate which we call the $\sqrt[d]{Z}$ gate. This is an alternate generalization of the qutrit $T$ gate to any odd prime dimension $d$, in the $d^{\text{th}}$ level of the Clifford hierarchy. Using this gate which is efficiently realizable fault-tolerantly should a certain conjecture hold, we deterministically construct in the Clifford+$\sqrt[d]{Z}$ gate set, $d$-qubit $W$ states in the qudit $\{ |0\rangle , |1\rangle \}$ subspace. For qutrits, this gives deterministic and fault-tolerant constructions for the qubit $W$ state of sizes three with $T$ count 3, six, and powers of three. Furthermore, we adapt these constructions to recursively scale the $W$ state size to arbitrary size $N$, in $O(N)$ gate count and $O(\text{log }N)$ depth. This is moreover deterministic for any size qubit $W$ state, and for any prime $d$-dimensional qudit $W$ state, size a power of $d$. For these purposes, we devise constructions of the $ |0\rangle $-controlled Pauli $X$ gate and the controlled Hadamard gate in any prime qudit dimension. These decompositions, for which exact synthesis is unknown in Clifford+$T$ for $d > 3$, may be of independent interest.
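
For readers unfamiliar with the target state, the qubit $W$ state of size $n$ is the equal superposition of all computational basis states with exactly one qubit set to $|1\rangle$. The sketch below only builds that state vector; it does not implement the paper's circuit construction, and the function name `w_state` is an assumption for illustration.

```python
import math


def w_state(n):
    """Return the amplitudes of the n-qubit W state as a list of length 2**n.

    The state is (|10...0> + |01...0> + ... + |00...1>) / sqrt(n), so the only
    non-zero amplitudes sit at the indices that are powers of two.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    amplitude = 1.0 / math.sqrt(n)
    state = [0.0] * (2 ** n)
    for k in range(n):
        state[1 << k] = amplitude
    return state
```

The vector has $2^n$ entries but only $n$ of them are non-zero, which is part of why an $O(N)$ gate count is a natural target for preparing it.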

翻訳日:2023-04-26 22:29:14 公開日:2023-04-25

# CNN支援ステガノグラフィー-確立されたステガノグラフィー技術による機械学習の統合

CNN-Assisted Steganography -- Integrating Machine Learning with Established Steganographic Techniques ( http://arxiv.org/abs/2304.12503v1 )

ライセンス: Link先を確認

Andrew Havard, Theodore Manikas, Eric C. Larson, Mitchell A. Thornton

(参考訳) ステガナリシスによる発見に対するステゴメディアの耐性を高めることで、ステガノグラフィを改善する方法を提案する。本手法は,steganographic assistant convolutional neural network (SA-CNN) の導入により,あるクラスのステガノグラフィ手法を強化する。従来の研究では、ステガナライザーとして訓練したニューラルネットワークをステゴ画像に適用し、ステゴ画像内に隠された情報の存在を発見することに成功している。我々の結果は、ステゴ画像の生成時にSA-CNNを用いると、そのようなステガナライザーの有効性が低下することを示している。我々はまた、SA-CNNの可能な全ての出力を、連続的な空間ではなく、より小さく離散的な空間内で表現する利点と欠点を探求する。我々のSA-CNNは、情報を埋め込むカバーメディアの特性に基づいて、ある種のパラメトリックなステガノグラフィーアルゴリズムをカスタマイズすることを可能にする。したがって、SA-CNNは、カバーメディアの特定のインスタンスごとにコアのステガノグラフィーアルゴリズムを個別に構成できるという意味で適応的である。近年のステガノグラフィー技術であるS-UNIWARDを、SA-CNNを使用する場合と使用しない場合の両方で用いた実験結果を示す。次に、SA-CNNの有無それぞれで生成した両方のステゴ画像群を、ステガナライザーの例であるYedroudj-Netに適用し、その結果を比較する。ニューラルネットワークと手作りアルゴリズムを統合するこのアプローチは、ステガノグラフィーアルゴリズムの信頼性と適応性を高めると考えている。

We propose a method to improve steganography by increasing the resilience of stego-media to discovery through steganalysis. Our approach enhances a class of steganographic approaches through the inclusion of a steganographic assistant convolutional neural network (SA-CNN). Previous research showed success in discovering the presence of hidden information within stego-images using trained neural networks as steganalyzers that are applied to stego-images. Our results show that such steganalyzers are less effective when SA-CNN is employed during the generation of a stego-image. We also explore the advantages and disadvantages of representing all the possible outputs of our SA-CNN within a smaller, discrete space, rather than a continuous space. Our SA-CNN enables certain classes of parametric steganographic algorithms to be customized based on characteristics of the cover media in which information is to be embedded. Thus, SA-CNN is adaptive in the sense that it enables the core steganographic algorithm to be especially configured for each particular instance of cover media. Experimental results are provided that employ a recent steganographic technique, S-UNIWARD, both with and without the use of SA-CNN. We then apply both sets of stego-images, those produced with and without SA-CNN, to an example steganalyzer, Yedroudj-Net, and we compare the results. We believe that this approach for the integration of neural networks with hand-crafted algorithms increases the reliability and adaptability of steganographic algorithms.

翻訳日:2023-04-26 22:28:40 公開日:2023-04-25

# デジタルツインのための因果意味コミュニケーション : 一般化可能な模倣学習アプローチ

Causal Semantic Communication for Digital Twins: A Generalizable Imitation Learning Approach ( http://arxiv.org/abs/2304.12502v1 )

ライセンス: Link先を確認

Christo Kurisummoottil Thomas, Walid Saad, Yong Xiao

(参考訳) デジタルツイン(DT)は、物理世界の仮想表現を、通信(例えば6G)、コンピューティング(例えばエッジコンピューティング)、人工知能(AI)技術と共に活用し、多くのコネクテッドインテリジェンスサービスを可能にする。DTに基づく大量のネットワークデータを扱うために、無線システムは意味コミュニケーション(SC)のパラダイムを利用し、因果推論などのAI技術を用いて、厳しい通信制約の下での情報に基づく意思決定を容易にすることができる。本稿では,DTベースの無線システムに対して,因果意味通信(CSC)と呼ばれる新しいフレームワークを提案する。CSCシステムは模倣学習(IL)問題として定式化され、DTを用いて最適なネットワーク制御ポリシーにアクセスできる送信機が、帯域制限された無線チャネル上のSCを通じて、最適な制御アクションを実行するための知識を改善する方法を受信機に教える。ソースデータの因果構造は、深層エンドツーエンド因果推論(deep end-to-end causal inference)の枠組みに基づく新しいアプローチを用いて抽出され、因果的に不変な意味表現の作成を可能にし、それにより学習した知識を未知のシナリオへ一般化することを助ける。受信機のCSCデコーダは、高いセマンティック信頼性を確保しつつセマンティック情報を抽出・推定するように設計されている。受信機の制御ポリシー、セマンティックデコーダ、因果推論は、変分推論フレームワーク内の二段階最適化問題として定式化される。この問題は、生成AIの世界モデルに着想を得たネットワーク状態モデルと呼ばれる新しい概念を用いて解かれ、このモデルはデータ生成につながる環境ダイナミクスを忠実に表現する。シミュレーションの結果,提案したCSCシステムは,より高いセマンティック信頼性とより小さなセマンティック表現を達成することにより,最先端のSCシステムを上回った。

A digital twin (DT) leverages a virtual representation of the physical world, along with communication (e.g., 6G), computing (e.g., edge computing), and artificial intelligence (AI) technologies to enable many connected intelligence services. In order to handle the large amounts of network data based on digital twins (DTs), wireless systems can exploit the paradigm of semantic communication (SC) for facilitating informed decision-making under strict communication constraints by utilizing AI techniques such as causal reasoning. In this paper, a novel framework called causal semantic communication (CSC) is proposed for DT-based wireless systems. The CSC system is posed as an imitation learning (IL) problem, where the transmitter, with access to optimal network control policies using a DT, teaches the receiver using SC over a bandwidth-limited wireless channel how to improve its knowledge to perform optimal control actions. The causal structure in the source data is extracted using novel approaches from the framework of deep end-to-end causal inference, thereby enabling the creation of a semantic representation that is causally invariant, which in turn helps generalize the learned knowledge of the system to unseen scenarios. The CSC decoder at the receiver is designed to extract and estimate semantic information while ensuring high semantic reliability. The receiver control policies, semantic decoder, and causal inference are formulated as a bi-level optimization problem within a variational inference framework. This problem is solved using a novel concept called network state models, inspired by world models in generative AI, that faithfully represent the environment dynamics leading to data generation. Simulation results demonstrate that the proposed CSC system outperforms state-of-the-art SC systems by achieving better semantic reliability and reduced semantic representation.

翻訳日:2023-04-26 22:28:15 公開日:2023-04-25

# 量子ニューラルネットワークとテンソルネットワークを用いた断面ストックリターン予測

The cross-sectional stock return predictions via quantum neural network and tensor network ( http://arxiv.org/abs/2304.12501v1 )

ライセンス: Link先を確認

Nozomu Kobayashi, Yoshiyuki Suimon, Koichi Miyamoto, Kosuke Mitarai

(参考訳) 本稿では,量子および量子に着想を得た機械学習アルゴリズムの株式リターン予測への応用について検討する。具体的には,ノイズのある中規模量子コンピュータに適したアルゴリズムである量子ニューラルネットワークと,量子に着想を得た機械学習アルゴリズムであるテンソルネットワークの性能を、線形回帰やニューラルネットワークなどの古典モデルと比較して評価する。それらの能力を評価するため、予測に基づいてポートフォリオを構築し、投資パフォーマンスを測定する。日本の株式市場における実証研究では、テンソルネットワークモデルが、線形モデルやニューラルネットワークモデルを含む古典的なベンチマークモデルよりも優れた性能を達成することが示された。量子ニューラルネットワークモデルは、全期間では古典的ニューラルネットワークモデルよりも低いリスク調整後超過リターンにとどまるが、量子ニューラルネットワークとテンソルネットワークの両モデルは直近の市場環境において優れたパフォーマンスを示しており、これは入力特徴間の非線形性を捉えるモデルの能力を示唆している。

In this paper we investigate the application of quantum and quantum-inspired machine learning algorithms to stock return predictions. Specifically, we evaluate the performance of a quantum neural network, an algorithm suited for noisy intermediate-scale quantum computers, and a tensor network, a quantum-inspired machine learning algorithm, against classical models such as linear regression and neural networks. To evaluate their abilities, we construct portfolios based on their predictions and measure investment performances. The empirical study on the Japanese stock market shows the tensor network model achieves superior performance compared to classical benchmark models, including linear and neural network models. Though the quantum neural network model attains a lower risk-adjusted excess return than the classical neural network models over the whole period, both the quantum neural network and tensor network models have superior performances in the latest market environment, which suggests the models' capability to capture non-linearity between input features.
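
The step "construct portfolios based on their predictions" can be made concrete with a small sketch. The abstract does not say how the portfolios are formed, so the equal-weighted long-short quantile rule below, along with the names `long_short_weights` and `portfolio_return`, is purely an illustrative assumption and not the authors' procedure.

```python
def long_short_weights(predictions, quantile=0.2):
    """Equal-weight long the top quantile and short the bottom quantile.

    `predictions` maps ticker -> predicted return. The long-short quantile
    rule is an assumption for illustration only.
    """
    n = len(predictions)
    k = max(1, int(n * quantile))
    if 2 * k > n:
        raise ValueError("not enough assets for separate long and short legs")
    ranked = sorted(predictions, key=predictions.get)  # ascending prediction
    weights = {ticker: 0.0 for ticker in predictions}
    for ticker in ranked[-k:]:
        weights[ticker] = 1.0 / k
    for ticker in ranked[:k]:
        weights[ticker] = -1.0 / k
    return weights


def portfolio_return(weights, realized):
    """Realized return of the portfolio for one period."""
    return sum(w * realized[ticker] for ticker, w in weights.items())
```

Under this rule the portfolio is dollar-neutral, so its return depends only on how well the model ranks stocks against each other in the cross-section.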

翻訳日:2023-04-26 22:27:46 公開日:2023-04-25

# パッチ拡散:拡散モデルの高速化とデータ効率の向上

Patch Diffusion: Faster and More Data-Efficient Training of Diffusion Models ( http://arxiv.org/abs/2304.12526v1 )

ライセンス: Link先を確認

Zhendong Wang, Yifan Jiang, Huangjie Zheng, Peihao Wang, Pengcheng He, Zhangyang Wang, Weizhu Chen, Mingyuan Zhou

(参考訳) 拡散モデルは強力だが、トレーニングには多くの時間とデータが必要である。汎用的なパッチ単位のトレーニングフレームワークであるPatch Diffusionを提案し,データ効率を改善しながらトレーニング時間コストを大幅に削減し,より幅広いユーザへの拡散モデルトレーニングの民主化を支援する。私たちのイノベーションの核心は、パッチレベルの新しい条件付きスコア関数であり、元の画像内でのパッチ位置を追加の座標チャネルとして含める一方、パッチサイズはトレーニングを通じてランダム化・多様化され、複数のスケールで領域間の依存関係をエンコードする。本手法によるサンプリングは元の拡散モデルと同じくらい簡単である。Patch Diffusionを通じて、同等またはより良い生成品質を維持しながら、$\ge 2\times$ 高速なトレーニングを実現できる。またPatch Diffusionは、比較的小さなデータセット、例えばゼロからトレーニングするのにわずか5,000枚の画像で訓練された拡散モデルの性能も改善する。我々はCelebA-64$\times$64で1.77、AFHQv2-Wild-64$\times$64で1.93という最先端のFIDスコアを達成する。コードとトレーニング済みのモデルを近々共有する予定である。

Diffusion models are powerful, but they require a lot of time and data to train. We propose Patch Diffusion, a generic patch-wise training framework, to significantly reduce the training time costs while improving data efficiency, which thus helps democratize diffusion model training to broader users. At the core of our innovations is a new conditional score function at the patch level, where the patch location in the original image is included as additional coordinate channels, while the patch size is randomized and diversified throughout training to encode the cross-region dependency at multiple scales. Sampling with our method is as easy as in the original diffusion model. Through Patch Diffusion, we could achieve $\mathbf{\ge 2\times}$ faster training, while maintaining comparable or better generation quality. Patch Diffusion meanwhile improves the performance of diffusion models trained on relatively small datasets, $e.g.$, as few as 5,000 images to train from scratch. We achieve state-of-the-art FID scores 1.77 on CelebA-64$\times$64 and 1.93 on AFHQv2-Wild-64$\times$64. We will share our code and pre-trained models soon.
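
The patch-plus-coordinates input described in the abstract can be sketched in a few lines. This is a minimal illustration, not the released implementation: the function name, the candidate patch sizes, the single-channel list-of-lists image, and the normalization of coordinates to [-1, 1] are all assumptions.

```python
import random


def random_patch_with_coords(image, patch_sizes=(16, 32, 64), rng=None):
    """Crop a random square patch and build its x/y coordinate channels.

    `image` is an H x W list of lists (H, W > 1). Coordinates refer to the
    pixel's position in the ORIGINAL image, scaled to [-1, 1]. Patch sizes and
    the scaling are illustrative assumptions. Raises IndexError if no patch
    size fits inside the image.
    """
    rng = rng or random.Random(0)
    height, width = len(image), len(image[0])
    size = rng.choice([s for s in patch_sizes if s <= min(height, width)])
    top = rng.randint(0, height - size)
    left = rng.randint(0, width - size)

    def normalize(index, extent):
        return 2.0 * index / (extent - 1) - 1.0

    pixels = [row[left:left + size] for row in image[top:top + size]]
    x_channel = [[normalize(left + j, width) for j in range(size)]
                 for _ in range(size)]
    y_channel = [[normalize(top + i, height)] * size for i in range(size)]
    return pixels, x_channel, y_channel
```

In a training loop the three arrays would be stacked as input channels, so the score network knows where in the full image each patch came from while the random patch size exposes it to several spatial scales.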

翻訳日:2023-04-26 22:19:09 公開日:2023-04-25

# CIMLA:差分因果ネットワークの推論のための解釈可能なAI

CIMLA: Interpretable AI for inference of differential causal networks ( http://arxiv.org/abs/2304.12523v1 )

ライセンス: Link先を確認

Payam Dibaeinia, Saurabh Sinha

(参考訳) 高次元データからの因果関係の発見は、バイオインフォマティクスにおける未解決の大きな問題である。機械学習と特徴帰属モデルは、この文脈で大きな期待を示してきたが、因果的な解釈を欠いている。本稿では,広く使われている特徴帰属モデルが、ある仮定の下で、ある変数が他の変数に与える影響を反映した因果量を推定することを示す。我々はこの知見を利用して、因果関係の条件依存的な変化を発見するための新しいツールCIMLAを実装した。次にCIMLAを用いて、近年大きな注目を集めている問題である、生物学的条件間の遺伝子制御ネットワークの差異を同定する。シミュレーションデータセットの広範なベンチマークを用いて、CIMLAは交絡変数に対してより頑健であり、主要な手法よりも精度が高いことを示す。最後に、我々はCIMLAを用いて、アルツハイマー病(AD)の患者と非患者から収集された既発表の単細胞RNA-seqデータセットを分析し、ADのいくつかの潜在的な調節因子を発見した。

The discovery of causal relationships from high-dimensional data is a major open problem in bioinformatics. Machine learning and feature attribution models have shown great promise in this context but lack causal interpretation. Here, we show that a popular feature attribution model estimates a causal quantity reflecting the influence of one variable on another, under certain assumptions. We leverage this insight to implement a new tool, CIMLA, for discovering condition-dependent changes in causal relationships. We then use CIMLA to identify differences in gene regulatory networks between biological conditions, a problem that has received great attention in recent years. Using extensive benchmarking on simulated data sets, we show that CIMLA is more robust to confounding variables and is more accurate than leading methods. Finally, we employ CIMLA to analyze a previously published single-cell RNA-seq data set collected from subjects with and without Alzheimer's disease (AD), discovering several potential regulators of AD.

翻訳日:2023-04-26 22:18:47 公開日:2023-04-25

# ロバスト位相回復のための適応的停止条件を持つ新しい不正確近接線形アルゴリズム

A New Inexact Proximal Linear Algorithm with Adaptive Stopping Criteria for Robust Phase Retrieval ( http://arxiv.org/abs/2304.12522v1 )

ライセンス: Link先を確認

Zhong Zheng, Shiqian Ma, and Lingzhou Xue

(参考訳) 本稿では,非平滑かつ非凸な最適化問題として定式化できるロバスト位相回復問題を考察する。サブプロブレムを不正確に解く新しい不正確近接線形アルゴリズムを提案する。我々の貢献はサブプロブレムに対する2つの適応的停止基準である。提案手法の収束挙動を解析した。合成データと実データの両方について実験を行い,本手法が元の近接線形アルゴリズムや劣勾配法などの既存手法よりもはるかに効率的であることを実証した。

This paper considers the robust phase retrieval problem, which can be cast as a nonsmooth and nonconvex optimization problem. We propose a new inexact proximal linear algorithm with the subproblem being solved inexactly. Our contributions are two adaptive stopping criteria for the subproblem. The convergence behavior of the proposed methods is analyzed. Through experiments on both synthetic and real datasets, we demonstrate that our methods are much more efficient than existing methods, such as the original proximal linear algorithm and the subgradient method.
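
To make "nonsmooth and nonconvex" concrete, here is a sketch of an objective commonly used for robust phase retrieval, $f(x) = \frac{1}{m}\sum_{i=1}^{m} |\langle a_i, x\rangle^2 - b_i|$. The abstract does not state the formulation, so treating this $\ell_1$ loss as the paper's objective is an assumption, as is the function name; the sketch covers only the objective, not the proposed algorithm or its stopping criteria.

```python
def robust_phase_retrieval_loss(x, A, b):
    """Mean absolute residual (1/m) * sum_i |<a_i, x>^2 - b_i|.

    `A` is a list of measurement vectors a_i and `b` the measured squared
    magnitudes. The absolute value makes the loss nonsmooth; the squared
    inner product inside it makes the loss nonconvex. This specific
    formulation is assumed, not quoted from the paper.
    """
    total = 0.0
    for a_i, b_i in zip(A, b):
        inner = sum(a_ij * x_j for a_ij, x_j in zip(a_i, x))
        total += abs(inner * inner - b_i)
    return total / len(A)
```

Because only squared magnitudes are measured, `x` and `-x` give the same loss, so the signal can be recovered only up to a global sign.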

翻訳日:2023-04-26 22:18:27 公開日:2023-04-25

# Hint-Aug: 基盤Vision Transformerからヒントを引き出し、少数ショットのパラメータ効率的チューニングを強化する

Hint-Aug: Drawing Hints from Foundation Vision Transformers Towards Boosted Few-Shot Parameter-Efficient Tuning ( http://arxiv.org/abs/2304.12520v1 )

ライセンス: Link先を確認

Zhongzhi Yu, Shang Wu, Yonggan Fu, Shunyao Zhang, Yingyan (Celine) Lin

(参考訳) 下流タスクにおける基盤Vision Transformer(FViT)のチューニング需要が増大しているにもかかわらず、データが限られたシナリオ(例:少数ショットチューニング)でFViTのポテンシャルを完全に引き出すことは、FViTのデータハングリーな性質のため、依然として課題である。一般的なデータ拡張技術は、少数ショットのチューニングデータに含まれる特徴が限られているため、この文脈では不十分である。この課題に取り組むために、我々はまず少数ショットチューニングにおけるFViTの機会を特定する: 事前学習されたFViT自身が、大規模事前学習データから非常に代表的な特徴をすでに学習しており、それらは広く使われているパラメータ効率的なチューニングの間も完全に保存される。そこで我々は、これらの学習済み特徴を活用してチューニングデータを拡張することで、少数ショットFViTチューニングの有効性を高められると仮定した。そこで,チューニングサンプルの過剰適合した部分を事前学習済みFViTの学習済み特徴で拡張することにより,少数ショットチューニングにおけるFViTの強化を目指す,ヒントベースデータ拡張(Hint-Aug)というフレームワークを提案する。具体的には、Hint-Augは2つの重要な要素を統合している: (1) 少数ショットのチューニングデータへの過剰適合を緩和するために基盤ViTの過信パッチを検出するAttentive Over-fitting Detector(AOD)、(2) チューニング中の特徴の多様性を高めるために、事前学習済みFViTの混同しやすい特徴を上記AODが検出した過信パッチに注入するConfusion-based Feature Infusion(CFI)モジュール。5つのデータセットと3つのパラメータ効率的なチューニング技術に関する広範な実験とアブレーション研究は、Hint-Augの有効性を一貫して検証しており、さまざまな低ショット設定の下で最先端(SOTA)のデータ拡張手法よりも0.04% ~ 32.91%高い精度を達成する。例えば、Petデータセットでは、Hint-AugはSOTAデータ拡張手法に対して50%少ないトレーニングデータで2.22%高い精度を達成する。

Despite the growing demand for tuning foundation vision transformers (FViTs) on downstream tasks, fully unleashing FViTs' potential under data-limited scenarios (e.g., few-shot tuning) remains a challenge due to FViTs' data-hungry nature. Common data augmentation techniques fall short in this context due to the limited features contained in the few-shot tuning data. To tackle this challenge, we first identify an opportunity for FViTs in few-shot tuning: pretrained FViTs themselves have already learned highly representative features from large-scale pretraining data, which are fully preserved during widely used parameter-efficient tuning. We thus hypothesize that leveraging those learned features to augment the tuning data can boost the effectiveness of few-shot FViT tuning. To this end, we propose a framework called Hint-based Data Augmentation (Hint-Aug), which aims to boost FViT in few-shot tuning by augmenting the over-fitted parts of tuning samples with the learned features of pretrained FViTs. Specifically, Hint-Aug integrates two key enablers: (1) an Attentive Over-fitting Detector (AOD) to detect over-confident patches of foundation ViTs for potentially alleviating their over-fitting on the few-shot tuning data and (2) a Confusion-based Feature Infusion (CFI) module to infuse easy-to-confuse features from the pretrained FViTs with the over-confident patches detected by the above AOD in order to enhance the feature diversity during tuning. Extensive experiments and ablation studies on five datasets and three parameter-efficient tuning techniques consistently validate Hint-Aug's effectiveness: 0.04% ~ 32.91% higher accuracy over the state-of-the-art (SOTA) data augmentation method under various low-shot settings. For example, on the Pet dataset, Hint-Aug achieves a 2.22% higher accuracy with 50% less training data over SOTA data augmentation methods.

翻訳日:2023-04-26 22:18:19 公開日:2023-04-25

# RenderDiffusion: 画像生成としてのテキスト生成

RenderDiffusion: Text Generation as Image Generation ( http://arxiv.org/abs/2304.12519v1 )

ライセンス: Link先を確認

Junyi Li, Wayne Xin Zhao, Jian-Yun Nie, Ji-Rong Wen

(参考訳) 拡散モデルはテキスト生成の新しい生成パラダイムとなっている。本稿では,テキストの離散的でカテゴリカルな性質を考慮し,テキスト誘導画像生成によるテキスト生成のための新しい拡散手法であるRenderDiffusionを提案する。私たちのキーとなるアイデアは、ターゲットのテキストを視覚言語コンテンツを含むグリフ画像(glyph image)としてレンダリングすることである。このようにして、条件付きテキスト生成をグリフ画像生成タスクとして扱うことができ、離散的なテキストに連続拡散モデルを適用することが自然になる。特に,入力テキストで条件付けされた高忠実度のグリフ画像を生成するために,カスケード型アーキテクチャ(すなわちベース拡散モデルと超解像拡散モデル)を利用する。さらに,生成されたグリフ画像の視覚言語コンテンツを最終的なテキストへ変換・洗練するために,テキストグラウンディングモジュールを設計した。4つの条件付きテキスト生成タスクと2種類のメトリクス(すなわち品質と多様性)に対する実験では、RenderDiffusionは事前訓練された言語モデルを含むいくつかのベースラインと同等またはそれ以上の結果を達成できる。また,最近の拡散モデルと比較しても大きな改善を示す。

Diffusion models have become a new generative paradigm for text generation. Considering the discrete categorical nature of text, in this paper, we propose RenderDiffusion, a novel diffusion approach for text generation via text-guided image generation. Our key idea is to render the target text as a glyph image containing visual language content. In this way, conditional text generation can be cast as a glyph image generation task, and it is then natural to apply continuous diffusion models to discrete texts. Specifically, we utilize a cascaded architecture (i.e., a base and a super-resolution diffusion model) to generate high-fidelity glyph images, conditioned on the input text. Furthermore, we design a text grounding module to transform and refine the visual language content from generated glyph images into the final texts. In experiments over four conditional text generation tasks and two classes of metrics (i.e., quality and diversity), RenderDiffusion can achieve comparable or even better results than several baselines, including pretrained language models. Our model also makes significant improvements compared to the recent diffusion model.

翻訳日:2023-04-26 22:17:38 公開日:2023-04-25

# IMUPoser:電話・時計・イヤホンにおけるIMUを用いたフルボディポーズ推定

IMUPoser: Full-Body Pose Estimation using IMUs in Phones, Watches, and Earbuds ( http://arxiv.org/abs/2304.12518v1 )

ライセンス: Link先を確認

Vimal Mollyn, Riku Arakawa, Mayank Goel, Chris Harrison, Karan Ahuja

(参考訳) 外出先での身体ポーズの追跡は、フィットネス、モバイルゲーム、コンテキスト対応バーチャルアシスタント、リハビリテーションに強力な用途を持つ可能性がある。しかし、ユーザーがこの目的のために特別なスーツやセンサーアレイを購入して装着する可能性は低い。代わりに本研究では、多くのユーザーが所有しているデバイス、すなわちスマートフォン、スマートウォッチ、イヤホンに既に搭載されているIMUを使って身体のポーズを推定する実現可能性を探る。このアプローチには、低価格のコモディティIMUからのノイズの多いデータや、ユーザの身体上の計測点の数がまばらで流動的であることなど、いくつかの課題がある。私たちのパイプラインは、利用可能なIMUデータの任意のサブセット(場合によっては単一のデバイスのみ)を受け取り、最良推定のポーズを出力する。このモデルを評価するために、我々は、さまざまなアクティビティコンテキストにわたって、市販の消費者向けデバイスを装着または保持する10人の参加者から収集したIMUPoserデータセットを作成した。独自のデータセットと既存のIMUデータセットの両方でベンチマークを行い、システムの包括的な評価を行う。

Tracking body pose on-the-go could have powerful uses in fitness, mobile gaming, context-aware virtual assistants, and rehabilitation. However, users are unlikely to buy and wear special suits or sensor arrays to achieve this end. Instead, in this work, we explore the feasibility of estimating body pose using IMUs already in devices that many users own -- namely smartphones, smartwatches, and earbuds. This approach has several challenges, including noisy data from low-cost commodity IMUs, and the fact that the number of instrumentation points on a user's body is both sparse and in flux. Our pipeline receives whatever subset of IMU data is available, potentially from just a single device, and produces a best-guess pose. To evaluate our model, we created the IMUPoser Dataset, collected from 10 participants wearing or holding off-the-shelf consumer devices and across a variety of activity contexts. We provide a comprehensive evaluation of our system, benchmarking it on both our own and existing IMU datasets.

翻訳日:2023-04-26 22:17:18 公開日:2023-04-25

# 大規模言語モデルによる意味圧縮

Semantic Compression With Large Language Models ( http://arxiv.org/abs/2304.12512v1 )

ライセンス: Link先を確認

Henry Gilbert, Michael Sandborn, Douglas C. Schmidt, Jesse Spencer-Smith, Jules White

(参考訳) 大規模言語モデル(LLM)の台頭は、情報検索、質問応答、要約、コード生成タスクに革命をもたらしている。しかしながら、事実的に不正確な情報を時折自信を持って提示すること(「幻覚」と呼ばれる)に加えて、LLMは本質的に一度に処理できる入出力トークンの数によって制限されるため、大きなセットや連続的な情報ストリームを処理するタスクでは効果が低下する可能性がある。データのサイズを減らす一般的なアプローチは、可逆(ロスレス)圧縮または非可逆(ロッシー)圧縮である。しかし、いくつかのケースでは、必要な意味的精度や意図が伝達される限り、元のデータからすべての詳細を完全に復元する必要はないかもしれない。本稿では,LLMの研究への3つの貢献について述べる。まず,ChatGPTインタフェース経由のGPT-3.5とGPT-4に特に焦点を当て,LLMを用いた近似圧縮の実現可能性を探る実験の結果を示す。第2に,LLMがテキストやコードを圧縮し,プロンプトの圧縮表現をリコールし,操作する能力について検討し,定量化する。第3に,本研究で検討したLLMによって圧縮されたテキストと復元されたテキストの間で保存される意図のレベルを定量化する2つの新しい指標,ERE(Exact Reconstructive Effectiveness)とSRE(Semantic Reconstruction Effectiveness)を提案する。我々の最初の結果は、GPT-4がテキストの意味的本質を保ちながら、テキストを効果的に圧縮して再構築できることを示し、現在の制限よりも約5倍多くのトークンを活用するための道を提供する。

The rise of large language models (LLMs) is revolutionizing information retrieval, question answering, summarization, and code generation tasks. However, in addition to confidently presenting factually inaccurate information at times (known as "hallucinations"), LLMs are also inherently limited by the number of input and output tokens that can be processed at once, making them potentially less effective on tasks that require processing a large set or continuous stream of information. A common approach to reducing the size of data is through lossless or lossy compression. Yet, in some cases it may not be strictly necessary to perfectly recover every detail from the original data, as long as a requisite level of semantic precision or intent is conveyed. This paper presents three contributions to research on LLMs. First, we present the results from experiments exploring the viability of approximate compression using LLMs, focusing specifically on GPT-3.5 and GPT-4 via ChatGPT interfaces. Second, we investigate and quantify the capability of LLMs to compress text and code, as well as to recall and manipulate compressed representations of prompts. Third, we present two novel metrics -- Exact Reconstructive Effectiveness (ERE) and Semantic Reconstruction Effectiveness (SRE) -- that quantify the level of preserved intent between text compressed and decompressed by the LLMs we studied. Our initial results indicate that GPT-4 can effectively compress and reconstruct text while preserving the semantic essence of the original text, providing a path to leverage $\sim$5$\times$ more tokens than present limits allow.

翻訳日:2023-04-26 22:16:59 公開日:2023-04-25

# モデルフリー強化学習による形式仕様のASAPな達成

Fulfilling Formal Specifications ASAP by Model-free Reinforcement Learning ( http://arxiv.org/abs/2304.12508v1 )

ライセンス: Link先を確認

Mengyu Liu and Pengyuan Lu and Xin Chen and Fanxin Kong and Oleg Sokolsky and Insup Lee

(参考訳) エージェントが形式仕様をできるだけ早く(ASAP)満たすことを促す、モデルフリー強化学習ソリューションであるASAP-Phiフレームワークを提案する。このフレームワークは、仕様を満たさないトレースには定量的セマンティクスに基づく報酬を、それ以外のトレースには高い一定の報酬を割り当てる区分的な報酬関数を利用する。次に、soft actor-critic(SAC)やdeep deterministic policy gradient(DDPG)などのアクター・クリティックに基づくアルゴリズムでエージェントを訓練する。さらに、ASAP-Phiが仕様のASAPな達成を優先するポリシーを生成することを証明する。最先端のベンチマークで、アブレーション研究を含む広範な実験を行った。その結果、本フレームワークは最大97%のテストケースで十分に速い軌道を見つけることに成功し、ベースラインを上回った。

We propose a model-free reinforcement learning solution, namely the ASAP-Phi framework, to encourage an agent to fulfill a formal specification ASAP. The framework leverages a piece-wise reward function that assigns a quantitative semantic reward to traces not satisfying the specification, and a high constant reward to the remaining traces. Then, it trains an agent with an actor-critic-based algorithm, such as soft actor-critic (SAC) or deep deterministic policy gradient (DDPG). Moreover, we prove that ASAP-Phi produces policies that prioritize fulfilling a specification ASAP. Extensive experiments are run, including ablation studies, on state-of-the-art benchmarks. Results show that our framework succeeds in finding sufficiently fast trajectories for up to 97% of test cases and defeats baselines.
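
The piece-wise reward described in this abstract can be illustrated with a minimal Python sketch. Everything concrete in it is an assumption made for illustration and is not taken from the paper: the example specification ("eventually come within `tolerance` of `goal`"), the robustness formula used as the quantitative semantics, the constant `HIGH_REWARD`, and the function names.

```python
from typing import Sequence

HIGH_REWARD = 100.0  # hypothetical constant reward for traces that satisfy the spec


def robustness(trace: Sequence[float], goal: float, tolerance: float) -> float:
    """Quantitative semantics of the assumed spec 'eventually |x - goal| <= tolerance'.

    A value >= 0 means the trace satisfies the spec; a negative value measures
    how far the closest state stayed from the goal region.
    """
    return max(tolerance - abs(x - goal) for x in trace)


def piecewise_reward(trace: Sequence[float], goal: float, tolerance: float) -> float:
    """Return the robustness for violating traces and a constant reward otherwise."""
    rho = robustness(trace, goal, tolerance)
    return rho if rho < 0 else HIGH_REWARD


# A trace that never reaches the goal region, then one that does.
print(piecewise_reward([0.0, 2.0, 4.0], goal=10.0, tolerance=1.0))
print(piecewise_reward([0.0, 5.0, 9.5], goal=10.0, tolerance=1.0))
```

The sketch only shows the two-branch shape of the reward; how the framework turns this into a preference for fast fulfillment is part of the paper's analysis and is not modeled here.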

翻訳日:2023-04-26 22:16:32 公開日:2023-04-25

# 高速化MRIのためのタスク特化戦略の学習

Learning Task-Specific Strategies for Accelerated MRI ( http://arxiv.org/abs/2304.12507v1 )

ライセンス: Link先を確認

Zihui Wu, Tianwei Yin, Yu Sun, Robert Frost, Andre van der Kouwe, Adrian V. Dalca, Katherine L. Bouman

(参考訳) 圧縮センシング磁気共鳴イメージング(CS-MRI)は、診断タスクのためにサブサンプリングされた計測から視覚情報を復元しようとする。従来のCS-MRI法は、計測のサブサンプリング、画像再構成、タスク予測を別々に扱うことが多く、結果としてエンドツーエンドの性能が準最適になる。本研究では,特定のタスクに適したCS-MRIシステムを設計するための統合フレームワークとしてTACKLEを提案する。最近の協調設計(co-design)技術を活用して、TACKLEはサブサンプリング、再構成、予測の戦略を共同で最適化し、下流タスクのパフォーマンスを向上させる。複数の公開MRIデータセットでの結果は、提案フレームワークが従来のCS-MRI法よりも様々なタスクで性能を向上させることを示している。また、トレーニングデータとは異なる取得設定で新しいデータセットを実験的に収集することにより、TACKLEの一般化能力を評価する。追加の微調整なしでも、TACKLEは堅牢に機能し、数値と視覚の両面で改善をもたらす。

Compressed sensing magnetic resonance imaging (CS-MRI) seeks to recover visual information from subsampled measurements for diagnostic tasks. Traditional CS-MRI methods often separately address measurement subsampling, image reconstruction, and task prediction, resulting in suboptimal end-to-end performance. In this work, we propose TACKLE as a unified framework for designing CS-MRI systems tailored to specific tasks. Leveraging recent co-design techniques, TACKLE jointly optimizes subsampling, reconstruction, and prediction strategies to enhance the performance on the downstream task. Our results on multiple public MRI datasets show that the proposed framework achieves improved performance on various tasks over traditional CS-MRI methods. We also evaluate the generalization ability of TACKLE by experimentally collecting a new dataset using different acquisition setups from the training data. Without additional fine-tuning, TACKLE functions robustly and leads to both numerical and visual improvements.

翻訳日:2023-04-26 22:16:20 公開日:2023-04-25

# 一般化ベイズ加法回帰木に対する事後集中の理論

Theory of Posterior Concentration for Generalized Bayesian Additive Regression Trees ( http://arxiv.org/abs/2304.12505v1 )

ライセンス: Link先を確認

Enakshi Saha

(参考訳) Bayesian Additive Regression Trees (BART) は非線形回帰関数をモデル化するための強力なセミパラメトリックなアンサンブル学習技術である。当初、BARTは連続およびバイナリの応答変数のみを予測するために提案されたが、長年にわたって、様々な応用領域においてより広いクラスの応答変数(例えばカテゴリデータやカウントデータ)の推定に適した複数の拡張が登場してきた。本稿では、応答変数が指数型分布族に従うベイズ木とその加法的アンサンブルに対する一般化された枠組みについて述べる。この枠組みはBARTのこれらの変種の大部分を包含する。事後分布が対数因子を除いてミニマックスレートで集中するための、応答分布に関する十分条件を導出する。この点で、我々の結果はBARTとその変種の経験的成功に理論的根拠を与える。

Bayesian Additive Regression Trees (BART) are a powerful semiparametric ensemble learning technique for modeling nonlinear regression functions. Although initially BART was proposed for predicting only continuous and binary response variables, over the years multiple extensions have emerged that are suitable for estimating a wider class of response variables (e.g. categorical and count data) in a multitude of application areas. In this paper we describe a Generalized framework for Bayesian trees and their additive ensembles where the response variable comes from an exponential family distribution and hence encompasses a majority of these variants of BART. We derive sufficient conditions on the response distribution, under which the posterior concentrates at a minimax rate, up to a logarithmic factor. In this regard our results provide theoretical justification for the empirical success of BART and its variants.

翻訳日:2023-04-26 22:16:04 公開日:2023-04-25

# オブジェクトセマンティクスは私たちが必要とする深さを与える:空中深度補完へのマルチタスクアプローチ

Object Semantics Give Us the Depth We Need: Multi-task Approach to Aerial Depth Completion ( http://arxiv.org/abs/2304.12542v1 )

ライセンス: Link先を確認

Sara Hatami Gazani, Fardad Dadboud, Miodrag Bolic, Iraj Mantegh, Homayoun Najjaran

(参考訳) 深度補完と物体検出は、無人航空機(UAV)の空中3Dマッピング、経路計画、衝突回避によく用いられる2つの重要なタスクである。一般的な解決策としてはLiDARセンサーによる計測の利用があるが、生成される点群は疎で不規則なことが多く、3Dレンダリングや安全性が重視される意思決定におけるシステムの能力を制限する。この課題を軽減するために、UAV上の他のセンサー(すなわち、物体検出に使用されるカメラ)からの情報を利用して、深度補完プロセスがより高密度な3Dモデルを生成できるようにする。2つのセンサーからのデータを融合しながら、空中深度補完と物体検出の両方を実行することは、資源効率の面で課題となる。本稿では,2つのタスクを単一のパスで共同実行する新しいアプローチを提案することで,この課題に対処する。提案手法は,2つのタスクを共同で学習された特徴に触れさせる,エンコーダに重点を置いたマルチタスク学習モデルに基づく。物体検出経路によって学習されたシーン内の物体に関する意味的な期待が、欠損した深度値を補う際に深度補完経路の性能をいかに高めるかを示す。実験の結果,提案するマルチタスクネットワークは,特に欠陥のある入力に対して,シングルタスクの対応モデルよりも優れていることがわかった。

Depth completion and object detection are two crucial tasks often used for aerial 3D mapping, path planning, and collision avoidance of Uncrewed Aerial Vehicles (UAVs). Common solutions include using measurements from a LiDAR sensor; however, the generated point cloud is often sparse and irregular and limits the system's capabilities in 3D rendering and safety-critical decision-making. To mitigate this challenge, information from other sensors on the UAV (viz., a camera used for object detection) is utilized to help the depth completion process generate denser 3D models. Performing both aerial depth completion and object detection tasks while fusing the data from the two sensors poses a challenge to resource efficiency. We address this challenge by proposing a novel approach to jointly execute the two tasks in a single pass. The proposed method is based on an encoder-focused multi-task learning model that exposes the two tasks to jointly learned features. We demonstrate how semantic expectations of the objects in the scene learned by the object detection pathway can boost the performance of the depth completion pathway while placing the missing depth values. Experimental results show that the proposed multi-task network outperforms its single-task counterpart, particularly when exposed to defective inputs.

翻訳日:2023-04-26 22:10:32 公開日:2023-04-25

# 物理インフォームドインバータブルニューラルネットワークを用いた逆問題に対する効率的なベイズ推論

Efficient Bayesian inference using physics-informed invertible neural networks for inverse problems ( http://arxiv.org/abs/2304.12541v1 )

ライセンス: Link先を確認

Xiaofei Guan, Xintong Wang, Hao Wu

(参考訳) 本稿では,物理情報に基づく可逆ニューラルネットワーク (PI-INN) を用いてベイズ逆問題を解く新しい手法を提案する。PI-INNのアーキテクチャは、可逆ニューラルネットワーク(INN)とニューラル基底ネットワーク(NB-Net)の2つのサブネットワークで構成されている。NB-Netの助けを借りてパラメトリック入力とINN出力の間の可逆写像を構築し、事後分布の扱いやすい推定を与えることで、効率的なサンプリングと高精度な密度評価を可能にする。さらに、PI-INNの損失関数は、残差に基づく物理情報損失項と、新しい独立性損失項の2つの成分を含む。提案する独立性損失項は、推定した密度関数を有効に利用することにより、ランダムな潜在変数をガウス化し、INN出力の2つの部分の間の統計的独立性を確保することができる。逆運動学、1次元および2次元拡散方程式の逆問題、地震波走時トモグラフィなど、提案したPI-INNの効率と精度を示す数値実験をいくつか示す。

In the paper, we propose a novel approach for solving Bayesian inverse problems with physics-informed invertible neural networks (PI-INN). The architecture of PI-INN consists of two sub-networks: an invertible neural network (INN) and a neural basis network (NB-Net). The invertible map between the parametric input and the INN output with the aid of NB-Net is constructed to provide a tractable estimation of the posterior distribution, which enables efficient sampling and accurate density evaluation. Furthermore, the loss function of PI-INN includes two components: a residual-based physics-informed loss term and a new independence loss term. The presented independence loss term can Gaussianize the random latent variables and ensure statistical independence between two parts of INN output by effectively utilizing the estimated density function. Several numerical experiments are presented to demonstrate the efficiency and accuracy of the proposed PI-INN, including inverse kinematics, inverse problems of the 1-d and 2-d diffusion equations, and seismic traveltime tomography.

翻訳日:2023-04-26 22:10:11 公開日:2023-04-25

# 敵対的ネットワーク摂動下における意見制御--Stackelbergゲームアプローチ

Opinion Control under Adversarial Network Perturbation: A Stackelberg Game Approach ( http://arxiv.org/abs/2304.12540v1 )

ライセンス: Link先を確認

Yuejiang Li, Zhanjiang Chen, H. Vicky Zhao

(参考訳) 新たなソーシャルネットワークプラットフォームによって、ユーザーは自分の意見を共有したり、他人と意見を交換したりできる。しかし、悪意のあるユーザーが極端な意見や噂、誤った情報を意図的に他人に広める敵対的ネットワーク摂動は、ソーシャルネットワークに遍在している。このような敵対的ネットワーク摂動は、世論の形成に大きな影響を与え、我々の社会を脅かす。したがって、敵対的ネットワーク摂動の影響を研究・制御することが重要である。学界と業界の両方で、世論のダイナミクスを誘導し制御するために多大な努力がなされてきたが、これらの研究の多くは、ネットワークが静的であると仮定し、そのような敵対的ネットワーク摂動を無視している。本研究では,広く受け入れられているFriedkin-Johnsen意見力学モデルに基づいて,敵対的ネットワーク摂動をモデル化し,ネットワークの意見への影響を分析する。そして,敵の視点から,ネットワークの意見を最大限に変化させる最適なネットワーク摂動を解析する。次に,ネットワーク防御者の観点から,Stackelbergゲームを定式化し,そのような敵対的ネットワーク摂動下でもネットワークの意見を制御することを目指す。定式化されたStackelbergゲームを解くために,射影劣勾配アルゴリズムを考案する。実際のソーシャルネットワーク上での大規模シミュレーションにより,敵対的ネットワーク摂動の影響に関する我々の分析と,提案した意見制御アルゴリズムの有効性を検証した。

The emerging social network platforms enable users to share their own opinions, as well as to exchange opinions with others. However, adversarial network perturbation, where malicious users intentionally spread their extreme opinions, rumors, and misinformation to others, is ubiquitous in social networks. Such adversarial network perturbation greatly influences the opinion formation of the public and threatens our societies. Thus, it is critical to study and control the influence of adversarial network perturbation. Although tremendous efforts have been made in both academia and industry to guide and control the public opinion dynamics, most of these works assume that the network is static, and ignore such adversarial network perturbation. In this work, based on the well-accepted Friedkin-Johnsen opinion dynamics model, we model the adversarial network perturbation and analyze its impact on the network's opinion. Then, from the adversary's perspective, we analyze its optimal network perturbation, which maximally changes the network's opinion. Next, from the network defender's perspective, we formulate a Stackelberg game and aim to control the network's opinion even under such adversarial network perturbation. We devise a projected subgradient algorithm to solve the formulated Stackelberg game. Extensive simulations on real social networks validate our analysis of the adversarial network perturbation's influence and the effectiveness of the proposed opinion control algorithm.
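
For readers unfamiliar with the Friedkin-Johnsen model named in this abstract, the following Python sketch iterates one common form of its update rule, x_i(t+1) = a_i s_i + (1 - a_i) Σ_j W_ij x_j(t), and compares the average opinion before and after a single edge is rewired. The three-agent network, the stubbornness values `a_i`, and the rewired edge are made-up illustrations; the paper's actual perturbation model and Stackelberg formulation are not reproduced here.

```python
from typing import List


def friedkin_johnsen(weights: List[List[float]], innate: List[float],
                     stubbornness: List[float], steps: int = 500) -> List[float]:
    """Iterate x_i <- a_i * s_i + (1 - a_i) * sum_j W_ij * x_j toward its fixed point.

    `weights` is assumed row-stochastic and every a_i is assumed to lie in (0, 1],
    which makes the update a contraction.
    """
    n = len(innate)
    opinions = list(innate)
    for _ in range(steps):
        opinions = [
            stubbornness[i] * innate[i]
            + (1.0 - stubbornness[i])
            * sum(weights[i][j] * opinions[j] for j in range(n))
            for i in range(n)
        ]
    return opinions


def mean(values: List[float]) -> float:
    return sum(values) / len(values)


# Hypothetical 3-agent line network: agent 0 listens to agent 1, and so on.
innate = [0.0, 0.5, 1.0]
stubbornness = [0.5, 0.5, 0.5]
original = [[0.0, 1.0, 0.0], [0.5, 0.0, 0.5], [0.0, 1.0, 0.0]]
# Illustrative perturbation: agent 0 is rewired to listen to the extreme agent 2.
perturbed = [[0.0, 0.0, 1.0], [0.5, 0.0, 0.5], [0.0, 1.0, 0.0]]

print(mean(friedkin_johnsen(original, innate, stubbornness)))
print(mean(friedkin_johnsen(perturbed, innate, stubbornness)))
```

In this toy case, rewiring one edge toward the most extreme agent raises the average equilibrium opinion, which is the kind of shift an adversary would try to maximize and a defender would try to limit.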

翻訳日:2023-04-26 22:09:51 公開日:2023-04-25

# 空間制約付きテキスト誘導眼鏡操作

Text-guided Eyeglasses Manipulation with Spatial Constraints ( http://arxiv.org/abs/2304.12539v1 )

ライセンス: Link先を確認

Jiacheng Wang, Ping Liu, Jingen Liu, Wei Xu

(参考訳) メガネのバーチャル試着は、実際に試着することなく、異なる形状やスタイルの眼鏡を顔画像上に配置するものである。既存の方法は印象的な結果を示しているが、眼鏡のスタイルの多様性は限られており、インタラクションは必ずしも直感的あるいは効率的ではない。これらの制約に対処するために,バイナリマスクとテキストに基づいて眼鏡の形状とスタイルをそれぞれ制御できる,テキスト誘導による眼鏡操作手法を提案する。具体的には,マスク条件を抽出するマスクエンコーダと,テキスト条件とマスク条件を同時に注入できる変調モジュールを導入する。この設計により、テキスト記述と空間制約の両方に基づいて眼鏡の外観を細かく制御できる。提案手法は,無関係な領域を保持するdisentangledマッパーと分離戦略を含み,より良い局所編集を実現する。異なるモダリティ条件の収束速度の違いに対処するために2段階のトレーニングスキームを用い,眼鏡の形状とスタイルの両方の制御に成功した。広範な比較実験とアブレーション分析により,無関係な領域を保ちながら多様な眼鏡スタイルを実現する上での本アプローチの有効性が示された。

Virtual try-on of eyeglasses involves placing eyeglasses of different shapes and styles onto a face image without physically trying them on. While existing methods have shown impressive results, the variety of eyeglasses styles is limited and the interactions are not always intuitive or efficient. To address these limitations, we propose a Text-guided Eyeglasses Manipulation method that allows for control of the eyeglasses shape and style based on a binary mask and text, respectively. Specifically, we introduce a mask encoder to extract mask conditions and a modulation module that enables simultaneous injection of text and mask conditions. This design allows for fine-grained control of the eyeglasses' appearance based on both textual descriptions and spatial constraints. Our approach includes a disentangled mapper and a decoupling strategy that preserves irrelevant areas, resulting in better local editing. We employ a two-stage training scheme to handle the different convergence speeds of the various modality conditions, successfully controlling both the shape and style of eyeglasses. Extensive comparison experiments and ablation analyses demonstrate the effectiveness of our approach in achieving diverse eyeglasses styles while preserving irrelevant areas.

翻訳日:2023-04-26 22:09:27 公開日:2023-04-25

# GARCIA:多粒性コントラスト学習を用いたロングテールクエリの表現

GARCIA: Powering Representations of Long-tail Query with Multi-granularity Contrastive Learning ( http://arxiv.org/abs/2304.12537v1 )

ライセンス: Link先を確認

Weifan Wang, Binbin Hu, Zhicheng Peng, Mingjie Zhong, Zhiqiang Zhang, Zhongyi Liu, Guannan Zhang, Jun Zhou

(参考訳) 近年、サービスプラットフォームの成長は、ユーザと店舗の双方に大きな利便性をもたらしており、サービス検索エンジンは、テキストクエリによって望ましい結果を迅速に取得することで、ユーザエクスペリエンスの向上に重要な役割を担っている。残念ながら、ユーザーの制御できない検索習慣は、通常大量のロングテールクエリをもたらし、検索モデルの能力を著しく脅かす。最近登場したグラフニューラルネットワーク(GNN)とコントラスト学習(CL)に触発されて、ロングテール問題を緩和するいくつかの取り組みが行われ、かなりの性能を達成している。それでも、いくつかの大きな弱点が残っている。最も重要なことは、効果的な知識伝達のためにヘッドとテールの間の文脈構造を明示的に利用しておらず、より一般化された表現のための意図レベルの情報も一般に無視されていることである。そこで本研究では,グラフに基づく知識伝達と意図に基づく表現一般化をコントラスト学習の設定で活用する新しい枠組みGARCIAを開発した。特に、適応エンコーダを用いて、クエリやサービスの情報量の多い表現と、階層構造を考慮した意図の表現を生成する。テールクエリとサービスを十分に理解するために,知識伝達,構造拡張,意図の一般化を通じて表現を強化する,新しい多粒性コントラスト学習モジュールをGARCIAに備える。その後、GARCIA全体を事前学習とファインチューニングの方式で訓練する。最後に、オフライン環境とオンライン環境の両方で広範な実験を行い、サービス検索シナリオにおけるテールクエリと全体的なパフォーマンスの改善に関するGARCIAの優れた能力を示す。

Recently, the growth of service platforms has brought great convenience to both users and merchants, where the service search engine plays a vital role in improving the user experience by quickly obtaining desirable results via textual queries. Unfortunately, users' uncontrollable search customs usually bring vast amounts of long-tail queries, which severely threaten the capability of search models. Inspired by recently emerging graph neural networks (GNNs) and contrastive learning (CL), several efforts have been made to alleviate the long-tail issue and have achieved considerable performance. Nevertheless, they still face a few major weaknesses. Most importantly, they do not explicitly utilize the contextual structure between heads and tails for effective knowledge transfer, and intention-level information is commonly ignored for more generalized representations. To this end, we develop a novel framework, GARCIA, which exploits graph-based knowledge transfer and intention-based representation generalization in a contrastive setting. In particular, we employ an adaptive encoder to produce informative representations for queries and services, as well as hierarchical structure-aware representations of intentions. To fully understand tail queries and services, we equip GARCIA with a novel multi-granularity contrastive learning module, which powers representations through knowledge transfer, structure enhancement and intention generalization. Subsequently, the complete GARCIA is trained in a pre-training and fine-tuning manner. Finally, we conduct extensive experiments in both offline and online environments, which demonstrate the superior capability of GARCIA in improving tail queries and overall performance in service search scenarios.

翻訳日:2023-04-26 22:09:09 公開日:2023-04-25

# ラテント分類器誘導による合成視覚生成の探索

Exploring Compositional Visual Generation with Latent Classifier Guidance ( http://arxiv.org/abs/2304.12536v1 )

ライセンス: Link先を確認

Changhao Shi, Haomiao Ni, Kai Li, Shaobo Han, Mingfu Liang, Martin Renqiang Min

(参考訳) 拡散確率モデルは画像生成と操作の分野で大きな成功を収めている。本稿では,合成的な視覚タスクのために,潜在意味空間において拡散モデルと分類器ガイダンスを用いる新しいパラダイムについて検討する。具体的には、意味的な潜在空間を持つ任意の事前学習済み生成モデルに対して、潜在拡散モデルと補助的な潜在分類器を訓練し、潜在表現生成の非線形なナビゲーションを容易にする。潜在分類器ガイダンスによるこのような条件付き生成は,訓練中に条件付き対数確率の下限を証明可能な形で最大化することを示す。操作中に元のセマンティクスを維持するために,新しいガイダンス項を導入し,これが合成性の達成に重要であることを示す。追加の仮定の下では、非線形な操作は単純な潜在算術的アプローチに帰着することを示す。潜在分類器ガイダンスに基づくこのパラダイムは事前学習済み生成モデルに依存しないことを示し,画像生成と,実画像および合成画像の逐次操作の両方で競争力のある結果を示す。以上の結果から,潜在分類器ガイダンスは,他の強力な競合手法が存在する場合でも,さらなる探索に値する有望なアプローチであることが示唆された。

Diffusion probabilistic models have achieved enormous success in the field of image generation and manipulation. In this paper, we explore a novel paradigm of using the diffusion model and classifier guidance in the latent semantic space for compositional visual tasks. Specifically, we train latent diffusion models and auxiliary latent classifiers to facilitate non-linear navigation of latent representation generation for any pre-trained generative model with a semantic latent space. We demonstrate that such conditional generation achieved by latent classifier guidance provably maximizes a lower bound of the conditional log probability during training. To maintain the original semantics during manipulation, we introduce a new guidance term, which we show is crucial for achieving compositionality. With additional assumptions, we show that the non-linear manipulation reduces to a simple latent arithmetic approach. We show that this paradigm based on latent classifier guidance is agnostic to pre-trained generative models, and present competitive results for both image generation and sequential manipulation of real and synthetic images. Our findings suggest that latent classifier guidance is a promising approach that merits further exploration, even in the presence of other strong competing methods.

翻訳日:2023-04-26 22:08:42 公開日:2023-04-25

# Img2Vec:高トークン多様性の教師がマスクオートエンコーダーを支援

Img2Vec: A Teacher of High Token-Diversity Helps Masked AutoEncoders ( http://arxiv.org/abs/2304.12535v1 )

ライセンス: Link先を確認

Heng Pan, Chenyang Liu, Wenxiao Wang, Li Yuan, Hongfa Wang, Zhifeng Li, Wei Liu

(参考訳) 本稿では,深層特徴を用いたマスク画像モデリング(MIM)のための,画像からベクトルへのパイプライン(Img2Vec)を提案する。MIMの学習対象としてどのような種類の深層特徴が適しているかを検討するため,我々は,十分に訓練された一連の自己教師ありモデルを用いて画像を特徴ベクトルに変換し,それをMIMの学習対象とする簡易なMIMフレームワークを提案する。この特徴抽出器は教師モデルとも呼ばれる。驚くべきことに、MIMモデルは、Transformerベースのモデル(例えば、ViT-Large、307M)のような大規模な教師が生成する画像特徴よりも、より軽量なモデル(例えば、ResNet-50、26M)が生成する画像特徴からより多くの恩恵を受けることを経験的に見出した。この注目すべき現象を分析するために,トークン多様性という新しい属性を考案し,異なるモデルが生成した特徴の特性を評価する。トークン多様性は、異なるトークン間の特徴の非類似度を測る。広範な実験と可視化を通じて,大規模モデルがMIMを改善できるという認識に加えて,教師モデルの高いトークン多様性も重要であると仮定する。以上の議論に基づき、Img2Vecは高いトークン多様性を持つ教師モデルを採用して画像特徴を生成する。ImageNetのラベルなしデータでViT-Bを用いて事前学習したImg2Vecは、ファインチューニングで85.1%のトップ1精度を達成する。さらに、より大きなモデルであるViT-LとViT-HにImg2Vecをスケールアップし、それぞれ86.7%と87.5%の精度を得る。また、COCOでは51.8% mAP、ADE20Kでは50.7% mIoUなど、他の下流タスクでも最先端の結果を達成している。Img2Vecは、深層特徴によるMIM学習に適した、シンプルで効果的なフレームワークであり、代表的な視覚タスクで優れた総合性能を達成する。

We present a pipeline of Image to Vector (Img2Vec) for masked image modeling (MIM) with deep features. To study which type of deep features is appropriate for MIM as a learning target, we propose a simple MIM framework with a series of well-trained self-supervised models to convert an Image to a feature Vector as the learning target of MIM, where the feature extractor is also known as a teacher model. Surprisingly, we empirically find that an MIM model benefits more from image features generated by some lighter models (e.g., ResNet-50, 26M) than from those by a cumbersome teacher like Transformer-based models (e.g., ViT-Large, 307M). To analyze this remarkable phenomenon, we devise a novel attribute, token diversity, to evaluate the characteristics of generated features from different models. Token diversity measures the feature dissimilarity among different tokens. Through extensive experiments and visualizations, we hypothesize that beyond the acknowledgment that a large model can improve MIM, a high token-diversity of a teacher model is also crucial. Based on the above discussion, Img2Vec adopts a teacher model with high token-diversity to generate image features. Img2Vec pre-trained on ImageNet unlabeled data with ViT-B yields 85.1% top-1 accuracy on fine-tuning. Moreover, we scale up Img2Vec on larger models, ViT-L and ViT-H, and get 86.7% and 87.5% accuracy respectively. It also achieves state-of-the-art results on other downstream tasks, e.g., 51.8% mAP on COCO and 50.7% mIoU on ADE20K. Img2Vec is a simple yet effective framework tailored to deep feature MIM learning, accomplishing superb comprehensive performance on representative vision tasks.
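
The abstract says only that token diversity "measures the feature dissimilarity among different tokens" and does not give a formula. As a rough illustration, the Python sketch below uses the mean pairwise cosine dissimilarity between token feature vectors. This particular definition and the function names are assumptions chosen for the example, not the paper's metric.

```python
import math
from itertools import combinations
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def token_diversity(tokens: Sequence[Sequence[float]]) -> float:
    """Mean of (1 - cosine similarity) over all token pairs (assumed definition).

    Requires at least two tokens. Higher values mean the token features
    point in more varied directions.
    """
    pairs = list(combinations(tokens, 2))
    return sum(1.0 - cosine_similarity(a, b) for a, b in pairs) / len(pairs)


# Orthogonal token features are maximally dissimilar under this definition,
# while collinear features have (near) zero diversity.
print(token_diversity([[1.0, 0.0], [0.0, 1.0]]))
print(token_diversity([[1.0, 2.0], [2.0, 4.0]]))
```

Under this assumed measure, a teacher whose per-token features are nearly collinear would score close to zero, while one whose tokens carry distinct features would score higher.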

翻訳日:2023-04-26 22:08:25 公開日:2023-04-25

# ランダムウォーク確率ADMMによる個人化フェデレーション学習のモバイル化

Mobilizing Personalized Federated Learning via Random Walk Stochastic ADMM ( http://arxiv.org/abs/2304.12534v1 )

ライセンス: Link先を確認

Ziba Parsons, Fei Dou, Houyi Du, Jin Lu

(参考訳) 本研究では,中央サーバと全クライアントとの一貫した接続が維持できず,データ分布が不均一である実世界のシナリオにおいて,連合学習(FL)を実装する際の障壁について検討する。これらの課題に対処するために、サーバが隣接するクライアントのグループ間を移動してローカルモデルを学習する、連合学習設定のモバイル化に焦点を当てる。具体的には,モデルの学習に十分な数の接続クライアントがある限り,動的かつアドホックなネットワーク条件に適応できる新しいアルゴリズム,Random Walk Stochastic Alternating Direction Method of Multipliers (RWSADMM) を提案する。RWSADMMでは、サーバはクライアントのグループに向かってランダムに移動する。データの不均一性に対処するために、コンセンサス更新の代わりに、ハードな不等式制約に基づいて隣接するクライアント間の局所的な近接性を定式化する。提案手法は収束し,通信コストを低減し,中央サーバが通信する必要のあるクライアント数を減らすことでスケーラビリティを向上させる。

In this research, we investigate the barriers associated with implementing Federated Learning (FL) in real-world scenarios, where a consistent connection between the central server and all clients cannot be maintained, and data distribution is heterogeneous. To address these challenges, we focus on mobilizing the federated setting, where the server moves between groups of adjacent clients to learn local models. Specifically, we propose a new algorithm, Random Walk Stochastic Alternating Direction Method of Multipliers (RWSADMM), capable of adapting to dynamic and ad-hoc network conditions as long as a sufficient number of connected clients are available for model training. In RWSADMM, the server walks randomly toward a group of clients. It formulates local proximity among adjacent clients based on hard inequality constraints instead of consensus updates to address data heterogeneity. Our proposed method is convergent, reduces communication costs, and enhances scalability by reducing the number of clients the central server needs to communicate with.

翻訳日:2023-04-26 22:07:50 公開日:2023-04-25

# SEA:マルチエージェント強化学習のための空間明示的アーキテクチャ

SEA: A Spatially Explicit Architecture for Multi-Agent Reinforcement Learning ( http://arxiv.org/abs/2304.12532v1 )

ライセンス: Link先を確認

Dapeng Li, Zhiwei Xu, Bin Zhang, Guoliang Fan

(参考訳) 空間情報は様々な分野において不可欠である。エージェントの空間的位置に基づいて明示的にモデル化する方法は、特にエージェントの数が変化し、規模が巨大である場合に、マルチエージェント問題にとっても非常に重要である。本稿では,コンピュータビジョンにおけるポイントクラウドタスクに着想を得て,マルチエージェント強化学習のための空間情報抽出構造を提案する。エージェントは、空間的なエンコーダ・デコーダ構造を通じて、近傍情報とグローバル情報を効果的に共有することができる。本手法は,集中学習・分散実行(CTDE)パラダイムに従う。さらに,本構造は,既存の様々な主流の強化学習アルゴリズムに小さな修正で適用可能であり,エージェント数が可変の問題を扱うことができる。複数のマルチエージェントシナリオにおける実験は、我々の空間明示的アーキテクチャを追加することで、既存の手法が説得力のある結果を得られることを示している。

Spatial information is essential in various fields. How to explicitly model according to the spatial location of agents is also very important for the multi-agent problem, especially when the number of agents is changing and the scale is enormous. Inspired by the point cloud task in computer vision, we propose a spatial information extraction structure for multi-agent reinforcement learning in this paper. Agents can effectively share the neighborhood and global information through a spatial encoder-decoder structure. Our method follows the centralized training with decentralized execution (CTDE) paradigm. In addition, our structure can be applied to various existing mainstream reinforcement learning algorithms with minor modifications and can deal with problems with a variable number of agents. The experiments in several multi-agent scenarios show that the existing methods can get convincing results by adding our spatially explicit architecture.

翻訳日:2023-04-26 22:07:33 公開日:2023-04-25

# ChatGPTを用いた人間-ロボット協調の信頼性向上

Improved Trust in Human-Robot Collaboration with ChatGPT ( http://arxiv.org/abs/2304.12529v1 )

ライセンス: Link先を確認

Yang Ye, Hengxu You, Jing Du

(参考訳) 人工知能の時代、ロボットが人間の生活の様々な側面に深く関わるようになるにつれ、人間とロボットの協調はますます重要になっている。しかし、人間のオペレーターがロボットを信頼できるかという問題は、主に人間とロボットの間の適切な意味理解とコミュニケーションが欠如しているため、依然として重要な懸念事項である。ChatGPTのような大規模言語モデル(LLM)の出現は、対話的でコミュニケーション能力があり堅牢な、人間とロボットの協調アプローチを開発する機会を提供する。本稿では,人間とロボットが協調する組立作業において,ChatGPTが信頼に与える影響を考察する。本研究では,ChatGPTを用いたロボット制御システムRoboGPTを設計し,7自由度のロボットアームを制御して人間のオペレーターが工具を取り出して配置するのを支援する。オペレーターは自然言語でロボットアームと対話し,制御できる。被験者実験の結果,ロボットにChatGPTを組み込むことで人間とロボットの協調に対する信頼が有意に高まることが示された。これは,ロボットが人間とより効果的にコミュニケーションできることに起因すると考えられる。さらに、ChatGPTは人間の言語のニュアンスを理解し、適切に応答できるため、より自然で直感的な人間とロボットの相互作用の構築に役立つ。本研究の成果は,人間とロボットの協調システムの開発に重要な意味を持つ。

Human-robot collaboration is becoming increasingly important as robots become more involved in various aspects of human life in the era of Artificial Intelligence. However, the issue of human operators' trust in robots remains a significant concern, primarily due to the lack of adequate semantic understanding and communication between humans and robots. The emergence of Large Language Models (LLMs), such as ChatGPT, provides an opportunity to develop an interactive, communicative, and robust human-robot collaboration approach. This paper explores the impact of ChatGPT on trust in a human-robot collaboration assembly task. This study designs a robot control system called RoboGPT using ChatGPT to control a 7-degree-of-freedom robot arm to help human operators fetch and place tools, while human operators can communicate with and control the robot arm using natural language. A human-subject experiment showed that incorporating ChatGPT in robots significantly increased trust in human-robot collaboration, which can be attributed to the robot's ability to communicate more effectively with humans. Furthermore, ChatGPT's ability to understand the nuances of human language and respond appropriately helps to build a more natural and intuitive human-robot interaction. The findings of this study have significant implications for the development of human-robot collaboration systems.

翻訳日:2023-04-26 22:07:20 公開日:2023-04-25

# 画像テキスト検索のための学習可能なピラーベースのリランキング

Learnable Pillar-based Re-ranking for Image-Text Retrieval ( http://arxiv.org/abs/2304.12570v1 )

ライセンス: Link先を確認

Leigang Qu, Meng Liu, Wenjie Wang, Zhedong Zheng, Liqiang Nie, Tat-Seng Chua

(参考訳) 画像テキスト検索は、モダリティギャップを橋渡しし、意味的類似性に基づいてクロスモーダルコンテンツを検索することを目的としている。先行研究は通常、ペアワイズ関係(すなわち、あるデータサンプルが別のサンプルと一致するかどうか)に焦点を当てるが、高次の近傍関係(すなわち、複数のデータサンプル間のマッチング構造)を無視している。一般的な後処理手法であるリランキングは,単一モダリティの検索タスクにおいて,近傍関係を捉えることの優位性を示してきた。しかし、既存のリランキングアルゴリズムを画像テキスト検索に直接拡張しても効果的ではない。本稿では,一般化,柔軟性,スパース性,非対称性という4つの観点からその理由を分析し,新しい学習可能なピラーベースのリランキングパラダイムを提案する。具体的には,まず上位にランクされたモダリティ内およびモダリティ間の近傍をピラーとして選択し,データサンプルをピラーとの近傍関係によって再構成する。このように、各サンプルは類似度のみを用いてマルチモーダルなピラー空間に写像でき、一般化が保証される。その後、関係を柔軟に活用し、近傍内の疎な正例を掘り起こすために、近傍を考慮したグラフ推論モジュールを設計する。また,クロスモーダルな協調を促進し,非対称なモダリティを整合させる構造アライメント制約を提案する。様々なベースバックボーンの上で,Flickr30KとMS-COCOという2つのベンチマークデータセットで広範な実験を行い,提案するリランキングパラダイムの有効性,優位性,一般化性,転移性を実証した。

Image-text retrieval aims to bridge the modality gap and retrieve cross-modal content based on semantic similarities. Prior work usually focuses on the pairwise relations (i.e., whether a data sample matches another) but ignores the higher-order neighbor relations (i.e., a matching structure among multiple data samples). Re-ranking, a popular post-processing practice, has revealed the superiority of capturing neighbor relations in single-modality retrieval tasks. However, it is ineffective to directly extend existing re-ranking algorithms to image-text retrieval. In this paper, we analyze the reason from four perspectives, i.e., generalization, flexibility, sparsity, and asymmetry, and propose a novel learnable pillar-based re-ranking paradigm. Concretely, we first select top-ranked intra- and inter-modal neighbors as pillars, and then reconstruct data samples with the neighbor relations between them and the pillars. In this way, each sample can be mapped into a multimodal pillar space only using similarities, ensuring generalization. After that, we design a neighbor-aware graph reasoning module to flexibly exploit the relations and excavate the sparse positive items within a neighborhood. We also present a structure alignment constraint to promote cross-modal collaboration and align the asymmetric modalities. On top of various base backbones, we carry out extensive experiments on two benchmark datasets, i.e., Flickr30K and MS-COCO, demonstrating the effectiveness, superiority, generalization, and transferability of our proposed re-ranking paradigm.

翻訳日:2023-04-26 22:00:27 公開日:2023-04-25

# KINLP at SemEval-2023 Task 12: Kinyarwanda Tweet Sentiment Analysis

KINLP at SemEval-2023 Task 12: Kinyarwanda Tweet Sentiment Analysis ( http://arxiv.org/abs/2304.12569v1 )

ライセンス: Link先を確認

Antoine Nzeyimana

(参考訳) 本稿では、著者がSemEval-2023 Task 12(アフリカ諸言語の感情分析)に提出したシステムについて述べる。システムはKinyarwanda語に焦点を当て、言語固有のモデルを使用する。Kinyarwanda語の形態論は2層のトランスフォーマーアーキテクチャでモデル化され、このトランスフォーマーモデルはマルチタスクのマスク形態素予測を用いて大規模テキストコーパスで事前学習される。このモデルは実験用プラットフォームにデプロイされており、ユーザーは機械学習コードを書くことなく、事前学習済み言語モデルのファインチューニングを試すことができる。共有タスクへの最終提出は、34チーム中2位を獲得し、72.50%の重み付きF1スコアを達成した。評価結果の分析は,このタスクで高い精度を達成する上での課題を浮き彫りにし、改善すべき領域を特定する。

This paper describes the system entered by the author to the SemEval-2023 Task 12: Sentiment analysis for African languages. The system focuses on the Kinyarwanda language and uses a language-specific model. Kinyarwanda morphology is modeled in a two-tier transformer architecture and the transformer model is pre-trained on a large text corpus using multi-task masked morphology prediction. The model is deployed on an experimental platform that allows users to experiment with fine-tuning the pre-trained language model without the need to write machine learning code. Our final submission to the shared task achieves second ranking out of 34 teams in the competition, achieving a 72.50% weighted F1 score. Our analysis of the evaluation results highlights challenges in achieving high accuracy on the task and identifies areas for improvement.
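
The weighted F1 score reported here is a standard metric: the per-class F1 scores are averaged with weights proportional to each class's number of true instances. The Python sketch below computes it from scratch. The label names and the tiny example are invented for illustration and have nothing to do with the shared-task data.

```python
from collections import Counter
from typing import Hashable, Sequence


def weighted_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Support-weighted average of per-class F1 over the classes in y_true."""
    support = Counter(y_true)
    total = 0.0
    for label, count in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = count - tp
        denom = 2 * tp + fp + fn  # F1 = 2*TP / (2*TP + FP + FN)
        f1 = 2 * tp / denom if denom else 0.0
        total += count * f1
    return total / len(y_true)


# Hypothetical three-class sentiment example with one misclassified tweet.
gold = ["pos", "pos", "neg", "neu"]
pred = ["pos", "neg", "neg", "neu"]
print(weighted_f1(gold, pred))
```

Because the weights follow class frequency, frequent sentiment classes dominate the score, which is worth keeping in mind when comparing systems on imbalanced data.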

翻訳日:2023-04-26 21:59:58 公開日:2023-04-25

# マルチモーダルモデリングと異種GNNを用いた性能最適化

Performance Optimization using Multimodal Modeling and Heterogeneous GNN ( http://arxiv.org/abs/2304.12568v1 )

ライセンス: Link先を確認

Akash Dutta, Jordi Alcaraz, Ali TehraniJamsaz, Anna Sikora, Eduardo Cesar, Ali Jannesari

(参考訳) HPCアーキテクチャにおける不均一性と構成性の向上は、これらのシステムにおける自動チューニングアプリケーションとランタイムパラメータを非常に複雑にしている。ユーザはパラメータを設定するためのオプションを多数提示する。アプリケーション固有のソリューションに加えて、汎用的な検索戦略を使用することも一般的なアプローチであり、最良の構成や収束までの時間を特定することが大きな障壁となることが多い。したがって、様々なチューニングタスクに容易にスケールして適応できる汎用的で効率的なチューニングアプローチが必要となる。本稿では,複数のタスクに適応できるほど汎用的な並列コード領域のチューニング手法を提案する。本稿では、IRに基づくプログラミングモデルを分析し、タスク固有の性能最適化を行う。この目的のために,多モードグラフニューラルネットワークとオートエンコーダ(MGA)チューナを提案する。これは,異種グラフニューラルネットワークに適応したマルチモーダル深層学習に基づくアプローチであり,別個のモダリティとして機能するIRベースのコード表現をモデル化するための自動エンコーダをデノライズする。このアプローチは、並列コード領域/カーネルをチューニングするための構文、セマンティクス、構造対応irベースのコード表現をモデル化するパイプラインの一部として使用します。我々はPolyBench, Rodinia, STREAM, DataRaceBench, AMD SDK, NPB, NVIDIA SDK, Parboil, SHOC, LULESHベンチマークから得られたOpenMPおよびOpenCLコード領域/カーネルを広範囲に実験した。タスクにマルチモーダル学習技術を適用する。 i)openmpループにおけるスレッド数、スケジューリングポリシー、チャンクサイズを最適化すること。 ii)openclカーネルの異種デバイスマッピングのための最善のデバイスを特定すること。実験の結果,このマルチモーダル学習に基づくアプローチは,すべての実験で最先端技術を上回ることがわかった。

Growing heterogeneity and configurability in HPC architectures have made auto-tuning applications and runtime parameters on these systems very complex. Users are presented with a multitude of options to configure parameters. In addition to application-specific solutions, a common approach is to use general-purpose search strategies, which often fail to identify the best configurations or whose time to convergence is a significant barrier. There is, thus, a need for a general-purpose and efficient tuning approach that can be easily scaled and adapted to various tuning tasks. We propose a technique for tuning parallel code regions that is general enough to be adapted to multiple tasks. In this paper, we analyze IR-based programming models to make task-specific performance optimizations. To this end, we propose the Multimodal Graph Neural Network and Autoencoder (MGA) tuner, a multimodal deep learning based approach that adapts Heterogeneous Graph Neural Networks and Denoising Autoencoders for modeling IR-based code representations that serve as separate modalities. This approach is used as part of our pipeline to model a syntax-, semantics-, and structure-aware IR-based code representation for tuning parallel code regions/kernels. We extensively experiment on OpenMP and OpenCL code regions/kernels obtained from PolyBench, Rodinia, STREAM, DataRaceBench, AMD SDK, NPB, NVIDIA SDK, Parboil, SHOC, and LULESH benchmarks. We apply our multimodal learning techniques to the tasks of i) optimizing the number of threads, scheduling policy, and chunk size in OpenMP loops and ii) identifying the best device for heterogeneous device mapping of OpenCL kernels. Our experiments show that this multimodal learning based approach outperforms the state-of-the-art in all experiments.

翻訳日:2023-04-26 21:59:45 公開日:2023-04-25

# proto-value network: 補助タスクによる表現学習のスケーリング

Proto-Value Networks: Scaling Representation Learning with Auxiliary Tasks ( http://arxiv.org/abs/2304.12567v1 )

ライセンス: Link先を確認

Jesse Farebrother, Joshua Greaves, Rishabh Agarwal, Charline Le Lan, Ross Goroshin, Pablo Samuel Castro, Marc G. Bellemare

(参考訳) 補助的タスクは、深層強化学習エージェントが学習した表現を改善する。分析学的には、それらの効果は合理的によく理解されているが、実際には、その主な用途は、表現の学習方法としてではなく、主要な学習目標を支持することである。多くの補助的なタスクが手続き的に定義されるので、環境に関する情報の本質的に無限の情報源として扱うことができるので、これはおそらく驚くべきことである。本研究は,エージェントネットワークのタスク数とサイズを同時に増加させる設定に着目し,豊かな表現を学習するための補助的タスクの有効性について検討する。この目的のために、後継の尺度に基づく補助タスクの新しいファミリーを導出する。これらのタスクは実装が容易であり、理論的な特性をアピールする。適切なオフポリシー学習ルールと組み合わせることで、結果は表現学習アルゴリズムであり、mahadevan & maggioni (2007)のproto-value関数を深層強化学習に拡張したものと解釈できる。アーケード学習環境における一連の実験を通じて,proto-valueネットワークは,線形近似と環境の報酬関数との相互作用(約4m)のみを用いて,確立されたアルゴリズムに匹敵する性能を得るための豊富な特徴を生成できることを実証した。

Auxiliary tasks improve the representations learned by deep reinforcement learning agents. Analytically, their effect is reasonably well understood; in practice, however, their primary use remains in support of a main learning objective, rather than as a method for learning representations. This is perhaps surprising given that many auxiliary tasks are defined procedurally, and hence can be treated as an essentially infinite source of information about the environment. Based on this observation, we study the effectiveness of auxiliary tasks for learning rich representations, focusing on the setting where the number of tasks and the size of the agent's network are simultaneously increased. For this purpose, we derive a new family of auxiliary tasks based on the successor measure. These tasks are easy to implement and have appealing theoretical properties. Combined with a suitable off-policy learning rule, the result is a representation learning algorithm that can be understood as extending Mahadevan & Maggioni (2007)'s proto-value functions to deep reinforcement learning -- accordingly, we call the resulting object proto-value networks. Through a series of experiments on the Arcade Learning Environment, we demonstrate that proto-value networks produce rich features that may be used to obtain performance comparable to established algorithms, using only linear approximation and a small number (~4M) of interactions with the environment's reward function.

翻訳日:2023-04-26 21:59:14 公開日:2023-04-25

# AdaNPC:テスト時間適応のための非パラメトリック分類器の探索

AdaNPC: Exploring Non-Parametric Classifier for Test-Time Adaptation ( http://arxiv.org/abs/2304.12566v1 )

ライセンス: Link先を確認

Yi-Fan Zhang, Xue Wang, Kexin Jin, Kun Yuan, Zhang Zhang, Liang Wang, Rong Jin, Tieniu Tan

(参考訳) 最近の機械学習タスクの多くは、未認識分布に一般化できるモデルの開発に重点を置いている。ドメイン一般化(DG)は、様々な分野において重要なトピックの一つとなっている。いくつかの文献では、DGはターゲットのドメイン情報を利用せずに任意に困難であることを示している。この問題に対処するため,テスト時適応(TTA)手法を提案する。既存のTTA手法では、推論段階でオフラインのターゲットデータや高度な最適化手順が必要となる。本研究では,テスト時間適応(AdaNPC)を実行するために非パラメトリック分類を用いる。特に、トレーニングドメインの特徴とラベルペアを含むメモリを構築します。推論中、テストインスタンスが与えられた場合、AdaNPCはまずメモリからK個のクローズドサンプルをリコールして予測を投票し、次にテスト機能と予測ラベルをメモリに追加する。このように、メモリ内のサンプル分布は、トレーニング分布からテスト分布へと徐々に変化し、余分な計算コストが少なくなる。提案手法の背後にある合理性を理論的に正当化する。さらに,広範な数値実験でモデルをテストする。 AdaNPCは様々なDGベンチマークの競争ベースラインを大幅に上回っている。特に、適応ターゲットが一連のドメインである場合、AdaNPCの適応精度は高度なTTA法よりも50%高い。コードはhttps://github.com/yfzhang114/AdaNPCで入手できる。

Many recent machine learning tasks focus on developing models that can generalize to unseen distributions. Domain generalization (DG) has become one of the key topics in various fields. Several studies show that DG can be arbitrarily hard without exploiting target domain information. To address this issue, test-time adaptation (TTA) methods have been proposed. Existing TTA methods require offline target data or extra sophisticated optimization procedures during the inference stage. In this work, we adopt a Non-Parametric Classifier to perform test-time Adaptation (AdaNPC). In particular, we construct a memory that contains the feature and label pairs from training domains. During inference, given a test instance, AdaNPC first recalls the K closest samples from the memory to vote for the prediction, and then the test feature and predicted label are added to the memory. In this way, the sample distribution in the memory can be gradually changed from the training distribution towards the test distribution with very little extra computation cost. We theoretically justify the rationale behind the proposed method. In addition, we test our model in extensive numerical experiments. AdaNPC significantly outperforms competitive baselines on various DG benchmarks. In particular, when the adaptation target is a series of domains, the adaptation accuracy of AdaNPC is 50% higher than advanced TTA methods. The code is available at https://github.com/yfzhang114/AdaNPC.
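
The inference loop described in this abstract (recall K samples from a memory, vote, then append the test feature with its predicted label) can be sketched in a few lines. This is a minimal illustration only, not the authors' implementation: the class name `MemoryKNN`, the use of Euclidean distance, and the plain majority vote are all assumptions, since the abstract does not specify the similarity measure or the voting rule.

```python
import math
from collections import Counter


class MemoryKNN:
    """Toy non-parametric classifier whose memory grows at test time."""

    def __init__(self, features, labels, k=3):
        # Memory starts with (feature, label) pairs from the training domains.
        self.features = [list(f) for f in features]
        self.labels = list(labels)
        self.k = k

    def predict_and_adapt(self, x):
        # Rank stored samples by Euclidean distance (an assumed metric).
        order = sorted(
            range(len(self.features)),
            key=lambda i: math.dist(self.features[i], x),
        )
        # Majority vote among the k nearest stored samples.
        votes = Counter(self.labels[i] for i in order[: self.k])
        label = votes.most_common(1)[0][0]
        # Adaptation step: store the test feature with its predicted label,
        # so the memory drifts towards the test distribution.
        self.features.append(list(x))
        self.labels.append(label)
        return label
```

Each call to `predict_and_adapt` costs one nearest-neighbor lookup plus one append, which matches the abstract's point that adaptation needs very little extra computation; no gradient step is involved in this sketch.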

翻訳日:2023-04-26 21:58:51 公開日:2023-04-25

# 要求情報検索におけるChatGPTの予備評価

A Preliminary Evaluation of ChatGPT in Requirements Information Retrieval ( http://arxiv.org/abs/2304.12562v1 )

ライセンス: Link先を確認

Jianzhang Zhang, Yiyang Chen, Nan Niu, Chuang Liu

(参考訳) コンテキスト: 最近では、ChatGPTがプログラミングタスクを実行し、一般的なドメインの質問に答える素晴らしい能力を示しています。目的:我々は,ChatGPTが要求分析タスクでどのように機能するかを実証的に評価し,ChatGPTが表現する大規模言語モデルの生成が,要求工学における自然言語処理の研究と実践に与える影響について考察する。方法:2つの共通要件情報検索タスク,2つの典型的な要件アーチファクトを含む4つの公開データセット,ChatGPTとタスクプロンプトのクエリ,定量的および定性的な結果分析を含む評価パイプラインを設計する。結果: 定量的な結果から、ChatGPTはゼロショット設定ですべてのデータセットで同等またはそれ以上の$F_\beta$値を達成する。定性的分析は、ChatGPTの強力な自然言語処理能力と限定的な要求工学ドメイン知識を示している。結論: 評価結果から,ChatGPTはゼロショット設定下で複数の言語を含む異なるタイプのアーティファクトから要求情報を取得することができる。大規模言語モデルに基づく要求検索モデルの研究と,それに対応するツールの開発は,研究コミュニティや産業コミュニティにとって重要である。

Context: Recently, many illustrative examples have shown ChatGPT's impressive ability to perform programming tasks and answer general domain questions. Objective: We empirically evaluate how ChatGPT performs on requirements analysis tasks to derive insights into how generative large language models, represented by ChatGPT, influence the research and practice of natural language processing for requirements engineering. Method: We design an evaluation pipeline including two common requirements information retrieval tasks, four public datasets involving two typical requirements artifacts, querying ChatGPT with fixed task prompts, and quantitative and qualitative results analysis. Results: Quantitative results show that ChatGPT achieves comparable or better $F_\beta$ values in all datasets under a zero-shot setting. Qualitative analysis further illustrates ChatGPT's powerful natural language processing ability and limited requirements engineering domain knowledge. Conclusion: The evaluation results demonstrate ChatGPT's impressive ability to retrieve requirements information from different types of artifacts involving multiple languages under a zero-shot setting. It is worthwhile for the research and industry communities to study generative large language model based requirements retrieval models and to develop corresponding tools.
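
For readers unfamiliar with the F-beta metric mentioned in the results, the sketch below shows the standard definition, F_beta = (1 + beta^2) * P * R / (beta^2 * P + R), where P is precision and R is recall. The abstract does not state which value of beta the authors use, so `beta` is left as a parameter here; the function name is illustrative.

```python
def f_beta(precision, recall, beta=1.0):
    """Standard F-beta score: weighted harmonic mean of precision and recall.

    beta > 1 weights recall more heavily; beta < 1 favors precision.
    """
    if precision == 0.0 and recall == 0.0:
        return 0.0  # avoid division by zero when nothing is retrieved correctly
    b2 = beta * beta
    return (1.0 + b2) * precision * recall / (b2 * precision + recall)
```

With `beta=1` this reduces to the familiar F1 score. Retrieval-style requirements tasks often prefer a recall-heavy setting such as `beta=2`, though whether that applies to this paper is not stated in the abstract.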

翻訳日:2023-04-26 21:58:34 公開日:2023-04-25

# TCR:ショートビデオのタイトル生成とアテンションリファインメントによるカバー選択

TCR: Short Video Title Generation and Cover Selection with Attention Refinement ( http://arxiv.org/abs/2304.12561v1 )

ライセンス: Link先を確認

Yakun Yu, Jiuding Yang, Weidong Guo, Hui Liu, Yu Xu, and Di Niu

(参考訳) ユーザー生成ショートビデオの普及に伴い、コンテンツクリエイターがコンテンツを潜在的視聴者に宣伝することはますます困難になっている。短いビデオのタイトルやカバーを自動的に生成することで、視聴者の注意を引くことができる。既存のビデオキャプションの研究は主に、視聴者の注意を引くためのビデオタイトルに適合しない行動の事実記述を生成することに焦点を当てている。さらに,マルチモーダル情報に基づくカバー選択の研究は少ない。これらの問題は、短いビデオタイトル生成とカバーセレクション(TG-CS)のジョイントタスクを具体的にサポートするための調整された方法の必要性と、研究を支援するための対応するデータセットの作成の必要性を動機付けている。本稿では,まず,魅力あるタイトルとカバー付きビデオを含む,SVTG(Short Video Title Generation)という実世界のデータセットを収集し,提示する。そこで我々は,TG-CS の注意再定義 (TCR) 手法を用いたタイトル生成とカバー選択を提案する。精錬手順は、モデルトレーニングを洗練させるために、各サンプル内の高品質なサンプルと高関連フレームとテキストトークンを段階的に選択する。広範にわたる実験により,tcr手法は既存の様々な字幕生成手法より優れており,ノイズの多い実世界のショートビデオに対して,より優れたカバーを選択できることを示した。

With the widespread popularity of user-generated short videos, it becomes increasingly challenging for content creators to promote their content to potential viewers. Automatically generating appealing titles and covers for short videos can help grab viewers' attention. Existing studies on video captioning mostly focus on generating factual descriptions of actions, which do not conform to video titles intended for catching viewer attention. Furthermore, research for cover selection based on multimodal information is sparse. These problems motivate the need for tailored methods to specifically support the joint task of short video title generation and cover selection (TG-CS) as well as the demand for creating corresponding datasets to support the studies. In this paper, we first collect and present a real-world dataset named Short Video Title Generation (SVTG) that contains videos with appealing titles and covers. We then propose a Title generation and Cover selection with attention Refinement (TCR) method for TG-CS. The refinement procedure progressively selects high-quality samples and highly relevant frames and text tokens within each sample to refine model training. Extensive experiments show that our TCR method is superior to various existing video captioning methods in generating titles and is able to select better covers for noisy real-world short videos.

翻訳日:2023-04-26 21:58:13 公開日:2023-04-25

# SwinFSR: SwinIRと周波数領域知識を用いたステレオ画像超解法

SwinFSR: Stereo Image Super-Resolution using SwinIR and Frequency Domain Knowledge ( http://arxiv.org/abs/2304.12556v1 )

ライセンス: Link先を確認

Ke Chen, Liangyan Li, Huan Liu, Yunzhe Li, Congling Tang and Jun Chen

(参考訳) ステレオ・イメージ・スーパーレゾリューション(stereosr)は近年、携帯電話、自動運転車、ロボットにデュアルカメラを多用するなど、大きな注目を集めている。本稿では,swiinirの拡張と高速フーリエ畳み込み(ffc)によって得られた周波数領域知識をもとに,swiinfsrという新しいステレオsr手法を提案する。具体的には、グローバル情報を効果的に収集するために、FFCを用いて周波数領域の知識を明示的に取り入れ、特徴抽出に残りのSwin Fourier Transformerブロック(RSFTB)を用いることで、SwinIRのResidual Swin Transformerブロック(RSTB)を変更する。また、ステレオビューの効率良く正確な融合のために、rcamと呼ばれる新しいクロスアテンションモジュールを提案し、最先端のクロスアテンションモジュールよりも少ない計算コストで高い競合性能を実現する。広範な実験結果とアブレーション実験により,提案するsinfsrの有効性と有効性が実証された。

Stereo Image Super-Resolution (stereoSR) has attracted significant attention in recent years due to the extensive deployment of dual cameras in mobile phones, autonomous vehicles and robots. In this work, we propose a new StereoSR method, named SwinFSR, based on an extension of SwinIR, originally designed for single image restoration, and the frequency domain knowledge obtained by the Fast Fourier Convolution (FFC). Specifically, to effectively gather global information, we modify the Residual Swin Transformer blocks (RSTBs) in SwinIR by explicitly incorporating the frequency domain knowledge using the FFC and employing the resulting residual Swin Fourier Transformer blocks (RSFTBs) for feature extraction. Besides, for the efficient and accurate fusion of stereo views, we propose a new cross-attention module referred to as RCAM, which achieves highly competitive performance while requiring less computational cost than the state-of-the-art cross-attention modules. Extensive experimental results and ablation studies demonstrate the effectiveness and efficiency of our proposed SwinFSR.

翻訳日:2023-04-26 21:57:53 公開日:2023-04-25

# 対人訓練と対人訓練の併用

Combining Adversaries with Anti-adversaries in Training ( http://arxiv.org/abs/2304.12550v1 )

ライセンス: Link先を確認

Xiaoling Zhou, Nan Yang, Ou Wu

(参考訳) 敵対的トレーニングは、ディープニューラルネットワークの堅牢性を改善する効果的な学習技術である。本研究では,異なるサンプルが異なる摂動方向(対向方向,反対向方向)と様々な摂動境界を持つことができるというより一般的な摂動範囲の下で,対向学習が深層学習モデルに与える影響を理論的に検討した。理論的な考察から,学習における反逆者(反逆者摂動のサンプル)と反逆者(反逆者摂動のサンプル)の組み合わせは,いくつかの典型的な学習シナリオ(例えば,ノイズラベル学習と不均衡学習)において,クラス間の公正性向上と頑健性と一般化のトレードオフの改善に有効であることが示唆された。本研究の理論的知見に基づいて,各トレーニングサンプルに異なる境界を持つ敵と反敵を結合した,より一般的な学習目標を示す。メタ学習は組み合わせ重量を最適化するために利用される。異なる学習シナリオにおけるベンチマークデータセットの実験により,提案手法の有効性が検証された。

Adversarial training is an effective learning technique to improve the robustness of deep neural networks. In this study, the influence of adversarial training on deep learning models in terms of fairness, robustness, and generalization is theoretically investigated under a more general perturbation scope in which different samples can have different perturbation directions (the adversarial and anti-adversarial directions) and varied perturbation bounds. Our theoretical explorations suggest that, compared with standard adversarial training, combining adversaries and anti-adversaries (samples with anti-adversarial perturbations) in training can be more effective in achieving better fairness between classes and a better tradeoff between robustness and generalization in some typical learning scenarios (e.g., noisy label learning and imbalanced learning). On the basis of our theoretical findings, a more general learning objective that combines adversaries and anti-adversaries with varied bounds on each training sample is presented. Meta learning is utilized to optimize the combination weights. Experiments on benchmark datasets under different learning scenarios verify our theoretical findings and the effectiveness of the proposed methodology.
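
To make the two perturbation directions concrete, here is a minimal sketch on a logistic-regression model. It assumes a sign-gradient step of size `eps` (in the spirit of FGSM), which is a choice made for illustration; the abstract does not say how the perturbations are generated, and all function names are hypothetical. An adversarial perturbation moves the input so the loss increases, while an anti-adversarial one moves it so the loss decreases.

```python
import math


def _sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def loss(w, b, x, y):
    """Binary cross-entropy of a logistic model on one sample (y in {0, 1})."""
    p = _sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def input_gradient(w, b, x, y):
    """Gradient of the loss with respect to the input x."""
    p = _sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    return [(p - y) * wi for wi in w]


def perturb(w, b, x, y, eps, direction):
    """direction=+1: adversarial (raise loss); direction=-1: anti-adversarial."""
    grad = input_gradient(w, b, x, y)
    sign = lambda v: (v > 0) - (v < 0)
    return [xi + direction * eps * sign(gi) for xi, gi in zip(x, grad)]
```

A per-sample `eps` and `direction` is what the abstract's "varied bounds" and "different perturbation directions" would correspond to in this toy setting; how those are chosen (the paper mentions meta learning for the combination weights) is outside the scope of this sketch.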

翻訳日:2023-04-26 21:57:33 公開日:2023-04-25

# coupa: オンラインからオフラインのサービスプラットフォームのための産業向けレコメンデーションシステム

COUPA: An Industrial Recommender System for Online to Offline Service Platforms ( http://arxiv.org/abs/2304.12549v1 )

ライセンス: Link先を確認

Sicong Xie, Binbin Hu, Fengze Li, Ziqi Liu, Zhiqiang Zhang, Wenliang Zhong, Jun Zhou

(参考訳) ユーザーが小売サービス(例えばエンターテイメントやダイニング)をローカルに発見することを支援するため、オンライン・トゥ・オフライン(O2O)サービスプラットフォームは近年人気を集めており、現在のレコメンデーターシステムに大きく挑戦している。 O2OサービスのフィードライクなシナリオであるAlipayの実際のデータから、当社のシナリオでは、繰り返しベースの時間パターンと位置バイアスが一般的に存在していることが分かり、推奨の有効性を脅かしている。そこで本研究では, ユーザの嗜好を特徴付ける産業システムであるCOUPAを提案する。(1) 時間意識的嗜好: 注意機構を備えた連続時間意識的ポイントプロセスを用いて, 推奨のための時間パターンを完全に把握する。 2)位置認識選好:位置個人化モジュールを備えた位置セレクタコンポーネントは、位置バイアスをパーソナライズするパーソナライズされた方法で緩和する。最後に、Alipay上でCOUPAを慎重に実装、デプロイし、エッジ、ストリーミング、バッチコンピューティング、および2段階のオンラインサービスモードで、いくつかの一般的な推奨シナリオをサポートする。我々は、COUPAが一貫して優れたパフォーマンスを達成し、助言のための直感的な証拠を提供する可能性を実証する広範な実験を行う。

Aiming to help users discover local retail services (e.g., entertainment and dining), Online to Offline (O2O) service platforms have become popular in recent years, which greatly challenges current recommender systems. Using real data from Alipay, a feeds-like scenario for O2O services, we find that recurrence-based temporal patterns and position biases commonly exist in our scenarios, which seriously threaten recommendation effectiveness. To this end, we propose COUPA, an industrial system for characterizing user preference with the following two considerations: (1) Time-aware preference: we employ a continuous time-aware point process equipped with an attention mechanism to fully capture temporal patterns for recommendation. (2) Position-aware preference: a position selector component equipped with a position personalization module is elaborately designed to mitigate position bias in a personalized manner. Finally, we carefully implement and deploy COUPA on Alipay with a cooperation of edge, streaming, and batch computing, as well as a two-stage online serving mode, to support several popular recommendation scenarios. We conduct extensive experiments to demonstrate that COUPA consistently achieves superior performance and has the potential to provide intuitive evidence for recommendation.

翻訳日:2023-04-26 21:57:12 公開日:2023-04-25

# MMRDN:オブジェクト指向シーンにおけるマルチビュー操作関係検出のための一貫性表現

MMRDN: Consistent Representation for Multi-View Manipulation Relationship Detection in Object-Stacked Scenes ( http://arxiv.org/abs/2304.12592v1 )

ライセンス: Link先を確認

Han Wang, Jiayuan Zhang, Lipeng Wan, Xingyu Chen, Xuguang Lan, Nanning Zheng

(参考訳) 操作関係検出(mrd)は、ロボットが物体を正しい順に掴むように誘導することを目的としており、物体の積み重ねられた場面における把持の安全性と信頼性を確保するために重要である。事前定義された視点から収集されたデータでトレーニングされたディープニューラルネットワークによる操作関係は、非構造化環境での視覚的転位に制限がある。マルチビューデータは、より包括的な空間情報を提供するが、マルチビューMDDの課題はドメインシフトである。本稿では,2次元および3次元マルチビューデータを用いて訓練を行うマルチビューmrdネットワーク(mmrdn)という,新しいマルチビュー融合フレームワークを提案する。異なるビューからの2Dデータを共通の隠れ空間に投影し、埋め込みをVon-Mises-Fisher分布の集合に適合させて一貫した表現を学習する。さらに、3Dデータ内の位置情報を利用して、各オブジェクト対の点雲からK$Maximum Vertical Neighbors (KMVN) 点のセットを選択し、これら2つのオブジェクトの相対的な位置を符号化する。最後に、多視点2Dデータと3Dデータの特徴を結合して、オブジェクトの相互関係を予測する。挑戦的なREGRADデータセットの実験結果から、MMRDNはマルチビューMDDタスクにおいて最先端の手法よりも優れていることが示された。また,合成データで学習したモデルが実世界のシナリオに移行できることも実証した。

Manipulation relationship detection (MRD) aims to guide the robot to grasp objects in the right order, which is important to ensure the safety and reliability of grasping in object-stacked scenes. Previous works infer manipulation relationships with deep neural networks trained on data collected from a predefined view, which limits their ability to handle visual dislocation in unstructured environments. Multi-view data provide more comprehensive spatial information, but a challenge of multi-view MRD is domain shift. In this paper, we propose a novel multi-view fusion framework, namely the multi-view MRD network (MMRDN), which is trained on 2D and 3D multi-view data. We project the 2D data from different views into a common hidden space and fit the embeddings with a set of Von-Mises-Fisher distributions to learn consistent representations. In addition, taking advantage of position information within the 3D data, we select a set of $K$ Maximum Vertical Neighbors (KMVN) points from the point cloud of each object pair, which encodes the relative position of these two objects. Finally, the features of multi-view 2D and 3D data are concatenated to predict the pairwise relationship of objects. Experimental results on the challenging REGRAD dataset show that MMRDN outperforms the state-of-the-art methods in multi-view MRD tasks. The results also demonstrate that our model trained on synthetic data is capable of transferring to real-world scenarios.

翻訳日:2023-04-26 21:51:12 公開日:2023-04-25

# コントラスト学習と一貫した意味・構造制約による教師なし合成画像の洗練

Unsupervised Synthetic Image Refinement via Contrastive Learning and Consistent Semantic and Structure Constraints ( http://arxiv.org/abs/2304.12591v1 )

ライセンス: Link先を確認

Ganning Zhao, Tingwei Shen, Suya You, and C.-C. Jay Kuo

(参考訳) ディープニューラルネットワーク(dnn)トレーニングには,コンピュータ生成合成画像のリアリズムの確保が不可欠である。合成されたデータセットと実世界のデータセットのセマンティックな分布が異なるため、合成された画像と精巧な画像の間にセマンティックなミスマッチが存在する。近年,相関パッチの抽出と非相関パッチの分離にコントラスト学習(cl)が成功している。本研究では,合成画像と精細画像間の意味的・構造的整合性を利用して,意味的歪みを低減するためにCLを採用する。さらに, 高い負のマイニングを取り入れて, さらなる性能向上を図る。定性的および定量的な測定値を用いた他のベンチマーク手法と比較し,本手法が最先端の性能を提供することを示す。

Ensuring the realism of computer-generated synthetic images is crucial to deep neural network (DNN) training. Because the semantic distributions of synthetic and real-world captured datasets differ, a semantic mismatch exists between synthetic and refined images, which in turn results in semantic distortion. Recently, contrastive learning (CL) has been successfully used to pull correlated patches together and push uncorrelated ones apart. In this work, we exploit semantic and structural consistency between synthetic and refined images and adopt CL to reduce the semantic distortion. In addition, we incorporate hard negative mining to further improve performance. We compare the performance of our method with several other benchmarking methods using qualitative and quantitative measures and show that our method offers state-of-the-art performance.
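
The idea of pulling correlated patches together and pushing uncorrelated ones apart is commonly realized with an InfoNCE-style loss. The sketch below shows that generic loss on plain feature vectors; it is an assumption that this resembles what the paper uses, since the abstract names neither the loss nor the temperature, and the default value of `temperature` is illustrative only.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(query, positive, negatives, temperature=0.07):
    """InfoNCE loss for one query patch feature.

    The loss is small when the query is much more similar to its positive
    (the corresponding patch) than to any negative (other patches).
    """
    logits = [cosine(query, positive) / temperature]
    logits += [cosine(query, n) / temperature for n in negatives]
    # Numerically stable log-sum-exp over all candidates.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum - logits[0]
```

In this framing, hard negative mining would amount to choosing the `negatives` list so that it contains patches whose features are already close to the query, which makes the loss more informative than random negatives would.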

翻訳日:2023-04-26 21:50:45 公開日:2023-04-25

# ContrastMotion:大規模LiDAR点雲に対する自己教師型シーンモーション学習

ContrastMotion: Self-supervised Scene Motion Learning for Large-Scale LiDAR Point Clouds ( http://arxiv.org/abs/2304.12589v1 )

ライセンス: Link先を確認

Xiangze Jia, Hui Zhou, Xinge Zhu, Yandong Guo, Ji Zhang, Yuexin Ma

(参考訳) 本稿では,BEV表現を用いたLiDARに基づく自動運転のための新しい自己教師型動き推定器を提案する。通常採用されるデータレベルの構造一貫性のための自己教師付き戦略とは異なり、連続フレーム内の柱間の特徴レベルの一貫性を通じてシーンの動きを予測することにより、動的シーンにおけるノイズ点と視点の変化する点群の影響を解消する。具体的には,識別的かつロバストな特徴を対比学習で学習するために,ネットワークにより多くの疑似教師付き信号を提供する Soft Discriminative Loss を提案する。また,ポイントクラウドフレーム間の有効な補償を自動学習し,特徴抽出を促進する Gated Multi-frame Fusion ブロックを提案する。最後に、特徴距離に基づいて柱対応確率を予測し、さらにシーンの動きを予測するために、pillar association を提案する。広範な実験により,シーンフローと動き予測の両タスクにおけるContrastMotionの有効性と優位性を示した。コードは近日公開予定である。

In this paper, we propose a novel self-supervised motion estimator for LiDAR-based autonomous driving via BEV representation. Different from the commonly adopted self-supervised strategies for data-level structure consistency, we predict scene motion via feature-level consistency between pillars in consecutive frames, which can eliminate the effect caused by noise points and view-changing point clouds in dynamic scenes. Specifically, we propose a Soft Discriminative Loss that provides the network with more pseudo-supervised signals to learn discriminative and robust features in a contrastive learning manner. We also propose a Gated Multi-frame Fusion block that automatically learns valid compensation between point cloud frames to enhance feature extraction. Finally, pillar association is proposed to predict pillar correspondence probabilities based on feature distance and thereby to further predict scene motion. Extensive experiments show the effectiveness and superiority of our ContrastMotion on both scene flow and motion prediction tasks. The code will be available soon.

翻訳日:2023-04-26 21:50:30 公開日:2023-04-25

# MixNeRF: 特徴混在ハッシュテーブルを備えたメモリ効率の良いNeRF

MixNeRF: Memory Efficient NeRF with Feature Mixed-up Hash Table ( http://arxiv.org/abs/2304.12587v1 )

ライセンス: Link先を確認

Yongjae Lee, Li Yang and Deliang Fan

(参考訳) ニューラル・ラディアンス・フィールド(NeRF)はフォトリアリスティック・ノベルビューの生成において顕著な性能を示した。 NeRFの出現以来,多層パーセプトロン(MLP)ネットワークの複雑さを減らし,グリッドなどの明示的な構造を持つ特徴を管理することで,極めて高速なトレーニングを実現している研究が数多く行われている。しかし、高密度グリッドに格納するには大きなメモリスペースが必要であり、それによってコンピュータシステムのメモリボトルネックが発生し、トレーニング時間も大きくなる。そこで本研究では,メモリ効率を向上し,復元品質を維持しつつトレーニング時間を短縮するために混合ハッシュテーブルを用いたメモリ効率のよいNeRFフレームワークであるMixNeRFを提案する。まず,マルチレベル機能グリッドの一部を1つに適応的に混合し,単一のハッシュテーブルにマップする mixed-up hash table を設計した。その後、グリッド点の正しいインデックスを得るために、任意のレベルグリッドのインデックスを標準グリッドのインデックスに変換する index transformation 法をさらに設計する。最先端のInstant-NGP、TensoRF、DVGOとベンチマークした大規模な実験によると、MixNeRFは、同じGPUハードウェア上で、同様のあるいはそれ以上のリコンストラクション品質で、最速のトレーニング時間を達成できる。ソースコードは https://github.com/nfyfamr/MixNeRF で入手できる。

Neural radiance field (NeRF) has shown remarkable performance in generating photo-realistic novel views. Since the emergence of NeRF, many studies have been conducted, among which managing features with explicit structures such as grids has achieved exceptionally fast training by reducing the complexity of multilayer perceptron (MLP) networks. However, storing features in dense grids requires a significantly large memory space, which leads to a memory bottleneck in computer systems and thus long training time. To address this issue, in this work, we propose MixNeRF, a memory-efficient NeRF framework that employs a mixed-up hash table to improve memory efficiency and reduce training time while maintaining reconstruction quality. We first design a mixed-up hash table to adaptively mix part of multi-level feature grids into one and map it to a single hash table. Following that, in order to obtain the correct index of a grid point, we further design an index transformation method that transforms indices of an arbitrary level grid to those of a canonical grid. Extensive experiments benchmarking against the state-of-the-art Instant-NGP, TensoRF, and DVGO indicate that our MixNeRF could achieve the fastest training time on the same GPU hardware with similar or even higher reconstruction quality. Source code is available at https://github.com/nfyfamr/MixNeRF.

翻訳日:2023-04-26 21:50:14 公開日:2023-04-25

# 複素力学系における創発的組織のための物理インフォームド表現学習

Physics-Informed Representation Learning for Emergent Organization in Complex Dynamical Systems ( http://arxiv.org/abs/2304.12586v1 )

ライセンス: Link先を確認

Adam Rupe and Karthik Kashinath and Nalini Kumar and James P. Crutchfield

(参考訳) 非線形に相互作用するシステムコンポーネントは、新しい特性と異なる時空スケールで現象を生成する不安定性を導入することが多い。これは自発的自己組織化として知られ、熱力学的平衡から遠く離れた系においてユビキタスである。我々は,データ駆動型アルゴリズムを実践的に構築する,創発的組織のための理論的基盤的枠組みを導入する。ビルディングブロックは時空の光錐で、局所的な相互作用を通じてシステムがどのように情報を伝達するかを捉えます。複雑な時空間系において,光円錐,局所因果状態,組織的挙動,コヒーレント構造の予測等価クラスが成立することを示す。物理インフォームド機械学習アルゴリズムと高性能コンピューティング実装を用いて,実世界の領域科学問題に対する局所因果状態の適用性を実証した。局所因果状態が渦を捕捉し, 2次元乱流中におけるパワーロー崩壊挙動を示す。そして、既知の(ハリケーンや大気の川)と新しい極端な気象事象がピクセルレベルで識別され、高解像度の気候データで時間を通して追跡されることを示す。

Nonlinearly interacting system components often introduce instabilities that generate phenomena with new properties and at different space-time scales than the components. This is known as spontaneous self-organization and is ubiquitous in systems far from thermodynamic equilibrium. We introduce a theoretically-grounded framework for emergent organization that, via data-driven algorithms, is constructive in practice. Its building blocks are spacetime lightcones that capture how information propagates across a system through local interactions. We show that predictive equivalence classes of lightcones, local causal states, capture organized behaviors and coherent structures in complex spatiotemporal systems. Using our unsupervised physics-informed machine learning algorithm and a high-performance computing implementation, we demonstrate the applicability of the local causal states for real-world domain science problems. We show that the local causal states capture vortices and their power-law decay behavior in two-dimensional turbulence. We then show that known (hurricanes and atmospheric rivers) and novel extreme weather events can be identified on a pixel-level basis and tracked through time in high-resolution climate data.

翻訳日:2023-04-26 21:49:47 公開日:2023-04-25

# 光学顕微鏡観察から直接学習するイメージングメカニズム

Learning imaging mechanism directly from optical microscopy observations ( http://arxiv.org/abs/2304.12584v1 )

ライセンス: Link先を確認

Ze-Hao Wang (1 and 2), Long-Kun Shan (1 and 2), Tong-Tian Weng (1 and 2), Tian-Long Chen (3), Qi-Yu Wang (1 and 2), Xiang-Dong Chen (1, 2 and 4), Zhang-Yang Wang (3), Guang-Can Guo (1, 2 and 4), Fang-Wen Sun (1, 2 and 4) ((1) CAS Key Laboratory of Quantum Information, University of Science and Technology of China, Hefei, 230026, China, (2) CAS Center For Excellence in Quantum Information and Quantum Physics, University of Science and Technology of China, Hefei, 230026, China, (3) University of Texas at Austin, Austin, TX 78705, USA, (4) Hefei National Laboratory, University of Science and Technology of China, Hefei 230088, China)

(参考訳) 光顕微鏡画像はナノ世界の直接可視化を通じて科学研究において重要な役割を担い、そこではイメージング機構は点拡散関数(PSF)とエミッターの畳み込みとして記述される。 PSFや同等のPSFの事前知識に基づいて、ナノ世界のより正確な探査が可能である。しかし,PSFを顕微鏡画像から直接抽出することは極めて困難である。本稿では,自己教師型学習の助けを借りて,生の顕微鏡画像から直接PSFとエミッタの学習可能な推定を可能にする物理インフォームドマスク付きオートエンコーダ(PiMAE)を提案する。本手法を合成データと実世界実験で実証し,高い精度と雑音のロバスト性を示した。 PiMAEは、正規化ルート平均二乗誤差(NRMSE)測定値によって測定された平均19.6%と50.7%(35タスク)で、合成データタスクにおいてDeepSTORMとRichardson-Lucyアルゴリズムを上回っている。これはPSFの事前知識なしで達成されており、DeepSTORMで使われている教師ありアプローチや、Richardson-LucyアルゴリズムにおけるPSF既知の仮定とは対照的である。本手法は,光学顕微鏡における隠蔽イメージング機構の実現に有効な手法であり,さらに多くのシステムで隠蔽機構を学習することができる。

Optical microscopy imaging plays an important role in scientific research through the direct visualization of the nanoworld, where the imaging mechanism is described as the convolution of the point spread function (PSF) and emitters. Based on a priori knowledge of the PSF or equivalent PSF, it is possible to achieve more precise exploration of the nanoworld. However, it is an outstanding challenge to directly extract the PSF from microscopy images. Here, with the help of self-supervised learning, we propose a physics-informed masked autoencoder (PiMAE) that enables a learnable estimation of the PSF and emitters directly from the raw microscopy images. We demonstrate our method on synthetic data and in real-world experiments with significant accuracy and noise robustness. PiMAE outperforms DeepSTORM and the Richardson-Lucy algorithm in synthetic data tasks with an average improvement of 19.6% and 50.7% (35 tasks), respectively, as measured by the normalized root mean square error (NRMSE) metric. This is achieved without prior knowledge of the PSF, in contrast to the supervised approach used by DeepSTORM and the known PSF assumption in the Richardson-Lucy algorithm. Our method, PiMAE, provides a feasible scheme for recovering the hidden imaging mechanism in optical microscopy and has the potential to learn hidden mechanisms in many more systems.
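
The forward model named in this abstract, an image formed as the convolution of a PSF with a map of emitters, and the NRMSE metric can both be illustrated with a short sketch. This covers only the imaging model and the metric, not the PiMAE network itself. Zero padding at the borders and normalizing the RMSE by the value range of the reference are assumptions; the abstract does not say which NRMSE normalization the authors use, and the function names are illustrative.

```python
import math


def convolve2d_same(image, kernel):
    """Zero-padded 2D convolution; output has the same size as `image`.

    `kernel` is assumed to have odd height and width (e.g. a 3x3 PSF).
    """
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    ch, cw = kh // 2, kw // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for a in range(kh):
                for b in range(kw):
                    y, x = i + ch - a, j + cw - b  # flipped-kernel indexing
                    if 0 <= y < h and 0 <= x < w:
                        acc += image[y][x] * kernel[a][b]
            out[i][j] = acc
    return out


def nrmse(estimate, reference):
    """RMSE divided by the value range of the reference (assumed convention)."""
    est = [v for row in estimate for v in row]
    ref = [v for row in reference for v in row]
    value_range = max(ref) - min(ref)
    if value_range == 0:
        raise ValueError("reference has zero range; NRMSE is undefined")
    mse = sum((e - r) ** 2 for e, r in zip(est, ref)) / len(ref)
    return math.sqrt(mse) / value_range
```

Convolving a single point emitter with the PSF reproduces the PSF centered on that emitter, which is why a sparse emitter map plus a PSF is enough to describe the observed image under this model.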

翻訳日:2023-04-26 21:49:29 公開日:2023-04-25

# 非定常環境における動的システムのリアルタイム安全性評価:方法と手法のレビュー

Real-time Safety Assessment of Dynamic Systems in Non-stationary Environments: A Review of Methods and Techniques ( http://arxiv.org/abs/2304.12583v1 )

ライセンス: Link先を確認

Zeyi Liu and Songqiao Hu and Xiao He

(参考訳) 動的システムのリアルタイム安全性評価(RTSA)は,特に非定常環境において,産業や輸送などの分野において重要な意味を持つ重要な課題である。しかし,非定常環境におけるリアルタイム安全性評価手法の包括的レビューの欠如は,関連手法の進歩と洗練を妨げている。本稿では,非定常環境におけるRTSAタスクの手法と手法について概説する。特に、非定常環境におけるrtsaアプローチの背景と意義を最初に強調する。次に、定義、分類、および主な課題をカバーする問題記述を示す。本稿では,オンラインアクティブラーニング,オンラインセミ教師付きラーニング,オンライン転送学習,オンライン異常検出といった関連技術の最近の進歩を概観する。最後に,今後の展望と今後の研究の方向性について論じる。本総説は,非定常環境におけるリアルタイム安全評価手法の総合的かつ最新の概観を提供することを目的としており,この分野の研究者や実践者にとって貴重な資源となる。

Real-time safety assessment (RTSA) of dynamic systems is a critical task that has significant implications for various fields such as industrial and transportation applications, especially in non-stationary environments. However, the absence of a comprehensive review of real-time safety assessment methods in non-stationary environments impedes the progress and refinement of related methods. In this paper, a review of methods and techniques for RTSA tasks in non-stationary environments is provided. Specifically, the background and significance of RTSA approaches in non-stationary environments are firstly highlighted. We then present a problem description that covers the definition, classification, and main challenges. We review recent developments in related technologies such as online active learning, online semi-supervised learning, online transfer learning, and online anomaly detection. Finally, we discuss future outlooks and potential directions for further research. Our review aims to provide a comprehensive and up-to-date overview of real-time safety assessment methods in non-stationary environments, which can serve as a valuable resource for researchers and practitioners in this field.

翻訳日:2023-04-26 21:49:04 公開日:2023-04-25

# 学習軌跡は汎化の指標である

Learning Trajectories are Generalization Indicators ( http://arxiv.org/abs/2304.12579v1 )

ライセンス: Link先を確認

Jingwen Fu, Zhizheng Zhang, Dacheng Yin, Yan Lu, Nanning Zheng

(参考訳) 本稿では,広く用いられる勾配降下法と確率的勾配降下法で最適化した場合の,深層ニューラルネットワーク(DNN)の学習軌跡とそれに対応する汎化能力との関係について検討する。本稿では,軌道情報をモデル化するための線形近似関数を構築し,それに基づくよりリッチな軌道情報を持つ新しい汎化境界を提案する。提案する汎化境界は,学習軌跡の複雑さと,学習集合のバイアスと多様性の比率に依存する。実験結果から,提案手法は様々な学習ステップ,学習率,ラベルノイズレベルにわたる汎化傾向を効果的に捉えていることがわかった。

The aim of this paper is to investigate the connection between the learning trajectories of Deep Neural Networks (DNNs) and their corresponding generalization capabilities when optimized with the widely used gradient descent and stochastic gradient descent algorithms. In this paper, we construct a Linear Approximation Function to model the trajectory information, and we propose a new generalization bound with richer trajectory information based on it. Our proposed generalization bound relies on the complexity of the learning trajectory and the ratio between the bias and diversity of the training set. Experimental results indicate that the proposed method effectively captures the generalization trend across various training steps, learning rates, and label noise levels.

翻訳日:2023-04-26 21:48:50 公開日:2023-04-25

# CPUアーキテクチャ上の高レベルループとテンソル抽象化によるディープラーニングとHPCカーネルの活用

Harnessing Deep Learning and HPC Kernels via High-Level Loop and Tensor Abstractions on CPU Architectures ( http://arxiv.org/abs/2304.12576v1 )

ライセンス: Link先を確認

Evangelos Georganas, Dhiraj Kalamkar, Kirill Voronin, Antonio Noack, Hans Pabst, Alexander Breuer, Alexander Heinecke

(参考訳) 過去10年間で、ディープラーニング(DL)アルゴリズム、プログラミングシステム、ハードウェアは、ハイパフォーマンスコンピューティング(HPC)のアルゴリズムと融合してきた。それでも、DLとHPCシステムのプログラミング手法は停滞しており、高度に最適化されているが、プラットフォーム固有の非フレキシブルなベンダー最適化ライブラリに依存している。このようなライブラリは、特定のプラットフォーム、カーネル、その形状に対して、ベンダーが専用の最適化作業を行っているのに対して、残りのユースケースではパフォーマンスが劣り、パフォーマンスガラスジャウの非可搬性コードが得られる。この研究は、現代的なCPUアーキテクチャのための効率的でポータブルなDLとHPCカーネルを開発するためのフレームワークを導入する。カーネル開発を2つのステップで分解する。 1)テンソル処理プリミティブ(tpps: compact, versatile set of 2d-tensor operator)を用いた計算コアの表現 2) TPPのまわりの論理ループを高水準で宣言的に表現するのに対して, 正確なインスタンス化(順序付け, タイリング, 並列化)は単純なノブによって決定される。我々は、スタンドアロンカーネルと、さまざまなCPUプラットフォームにおける最先端実装よりも優れたエンドツーエンドワークロードを使用して、このアプローチの有効性を実証する。

During the past decade, Deep Learning (DL) algorithms, programming systems and hardware have converged with their High Performance Computing (HPC) counterparts. Nevertheless, the programming methodology of DL and HPC systems is stagnant, relying on highly optimized yet platform-specific and inflexible vendor libraries. Such libraries provide close-to-peak performance on the specific platforms, kernels and shapes to which vendors have dedicated optimization efforts, while they underperform in the remaining use cases, yielding non-portable codes with performance glass jaws. This work introduces a framework to develop efficient, portable DL and HPC kernels for modern CPU architectures. We decompose the kernel development into two steps: 1) expressing the computational core using Tensor Processing Primitives (TPPs), a compact, versatile set of 2D-tensor operators, and 2) expressing the logical loops around TPPs in a high-level, declarative fashion, whereas the exact instantiation (ordering, tiling, parallelization) is determined via simple knobs. We demonstrate the efficacy of our approach using standalone kernels and end-to-end workloads that outperform state-of-the-art implementations on diverse CPU platforms.

翻訳日:2023-04-26 21:48:38 公開日:2023-04-25

# 真理探索アルゴリズムの公正性とバイアス:実験的解析

Fairness and Bias in Truth Discovery Algorithms: An Experimental Analysis ( http://arxiv.org/abs/2304.12573v1 )

ライセンス: Link先を確認

Simone Lazier, Saravanan Thirumuruganathan, Hadis Anahideh

(参考訳) 機械学習(ML)ベースのアプローチは、社会的影響のある多くのアプリケーションでますます使われている。 mlモデルのトレーニングには大量のラベル付きデータが必要であり、クラウドソーシングは複数のワーカーからラベルを取得するための主要なパラダイムである。群衆労働者は信頼できないラベルを提供し、これに対処するために、多数決のような真理発見(TD)アルゴリズムを適用して、労働者の反応の矛盾からコンセンサスラベルを決定する。しかし、これらのコンセンサスラベルは、性別、人種、政治的所属といったセンシティブな属性に基づいてバイアスを受ける可能性があることに注意する必要がある。センシティブな属性が関与していない場合でも、ラベルは毒性のような主観的な側面の異なる視点でバイアスを受けることができる。本稿では,TDアルゴリズムのバイアスと公平性を系統的に研究する。既存の2つの群集ラベル付きデータセットを用いて,非自明な割合の労働者が偏りのある結果をもたらし,TDに対する単純なアプローチが準最適であることを明らかにする。私たちの研究は、一般的なTDアルゴリズムがパナセアではないことも示しています。さらに,このような不公平な作業員が下流のmlタスクに与える影響を定量化し,公平性とラベルバイアスの修正が非効率であることを示す。我々はこれらの問題を改善できる新しいバイアス対応の真理発見アルゴリズムの設計を訴えて、論文を締めくくります。

Machine learning (ML) based approaches are increasingly being used in a number of applications with societal impact. Training ML models often requires vast amounts of labeled data, and crowdsourcing is a dominant paradigm for obtaining labels from multiple workers. Crowd workers may sometimes provide unreliable labels, and to address this, truth discovery (TD) algorithms such as majority voting are applied to determine the consensus labels from conflicting worker responses. However, it is important to note that these consensus labels may still be biased based on sensitive attributes such as gender, race, or political affiliation. Even when sensitive attributes are not involved, the labels can be biased due to different perspectives on subjective aspects such as toxicity. In this paper, we conduct a systematic study of the bias and fairness of TD algorithms. Our findings, using two existing crowd-labeled datasets, reveal that a non-trivial proportion of workers provide biased results, and that using simple approaches for TD is sub-optimal. Our study also demonstrates that popular TD algorithms are not a panacea. Additionally, we quantify the impact of these unfair workers on downstream ML tasks and show that conventional methods for achieving fairness and correcting label biases are ineffective in this setting. We end the paper with a plea for the design of novel bias-aware truth discovery algorithms that can ameliorate these issues.

翻訳日:2023-04-26 21:48:16 公開日:2023-04-25
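The abstract above names majority voting as a simple truth discovery (TD) baseline. A minimal sketch of that baseline follows, assuming each item maps to a list of worker labels. The function name `majority_vote`, the sample data, and the tie-breaking rule (alphabetically smallest label wins) are illustrative assumptions, not details from the paper.

```python
from collections import Counter


def majority_vote(responses):
    """Map each item to its most frequent label; ties go to the smallest label."""
    consensus = {}
    for item, labels in responses.items():
        counts = Counter(labels)
        # sorted() fixes the iteration order, so max() breaks ties deterministically.
        consensus[item] = max(sorted(counts), key=counts.get)
    return consensus


# Hypothetical worker responses for two items.
responses = {
    "comment_1": ["toxic", "toxic", "ok"],
    "comment_2": ["ok", "toxic"],
}
consensus = majority_vote(responses)
```

Because every worker counts equally here, a systematically biased group of workers can shift the consensus label, which is the kind of effect the study examines.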

# 遺伝的にインスパイアされた対流伝熱促進

Genetically-inspired convective heat transfer enhancement ( http://arxiv.org/abs/2304.12618v1 )

ライセンス: Link先を確認

Rodrigo Castellanos and Andrea Ianiro and Stefano Discetti

(参考訳) 平坦なプレート上の乱流境界層(TBL)における対流熱伝達を、線形遺伝的アルゴリズム制御(LGAC)に基づく人工知能アプローチを用いて促進する。アクチュエータは、フリーストリームに整列した6つのスロットジェットの集合である。開ループ最適周期強制は、キャリア周波数、デューティサイクル、アクチュエータ間の位相差を制御パラメータとして定義する。制御法則は、未飽和のTBLと定常ジェットによる作動に関して最適化される。コスト関数は、壁対流熱伝達率とアクチュエータのコストを含む。制御器の性能は赤外線サーモグラフィにより評価され、粒子画像速度測定でも特徴付けられる。最適制御器はわずかに非対称な流れ場を与える。 LGACアルゴリズムは、すべてのアクチュエータに対して同じ周波数とデューティサイクルに収束する。この周波数は, 壁近傍で発生する大規模乱流構造の特性移動時間の逆数と著しく等しいことに注意が必要である。複数のジェットアクチュエータ間の位相差は非常に関係があることが示され、フロー非対称性の主要因となった。その結果、アクティベーション空間内の未探索のコントローラに対する機械学習制御の可能性が特定される。さらに,本研究は,高度な計測技術と高度なアルゴリズムを併用した実験研究の可能性を実証するものである。

The convective heat transfer in a turbulent boundary layer (TBL) on a flat plate is enhanced using an artificial intelligence approach based on linear genetic algorithms control (LGAC). The actuator is a set of six slot jets in crossflow aligned with the freestream. An open-loop optimal periodic forcing is defined by the carrier frequency, the duty cycle and the phase difference between actuators as control parameters. The control laws are optimised with respect to the unperturbed TBL and to the actuation with a steady jet. The cost function includes the wall convective heat transfer rate and the cost of the actuation. The performance of the controller is assessed by infrared thermography and also characterised with particle image velocimetry measurements. The optimal controller yields a slightly asymmetric flow field. The LGAC algorithm converges to the same frequency and duty cycle for all the actuators. It is noted that this frequency is strikingly equal to the inverse of the characteristic travel time of large-scale turbulent structures advected within the near-wall region. The phase difference between the multiple jet actuators is shown to be very relevant and to be the main driver of flow asymmetry. The results pinpoint the potential of machine learning control in unravelling unexplored controllers within the actuation space. Our study furthermore demonstrates the viability of employing sophisticated measurement techniques together with advanced algorithms in an experimental investigation.

翻訳日:2023-04-26 21:41:10 公開日:2023-04-25
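The abstract above parameterizes the periodic forcing by carrier frequency, duty cycle, and phase difference between actuators. The sketch below shows one way those three parameters could define an on/off jet signal. The square-wave form, the function name `jet_is_on`, and the sign convention of the phase are assumptions made for illustration; the abstract does not specify the waveform.

```python
import math


def jet_is_on(t, frequency, duty_cycle, phase=0.0):
    """Return True if a square-wave jet is open at time t.

    frequency is in Hz, phase in radians, and duty_cycle is the open
    fraction of each period (between 0 and 1).
    """
    cycle_position = (frequency * t + phase / (2.0 * math.pi)) % 1.0
    return cycle_position < duty_cycle


# Sample one period at 1 Hz and measure the fraction of time the jet is open.
samples = [k / 1000 for k in range(1000)]
open_fraction = sum(jet_is_on(t, 1.0, 0.25) for t in samples) / len(samples)
```

Giving each of the six jets its own `phase` while sharing `frequency` and `duty_cycle` mirrors the reported outcome in which all actuators converge to the same frequency and duty cycle and differ only in phase.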

# 双方向セマンティック整合性制約を用いた弱教師付き時間的行動定位

Weakly-Supervised Temporal Action Localization with Bidirectional Semantic Consistency Constraint ( http://arxiv.org/abs/2304.12616v1 )

ライセンス: Link先を確認

Guozhang Li, De Cheng, Xinpeng Ding, Nannan Wang, Jie Li, Xinbo Gao

(参考訳) WTAL(Weakly Supervised Temporal Action Localization)は、トレーニングデータセット内のビデオレベルのカテゴリラベルのみを考慮し、ビデオに対するアクションの時間的境界を分類し、ローカライズすることを目的としている。トレーニング中の境界情報の欠如により、既存のアプローチではwtalを分類問題、すなわち局所化のための時間クラス活性化マップ(t-cam)の生成として定式化している。しかし、分類損失のみの場合、モデルはサブ最適化されるため、アクション関連のシーンは異なるクラスラベルを区別するのに十分である。アクション関連シーンにおける他のアクション(すなわち、ポジティブアクションと同じシーン)について、このサブ最適化モデルは、コシーンアクションをポジティブアクションと誤分類する。この誤分類に対処するために,双方向意味一貫性制約(bi-scc)という,単純かつ効率的な手法を提案する。提案するbi-sccは,まず,映像間における肯定的行動とコシーン的動作の相関関係を破る拡張映像を生成するために,時間的文脈拡張を採用し,その後に意味的一貫性制約(scc)を用いて,オリジナル映像と拡張映像の予測を一貫性を持たせ,コシーン動作を抑制する。しかし、この拡張ビデオは、当初の時間的文脈を破壊してしまう。一貫性の制約を単純に適用すれば、局所化されたポジティブアクションの完全性に影響を及ぼす。そこで我々は,オリジナルビデオと拡張ビデオの相互監督により,協調行動の抑制と肯定的行動の整合性を確保しつつ,双方向的にSCCを増強する。最後に,提案するBi-SCCを現在のWTALアプローチに適用し,その性能を向上する。実験の結果,THUMOS14およびActivityNetの最先端手法よりも優れた性能を示した。

Weakly Supervised Temporal Action Localization (WTAL) aims to classify and localize temporal boundaries of actions for the video, given only video-level category labels in the training datasets. Due to the lack of boundary information during training, existing approaches formulate WTAL as a classification problem, i.e., generating the temporal class activation map (T-CAM) for localization. However, with only classification loss, the model would be sub-optimized, i.e., the action-related scenes are enough to distinguish different class labels. Regarding other actions in the action-related scene (i.e., the same scene as the positive actions) as co-scene actions, this sub-optimized model would misclassify the co-scene actions as positive actions. To address this misclassification, we propose a simple yet efficient method, named bidirectional semantic consistency constraint (Bi-SCC), to discriminate the positive actions from co-scene actions. The proposed Bi-SCC first adopts a temporal context augmentation to generate an augmented video that breaks the correlation between positive actions and their co-scene actions in the inter-video; then, a semantic consistency constraint (SCC) is used to enforce the predictions of the original video and augmented video to be consistent, hence suppressing the co-scene actions. However, we find that this augmented video would destroy the original temporal context. Simply applying the consistency constraint would affect the completeness of localized positive actions. Hence, we boost the SCC in a bidirectional way to suppress co-scene actions while ensuring the integrity of positive actions, by cross-supervising the original and augmented videos. Finally, our proposed Bi-SCC can be applied to current WTAL approaches and improve their performance. Experimental results show that our approach outperforms the state-of-the-art methods on THUMOS14 and ActivityNet.

翻訳日:2023-04-26 21:40:52 公開日:2023-04-25

# STM-UNet:スウィントランスとマルチスケールMLPを用いた医用画像分割のための効率的なU字型アーキテクチャ

STM-UNet: An Efficient U-shaped Architecture Based on Swin Transformer and Multi-scale MLP for Medical Image Segmentation ( http://arxiv.org/abs/2304.12615v1 )

ライセンス: Link先を確認

Lei Shi, Tianyu Gao, Zheng Zhang and Junxing Zhang

(参考訳) 自動医療画像分割は、医師がより早く正確に診断するのに役立つ。近年,医用画像分割のための深層学習モデルが大きな進歩を遂げている。しかし、既存のモデルはu字型アーキテクチャを効率的に改善するためにトランスフォーマーやmlpを効果的に活用できなかった。さらに,MLPのマルチスケール特徴は,U字型アーキテクチャのボトルネックにおいて完全に抽出されていない。本稿では,Swin TransformerとマルチスケールMLP,すなわちSTM-UNetに基づく効率的なU字型アーキテクチャを提案する。特に、スウィントランスブロックは、残留接続の形でstm-unetの接続をスキップするために追加され、グローバル特徴のモデリング能力と長距離依存性を高めることができる。一方,並列畳み込みモジュールを備えた新しいpcas-mlpは,セグメンテーション性能の向上に寄与するため,アーキテクチャのボトルネックとして設計・実装されている。 isic 2016とisic 2018の実験結果は,提案手法の有効性を示している。また,本手法はIoUとDiceの観点から,最先端の手法よりも優れている。提案手法は,高セグメンテーション精度と低モデル複雑性とのトレードオフを向上した。

Automated medical image segmentation can help doctors diagnose faster and more accurately. Deep learning based models for medical image segmentation have made great progress in recent years. However, the existing models fail to effectively leverage Transformer and MLP for improving U-shaped architecture efficiently. In addition, the multi-scale features of the MLP have not been fully extracted in the bottleneck of U-shaped architecture. In this paper, we propose an efficient U-shaped architecture based on Swin Transformer and multi-scale MLP, namely STM-UNet. Specifically, the Swin Transformer block is added to the skip connections of STM-UNet in the form of a residual connection, which can enhance the modeling ability of global features and long-range dependency. Meanwhile, a novel PCAS-MLP with a parallel convolution module is designed and placed into the bottleneck of our architecture to contribute to the improvement of segmentation performance. The experimental results on ISIC 2016 and ISIC 2018 demonstrate the effectiveness of our proposed method. Our method also outperforms several state-of-the-art methods in terms of IoU and Dice. Our method has achieved a better trade-off between high segmentation accuracy and low model complexity.

翻訳日:2023-04-26 21:40:19 公開日:2023-04-25
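The abstract above reports segmentation quality in terms of IoU and Dice. As a reminder of how the two metrics relate, here is a small sketch for binary masks given as flat lists of 0 and 1. The function name `dice_and_iou` and the choice to score two empty masks as a perfect match are assumptions for illustration, not part of the paper.

```python
def dice_and_iou(pred, target):
    """Return (Dice, IoU) for two flat binary masks of equal length."""
    intersection = sum(p and t for p, t in zip(pred, target))
    pred_area, target_area = sum(pred), sum(target)
    union = pred_area + target_area - intersection
    if union == 0:  # both masks empty: treated here as a perfect match
        return 1.0, 1.0
    dice = 2.0 * intersection / (pred_area + target_area)
    iou = intersection / union
    return dice, iou


# A 2x2 prediction and ground truth, flattened row by row.
dice, iou = dice_and_iou([1, 1, 0, 0], [1, 0, 0, 0])
```

Dice is never lower than IoU for the same pair of masks, so the two scores should not be compared against each other across papers.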

# ゼーマン場におけるスピン軌道結合を持つ量子磁石中のマグノンのボース・アインシュタイン凝縮

Bose-Einstein condensations of magnons in quantum magnets with spin-orbit coupling in a Zeeman field ( http://arxiv.org/abs/2304.12612v1 )

ライセンス: Link先を確認

Fadi Sun and Jinwu Ye

(参考訳) スピン軌道結合(soc)を有する量子磁石のゼーマン場への応答を,効果的な動作の構築と再正規化群(rg)解析により検討した。低$ h_{c1} $ と上限臨界場 $ h_{c2} $ の量子相転移には、コンペンサート (c-) または in-commensurate (ic-) でのマグノン凝縮によって駆動される新しいクラスが存在する。中間IC-スカイミオン結晶(IC-SkX)相は、k_0$とラベルされたRGフローの固定点の列によって制御される。我々は、IC-SkX相のスピン軌道構造を決定する実効作用の量子スピンと順序パラメータの関係を導出する。また、ic-skx内のエキゾチック励起スペクトルを決定する$ h_{c1} $および$ h_{c2} $付近の演算子内容も分析した。 c-およびic-momentaにおけるマグノン凝縮の差を考察した。 2つの臨界場 $ h_{c1} < h_{c2} $ と中間 IC-SkX 相はゼーマン場における SOC を持つ任意の量子磁石の一般的な特徴である。ゼーマン場におけるSOCを含むいくつかの材料や低温原子系への実験的含意を示す。

We study the response of a quantum magnet with spin-orbit coupling (SOC) to a Zeeman field by constructing effective actions and performing Renormalization Group (RG) analysis. There are several novel classes of quantum phase transitions at a lower critical field $ h_{c1} $ and an upper critical field $ h_{c2} $ driven by magnon condensations at commensurate (C-) or incommensurate (IC-) momenta $ 0 < k_0 < \pi $. The intermediate IC-Skyrmion crystal (IC-SkX) phase is controlled by a line of fixed points in the RG flows labeled by $ k_0 $. We derive the relations between the quantum spin and the order parameters of the effective actions which determine the spin-orbital structures of the IC-SkX phase. We also analyze the operator contents near $ h_{c1} $ and $ h_{c2} $ which determine the exotic excitation spectra inside the IC-SkX. The intrinsic differences between the magnon condensations at the C- and IC- momenta are explored. The two critical fields $ h_{c1} < h_{c2} $ and the intermediate IC-SkX phase could be a generic feature of any quantum magnet with SOC in a Zeeman field. Experimental implications for some materials or cold atom systems with SOC in a Zeeman field are presented.

翻訳日:2023-04-26 21:40:00 公開日:2023-04-25

# 不確かさを伴う劣化ヒステリシスシステムのモデリングのための二忠実度DeepONetアプローチ

A Bi-fidelity DeepONet Approach for Modeling Uncertain and Degrading Hysteretic Systems ( http://arxiv.org/abs/2304.12609v1 )

ライセンス: Link先を確認

Subhayan De and Patrick T. Brewick

(参考訳) ヒステリシスの低下のような非線形系は、工学的応用においてしばしば発生する。また,不確実性の存在やそのようなシステムのモデル化がますます困難になっている。一方で、劣化効果の性質を知らずに開発された原始モデルからのデータセットを容易に得ることができる。本稿では,ヒステリティックシステムの劣化効果を,ディープオペレータネットワーク(deeponet)を訓練する真のシステムの動作の重要な特性の多くをキャプチャする低忠実性表現として考慮せずに,原始モデルからのデータセットを使用する。 3つの数値例を用いて,低忠実度モデルと真のシステムの応答との差をモデル化するためにdeeponetsを用いた場合,モデルパラメータの不確実性が存在する場合の予測誤差が大幅に向上することを示す。

Nonlinear systems, such as those with degrading hysteretic behavior, are often encountered in engineering applications. In addition, due to the ubiquitous presence of uncertainty, the modeling of such systems becomes increasingly difficult. On the other hand, datasets from pristine models developed without knowing the nature of the degrading effects can be easily obtained. In this paper, we use datasets from pristine models that do not consider the degrading effects of hysteretic systems as low-fidelity representations, which capture many of the important characteristics of the true system's behavior, to train a deep operator network (DeepONet). Three numerical examples are used to show that the proposed use of DeepONets to model the discrepancies between the low-fidelity model and the true system's response leads to significant improvements in the prediction error in the presence of uncertainty in the model parameters for degrading hysteretic systems.

翻訳日:2023-04-26 21:39:29 公開日:2023-04-25

# 医療保険コスト予測における回帰モデルの性能評価

Performance Evaluation of Regression Models in Predicting the Cost of Medical Insurance ( http://arxiv.org/abs/2304.12605v1 )

ライセンス: Link先を確認

Jonelle Angelo S. Cenita, Paul Richie F. Asuncion, Jayson M. Victoriano

(参考訳) 本研究は,医療保険の費用予測における回帰モデルの性能評価を目的とした。機械学習における3つの回帰モデル、すなわち線形回帰、グラディエントブースティング、サポートベクトルマシンが使用された。性能はRMSE(Root Mean Square)、r2(R Square)、K-Fold Cross-validationを用いて評価される。この研究はまた、医療保険のコストを予測する上で最も重要な特徴を指摘し、データベース(KDD)プロセスにおける知識発見に依存している。 (KDD)プロセスとは、データから有用な知識を発見するプロセス全般を指す。その結果, 3つの回帰モデルのうち, 勾配ブースティングが最も高い r2 (r 平方) 0.892 と最低 rmse (根平均平方根) 1336.594 が得られた。さらに,3つの回帰モデルのr2(R角)結果と10Foldクロスバリデーション重み付き平均値の有意差は認められなかった。また、記述統計のボックスプロットを用いた探索データ解析(eda)では、電荷と喫煙者の特徴として、あるグループの中央値が他のグループのボックスの外側にあることが観察されているため、2つのグループの間には違いがある。グラディエント・ブースティングは3つの回帰モデルでより良い性能を発揮すると結論付けている。 K-Fold Cross-Validation は、3つの回帰モデルが良いと結論付けた。さらに、記述統計のボックスプロットを用いた探索データ分析(EDA)では、最も高い料金は喫煙者の特徴によるものであると断定する。

The study aimed to evaluate the performance of regression models in predicting the cost of medical insurance. Three regression models in machine learning, namely Linear Regression, Gradient Boosting, and Support Vector Machine, were used. Performance was evaluated using RMSE (Root Mean Square Error), r2 (R Square), and K-Fold Cross-validation. The study also sought to pinpoint the feature that would be most important in predicting the cost of medical insurance. The study is anchored on the knowledge discovery in databases (KDD) process, which refers to the overall process of discovering useful knowledge from data. The performance evaluation results reveal that among the three regression models, Gradient Boosting received the highest r2 (R Square) of 0.892 and the lowest RMSE (Root Mean Square Error) of 1336.594. Furthermore, the 10-Fold Cross-validation weighted mean findings are not significantly different from the r2 (R Square) results of the three regression models. In addition, Exploratory Data Analysis (EDA) using a box plot of descriptive statistics observed that in the charges and smoker features the median of one group lies outside of the box of the other group, so there is a difference between the two groups. The study concludes that Gradient Boosting appears to perform best among the three regression models. K-Fold Cross-Validation indicated that the three regression models are good. Moreover, Exploratory Data Analysis (EDA) using a box plot of descriptive statistics concludes that the highest charges are due to the smoker feature.

翻訳日:2023-04-26 21:39:15 公開日:2023-04-25
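The abstract above evaluates regression models with RMSE, R Square, and K-Fold Cross-validation. The sketch below spells out those three ingredients with the Python standard library only. The function names and the contiguous, unshuffled fold layout are assumptions for illustration; the study's actual tooling and splitting strategy are not stated in the abstract.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between targets and predictions."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def k_fold_indices(n_samples, k):
    """Yield (train, test) index lists for k contiguous folds."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, test
        start += size


folds = list(k_fold_indices(10, 3))
```

In a cross-validated comparison, each model would be fit on every `train` split and scored with `rmse` and `r_squared` on the matching `test` split, then the per-fold scores averaged.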

# 時間知識グラフ推論のための適応パスメモリネットワーク

Adaptive Path-Memory Network for Temporal Knowledge Graph Reasoning ( http://arxiv.org/abs/2304.12604v1 )

ライセンス: Link先を確認

Hao Dong, Zhiyuan Ning, Pengyang Wang, Ziyue Qiao, Pengfei Wang, Yuanchun Zhou, Yanjie Fu

(参考訳) 時間的知識グラフ(TKG)推論は、歴史情報に基づく未来の行方不明事実の予測を目的としており、近年研究の関心が高まっている。推論作業の歴史的構造と時間的特性をモデル化するための多くの作品が作成されている。ほとんどの既存の作業は、主にエンティティ表現に依存するグラフ構造をモデル化している。しかしながら、現実のシナリオにおけるtkgエンティティの大きさは相当であり、時間が経つにつれて新しいエンティティが増えている。そこで本研究では,問合せ対象と各対象候補間の時間的経路情報を履歴時間を通して適応的にモデル化する適応パスメモリネットワーク(daemon)というtkgの特徴を持つ新しいアーキテクチャモデルを提案する。実体表現に頼らずに歴史的な情報をモデル化する。具体的には、DaeMonはパスメモリを使用して、隣接するタイムスタンプ間のメモリパス戦略を考慮して、パス集約ユニットから得られた時間パス情報を記録する。実世界の4つのTKGデータセットで実施された大規模な実験により、提案モデルが大幅に性能向上し、MRRにおいて最大4.8%の絶対値を達成することを示した。

Temporal knowledge graph (TKG) reasoning aims to predict future missing facts based on historical information and has gained increasing research interest recently. Many works have been proposed to model the historical structural and temporal characteristics for the reasoning task. Most existing works model the graph structure mainly depending on entity representation. However, the magnitude of TKG entities in real-world scenarios is considerable, and an increasing number of new entities will arise as time goes on. Therefore, we propose a novel architecture that models the relation features of TKGs, namely aDAptivE path-MemOry Network (DaeMon), which adaptively models the temporal path information between the query subject and each object candidate across history time. It models the historical information without depending on entity representation. Specifically, DaeMon uses path memory to record the temporal path information derived from the path aggregation unit across the timeline, considering the memory passing strategy between adjacent timestamps. Extensive experiments conducted on four real-world TKG datasets demonstrate that our proposed model obtains substantial performance improvement and outperforms the state-of-the-art by up to 4.8% absolute in MRR.

翻訳日:2023-04-26 21:38:49 公開日:2023-04-25
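The abstract above reports gains in MRR (mean reciprocal rank). For readers unfamiliar with the metric, here is a minimal sketch of how it could be computed from candidate scores. The function names, the entity names, and the tie handling (a candidate's rank is one plus the number of strictly higher scores) are illustrative assumptions; TKG benchmarks often apply additional filtering that is not shown here.

```python
def rank_of_target(scores, target):
    """1-based rank of `target`: one plus the count of strictly higher scores."""
    target_score = scores[target]
    return 1 + sum(1 for s in scores.values() if s > target_score)


def mean_reciprocal_rank(ranks):
    """Average of 1/rank over all queries (ranks are 1-based)."""
    return sum(1.0 / r for r in ranks) / len(ranks)


# Hypothetical scores over object candidates for one query.
scores = {"entity_a": 0.1, "entity_b": 0.7, "entity_c": 0.4}
rank = rank_of_target(scores, "entity_c")
mrr = mean_reciprocal_rank([1, 2, 4])
```

Since MRR is bounded by 1, an absolute improvement of a few points corresponds to the correct entity moving noticeably closer to the top of the ranked list on average.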

# 深層学習は純粋数学者にとって有用なツールか?

Is deep learning a useful tool for the pure mathematician? ( http://arxiv.org/abs/2304.12602v1 )

ライセンス: Link先を確認

Geordie Williamson

(参考訳) 純粋数学者がディープラーニングのツールを研究で使う際に期待するものを、個人的および非公式に説明します。

A personal and informal account of what a pure mathematician might expect when using tools from deep learning in their research.

翻訳日:2023-04-26 21:38:28 公開日:2023-04-25

# Segment Anything Modelの土木インフラ欠陥評価への適用

Application of Segment Anything Model for Civil Infrastructure Defect Assessment ( http://arxiv.org/abs/2304.12600v1 )

ライセンス: Link先を確認

Mohsen Ahmadi, Ahmad Gholizadeh Lonbar, Abbas Sharifi, Ali Tarlani Beris, Mohammadsadegh Nouri, Amir Sharifzadeh Javidi

(参考訳) 本研究では,コンクリート構造物のひび割れ検出のための2つの深層学習モデルSAMとU-Netの性能評価を行う。その結果, 各モデルにはそれぞれ, 異なる種類のひび割れを検出するための強みと限界があることがわかった。 SAMのユニークな亀裂検出手法を用いて、画像は亀裂の位置を識別する様々な部分に分割され、縦断裂の検出をより効果的にする。一方、U-Netモデルは正のラベル画素を識別し、スポーリングクラックのサイズと位置を正確に検出する。両モデルを組み合わせることで、より正確で包括的なき裂検出結果が得られる。コンクリート構造物の安全性と長寿命を確保するためには, ひび割れ検出に先進技術を用いることが重要である。本研究は, 橋梁, 建物, 道路など, 各種コンクリート構造物にSAMおよびU-Netモデルを用いることで, ひび割れ検出の精度と効率を向上し, 維持・修理に要する時間と資源を削減できることから, 土木工学に重要な意味を持つ可能性がある。結論として,本研究で提示されたSAMおよびU-Netモデルは,コンクリート構造物のひび割れを検知し,より正確かつ包括的な結果をもたらすような両モデルの強度を活用する,有望なソリューションを提供する。

This research assesses the performance of two deep learning models, SAM and U-Net, for detecting cracks in concrete structures. The results indicate that each model has its own strengths and limitations for detecting different types of cracks. Using SAM's unique crack detection approach, the image is divided into various parts that identify the location of the crack, making it more effective at detecting longitudinal cracks. On the other hand, the U-Net model can identify positive label pixels to accurately detect the size and location of spalling cracks. By combining both models, more accurate and comprehensive crack detection results can be achieved. The importance of using advanced technologies for crack detection in ensuring the safety and longevity of concrete structures cannot be overstated. This research can have significant implications for civil engineering, as the SAM and U-Net models can be used for a variety of concrete structures, including bridges, buildings, and roads, improving the accuracy and efficiency of crack detection and saving time and resources in maintenance and repair. In conclusion, the SAM and U-Net models presented in this study offer promising solutions for detecting cracks in concrete structures, and leveraging the strengths of both models can lead to more accurate and comprehensive results.

翻訳日:2023-04-26 21:38:25 公開日:2023-04-25

# 変圧器とUNetの深層学習モデルによる舗装き裂の検出

Detection of Pavement Cracks by Deep Learning Models of Transformer and UNet ( http://arxiv.org/abs/2304.12596v1 )

ライセンス: Link先を確認

Yu Zhang and Lin Zhang

(参考訳) 破壊は、建物や道路などのエンジニアリング構造の主要な破壊モードの1つである。表面き裂の効果的検出は損傷評価と構造維持に重要である。近年,深層学習技術の出現と発展により,表面き裂検出が容易になる可能性が示唆されている。現在、ほとんどのタスクは畳み込みニューラルネットワーク(CNN)によって実行されており、CNNの制限は、最近導入されたトランスフォーマーアーキテクチャによって改善される可能性がある。本研究では, モデル精度, 計算複雑性, モデル安定性により, 舗装面き裂検出の性能を評価するための9つの有望モデルについて検討した。亀裂ラベル付き224×224ピクセルの711画像を作成し、最適損失関数を選択し、検証データセットとテストデータセットの評価指標を比較し、データ詳細を分析し、各モデルのセグメンテーション結果を確認した。一般に、トランスフォーマーベースのモデルは、トレーニングプロセス中に収束しやすく、精度が高いが、通常、メモリ消費が増加し、処理効率が低下する。 9つのモデルのうち、SwinUNetは他の2つのトランスフォーマーよりも優れており、9つのモデルの中で最も高い精度を示している。本結果は,様々な深層学習モデルによる表面き裂検出に光を当て,この分野における今後の応用の指針を提供するものである。

Fracture is one of the main failure modes of engineering structures such as buildings and roads. Effective detection of surface cracks is significant for damage evaluation and structure maintenance. In recent years, the emergence and development of deep learning techniques have shown great potential to facilitate surface crack detection. Currently, most reported tasks were performed by a convolutional neural network (CNN), while the limitation of CNN may be improved by the transformer architecture introduced recently. In this study, we investigated nine promising models to evaluate their performance in pavement surface crack detection by model accuracy, computational complexity, and model stability. We created 711 images of 224 by 224 pixels with crack labels, selected an optimal loss function, compared the evaluation metrics of the validation dataset and test dataset, analyzed the data details, and checked the segmentation outcomes of each model. We find that transformer-based models generally are easier to converge during the training process and have higher accuracy, but usually exhibit more memory consumption and low processing efficiency. Among nine models, SwinUNet outperforms the other two transformers and shows the highest accuracy among nine models. The results should shed light on surface crack detection by various deep-learning models and provide a guideline for future applications in this field.

翻訳日:2023-04-26 21:38:02 公開日:2023-04-25

# 医用画像のためのジェネラリスト視覚基盤モデル:ゼロショット医用セグメンテーションにおけるSegment Anything Modelの事例研究

Generalist Vision Foundation Models for Medical Imaging: A Case Study of Segment Anything Model on Zero-Shot Medical Segmentation ( http://arxiv.org/abs/2304.12637v1 )

ライセンス: Link先を確認

Peilun Shi, Jianing Qiu, Sai Mu Dalike Abaxi, Hao Wei, Frank P.-W. Lo, Wu Yuan

(参考訳) 医用画像における最近のsegment anything model (sam) について検討し、光学コヒーレンス断層撮影(oct)、磁気共鳴画像(mri)、ct(ct)などの様々な画像モダリティをカバーする9つの医用画像分割ベンチマークの定量的・定性的ゼロショットセグメンテーション結果と、皮膚科、眼科、放射線科の異なる応用について報告する。実験の結果,samは一般領域の画像,例えば医用画像などではゼロショットセグメンテーション性能は限定されているものの,一般領域の画像では見事なセグメンテーション性能を示していることが明らかとなった。さらに、SAMは、異なる未知の医療領域で異なるゼロショットセグメンテーション性能を示した。例えば、0.8704の平均ダイススコアは、網膜オクターのブラッハ膜層下のセグメンテーションで、セグメンテーション精度は、網膜色素上皮のセグメンテーション時に0.0688に低下する。血管などの特定の構造的標的ではSAMのゼロショットセグメンテーションは完全に失敗したが、少量のデータによる単純な微調整はセグメンテーションの品質を著しく改善する可能性がある。本研究は,医用画像における特定の課題を解くための汎用的ビジョン基盤モデルの汎用性を示し,また,様々な医療データセットにアクセスし,医療領域の複雑さを克服する上での課題に対処する上で,必要なパフォーマンスを実現する大きな可能性を示した。

We examine the recent Segment Anything Model (SAM) on medical images, and report both quantitative and qualitative zero-shot segmentation results on nine medical image segmentation benchmarks, covering various imaging modalities, such as optical coherence tomography (OCT), magnetic resonance imaging (MRI), and computed tomography (CT), as well as different applications including dermatology, ophthalmology, and radiology. Our experiments reveal that while SAM demonstrates stunning segmentation performance on images from the general domain, for out-of-distribution images, e.g., medical images, its zero-shot segmentation performance is still limited. Furthermore, SAM demonstrated varying zero-shot segmentation performance across different unseen medical domains. For example, it had a 0.8704 mean Dice score on segmenting the under-Bruch's membrane layer of retinal OCT, whereas the segmentation accuracy drops to 0.0688 when segmenting retinal pigment epithelium. For certain structured targets, e.g., blood vessels, the zero-shot segmentation of SAM completely failed, whereas a simple fine-tuning of it with a small amount of data could lead to remarkable improvements of the segmentation quality. Our study indicates the versatility of generalist vision foundation models in solving specific tasks in medical imaging, and their great potential to achieve desired performance through fine-tuning and eventually tackle the challenges of accessing large diverse medical datasets and the complexity of medical domains.

翻訳日:2023-04-26 21:32:50 公開日:2023-04-25

# 教師なし人物再識別のためのカメラ内類似性を用いた擬似ラベル再構成

Pseudo Labels Refinement with Intra-camera Similarity for Unsupervised Person Re-identification ( http://arxiv.org/abs/2304.12634v1 )

ライセンス: Link先を確認

Pengna Li, Kangyi Wu, Sanping Zhou, Qianxin Huang, Jinjun Wang

(参考訳) unsupervised person re-id(re-id)は、個人識別ラベルなしで、カメラ間で人物画像を取得することを目的としている。ほとんどのクラスタリングベースの方法は、画像の特徴をクラスタに大まかに分割し、異なるカメラ間のドメインシフトによる特徴分布ノイズを無視する。この課題に対処するために,カメラ内類似性をクラスタリングした新しいラベルリファインメントフレームワークを提案する。カメラ内特徴分布は、歩行者やラベルの出現により多くの注意を払っている。我々は各カメラにそれぞれローカルクラスタを取得するためのカメラ内トレーニングを行い、各カメラ間クラスタを局所的な結果で洗練する。したがって、信頼性の高い擬似ラベルを自己ペーストした方法でRe-IDモデルをトレーニングする。実験により,提案手法が最先端性能を上回ることを示した。

Unsupervised person re-identification (Re-ID) aims to retrieve person images across cameras without any identity labels. Most clustering-based methods roughly divide image features into clusters and neglect the feature distribution noise caused by domain shifts among different cameras, leading to inevitable performance degradation. To address this challenge, we propose a novel label refinement framework with clustering intra-camera similarity. Intra-camera feature distribution pays more attention to the appearance of pedestrians and labels are more reliable. We conduct intra-camera training to get local clusters in each camera, respectively, and refine inter-camera clusters with local results. We hence train the Re-ID model with refined reliable pseudo labels in a self-paced way. Extensive experiments demonstrate that the proposed method surpasses state-of-the-art performance.

翻訳日:2023-04-26 21:32:18 公開日:2023-04-25

# PUNR:ニュースレコメンデーションのためのユーザ行動モデリングによる事前学習

PUNR: Pre-training with User Behavior Modeling for News Recommendation ( http://arxiv.org/abs/2304.12633v1 )

ライセンス: Link先を確認

Guangyuan Ma, Hongtao Liu, Xing Wu, Wanhui Qian, Zhepeng Lv, Qing Yang, Songlin Hu

News recommendation aims to predict click behaviors based on user behaviors. Effectively modeling user representations is the key to recommending preferred news. Existing works mostly focus on improving the supervised fine-tuning stage. However, there is still a lack of PLM-based unsupervised pre-training methods optimized for user representations. In this work, we propose an unsupervised pre-training paradigm with two tasks, i.e., user behavior masking and user behavior generation, both aimed at effective user behavior modeling. First, we introduce the user behavior masking pre-training task to recover masked user behaviors based on their contextual behaviors. In this way, the model can capture a much stronger and more comprehensive user news reading pattern. In addition, we incorporate a novel auxiliary user behavior generation pre-training task to enhance the user representation vector derived from the user encoder. We use the above pre-trained user modeling encoder to obtain news and user representations in downstream fine-tuning. Evaluations on a real-world news benchmark show significant performance improvements over existing baselines.

Translated: 2023-04-26 21:32:04 Published: 2023-04-25

# Explain like I am BM25: Interpreting a Dense Model's Ranked-List with a Sparse Approximation ( http://arxiv.org/abs/2304.12631v1 )

License: see the linked page

Michael Llordes, Debasis Ganguly, Sumit Bhatia and Chirag Agarwal

Neural retrieval models (NRMs) have been shown to outperform their statistical counterparts owing to their ability to capture semantic meaning via dense document representations. These models, however, suffer from poor interpretability as they do not rely on explicit term matching. As a form of local per-query explanation, we introduce the notion of equivalent queries, which are generated by maximizing the similarity between the NRM's results and the result set of a sparse retrieval system run with the equivalent query. We then compare this approach with existing methods such as RM3-based query expansion and contrast differences in retrieval effectiveness and in the terms generated by each approach.

Translated: 2023-04-26 21:31:50 Published: 2023-04-25
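The idea of an equivalent query can be illustrated with a small sketch. It is not the authors' implementation: the BM25 variant (Okapi with `k1=1.2`, `b=0.75` and a non-negative IDF), the Jaccard overlap used as the similarity measure, the toy corpus, the candidate queries, and the dense model's top results (`dense_top`) are all assumptions made for illustration.

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score each tokenized document in `docs` against `query_terms` with Okapi BM25."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            # Non-negative IDF variant (adds 1 inside the logarithm).
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

def top_k(scores, k):
    """Indices of the k highest-scoring documents, best first."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]

def jaccard(a, b):
    """Set overlap between two result lists (assumed similarity measure)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 1.0

# Toy corpus and a hypothetical top-2 result list from a dense model.
docs = [
    ["dense", "retrieval", "model"],
    ["sparse", "term", "matching"],
    ["bm25", "term", "weighting"],
]
dense_top = [1, 2]

# Pick the candidate query whose BM25 results best match the dense results.
candidates = [["term"], ["dense"], ["bm25", "matching"]]
best = max(candidates, key=lambda q: jaccard(top_k(bm25_scores(q, docs), 2), dense_top))
```

In this toy setting the selected query's terms act as the sparse, term-level explanation of what the dense model retrieved; a real search over candidate queries would need a much larger vocabulary and a proper optimization strategy.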

# Spatiotemporal Graph Convolutional Recurrent Neural Network Model for Citywide Air Pollution Forecasting ( http://arxiv.org/abs/2304.12630v1 )

License: see the linked page

Van-Duc Le

Citywide air pollution forecasting tries to precisely predict the air quality multiple hours ahead for the entire city. This task is challenging since air pollution varies in a spatiotemporal manner and depends on many complicated factors. Our previous research addressed the problem by treating the whole city as an image and leveraging a Convolutional Long Short-Term Memory (ConvLSTM) model to learn the spatiotemporal features. However, an image-based representation may not be ideal, as air pollution and other impact factors have natural graph structures. In this research, we argue that a Graph Convolutional Network (GCN) can efficiently represent the spatial features of air quality readings in the whole city. Specifically, we extend the ConvLSTM model to a Spatiotemporal Graph Convolutional Recurrent Neural Network (Spatiotemporal GCRNN) model by tightly integrating a GCN architecture into an RNN structure for efficient learning of the spatiotemporal characteristics of air quality values and their influential factors. Our extensive experiments show that the proposed model performs better than the state-of-the-art ConvLSTM model for air pollution prediction while using far fewer parameters. Moreover, our approach is also superior to a hybrid GCN-based method on a real-world air pollution dataset.

Translated: 2023-04-26 21:31:39 Published: 2023-04-25

# Predicting Angular-Momentum Waves Based on Yang-Mills Equation ( http://arxiv.org/abs/2304.12625v1 )

License: see the linked page

Xing-Yan Fan, Xiang-Ru Xie, and Jing-Ling Chen

As one of the most elegant theories in physics, Yang-Mills theory not only incorporates Maxwell's equations unifying the classical electromagnetic phenomena, but also underpins the standard model explaining the electroweak and strong interactions in a succinct way. As an Abelian $U(1)$ case, the electromagnetic field is the simplest classical solution of the Yang-Mills equation. Nevertheless, there is a paucity of studies of the simplest quantum situation, namely the consideration of the "magnetic" and "electric" fields in Maxwell's equations with non-Abelian potentials, which is exactly the focus of our present work. Akin to the electromagnetic waves predicted by Maxwell's equations, the quantum solution of the simplest Yang-Mills equation may predict SU(2) angular-momentum waves. Such angular-momentum waves could possibly be realized in experiments with oscillations of the spin angular momentum (such as the "spin Zitterbewegung" of Dirac's electron).

Translated: 2023-04-26 21:31:19 Published: 2023-04-25

# Shape-Net: Room Layout Estimation from Panoramic Images Robust to Occlusion using Knowledge Distillation with 3D Shapes as Additional Inputs ( http://arxiv.org/abs/2304.12624v1 )

License: see the linked page

Mizuki Tabata, Kana Kurata, Junichiro Tamamatsu

Estimating the layout of a room from a single-shot panoramic image is important in virtual/augmented reality and furniture layout simulation. This involves identifying three-dimensional (3D) geometry, such as the location of corners and boundaries, and performing 3D reconstruction. However, occlusion is a common issue that can negatively impact room layout estimation, and it has not been thoroughly studied to date. Because 3D shape information of rooms can be obtained as building drawings, and corner coordinates can be obtained from image datasets, we propose providing both 2D panoramic and 3D information to a model to deal effectively with occlusion. However, simply feeding 3D information to a model is not sufficient to utilize the shape information for an occluded area. Therefore, we improve the model by introducing a 3D Intersection over Union (IoU) loss to use 3D information effectively. In some cases, drawings are not available or the construction deviates from a drawing. Considering such practical cases, we propose a method for distilling knowledge from a model trained with both images and 3D information to a model that takes only images as input. The proposed model, which is called Shape-Net, achieves state-of-the-art (SOTA) performance on benchmark datasets. We also confirmed its effectiveness in dealing with occlusion through significantly improved accuracy on images with occlusion compared with existing models.

Translated: 2023-04-26 21:30:55 Published: 2023-04-25
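A 3D IoU loss can be sketched for the simplest possible case. The abstract does not say how room shapes are parameterized, so the axis-aligned box representation `(xmin, ymin, zmin, xmax, ymax, zmax)` and the loss form `1 - IoU` below are assumptions chosen for illustration, not the formulation used in Shape-Net.

```python
def box_volume(box):
    """Volume of an axis-aligned box (xmin, ymin, zmin, xmax, ymax, zmax)."""
    return (box[3] - box[0]) * (box[4] - box[1]) * (box[5] - box[2])

def iou_3d(box_a, box_b):
    """Intersection over Union of two axis-aligned 3D boxes."""
    inter = 1.0
    for axis in range(3):
        lo = max(box_a[axis], box_b[axis])
        hi = min(box_a[axis + 3], box_b[axis + 3])
        # Overlap along this axis is zero when the boxes are disjoint.
        inter *= max(0.0, hi - lo)
    union = box_volume(box_a) + box_volume(box_b) - inter
    return inter / union if union > 0 else 0.0

def iou_loss_3d(pred, target):
    """Loss that is 0 for a perfect match and 1 for disjoint boxes."""
    return 1.0 - iou_3d(pred, target)

# Two 2x2x2 rooms shifted by one unit along x share a 1x2x2 region.
predicted_room = (0.0, 0.0, 0.0, 2.0, 2.0, 2.0)
reference_room = (1.0, 0.0, 0.0, 3.0, 2.0, 2.0)
loss = iou_loss_3d(predicted_room, reference_room)
```

For training a neural network the same computation would have to be written with differentiable tensor operations; the pure-Python version only shows the geometry.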

# Quantum simulation of hadronic states with Rydberg-dressed atoms ( http://arxiv.org/abs/2304.12623v1 )

License: see the linked page

Zihan Wang, Feiyang Wang, Joseph Vovrosh, Johannes Knolle, Florian Mintert and Rick Mukherjee

The phenomenon of confinement is well known in high-energy physics and can also be realized for low-energy domain-wall excitations in one-dimensional quantum spin chains. A bound state consisting of two domain walls can behave like a meson, and in a recent work by Vovrosh et al. [PRX Quantum 3, 040309 (2022)], it was demonstrated that a pair of mesons could dynamically form a meta-stable confinement-induced bound state (consisting of four domain walls) akin to a hadronic state. However, the protocol discussed in Vovrosh et al. [PRX Quantum 3, 040309 (2022)] involves interactions with a characteristically non-monotonic distance dependence, which are not easy to come by in nature, thus posing a challenge for its experimental realization. In this regard, Rydberg atoms can provide the required platform for simulating confinement-related physics. We exploit the flexibility offered by interacting Rydberg-dressed atoms to engineer modified spin-spin interactions for the one-dimensional transverse-field Ising model. Our numerical simulations show how Rydberg-dressed interactions can give rise to a variety of effective potentials that are suitable for hadron formation, which opens the possibility of simulating confinement physics with Rydberg platforms as a viable alternative to current trapped-ion experiments.

Translated: 2023-04-26 21:30:13 Published: 2023-04-25

# Bias in Pruned Vision Models: In-Depth Analysis and Countermeasures ( http://arxiv.org/abs/2304.12622v1 )

License: see the linked page

Eugenia Iofinova, Alexandra Peste, Dan Alistarh

Pruning, that is, setting a significant subset of the parameters of a neural network to zero, is one of the most popular methods of model compression. Yet, several recent works have raised the issue that pruning may induce or exacerbate bias in the output of the compressed model. Despite existing evidence for this phenomenon, the relationship between neural network pruning and induced bias is not well understood. In this work, we systematically investigate and characterize this phenomenon in Convolutional Neural Networks for computer vision. First, we show that it is in fact possible to obtain highly sparse models, e.g., with less than 10% remaining weights, which neither decrease in accuracy nor substantially increase in bias when compared to dense models. At the same time, we also find that, at higher sparsities, pruned models exhibit higher uncertainty in their outputs, as well as increased correlations, which we directly link to increased bias. We propose easy-to-use criteria which, based only on the uncompressed model, establish whether bias will increase with pruning, and identify the samples most susceptible to biased predictions post-compression.

Translated: 2023-04-26 21:29:45 Published: 2023-04-25
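The definition of pruning given above (setting a subset of parameters to zero) can be made concrete with a minimal sketch. The abstract does not state which pruning criterion the authors use; global magnitude pruning over a flat list of weights is assumed here purely as a common, simple example, and the weight values are made up.

```python
def magnitude_prune(weights, sparsity):
    """Return a copy of `weights` with the smallest-magnitude fraction set to zero.

    `sparsity` is the fraction of weights to remove, between 0.0 and 1.0.
    """
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be between 0 and 1")
    n_prune = int(len(weights) * sparsity)
    # Indices ordered by absolute value, smallest first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned

# Hypothetical weights of a tiny layer.
layer = [0.5, -0.05, 0.3, -0.9, 0.01, 0.2, -0.4, 0.07, 0.6, -0.02]
half_sparse = magnitude_prune(layer, 0.5)
remaining_fraction = sum(1 for w in half_sparse if w != 0.0) / len(layer)
```

At 50% sparsity the five largest-magnitude weights survive; the "less than 10% remaining weights" regime mentioned in the abstract corresponds to a `sparsity` above 0.9.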

# Medical SAM Adapter: Adapting Segment Anything Model for Medical Image Segmentation ( http://arxiv.org/abs/2304.12620v1 )

License: see the linked page

Junde Wu and Rao Fu and Huihui Fang and Yuanpei Liu and Zhaowei Wang and Yanwu Xu and Yueming Jin and Tal Arbel

The Segment Anything Model (SAM) has recently gained popularity in the field of image segmentation. Thanks to its impressive capabilities in all-round segmentation tasks and its prompt-based interface, SAM has sparked intensive discussion within the community. Many prestigious experts have even said that the image segmentation task has been "finished" by SAM. However, medical image segmentation, although an important branch of the image segmentation family, does not seem to be included in the scope of Segmenting "Anything". Many individual experiments and recent studies have shown that SAM performs subpar on medical image segmentation. A natural question is how to find the missing piece of the puzzle to extend the strong segmentation capability of SAM to medical image segmentation. In this paper, we present a possible solution by fine-tuning the pretrained SAM model following the parameter-efficient fine-tuning paradigm with Adapter. Although this work is still one of only a few to transfer the popular NLP technique Adapter to computer vision, this simple implementation shows surprisingly good performance on medical image segmentation. A medical-image-adapted SAM, which we have dubbed Medical SAM Adapter (MSA), shows superior performance on 19 medical image segmentation tasks with various image modalities including CT, MRI, ultrasound images, fundus images, and dermoscopic images. MSA outperforms a wide range of state-of-the-art (SOTA) medical image segmentation methods, such as nnUNet, TransUNet, UNetr, and MedSegDiff. Code will be released at: https://github.com/WuJunde/Medical-SAM-Adapter.

Translated: 2023-04-26 21:29:25 Published: 2023-04-25

# Patch-based 3D Natural Scene Generation from a Single Example ( http://arxiv.org/abs/2304.12670v1 )

License: see the linked page

Weiyu Li, Xuelin Chen, Jue Wang, Baoquan Chen

We target a 3D generative model for general natural scenes that are typically unique and intricate. The lack of the necessary volumes of training data, along with the difficulty of devising ad hoc designs in the presence of varying scene characteristics, renders existing setups intractable. Inspired by classical patch-based image models, we advocate synthesizing 3D scenes at the patch level, given a single example. At the core of this work lie important algorithmic designs w.r.t. the scene representation and the generative patch nearest-neighbor module, which address unique challenges arising from lifting the classical 2D patch-based framework to 3D generation. These design choices, on a collective level, contribute to a robust, effective, and efficient model that can generate high-quality general natural scenes with both realistic geometric structure and visual appearance, in large quantities and varieties, as demonstrated on a variety of exemplar scenes.

Translated: 2023-04-26 21:21:41 Published: 2023-04-25

# Disagreement amongst counterfactual explanations: How transparency can be deceptive ( http://arxiv.org/abs/2304.12667v1 )

License: see the linked page

Dieter Brughmans, Lissa Melis, David Martens

Counterfactual explanations are increasingly used as an Explainable Artificial Intelligence (XAI) technique to provide stakeholders of complex machine learning algorithms with explanations for data-driven decisions. The popularity of counterfactual explanations has resulted in a boom in the algorithms generating them. However, not every algorithm creates uniform explanations for the same instance. Even though in some contexts multiple possible explanations are beneficial, there are circumstances where diversity amongst counterfactual explanations results in a potential disagreement problem among stakeholders. Ethical issues arise when, for example, malicious agents use this diversity to fairwash an unfair machine learning model by hiding sensitive features. As legislators worldwide tend to start including the right to explanations for data-driven, high-stakes decisions in their policies, these ethical issues should be understood and addressed. Our literature review on the disagreement problem in XAI reveals that this problem has never been empirically assessed for counterfactual explanations. Therefore, in this work, we conduct a large-scale empirical analysis, on 40 datasets, using 12 explanation-generating methods, for two black-box models, yielding over 192.0000 explanations. Our study finds alarmingly high disagreement levels between the methods tested. A malicious user is able to both exclude and include desired features when multiple counterfactual explanations are available. This disagreement seems to be driven mainly by the dataset characteristics and the type of counterfactual algorithm. XAI centers on the transparency of algorithmic decision-making, but our analysis advocates for transparency about this self-proclaimed transparency.

Translated: 2023-04-26 21:21:24 Published: 2023-04-25

# Bayesian Optimization Meets Self-Distillation ( http://arxiv.org/abs/2304.12666v1 )

License: see the linked page

HyunJae Lee, Heon Song, Hyeonsoo Lee, Gi-hyeon Lee, Suyeong Park and Donggeun Yoo

Bayesian optimization (BO) has contributed greatly to improving model performance by iteratively suggesting promising hyperparameter configurations based on observations from multiple training trials. However, only partial knowledge (i.e., the measured performances of trained models and their hyperparameter configurations) from previous trials is transferred. On the other hand, Self-Distillation (SD) only transfers partial knowledge learned by the task model itself. To fully leverage the various knowledge gained from all training trials, we propose the BOSS framework, which combines BO and SD. BOSS suggests promising hyperparameter configurations through BO and carefully selects pre-trained models from previous trials for SD, which are otherwise abandoned in the conventional BO process. BOSS achieves significantly better performance than both BO and SD in a wide range of tasks including general image classification, learning with noisy labels, semi-supervised learning, and medical image analysis tasks.

Translated: 2023-04-26 21:20:58 Published: 2023-04-25

# Dynamic Video Frame Interpolation with integrated Difficulty Pre-Assessment ( http://arxiv.org/abs/2304.12664v1 )

License: see the linked page

Ban Chen, Xin Jin, Youxin Chen, Longhai Wu, Jie Chen, Jayoon Koo, Cheul-hee Hahm

Video frame interpolation (VFI) has witnessed great progress in recent years. However, existing VFI models still struggle to achieve a good trade-off between accuracy and efficiency: fast models often have inferior accuracy, while accurate models typically run slowly. Easy samples with small motion or clear texture, on the other hand, can achieve competitive results with simple models and do not require heavy computation. In this paper, we present an integrated pipeline that combines difficulty assessment with video frame interpolation. Specifically, it first leverages a pre-assessment model to measure the interpolation difficulty level of input frames, and then dynamically selects an appropriate VFI model to generate interpolation results. Furthermore, a large-scale VFI difficulty assessment dataset is collected and annotated to train our pre-assessment model. Extensive experiments show that easy samples pass through fast models while difficult samples are inferred with heavy models, and that our proposed pipeline can improve the accuracy-efficiency trade-off for VFI.

Translated: 2023-04-26 21:20:43 Published: 2023-04-25
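The two-stage pipeline described above (assess difficulty first, then pick a model) can be sketched as a simple router. Everything concrete here is an assumption for illustration: the paper trains a pre-assessment model, whereas this sketch uses the mean absolute pixel difference as a stand-in difficulty score, and `fast_model`, `heavy_model`, and the threshold value are hypothetical placeholders.

```python
def mean_abs_diff(frame_a, frame_b):
    """Stand-in difficulty score: mean absolute pixel difference of two frames.

    Frames are lists of rows of grayscale pixel values with equal shapes.
    """
    total = 0.0
    count = 0
    for row_a, row_b in zip(frame_a, frame_b):
        for a, b in zip(row_a, row_b):
            total += abs(a - b)
            count += 1
    return total / count

def interpolate(frame_a, frame_b, fast_model, heavy_model, threshold=10.0):
    """Route easy frame pairs to the fast model and hard pairs to the heavy one."""
    difficulty = mean_abs_diff(frame_a, frame_b)
    model = fast_model if difficulty < threshold else heavy_model
    return model(frame_a, frame_b)

# Placeholder models that only report which branch was taken.
def fast_model(frame_a, frame_b):
    return "fast"

def heavy_model(frame_a, frame_b):
    return "heavy"

still = [[0, 0], [0, 0]]
small_motion = [[2, 2], [2, 2]]
large_motion = [[50, 50], [50, 50]]
easy_route = interpolate(still, small_motion, fast_model, heavy_model)
hard_route = interpolate(still, large_motion, fast_model, heavy_model)
```

The design choice worth noting is that the assessment step must itself be cheap; otherwise its cost cancels the savings from skipping the heavy model on easy samples.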

# A Multi-Task Approach to Robust Deep Reinforcement Learning for Resource Allocation ( http://arxiv.org/abs/2304.12660v1 )

License: see the linked page

Steffen Gracla, Carsten Bockelmann, Armin Dekorsy

With the increasing complexity of modern communication systems, machine learning algorithms have become a focal point of research. However, performance demands have tightened in parallel with complexity. For some of the key applications targeted by future wireless systems, such as the medical field, strict and reliable performance guarantees are essential, but vanilla machine learning methods have been shown to struggle with these types of requirements. Therefore, the question arises whether these methods can be extended to better deal with the demands imposed by such applications. In this paper, we look at a combinatorial resource allocation challenge with rare, significant events which must be handled properly. We propose to treat this as a multi-task learning problem, select two methods from this domain, Elastic Weight Consolidation and Gradient Episodic Memory, and integrate them into a vanilla actor-critic scheduler. We compare their performance in dealing with Black Swan Events with the state-of-the-art approach of augmenting the training data distribution and report that the multi-task approach proves highly effective.

Translated: 2023-04-26 21:20:25 Published: 2023-04-25

# CoDi: Co-evolving Contrastive Diffusion Models for Mixed-type Tabular Synthesis ( http://arxiv.org/abs/2304.12654v1 )

License: see the linked page

Chaejeong Lee, Jayoung Kim, Noseong Park

With growing attention to tabular data these days, attempts to apply synthetic tables to various tasks have expanded toward a range of scenarios. Owing to recent advances in generative modeling, fake data generated by tabular data synthesis models has become sophisticated and realistic. However, modeling the discrete variables (columns) of tabular data remains difficult. In this work, we propose to process continuous and discrete variables separately (but conditioned on each other) with two diffusion models. The two diffusion models co-evolve during training by reading conditions from each other. Moreover, to further bind the diffusion models, we introduce a contrastive learning method with a negative sampling method. In our experiments with 11 real-world tabular datasets and 8 baseline methods, we demonstrate the efficacy of the proposed method, called CoDi.

Translated: 2023-04-26 21:20:07 Published: 2023-04-25

# Partially Observable Mean Field Multi-Agent Reinforcement Learning Based on Graph-Attention ( http://arxiv.org/abs/2304.12653v1 )

License: see the linked page

Min Yang, Guanjun Liu, Ziyuan Zhou

Traditional multi-agent reinforcement learning algorithms are difficult to apply in large-scale multi-agent environments. The introduction of mean field theory has enhanced the scalability of multi-agent reinforcement learning in recent years. This paper considers partially observable multi-agent reinforcement learning (MARL), where each agent can only observe other agents within a fixed range. This partial observability affects the agent's ability to assess the quality of the actions of surrounding agents. This paper focuses on developing a method to capture more effective information from local observations in order to select more effective actions. Previous work in this field employs probability distributions or a weighted mean field to update the average actions of neighborhood agents, but it does not fully consider the feature information of surrounding neighbors and leads to a local optimum. In this paper, we propose a novel multi-agent reinforcement learning algorithm, Partially Observable Mean Field Multi-Agent Reinforcement Learning based on Graph-Attention (GAMFQ), to remedy this flaw. GAMFQ uses a graph attention module and a mean field module to describe how an agent is influenced by the actions of other agents at each time step. The graph attention module consists of a graph attention encoder and a differentiable attention mechanism, and this mechanism outputs a dynamic graph that represents the effectiveness of neighborhood agents with respect to central agents. The mean-field module approximates the effect of a neighborhood agent on a central agent as the average effect of effective neighborhood agents. We evaluate GAMFQ on three challenging tasks in the MAgents framework. Experiments show that GAMFQ outperforms baselines including the state-of-the-art partially observable mean-field reinforcement learning algorithms.

Translated: 2023-04-26 21:19:54 Published: 2023-04-25
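The mean-field step described above, averaging only over neighbors judged effective, can be sketched in a few lines. This is an illustration under stated assumptions, not GAMFQ itself: in the paper the effectiveness graph comes from a learned attention mechanism, while here it is passed in as a hypothetical boolean mask (`effective`), and actions are assumed to be discrete indices encoded one-hot.

```python
def mean_action(neighbor_actions, effective, n_actions):
    """Average one-hot action over the neighbors flagged as effective.

    neighbor_actions: list of discrete action indices, one per neighbor.
    effective: list of booleans of the same length (assumed to be given).
    Returns a length-`n_actions` distribution, or all zeros if no neighbor
    is flagged as effective.
    """
    selected = [a for a, keep in zip(neighbor_actions, effective) if keep]
    mean = [0.0] * n_actions
    if not selected:
        return mean
    for action in selected:
        mean[action] += 1.0 / len(selected)
    return mean

# Four observed neighbors; the last one is masked out as ineffective.
actions = [0, 2, 2, 1]
mask = [True, True, True, False]
avg = mean_action(actions, mask, n_actions=3)
```

The resulting vector is what a mean-field Q-function would take as its second input in place of the joint action of all neighbors.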

# Hybrid Neural Rendering for Large-Scale Scenes with Motion Blur ( http://arxiv.org/abs/2304.12652v1 )

License: see the linked page

Peng Dai, Yinda Zhang, Xin Yu, Xiaoyang Lyu, Xiaojuan Qi

Rendering novel view images is highly desirable for many applications. Despite recent progress, it remains challenging to render high-fidelity and view-consistent novel views of large-scale scenes from in-the-wild images with inevitable artifacts (e.g., motion blur). To this end, we develop a hybrid neural rendering model that makes image-based representation and neural 3D representation join forces to render high-quality, view-consistent images. In addition, images captured in the wild inevitably contain artifacts, such as motion blur, which deteriorate the quality of rendered images. Accordingly, we propose strategies to simulate blur effects on the rendered images to mitigate the negative influence of blurry images, and to reduce the importance of such images during training based on precomputed quality-aware weights. Extensive experiments on real and synthetic data demonstrate that our model surpasses state-of-the-art point-based methods for novel view synthesis. The code is available at https://daipengwa.github.io/Hybrid-Rendering-ProjectPage.

Translated: 2023-04-26 21:19:32 Published: 2023-04-25

# Q-based Equilibria ( http://arxiv.org/abs/2304.12647v1 )

License: see the linked page

Olivier Compte (Paris School of Economics)

In dynamic environments, Q-learning is an adaptive rule that provides an estimate (a Q-value) of the continuation value associated with each alternative. A naive policy consists of always choosing the alternative with the highest Q-value. We consider a family of Q-based policy rules that may systematically favor some alternatives over others, for example rules that incorporate a leniency bias that favors cooperation. In the spirit of Compte and Postlewaite [2018], we look for equilibrium biases (or Qb-equilibria) within this family of Q-based rules. We examine classic games under various monitoring technologies.

Translated: 2023-04-26 21:19:11 Published: 2023-04-25
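The contrast between the naive policy and a biased Q-based rule can be shown with a short sketch. The abstract does not specify how the bias enters the rule or how Q-values are updated, so the additive bias term, the simple running-average update, the step size, and the numbers below are all assumptions made for illustration.

```python
def naive_choice(q_values):
    """Naive policy: pick the alternative with the highest Q-value."""
    return max(q_values, key=q_values.get)

def biased_choice(q_values, bias):
    """Q-based rule with an additive bias favoring some alternatives."""
    return max(q_values, key=lambda a: q_values[a] + bias.get(a, 0.0))

def q_update(q_values, action, observed_value, step=0.1):
    """Move the Q-value of `action` a fraction `step` toward the observed value."""
    q_values[action] += step * (observed_value - q_values[action])

# Hypothetical Q-values in a repeated cooperation game.
q = {"cooperate": 1.0, "defect": 1.2}
leniency = {"cooperate": 0.3}

unbiased_pick = naive_choice(q)
lenient_pick = biased_choice(q, leniency)

# After a disappointing payoff from defecting, its Q-value drifts down.
q_update(q, "defect", observed_value=0.2)
```

With these numbers the naive rule defects while the lenient rule cooperates, which is the kind of systematic tilt the family of Q-based rules is meant to capture; which bias survives in equilibrium is the question the paper studies.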

# Change detection needs change information: improving deep 3D point cloud change detection ( http://arxiv.org/abs/2304.12639v1 )

License: see the linked page

Iris de Gélis (1 and 2), Thomas Corpetti (3) and Sébastien Lefèvre (2) ((1) Magellium, (2) Institut de Recherche en Informatique et Systèmes Aléatoires IRISA - UMR 6074 - Université Bretagne Sud, (3) Littoral - Environnement - Télédétection - Géomatique LETG - UMR 6554 - Université Rennes 2)

(参考訳) 変更検出は、特にマルチテンポラリデータに関して、変更領域を迅速に識別する重要なタスクである。都市環境などの複雑な地形では、垂直情報は変化をハイライトするだけでなく、異なるカテゴリーに分類するために非常に有用な知識であることが判明した。本稿では,生の3dポイントクラウド(pcs)を用いて,ラスタ化プロセスによる情報の損失を回避するために,変更セグメント化に着目した。近年,ディープ・ラーニングがシームズ・ネットワークを通じて情報を符号化することで,このタスクの有効性を証明しているが,本研究では,ディープ・ネットワークの初期段階における変更情報の利用についても検討する。そこで我々はまず,手作り機能,特に変更関連機能を備えたSiamese KPConv State-of-The-Art(SoTA)ネットワークを提案する。これにより、変化のクラスに対するIoU(Intersection over Union)の平均は4.70 %向上する。変更関連機能により大きな改善が得られたことを考慮し、oneconvfusion、triplet kpconv、エンコーダfusion siamkpconvという3d pcs変更セグメンテーションに対応する3つの新しいアーキテクチャを提案する。 3つのネットワークは、初期段階における変更情報を考慮しており、SoTA法より優れている。特に、最後のネットワークであるEncoder Fusion SiamKPConvは、変更検出タスクのための変更情報にネットワークを集中させることの価値を強調した変更クラスよりも、IoUの平均の5%以上でSoTAを追い越している。

Change detection is an important task for rapidly identifying modified areas, in particular when multi-temporal data are concerned. In landscapes with complex geometry such as urban environments, vertical information turns out to be very useful, not only to highlight changes but also to classify them into different categories. In this paper, we focus on change segmentation directly using raw 3D point clouds (PCs), to avoid any loss of information due to rasterization processes. While deep learning has recently proved its effectiveness for this particular task by encoding the information through Siamese networks, we investigate here the idea of also using change information in the early steps of deep networks. To do this, we first propose to provide the Siamese KPConv State-of-The-Art (SoTA) network with hand-crafted features and especially a change-related one. This improves the mean of Intersection over Union (IoU) over classes of change by 4.70%. Considering that the major improvement was obtained thanks to the change-related feature, we propose three new architectures to address 3D PCs change segmentation: OneConvFusion, Triplet KPConv, and Encoder Fusion SiamKPConv. All three networks take into account change information in early steps and outperform SoTA methods. In particular, the last network, entitled Encoder Fusion SiamKPConv, outperforms SoTA by more than 5% in mean IoU over classes of change, emphasizing the value of having the network focus on change information for the change detection task.

翻訳日:2023-04-26 21:19:02 公開日:2023-04-25
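
The metric reported in the abstract above, the mean of IoU over classes, can be illustrated with a short sketch. The per-point label lists and the convention of skipping classes that never occur are assumptions made for this example; the paper's exact evaluation protocol is not described in the abstract.

```python
def mean_iou(y_true, y_pred, classes):
    """Mean Intersection over Union across the given classes.

    Assumption: classes absent from both y_true and y_pred are skipped.
    """
    ious = []
    for c in classes:
        # Points where both label and prediction equal c.
        intersection = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        # Points where either label or prediction equals c.
        union = sum(1 for t, p in zip(y_true, y_pred) if t == c or p == c)
        if union > 0:
            ious.append(intersection / union)
    return sum(ious) / len(ious) if ious else 0.0


# Hypothetical per-point labels: 0 = unchanged, 1 = changed.
labels = [0, 0, 1, 1]
predictions = [0, 1, 1, 1]
score = mean_iou(labels, predictions, classes=[0, 1])
```

Here class 0 has IoU 1/2 and class 1 has IoU 2/3, so the mean is their average.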

# Phylo2Vec:バイナリツリーのベクトル表現

Phylo2Vec: a vector representation for binary trees ( http://arxiv.org/abs/2304.12693v1 )

ライセンス: Link先を確認

Matthew J Penn, Neil Scheidwasser, Mark P Khurana, David A Duchêne, Christl A Donnelly, Samir Bhatt

(参考訳) 生物学的データから推定される二分系統樹は、生物の共有進化の歴史を理解するうえで中心的な役割を果たす。任意の最適性基準(例えば最尤)による木内の潜在ノード配置の推論はNP困難な問題であり、無数のヒューリスティックなアプローチの発展を促している。しかし、これらのヒューリスティックは、ランダムな木を一様にサンプリングしたり、階乗的に増大する木空間を効果的に探索したりする体系的な手段を欠いていることが多く、これらは機械学習などの最適化問題において重要である。そこで本研究では、系統樹の新しい簡潔な表現であるPhylo2Vecを提案する。 Phylo2Vecは、$n$の葉を持つ任意の二分木を長さ$n$の整数ベクトルにマッピングする。我々はPhylo2Vecが well-defined であり、系統樹の空間に対して全単射であることを証明する。 Phylo2Vecの利点は2つある。i) 二分木を容易に一様サンプリングできること、ii) 非常に大きな、あるいは小さな跳躍で木空間を体系的に移動できることである。概念実証として、Phylo2Vecを用いて5つの実世界のデータセットで最尤推定を行い、単純な山登り法に基づく最適化がランダムな木から最適な木へと広大な木空間を効率的に移動することを示す。

Binary phylogenetic trees inferred from biological data are central to understanding the shared evolutionary history of organisms. Inferring the placement of latent nodes in a tree by any optimality criterion (e.g., maximum likelihood) is an NP-hard problem, propelling the development of myriad heuristic approaches. Yet, these heuristics often lack a systematic means of uniformly sampling random trees or effectively exploring a tree space that grows factorially, which are crucial to optimisation problems such as machine learning. Accordingly, we present Phylo2Vec, a new parsimonious representation of a phylogenetic tree. Phylo2Vec maps any binary tree with $n$ leaves to an integer vector of length $n$. We prove that Phylo2Vec is both well-defined and bijective to the space of phylogenetic trees. The advantages of Phylo2Vec are twofold: i) easy uniform sampling of binary trees and ii) systematic ability to traverse tree space in very large or small jumps. As a proof of concept, we use Phylo2Vec for maximum likelihood inference on five real-world datasets and show that a simple hill climbing-based optimisation efficiently traverses the vastness of tree space from a random to an optimal tree.

翻訳日:2023-04-26 21:13:30 公開日:2023-04-25
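
To give a feel for the "tree space that grows factorially" mentioned in the abstract above, the sketch below counts rooted, leaf-labelled binary trees using the standard double-factorial formula $(2n-3)!!$. Treating the trees as rooted and leaf-labelled is an assumption of this example; the code does not implement the Phylo2Vec encoding itself.

```python
def num_rooted_binary_trees(n_leaves):
    """Number of rooted, leaf-labelled binary trees with n_leaves leaves: (2n-3)!!."""
    if n_leaves < 1:
        raise ValueError("n_leaves must be at least 1")
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n-3 (empty product for n <= 2).
    for k in range(3, 2 * n_leaves - 2, 2):
        count *= k
    return count


# Growth of the search space for a few hypothetical leaf counts.
sizes = {n: num_rooted_binary_trees(n) for n in (2, 3, 4, 5, 10)}
```

Even at 10 leaves the count already exceeds 34 million, which is why exhaustive search is impractical and a compact vector representation is attractive.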

# 量子スキームによる古典的相関の生成

The Generations of Classical Correlations via Quantum Schemes ( http://arxiv.org/abs/2304.12690v1 )

ライセンス: Link先を確認

Zhenyu Chen and Lijinzhi Lin and Xiaodie Lin and Zhaohui Wei and Penghui Yao

(参考訳) アリスとボブの2つの分離したパーティが2部的な量子状態またはシードと呼ばれる古典的な相関を共有し、シード上で局所的な量子または古典演算を行うことで、ターゲットとなる古典的相関を生成しようとすると仮定する。 Alice と Bob が対象とする古典的相関を生成するために与えられた種を使うことができるかどうか。この問題にはリッチな数学的構造があることが示される。まず、種が純粋な二成分状態であっても、上記の決定問題はnp困難であり、種が古典的相関である場合にも同様の結論を導くことができ、この問題は一般に解くのが困難であることを示す。さらに, シードが純粋な量子状態である場合, 対象の古典的相関がシード純状態と一致する正の半定値分解の正の形式を持つかどうかを突き止め, 現在の問題と最適化理論の興味深い関係を明らかにした。この観測および他の知見に基づいて、ターゲットの古典的相関を生成するために、シード純状態が満たさなければならないいくつかの必要条件を与え、これらの条件は、シードが混合量子状態である場合にも一般化できることを示した。最後に、正の半定値分解の正の形式が問題を解く上で重要な役割を担っているため、任意の古典的相関を計算できるアルゴリズムを開発し、テストするケースで十分な性能を発揮する。

Suppose two separated parties, Alice and Bob, share a bipartite quantum state or a classical correlation called a seed, and they try to generate a target classical correlation by performing local quantum or classical operations on the seed, i.e., no communication is allowed. We consider the following fundamental problem about this setting: whether Alice and Bob can use a given seed to generate a target classical correlation. We show that this problem has rich mathematical structures. Firstly, we prove that even if the seed is a pure bipartite state, the above decision problem is already NP-hard and a similar conclusion can also be drawn when the seed is also a classical correlation, implying that this problem is hard to solve generally. Furthermore, we prove that when the seed is a pure quantum state, solving the problem is equivalent to finding out whether the target classical correlation has some canonical form of positive semi-definite factorizations that matches the seed pure state, revealing an interesting connection between the current problem and optimization theory. Based on this observation and other insights, we give several necessary conditions that the seed pure state has to satisfy to generate the target classical correlation, and it turns out that these conditions can also be generalized to the case that the seed is a mixed quantum state. Lastly, since canonical forms of positive semi-definite factorizations play a crucial role in solving the problem, we develop an algorithm that can compute them for an arbitrary classical correlation, which has decent performance on the cases we test.

翻訳日:2023-04-26 21:13:12 公開日:2023-04-25

# 意味, 言語モデル, 理解不能なホラーの計算について

On the Computation of Meaning, Language Models and Incomprehensible Horrors ( http://arxiv.org/abs/2304.12686v1 )

ライセンス: Link先を確認

Michael Timothy Bennett

(参考訳) 我々は、意味、コミュニケーション、シンボル出現に関する包括的機械論的説明を提供するために、意味の基礎理論と人工知能(agi)の数学的形式理論を統合する。この合成は、プラグマティクス、論理的真理条件意味論、パーセアン・セミオティックスを統一し、伝統的に機械的説明を避けてきた現象に対処する計算可能モデルとして、AGIと言語の性質に関するより広範な議論の両方に重要である。機械が有意義な発話や人間の意味を理解することができる条件を調べることにより、現在の言語モデルでは、人間と同じ意味の理解を持たず、その応答に特徴付けるような意味も持たないことを示す。そこで本研究では,人間の感情をシミュレーションし,弱表現を構築するための最適化モデルを提案する。我々の発見は、意味と知性の関係と、意味を理解して意図する機械を構築する方法に光を当てた。

We integrate foundational theories of meaning with a mathematical formalism of artificial general intelligence (AGI) to offer a comprehensive mechanistic explanation of meaning, communication, and symbol emergence. This synthesis holds significance for both AGI and broader debates concerning the nature of language, as it unifies pragmatics, logical truth conditional semantics, Peircean semiotics, and a computable model of enactive cognition, addressing phenomena that have traditionally evaded mechanistic explanation. By examining the conditions under which a machine can generate meaningful utterances or comprehend human meaning, we establish that the current generation of language models do not possess the same understanding of meaning as humans nor intend any meaning that we might attribute to their responses. To address this, we propose simulating human feelings and optimising models to construct weak representations. Our findings shed light on the relationship between meaning and intelligence, and how we can build machines that comprehend and intend meaning.

翻訳日:2023-04-26 21:12:43 公開日:2023-04-25

# 自己監督型シングルフレームと多フレーム深度推定の相互影響の探索

Exploring the Mutual Influence between Self-Supervised Single-Frame and Multi-Frame Depth Estimation ( http://arxiv.org/abs/2304.12685v1 )

ライセンス: Link先を確認

Jie Xiang, Yun Wang, Lifeng An, Haiyang Liu and Jian Liu

(参考訳) 自己教師付きシングルフレームとマルチフレーム深度推定のどちらの手法もトレーニングのためにラベル付きモノクロビデオを必要とするが、それらが利用する情報は様々である。単フレーム法と多フレーム法の相補的な情報を考えると、多フレーム深度を改善するために単フレーム深度を活用しようとする研究もある。しかし、この手法では、単一フレーム深さと多フレーム深さの違いを生かさず、多フレーム深さを改善したり、複数フレーム深さを最適化したりすることはできない。シングルフレームとマルチフレームの相互影響をフル活用するために,新しい自己教師型トレーニングフレームワークを提案する。具体的には,まず,単一フレーム深度に誘導された画素方向適応深度サンプリングモジュールを導入し,マルチフレームモデルを訓練する。次に, 最小再プロジェクションに基づく蒸留損失を活用し, 知識をマルチフレーム深度ネットワークからシングルフレームネットワークに移し, シングルフレーム深度を改善する。最後に,改良された単一フレーム深度を,複数フレーム深度推定の性能をさらに向上させる前兆とみなす。 kitti と cityscapes のデータセットにおける実験結果から,本手法は自己教師付き単眼環境における既存手法よりも優れていることが示された。

Although both self-supervised single-frame and multi-frame depth estimation methods only require unlabeled monocular videos for training, the information they leverage varies because single-frame methods mainly rely on appearance-based features while multi-frame methods focus on geometric cues. Considering the complementary information of single-frame and multi-frame methods, some works attempt to leverage single-frame depth to improve multi-frame depth. However, these methods can neither exploit the difference between single-frame depth and multi-frame depth to improve multi-frame depth nor leverage multi-frame depth to optimize single-frame depth models. To fully utilize the mutual influence between single-frame and multi-frame methods, we propose a novel self-supervised training framework. Specifically, we first introduce a pixel-wise adaptive depth sampling module guided by single-frame depth to train the multi-frame model. Then, we leverage the minimum reprojection based distillation loss to transfer the knowledge from the multi-frame depth network to the single-frame network to improve single-frame depth. Finally, we regard the improved single-frame depth as a prior to further boost the performance of multi-frame depth estimation. Experimental results on the KITTI and Cityscapes datasets show that our method outperforms existing approaches in the self-supervised monocular setting.

翻訳日:2023-04-26 21:12:25 公開日:2023-04-25

# ドキュメンテーション:リアルタイムスクリーンカメラのロバストなドキュメンテーション

Docmarking: Real-Time Screen-Cam Robust Document Image Watermarking ( http://arxiv.org/abs/2304.12682v1 )

ライセンス: Link先を確認

Aleksey Yakushev, Yury Markin, Dmitry Obydenkov, Alexander Frolov, Stas Fomin, Manuk Akopyan, Alexander Kozachok, Arthur Gaynov

(参考訳) 本稿では,スクリーン写真の形での機密文書漏洩の調査に焦点をあてる。提案されたアプローチは、そもそもリークを防ぐのではなく、リークのソースを決定することを目的としている。方法は、スクリーンに透かしを半透明の画像と識別することで、人間の目にはほとんど認識できない。ウォーターマーク画像は静止状態であり、常にスクリーン上に残されているため、スクリーンの撮影された写真ごとにウォーターマークが存在する。このアプローチの重要なコンポーネントは3つのニューラルネットワークである。第1のネットワークは、この画像が画面上に表示されるとほとんど見えないように、埋め込みメッセージ付き画像を生成する。他の2つのニューラルネットワークは、組み込みメッセージを高精度に取得するために使用される。開発手法は異なるスクリーンとカメラで総合的にテストされた。実験の結果,提案手法は高い効率を示した。

This paper focuses on the investigation of confidential document leaks in the form of screen photographs. The proposed approach does not try to prevent a leak in the first place but rather aims to determine the source of the leak. The method works by applying to the screen a unique identifying watermark as a semi-transparent image that is almost imperceptible to the human eye. The watermark image is static and stays on the screen all the time, so the watermark is present in every captured photograph of the screen. The key components of the approach are three neural networks. The first network generates an image with an embedded message in such a way that this image is almost invisible when displayed on the screen. The other two neural networks are used to retrieve the embedded message with high accuracy. The developed method was comprehensively tested on different screens and cameras. Test results showed the high efficiency of the proposed approach.

翻訳日:2023-04-26 21:11:59 公開日:2023-04-25

# 分散ロバスト最適化による微分プライバシー

Differential Privacy via Distributionally Robust Optimization ( http://arxiv.org/abs/2304.12681v1 )

ライセンス: Link先を確認

Aras Selvi and Huikang Liu and Wolfram Wiesemann

(参考訳) 近年、データセットの統計情報を共有するためのデファクトスタンダードとしてディファレンシャルプライバシが登場し、関連する個人に関する個人情報の開示が制限されている。これは、公開する統計をランダムに摂動させることによって達成され、結果として、プライバシの正確さのトレードオフにつながる: より大きな摂動は、より強力なプライバシー保証を提供するが、それらは、受信者に対して低いユーティリティを提供する、正確さの少ない統計をもたらす。したがって、特に興味を持つのは、選択されたプライバシーレベルに対して最高の精度を提供する最適なメカニズムである。これまで、この分野の研究は、事前の摂動の族を特定し、その漸近的および/または最良クラスの最適性を証明することに重点を置いてきた。本稿では,非漸近的かつ非条件的最適性を保証するメカニズムのクラスを開発する。この目的のために、無限次元分布ロバスト最適化問題として機構設計問題を定式化する。この問題には強い双対性が与えられ、この双対性を利用して有限次元上界および下界問題の収束階層を構築する。上界 (primal) は実装可能な摂動に対応しており、その準最適性は下界 (dual) で有界である。両方の境界問題は、固有の問題構造を利用する切断平面技術によって数秒以内に解決できる。数値実験により,我々の摂動は,標準ベンチマーク問題と同様に人工物に関する文献のこれまでの最良の結果を上回ることができることを示した。

In recent years, differential privacy has emerged as the de facto standard for sharing statistics of datasets while limiting the disclosure of private information about the involved individuals. This is achieved by randomly perturbing the statistics to be published, which in turn leads to a privacy-accuracy trade-off: larger perturbations provide stronger privacy guarantees, but they result in less accurate statistics that offer lower utility to the recipients. Of particular interest are therefore optimal mechanisms that provide the highest accuracy for a pre-selected level of privacy. To date, work in this area has focused on specifying families of perturbations a priori and subsequently proving their asymptotic and/or best-in-class optimality. In this paper, we develop a class of mechanisms that enjoy non-asymptotic and unconditional optimality guarantees. To this end, we formulate the mechanism design problem as an infinite-dimensional distributionally robust optimization problem. We show that the problem affords a strong dual, and we exploit this duality to develop converging hierarchies of finite-dimensional upper and lower bounding problems. Our upper (primal) bounds correspond to implementable perturbations whose suboptimality can be bounded by our lower (dual) bounds. Both bounding problems can be solved within seconds via cutting plane techniques that exploit the inherent problem structure. Our numerical experiments demonstrate that our perturbations can outperform the previously best results from the literature on artificial as well as standard benchmark problems.

翻訳日:2023-04-26 21:11:50 公開日:2023-04-25
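
The privacy-accuracy trade-off described in the abstract above can be seen with the classical Laplace mechanism, one of the a priori perturbation families that this line of work compares against. The sketch below is not the optimized mechanism from the paper; the function name and parameters are illustrative, and the noise is sampled as the difference of two exponential variates, which follows a Laplace distribution.

```python
import random


def laplace_mechanism(true_value, sensitivity, epsilon, rng=random):
    """Release true_value plus Laplace noise of scale sensitivity / epsilon."""
    if epsilon <= 0 or sensitivity <= 0:
        raise ValueError("epsilon and sensitivity must be positive")
    rate = epsilon / sensitivity  # rate of each exponential = 1 / Laplace scale
    # The difference of two i.i.d. exponentials is Laplace-distributed.
    noise = rng.expovariate(rate) - rng.expovariate(rate)
    return true_value + noise


def mean_abs_error(epsilon, trials=20000, seed=0):
    """Empirical mean absolute error of the released statistic."""
    rng = random.Random(seed)
    return sum(abs(laplace_mechanism(0.0, 1.0, epsilon, rng)) for _ in range(trials)) / trials


# Smaller epsilon (stronger privacy) gives a larger expected error.
error_strong_privacy = mean_abs_error(epsilon=0.1)
error_weak_privacy = mean_abs_error(epsilon=1.0)
```

The expected absolute error equals the noise scale, so dividing epsilon by ten multiplies the error by roughly ten.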

# 付加ガウス雑音下における通信制約帯域

Communication-Constrained Bandits under Additive Gaussian Noise ( http://arxiv.org/abs/2304.12680v1 )

ライセンス: Link先を確認

Prathamesh Mayekar, Jonathan Scarlett, and Vincent Y.F. Tan

(参考訳) そこで本研究では,クライアントが学習者に,対応するアームプルに対する報奨に基づいてコミュニケーション制約付きフィードバックを提供する分散確率的多腕バンディットについて検討する。私たちの設定では、クライアントは、エンコードされた報酬の第二のモーメントが$p$以下であるように、報酬をエンコードする必要があります。この設定のために、情報理論的な下限 $\omega\left(\sqrt{\frac{kt}{\mathtt{snr} \wedge1}} \right)$ が任意のスキームのミニマックス後悔に基づいて導出され、ここで $ \mathtt{snr} := \frac{p}{\sigma^2}$, $k$ と $t$ はそれぞれ腕数と時間軸数である。さらに、この下限を小さな加法係数にマッチさせるマルチフェーズ帯域幅アルゴリズム、$\mathtt{UE\text{-}UCB++}$を提案する。 $\mathtt{UE\text{-}UCB++}$は初期フェーズで一様探索を行い、最終フェーズで {\em upper confidence bound }(UCB)banditアルゴリズムを利用する。 $\mathtt{UE\text{-}UCB++}$の興味深い特徴は、一様探索フェーズで生成された平均報酬の粗い推定が、次のフェーズで符号化プロトコルを洗練させ、その後のフェーズにおける報酬のより正確な平均見積もりをもたらすことである。この正の補強サイクルは、均一な探査ラウンドの数を減らし、我々の下界と密接に一致する。

We study a distributed stochastic multi-armed bandit where a client supplies the learner with communication-constrained feedback based on the rewards for the corresponding arm pulls. In our setup, the client must encode the rewards such that the second moment of the encoded rewards is no more than $P$, and this encoded reward is further corrupted by additive Gaussian noise of variance $\sigma^2$; the learner only has access to this corrupted reward. For this setting, we derive an information-theoretic lower bound of $\Omega\left(\sqrt{\frac{KT}{\mathtt{SNR} \wedge 1}} \right)$ on the minimax regret of any scheme, where $\mathtt{SNR} := \frac{P}{\sigma^2}$, and $K$ and $T$ are the number of arms and time horizon, respectively. Furthermore, we propose a multi-phase bandit algorithm, $\mathtt{UE\text{-}UCB++}$, which matches this lower bound to a minor additive factor. $\mathtt{UE\text{-}UCB++}$ performs uniform exploration in its initial phases and then utilizes the upper confidence bound (UCB) bandit algorithm in its final phase. An interesting feature of $\mathtt{UE\text{-}UCB++}$ is that the coarser estimates of the mean rewards formed during a uniform exploration phase help to refine the encoding protocol in the next phase, leading to more accurate mean estimates of the rewards in the subsequent phase. This positive reinforcement cycle is critical to reducing the number of uniform exploration rounds and closely matching our lower bound.

翻訳日:2023-04-26 21:11:25 公開日:2023-04-25
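
The quantities in the abstract above are easy to evaluate numerically. The sketch below computes $\mathtt{SNR} = P/\sigma^2$ and the scaling $\sqrt{KT/(\mathtt{SNR} \wedge 1)}$ of the lower bound with constants ignored, plus the textbook UCB1 index as a stand-in for the UCB step. The function names are hypothetical, and the UCB1 form is an assumption of this example; it is not the $\mathtt{UE\text{-}UCB++}$ algorithm.

```python
import math


def snr(p, sigma2):
    """Signal-to-noise ratio P / sigma^2."""
    return p / sigma2


def regret_lower_bound_scale(k, t, p, sigma2):
    """Scaling sqrt(K T / min(SNR, 1)) of the lower bound, ignoring constants."""
    return math.sqrt(k * t / min(snr(p, sigma2), 1.0))


def ucb1_index(mean_estimate, pulls, t):
    """Textbook UCB1 index: empirical mean plus an exploration bonus."""
    return mean_estimate + math.sqrt(2 * math.log(t) / pulls)


# Hypothetical setting: 4 arms, horizon 100.
high_snr = regret_lower_bound_scale(4, 100, p=4.0, sigma2=1.0)
low_snr = regret_lower_bound_scale(4, 100, p=0.25, sigma2=1.0)
```

Once the SNR exceeds 1 the bound stops improving, while below 1 it degrades as the inverse square root of the SNR.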

# 量子エンタングルメント浄化の進歩

Advances in quantum entanglement purification ( http://arxiv.org/abs/2304.12679v1 )

ライセンス: Link先を確認

Peishun Yan, Lan Zhou, Wei Zhong, and Yubo Sheng

(参考訳) その発見以来、量子の絡み合いは量子通信と計算において有望な資源となる。しかし、量子チャネルにノイズが存在するため、絡み合いは脆弱である。絡み合い精製は、低品質の絡み合い状態から高品質の絡み合い状態を蒸留する強力なツールである。本稿では, エンタングルメント浄化理論, 線形光学を用いたエンタングルメント浄化プロトコル(EPP), クロスカー非線形性を持つEPP, ハイパーエンタングルメントEPP, 決定論的EPP, 測定に基づくEPPなど, 絡み合い浄化の概観を紹介する。また、線形光学におけるEPPの実験的進歩についても概説する。最後に,EPPの今後の発展に向けた展望について考察する。このレビューは、将来の長距離量子通信と量子ネットワークにおける実践的な実装の道を開くかもしれない。

Since its discovery, quantum entanglement has become a promising resource in quantum communication and computation. However, entanglement is fragile due to the presence of noise in quantum channels. Entanglement purification is a powerful tool to distill high-quality entangled states from low-quality entangled states. In this review, we present an overview of entanglement purification, including the basic entanglement purification theory, the entanglement purification protocols (EPPs) with linear optics, EPPs with cross-Kerr nonlinearities, hyperentanglement EPPs, deterministic EPPs, and measurement-based EPPs. We also review experimental progress on EPPs in linear optics. Finally, we discuss the outlook for the future development of EPPs. This review may pave the way for practical implementations in future long-distance quantum communication and quantum networks.

翻訳日:2023-04-26 21:10:48 公開日:2023-04-25
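
As a concrete instance of distilling higher-quality entanglement from lower-quality pairs, the sketch below iterates the well-known recurrence of the original Bennett et al. (BBPSSW) protocol for Werner states of fidelity $F$. This specific protocol is chosen here for illustration and is an assumption of the example; the abstract does not single it out.

```python
def bbpssw_step(fidelity):
    """One BBPSSW round on two Werner pairs of the given fidelity.

    Returns (new_fidelity, success_probability).
    """
    g = (1 - fidelity) / 3  # weight of each of the three unwanted Bell states
    p_success = fidelity ** 2 + 2 * fidelity * g + 5 * g ** 2
    new_fidelity = (fidelity ** 2 + g ** 2) / p_success
    return new_fidelity, p_success


def purify(fidelity, rounds):
    """Fidelity after each of a number of successful rounds."""
    history = [fidelity]
    for _ in range(rounds):
        fidelity, _ = bbpssw_step(fidelity)
        history.append(fidelity)
    return history


# Hypothetical starting fidelity above the 1/2 threshold.
trajectory = purify(0.6, rounds=5)
```

The fidelity increases only when the starting value is above 1/2, which is a fixed point of the recurrence, and each round consumes pairs and succeeds only with probability `p_success`.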

# 最大符号化速度低減による文表現圧縮

Compressing Sentence Representation with maximum Coding Rate Reduction ( http://arxiv.org/abs/2304.12674v1 )

ライセンス: Link先を確認

Domagoj Ševerdija, Tomislav Prusina, Antonio Jovanović, Luka Borozan, Jurica Maltar, and Domagoj Matijević

(参考訳) ほとんどの自然言語推論問題では、文表現は意味検索タスクに必要である。近年、事前訓練された大規模言語モデルはそのような表現の計算に非常に効果的である。これらのモデルは高次元の文埋め込みを生成する。大型モデルと小型モデルの間に明らかなパフォーマンスギャップがある。したがって、空間的および時間的ハードウェアの制限により、より小さなモデルを使用する場合、大言語モデルの蒸留版である同等の結果を得る必要がある。本稿では, 汎用多様体クラスタリングのための新しい手法であるmcr2objective(maximum coding rate reduction)に基づいて学習した投影層を用いて, 事前学習した蒸留モデルの拡張により, 文表現モデル文-bertのモデル蒸留を評価する。複雑性と文埋め込みサイズを低減した新しい言語モデルは,セマンティック検索ベンチマークにおいて同等の結果が得られることを示す。

In most natural language inference problems, sentence representation is needed for semantic retrieval tasks. In recent years, pre-trained large language models have been quite effective for computing such representations. These models produce high-dimensional sentence embeddings. An evident performance gap between large and small models exists in practice. Hence, due to space and time hardware limitations, there is a need to attain comparable results when using the smaller model, which is usually a distilled version of the large language model. In this paper, we assess the model distillation of the sentence representation model Sentence-BERT by augmenting the pre-trained distilled model with a projection layer additionally learned on the Maximum Coding Rate Reduction (MCR2) objective, a novel approach developed for general-purpose manifold clustering. We demonstrate that the new language model with reduced complexity and sentence embedding size can achieve comparable results on semantic retrieval benchmarks.

翻訳日:2023-04-26 21:10:33 公開日:2023-04-25

# 時間窓における状態生成を必要とする量子プロトコルの解析ツール

Tools for the analysis of quantum protocols requiring state generation within a time window ( http://arxiv.org/abs/2304.12673v1 )

ライセンス: Link先を確認

Bethany Davies, Thomas Beauchamp, Gayane Vardoyan, Stephanie Wehner

(参考訳) 量子プロトコルは一般に、特定の数の量子リソース状態を同時に利用できる必要がある。重要な例の1つは、ある数の絡み合った対を必要とする量子ネットワークプロトコルである。ここでは、プロセスが時間ステップ毎に何らかの確率 p$ を持つ量子資源状態を生成し、時間依存ノイズの対象となる量子メモリに格納する設定を考える。アプリケーションに十分な品質を維持するため、各リソース状態は$w$の時間ステップ後にメモリから破棄される。 $s$ をプロトコルによって要求される所望のリソース状態の数とする。確率分布 $X_{(w,s)}$ の量子リソース状態の年齢を特徴付け、$s$状態がウィンドウ$w$で生成される。時間依存ノイズモデルと組み合わせることで、この分布の知識は$s$量子リソースの忠実度統計量を計算することができる。また、待ち時間 $\tau_{(w,s)}$ がウィンドウ$w$内で生成されるまで、待ち時間 $\tau_{(w,s)}$ の第1と第2の瞬間の正確なソリューションも提供します。期待される待ち時間 $\mathbb{E}(\tau_{(w,s)})$ と分布 $X_{(w,s)}$ を記述する統計量に対する一般閉形式式を得るのは難しいので、あるパラメータ体系における計算を支援する2つの新しい結果を示す。この研究で示された手法は、量子プロトコルの実行の分析と最適化に利用できる。具体的には、Blind Quantum Computing(BQC)プロトコルの例として、$w$と$p$を推論して、プロトコル実行の成功率を最適化する方法について説明する。

Quantum protocols commonly require a certain number of quantum resource states to be available simultaneously. An important class of examples is quantum network protocols that require a certain number of entangled pairs. Here, we consider a setting in which a process generates a quantum resource state with some probability $p$ in each time step, and stores it in a quantum memory that is subject to time-dependent noise. To maintain sufficient quality for an application, each resource state is discarded from the memory after $w$ time steps. Let $s$ be the number of desired resource states required by a protocol. We characterise the probability distribution $X_{(w,s)}$ of the ages of the quantum resource states, once $s$ states have been generated in a window $w$. Combined with a time-dependent noise model, the knowledge of this distribution allows for the calculation of fidelity statistics of the $s$ quantum resources. We also give exact solutions for the first and second moments of the waiting time $\tau_{(w,s)}$ until $s$ resources are produced within a window $w$, which provides information about the rate of the protocol. Since it is difficult to obtain general closed-form expressions for statistical quantities describing the expected waiting time $\mathbb{E}(\tau_{(w,s)})$ and the distribution $X_{(w,s)}$, we present two novel results that aid their computation in certain parameter regimes. The methods presented in this work can be used to analyse and optimise the execution of quantum protocols. Specifically, with an example of a Blind Quantum Computing (BQC) protocol, we illustrate how they may be used to infer $w$ and $p$ to optimise the rate of successful protocol execution.

翻訳日:2023-04-26 21:10:20 公開日:2023-04-25
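
The setting in the abstract above (success probability $p$ per time step, states discarded after $w$ steps, $s$ states required) lends itself to a quick Monte Carlo check of the waiting time $\tau_{(w,s)}$. The sketch below is a simulation under an assumed convention, namely that a state generated at step $t'$ is still usable at step $t$ when $t - t' < w$; it does not reproduce the paper's analytical solutions.

```python
import random
from collections import deque


def simulate_waiting_time(p, w, s, rng):
    """Number of time steps until s states coexist within a window of w steps."""
    if s > w:
        raise ValueError("s states can never fit in a window shorter than s steps")
    stored = deque()  # generation times of the states currently in memory
    t = 0
    while True:
        t += 1
        # Discard states that have left the window of the last w steps.
        while stored and stored[0] <= t - w:
            stored.popleft()
        if rng.random() < p:
            stored.append(t)
        if len(stored) >= s:
            return t


def mean_waiting_time(p, w, s, trials=20000, seed=0):
    """Monte Carlo estimate of the expected waiting time."""
    rng = random.Random(seed)
    return sum(simulate_waiting_time(p, w, s, rng) for _ in range(trials)) / trials


# Hypothetical parameters: two states needed within a window of two steps.
estimate = mean_waiting_time(p=0.5, w=2, s=2)
```

With $w = s = 2$ the protocol needs two consecutive successes, whose expected waiting time is $(1+p)/p^2 = 6$ for $p = 0.5$, so the estimate can be checked against a known value.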

# 勾配ブースティング法による銀河系外電波源の形態分類

Morphological Classification of Extragalactic Radio Sources Using Gradient Boosting Methods ( http://arxiv.org/abs/2304.12729v1 )

ライセンス: Link先を確認

Abdollah Masoud Darya, Ilias Fernini, Marley Vellasco, Abir Hussain

(参考訳) 電波天文学の分野は、新しく任命された電波望遠鏡によって、1日に生成されるデータ量の増加を目撃している。この分野で最も重要な問題の1つは、銀河系外電波源のモルフォロジーに基づく自動分類である。銀河系外電波源の形態分類の分野での最近の貢献は、畳み込みニューラルネットワークに基づく分類器の提案である。あるいは、畳み込みニューラルネットワークに対するデータ効率の代替として、主成分分析を伴う勾配向上機械学習手法を提案する。近年, 表型データを用いた分類問題に対して, 深層学習における勾配促進法の有効性が示されている。この研究で考慮された勾配向上手法は、XGBoost、LightGBM、CatBoostの実装に基づいている。本研究は,データセットサイズが分類器の性能に及ぼす影響についても検討する。この研究では、Best-Heckmanサンプルからの電波源を用いて、クラス0、クラスI、クラスIIの3つの主要なファナロフ・ライリークラスに基づいて、3クラス分類問題を考える。提案された3つの勾配向上手法は、画像の4分の1未満を使用して、最先端の畳み込みニューラルネットワークベースの分類器より優れており、CatBoostが最も精度が高い。これは主にファナロフ・ライリークラスiiソースの分類における勾配促進法の精度が向上し、3-4\%高いリコールが得られたためである。

The field of radio astronomy is witnessing a boom in the amount of data produced per day due to newly commissioned radio telescopes. One of the most crucial problems in this field is the automatic classification of extragalactic radio sources based on their morphologies. Most recent contributions in the field of morphological classification of extragalactic radio sources have proposed classifiers based on convolutional neural networks. Alternatively, this work proposes gradient boosting machine learning methods accompanied by principal component analysis as data-efficient alternatives to convolutional neural networks. Recent findings have shown the efficacy of gradient boosting methods in outperforming deep learning methods for classification problems with tabular data. The gradient boosting methods considered in this work are based on the XGBoost, LightGBM, and CatBoost implementations. This work also studies the effect of dataset size on classifier performance. A three-class classification problem is considered in this work based on the three main Fanaroff-Riley classes: class 0, class I, and class II, using radio sources from the Best-Heckman sample. All three proposed gradient boosting methods outperformed a state-of-the-art convolutional neural network-based classifier using less than a quarter of the number of images, with CatBoost having the highest accuracy. This was mainly due to the superior accuracy of gradient boosting methods in classifying Fanaroff-Riley class II sources, with 3-4% higher recall.

翻訳日:2023-04-26 21:03:35 公開日:2023-04-25

# 非エルミート2軌道モデルの位相的性質

Topological properties of a non-Hermitian two-orbital model ( http://arxiv.org/abs/2304.12723v1 )

ライセンス: Link先を確認

Dipendu Halder and Saurabh Basu

(参考訳) 単位セル毎に2つの軌道からなるタイトな結合鎖の非エルミタン(NH)バージョンを徹底的に解析する。非ハーミティシティは、それぞれPT対称および非PT対称のケースに分岐し、非相互近接ホッピング振幅と純粋に想像上の現場電位エネルギーによって特徴づけられる。モデルの局所化とトポロジカルな性質に関する研究は、いくつかの興味深い結果を示している。例えば、それらは異なる特徴、すなわち非PT対称の場合のラインギャップとPT対称の場合のポイントギャップを持つ複素エネルギーギャップを持つ。さらに、NH系の特徴であるNH皮膚効果は、ここでは存在せず、状態の局所密度を計算することによって確認される。両NH変種に対するバルク境界対応は両直交条件に従う。さらに、逆参加比によって得られるエッジモードの局在化は、ハミルトニアンのパラメータに様々な依存性を示す。また、位相的性質は位相不変量の挙動、すなわち有限値からゼロへの急な遷移を示す複素ベリー位相と区別できる。興味深いことに、PT対称系は、パラメータの値によってPT破壊と未破壊の位相に分割される。最後に、結果がhermitianモデルでベンチマークされ、nh変種で得られた結果の比較と対比が行われる。

We perform a thorough analysis of a non-Hermitian (NH) version of a tight-binding chain comprising two orbitals per unit cell. The non-Hermiticity is further bifurcated into PT symmetric and non-PT symmetric cases, respectively, characterized by non-reciprocal nearest-neighbour hopping amplitudes and purely imaginary onsite potential energies. The studies on the localization and the topological properties of our models reveal several intriguing results. For example, they have complex energy gaps with distinct features, that is, a line gap for the non-PT symmetric case and a point gap for the PT symmetric case. Further, the NH skin effect, a distinctive feature of NH systems, is non-existent here, which is confirmed by computing the local density of states. The bulk-boundary correspondence for both the NH variants obeys a bi-orthogonal condition. Moreover, the localization of the edge modes obtained via the inverse participation ratio shows diverse dependencies on the parameters of the Hamiltonian. Also, the topological properties are discernible from the behaviour of the topological invariant, namely, the complex Berry phase, which shows a sharp transition from a finite value to zero. Interestingly, the PT symmetric system is found to split between a PT broken and an unbroken phase depending on the values of the parameters. Finally, the results are benchmarked with the Hermitian model to compare and contrast those obtained for the NH variants.

翻訳日:2023-04-26 21:03:14 公開日:2023-04-25
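
The inverse participation ratio used in the abstract above to quantify edge-mode localization can be computed in a few lines. The sketch uses the standard single-vector definition $\sum_i |\psi_i|^4 / (\sum_i |\psi_i|^2)^2$; whether the paper uses this form or a bi-orthogonal variant is not stated in the abstract, so treat the definition as an assumption.

```python
def inverse_participation_ratio(amplitudes):
    """IPR of a state: 1 for a single-site state, 1/N for a uniform state on N sites."""
    norm = sum(abs(a) ** 2 for a in amplitudes)
    if norm == 0:
        raise ValueError("state must have non-zero norm")
    return sum(abs(a) ** 4 for a in amplitudes) / norm ** 2


# Hypothetical states on a 4-site chain.
localized = inverse_participation_ratio([1, 0, 0, 0])
extended = inverse_participation_ratio([1, 1, 1, 1])
```

Values close to 1 indicate a mode concentrated on few sites (for example at an edge), while values close to $1/N$ indicate a mode spread over the whole chain.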

# dual cross-attention を用いた眼球追跡誘導型深層複数インスタンス学習による眼底疾患検出

Eye tracking guided deep multiple instance learning with dual cross-attention for fundus disease detection ( http://arxiv.org/abs/2304.12719v1 )

ライセンス: Link先を確認

Hongyang Jiang, Jingqi Huang, Chen Tang, Xiaoqing Zhang, Mengdi Gao, Jiang Liu

(参考訳) 深層ニューラルネットワーク(dnns)は,眼科医の診断ミスや誤診率の軽減を支援するため,眼底疾患のコンピュータ支援診断(cad)システムの開発を促進する。しかし、CADシステムの大部分はデータ駆動であるが、パフォーマンスに優しい医学的事前知識が不足している。そこで本稿では,眼科医の視線追跡情報を利用したHuman-in-the-loop (HITL) CADシステムを提案する。具体的には,視線追跡による視線マップがチェリーピックの診断関連事例に有用であったマルチ・インスタンス・ラーニング(MIL)に基づいてHITL CADシステムを実装した。さらに, 二重クロスアテンションMIL (DCAMIL) ネットワークを用いて, ノイズの抑制効果について検討した。一方, トレーニングバッグのインスタンスを充実・標準化するために, シーケンス拡張モジュールとドメイン逆数モジュールの両方を導入し, 本手法の堅牢性を高めた。我々は,新たに構築したデータセット(amd-gazeとdr-gaze)について,それぞれamdと早期dr検出のための比較実験を行った。眼科医の視線追跡情報を完全に探索し, HITL CADシステムの実現可能性と提案したDCAMILの優位性を実証した。これらの調査から,医学的先行知識として医師の視線マップが臨床疾患のCADシステムに寄与する可能性が示唆された。

Deep neural networks (DNNs) have promoted the development of computer aided diagnosis (CAD) systems for fundus diseases, helping ophthalmologists reduce missed diagnosis and misdiagnosis rates. However, the majority of CAD systems are data-driven but lack medical prior knowledge, which can benefit performance. In this regard, we propose a human-in-the-loop (HITL) CAD system that leverages ophthalmologists' eye-tracking information and is more efficient and accurate. Concretely, the HITL CAD system was implemented with multiple instance learning (MIL), where eye-tracking gaze maps were beneficial for selecting diagnosis-related instances. Furthermore, the dual-cross-attention MIL (DCAMIL) network was utilized to curb the adverse effects of noisy instances. Meanwhile, both a sequence augmentation module and a domain adversarial module were introduced to enrich and standardize instances in the training bag, respectively, thereby enhancing the robustness of our method. We conduct comparative experiments on our newly constructed datasets (namely, AMD-Gaze and DR-Gaze), respectively for AMD and early DR detection. Rigorous experiments demonstrate the feasibility of our HITL CAD system and the superiority of the proposed DCAMIL, fully exploring the ophthalmologists' eye-tracking information. These investigations indicate that physicians' gaze maps, as medical prior knowledge, have the potential to contribute to CAD systems for clinical diseases.

翻訳日:2023-04-26 21:02:55 公開日:2023-04-25

# 量子サービス提供の比較:MaxCutにおけるQAOAの事例

Comparing Quantum Service Offerings: A Case Study of QAOA for MaxCut ( http://arxiv.org/abs/2304.12718v1 )

ライセンス: Link先を確認

Julian Obst and Johanna Barzen and Martin Beisel and Frank Leymann and Marie Salm and Felix Truger

(参考訳) 量子コンピューティングの出現に伴い、多くの量子デバイスがクラウド経由でアクセスできるようになった。しかし、この分野の急速な発展により、これらの量子特化サービスの提供は、ソフトウェア開発者に課す能力と要件が著しく異なる。これは、これらのサービスをアプリケーションの一部として使用することに関心がある量子コンピューティング領域の外部の実践者にとって、特に困難である。本稿では,異なるハードウェア技術に基づく複数のデバイスを比較し,それぞれに同じ実験を行うことにより,異なる提供物を通じて提供する。実験から得られた教訓を文書化することにより,量子特化製品の利用を簡素化し,主要な量子ハードウェア技術間の差異を明らかにすることを目的とする。

With the emergence of quantum computing, a growing number of quantum devices is accessible via cloud offerings. However, due to the rapid development of the field, these quantum-specific service offerings vary significantly in capabilities and requirements they impose on software developers. This is particularly challenging for practitioners from outside the quantum computing domain who are interested in using these offerings as parts of their applications. In this paper, we compare several devices based on different hardware technologies and provided through different offerings, by conducting the same experiment on each of them. By documenting the lessons learned from our experiments, we aim to simplify the usage of quantum-specific offerings and illustrate the differences between predominant quantum hardware technologies.

翻訳日:2023-04-26 21:02:28 公開日:2023-04-25
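
Whatever quantum device runs the experiment in the abstract above, QAOA for MaxCut is scored with the same classical objective: the number of edges cut by a measured bitstring. The sketch below shows that objective and a brute-force reference solution for small graphs; the graph and function names are illustrative and no quantum SDK is involved.

```python
from itertools import product


def cut_value(edges, assignment):
    """Number of edges whose endpoints fall on different sides of the partition."""
    return sum(1 for u, v in edges if assignment[u] != assignment[v])


def brute_force_maxcut(n_nodes, edges):
    """Exact MaxCut by enumerating all 2^n assignments (small graphs only)."""
    best = max(product((0, 1), repeat=n_nodes), key=lambda a: cut_value(edges, a))
    return cut_value(edges, best), best


# Hypothetical 4-node ring graph.
ring = [(0, 1), (1, 2), (2, 3), (3, 0)]
optimum, partition = brute_force_maxcut(4, ring)
```

Comparing the average cut value of the bitstrings returned by a device with this optimum is one simple way to put results from different offerings on a common scale.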

# 格子ゲージ理論における閉じ込め物質のメソスコピックスケールにおけるコヒーレンス

Coherence of confined matter in lattice gauge theories at the mesoscopic scale ( http://arxiv.org/abs/2304.12713v1 )

ライセンス: Link先を確認

Enrico C. Domanti, Paolo Castorina, Dario Zappalà and Luigi Amico

(参考訳) ゲージ理論は時空局所対称性を示す物理系に現れる。基本的な相互作用から統計力学、凝縮物質、最近では量子計算まで、物理学の重要な領域の強力な記述を提供する。そのため、この分野では極めて深い理解が得られている。量子技術の出現により、量子シミュレーションによって元の量子場理論の重要な特徴を捉えることができる低エネルギーアナログが集中的に研究されている。本稿では,メソスコピック空間スケールに制約のある格子ゲージ理論について検討する。そこで本研究では,メゾスコピックサイズのリング状格子に存在する中間子を有効磁場で貫通するダイナミクスについて検討する。これらの条件下では、中間子はユニークな特徴によって特徴づけられる。我々は、磁場と中間子の内部構造との結合を反映した新しいタイプのアハロノフ・ボーム振動を発見した。中間子のコヒーレンス特性は、持続電流と相関関数の特定の特徴によって定量化される。磁場がクエンチされると、アハロノフ・ボーム振動と中間子電流のメゾスコピック的特徴と相関関係は特定の物質波ダイナミクスを開始する。

Gauge theories arise in physical systems displaying space-time local symmetries. They provide a powerful description of important realms of physics ranging from fundamental interactions, to statistical mechanics, condensed matter and more recently quantum computation. As such, a remarkably deep understanding has been achieved in the field. With the advent of quantum technology, lower energy analogs, capable to capture important features of the original quantum field theories through quantum simulation, have been intensively studied. Here, we study lattice gauge theories constrained to mesoscopic spatial scales. To this end, we study the dynamics of mesons residing in a ring-shaped lattice of mesoscopic size pierced by an effective magnetic field. We demonstrate that, in these conditions, mesons are characterized by unique features. We find a new type of Aharonov-Bohm oscillations reflecting the coupling between the magnetic field and the internal structure of the meson. The coherence properties of the meson are quantified by the persistent current and by specific features of the correlation functions. When the magnetic field is quenched, Aharonov-Bohm oscillations and mesoscopic features of the meson current and correlations start a specific matter-wave dynamics.

翻訳日:2023-04-26 21:02:18 公開日:2023-04-25

# ロバスト深部平衡モデルの学習

Learning Robust Deep Equilibrium Models ( http://arxiv.org/abs/2304.12707v1 )

ライセンス: Link先を確認

Haoyu Chu, Shikui Wei, and Ting Liu

(参考訳) 深層平衡(DEQ)モデルは、単一の非線形層の不動点を解くことで従来の深さを不要にする、深層学習における有望な暗黙層モデルのクラスとして登場した。その成功にもかかわらず、これらのモデルの不動点の安定性は未だよく理解されていない。近年、リアプノフ理論が、別のタイプの暗黙層モデルであるNeural ODEに適用され、敵対的ロバスト性が付与されている。DEQモデルを非線形力学系とみなすことで、リアプノフ理論による証明可能な安定性を保証したロバストなDEQモデルLyaDEQを提案する。我々の手法の要点は、DEQモデルの不動点がリアプノフ安定であることを保証することであり、これによりLyaDEQモデルは小さな初期摂動に耐えられる。リアプノフ安定な不動点が互いに近接することによる敵対的防御の低下を避けるため、リアプノフ安定性モジュールの後に直交全結合層を追加して、異なる不動点を分離する。LyaDEQモデルを、広く使われている複数のデータセット上でよく知られた敵対的攻撃の下で評価し、実験結果はロバスト性の大幅な向上を示した。さらに、LyaDEQモデルは敵対的訓練などの他の防御手法と組み合わせることで、より優れた敵対的ロバスト性を達成できることを示す。

Deep equilibrium (DEQ) models have emerged as a promising class of implicit layer models in deep learning, which abandon traditional depth by solving for the fixed points of a single nonlinear layer. Despite their success, the stability of the fixed points for these models remains poorly understood. Recently, Lyapunov theory has been applied to Neural ODEs, another type of implicit layer model, to confer adversarial robustness. By considering DEQ models as nonlinear dynamic systems, we propose a robust DEQ model named LyaDEQ with guaranteed provable stability via Lyapunov theory. The crux of our method is ensuring the fixed points of the DEQ models are Lyapunov stable, which enables the LyaDEQ models to resist the minor initial perturbations. To avoid poor adversarial defense due to Lyapunov-stable fixed points being located near each other, we add an orthogonal fully connected layer after the Lyapunov stability module to separate different fixed points. We evaluate LyaDEQ models on several widely used datasets under well-known adversarial attacks, and experimental results demonstrate significant improvement in robustness. Furthermore, we show that the LyaDEQ model can be combined with other defense methods, such as adversarial training, to achieve even better adversarial robustness.

翻訳日:2023-04-26 21:02:00 公開日:2023-04-25
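
As a rough illustration of the fixed-point idea in the DEQ abstract above, the sketch below solves z = tanh(Wz + x) for a single nonlinear layer by plain iteration and checks that a slightly perturbed starting state ends at the same fixed point. This is not the LyaDEQ method: the weights, the input, and the use of a contraction (instead of the paper's Lyapunov stability module) are assumptions chosen only to make the example small and convergent.

```python
import math

def layer(z, x, W):
    """One application of the single nonlinear layer f(z, x) = tanh(W z + x)."""
    return [
        math.tanh(sum(w_ij * z_j for w_ij, z_j in zip(row, z)) + x_i)
        for row, x_i in zip(W, x)
    ]

def solve_fixed_point(x, W, z0, tol=1e-12, max_iter=1000):
    """Iterate z <- f(z, x) until successive iterates differ by less than tol."""
    z = list(z0)
    for _ in range(max_iter):
        z_next = layer(z, x, W)
        if max(abs(a - b) for a, b in zip(z_next, z)) < tol:
            return z_next
        z = z_next
    raise RuntimeError("fixed-point iteration did not converge")

# Illustrative (hypothetical) weights: the largest absolute row sum is 0.5 and
# tanh is 1-Lipschitz, so the layer is a contraction in the max norm.
W = [[0.3, -0.2], [0.1, 0.4]]
x = [0.5, -1.0]

z_star = solve_fixed_point(x, W, z0=[0.0, 0.0])
z_perturbed = solve_fixed_point(x, W, z0=[0.05, -0.05])  # perturbed initial state
gap = max(abs(a - b) for a, b in zip(z_star, z_perturbed))
print(z_star, gap)
```

Because the map is a contraction, every starting state converges to the same fixed point, which is a much stronger property than the local Lyapunov stability discussed in the abstract.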

# BERTはプロソディについて何を学ぶのか?

What does BERT learn about prosody? ( http://arxiv.org/abs/2304.12706v1 )

ライセンス: Link先を確認

Sofoklis Kakouros and Johannah O'Mahony

(参考訳) 言語モデルは自然言語処理アプリケーションのほぼ至るところで使われるようになり、韻律を含む多くのタスクで最先端の結果を達成している。モデル設計は、訓練中に所定の言語的目標を定義するのではなく、言語の一般化された表現を学習することを目的としているため、モデルが暗黙的に捉える表現の分析と解釈は、解釈可能性とモデル性能のギャップを埋める上で重要である。いくつかの研究は、モデルが捉える言語情報を調査し、その表現能力に関する洞察を与えてきた。しかし、これまでの研究は、韻律がモデルの学習する言語の構造情報の一部であるかどうかを検討していない。本研究では、BERTの異なる層で捉えられた表現をプロービングする一連の実験を行う。その結果、韻律的プロミネンスに関する情報は多くの層にまたがるものの、主に中間層に集中しており、BERTが主に構文情報と意味情報に依存していることが示唆された。

Language models have become nearly ubiquitous in natural language processing applications achieving state-of-the-art results in many tasks including prosody. As the model design does not define predetermined linguistic targets during training but rather aims at learning generalized representations of the language, analyzing and interpreting the representations that models implicitly capture is important in bridging the gap between interpretability and model performance. Several studies have explored the linguistic information that models capture providing some insights on their representational capacity. However, the current studies have not explored whether prosody is part of the structural information of the language that models learn. In this work, we perform a series of experiments on BERT probing the representations captured at different layers. Our results show that information about prosodic prominence spans across many layers but is mostly focused in middle layers suggesting that BERT relies mostly on syntactic and semantic information.

翻訳日:2023-04-26 21:01:37 公開日:2023-04-25

# 野生動物保護者エンパワーメント:深層学習と3/4gカメラトラップを用いた生物多様性保全のための公平なデジタルスチュワードシップと報酬システム

Empowering Wildlife Guardians: An Equitable Digital Stewardship and Reward System for Biodiversity Conservation using Deep Learning and 3/4G Camera Traps ( http://arxiv.org/abs/2304.12703v1 )

ライセンス: Link先を確認

Paul Fergus, Carl Chalmers, Steven Longmore, Serge Wich, Carmen Warmenhove, Jonathan Swart, Thuto Ngongwane, André Burger, Jonathan Ledgard, and Erik Meijaard

(参考訳) 私たちの惑星の生物多様性は脅威にさらされており、約100万種が数十年以内に絶滅すると予想されている。その原因は、狩猟、乱獲、汚染、都市化や農業のための土地転換といった人間の負の行動である。自然に利益をもたらす活動に対して慈善団体や政府がかなりの投資を行っているにもかかわらず、世界の野生生物の個体数は減少し続けている。地域の野生生物保護者は、歴史的に世界の保全活動において重要な役割を担い、様々なレベルで持続可能性を達成する能力を示してきた。2021年、COP26は彼らの貢献を認め、年間17億米ドルを約束した。しかし、彼らが地球の生物多様性の80%を保護していることを考えると、これは利用可能な世界の生物多様性予算(年間1240億米ドルから1430億米ドル)のごく一部にすぎない。本稿では、動物が自身の資金を所有する「種間マネー(Interspecies Money)」に基づく急進的な新しい解決策を提案する。種ごとにデジタルツインを作成することで、動物は保護者が提供するサービスに対して資金を支払うことができる。例えば、サイは、生きて健康である限り、カメラトラップで検出されるたびに保護者へ支払いを行うことができる。このアプローチの有効性を検証するため、南アフリカ・リンポポ州のウェルゲフォンデン動物保護区の400 km²の区域に27台のカメラトラップを設置した。モーショントリガー式のカメラトラップは10か月間稼働し、ディープラーニングを用いて12種の動物の画像を捉えることができた。各種について仮の銀行口座を開設し、£100を入金した。動物がカメラで撮影され、正しく分類されるたびに、1ペニー(任意の金額であり、種の実際の価値を決定する仕組みは今後開発する必要がある)が動物の口座から対応する保護者へ送金された。

The biodiversity of our planet is under threat, with approximately one million species expected to become extinct within decades. The cause is negative human actions, which include hunting, overfishing, pollution, and the conversion of land for urbanisation and agricultural purposes. Despite significant investment from charities and governments for activities that benefit nature, global wildlife populations continue to decline. Local wildlife guardians have historically played a critical role in global conservation efforts and have shown their ability to achieve sustainability at various levels. In 2021, COP26 recognised their contributions and pledged US$1.7 billion per year; however, this is a fraction of the global biodiversity budget available (between US$124 billion and US$143 billion annually) given they protect 80% of the planet's biodiversity. This paper proposes a radical new solution based on "Interspecies Money," where animals own their own money. Creating a digital twin for each species allows animals to dispense funds to their guardians for the services they provide. For example, a rhinoceros may release a payment to its guardian each time it is detected in a camera trap as long as it remains alive and well. To test the efficacy of this approach, 27 camera traps were deployed over a 400 km² area in Welgevonden Game Reserve in Limpopo Province in South Africa. The motion-triggered camera traps were operational for ten months and, using deep learning, we managed to capture images of 12 distinct animal species. For each species, a makeshift bank account was set up and credited with £100. Each time an animal was captured in a camera and successfully classified, 1 penny (an arbitrary amount; mechanisms still need to be developed to determine the real value of species) was transferred from the animal account to its associated guardian.

翻訳日:2023-04-26 21:01:21 公開日:2023-04-25
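
The payment rule described in the abstract above (a £100 account per species, 1 penny moved to the guardian per classified detection) can be sketched as a small ledger. This is only an illustration under stated assumptions: the function name `settle`, the guardian identifiers, the example detections, and the rule that an empty account stops paying are hypothetical and do not come from the paper.

```python
from collections import defaultdict

START_BALANCE_PENCE = 100 * 100  # £100 per species, stored as integer pence
PAYMENT_PENCE = 1                # 1 penny per successfully classified detection

def settle(detections, species_list):
    """Process (species, guardian) detections and return the final ledgers.

    Each detection stands for one camera-trap image that was captured and
    successfully classified. Returns (species_balances, guardian_earnings).
    """
    balances = {species: START_BALANCE_PENCE for species in species_list}
    earnings = defaultdict(int)
    for species, guardian in detections:
        # Assumption: an account that has run dry simply stops paying.
        if balances.get(species, 0) >= PAYMENT_PENCE:
            balances[species] -= PAYMENT_PENCE
            earnings[guardian] += PAYMENT_PENCE
    return balances, dict(earnings)

# Hypothetical detections for two species and two guardians.
detections = [
    ("rhinoceros", "guardian_a"),
    ("rhinoceros", "guardian_a"),
    ("zebra", "guardian_b"),
]
balances, earnings = settle(detections, ["rhinoceros", "zebra"])
print(balances, earnings)
```

Amounts are kept in integer pence so that repeated 1-penny transfers never accumulate floating-point rounding error.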

# 参加ゲーム

The Participation Game ( http://arxiv.org/abs/2304.12700v1 )

ライセンス: Link先を確認

Mark Thomas Kennedy, Nelson Phillips

(参考訳) チューリングの有名な「模倣ゲーム」と、生成的事前学習トランスフォーマーの最近の進歩に着想を得て、我々は「参加ゲーム」を提起し、機械が社会的構築プロセスの参加者として人間に加わるというAI進化の新たなフロンティアを示す。参加ゲームは創造的で遊び心のある競争であり、人間が自らの世界を理解し秩序づけるために用いるカテゴリを適用し、曲げ、拡張することを求める。ゲームを定義し、AIのテストとして模倣を超えるべき理由を述べた後、参加ゲームと、人間の知性の特徴である社会的構築のプロセスとの類似点を明らかにする。次に、社会の基本的な構成概念への影響と、ガバナンスの選択肢について論じる。

Inspired by Turing's famous "imitation game" and recent advances in generative pre-trained transformers, we pose the participation game to point to a new frontier in AI evolution where machines will join with humans as participants in social construction processes. The participation game is a creative, playful competition that calls for applying, bending, and stretching the categories humans use to make sense of and order their worlds. After defining the game and giving reasons for moving beyond imitation as a test of AI, we highlight parallels between the participation game and processes of social construction, a hallmark of human intelligence. We then discuss implications for fundamental constructs of societies and options for governance.

翻訳日:2023-04-26 21:00:47 公開日:2023-04-25

# 漏洩波ホログラムによる軌道角運動量発生器の設計のためのディープラーニングフレームワーク

Deep Learning Framework for the Design of Orbital Angular Momentum Generators Enabled by Leaky-wave Holograms ( http://arxiv.org/abs/2304.12695v1 )

ライセンス: Link先を確認

Naser Omrani, Fardin Ghorbani, Sina Beyraghi, Homayoon Oraizi, Hossein Soleimani

(参考訳) 本稿では、フラットオプティクス(FO)と機械学習(ML)技術を組み合わせて、軌道角運動量(OAM)を運ぶ電磁波を生成する漏洩波ホログラフィックアンテナの新しい設計手法を提案する。システムの性能を向上させるために、機械学習を用いて、放射パターン全体を効果的に制御できる数学的関数、すなわちサイドローブレベル(SLL)を低下させると同時に放射パターンの中心ヌル深さを増加させる関数を見つける。様々なシナリオで最適な結果を得るには、ホログラフィック理論に基づくインピーダンス方程式のパラメータを精密に調整する必要がある。本研究では、機械学習を適用してパラメータの近似値を決定した。合計77,000個の生成データセットを用いることで、所望の放射パターンを与える各パラメータの最適値を決定できる。さらに、MLの使用は時間を節約するだけでなく、手動のパラメータ調整や従来の最適化手法よりも精密で正確な結果をもたらす。

In this paper, we present a novel approach for the design of leaky-wave holographic antennas that generate OAM-carrying electromagnetic waves by combining Flat Optics (FO) and machine learning (ML) techniques. To improve the performance of our system, we use a machine learning technique to discover a mathematical function that can effectively control the entire radiation pattern, i.e., decrease the side lobe level (SLL) while simultaneously increasing the central null depth of the radiation pattern. Precise tuning of the parameters of the impedance equation based on holographic theory is necessary to achieve optimal results in a variety of scenarios. In this research, we applied machine learning to determine the approximate values of the parameters. We can determine the optimal values for each parameter, resulting in the desired radiation pattern, using a total of 77,000 generated datasets. Furthermore, the use of ML not only saves time, but also yields more precise and accurate results than manual parameter tuning and conventional optimization methods.

翻訳日:2023-04-26 21:00:34 公開日:2023-04-25

# 高効率・長期依存性学習能力を有する平行スパイキングニューロン

Parallel Spiking Neurons with High Efficiency and Long-term Dependencies Learning Ability ( http://arxiv.org/abs/2304.12760v1 )

ライセンス: Link先を確認

Wei Fang, Zhaofei Yu, Zhaokun Zhou, Yanqi Chen, Zhengyu Ma, Timothée Masquelier, Yonghong Tian

(参考訳) スパイキングニューラルネットワーク(SNN)の標準的なスパイキングニューロンは、充電・発火・リセットのニューロンダイナミクスを用いるため、逐次的にしかシミュレートできず、長期の依存関係をほとんど学習できない。我々は、リセットを取り除くとニューロンダイナミクスが非反復的な形に再定式化され、並列化できることを見出した。リセットのないニューロンダイナミクスを一般的な定式化に書き換えることで、時間ステップ間の密な接続を用いて時間情報の利用を最大化する並列スパイキングニューロン(PSN)を提案する。低遅延推論のために未来の入力の使用を避けるため、重みにマスクを加えたマスク付きPSNを得る。さらに、時間ステップ間で重みを共有することで、可変長の系列を扱えるスライディングPSNを提案する。シミュレーション速度と時系列・静的データの分類でPSNファミリーを評価した結果、効率と精度の両面でPSNファミリーの圧倒的な優位性が示された。我々の知る限り、これはスパイキングニューロンの並列化に関する最初の研究であり、スパイキング深層学習コミュニティの礎となりうる。コードは https://github.com/fangwei123456/Parallel-Spiking-Neuron で公開されている。

Vanilla spiking neurons in Spiking Neural Networks (SNNs) use charge-fire-reset neuronal dynamics, which can only be simulated in serial and can hardly learn long-time dependencies. We find that when removing reset, the neuronal dynamics are reformulated in a non-iterative form and can be parallelized. By rewriting neuronal dynamics without resetting to a general formulation, we propose the Parallel Spiking Neuron (PSN), which uses dense connections between time-steps to maximize the utilization of temporal information. To avoid the use of future inputs for low-latency inference, we add masks on the weights and obtain the masked PSN. By sharing weights across time-steps, the sliding PSN is proposed with the ability to deal with sequences of varying lengths. We evaluate the PSN family on simulation speed and temporal/static data classification, and the results show the overwhelming advantage of the PSN family in efficiency and accuracy. To the best of our knowledge, this is the first research about parallelizing spiking neurons and can be a cornerstone for the spiking deep learning community. Our codes are available at https://github.com/fangwei123456/Parallel-Spiking-Neuron.

翻訳日:2023-04-26 20:54:57 公開日:2023-04-25
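
The key observation in the PSN abstract above, that removing the reset turns the iterative neuron update into a non-iterative form, can be checked numerically. The sketch below assumes a simple leaky integrator h[t] = decay * h[t-1] + gain * x[t] (the specific update rule, the parameter values, and the function names are assumptions for illustration, not the paper's formulation) and shows that the same membrane potentials come from one lower-triangular matrix-vector product, whose rows are independent and could therefore be computed in parallel.

```python
def serial_no_reset(x, decay, gain):
    """Iterate h[t] = decay * h[t-1] + gain * x[t] with h[-1] = 0 and no reset."""
    h, out = 0.0, []
    for x_t in x:
        h = decay * h + gain * x_t
        out.append(h)
    return out

def parallel_weights(T, decay, gain, order=None):
    """Build a lower-triangular W with W[t][i] = gain * decay**(t - i) for i <= t.

    If `order` is given, only the most recent `order` inputs are kept, which is
    a causal band mask in the spirit of the masked variant.
    """
    W = [[0.0] * T for _ in range(T)]
    for t in range(T):
        for i in range(t + 1):
            if order is None or t - i < order:
                W[t][i] = gain * decay ** (t - i)
    return W

def matvec(W, x):
    """Each output row depends only on the inputs, not on other outputs."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

x = [1.0, 0.0, 2.0, 0.5, 0.0, 1.5]          # illustrative input currents
decay, gain, threshold = 0.5, 0.5, 0.6      # illustrative parameters

h_serial = serial_no_reset(x, decay, gain)
h_parallel = matvec(parallel_weights(len(x), decay, gain), x)
spikes = [int(h >= threshold) for h in h_parallel]
print(h_parallel, spikes)
```

Passing `order=2` to `parallel_weights` restricts every time step to its two most recent inputs, which no longer matches the serial recursion but keeps the computation causal.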

# ノード特徴拡張によるネットワークアライメントの活性化

Node Feature Augmentation Vitaminizes Network Alignment ( http://arxiv.org/abs/2304.12751v1 )

ライセンス: Link先を確認

Jin-Duk Park, Cong Tran, Won-Yong Shin, Xin Cao

(参考訳) ネットワークアライメント(NA)は、与えられたネットワークのトポロジ情報および/または特徴情報を用いて、複数のネットワークにまたがるノードの対応関係を発見するタスクである。NA手法は無数のシナリオで目覚ましい成功を収めてきたが、その有効性は事前のアンカーリンクやノード特徴といった追加情報に依存しており、こうした情報はプライバシー上の懸念やアクセス制限のために常に利用できるとは限らない。この実用上の課題に対処するため、新しいNA手法であるGrad-Align+を提案する。Grad-Align+は、最近の最先端NA手法であるGrad-Alignを基盤としており、Grad-Alignはすべてのノード対が見つかるまでノード対の一部だけを段階的に発見していく。Grad-Align+の設計にあたっては、NAタスクの実行という観点からノード特徴をどのように拡張するか、そして拡張したノード特徴を最大限に活用してNA手法をどのように設計するかを考慮する。この目標を達成するため、次の3つの主要コンポーネントからなるGrad-Align+を開発する。1)中心性に基づくノード特徴拡張(CNFA)、2)拡張ノード特徴を用いたグラフニューラルネットワーク(GNN)による埋め込み類似度計算、3)整列済みのネットワーク間近傍ペア(ACN)の情報を用いた類似度計算による段階的NA。包括的な実験を通じて、Grad-Align+が、(a)ベンチマークNA手法に対する大幅な優位性、(b)CNFAの有効性を示す実証的検証と理論的知見、(c)各コンポーネントの影響、(d)ネットワークノイズに対するロバスト性、(e)計算効率、を示すことを実証する。

Network alignment (NA) is the task of discovering node correspondences across multiple networks using topological and/or feature information of given networks. Although NA methods have achieved remarkable success in a myriad of scenarios, their effectiveness is not without additional information such as prior anchor links and/or node features, which may not always be available due to privacy concerns or access restrictions. To tackle this practical challenge, we propose Grad-Align+, a novel NA method built upon a recent state-of-the-art NA method, the so-called Grad-Align, that gradually discovers only a part of node pairs until all node pairs are found. In designing Grad-Align+, we account for how to augment node features in the sense of performing the NA task and how to design our NA method by maximally exploiting the augmented node features. To achieve this goal, we develop Grad-Align+ consisting of three key components: 1) centrality-based node feature augmentation (CNFA), 2) graph neural network (GNN)-aided embedding similarity calculation alongside the augmented node features, and 3) gradual NA with similarity calculation using the information of aligned cross-network neighbor-pairs (ACNs). Through comprehensive experiments, we demonstrate that Grad-Align+ exhibits (a) the superiority over benchmark NA methods by a large margin, (b) empirical validations as well as our theoretical findings to see the effectiveness of CNFA, (c) the influence of each component, (d) the robustness to network noises, and (e) the computational efficiency.

翻訳日:2023-04-26 20:54:38 公開日:2023-04-25

# ブロックチェーンの大規模言語モデル

Blockchain Large Language Models ( http://arxiv.org/abs/2304.12749v1 )

ライセンス: Link先を確認

Yu Gai, Liyi Zhou, Kaihua Qin, Dawn Song, Arthur Gervais

(参考訳) 本稿では、異常なブロックチェーントランザクションを検出するための動的かつリアルタイムなアプローチを提案する。提案するツールTXRANKは、ブロックチェーン上の活動のトレース表現を生成し、大規模言語モデルをスクラッチから訓練して、リアルタイムの侵入検知システムとして動作させる。従来の手法とは異なり、TXRANKは制限のない探索空間を提供するように設計されており、事前定義されたルールやパターンに依存しないため、より広範な異常を検出できる。Ethereumトランザクションの異常検出ツールとして用いることで、TXRANKの有効性を示す。実験では、6,800万件のトランザクションからなるデータセットの中から異常なトランザクションを効果的に識別し、バッチ処理のスループットは平均で毎秒2,284トランザクションであった。結果として、TXRANKは124件の攻撃のうち49件を、被害コントラクトとやり取りするトランザクションの中で最も異常な上位3件以内にランク付けした。本研究は、トランスフォーマーアーキテクチャと互換性のあるカスタムデータエンコーディング、ドメイン固有のトークン化手法、およびEthereum仮想マシン(EVM)のトレース表現のために特別に設計したツリーエンコーディング手法を導入することで、ブロックチェーントランザクション分析の分野に貢献する。

This paper presents a dynamic, real-time approach to detecting anomalous blockchain transactions. The proposed tool, TXRANK, generates tracing representations of blockchain activity and trains from scratch a large language model to act as a real-time Intrusion Detection System. Unlike traditional methods, TXRANK is designed to offer an unrestricted search space and does not rely on predefined rules or patterns, enabling it to detect a broader range of anomalies. We demonstrate the effectiveness of TXRANK through its use as an anomaly detection tool for Ethereum transactions. In our experiments, it effectively identifies abnormal transactions among a dataset of 68M transactions and has a batched throughput of 2284 transactions per second on average. Our results show that TXRANK identifies abnormal transactions by ranking 49 out of 124 attacks among the top-3 most abnormal transactions interacting with their victim contracts. This work makes contributions to the field of blockchain transaction analysis by introducing a custom data encoding compatible with the transformer architecture, a domain-specific tokenization technique, and a tree encoding method specifically crafted for the Ethereum Virtual Machine (EVM) trace representation.

翻訳日:2023-04-26 20:54:17 公開日:2023-04-25

# 暗黙的カメラモデル学習による撮像過程の反転

Inverting the Imaging Process by Learning an Implicit Camera Model ( http://arxiv.org/abs/2304.12748v1 )

ライセンス: Link先を確認

Xin Huang, Qi Zhang, Ying Feng, Hongdong Li, Qing Wang

(参考訳) 暗黙的な座標ベースのニューラルネットワークによる視覚信号の表現は、従来の離散的な信号表現に代わる効果的な手法として、コンピュータビジョンやグラフィックスでかなりの人気を集めている。シーンのモデル化のみに焦点を当てた既存の暗黙的ニューラル表現とは対照的に、本稿では、カメラの物理的な撮像過程をディープニューラルネットワークとして表現する新しい暗黙的カメラモデルを提案する。この新しい暗黙的カメラモデルの能力を、2つの逆イメージングタスク、すなわち i) オールインフォーカス写真の生成と ii) HDRイメージングで実証する。具体的には、暗黙的なぼかし生成器と暗黙的なトーンマッパーを考案し、それぞれカメラの撮像過程における絞りと露出をモデル化する。我々の暗黙的カメラモデルは、マルチフォーカススタックおよびマルチ露出ブラケットによる教師信号の下で、暗黙的シーンモデルと共同で学習される。多数のテスト画像と動画で新しいモデルの有効性を実証し、高精度で視覚的に魅力のあるオールインフォーカス画像と高ダイナミックレンジ画像を生成した。原理的には、この新しい暗黙的ニューラルカメラモデルは、他の幅広い逆イメージングタスクにも役立つ可能性がある。

Representing visual signals with implicit coordinate-based neural networks, as an effective replacement of the traditional discrete signal representation, has gained considerable popularity in computer vision and graphics. In contrast to existing implicit neural representations which focus on modelling the scene only, this paper proposes a novel implicit camera model which represents the physical imaging process of a camera as a deep neural network. We demonstrate the power of this new implicit camera model on two inverse imaging tasks: i) generating all-in-focus photos, and ii) HDR imaging. Specifically, we devise an implicit blur generator and an implicit tone mapper to model the aperture and exposure of the camera's imaging process, respectively. Our implicit camera model is jointly learned together with implicit scene models under multi-focus stack and multi-exposure bracket supervision. We have demonstrated the effectiveness of our new model on a large number of test images and videos, producing accurate and visually appealing all-in-focus and high dynamic range images. In principle, our new implicit neural camera model has the potential to benefit a wide array of other inverse imaging tasks.

翻訳日:2023-04-26 20:53:56 公開日:2023-04-25

# データベース管理システムのためのディープラーニングに基づくオートチューニング

Deep learning based Auto Tuning for Database Management System ( http://arxiv.org/abs/2304.12747v1 )

ライセンス: Link先を確認

Karthick Prasad Gunasekaran, Kajal Tiwari, Rachana Acharya

(参考訳) データベースシステムの構成管理は、システムのあらゆる側面を制御する何百もの構成ノブがあるため、難しい作業である。これらのノブは標準化されておらず、互いに独立でもなく、普遍的でもないため、最適な設定を決定することはさらに難しくなる。この問題に対し、教師あり・教師なしの機械学習手法を用いて、影響の大きいノブを選択し、未知のワークロードをマッピングし、ノブの設定を推薦する自動化アプローチが、OtterTuneという新しいツールに実装された。OtterTuneは3つのDBMSで評価されており、既存のツールや人間の専門家が生成した構成と同等以上の構成を推薦することが示されている。本研究では、OtterTune [1] に基づく自動化手法を拡張し、以前のセッションで収集した訓練データを再利用して新しいDBMSデプロイメントをチューニングし、教師あり・教師なしの機械学習手法を用いてレイテンシ予測を改善する。我々のアプローチは、元の論文で提案された手法を拡張するものである。GMMクラスタリングを用いてメトリクスを刈り込み、ランダムフォレストなどのアンサンブルモデルとニューラルネットワークなどの非線形モデルを組み合わせて予測モデルを構築する。

The management of database system configurations is a challenging task, as there are hundreds of configuration knobs that control every aspect of the system. This is complicated by the fact that these knobs are not standardized, independent, or universal, making it difficult to determine optimal settings. An automated approach to address this problem using supervised and unsupervised machine learning methods to select impactful knobs, map unseen workloads, and recommend knob settings was implemented in a new tool called OtterTune and is being evaluated on three DBMSs, with results demonstrating that it recommends configurations as good as or better than those generated by existing tools or a human expert. In this work, we extend an automated technique based on OtterTune [1] to reuse training data gathered from previous sessions to tune new DBMS deployments with the help of supervised and unsupervised machine learning methods to improve latency prediction. Our approach involves the expansion of the methods proposed in the original paper. We use GMM clustering to prune metrics and combine ensemble models, such as RandomForest, with non-linear models, like neural networks, for prediction modeling.

翻訳日:2023-04-26 20:53:38 公開日:2023-04-25

# 一般化可能な放射輝度場表現のための局所暗黙的光線関数

Local Implicit Ray Function for Generalizable Radiance Field Representation ( http://arxiv.org/abs/2304.12746v1 )

ライセンス: Link先を確認

Xin Huang, Qi Zhang, Ying Feng, Xiaoyu Li, Xuan Wang, Qing Wang

(参考訳) 本稿では、新規ビューレンダリングのための一般化可能なニューラルレンダリング手法であるLIRF(Local Implicit Ray Function)を提案する。現在の一般化可能なNeural Radiance Fields(NeRF)手法は、ピクセルごとに1本の光線でシーンをサンプリングするため、入力ビューとレンダリングビューが異なる解像度でシーンの内容を捉えている場合、ぼやけたビューやエイリアスのあるビューをレンダリングする可能性がある。この問題を解決するため、円錐台(conical frustum)からの情報を集約して光線を構成するLIRFを提案する。円錐台内の3次元位置が与えられると、LIRFは3次元座標と円錐台の特徴を入力として、局所的なボリューム放射輝度場を予測する。座標は連続であるため、LIRFはボリュームレンダリングによって、連続値のスケールで高品質な新規ビューをレンダリングできる。さらに、トランスフォーマーに基づく特徴マッチングによって各入力ビューの可視性重みを予測し、遮蔽領域での性能を向上させる。実世界のシーンでの実験結果から、未知のシーンに対する任意スケールの新規ビューレンダリングにおいて、本手法が最先端手法を上回ることが確認された。

We propose LIRF (Local Implicit Ray Function), a generalizable neural rendering approach for novel view rendering. Current generalizable neural radiance fields (NeRF) methods sample a scene with a single ray per pixel and may therefore render blurred or aliased views when the input views and rendered views capture scene content with different resolutions. To solve this problem, we propose LIRF to aggregate the information from conical frustums to construct a ray. Given 3D positions within conical frustums, LIRF takes 3D coordinates and the features of conical frustums as inputs and predicts a local volumetric radiance field. Since the coordinates are continuous, LIRF renders high-quality novel views at a continuously-valued scale via volume rendering. Besides, we predict the visible weights for each input view via transformer-based feature matching to improve the performance in occluded areas. Experimental results on real-world scenes validate that our method outperforms state-of-the-art methods on novel view rendering of unseen scenes at arbitrary scales.

翻訳日:2023-04-26 20:53:23 公開日:2023-04-25

# サブギガヘルツ周波数における小型インダクタ・キャパシタ共振器

Compact inductor-capacitor resonators at sub-gigahertz frequencies ( http://arxiv.org/abs/2304.12744v1 )

ライセンス: Link先を確認

Qi-Ming Chen and Priyank Singh and Rostislav Duda and Giacomo Catto and Aarne Keränen and Arman Alizadeh and Timm Mörstedt and Aashish Sah and András Gunyhó and Wei Liu and Mikko Möttönen

(参考訳) 小型インダクタ・キャパシタ(LC)共振器は、コプレーナ導波路(CPW)共振器とは対照的に、単純な集中定数回路で表現できるが、正確なモデル化には通常、高度な有限要素法(FEM)シミュレーションを必要とする。本稿では、回路の形状から電気的特性を十分な精度で直接得られる、コプレーナLC共振器群に対する簡単な解析モデルを提案する。周波数がおよそ $300\,{\rm MHz}$ から $1\,{\rm GHz}$ にわたる $10$ 個の高内部Q値共振器($Q_{\rm i}\gtrsim 2\times 10^{5}$)に対する実験結果は、導出した解析モデルと詳細なFEMシミュレーションの両方とよく一致する。これらの結果は、共振周波数の偏差が $2\%$ 未満のサブギガヘルツ共振器を設計できることを示しており、例えば超高感度の低温検出器の実装に直ちに応用できる。達成された平方ミリメートルオーダーのコンパクトな共振器サイズは、数百個のマイクロ波共振器を単一チップ上に集積してフォトニック格子を実現するための現実的な方法を示している。

Compact inductor-capacitor (LC) resonators, in contrast to coplanar waveguide (CPW) resonators, have a simple lumped-element circuit representation but usually call for sophisticated finite-element method (FEM) simulations for accurate modelling. Here we present a simple analytical model for a family of coplanar LC resonators where the electrical properties are directly obtained from the circuit geometry with satisfactory accuracy. Our experimental results on $10$ high-internal-quality-factor resonators ($Q_{\rm i}\gtrsim 2\times 10^{5}$), with frequency ranging roughly from $300\,{\rm MHz}$ to $1\,{\rm GHz}$, show an excellent consistency with both the derived analytical model and detailed FEM simulations. These results showcase the ability to design sub-gigahertz resonators with less than $2\%$ deviation in the resonance frequency, which has immediate applications, for example, in the implementation of ultrasensitive cryogenic detectors. The achieved compact resonator size of the order of a square millimeter indicates a feasible way to integrate hundreds of microwave resonators on a single chip for realizing photonic lattices.

翻訳日:2023-04-26 20:53:05 公開日:2023-04-25
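
For readers unfamiliar with lumped-element resonators, the textbook relation f0 = 1 / (2π√(LC)) already shows how the sub-gigahertz range in the abstract above arises. The sketch below is not the paper's analytical model (which derives L and C from the circuit geometry); the inductance and capacitance values are hypothetical and were chosen only so that the result lands between 300 MHz and 1 GHz.

```python
import math

def lc_resonance_hz(inductance_h, capacitance_f):
    """Resonance frequency of an ideal lumped LC circuit, f0 = 1 / (2*pi*sqrt(L*C))."""
    return 1.0 / (2.0 * math.pi * math.sqrt(inductance_h * capacitance_f))

# Hypothetical component values: 10 nH and 10 pF.
L_H = 10e-9
C_F = 10e-12
f0 = lc_resonance_hz(L_H, C_F)  # about 503 MHz

# Sensitivity check: a 4% error in the capacitance shifts f0 by a little under 2%,
# because f0 scales as 1/sqrt(C).
f_shifted = lc_resonance_hz(L_H, 1.04 * C_F)
rel_shift = (f_shifted - f0) / f0
print(f"f0 = {f0 / 1e6:.1f} MHz, relative shift = {rel_shift:.4f}")
```

The inverse square-root dependence means that a frequency accuracy of 2% requires knowing the product LC to roughly 4%, which gives a sense of how accurate a geometry-based model has to be.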

# 二次元系における位相相転移の偏光ジャンプ

Polarization Jumps across Topological Phase Transitions in Two-dimensional Systems ( http://arxiv.org/abs/2304.12742v1 )

ライセンス: Link先を確認

Hiroki Yoshida, Tiantian Zhang, Shuichi Murakami

(参考訳) チャーン数や $\mathbb{Z}_2$ トポロジカル不変量のようなトポロジカル不変量の変化を伴うトポロジカル相転移では、ギャップが閉じ、転移点で電気分極は定義されなくなる。本稿では、2次元におけるこのようなトポロジカル相転移をまたぐ分極の跳びが、中間のワイル半金属相におけるワイル点の位置とモノポール電荷によって記述されることを示す。$\mathbb{Z}_2$ トポロジカル相転移、およびチャーン数の値が変化しない相転移では、分極の跳びはワイル双極子によって記述されることがわかった。一方、相転移でチャーン数が変化する場合、跳びは逆格子空間内の基準点から測ったワイル点の相対位置で表される。

In topological phase transitions involving a change in topological invariants such as the Chern number and the $\mathbb{Z}_2$ topological invariant, the gap closes, and the electric polarization becomes undefined at the transition. In this paper, we show that the jump of polarization across such topological phase transitions in two dimensions is described in terms of positions and monopole charges of Weyl points in the intermediate Weyl semimetal phase. We find that the jump of polarization is described by the Weyl dipole at $\mathbb{Z}_2$ topological phase transitions and at phase transitions without any change in the value of the Chern number. Meanwhile, when the Chern number changes at the phase transition, the jump is expressed in terms of the relative positions of Weyl points measured from a reference point in the reciprocal space.

翻訳日:2023-04-26 20:52:36 公開日:2023-04-25

# NPRL: ICU外傷患者の早期敗血症発症予測のための夜間プロファイル表現学習

NPRL: Nightly Profile Representation Learning for Early Sepsis Onset Prediction in ICU Trauma Patients ( http://arxiv.org/abs/2304.12737v1 )

ライセンス: Link先を確認

Tucker Stewart, Katherine Stern, Grant O'Keefe, Ankur Teredesai, Juhua Hu

(参考訳) 敗血症(sepsis)は、感染の存在に反応して発症する症候群である。重篤な臓器機能障害を特徴とし、世界中の集中治療室(ICU)における主要な死因の1つである。これらの合併症は抗生物質の早期投与によって軽減できるため、敗血症の発症を早期に予測する能力は、患者の生存と健康にとって極めて重要である。医療インフラ内に導入されている現在の機械学習アルゴリズムは性能が低く、敗血症の発症を早期に予測するには不十分である。近年、敗血症を予測する深層学習手法が提案されているが、発症時刻を捉えられないもの(例えば、患者の入院全体を敗血症を発症するか否かで分類するもの)や、医療施設への導入が現実的でないもの(例えば、発症までの時間を固定して訓練インスタンスを作成するため、発症時刻を事前に知っている必要があるもの)がある。そこで本稿ではまず、夜間に収集したデータを用いて、今後24時間以内に敗血症が発症するかどうかを毎朝予測する、新しくかつ現実的な予測フレームワークを提案する。夜間は、相互カバー体制によって医療従事者1人あたりの患者数が多くなり、各患者の観察が限られる時間帯である。しかし、予測の頻度を毎日に引き上げると、負例の数は増加する一方で、正例の数は変わらない。その結果、深刻なクラス不均衡の問題が生じ、機械学習モデルがまれな敗血症症例を捉えることが難しくなる。この問題に対処するため、患者ごとに夜間プロファイル表現学習(NPRL)を行うことを提案する。NPRLが理論的にまれな事象の問題を緩和できることを証明する。レベル1外傷センターのデータを用いた実証研究により、提案手法の有効性がさらに示された。

Sepsis is a syndrome that develops in response to the presence of infection. It is characterized by severe organ dysfunction and is one of the leading causes of mortality in Intensive Care Units (ICUs) worldwide. These complications can be reduced through early application of antibiotics, hence the ability to anticipate the onset of sepsis early is crucial to the survival and well-being of patients. Current machine learning algorithms deployed inside medical infrastructures have demonstrated poor performance and are insufficient for anticipating sepsis onset early. In recent years, deep learning methodologies have been proposed to predict sepsis, but some fail to capture the time of onset (e.g., classifying patients' entire visits as developing sepsis or not) and others are unrealistic to deploy in medical facilities (e.g., creating training instances using a fixed time to onset, where the time of onset needs to be known a priori). Therefore, in this paper, we first propose a novel but realistic prediction framework that predicts each morning whether sepsis onset will occur within the next 24 hours using data collected at night, when patient-provider ratios are higher due to cross-coverage, resulting in limited observation of each patient. However, as we raise the prediction frequency to daily, the number of negative instances increases while that of positive ones remains the same. This creates a severe class imbalance problem, making it hard for a machine learning model to capture rare sepsis cases. To address this problem, we propose to do nightly profile representation learning (NPRL) for each patient. We prove that NPRL can theoretically alleviate the rare event problem. Our empirical study using data from a level-1 trauma center further demonstrates the effectiveness of our proposal.

翻訳日:2023-04-26 20:52:20 公開日:2023-04-25

# CitePrompt: プロンプトを用いた科学論文の引用意図の特定

CitePrompt: Using Prompts to Identify Citation Intent in Scientific Papers ( http://arxiv.org/abs/2304.12730v1 )

ライセンス: Link先を確認

Avishek Lahiri, Debarshi Kumar Sanyal, Imon Mukherjee

(参考訳) 科学論文の引用は、知的系譜をたどる助けになるだけでなく、研究の科学的意義を示す有用な指標でもある。引用意図は、与えられた文脈における引用の役割を特定するため有益である。本稿では、引用意図分類に対して、これまで検討されてこなかったプロンプトベース学習のアプローチを用いるフレームワークCitePromptを提案する。事前学習済み言語モデル、プロンプトテンプレート、およびプロンプトバーバライザを適切に選択することで、最先端の手法と同等以上の結果が得られるだけでなく、科学文書に関する外部情報をはるかに少なくしてそれを実現できると主張する。ACL-ARCデータセットで最先端の結果を報告し、SciCiteデータセットでも1つを除くすべてのベースラインモデルに対して大幅な改善を示す。引用意図分類のための十分に大きなラベル付きデータセットを見つけることはかなり難しいため、我々は初めて、このタスクを少数ショットおよびゼロショットの設定に変換することを提案する。ACL-ARCデータセットでは、ゼロショット設定で53.86%のF1スコアを報告し、5ショット設定と10ショット設定ではそれぞれ63.61%と66.99%に向上する。

Citations in scientific papers not only help us trace the intellectual lineage but also are a useful indicator of the scientific significance of the work. Citation intents prove beneficial as they specify the role of the citation in a given context. In this paper, we present CitePrompt, a framework which uses the hitherto unexplored approach of prompt-based learning for citation intent classification. We argue that with the proper choice of the pretrained language model, the prompt template, and the prompt verbalizer, we can not only get results that are better than or comparable to those obtained with the state-of-the-art methods but also do it with much less exterior information about the scientific document. We report state-of-the-art results on the ACL-ARC dataset, and also show significant improvement on the SciCite dataset over all baseline models except one. As suitably large labelled datasets for citation intent classification can be quite hard to find, in a first, we propose the conversion of this task to the few-shot and zero-shot settings. For the ACL-ARC dataset, we report a 53.86% F1 score for the zero-shot setting, which improves to 63.61% and 66.99% for the 5-shot and 10-shot settings, respectively.

翻訳日:2023-04-26 20:51:53 公開日:2023-04-25
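
The CitePrompt abstract above names two ingredients of prompt-based classification, a prompt template and a verbalizer. The sketch below shows how the two fit together without calling any language model: the template text, the intent labels, the label words, and the masked-token probabilities are all hypothetical placeholders, not the ones used in the paper.

```python
# Hypothetical template: the model would be asked to fill in the [MASK] token.
TEMPLATE = "{context} This citation is used for [MASK]."

# Hypothetical verbalizer: each intent class is represented by a few label words.
VERBALIZER = {
    "background": ["background", "context"],
    "method": ["method", "technique"],
    "result": ["result", "comparison"],
}

def build_prompt(context):
    """Wrap a citation context in the cloze-style template."""
    return TEMPLATE.format(context=context)

def classify(mask_word_probs):
    """Average the masked-token probability of each class's label words
    and return the best class together with all class scores."""
    scores = {
        label: sum(mask_word_probs.get(word, 0.0) for word in words) / len(words)
        for label, words in VERBALIZER.items()
    }
    best = max(scores, key=scores.get)
    return best, scores

prompt = build_prompt("We adopt the parser of [12].")
# Stand-in for the probabilities a masked language model would assign at [MASK].
fake_probs = {"method": 0.4, "technique": 0.2, "background": 0.1, "result": 0.05}
label, scores = classify(fake_probs)
print(prompt)
print(label, scores)
```

In a real system the `fake_probs` dictionary would come from a pretrained masked language model evaluated on `prompt`; keeping that step outside the sketch avoids assuming any particular library.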

# ガスおよび超臨界キセノンの2光子励起と吸収分光

Two-photon excitation and absorption spectroscopy of gaseous and supercritical xenon ( http://arxiv.org/abs/2304.12803v1 )

ライセンス: Link先を確認

Thilo vom Hövel (1), Franz Huybrechts (1), Eric Boltersdorf (1), Christian Wahl (1), Frank Vewinger (1), Martin Weitz (1) ((1) Institut für Angewandte Physik, Universität Bonn)

(参考訳) 高圧条件下での気体の分光は、プラズマ物理学や天体物理学など様々な分野で関心を集めている。最近では、光子ボース・アインシュタイン凝縮体の波長範囲を、現在利用可能な可視および近赤外のスペクトル領域から真空紫外へ拡張するために、高圧の希ガス環境を熱化媒体として利用することも提案されている。本研究では、$95 \; \text{bar}$ までの圧力における気体および超臨界キセノンの2光子分光の実験結果を報告し、$5p^6$ 電子基底状態から $5p^56p$ および $5p^56p^\prime$ 励起状態配置への遷移を調べる。将来の真空紫外光子凝縮体のための励起(ポンピング)方式の探索を目指して、このような高密度キセノン試料の縮退2光子励起スペクトルを記録した。さらなる測定では、補助光場の照射によって、大きなストークスシフトを受けるキセノンの第2エキシマ連続帯の発光の再吸収を増強できるかどうかを調べた。この目的のために、$5p^6 \rightarrow 5p^56p$ 2光子遷移を非縮退で駆動する吸収測定を行った。

Spectroscopy of gases under high-pressure conditions is of interest in various fields such as plasma physics or astrophysics. Recently, it has also been proposed to utilize a high-pressure noble gas environment as a thermalization medium to extend the wavelength range of photon Bose-Einstein condensates to the vacuum-ultraviolet, from the presently accessible visible and near-infrared spectral regimes. In this work, we report on experimental results of two-photon spectroscopy of gaseous and supercritical xenon for pressures as high as $95 \; \text{bar}$, probing the transitions from the $5p^6$ electronic ground state to the $5p^56p$ and $5p^56p^\prime$ excited state configurations. Aiming at the exploration of possible pumping schemes for future vacuum-ultraviolet photon condensates, we have recorded degenerate two-photon excitation spectra of such dense xenon samples. In further measurements, we have investigated whether irradiation of an auxiliary light field can enhance the reabsorption of the emission on the second excimer continuum of xenon, which is subject to a large Stokes shift. To this end, absorption measurements have been conducted, driving the $5p^6 \rightarrow 5p^56p$ two-photon transitions non-degenerately.

翻訳日:2023-04-26 20:45:15 公開日:2023-04-25

# 拡張クラスタ: ニューラルネットワークの正確なパラメータ回復

Expand-and-Cluster: Exact Parameter Recovery of Neural Networks ( http://arxiv.org/abs/2304.12794v1 )

ライセンス: Link先を確認

Flavio Martinelli, Berfin Simsek, Johanni Brea and Wulfram Gerstner

(参考訳) インプット・アウトプット・マッピングを用いて,ニューラルネットワーク(ANN)の隠れパラメータを復元できるか? 本稿では,全ネットワークパラメータを識別するために,隠れレイヤの数と探索されたANNのアクティベーション関数だけを必要とする,'Expand-and-Cluster'と呼ばれる方式を提案する。拡張フェーズでは,教師としてANNの探索データを用いて,学生ネットワークの規模を拡大する一連のネットワークを訓練する。拡張は、特定のサイズの学生ネットワークにおいて最小限の損失が一貫して到達した場合に停止する。クラスタリングフェーズでは、拡張した学生の重みベクトルがクラスター化され、超流動ニューロンを原理的に構造的プルーニングすることができる。因子4の過度パラメータ化は、最小数のニューロンを確実に同定し、元のネットワークパラメータを、可変困難な150の玩具問題のファミリーで80\%のタスクで検索するのに十分である。さらに、MNISTデータに基づいてトレーニングされた教師ネットワークは、ニューロン番号の5\%以下のオーバーヘッドで識別することができる。したがって、教師と同一の大きさの学生ネットワークの直接訓練は、非凸損失関数のため事実上不可能であるが、軽度のオーバーパラメータ化とクラスタリングと構造化プルーニングによるトレーニングは、ターゲットネットワークを正しく識別する。

Can we recover the hidden parameters of an Artificial Neural Network (ANN) by probing its input-output mapping? We propose a systematic method, called 'Expand-and-Cluster', that needs only the number of hidden layers and the activation function of the probed ANN to identify all network parameters. In the expansion phase, we train a series of student networks of increasing size using the probed data of the ANN as a teacher. Expansion stops when a minimal loss is consistently reached in student networks of a given size. In the clustering phase, weight vectors of the expanded students are clustered, which allows structured pruning of superfluous neurons in a principled way. We find that an overparameterization by a factor of four is sufficient to reliably identify the minimal number of neurons and to retrieve the original network parameters in $80\%$ of tasks across a family of 150 toy problems of variable difficulty. Furthermore, a teacher network trained on MNIST data can be identified with less than $5\%$ overhead in the neuron number. Thus, while direct training of a student network with a size identical to that of the teacher is practically impossible because of the non-convex loss function, training with mild overparameterization followed by clustering and structured pruning correctly identifies the target network.

翻訳日:2023-04-26 20:44:50 公開日:2023-04-25

# 超音波イメージングのためのクラッタフィルタとしての特異値分解法について

On the Use of Singular Value Decomposition as a Clutter Filter for Ultrasound Flow Imaging ( http://arxiv.org/abs/2304.12783v1 )

ライセンス: Link先を確認

Kai Riemer, Marcelo Lerendegui, Matthieu Toulemonde, Jiaqi Zhu, Christopher Dunsby, Peter D. Weinberg, Meng-Xing Tang

(参考訳) Singular Value Decomposition (SVD) に基づくフィルタリングは, 高フレームレート超音波流画像におけるクラッタ, 流れ, ノイズをかなり分離する。 SVDをクラッタフィルタとして用いることで、ベクトルフローイメージング、機能超音波、超高分解能超音波ローカライゼーション顕微鏡などの技術が大幅に改善された。クラッタとノイズの除去は、組織、流れ、ノイズがそれぞれ特異値の異なる部分集合で表されるという仮定に依存し、それらの信号は非相関であり直交部分空間に置かれる。この仮定は、近壁や微小血管の流れといった組織の動きの存在に失敗し、特異値閾値の誤った選択の影響を受けうる。したがって、フロー、クラッタ、ノイズの分離は不完全であり、元のデータに存在しない画像アーティファクトにつながる可能性がある。強度の時間的および空間的変動は最も一般的なアーティファクトであり、外観や強度によって異なる。フロー信号がばらばらに分布する微小血管では、ゴーストとスプリットアーティファクトが観察される。特異値閾値選択, 組織運動, フレーム速度, 流れ信号振幅, 取得長は, これらの人工物の有病率に影響を及ぼす。 SVDクラッタやノイズ除去による人工物の原因を理解することは,その解釈に必要である。

Filtering based on Singular Value Decomposition (SVD) provides substantial separation of clutter, flow and noise in high frame rate ultrasound flow imaging. The use of SVD as a clutter filter has greatly improved techniques such as vector flow imaging, functional ultrasound and super-resolution ultrasound localization microscopy. The removal of clutter and noise relies on the assumption that tissue, flow and noise are each represented by different subsets of singular values, so that their signals are uncorrelated and lie in orthogonal subspaces. This assumption fails in the presence of tissue motion, for near-wall or microvascular flow, and can be influenced by an incorrect choice of singular value thresholds. Consequently, separation of flow, clutter and noise is imperfect, which can lead to image artefacts not present in the original data. Temporal and spatial fluctuations in intensity are the commonest artefacts, which vary in appearance and strength. Ghosting and splitting artefacts are observed in the microvasculature where the flow signal is sparsely distributed. Singular value threshold selection, tissue motion, frame rate, flow signal amplitude and acquisition length affect the prevalence of these artefacts. Understanding what causes artefacts due to SVD clutter and noise removal is necessary for their interpretation.
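
The filtering step described in this abstract can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function name `svd_clutter_filter`, the hard index cut-offs `n_clutter` and `n_noise`, and the synthetic data are all assumptions made for the example. The synthetic tissue and flow signals are deliberately orthogonal in both space and time, which is the idealised case in which the subspace assumption holds.

```python
import numpy as np

def svd_clutter_filter(frames, n_clutter, n_noise=0):
    """Zero the largest `n_clutter` and smallest `n_noise` singular components.

    frames: array of shape (n_frames, height, width).
    The cut-offs are plain indices chosen by the caller (an assumption;
    the abstract does not prescribe a threshold selection rule).
    """
    n_frames, height, width = frames.shape
    # Casorati matrix: one row per pixel, one column per frame.
    casorati = frames.reshape(n_frames, height * width).T
    u, s, vh = np.linalg.svd(casorati, full_matrices=False)
    keep = np.zeros_like(s)
    keep[n_clutter:len(s) - n_noise] = 1.0
    filtered = (u * (s * keep)) @ vh
    return filtered.T.reshape(n_frames, height, width)

# Synthetic data (hypothetical): strong static tissue in the top row,
# a weak oscillating flow signal in a single pixel of the bottom row.
t = np.arange(8)
tissue = np.zeros((8, 2, 3))
tissue[:, 0, :] = 100.0
flow = np.zeros((8, 2, 3))
flow[:, 1, 1] = np.cos(2 * np.pi * t / 8)

recovered = svd_clutter_filter(tissue + flow, n_clutter=1)
```

Here removing one singular component recovers the flow signal because the two sources do not overlap. When tissue moves or flow sits next to the wall, the components mix across singular vectors and no choice of `n_clutter` separates them cleanly, which is the failure mode the abstract discusses.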

翻訳日:2023-04-26 20:44:28 公開日:2023-04-25

# 分散強化学習における学習力向上のための損失と後退

Loss and Reward Weighing for increased learning in Distributed Reinforcement Learning ( http://arxiv.org/abs/2304.12778v1 )

ライセンス: Link先を確認

Martin Holen, Per-Arne Andersen, Kristian Muri Knausgård, Morten Goodwin

(参考訳) 本稿では,Reinforcement Learning (RL)環境における分散エージェントの学習手法として,Reward-Weighted (R-Weighted) とLos-Weighted (L-Weighted) の2つを紹介する。 R/L重み付け法は、勾配の和や平均化など、複数のエージェントを訓練するための標準的な慣行を置き換える。我々の手法のコアは、報酬(R-Weighted)や損失(L-Weighted)が他のアクターと比較してどれだけ高いかに基づいて、各アクターの勾配をスケールすることである。トレーニング中、各エージェントは同じ環境の異なる初期化バージョンで動作し、異なるアクターとは異なる勾配を与える。本質的に、各エージェントのr重みとl重みは、他のエージェントにその可能性を知らせ、学習のためにどの環境を優先すべきかを再び報告する。分散学習のこのアプローチは、報酬や損失の少ない環境の方が、報酬や損失の少ない環境よりも重要な情報を持っているため可能である。 R-Weighted法は複数のRL環境において最先端の手法よりも優れていることを実証的に実証した。

This paper introduces two learning schemes for distributed agents in Reinforcement Learning (RL) environments, namely Reward-Weighted (R-Weighted) and Loss-Weighted (L-Weighted) gradient merger. The R/L weighted methods replace standard practices for training multiple agents, such as summing or averaging the gradients. The core of our methods is to scale the gradient of each actor based on how high the reward (for R-Weighted) or the loss (for L-Weighted) is compared to the other actors. During training, each agent operates in a differently initialized version of the same environment, which gives different gradients from different actors. In essence, the R-Weights and L-Weights of each agent inform the other agents of its potential, which in turn indicates which environment should be prioritized for learning. This approach to distributed learning is possible because environments that yield higher rewards, or lower losses, have more critical information than environments that yield lower rewards or higher losses. We empirically demonstrate that the R-Weighted methods outperform the state of the art in multiple RL environments.
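
As a rough illustration of merging gradients by relative reward instead of averaging them, the sketch below weights each actor's gradient with a softmax over the actors' rewards. The softmax weighting and the names `reward_weights` and `merge_gradients` are assumptions chosen for this example; the abstract does not state the exact scaling formula used in the paper.

```python
import math

def reward_weights(rewards):
    """Softmax over actor rewards (an illustrative choice, not the paper's formula)."""
    top = max(rewards)
    exps = [math.exp(r - top) for r in rewards]  # subtract max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

def merge_gradients(gradients, rewards):
    """Weighted sum of per-actor gradient vectors, one weight per actor."""
    weights = reward_weights(rewards)
    n_params = len(gradients[0])
    return [sum(w * g[i] for w, g in zip(weights, gradients)) for i in range(n_params)]

# Two hypothetical actors with orthogonal gradients.
grads = [[1.0, 0.0], [0.0, 1.0]]
equal = merge_gradients(grads, rewards=[1.0, 1.0])    # identical rewards: plain average
skewed = merge_gradients(grads, rewards=[0.0, 10.0])  # second actor dominates
```

With equal rewards the merge reduces to ordinary averaging; as one actor's reward grows, the merged gradient moves towards that actor's gradient.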

翻訳日:2023-04-26 20:44:08 公開日:2023-04-25

# クラス注意伝達に基づく知識蒸留

Class Attention Transfer Based Knowledge Distillation ( http://arxiv.org/abs/2304.12777v1 )

ライセンス: Link先を確認

Ziyao Guo, Haonan Yan, Hui Li, Xiaodong Lin

(参考訳) 従来の知識蒸留法は, モデル圧縮作業において, 優れた性能を示してきたが, 学生ネットワークの性能向上にどのように役立つかを説明することは困難である。本研究では,高い解釈性と競争性能を有する知識蒸留法を提案する。まず、主流CNNモデルの構造を再検討し、クラス識別領域を識別する能力を持つことがCNNにとって重要であることを明らかにした。さらに,クラスアクティベーションマップの転送により,この能力の獲得と向上が可能であることを示す。そこで本研究では,cat-kd(class attention transfer based knowledge distillation)を提案する。従来のKD法とは違って,CAT-KDの解釈性の向上だけでなく,CNNの理解の向上にも寄与する知識のいくつかの特性を探索し,提示する。高い解釈性を持つ一方で、CAT-KDは複数のベンチマークで最先端のパフォーマンスを達成する。コードはhttps://github.com/gzyaftermath/cat-kd。

Previous knowledge distillation methods have shown impressive performance on model compression tasks; however, it is hard to explain how the knowledge they transfer helps to improve the performance of the student network. In this work, we focus on proposing a knowledge distillation method that has both high interpretability and competitive performance. We first revisit the structure of mainstream CNN models and reveal that the capacity to identify class-discriminative regions of the input is critical for a CNN to perform classification. Furthermore, we demonstrate that this capacity can be obtained and enhanced by transferring class activation maps. Based on our findings, we propose class attention transfer based knowledge distillation (CAT-KD). Different from previous KD methods, we explore and present several properties of the knowledge transferred by our method, which not only improve the interpretability of CAT-KD but also contribute to a better understanding of CNNs. While having high interpretability, CAT-KD achieves state-of-the-art performance on multiple benchmarks. Code is available at: https://github.com/GzyAftermath/CAT-KD.
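
For readers unfamiliar with class activation maps, the sketch below computes one in its usual formulation: a weighted sum of the final convolutional feature maps, using the classifier weights of a single class. It only illustrates what a class activation map is; how CAT-KD transfers such maps from teacher to student is not shown, and the function name and toy values are assumptions for this example.

```python
def class_activation_map(feature_maps, class_weights):
    """Weighted sum over channels of H x W feature maps.

    feature_maps: list of K maps, each a nested list of shape H x W.
    class_weights: list of K classifier weights for one class.
    """
    height = len(feature_maps[0])
    width = len(feature_maps[0][0])
    return [
        [sum(w * fmap[y][x] for w, fmap in zip(class_weights, feature_maps))
         for x in range(width)]
        for y in range(height)
    ]

# Toy input (hypothetical): two 2x2 feature maps, each active at one location.
features = [
    [[1.0, 0.0], [0.0, 0.0]],
    [[0.0, 0.0], [0.0, 2.0]],
]
cam = class_activation_map(features, class_weights=[1.0, 0.5])
```

Large values in `cam` mark the spatial locations that contribute most to the score of the chosen class.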

翻訳日:2023-04-26 20:43:46 公開日:2023-04-25

# 状態空間が不十分:機械翻訳に注意が必要

State Spaces Aren't Enough: Machine Translation Needs Attention ( http://arxiv.org/abs/2304.12776v1 )

ライセンス: Link先を確認

Ali Vardasbi, Telmo Pessoa Pires, Robin M. Schmidt, Stephan Peitz

(参考訳) 構造化状態空間 (Structured State Spaces for Sequences, S4) は、視覚、言語モデリング、オーディオなどの様々なタスクで成功したシーケンスモデルである。数学的定式化のおかげで、入力を1つの隠れた状態に圧縮し、注意のメカニズムを必要とせずに、長距離の依存関係をキャプチャできる。本研究では,S4を機械翻訳(MT)に適用し,WMT'14とWMT'16のエンコーダ・デコーダの変種を評価する。言語モデリングの成功とは対照的に、S4 は Transformer の約4 BLEU ポイントで遅れており、長文に反故意に苦労している。最後に、このギャップは、s4が完全なソース文を単一の隠れ状態において要約できないことによるものであり、注意機構を導入することでギャップを閉じることができることを示す。

Structured State Spaces for Sequences (S4) is a recently proposed sequence model with successful applications in various tasks, e.g. vision, language modeling, and audio. Thanks to its mathematical formulation, it compresses its input to a single hidden state, and is able to capture long range dependencies while avoiding the need for an attention mechanism. In this work, we apply S4 to Machine Translation (MT), and evaluate several encoder-decoder variants on WMT'14 and WMT'16. In contrast with the success in language modeling, we find that S4 lags behind the Transformer by approximately 4 BLEU points, and that it counter-intuitively struggles with long sentences. Finally, we show that this gap is caused by S4's inability to summarize the full source sentence in a single hidden state, and show that we can close the gap by introducing an attention mechanism.
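
The statement that a state space model compresses its input into a single hidden state can be made concrete with a generic discrete linear recurrence, $x_k = A x_{k-1} + B u_k$, $y_k = C x_k$. The sketch below is not S4 itself (S4 uses a specially structured state matrix and an efficient convolutional evaluation); the function name `run_ssm` and the toy parameters are assumptions for illustration.

```python
def run_ssm(A, B, C, inputs):
    """Run x_k = A x_{k-1} + B u_k, y_k = C x_k over a scalar input sequence.

    A: n x n nested list, B and C: length-n lists.
    Returns the output sequence and the final hidden state.
    """
    n = len(B)
    state = [0.0] * n
    outputs = []
    for u in inputs:
        state = [
            sum(A[i][j] * state[j] for j in range(n)) + B[i] * u
            for i in range(n)
        ]
        outputs.append(sum(C[i] * state[i] for i in range(n)))
    return outputs, state

# One-dimensional state with decay 0.5: an impulse fades geometrically.
ys, final_state = run_ssm(A=[[0.5]], B=[1.0], C=[1.0], inputs=[1.0, 0.0, 0.0])
```

Everything the model remembers about earlier inputs has to fit into `state`, whose size does not grow with the sequence. An attention mechanism instead keeps access to every source position, which is consistent with the abstract's explanation of the gap on long sentences.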

翻訳日:2023-04-26 20:43:29 公開日:2023-04-25

# デコーダネットワーク上の逆リプシッツ制約による後部崩壊の制御

Controlling Posterior Collapse by an Inverse Lipschitz Constraint on the Decoder Network ( http://arxiv.org/abs/2304.12770v1 )

ライセンス: Link先を確認

Yuri Kinoshita, Kenta Oono, Kenji Fukumizu, Yuichi Yoshida, Shin-ichi Maeda

(参考訳) 変分オートエンコーダ(VAE)は、過去数十年で大きな成功を収めてきた深層生成モデルの1つである。しかし、実際には、エンコーダが一致したり、あるいは崩壊した場合に発生する後方崩壊と呼ばれる問題に苦しんでおり、前者は入力データの潜在構造からの情報を取得していない。本研究では,デコーダに逆リプシッツニューラルネットワークを導入し,このアーキテクチャに基づいて,具体的な理論的保証を備えた多種多様なVAEモデルに対する後方崩壊の度合いを,単純かつ明確な方法で制御できる新しい手法を提案する。また,いくつかの数値実験により,本手法の有効性を示す。

Variational autoencoders (VAEs) are one of the deep generative models that have experienced enormous success over the past decades. However, in practice, they suffer from a problem called posterior collapse, which occurs when the encoder coincides, or collapses, with the prior, taking no information from the latent structure of the input data into consideration. In this work, we introduce an inverse Lipschitz neural network into the decoder and, based on this architecture, provide a new method that can control in a simple and clear manner the degree of posterior collapse for a wide range of VAE models equipped with a concrete theoretical guarantee. We also illustrate the effectiveness of our method through several numerical experiments.

翻訳日:2023-04-26 20:43:12 公開日:2023-04-25

# ゼロサム行列ゲームにおける学習の1次クエリ複雑度(近似)ナッシュ平衡のキャラクタリゼーション

Towards Characterizing the First-order Query Complexity of Learning (Approximate) Nash Equilibria in Zero-sum Matrix Games ( http://arxiv.org/abs/2304.12768v1 )

ライセンス: Link先を確認

Hédi Hadiji, Sarah Sachs (UvA), Tim van Erven (UvA), Wouter M. Koolen (CWI)

(参考訳) 0-sum $K\times K$Matrixゲームに対する1次クエリモデルでは、プレイヤーは、相手がプレイするランダム化アクションの下で、可能なすべてのアクションに対する期待された支払いを観測する。これは古典的なモデルであり、rakhlinとsridharanによって、$\epsilon$-approximate nash equilibriaが$o(\ln k / \epsilon)$の代わりに$o(\ln k / \epsilon^2)$クエリから効率的に計算できることが発見された後、新たな関心を集めている。まず、厳密な平衡値(\epsilon=0$)を学習するクエリの複雑さを、線形の$k$という多くのクエリを必要とすることを示すことによって、完全に特徴付けします。第二に、$\epsilon > 0$ の場合、電流の複雑さの上界は$O(\min(\ln(K) / \epsilon , K))$である。そのような行列は単一の問合せで完全同定できるので、既知の可算集合において入出力値を持つハード行列を構築することによって、ノローバウンドが導出可能であることを証明できる。これにより、例えば、ハイパーキューブ上のtoa部分モジュラー最適化問題をバイナリ行列としてエンコードすることで減らすことができる。次に、下界に対する新しい手法を導入し、$\tilde\Omega(\log(1 / (K\epsilon))$を任意の$\epsilon \leq1 / cK^4$に対して、$c$は$K$の一定の独立性を持つ。我々は,このギャップを上界で縮めるために,我々の技術を改善するための今後の方向性をさらに明らかにする。

In the first-order query model for zero-sum $K\times K$ matrix games, players observe the expected pay-offs for all their possible actions under the randomized action played by their opponent. This is a classical model, which has received renewed interest after the discovery by Rakhlin and Sridharan that $\epsilon$-approximate Nash equilibria can be computed efficiently from $O(\ln K / \epsilon)$ instead of $O(\ln K / \epsilon^2)$ queries. Surprisingly, the optimal number of such queries, as a function of both $\epsilon$ and $K$, is not known. We make progress on this question on two fronts. First, we fully characterise the query complexity of learning exact equilibria ($\epsilon=0$), by showing that they require a number of queries that is linear in $K$, which means that it is essentially as hard as querying the whole matrix, which can also be done with $K$ queries. Second, for $\epsilon > 0$, the current query complexity upper bound stands at $O(\min(\ln(K) / \epsilon , K))$. We argue that, unfortunately, obtaining a matching lower bound is not possible with existing techniques: we prove that no lower bound can be derived by constructing hard matrices whose entries take values in a known countable set, because such matrices can be fully identified by a single query. This rules out, for instance, reducing to a submodular optimization problem over the hypercube by encoding it as a binary matrix. We then introduce a new technique for lower bounds, which allows us to obtain lower bounds of order $\tilde\Omega(\log(1 / (K\epsilon)))$ for any $\epsilon \leq 1 / cK^4$, where $c$ is a constant independent of $K$. We further discuss possible future directions to improve on our techniques in order to close the gap with the upper bounds.
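
To make the query model concrete: a first-order query against a mixed strategy returns the vector of expected pay-offs of every pure action, which is a matrix-vector product. The sketch below assumes the row player maximizes $p^\top A q$ and the column player minimizes it, and uses the duality gap as the measure of approximation; the function names and this particular convention are assumptions for illustration, and definitions of $\epsilon$-approximate equilibrium differ by constant factors across papers.

```python
def row_query(A, q):
    """Expected pay-off of each row action against column strategy q (computes A q)."""
    return [sum(a * qj for a, qj in zip(row, q)) for row in A]

def column_query(A, p):
    """Expected pay-off of each column action against row strategy p (computes p^T A)."""
    return [sum(p[i] * A[i][j] for i in range(len(A))) for j in range(len(A[0]))]

def duality_gap(A, p, q):
    """Best-response value of the maximizer minus that of the minimizer.

    The gap is non-negative and equals zero exactly at a Nash equilibrium.
    """
    return max(row_query(A, q)) - min(column_query(A, p))

# Matching pennies: the uniform strategies form the unique equilibrium.
pennies = [[1.0, -1.0], [-1.0, 1.0]]
gap_uniform = duality_gap(pennies, [0.5, 0.5], [0.5, 0.5])
gap_pure = duality_gap(pennies, [1.0, 0.0], [1.0, 0.0])
```

Each call to `row_query` or `column_query` counts as one query in this model, so the question studied in the abstract is how few such calls are needed before some pair `(p, q)` has a gap of at most $\epsilon$.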

翻訳日:2023-04-26 20:43:00 公開日:2023-04-25

# 損失関数からの量子表現の分離

Decoupling Quantile Representations from Loss Functions ( http://arxiv.org/abs/2304.12766v1 )

ライセンス: Link先を確認

Aditya Challa, Snehanshu Saha, Soma Dhavala

(参考訳) 同時量子化回帰(sqr)手法はディープラーニングモデルの不確かさを推定するために用いられてきたが、その応用は中央量子化の解 ({\tau} = 0.5) が平均絶対誤差 (mae) を最小化しなければならないという要件によって制限されている。本稿では、この制限を、同時二項量子化回帰(SBQR)の場合に、量子化と推定確率の双対性を示すことによって解決する。これにより、損失関数から量子化表現の構成を分離し、中央の量子化関数に任意の分類器 f(x) を割り当て、異なる {\tau} の値でSBQR量子化表現の全スペクトルを生成することができる。アプローチを2つのアプリケーションで検証します。 (i)分布外サンプルを検出し、量子表現が標準確率出力を上回ることを示す。 (II) 歪みに対する量子表現のロバスト性を示すモデルを校正する。結論として,これらの結果から生じるいくつかの仮説を考察した。

The simultaneous quantile regression (SQR) technique has been used to estimate uncertainties for deep learning models, but its application is limited by the requirement that the solution at the median quantile ($\tau = 0.5$) must minimize the mean absolute error (MAE). In this article, we address this limitation by demonstrating a duality between quantiles and estimated probabilities in the case of simultaneous binary quantile regression (SBQR). This allows us to decouple the construction of quantile representations from the loss function, enabling us to assign an arbitrary classifier f(x) at the median quantile and generate the full spectrum of SBQR quantile representations at different $\tau$ values. We validate our approach through two applications: (i) detecting out-of-distribution samples, where we show that quantile representations outperform standard probability outputs, and (ii) calibrating models, where we demonstrate the robustness of quantile representations to distortions. We conclude with a discussion of several hypotheses arising from these findings.
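
The link between the median quantile and the MAE mentioned above follows from the standard quantile (pinball) loss: at $\tau = 0.5$ it equals half the absolute error. The sketch below checks this numerically on a toy sample. It illustrates only that background fact, not the SBQR construction proposed in the paper, and the helper names are assumptions.

```python
def pinball_loss(y, prediction, tau):
    """Standard quantile loss: tau * d if d >= 0 else (tau - 1) * d, with d = y - prediction."""
    diff = y - prediction
    return tau * diff if diff >= 0 else (tau - 1.0) * diff

def mean_pinball(ys, prediction, tau):
    return sum(pinball_loss(y, prediction, tau) for y in ys) / len(ys)

def mean_absolute_error(ys, prediction):
    return sum(abs(y - prediction) for y in ys) / len(ys)

# Toy sample with an outlier; its median is 3.
sample = [1.0, 2.0, 3.0, 4.0, 10.0]
candidates = [2.0, 3.0, 4.0]
best = min(candidates, key=lambda c: mean_pinball(sample, c, tau=0.5))
```

Because the two losses differ only by a factor of two at $\tau = 0.5$, any method that fits the median quantile with the pinball loss is tied to the MAE, which is the restriction the abstract sets out to remove.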

翻訳日:2023-04-26 20:42:17 公開日:2023-04-25

# 摂動一貫性学習によるテスト時間適応

Test-Time Adaptation with Perturbation Consistency Learning ( http://arxiv.org/abs/2304.12764v1 )

ライセンス: Link先を確認

Yi Su, Yixin Ji, Juntao Li, Hai Ye, Min Zhang

(参考訳) 現在、事前学習された言語モデル(plm)は、分散シフト問題にうまく対応できず、実際のテストシナリオで失敗するトレーニングセットでトレーニングされたモデルとなる。この問題に対処するため、テスト時間適応(TTA)は、テスト時にテストデータに適合するようにモデルパラメータを更新する大きな可能性を示す。既存のTTA手法は、よく設計された補助的タスクや擬似ラベルに基づく自己学習戦略に依存している。しかし,これらの手法は性能向上と計算コストに関して良好なトレードオフを達成できない。このようなジレンマに関するいくつかの知見を得るために、我々は2つの代表的TTA手法、すなわち、テントとオイルを探索し、安定した予測が良いバランスを達成するための鍵であることを確かめる。そこで本研究では, 分散シフトを伴うサンプルに対して安定な予測を行うために, 簡易なテスト時間適応法である摂動整合学習(PCL)を提案する。逆方向の強靭性および言語間移動に関する広範囲な実験により,本手法は強いPLMバックボーンと従来の最先端TTA法よりも推論時間が少なく,高い,あるいは同等の性能を達成できることが証明された。

Currently, pre-trained language models (PLMs) do not cope well with the distribution shift problem, resulting in models trained on the training set failing in real test scenarios. To address this problem, test-time adaptation (TTA), which updates model parameters to suit the test data at test time, shows great potential. Existing TTA methods rely on well-designed auxiliary tasks or self-training strategies based on pseudo-labels. However, these methods do not achieve good trade-offs regarding performance gains and computational costs. To obtain some insights into such a dilemma, we take two representative TTA methods, i.e., Tent and OIL, for exploration and find that stable prediction is the key to achieving a good balance. Accordingly, in this paper, we propose perturbation consistency learning (PCL), a simple test-time adaptation method to promote the model to make stable predictions for samples with distribution shifts. Extensive experiments on adversarial robustness and cross-lingual transferring demonstrate that our method can achieve higher or comparable performance with less inference time over strong PLM backbones and previous state-of-the-art TTA methods.

翻訳日:2023-04-26 20:41:57 公開日:2023-04-25

# 自然言語処理のための市民科学プロジェクトから学んだこと

Lessons Learned from a Citizen Science Project for Natural Language Processing ( http://arxiv.org/abs/2304.12836v1 )

ライセンス: Link先を確認

Jan-Christoph Klie, Ji-Ung Lee, Kevin Stowe, Gözde Gül Şahin, Nafise Sadat Moosavi, Luke Bates, Dominic Petrak, Richard Eckart de Castilho, Iryna Gurevych

(参考訳) 多くの自然言語処理(nlp)システムは、訓練と評価に注釈付きコーパスを使用する。しかし、ラベル付きデータはしばしば入手するのにコストがかかり、アノテーションプロジェクトのスケーリングは難しいため、アノテーションタスクは有料のクラウドワーカーにアウトソースされることが多い。市民科学はクラウドソーシングの代替であり、NLPの文脈では比較的研究されていない。この環境で市民科学がどの程度有効かを調べるため、既存のクラウドソースデータセットの一部を注釈付けすることで、NLPの市民科学における様々なボランティアグループへの参加を探索研究する。この結果から,高品質なアノテーションが得られ,モチベーションの高いボランティアを惹きつけるだけでなく,スケーラビリティや時間的関与,法的・倫理的問題といった要因も考慮する必要があることがわかった。ガイドラインの形で学んだ教訓を要約し、市民科学の今後の取り組みを支援するコードとデータを提供します。

Many Natural Language Processing (NLP) systems use annotated corpora for training and evaluation. However, labeled data is often costly to obtain and scaling annotation projects is difficult, which is why annotation tasks are often outsourced to paid crowdworkers. Citizen Science is an alternative to crowdsourcing that is relatively unexplored in the context of NLP. To investigate whether and how well Citizen Science can be applied in this setting, we conduct an exploratory study into engaging different groups of volunteers in Citizen Science for NLP by re-annotating parts of a pre-existing crowdsourced dataset. Our results show that this can yield high-quality annotations and attract motivated volunteers, but also requires considering factors such as scalability, participation over time, and legal and ethical issues. We summarize lessons learned in the form of guidelines and provide our code and data to aid future work on Citizen Science.

翻訳日:2023-04-26 20:35:46 公開日:2023-04-25

# 機械学習のための確実性の新しい情報理論

A New Information Theory of Certainty for Machine Learning ( http://arxiv.org/abs/2304.12833v1 )

ライセンス: Link先を確認

Arthur Jun Zhang

(参考訳) クロード・シャノンは、通信符号化理論におけるランダム分布の不確かさを定量化するためにエントロピーを考案した。エントロピーの不確実性は、数学的モデリングにおける直接的使用を制限する。そこで我々は,エントロピーの正準双対として,基礎となる分布の確実性を定量化する新しい概念トロエンピーを提案する。機械学習の応用例を2つ紹介する。まず, 古典的文書分類について, 文書クラスラベルを活用すべく, トレンピーに基づく重み付けスキームを開発した。 2つ目は、シーケンシャルデータに対する自己トロエンピー重み付け方式で、ニューラルネットワークベースの言語モデルに簡単に組み込めることを示し、劇的なパープレキシティ低減を実現する。また、量子トレンピーをフォン・ノイマンエントロピーの双対として定義し、量子系の確実性を定量化する。

Claude Shannon coined entropy to quantify the uncertainty of a random distribution for communication coding theory. We observe that the uncertainty nature of entropy also limits its direct usage in mathematical modeling. Therefore, we propose a new concept, troenpy, as the canonical dual of entropy, to quantify the certainty of the underlying distribution. We demonstrate two applications in machine learning. The first is classical document classification, for which we develop a troenpy-based weighting scheme that leverages the document class labels. The second is a self-troenpy weighting scheme for sequential data; we show that it can be easily included in neural-network-based language models and achieves a dramatic perplexity reduction. We also define quantum troenpy as the dual of the von Neumann entropy to quantify the certainty of quantum systems.

翻訳日:2023-04-26 20:35:32 公開日:2023-04-25

# 深部量子化ニューラルネットワークによる敵攻撃に対するロバスト性の改善

Improving Robustness Against Adversarial Attacks with Deeply Quantized Neural Networks ( http://arxiv.org/abs/2304.12829v1 )

ライセンス: Link先を確認

Ferheen Ayaz, Idris Zakariyya, José Cano, Sye Loong Keoh, Jeremy Singer, Danilo Pau, Mounia Kharbouche-Harrari

(参考訳) 機械学習(ML)モデル、特にディープニューラルネットワーク(DNN)のメモリフットプリントを削減することは、リソースに制約のある小さなデバイスへのデプロイメントを可能にする上で不可欠である。しかし、DNNモデルの欠点は、敵攻撃に対する脆弱性であり、入力にわずかな摂動を加えることで騙される可能性がある。そのため、リソース制約のある組み込みデバイスにデプロイ可能な正確で堅牢で小さなDNNモデルをどうやって作るかが課題である。本稿では,学習ループに深い量子化損失を考慮に入れたQKeras(QKeras)という,自動量子化対応トレーニングフレームワークでトレーニングした,敵のブラックボックス攻撃に対して堅牢な,小さなDNNモデルの開発結果について報告する。そこで我々は,QKeras とヤコビアン正則化 (JR) が,DNN トポロジとJR 層ごとのアプローチを利用して,頑健で微妙に量子化された DNN モデルを生成することで,共最適化戦略を実現する方法について検討した。その結果、この共最適化戦略を実装した新しいdnnモデルが、画像とオーディオの入力の両方を含む3つのデータセット上で考案、開発、テストされ、そのパフォーマンスは、ホワイトボックスとブラックボックスのさまざまな攻撃に対する既存のベンチマークと比較された。実験結果から,提案したDNNモデルの平均精度は,CIFAR-10画像データセットとGoogle Speech Commands音声データセットのサブセットに対して,MLCommons/Tinyベンチマークのホワイトボックス,ブラックボックス攻撃の有無で8.3%,79.5%高かった。 SVHNイメージデータセットのブラックボックス攻撃でも6.5%精度が向上した。

Reducing the memory footprint of Machine Learning (ML) models, particularly Deep Neural Networks (DNNs), is essential to enable their deployment into resource-constrained tiny devices. However, a disadvantage of DNN models is their vulnerability to adversarial attacks, as they can be fooled by adding slight perturbations to the inputs. Therefore, the challenge is how to create accurate, robust, and tiny DNN models deployable on resource-constrained embedded devices. This paper reports the results of devising a tiny DNN model, robust to adversarial black-box and white-box attacks, trained with an automatic quantization-aware training framework, i.e. QKeras, with the deep quantization loss accounted for in the learning loop, thereby making the designed DNNs more accurate for deployment on tiny devices. We investigated how QKeras and an adversarial robustness technique, Jacobian Regularization (JR), can provide a co-optimization strategy by exploiting the DNN topology and the per-layer JR approach to produce robust yet tiny deeply quantized DNN models. As a result, a new DNN model implementing this co-optimization strategy was conceived, developed and tested on three datasets containing both image and audio inputs, and its performance was compared with existing benchmarks against various white-box and black-box attacks. Experimental results demonstrated that on average our proposed DNN model resulted in 8.3% and 79.5% higher accuracy than MLCommons/Tiny benchmarks in the presence of white-box and black-box attacks on the CIFAR-10 image dataset and a subset of the Google Speech Commands audio dataset respectively. It was also 6.5% more accurate for black-box attacks on the SVHN image dataset.

翻訳日:2023-04-26 20:35:16 公開日:2023-04-25

# 深層強化学習に基づく漢方処方計画のための最適化フレームワーク

A optimization framework for herbal prescription planning based on deep reinforcement learning ( http://arxiv.org/abs/2304.12828v1 )

ライセンス: Link先を確認

Kuo Yang, Zecong Yu, Xin Su, Xiong He, Ning Wang, Qiguang Zheng, Feidie Yu, Zhuang Liu, Tiancai Wen and Xuezhong Zhou

(参考訳) 慢性疾患の治療計画は、医学的人工知能、特に伝統中国医学(tcm)において重要な課題である。しかし, 臨床経験の異なる慢性疾患患者に対して, 最適な逐次治療戦略を作成することは, さらなる探索を必要とする課題である。本研究では,慢性疾患治療のための深層強化学習(PrescDRL)に基づくTCMハーブ処方計画フレームワークを提案する。 PrescDRLは、すべてのステップで最大報酬を得るのではなく、長期的な効果に焦点を当てたシーケンシャルなハーバル処方の最適化モデルである。糖尿病の経時的診断と治療のための高品質ベンチマークデータセットを構築し,本ベンチマークに対するprescdrlの評価を行った。以上の結果から,PrescDRLは,医師と比較して1段階の報酬が117%,153%改善した。さらにprescdrlは処方薬の予測においてベンチマークを上回り、精度は40.5%向上し、リコールは63%向上した。本研究は,TCMにおける臨床知能診断と治療の改善に人工知能を用いる可能性を示すものである。

Treatment planning for chronic diseases is a critical task in medical artificial intelligence, particularly in traditional Chinese medicine (TCM). However, generating optimized sequential treatment strategies for patients with chronic diseases in different clinical encounters remains a challenging issue that requires further exploration. In this study, we proposed a TCM herbal prescription planning framework based on deep reinforcement learning for chronic disease treatment (PrescDRL). PrescDRL is a sequential herbal prescription optimization model that focuses on long-term effectiveness rather than achieving maximum reward at every step, thereby ensuring better patient outcomes. We constructed a high-quality benchmark dataset for sequential diagnosis and treatment of diabetes and evaluated PrescDRL against this benchmark. Our results showed that PrescDRL achieved a higher curative effect, with the single-step reward improving by 117% and 153% compared to doctors. Furthermore, PrescDRL outperformed the benchmark in prescription prediction, with precision improving by 40.5% and recall improving by 63%. Overall, our study demonstrates the potential of using artificial intelligence to improve clinical intelligent diagnosis and treatment in TCM.

翻訳日:2023-04-26 20:34:42 公開日:2023-04-25

# オフライン強化学習におけるExact Energy-Guided Diffusion Smplingのコントラストエネルギー予測

Contrastive Energy Prediction for Exact Energy-Guided Diffusion Sampling in Offline Reinforcement Learning ( http://arxiv.org/abs/2304.12824v1 )

ライセンス: Link先を確認

Cheng Lu, Huayu Chen, Jianfei Chen, Hang Su, Chongxuan Li, Jun Zhu

(参考訳) ガイドサンプリングは実世界のタスクに拡散モデルを適用するための重要なアプローチであり、サンプリング手順中に人間の定義したガイダンスを埋め込む。本稿では、誘導が(正規化されていない)エネルギー関数によって定義される一般的な設定を考える。この設定の主な課題は、サンプリング分布とエネルギー関数によって共同で定義される拡散サンプリング手順の中間ガイダンスが未知であり、推定が難しいことである。この課題に対処するために,中間ガイダンスの正確な定式化と,コントラストエネルギー予測(CEP)と呼ばれる新たなトレーニング目標を提案する。提案手法は,モデル容量とデータサンプルの無制限で正確なガイダンスに収束することが保証されている。オフライン強化学習(RL)に適用することで,本手法の有効性を示す。 D4RLベンチマークの大規模な実験により、我々の手法は既存の最先端アルゴリズムよりも優れていることが示された。また,高次元データにおけるCEPのスケーラビリティを示すために,画像合成にCEPを適用する例を示す。

Guided sampling is a vital approach for applying diffusion models in real-world tasks that embeds human-defined guidance during the sampling procedure. This paper considers a general setting where the guidance is defined by an (unnormalized) energy function. The main challenge for this setting is that the intermediate guidance during the diffusion sampling procedure, which is jointly defined by the sampling distribution and the energy function, is unknown and is hard to estimate. To address this challenge, we propose an exact formulation of the intermediate guidance as well as a novel training objective named contrastive energy prediction (CEP) to learn the exact guidance. Our method is guaranteed to converge to the exact guidance under unlimited model capacity and data samples, while previous methods cannot. We demonstrate the effectiveness of our method by applying it to offline reinforcement learning (RL). Extensive experiments on D4RL benchmarks demonstrate that our method outperforms existing state-of-the-art algorithms. We also provide some examples of applying CEP for image synthesis to demonstrate the scalability of CEP on high-dimensional data.

翻訳日:2023-04-26 20:34:22 公開日:2023-04-25

# シャノン情報と重み付けスキームの新規双対化

A Novel Dual of Shannon Information and Weighting Scheme ( http://arxiv.org/abs/2304.12814v1 )

ライセンス: Link先を確認

Arthur Jun Zhang

(参考訳) シャノン情報理論は、もともと開発された通信技術だけでなく、機械学習や人工知能といった多くの科学や工学分野でも大きな成功を収めている。有名な重み付けスキームtf-idfに触発されて,情報エントロピーが自然双対であることを発見した。古典的シャノン情報理論を補うために、新しい量、すなわちトレンピーを提案する。トレンピーは、基盤となる分布の確実性、共通性、および類似性を測定する。そこで本研究では,クラスラベル付き文書に対するトレンピーに基づく重み付け方式,すなわち正のクラス周波数(pcf)を提案する。公開データセットの集合では、PCFに基づく重み付け方式が古典的なTF-IDFと、kNN設定で一般的な最適輸送に基づく単語移動距離アルゴリズムより優れていることを示す。我々はさらに,情報量エントロピーとトロエンピーの期待オッズ比とみなすことができる,新たなオッズ比型機能である期待クラス情報バイアス(ECIB)を開発した。実験では、単純なロジスティック回帰モデルにおいて、新しいECIB機能と単純なバイナリ項機能を含めることで、さらなる性能向上が期待できる。単純な新しい重み付けスキームとECIB機能は、非常に効果的であり、線形順序複雑性で計算できる。

Shannon information theory has achieved great success not only in communication technology, for which it was originally developed, but also in many other science and engineering fields such as machine learning and artificial intelligence. Inspired by the famous weighting scheme TF-IDF, we discovered that information entropy has a natural dual. We complement the classical Shannon information theory by proposing a novel quantity, namely troenpy. Troenpy measures the certainty, commonness and similarity of the underlying distribution. To demonstrate its usefulness, we propose a troenpy-based weighting scheme for documents with class labels, namely positive class frequency (PCF). On a collection of public datasets, we show that the PCF-based weighting scheme outperforms the classical TF-IDF and a popular Optimal Transportation based word moving distance algorithm in a kNN setting. We further develop a new odds-ratio type feature, namely Expected Class Information Bias (ECIB), which can be regarded as the expected odds ratio of the information quantity entropy and troenpy. In the experiments, we observe that including the new ECIB features and simple binary term features in a simple logistic regression model can further significantly improve the performance. The simple new weighting scheme and ECIB features are very effective and can be computed with linear order complexity.

翻訳日:2023-04-26 20:34:03 公開日:2023-04-25

# 多光子高次元GHZ状態の合成

Preparation of multiphoton high-dimensional GHZ state ( http://arxiv.org/abs/2304.12813v1 )

ライセンス: Link先を確認

Wen-Bo Xing, Xiao-Min Hu, Yu Guo, Bi-Heng Liu, Chuan-Feng Li and Guang-Can Guo

(参考訳) 多部類高次元絡み合わせは多部類2次元絡み合わせとは異なる物理を呈する。しかし、多次元高次元絡み合わせの作り方はまだ線形光学の課題である。本稿では,光学系において任意の次元の準備プロトコルを持つ多光子GHZ状態を提案する。本プロトコルでは,高次元エンタングルメントゲートを実現するために補助エンタングルメントを用い,高次元エンタングルペアを多成分の高次元ghz状態に接続する。具体的には、光子の経路自由度を用いて4粒子の3次元ghz状態を作成する例を示す。本手法は他の自由度まで拡張でき、任意の次元で任意のghz絡み合いを生成することができる。

Multipartite high-dimensional entanglement presents different physics from multipartite two-dimensional entanglement. However, preparing multipartite high-dimensional entanglement with linear optics is still a challenge. In this paper, we propose a protocol for preparing multiphoton GHZ states of arbitrary dimension in optical systems. In this protocol, we use auxiliary entanglement to realize a high-dimensional entanglement gate, so that high-dimensional entangled pairs can be connected into a multipartite high-dimensional GHZ state. Specifically, we give an example of using the photons' path degree of freedom to prepare a 4-particle 3-dimensional GHZ state. Our method can be extended to other degrees of freedom and can generate arbitrary GHZ entanglement in any dimension.
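
As a reminder of the target state, an $n$-particle, $d$-dimensional GHZ state is $\frac{1}{\sqrt{d}}\sum_{j=0}^{d-1} |j\rangle^{\otimes n}$. The sketch below builds its amplitude vector in the computational basis for the 4-particle, 3-dimensional case mentioned in the abstract. It describes the state only, not the optical preparation protocol, and the function name is an assumption.

```python
import math

def ghz_state(n_particles, dim):
    """Amplitude vector of (1/sqrt(dim)) * sum_j |j, j, ..., j> in the computational basis."""
    amplitudes = [0.0] * (dim ** n_particles)
    # Index of |j, j, ..., j> when basis labels are read as base-`dim` digits.
    stride = sum(dim ** k for k in range(n_particles))
    for j in range(dim):
        amplitudes[j * stride] = 1.0 / math.sqrt(dim)
    return amplitudes

state = ghz_state(n_particles=4, dim=3)
support = [i for i, a in enumerate(state) if a != 0.0]
norm_squared = sum(a * a for a in state)
```

Only three of the 81 basis states carry amplitude, namely $|0000\rangle$, $|1111\rangle$ and $|2222\rangle$, each with weight $1/\sqrt{3}$.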

翻訳日:2023-04-26 20:33:43 公開日:2023-04-25

# 科学のための教師なしドメイン転送:ディファリング応答モデルを用いたLArTPC検出器シミュレーション間の翻訳のための深層学習手法の探索

Unsupervised Domain Transfer for Science: Exploring Deep Learning Methods for Translation between LArTPC Detector Simulations with Differing Response Models ( http://arxiv.org/abs/2304.12858v1 )

ライセンス: Link先を確認

Yi Huang, Dmitrii Torbunov, Brett Viren, Haiwang Yu, Jin Huang, Meifeng Lin, Yihui Ren

(参考訳) 深層学習(DL)技術は科学、特に潜在的な解法や発見への道筋の合理化に広く応用されている。しかし、DLモデルは実際の実験データに適用されていないシミュレーションの結果に基づいて訓練されることが多い。このように、シミュレーションされたデータと実際のデータの系統的な違いは、モデルのパフォーマンスを低下させる可能性がある。本研究は,シミュレーションデータと実データとの系統的差異の玩具モデルに関する研究である。完全に教師なしでタスクに依存しない方法で、体系的に異なる2つのサンプルの違いを減らす。本手法は, 画像対画像変換技術の最近の進歩に基づき, 模擬液体アルゴン時間投影室 (lartpc) 検出器の2組の試料について検証を行い, シミュレーションデータと実データとの共通系統的差異を制御的に示す。 LArTPCベースの検出器は次世代粒子検出器を表現し、独自の高分解能粒子トラックデータを生成する。この研究は、Simple Liquid-Argon Track Samples(SLATS)と呼ばれる生成されたLArTPCデータセットをオープンソースとして公開した。

Deep learning (DL) techniques have broad applications in science, especially in seeking to streamline the pathway to potential solutions and discoveries. Frequently, however, DL models are trained on the results of simulation yet applied to real experimental data. As such, any systematic differences between the simulated and real data may degrade the model's performance -- an effect known as "domain shift." This work studies a toy model of the systematic differences between simulated and real data. It presents a fully unsupervised, task-agnostic method to reduce differences between two systematically different samples. The method is based on the recent advances in unpaired image-to-image translation techniques and is validated on two sets of samples of simulated Liquid Argon Time Projection Chamber (LArTPC) detector events, created to illustrate common systematic differences between the simulated and real data in a controlled way. LArTPC-based detectors represent the next-generation particle detectors, producing unique high-resolution particle track data. This work open-sources the generated LArTPC data set, called Simple Liquid-Argon Track Samples (or SLATS), allowing researchers from diverse domains to study the LArTPC-like data for the first time.

翻訳日:2023-04-26 20:25:58 公開日:2023-04-25

# 機械学習アプリケーションにおける例外の原因は何か? stack overflowにおける機械学習関連スタックトレースのマイニング

What Causes Exceptions in Machine Learning Applications? Mining Machine Learning-Related Stack Traces on Stack Overflow ( http://arxiv.org/abs/2304.12857v1 )

ライセンス: Link先を確認

Amin Ghadesi, Maxime Lamothe, Heng Li

(参考訳) ディープラーニングを含む機械学習(ML)は、最近、広範囲のアプリケーションで大きな人気を集めている。しかし、従来のソフトウェアと同様に、MLアプリケーションはプログラミングエラーに起因するバグに免疫がない。明示的なプログラミングエラーは通常、エラーメッセージとスタックトレースを通じて現れる。これらのスタックトレースは、異常な状況や例外につながる関数呼び出しの連鎖を記述する。実際、これらの例外はソフトウェアスタック全体(アプリケーションやライブラリを含む)にまたがる可能性がある。したがって、スタックトレースのパターンを研究することは、実践者や研究者がMLアプリケーションにおける例外の原因と、ML開発者が直面する課題を理解するのに役立つ。そのために、Stack Overflow (SO)をマイニングし、7つの人気のあるPython MLライブラリに関連する11,449のスタックトレースを調査しました。まず,スタックトレースを含むML質問は,スタックトレースのない質問よりも人気が高いが,回答が受け入れられる可能性は低い。第2に,mlスタックトレースに繰り返し発生するパターンは,さまざまなmlライブラリにわたっても存在し,多数のスタックトレースをカバーするパターンはごく一部である。第3に、スタックトレースパターンから5つの高レベルカテゴリと25の低レベルタイプを導出します。ほとんどのパターンは、ピソンの基本構文、モデルのトレーニング、並列化、データ変換、サブプロセスの実行と関連しています。さらに、サブプロセス呼び出し、外部モジュール実行、リモートAPI呼び出しに関連するパターンは、SOで受け入れられる可能性が最も低い。この結果から,研究者,MLライブラリプロバイダ,およびMLアプリケーション開発者に,MLライブラリとそのアプリケーションの品質向上に関する知見が得られた。

Machine learning (ML), including deep learning, has recently gained tremendous popularity in a wide range of applications. However, like traditional software, ML applications are not immune to the bugs that result from programming errors. Explicit programming errors usually manifest through error messages and stack traces. These stack traces describe the chain of function calls that lead to an anomalous situation, or exception. Indeed, these exceptions may cross the entire software stack (including applications and libraries). Thus, studying the patterns in stack traces can help practitioners and researchers understand the causes of exceptions in ML applications and the challenges faced by ML developers. To that end, we mine Stack Overflow (SO) and study 11,449 stack traces related to seven popular Python ML libraries. First, we observe that ML questions that contain stack traces gain more popularity than questions without stack traces; however, they are less likely to get accepted answers. Second, we observe that recurrent patterns exist in ML stack traces, even across different ML libraries, with a small portion of patterns covering many stack traces. Third, we derive five high-level categories and 25 low-level types from the stack trace patterns: most patterns are related to Python basic syntax, model training, parallelization, data transformation, and subprocess invocation. Furthermore, the patterns related to subprocess invocation, external module execution, and remote API calls are among the least likely to get accepted answers on SO. Our findings provide insights for researchers, ML library providers, and ML application developers to improve the quality of ML libraries and their applications.

翻訳日:2023-04-26 20:25:36 公開日:2023-04-25

# マルチレゾリューション・コンテクストネットワークによる網膜血管セグメンテーションと逆学習

Retinal Vessel Segmentation via a Multi-resolution Contextual Network and Adversarial Learning ( http://arxiv.org/abs/2304.12856v1 )

ライセンス: Link先を確認

Tariq M. Khan, Syed S. Naqvi, Antonio Robles-Kelly, Imran Razzak

(参考訳) 網膜疾患のタイムリーで手頃なコンピュータ支援診断は、視覚障害の予防に不可欠である。正確な網膜血管セグメンテーションは、このような視力低下疾患の進行と診断において重要な役割を果たす。そこで本稿では,意味的に異なる特徴間のコンテキスト依存を学習するためのマルチスケール特徴を抽出し,複数方向のリカレント学習を用いて従来と後者の依存関係をモデル化することにより,これらの問題に対処する多分解コンテキストネットワーク(MRC-Net)を提案する。もう1つの鍵となるアイデアは、地域ベースのスコアの最適化による前景セグメンテーション改善のための敵の設定のトレーニングである。この新たな戦略は、訓練可能なパラメータの数を比較的低く保ちながら、サイススコア(およびそれに対応するジャカードインデックス)の観点からセグメンテーションネットワークの性能を高める。我々は,drive,stare, chaseの3つのベンチマークデータセットを用いて本手法を評価し,他の文献と比較して優れた性能を示す。

Timely and affordable computer-aided diagnosis of retinal diseases is pivotal in precluding blindness. Accurate retinal vessel segmentation plays an important role in disease progression and diagnosis of such vision-threatening diseases. To this end, we propose a Multi-resolution Contextual Network (MRC-Net) that addresses these issues by extracting multi-scale features to learn contextual dependencies between semantically different features and using bi-directional recurrent learning to model former-latter and latter-former dependencies. Another key idea is training in adversarial settings for foreground segmentation improvement through optimization of the region-based scores. This novel strategy boosts the performance of the segmentation network in terms of the Dice score (and correspondingly Jaccard index) while keeping the number of trainable parameters comparatively low. We have evaluated our method on three benchmark datasets, including DRIVE, STARE, and CHASE, demonstrating its superior performance as compared with competitive approaches elsewhere in the literature.
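
The abstract reports results in terms of the Dice score "and correspondingly" the Jaccard index. A minimal sketch of why the two move together, assuming flat binary masks (vessel = 1, background = 0); the function name `dice_and_jaccard` and the toy masks are illustrative and not taken from the paper:

```python
def dice_and_jaccard(pred, target):
    """Return (Dice, Jaccard) for two flat binary masks of equal length."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same length")
    # Count foreground pixels in each mask and in their overlap.
    inter = sum(1 for p, t in zip(pred, target) if p and t)
    pred_fg = sum(1 for p in pred if p)
    target_fg = sum(1 for t in target if t)
    union = pred_fg + target_fg - inter
    if union == 0:
        # Both masks are empty: treat this as perfect agreement.
        return 1.0, 1.0
    dice = 2.0 * inter / (pred_fg + target_fg)
    jaccard = inter / union
    return dice, jaccard


# Toy masks (hypothetical values, for illustration only).
pred = [1, 1, 0, 0, 1]
target = [1, 0, 0, 1, 1]
dice, jaccard = dice_and_jaccard(pred, target)
# The two scores are tied by J = D / (2 - D), so ranking by one ranks by the other.
assert abs(jaccard - dice / (2.0 - dice)) < 1e-12
```

Because the mapping from Dice to Jaccard is monotonic, an improvement in one score implies an improvement in the other on the same prediction.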

翻訳日:2023-04-26 20:25:12 公開日:2023-04-25

# デジタルヘルスツインユースケースのための適応型サービス機能チェーンオーケストレーション:ヒューリスティックブーストq-learningアプローチ

Adaptive Services Function Chain Orchestration For Digital Health Twin Use Cases: Heuristic-boosted Q-Learning Approach ( http://arxiv.org/abs/2304.12853v1 )

ライセンス: Link先を確認

Jamila Alsayed Kassem, Li Zhong, Arie Taal, Paola Grosso

(参考訳) デジタルツイン(Digital Twin, DT)は、医療部門で活用および展開するための重要な技術である。しかし、このようなアプリケーションで直面する主な課題は、厳格な健康データ共有ポリシー、高性能ネットワーク要件、インフラストラクチャリソースの制限である。本稿では,vnfs(adaptive virtual network function)をプロビジョニングすることで,さまざまなデータ共有シナリオに関連するセキュリティポリシを強制することによる,すべての課題に対処する。柔軟性と動的コンテナスケジューリングのためのマルチノードクラスタインフラストラクチャ上に,Cloud-Native Networkオーケストレータを定義します。提案フレームワークでは,対象とするデータ共有ユースケース,関連するポリシ,インフラストラクチャ構成を考慮し,サービス機能チェーン(sfc)をプロビジョニングし,人的介入をほとんど必要とせずにルーティング構成を提供する。さらに、SFCをデプロイする際に何が最適かはユースケース自体に依存しており、パフォーマンス要件を満たすためにリソース利用やレイテンシを優先するようにハイパーパラメータを調整します。その結果、デジタルヘルスツインのユースケースに対して、ポリシーアウェア、要件アウェア、リソースアウェアといった適応型ネットワークオーケストレーションを提供する。

Digital Twin (DT) is a prominent technology to utilise and deploy within the healthcare sector. Yet, the main challenges facing such applications are strict health data-sharing policies, high-performance network requirements, and possible infrastructure resource limitations. In this paper, we address all the challenges by provisioning adaptive Virtual Network Functions (VNFs) to enforce security policies associated with different data-sharing scenarios. We define a Cloud-Native Network orchestrator on top of a multi-node cluster mesh infrastructure for flexible and dynamic container scheduling. The proposed framework considers the intended data-sharing use case, the associated policies, and infrastructure configurations, then provisions Service Function Chaining (SFC) and provides routing configurations accordingly with little to no human intervention. Moreover, what is optimal when deploying SFC depends on the use case itself, and we tune the hyperparameters to prioritise resource utilisation or latency in an effort to comply with the performance requirements. As a result, we provide an adaptive network orchestration for digital health twin use cases that is policy-aware, requirements-aware, and resource-aware.

翻訳日:2023-04-26 20:24:55 公開日:2023-04-25

# モノのインターネットによるスマート教育に向けて:レビュー

Towards Smart Education through the Internet of Things: A Review ( http://arxiv.org/abs/2304.12851v1 )

ライセンス: Link先を確認

Afzal Badshah, Anwar Ghani, Ali Daud, Ateeqa Jalal, Muhammad Bilal, Jon Crowcroft

(参考訳) IoTは、効果的な対面およびオンライン教育システムを支援するスマートスペースを作成するための基本的な技術である。スマート教育(IoTとAIを教育システムに統合する)への移行は、学習者のエンゲージメント、モチベーション、出席、深層学習に具体的な影響を与えている。伝統的な教育は管理、教育、評価、教室の監督など多くの課題に直面している。近年のICT(IoT、AI、5Gなど)の発展は、様々な面でのスマートソリューションを生み出しているが、スマートソリューションは教育システムにはあまり組み込まれていない。特に新型コロナウイルスのパンデミックは、教育における新しいスマートソリューションの採用をさらに強調してきた。本研究は関連研究をレビューし,対処する。 (i)解決可能な伝統的教育制度の問題点 (二)スマート教育への移行、及び (III)スマート教育への移行(計算と社会的抵抗)における研究課題これらの研究を踏まえ、従来のシステムの問題に対して、スマートソリューション(スマート教育、スマートアセスメント、スマート教室、スマート管理など)が導入された。この探索的研究は、ICT、IoT、AIをスマート教育に統合する学者や市場にとって、新たなトレンドを開くものだ。

IoT is a fundamental enabling technology for creating smart spaces, which can support effective face-to-face and online education systems. The transition to smart education (integrating IoT and AI into the education system) is appealing and has a concrete impact on learners' engagement, motivation, attendance, and deep learning. Traditional education faces many challenges, including administration, pedagogy, assessment, and classroom supervision. Recent developments in ICT (e.g., IoT, AI, and 5G) have yielded many smart solutions for various aspects of life; however, smart solutions are not well integrated into the education system. In particular, the COVID-19 pandemic has further emphasized the adoption of new smart solutions in education. This study reviews the related studies and addresses (i) problems in the traditional education system with possible solutions, (ii) the transition towards smart education, and (iii) research challenges in the transition to smart education (i.e., computational and social resistance). Considering these studies, smart solutions (e.g., smart pedagogy, smart assessment, smart classroom, and smart administration) are introduced to the problems of the traditional system. This exploratory study opens new trends for scholars and the market to integrate ICT, IoT, and AI into smart education.

翻訳日:2023-04-26 20:24:35 公開日:2023-04-25

# 単眼深度推定のための深さ関係自己注意

Depth-Relative Self Attention for Monocular Depth Estimation ( http://arxiv.org/abs/2304.12849v1 )

ライセンス: Link先を確認

Kyuhong Shim, Jiyoung Kim, Gusang Lee, Byonghyo Shim

(参考訳) 単一のRGB画像において、正確な深さの手がかりが不完全であるため、単眼深度推定は非常に難しい。この制限を克服するために、ディープニューラルネットワークは、RGB情報から抽出されたサイズ、日陰、テクスチャなど、さまざまな視覚的ヒントに依存している。しかし,そのようなヒントを過度に活用すると,網羅的な視点を考慮せずにRGB情報に偏りが生じる。本稿では,相対深度を自己注意のガイダンスとして用いたRelative Depth Transformer (RED-T) という新しい深度推定モデルを提案する。特に、モデルでは、高い注意重みを近深さの画素に、低い注意重みを遠深のピクセルに割り当てる。その結果、類似した深度の特徴は互いにより近づきやすくなり、視覚的ヒントが誤用されることが少なくなる。提案モデルでは, 単分子深度推定ベンチマークにおいて競合結果が得られ, RGB情報に偏りが小さいことを示す。さらに,学習中の観測可能な深度範囲を制限し,未知の深度に対するモデルのロバスト性を評価するための新しい単眼深度推定ベンチマークを提案する。

Monocular depth estimation is very challenging because clues to the exact depth are incomplete in a single RGB image. To overcome the limitation, deep neural networks rely on various visual hints such as size, shade, and texture extracted from RGB information. However, we observe that if such hints are overly exploited, the network can be biased toward RGB information without considering the comprehensive view. We propose a novel depth estimation model named RElative Depth Transformer (RED-T) that uses relative depth as guidance in self-attention. Specifically, the model assigns high attention weights to pixels of close depth and low attention weights to pixels of distant depth. As a result, the features of similar depth can become more similar to each other and thus less prone to misused visual hints. We show that the proposed model achieves competitive results in monocular depth estimation benchmarks and is less biased toward RGB information. In addition, we propose a novel monocular depth estimation benchmark that limits the observable depth range during training in order to evaluate the robustness of the model for unseen depths.
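
The idea of giving high attention weights to pixels of close depth and low weights to pixels of distant depth can be illustrated with a toy example. The sketch below is not the RED-T formulation, which the abstract does not spell out; it assumes an additive penalty `-|d_i - d_j| / temperature` on the attention logits, and the names `depth_relative_attention`, `scores`, and `temperature` are hypothetical:

```python
import math


def depth_relative_attention(scores, depths, temperature=1.0):
    """Row-wise softmax over content scores penalised by depth distance.

    scores[i][j] is a content-based attention logit between pixels i and j;
    depths[i] is a (predicted) depth for pixel i. The additive penalty is an
    assumption made for illustration only.
    """
    n = len(depths)
    weights = []
    for i in range(n):
        logits = [
            scores[i][j] - abs(depths[i] - depths[j]) / temperature
            for j in range(n)
        ]
        # Subtract the row maximum for a numerically stable softmax.
        row_max = max(logits)
        exps = [math.exp(value - row_max) for value in logits]
        total = sum(exps)
        weights.append([value / total for value in exps])
    return weights


# Three pixels: two at similar depth, one far away; content scores are equal.
depths = [1.0, 1.1, 5.0]
scores = [[0.0] * 3 for _ in range(3)]
attn = depth_relative_attention(scores, depths)
# With identical content scores, the near-depth pixel receives more weight.
assert attn[0][1] > attn[0][2]
```

With equal content scores, the only signal left is depth distance, so the example isolates the effect the abstract describes.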

翻訳日:2023-04-26 20:24:16 公開日:2023-04-25

# semeval-2023タスク10:不均衡データセットにおけるテキスト分類性能に及ぼすデータ拡張と半教師付き学習技術の影響

NLP-LTU at SemEval-2023 Task 10: The Impact of Data Augmentation and Semi-Supervised Learning Techniques on Text Classification Performance on an Imbalanced Dataset ( http://arxiv.org/abs/2304.12847v1 )

ライセンス: Link先を確認

Sana Sabah Al-Azzawi, György Kovács, Filip Nilsson, Tosin Adewumi, Marcus Liwicki

(参考訳) 本稿では,ソーシャルメディア投稿におけるオンライン性差別の検出と分類に着目し,semeval23タスク10の方法論を提案する。ソーシャルメディアプラットフォーム上で有害なコンテンツを検出することは、こうした投稿のユーザーへの害を軽減する上で非常に重要である。このタスクの解決策は、細調整されたトランスフォーマーベースモデル(BERTweet、RoBERTa、DeBERTa)のアンサンブルに基づいています。クラス不均衡に関する問題を緩和し,モデルの一般化能力を向上させるため,データ強化と半教師付き学習も実験した。特に、データ拡張では、すべてのクラスで、または表現不足のクラスでのみ、バックトランスレーションを使用します。これらの戦略がパイプライン全体の性能に与える影響を広範な実験を通じて分析する。半教師付き学習では、かなりの量のドメイン内データが利用可能な場合、半教師付き学習は特定のモデルの性能を高めることができる。提案手法(Githubでソースコードが公開されている)では,サブタスクAのF1スコアが0.8613に達した。

In this paper, we propose a methodology for task 10 of SemEval23, focusing on detecting and classifying online sexism in social media posts. The task tackles a serious issue, as detecting harmful content on social media platforms is crucial for mitigating the harm of these posts on users. Our solution for this task is based on an ensemble of fine-tuned transformer-based models (BERTweet, RoBERTa, and DeBERTa). To alleviate problems related to class imbalance, and to improve the generalization capability of our model, we also experiment with data augmentation and semi-supervised learning. In particular, for data augmentation, we use back-translation, either on all classes or on the underrepresented classes only. We analyze the impact of these strategies on the overall performance of the pipeline through extensive experiments. For semi-supervised learning, we found that with a substantial amount of unlabelled, in-domain data available, semi-supervised learning can enhance the performance of certain models. Our proposed method (for which the source code is available on GitHub) attains an F1-score of 0.8613 for sub-task A, which ranked us 10th in the competition.

翻訳日:2023-04-26 20:23:57 公開日:2023-04-25

# (地方)差別プライバシーは公平性に異なる影響を与えない

(Local) Differential Privacy has NO Disparate Impact on Fairness ( http://arxiv.org/abs/2304.12845v1 )

ライセンス: Link先を確認

Héber H. Arcolezi, Karima Makhlouf, Catuscia Palamidessi

(参考訳) 近年、堅牢なプライバシー保護手法であるローカル微分プライバシー(LDP)が、現実世界のアプリケーションに広く採用されている。 LDPを使えば、ユーザーは分析のためにデータを送信する前にデバイス上でデータを摂動することができる。しかし、複数の機密情報の収集が様々な産業で普及するにつれて、LDPの下での単一機密属性の収集は不十分である。データ内の関連属性は、それでも機密属性に関する推論につながる可能性がある。本稿では,LPP下での複数属性の収集が公平性に及ぼす影響を実証研究する。機密属性のドメインサイズの変化を考慮した新しいプライバシ予算配分方式を提案する。これは一般的に、最先端のソリューションよりも、私たちの実験におけるプライバシーと実用性と公正性のトレードオフに結びつきました。その結果, LDPは, モデルの性能に悪影響を及ぼすことなく, 学習問題の公平性をわずかに向上させることがわかった。我々は,グループフェアネスの指標と7つの最新LDPプロトコルを用いて,3つのベンチマークデータセットの評価実験を行った。全体として、この研究は、差分プライバシーが機械学習における公平性の悪化につながるという一般的な信念に挑戦する。

In recent years, Local Differential Privacy (LDP), a robust privacy-preserving methodology, has gained widespread adoption in real-world applications. With LDP, users can perturb their data on their devices before sending it out for analysis. However, as the collection of multiple sensitive information becomes more prevalent across various industries, collecting a single sensitive attribute under LDP may not be sufficient. Correlated attributes in the data may still lead to inferences about the sensitive attribute. This paper empirically studies the impact of collecting multiple sensitive attributes under LDP on fairness. We propose a novel privacy budget allocation scheme that considers the varying domain size of sensitive attributes. This generally led to a better privacy-utility-fairness trade-off in our experiments than the state-of-the-art solution. Our results show that LDP leads to slightly improved fairness in learning problems without significantly affecting the performance of the models. We conduct extensive experiments evaluating three benchmark datasets using several group fairness metrics and seven state-of-the-art LDP protocols. Overall, this study challenges the common belief that differential privacy necessarily leads to worsened fairness in machine learning.
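
To make "users can perturb their data on their devices" concrete, the sketch below uses generalized randomized response (GRR), a standard LDP protocol, and splits the privacy budget uniformly across attributes. The abstract does not say which seven protocols are evaluated, and the uniform split is only a baseline assumption: it is not the domain-size-aware allocation the paper proposes, which is not specified in the abstract. All function names are hypothetical.

```python
import math
import random


def grr_keep_probability(epsilon, domain_size):
    """Probability of reporting the true value under GRR."""
    e = math.exp(epsilon)
    return e / (e + domain_size - 1)


def grr_perturb(value, domain, epsilon, rng):
    """Report the true value with probability p, else a uniform other value."""
    if rng.random() < grr_keep_probability(epsilon, len(domain)):
        return value
    others = [v for v in domain if v != value]
    return rng.choice(others)


def perturb_record(record, domains, total_epsilon, rng):
    """Perturb every attribute, splitting the budget uniformly (assumption)."""
    eps_each = total_epsilon / len(record)
    return [
        grr_perturb(value, domain, eps_each, rng)
        for value, domain in zip(record, domains)
    ]


# Two hypothetical sensitive attributes with different domain sizes.
domains = [["a", "b"], ["x", "y", "z", "w"]]
rng = random.Random(0)
noisy = perturb_record(["a", "z"], domains, total_epsilon=2.0, rng=rng)
```

Note that for a fixed per-attribute budget the keep probability shrinks as the domain grows, which is the kind of imbalance a domain-size-aware allocation would aim to offset.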

翻訳日:2023-04-26 20:23:36 公開日:2023-04-25

# 寒冷電磁ダイスプロシウムダイポール用最先端装置の包括的特性評価

Comprehensive Characterization of a State-of-the-Art Apparatus for Cold Electromagnetic Dysprosium Dipoles ( http://arxiv.org/abs/2304.12844v1 )

ライセンス: Link先を確認

Gregor Anich, Rudolf Grimm, Emil Kirilov

(参考訳) 我々は、量子ガス顕微鏡(qgm)を4分の1マイクロメートルの解像度で組み込んだ新しい超低温ジスプロシウム(dy)装置を開発した。 QGMと冷却・トラップ領域は同じ真空ガラス容器内にあり、それらの間の単純な原子輸送を保証している。我々は,レーザーおよび蒸発冷却,格子負荷,輸送およびQGM焦点面におけるボゾン同位体164 Dyの雲の正確な位置決めについて実験を行った。フル容量化に向けたQGMの基本的特徴と今後の計画について概説する。また、大きな磁気と電気の双極子モーメントを持つDyの密接な正対準レベルを利用すれば、XYZモデルのような量子磁性の複雑なスピンモデルをシミュレートできるプラットフォームも提示する。磁気双極子-双極子結合を持ち,Ising,交換,スピン軌道を含む縮退アイソスピン-1/2系を分離する。最後は、格子幾何学に依存する非対称な可変率を持つスピンモデルをもたらす。

We developed a new advanced ultra-cold Dysprosium (Dy) apparatus, which incorporates a quantum gas microscope (QGM) with a resolution of a quarter micrometer. The QGM and the cooling and trapping regions are within the same vacuum glass vessel, assuring simple atom transport between them. We demonstrate the essential experimental steps of laser and evaporative cooling, lattice loading, transporting and precise positioning of a cloud of the bosonic isotope $^{164}$Dy at the QGM focal plane. Preliminary basic characterization of the QGM and future plans for enabling its full capacity are outlined. We also present a feasible platform for simulating complex spin models of quantum magnetism, such as the XYZ model, by exploiting a set of closely spaced opposite-parity levels in Dy with a large magnetic and electric dipole moment. We isolate a degenerate isospin-1/2 system, which possesses both magnetic and electric dipole-dipole coupling, containing Ising, exchange and spin-orbit terms. The last gives rise to a spin model with asymmetric tunable rates, dependent on the lattice geometry.

翻訳日:2023-04-26 20:23:18 公開日:2023-04-25

# 都市ビブランシーにおける時空間性差

Spatiotemporal gender differences in urban vibrancy ( http://arxiv.org/abs/2304.12840v1 )

ライセンス: Link先を確認

Thomas R. Collins, Riccardo Di Clemente, Mario Gutiérrez-Roig, Federico Botta

(参考訳) 都市活力は都市部における人間のダイナミックな活動である。都市の特徴や人間との交流の機会によって異なる場合もあるが、都市住民の社会環境や社会環境によっても異なる可能性がある。異なる人口集団がどのように都市を経験するかの不均一性は、住民の嗜好、アクセシビリティと機会、大規模な移動行動の違いにより、性別分離を引き起こす可能性がある。しかし、伝統的な研究は、都市の活力と都市の特徴との関係、異性間の違い、都市における人種差別にどのように影響するかについて、高頻度で理解できていない。以上の結果から,(1)都会の活力には男女差があり,(2)「関心の点」と交通ネットワークの相違がみられ,(3)各都市に肯定的・否定的な「空間的流出」が存在することが示唆された。そこで我々は,携帯電話のほぼユビキタスな利用を生かしたコールディテールデータを用いた定量的手法を用いて,イタリア7都市における空間行動の高周波観測を行う。都会の特徴から直接的効果と「スパイルオーバー」効果の空間モデルによる男女差の比較を行った。私たちの結果は、都市における不平等と将来の都市をより公平にする方法についての理解を深めます。

Urban vibrancy is the dynamic activity of humans in urban locations. It can vary with urban features and the opportunities for human interactions, but it might also differ according to the underlying social conditions of city inhabitants across and within social surroundings. Such heterogeneity in how different demographic groups may experience cities has the potential to cause gender segregation because of differences in the preferences of inhabitants, their accessibility and opportunities, and large-scale mobility behaviours. However, traditional studies have failed to capture fully a high-frequency understanding of how urban vibrancy is linked to urban features, how this might differ for different genders, and how this might affect segregation in cities. Our results show that (1) there are differences between males and females in terms of urban vibrancy, (2) the differences relate to "Points of Interest" as well as transportation networks, and (3) there are both positive and negative "spatial spillovers" existing across each city. To do this, we use a quantitative approach using Call Detail Record data--taking advantage of the near-ubiquitous use of mobile phones--to gain high-frequency observations of spatial behaviours across the seven most prominent cities of Italy. We use a spatial model comparison approach of the direct and "spillover" effects from urban features on male-female differences. Our results increase our understanding of inequality in cities and how we can make future cities fairer.

翻訳日:2023-04-26 20:22:58 公開日:2023-04-25

# parity アーキテクチャにおけるフレキシブル制約コンパイル

Flexible constraint compilation in the parity architecture ( http://arxiv.org/abs/2304.12879v1 )

ライセンス: Link先を確認

Roeland ter Hoeven, Anette Messinger, Wolfgang Lechner

(参考訳) 本稿では,任意の接続グラフを持つディジタル量子コンピューティングデバイスへのパリティコンパイルを一般化するツールと手法を提案し,高階制約付きバイナリ最適化問題の制約ハミルトニアンの回路実装について述べる。特に,非局所制約でさえ,高価なSWAPゲートを使わずに効率的に実装できることを示す。本稿では,並列性アーキテクチャにおける量子近似最適化アルゴリズムの全回路深さとcnot数を最適化する手法を示し,様々な例を用いたフレキシブルコンパイルの利点を強調する。開発したゲートシーケンスとスワップゲートを用いた従来のアプローチとの関係を導出する。この結果は、他の多くの非局所作用素の実装を改善するために適用することができる。

We present tools and methods to generalize parity compilation to digital quantum computing devices with arbitrary connectivity graphs and construct circuit implementations for the constraint Hamiltonian of higher-order constrained binary optimization problems. In particular, we show how even non-local constraints can be efficiently implemented without expensive SWAP gates. We show how the presented tools can be used to optimize the total circuit depth and CNOT count of the quantum approximate optimization algorithm in the parity architecture and highlight the advantages of the flexible compilation using various examples. We derive the relation between the developed gate sequences and the traditional approach that uses SWAP gates. The result can be applied to improve the implementation of many other non-local operators.

翻訳日:2023-04-26 20:17:44 公開日:2023-04-25

# 強化学習エージェントのための近位カリキュラム

Proximal Curriculum for Reinforcement Learning Agents ( http://arxiv.org/abs/2304.12877v1 )

ライセンス: Link先を確認

Georgios Tzannetos, Bárbara Gomes Ribeiro, Parameswaran Kamalaruban, Adish Singla

(参考訳) マルチタスク環境における強化学習(RL)エージェントのカリキュラム設計の問題点を考察する。既存の自動カリキュラム設計技術では、ドメイン固有のハイパーパラメータチューニングが必要か、理論的な基盤が限られている。これらの制約に対処するため,我々は,ZPD(Zone of Proximal Development)という教育的概念に触発されたカリキュラム戦略であるProCuRLを設計する。 ProCuRLは、学習者が難しすぎても難しすぎるタスクを選択するとき、学習の進捗が最大になるという直感を捉えます。 ProCuRLは2つの簡単な学習条件を解析することで数学的に導出する。また,最小限のハイパーパラメータチューニングを施した深部RLフレームワークと直接統合可能なProCuRLの実用版も提示する。各種領域に対する実験結果から, 深部RLエージェントのトレーニングプロセスの促進に向け, 最先端のベースラインに対するカリキュラム戦略の有効性が示された。

We consider the problem of curriculum design for reinforcement learning (RL) agents in contextual multi-task settings. Existing techniques on automatic curriculum design typically require domain-specific hyperparameter tuning or have limited theoretical underpinnings. To tackle these limitations, we design our curriculum strategy, ProCuRL, inspired by the pedagogical concept of Zone of Proximal Development (ZPD). ProCuRL captures the intuition that learning progress is maximized when picking tasks that are neither too hard nor too easy for the learner. We mathematically derive ProCuRL by analyzing two simple learning settings. We also present a practical variant of ProCuRL that can be directly integrated with deep RL frameworks with minimal hyperparameter tuning. Experimental results on a variety of domains demonstrate the effectiveness of our curriculum strategy over state-of-the-art baselines in accelerating the training process of deep RL agents.
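
The intuition that learning progress is largest on tasks that are neither too hard nor too easy can be sketched with a simple selection rule. The rule below scores each task by `p * (1 - p)`, where `p` is the learner's estimated probability of success; this scoring function is an assumption chosen for illustration, and the abstract does not state the exact ProCuRL criterion. The name `pick_task` is hypothetical.

```python
def pick_task(success_probs):
    """Return the index of the task whose success probability is most 'in between'.

    p * (1 - p) is zero for tasks the learner always fails (p = 0) or always
    solves (p = 1) and peaks at p = 0.5, so it favours intermediate difficulty.
    """
    scores = [p * (1.0 - p) for p in success_probs]
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical success estimates for three tasks: too hard, medium, too easy.
estimates = [0.05, 0.5, 0.95]
chosen = pick_task(estimates)  # selects the medium task (index 1)
```

In a training loop, the success estimates would be refreshed as the agent improves, so the selected task drifts towards harder ones over time.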

翻訳日:2023-04-26 20:17:32 公開日:2023-04-25

# レーザ注入による埋め込みニューラルネットワークに対するパラメータベース攻撃の評価

Evaluation of Parameter-based Attacks against Embedded Neural Networks with Laser Injection ( http://arxiv.org/abs/2304.12876v1 )

ライセンス: Link先を確認

Mathieu Dumont, Kevin Hector, Pierre-Alain Moellic, Jean-Max Dutertre, Simon Pontié

(参考訳) 機械学習(ML)ベースのシステムのセキュリティに関する今後の認証アクションは、多くのハードウェアプラットフォームにおけるモデルの大規模展開によって増幅される大きな評価課題を提起する。最近まで、ほとんどの研究は、MLモデルを純粋にアルゴリズムの抽象化と見なすAPIベースの攻撃に焦点を当てていた。しかし、新しい実装ベースの脅威が明らかになり、モデルの堅牢性を適切に評価する実用的な手法とシミュレーションベースの手法の両方を提案する緊急性を強調している。主な関心事はパラメータベースの攻撃(Bit-Flip Attack, BFAなど)であり、メモリに格納された内部パラメータの正確かつ最適な変更に直面した場合、典型的なディープニューラルネットワークモデルの堅牢性の欠如を強調する。セキュリティテストの目的で設定されたこの研究は、32ビットのcortex-mマイクロコントローラにレーザーフォールトインジェクションを用いてbfaの派生型を初めて報告した。セキュリティ評価のための標準的なフォールトインジェクション手段であり、空間的および時間的に正確な障害を注入することができる。非現実的なブルートフォース戦略を避けるため、シミュレーションはレーザー断層モデルを考慮したパラメータから最も敏感なビットセットを選択するのにどのように役立つかを示す。

Upcoming certification actions related to the security of machine learning (ML) based systems raise major evaluation challenges that are amplified by the large-scale deployment of models in many hardware platforms. Until recently, most research works focused on API-based attacks that consider an ML model as a pure algorithmic abstraction. However, new implementation-based threats have been revealed, emphasizing the urgency to propose both practical and simulation-based methods to properly evaluate the robustness of models. A major concern is parameter-based attacks (such as the Bit-Flip Attack, BFA) that highlight the lack of robustness of typical deep neural network models when confronted by accurate and optimal alterations of their internal parameters stored in memory. Set in a security testing context, this work reports, for the first time, a successful practical variant of the BFA on a 32-bit Cortex-M microcontroller using laser fault injection, a standard fault injection means for security evaluation that enables the injection of spatially and temporally accurate faults. To avoid unrealistic brute-force strategies, we show how simulations help select the most sensitive set of bits from the parameters, taking into account the laser fault model.
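
A parameter-based attack alters bits of weights stored in memory. The sketch below shows what a single bit flip does to one weight, assuming weights quantized to signed 8-bit two's-complement integers; the abstract does not state the storage format, and neither the laser fault model nor the paper's bit-selection procedure is reproduced here. The function names are hypothetical.

```python
def flip_bit_int8(weight, bit):
    """Flip one bit of a signed 8-bit weight and return the new signed value."""
    if not -128 <= weight <= 127:
        raise ValueError("weight must fit in a signed 8-bit integer")
    if not 0 <= bit <= 7:
        raise ValueError("bit index must be in 0..7")
    # Work on the unsigned byte pattern, then convert back to signed.
    raw = (weight & 0xFF) ^ (1 << bit)
    return raw - 256 if raw >= 128 else raw


def most_damaging_bit(weight):
    """Bit whose flip changes the stored value the most (always the sign bit)."""
    return max(range(8), key=lambda b: abs(flip_bit_int8(weight, b) - weight))


# Flipping the sign bit of a small positive weight makes it strongly negative.
assert flip_bit_int8(1, 7) == -127
assert most_damaging_bit(5) == 7
```

Flipping bit `b` changes the value by exactly `2**b`, so the sign bit is the most disruptive for a single weight; deciding which weight to target is the harder problem, and it is the one the simulations in the abstract are said to address.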

翻訳日:2023-04-26 20:17:17 公開日:2023-04-25

# 交互局所列挙(TnALE):低評価によるテンソルネットワーク構造探索の解法

Alternating Local Enumeration (TnALE): Solving Tensor Network Structure Search with Fewer Evaluations ( http://arxiv.org/abs/2304.12875v1 )

ライセンス: Link先を確認

Chao Li, Junhua Zeng, Chunmei Li, Cesar Caiafa, Qibin Zhao

(参考訳) テンソルネットワーク(TN)は機械学習の強力なフレームワークであるが、TN構造探索(TN-SS)として知られる優れたTNモデルを選択することは困難で計算集約的なタスクである。最近のアプローチであるTNLSは、このタスクに対して有望な結果を示したが、目的関数の評価回数が多すぎるため、計算コストは依然として高い。本稿では,各構造関連変数を局所列挙によって交互に更新し,TNLSと比較して評価回数を大幅に削減する新しいアルゴリズムTnALEを提案する。TNLSとTnALEの降下ステップを理論的に検討し、各近傍で目的関数の十分な減少が達成されれば、両アルゴリズムが定数まで線形収束を達成できることを証明した。また、TNLSとTnALEの評価効率も比較し、近傍で目的関数の減少を達成するためにTNLSでは通常 $\Omega(2^N)$ 回の評価が必要となるのに対し、TnALEでは理想的には $O(N^2R)$ 回の評価で十分であることを示す。ここで $N$ はテンソル次数を表し、$R$ は近傍の「低ランク性」を反映する。実験の結果、TnALEは最先端のアルゴリズムよりもはるかに少ない評価で、実用的に優れたTNランクと置換を見出すことができた。

Tensor network (TN) is a powerful framework in machine learning, but selecting a good TN model, known as TN structure search (TN-SS), is a challenging and computationally intensive task. The recent approach TNLS showed promising results for this task; however, its computational cost is still unaffordable, requiring too many evaluations of the objective function. We propose TnALE, a new algorithm that updates each structure-related variable alternately by local enumeration, greatly reducing the number of evaluations compared to TNLS. We theoretically investigate the descent steps for TNLS and TnALE, proving that both algorithms can achieve linear convergence up to a constant if a sufficient reduction of the objective is reached in each neighborhood. We also compare the evaluation efficiency of TNLS and TnALE, revealing that $\Omega(2^N)$ evaluations are typically required in TNLS for reaching the objective reduction in the neighborhood, while ideally $O(N^2R)$ evaluations are sufficient in TnALE, where $N$ denotes the tensor order and $R$ reflects the "low-rankness" of the neighborhood. Experimental results verify that TnALE can find practically good TN-ranks and permutations with vastly fewer evaluations than the state-of-the-art algorithms.
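
The phrase "updates each structure-related variable alternately by local enumeration" can be illustrated with a generic coordinate-wise search over integer variables. This is only a sketch of the general pattern: it is not the TnALE algorithm, whose variables are TN ranks and permutations, and the toy objective, bounds, and names (`alternating_local_enumeration`, `radius`) are assumptions.

```python
def alternating_local_enumeration(objective, start, radius=1, lower=1,
                                  upper=8, max_sweeps=50):
    """Minimise `objective` over integer vectors, one variable at a time.

    For each variable in turn, every value within `radius` of its current
    value is evaluated while the other variables stay fixed, and the best
    value is kept. Returns (best_point, best_value, number_of_evaluations).
    """
    current = list(start)
    best = objective(current)
    evaluations = 1
    for _ in range(max_sweeps):
        improved = False
        for k in range(len(current)):
            center = current[k]
            best_value_k = center
            lo = max(lower, center - radius)
            hi = min(upper, center + radius)
            for v in range(lo, hi + 1):
                if v == center:
                    continue
                candidate = current[:k] + [v] + current[k + 1:]
                value = objective(candidate)
                evaluations += 1
                if value < best:
                    best, best_value_k = value, v
            if best_value_k != center:
                current[k] = best_value_k
                improved = True
        if not improved:
            break  # no single-variable move helps: a local optimum
    return current, best, evaluations


# Toy separable objective (hypothetical): squared distance to a target vector.
target = [3, 5, 2]
point, value, n_evals = alternating_local_enumeration(
    lambda x: sum((a - b) ** 2 for a, b in zip(x, target)), [1, 1, 1])
```

Each sweep costs at most `2 * radius` evaluations per variable, which is the sense in which enumerating one variable at a time is cheaper than sampling a whole neighborhood jointly.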

翻訳日:2023-04-26 20:16:40 公開日:2023-04-25

# 相対論的量子系におけるベルの不等式とハイゼンベルク測定

Bell's Inequality and Heisenberg Measurements on Relativistic Quantum Systems ( http://arxiv.org/abs/2304.12873v1 )

ライセンス: Link先を確認

Ulrich Faigle

(参考訳) ベルの不等式は、量子論の物理的現実に関するアインシュタイン問題において重要な役割を果たす。ベルの不等式は一般にヒルベルト空間の量子モデルの幾何学的枠組みの中で見なされるが、現在の注意はハイゼンベルク測定の理論を一般直交幾何学空間の表現、特に相対性理論のミンコフスキー空間を持つ量子系へ拡張するものである。ファインマンの数値例では、ミンコフスキー空間における合同確率論的解釈がヒルベルト空間では観測できないものの、2つの測定値を示す。この分析は、量子測定の確率論的解釈は測定器やシステム状態だけでなく、測定を行う幾何学的空間にも依存することを示している。特に、明快な数値の例は、ミンコフスキー空間におけるベルの不等式に反し、ヒルベルト空間でそれを満たすような可観測性の完全な集合を持つハイゼンベルク測定から与えられる。

Bell's inequality plays an important role with respect to the Einsteinian question about the physical reality of quantum theory. While Bell's inequality is usually viewed within the geometric framework of a Hilbert space quantum model, the present note extends the theory of Heisenberg measurements to quantum systems with representations in general orthogonal geometric spaces and, in particular, the Minkowski spaces of relativity theory. A Feynmanian numerical example exhibits two measurements that admit a joint probabilistic interpretation in Minkowski space while they are not jointly observable in Hilbert space. The analysis shows that probabilistic interpretations of quantum measurements may depend not only on the measuring instruments and the system states but also on the geometric space in which the measurements are conducted. In particular, an explicit numerical example is given of a Heisenberg measurement with a complete set of common observables that violates Bell's inequality in Minkowski space but, mutatis mutandis, satisfies it in Hilbert space.

翻訳日:2023-04-26 20:15:50 公開日:2023-04-25

# 量子アニーリングにおける指数閉ギャップとしてのアンチクロスの発生

Anti-crossings occurrence as exponentially closing gaps in Quantum Annealing ( http://arxiv.org/abs/2304.12872v1 )

ライセンス: Link先を確認

Arthur Braida, Simon Martiel and Ioan Todinca

(参考訳) 本稿では,量子アニーリングにおける回避レベル交差現象について考察する。量子アニーリングは量子コンピューティングのための有望なフレームワークであり,特定のタスクに量子的優位性をもたらす可能性がある。量子アニーリング(quantum annealing)は、最終状態の測定を通じて最適化問題に対する最適解を得ることを目的として、Schrödinger方程式に従って量子システムを進化させる。しかしながら、量子アニーリングの連続性は解析的な分析を特に瞬時固有エネルギーに関して困難にする。断熱定理は、高い確率で最適解を得るのに必要なアニーリング時間が最小スペクトルギャップの2乗に反比例するという理論的結果を与える。回避レベル交差は指数関数的に閉じるギャップを生じさせ、最適化問題に対して指数関数的に長い実行時間をもたらす。本稿では, アニーリング過程における回避レベル交差の発生条件を導出するために, 摂動展開を用いる。次に、この条件を二部グラフ上のMaxCut問題に適用する。正規二部グラフに対して指数的に小さなギャップは生じないことを示し、QAがMaxCutを効率的に解けることを示唆する。一方,頂点次数の不規則性は,回避レベル交差の発生条件を満たす可能性があることを示す。この理論的発展を支える数値的な証拠を提供し,指数的に閉じるギャップの存在と量子アニーリングの失敗との関係について論じる。

This paper explores the phenomenon of avoided level crossings in quantum annealing, a promising framework for quantum computing that may provide a quantum advantage for certain tasks. Quantum annealing involves letting a quantum system evolve according to the Schrödinger equation, with the goal of obtaining the optimal solution to an optimization problem through measurements of the final state. However, the continuous nature of quantum annealing makes analytical analysis challenging, particularly with regard to the instantaneous eigenenergies. The adiabatic theorem provides a theoretical result for the annealing time required to obtain the optimal solution with high probability, which is inversely proportional to the square of the minimum spectral gap. Avoided level crossings can create exponentially closing gaps, which can lead to exponentially long running times for optimization problems. In this paper, we use a perturbative expansion to derive a condition for the occurrence of an avoided level crossing during the annealing process. We then apply this condition to the MaxCut problem on bipartite graphs. We show that no exponentially small gaps arise for regular bipartite graphs, implying that QA can efficiently solve MaxCut in that case. On the other hand, we show that irregularities in the vertex degrees can lead to the satisfaction of the avoided level crossing occurrence condition. We provide numerical evidence to support this theoretical development, and discuss the relation between the presence of exponentially closing gaps and the failure of quantum annealing.
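
The role of the minimum spectral gap can be seen in the smallest possible example. The sketch below assumes a single-qubit annealing Hamiltonian H(s) = -(1 - s) X - s Z with a linear schedule, chosen only because its eigenvalues are known in closed form; it is not one of the MaxCut instances studied in the paper, and the function names are hypothetical.

```python
import math


def gap(s):
    """Spectral gap of H(s) = -(1 - s) X - s Z for s in [0, 1].

    The eigenvalues are +/- sqrt((1 - s)^2 + s^2), so the gap is twice that.
    """
    return 2.0 * math.sqrt((1.0 - s) ** 2 + s ** 2)


def minimum_gap(steps=1000):
    """Scan the schedule on a uniform grid and return (s_min, gap_min)."""
    best_s, best_gap = 0.0, gap(0.0)
    for i in range(1, steps + 1):
        s = i / steps
        g = gap(s)
        if g < best_gap:
            best_s, best_gap = s, g
    return best_s, best_gap


def adiabatic_time_scale(min_gap):
    """Annealing-time scale proportional to 1 / gap^2 (constants omitted)."""
    return 1.0 / min_gap ** 2


s_min, g_min = minimum_gap()
# The two levels come closest at s = 0.5, where the gap equals sqrt(2).
assert abs(s_min - 0.5) < 1e-9 and abs(g_min - math.sqrt(2.0)) < 1e-9
```

Here the gap stays of order one, so the time scale is benign; an avoided crossing whose gap shrinks exponentially with problem size would make `adiabatic_time_scale` grow exponentially, which is the failure mode the abstract discusses.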

翻訳日:2023-04-26 20:15:22 公開日:2023-04-25

# 計算化学のための線形スケーリング量子回路

Linear-Scaling Quantum Circuits for Computational Chemistry ( http://arxiv.org/abs/2304.12870v1 )

ライセンス: Link先を確認

Ilias Magoulas and Francesco A. Evangelista

(参考訳) 我々は最近、任意の多体ランク(I. Magoulas and F.A. Evangelista, J. Chem. Theory Comput. 19, 822 (2023))のフェルミオンおよび量子ビット励起のためのコンパクトでCNOT効率の良い量子回路を構築した。ここでは,CNOT数を大幅に減少させる回路の近似について述べる。予備的な数値データは、選択された射影量子固有解法を用いて、帰納対称性の破れが本質的に無視される一方で、親実装と比較して、実質的にエネルギーの精度の損失はないことを示す。

We have recently constructed compact, CNOT-efficient, quantum circuits for fermionic and qubit excitations of arbitrary many-body rank [I. Magoulas and F.A. Evangelista, J. Chem. Theory Comput. 19, 822 (2023)]. Here, we present approximations to these circuits that substantially reduce the CNOT counts even further. Our preliminary numerical data, using the selected projective quantum eigensolver approach, demonstrate that there is practically no loss of accuracy in the energies compared to the parent implementation while the ensuing symmetry breaking is essentially negligible.

翻訳日:2023-04-26 20:14:57 公開日:2023-04-25

# 量子有限オートマタの浅実装のためのGAP

GAPs for Shallow Implementation of Quantum Finite Automata ( http://arxiv.org/abs/2304.12868v1 )

ライセンス: Link先を確認

Mansur Ziiatdinov, Aliya Khadieva, Abuzer Yakaryılmaz

(参考訳) 量子フィンガープリントは古典的な入力語を量子状態にマッピングする技法である。結果として生じる量子状態は元の単語よりもはるかに短く、その処理はリソースを少なくし、量子アルゴリズム、通信、暗号において有用である。量子フィンガープリントの例の1つは、言語$MOD_{p}=\{a^{i\cdot p} \mid i \geq 0\}$の量子オートマトンであり、$p$は素数である。しかし、このオートマトンを現在の量子ハードウェアで実装することは効率的ではない。量子フィンガープリントは長さ$n$の語$x \in \{0,1\}^{n}$を$O(\log \log n)$量子ビットの状態$|\psi(x)\rangle$に写像し、$O(\log n)$回のユニタリ演算を必要とする。現在の量子コンピュータの全メモリを用いた量子指紋の計算は、多くの量子演算が必要なため、現在不可能である。量子フィンガープリントを実用的なものにするためには、回路の幅よりも奥行きを最適化する必要がある。一般化算術進行法(GAPs)などの加法的組合せ論のツールに基づく量子フィンガープリントの明示的な手法を提案し,これらの手法が確率的手法に匹敵する回路深さを提供することを示す。また,提案手法を,明示的な量子フィンガープリンティング手法の先行研究と比較した。

Quantum fingerprinting is a technique that maps a classical input word to a quantum state. The resulting quantum state is much shorter than the original word, and its processing requires fewer resources, making it useful in quantum algorithms, communication and cryptography. One example of quantum fingerprinting is the quantum automaton for the language $MOD_{p}=\{a^{i\cdot p} \mid i \geq 0\}$, where $p$ is a prime number. However, implementing this automaton on current quantum hardware is not efficient. Quantum fingerprinting maps a word $x \in \{0,1\}^{n}$ of length $n$ to a state $|\psi(x)\rangle$ of $O(\log \log n)$ qubits, and requires $O(\log n)$ unitary operations. Computing a quantum fingerprint using all the memory of current quantum computers is currently infeasible due to the large number of quantum operations necessary. In order to make quantum fingerprinting practical, we must optimize the circuit for depth instead of width as previous works did. We propose explicit methods of quantum fingerprinting based on tools from additive combinatorics, such as generalized arithmetic progressions (GAPs), and prove that these methods provide circuit depth comparable to the probabilistic method. We also compare our method to prior work on explicit quantum fingerprinting methods.
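
To make the $MOD_{p}$ example more concrete, the following is a minimal classical sketch of how a rotation-based automaton of this kind behaves. It assumes the textbook construction in which each sub-automaton rotates one qubit by $2\pi k/p$ per input letter, so that the word $a^{n}$ is accepted with probability $\cos^{2}(2\pi k n/p)$. The function name and the multiplier set `ks` are hypothetical choices for illustration; this is not the GAP-based construction of the paper.

```python
import math


def mod_p_accept_probability(n, p, ks):
    """Average acceptance probability of the word a^n over the multipliers in ks.

    Assumption: each sub-automaton rotates a qubit by 2*pi*k/p per letter and
    accepts with probability cos^2 of the accumulated angle.
    """
    total = 0.0
    for k in ks:
        # Reduce k*n modulo p first so the angle stays small and accurate.
        angle = 2.0 * math.pi * ((k * n) % p) / p
        total += math.cos(angle) ** 2
    return total / len(ks)


if __name__ == "__main__":
    p = 11
    ks = [1, 3, 4]  # hypothetical multiplier set, chosen only for the demo
    for n in (0, 11, 22, 5):
        print(n, mod_p_accept_probability(n, p, ks))
```

Words whose length is a multiple of $p$ are accepted with certainty, while other lengths are accepted with a probability that depends on how well the multiplier set spreads the rotation angles.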

翻訳日:2023-04-26 20:14:44 公開日:2023-04-25

# 高能率ニューロモルフィック深層学習の両眼確率性によるソフトウェア精度の向上

Binary stochasticity enabled highly efficient neuromorphic deep learning achieves better-than-software accuracy ( http://arxiv.org/abs/2304.12866v1 )

ライセンス: Link先を確認

Yang Li, Wei Wang, Ming Wang, Chunmeng Dou, Zhengyu Ma, Huihui Zhou, Peng Zhang, Nicola Lepri, Xumeng Zhang, Qing Luo, Xiaoxin Xu, Guanhua Yang, Feng Zhang, Ling Li, Daniele Ielmini, and Ming Liu

(参考訳) ディープラーニングには、フォワーディング信号の高精度処理、バックプロパゲーションエラー、ウェイトのアップデートが必要だ。これは、勾配降下学習規則が部分微分の連鎖積に依存しているため、学習アルゴリズムによって本質的に必要となる。しかし, 人工シナプスとしてノイズの多いアナログメムリスタを用いるハードウェアシステムにおいて, 生物学的に妥当でないような深層学習を実現することは困難である。 memristorベースの実装は一般に、ニューロン回路の過大なコストと理想化されたシナプスデバイスに対する厳しい要求をもたらす。そこで本研究では,高精度の要求は不要であり,この要求が解除された場合により効率的な深層学習を実現することを実証する。本稿では,すべての基本ニューラルネットワーク操作を修飾する二元確率学習アルゴリズムを提案する。 (i)転送信号と活性化関数の導関数の確率的二乗化 (ii)バックプロパゲーションエラーの符号付きバイナリ化、 (iii)段階的な重み付け更新。ソフトウェアシミュレーションとハードウェア実験のハイブリッドアプローチにより、二進確率深層学習システムは、高精度学習アルゴリズムを用いて、ソフトウェアベースのベンチマークよりも優れた性能を提供できることがわかった。また、二項確率アルゴリズムはハードウェアにおけるニューラルネットワーク操作を強く単純化し、乗算および累積演算のエネルギー効率を3桁以上改善する。

Deep learning needs high-precision handling of forwarding signals, backpropagating errors, and updating weights. This is inherently required by the learning algorithm since the gradient descent learning rule relies on the chain product of partial derivatives. However, it is challenging to implement such high-precision deep learning in hardware systems that use noisy analog memristors as artificial synapses, and it is also not biologically plausible. Memristor-based implementations generally result in an excessive cost of neuronal circuits and stringent demands for idealized synaptic devices. Here, we demonstrate that the requirement for high precision is not necessary and that more efficient deep learning can be achieved when this requirement is lifted. We propose a binary stochastic learning algorithm that modifies all elementary neural network operations, by introducing (i) stochastic binarization of both the forwarding signals and the activation function derivatives, (ii) signed binarization of the backpropagating errors, and (iii) step-wise weight updates. Through an extensive hybrid approach of software simulation and hardware experiments, we find that binary stochastic deep learning systems can provide better performance than the software-based benchmarks using the high-precision learning algorithm. Also, the binary stochastic algorithm strongly simplifies the neural network operations in hardware, resulting in an improvement of the energy efficiency for the multiply-and-accumulate operations by more than three orders of magnitude.
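
The two binarization steps named in the abstract can be sketched in a few lines. This is only an illustration under stated assumptions: the sigmoid activation, the function names, and the treatment of a zero error are choices made here, not details taken from the paper.

```python
import math
import random


def sigmoid(x):
    """Logistic function, assumed here as the neuron activation."""
    return 1.0 / (1.0 + math.exp(-x))


def stochastic_binarize(pre_activation, rng):
    """Emit 1 with probability sigmoid(pre_activation), otherwise 0.

    The expected value of the binary output equals the real-valued activation.
    """
    return 1 if rng.random() < sigmoid(pre_activation) else 0


def sign_binarize(error):
    """Signed binarization: keep only the sign of the error (0 stays 0)."""
    return (error > 0) - (error < 0)


if __name__ == "__main__":
    rng = random.Random(0)
    samples = [stochastic_binarize(0.5, rng) for _ in range(20000)]
    print("empirical mean:", sum(samples) / len(samples))
    print("sigmoid(0.5):  ", sigmoid(0.5))
    print("signs:", sign_binarize(-0.3), sign_binarize(0.0), sign_binarize(2.0))
```

Averaged over many samples, the binary signal recovers the real-valued activation, which suggests why a single noisy bit per neuron may carry enough information for learning.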

翻訳日:2023-04-26 20:14:15 公開日:2023-04-25

# 簡易性と有効性を追求する: 分布特性テストのための量子アルゴリズム

Striving for simplicity and effectiveness: quantum algorithm for distribution property testing ( http://arxiv.org/abs/2304.12916v1 )

ライセンス: Link先を確認

Jingquan Luo and Lvzhou Li

(参考訳) 分布特性の試験法の基本問題に対する潜在的な量子スピードアップについて検討する。特に、2つの異なる問題に焦点を当てている: 1つは、2つの未知の古典分布が十分に近いか遠くにあるかをテストし、もう1つは$\{0, 1\}^n$ 上の与えられた分布が$k$-wise 一様か、あるいは任意の$k$-wise 一様分布から遠く離れているかをテストすることである。最初の問題として、現在最高の量子アルゴリズムを$l^1$-distanceと$l^2$-distanceのメトリクスで提案する。量子特異値変換 (qsvt) の手法に依存する \cite{gilyen2019distributional} の最新の結果と比較すると, アルゴリズムはより簡潔なだけでなく, より効率的である。後者の問題に対しては、最先端の古典的アルゴリズムを2次高速化する最初の量子アルゴリズムを提案する。量子アルゴリズムの分析は、従来のものよりもはるかに直感的で簡潔であることは注目に値する。

We explore potential quantum speedups for the fundamental problem of testing properties of distributions. In particular, we focus on two different problems: the first one is to test whether two unknown classical distributions are close or sufficiently far apart, and the second one is to test whether a given distribution over $\{0, 1\}^n$ is $k$-wise uniform or far from any $k$-wise uniform distribution. For the first problem, we propose the currently best quantum algorithm under the metrics of $l^1$-distance and $l^2$-distance. Compared with the latest result given in \cite{gilyen2019distributional}, which relied on the technique of quantum singular value transformation (QSVT), our algorithm is not only more concise, but also more efficient. For the latter problem, we propose the first quantum algorithm achieving a quadratic speedup over the state-of-the-art classical algorithm. It is worth noting that the analysis of our quantum algorithm is much more intuitive and concise than that of the classical one.

翻訳日:2023-04-26 20:06:18 公開日:2023-04-25

# 動的断熱工学による例外点近傍のキラルおよび非キラル急速モード変換

Chiral and non-chiral swift mode conversion near an exception point with dynamic adiabaticity engineering ( http://arxiv.org/abs/2304.12912v1 )

ライセンス: Link先を確認

Dong Wang, Wen-Xi Huang, Pei-Chao Cao, Yu-Gui Peng, Xue-Feng Zhu, Ying Li

(参考訳) 非エルミート的ハミルトニアンの固有値は、しばしば自己交差リーマン曲面を形成するため、ハミルトニアンが例外点 (EP) の周りの特定のループ経路に沿って進化する際、ユニークなモード変換現象を引き起こす。モード変換の速度は断熱的な要求によって制限され、キラリティーは自由に制御できない。我々は、同じ経路上でカイラルモードと非カイラルモードの変換を可能にする非エルミートハミルトニアンの進化における動的工学的断熱法を提案する。本手法は, 即時断熱性の定量化と制御を基本とし, 経路全体の不均一な進化を可能にする。断熱性の分布的性質に基づく進化を最適化することにより、従来の準断熱的進化と同じ品質を3分の1の時間で達成する。我々のアプローチはEPを取り巻くスピードとキラリティの問題に対処するための包括的で普遍的な解決策を提供する。また、非断熱的なプロセスの動的な操作と制御が容易になり、操作を加速し、様々なモード変換パターンを選択できる。

The eigenvalues of a non-Hermitian Hamiltonian often form a self-intersecting Riemann surface, leading to a unique mode conversion phenomenon when the Hamiltonian evolves along certain loop paths around an exceptional point (EP). However, two fundamental problems exist with the conventional scheme of EP encircling: the speed of mode conversion is restricted by the adiabatic requirement, and the chirality cannot be freely controlled. We introduce a method for dynamically engineering adiabaticity in the evolution of non-Hermitian Hamiltonians that allows for both chiral and non-chiral mode conversion on the same path. Our method is based on quantifying and controlling the instantaneous adiabaticity, allowing for non-uniform evolution throughout the entire path. By optimizing the evolution based on the distributed nature of adiabaticity, we achieve the same quality as conventional quasi-adiabatic evolution in only one-third of the time. Our approach provides a comprehensive and universal solution to address the speed and chirality challenges associated with EP encircling. It also facilitates the dynamic manipulation and regulation of non-adiabatic processes, thereby accelerating the operation and allowing for a selection among various mode conversion patterns.

翻訳日:2023-04-26 20:05:59 公開日:2023-04-25

# インプシット生成モデルのためのスコア差流

The Score-Difference Flow for Implicit Generative Modeling ( http://arxiv.org/abs/2304.12906v1 )

ライセンス: Link先を確認

Romann M. Weber

(参考訳) 暗黙的生成モデリング(igm)は、ターゲットデータ分布の特性にマッチする合成データのサンプルを作成することを目的としている。最近の研究(例えばスコアマッチングネットワーク、拡散モデル)は、動的摂動や周囲空間の流れを通じて、合成音源データを目標分布へ押し上げるという観点から、igm問題にアプローチしている。

Implicit generative modeling (IGM) aims to produce samples of synthetic data matching the characteristics of a target data distribution. Recent work (e.g. score-matching networks, diffusion models) has approached the IGM problem from the perspective of pushing synthetic source data toward the target distribution via dynamical perturbations or flows in the ambient space. We introduce the score difference (SD) between arbitrary target and source distributions as a flow that optimally reduces the Kullback-Leibler divergence between them while also solving the Schrödinger bridge problem. We apply the SD flow to convenient proxy distributions, which are aligned if and only if the original distributions are aligned. We demonstrate the formal equivalence of this formulation to denoising diffusion models under certain conditions. However, unlike diffusion models, SD flow places no restrictions on the prior distribution. We also show that the training of generative adversarial networks includes a hidden data-optimization sub-problem, which induces the SD flow under certain choices of loss function when the discriminator is optimal. As a result, the SD flow provides a theoretical link between model classes that, taken together, address all three challenges of the "generative modeling trilemma": high sample quality, mode coverage, and fast sampling.
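
As a reading aid, the score difference described above can be written out in symbols. The notation is an assumption introduced here ($p$ for the target density, $q_t$ for the source density at flow time $t$), and any scaling or noise terms used in the paper are omitted:

$$
v_t(x) = \nabla_x \log p(x) - \nabla_x \log q_t(x), \qquad \frac{\mathrm{d}x}{\mathrm{d}t} = v_t(x).
$$

Under this reading, the velocity vanishes wherever the two scores agree, which is consistent with the statement that the flow stops once source and target are aligned.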

翻訳日:2023-04-26 20:05:38 公開日:2023-04-25

# 可変原子ミラーを用いた非エルミート導波路キャビティQED

Non-Hermitian Waveguide Cavity QED with Tunable Atomic Mirrors ( http://arxiv.org/abs/2304.12897v1 )

ライセンス: Link先を確認

Wei Nie, Tao Shi, Yu-xi Liu, Franco Nori

(参考訳) 光鏡は光反射により空洞特性を決定する。不完全な反射は光子損失を伴う開空洞を引き起こす。可変反射スペクトルを持つ原子-二量体ミラーからなる開空洞について検討した。原子空洞は反$\mathcal{PT}$対称性を示す。鏡内の原子カップリングによって制御される反$\mathcal{PT}$相転移は、2つの退化キャビティスーパーモデムの出現を示す。興味深いことに、強いコヒーレントな空洞-原子結合を実現するためにミラー反射のしきい値が同定される。この反射閾値は、良好なキャビティを生み出すために原子鏡の基準を明らかにする。さらに、プローブ原子を持つキャビティ量子電磁力学は、キャビティとプローブ原子によって形成される反射依存性のポーラリトンを含むミラーチューニング特性を示す。我々の研究は、反$\mathcal{PT}$原子空洞の非エルミート理論を示し、量子光学や量子計算に応用できるかもしれない。

Optical mirrors determine cavity properties by means of light reflection. Imperfect reflection gives rise to open cavities with photon loss. We study an open cavity made of atom-dimer mirrors with a tunable reflection spectrum. We find that the atomic cavity shows anti-$\mathcal{PT}$ symmetry. The anti-$\mathcal{PT}$ phase transition controlled by atomic couplings in mirrors indicates the emergence of two degenerate cavity supermodes. Interestingly, a threshold of mirror reflection is identified for realizing strong coherent cavity-atom coupling. This reflection threshold reveals the criterion of atomic mirrors to produce a good cavity. Moreover, cavity quantum electrodynamics with a probe atom shows mirror-tuned properties, including reflection-dependent polaritons formed by the cavity and probe atom. Our work presents a non-Hermitian theory of an anti-$\mathcal{PT}$ atomic cavity, which may have applications in quantum optics and quantum computation.

翻訳日:2023-04-26 20:05:19 公開日:2023-04-25

# グラフ生成アルゴリズムの発見

Discovering Graph Generation Algorithms ( http://arxiv.org/abs/2304.12895v1 )

ライセンス: Link先を確認

Mihai Babiac, Karolis Martinkus and Roger Wattenhofer

(参考訳) グラフ生成モデルを構築するための新しいアプローチを提案する。従来の確率モデルや深層生成モデルを使う代わりに、データを生成するアルゴリズムを見つけることを提案する。ランダムに初期化グラフニューラルネットワークによって実装された進化探索と強力な適合関数を用いてこれを実現する。これは、例えば、最後のグラフ生成プロセスがPython関数として表現されるため、トレーニング外分布の一般化と直接解釈可能性を高めるために、現在の深層生成モデルに対していくつかの利点をもたらす。我々は、このアプローチが深い生成モデルと競合し、ある状況下では真のグラフ生成プロセスを見つけることさえでき、それが完全に一般化できることを示す。

We provide a novel approach to construct generative models for graphs. Instead of using the traditional probabilistic models or deep generative models, we propose to instead find an algorithm that generates the data. We achieve this using evolutionary search and a powerful fitness function, implemented by a randomly initialized graph neural network. This brings certain advantages over current deep generative models, for instance, a higher potential for out-of-training-distribution generalization and direct interpretability, as the final graph generative process is expressed as a Python function. We show that this approach can be competitive with deep generative models and under some circumstances can even find the true graph generative process, and as such perfectly generalize.

翻訳日:2023-04-26 20:05:07 公開日:2023-04-25

# 正確な不確実性定量化を伴う生成降水流の潜時拡散モデル

Latent diffusion models for generative precipitation nowcasting with accurate uncertainty quantification ( http://arxiv.org/abs/2304.12891v1 )

ライセンス: Link先を確認

Jussi Leinonen, Ulrich Hamann, Daniele Nerini, Urs Germann, Gabriele Franch

(参考訳) 拡散モデルは画像生成に広く採用されており、GANs(generative adversarial networks)よりも高品質で多様なサンプルを生成する。本研究では,最新の観測データに基づく降水予測のための潜時拡散モデル (LDM) を提案する。 LDMはより安定しており、GANよりも訓練に少ない計算を必要とする。 GANをベースとしたDGMR(Deep Generative Models of Rainfall)と統計モデルPySTEPSを比較検討した。 LDMはより正確な降水予測を生成するが、降水が事前定義された閾値を超えるかどうかを予測する場合の比較はより混合される。 LDMの最も明確な利点は、DGMRやPySTEPSよりも多様な予測を生成することである。ランク分布試験は, LDMからの試料の分布が予測の不確かさを正確に反映していることを示す。したがって、LDMは気象や気候など不確実性定量化が重要であるあらゆる応用に期待できる。

Diffusion models have been widely adopted in image generation, producing higher-quality and more diverse samples than generative adversarial networks (GANs). We introduce a latent diffusion model (LDM) for precipitation nowcasting - short-term forecasting based on the latest observational data. The LDM is more stable and requires less computation to train than GANs, albeit with more computationally expensive generation. We benchmark it against the GAN-based Deep Generative Models of Rainfall (DGMR) and a statistical model, PySTEPS. The LDM produces more accurate precipitation predictions, while the comparisons are more mixed when predicting whether the precipitation exceeds predefined thresholds. The clearest advantage of the LDM is that it generates more diverse predictions than DGMR or PySTEPS. Rank distribution tests indicate that the distribution of samples from the LDM accurately reflects the uncertainty of the predictions. Thus, LDMs are promising for any applications where uncertainty quantification is important, such as weather and climate.
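
The rank distribution test mentioned above is commonly computed as a rank histogram. The sketch below shows the general idea under simple assumptions: scalar forecasts, no special handling of ties, and function names chosen here for illustration. It is not the evaluation code of the paper.

```python
def observation_rank(ensemble, observation):
    """Number of ensemble members strictly below the observation.

    The result lies between 0 and len(ensemble) inclusive.
    """
    return sum(1 for member in ensemble if member < observation)


def rank_histogram(ensembles, observations):
    """Count how often the observation lands at each rank across all cases."""
    n_members = len(ensembles[0])
    counts = [0] * (n_members + 1)
    for ensemble, obs in zip(ensembles, observations):
        counts[observation_rank(ensemble, obs)] += 1
    return counts


if __name__ == "__main__":
    ensembles = [[1.0, 2.0, 3.0], [1.0, 2.0, 3.0], [1.0, 2.0, 3.0]]
    observations = [0.5, 2.5, 9.0]
    print(rank_histogram(ensembles, observations))
```

A roughly flat histogram over many cases suggests that the ensemble spread matches the true uncertainty, whereas a U-shaped histogram points to an ensemble that is too narrow.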

翻訳日:2023-04-26 20:04:56 公開日:2023-04-25

# internet-of-thingsにおける信頼性の高い実行環境におけるセキュアアグリゲーションによるブロックチェーンベースのフェデレーション学習

Blockchain-based Federated Learning with Secure Aggregation in Trusted Execution Environment for Internet-of-Things ( http://arxiv.org/abs/2304.12889v1 )

ライセンス: Link先を確認

Aditya Pribadi Kalapaaking, Ibrahim Khalil, Mohammad Saidur Rahman, Mohammed Atiquzzaman, Xun Yi, and Mahathir Almashor

(参考訳) 本稿では,Intel Software Guard Extension (SGX) ベースのTrusted Execution Environment (TEE) を用いたブロックチェーンベースのフェデレートラーニング(FL)フレームワークを提案する。 FLでは、ローカルモデルは攻撃者によって改ざんされる。したがって、改ざんされた局所モデルから生成される大域的なモデルは誤りである。そのため、提案フレームワークはセキュアなモデルアグリゲーションにブロックチェーンネットワークを活用する。各ブロックチェーンノードはSGX対応プロセッサをホストし、FLベースの集約タスクを安全に実行してグローバルモデルを生成する。ブロックチェーンノードは、集約されたモデルの信頼性を検証し、モデルの完全性を保証するためにブロックチェーンコンセンサスメカニズムを実行し、タンパ保護ストレージのために分散台帳に追加することができる。各クラスタは、ブロックチェーンから集約モデルを取得し、それを使用する前にその整合性を検証することができる。提案フレームワークの性能を評価するために,様々なcnnモデルとデータセットを用いていくつかの実験を行った。

This paper proposes a blockchain-based Federated Learning (FL) framework with Intel Software Guard Extension (SGX)-based Trusted Execution Environment (TEE) to securely aggregate local models in Industrial Internet-of-Things (IIoTs). In FL, local models can be tampered with by attackers. Hence, a global model generated from the tampered local models can be erroneous. Therefore, the proposed framework leverages a blockchain network for secure model aggregation. Each blockchain node hosts an SGX-enabled processor that securely performs the FL-based aggregation tasks to generate a global model. Blockchain nodes can verify the authenticity of the aggregated model, run a blockchain consensus mechanism to ensure the integrity of the model, and add it to the distributed ledger for tamper-proof storage. Each cluster can obtain the aggregated model from the blockchain and verify its integrity before using it. We conducted several experiments with different CNN models and datasets to evaluate the performance of the proposed framework.
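
The aggregate-then-verify workflow described above can be sketched as follows. Several simplifications are assumed here and labeled as such: plain equal-weight averaging stands in for the aggregation rule, a SHA-256 digest stands in for the ledger record, and the SGX enclave and the consensus mechanism are not modeled at all. All function names are hypothetical.

```python
import hashlib
import json


def aggregate(local_models):
    """Element-wise average of equally weighted local weight vectors (assumed rule)."""
    n = len(local_models)
    return [sum(ws) / n for ws in zip(*local_models)]


def model_digest(weights):
    """SHA-256 digest of a compact JSON encoding of the weights."""
    encoded = json.dumps(weights, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def verify(weights, recorded_digest):
    """Check a retrieved model against the digest recorded at aggregation time."""
    return model_digest(weights) == recorded_digest


if __name__ == "__main__":
    global_model = aggregate([[1.0, 2.0], [3.0, 4.0]])
    digest = model_digest(global_model)      # what a node would record on the ledger
    print(global_model, verify(global_model, digest))
    print(verify([2.0, 3.1], digest))        # a tampered copy fails the check
```

A client that retrieves the global model recomputes the digest and compares it with the recorded one before using the model.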

翻訳日:2023-04-26 20:04:39 公開日:2023-04-25

# 二重対向的偏りによる分布外証拠認識フェイクニュース検出

Out-of-distribution Evidence-aware Fake News Detection via Dual Adversarial Debiasing ( http://arxiv.org/abs/2304.12888v1 )

ライセンス: Link先を確認

Qiang Liu, Junfei Wu, Shu Wu, Liang Wang

(参考訳) Evidence-aware fake news detectionは、ニュースコンテンツに基づいて検索されるニュースとエビデンスの間の推論を行い、一様性や矛盾を見つけることを目的としている。しかし,エビデンス認識検出モデルでは,ニュース・エビデンスコンテンツと真・偽のニュースラベルの相関関係がみられ,アウトオブディストリビューション(OOD)の状況に一般化することは困難である。そこで本研究では,新しい二重対向学習(DAL)手法を提案する。 DALには、真偽のニュースラベルをターゲットとするニュース側とエビデンス側のデバイアス識別器が組み込まれている。そして、DALは、ニュースやエビデンスコンテンツバイアスの影響を軽減するために、ニュース側とエビデンス側のデバイアス識別器を逆方向に最適化する。同時に、DALはメインのフェイクニュース予測器を最適化し、ニュース・エビデンス・インタラクション・モジュールを学習できるようにする。このプロセスにより、エビデンスを意識した偽ニュース検出モデルを教え、ニュースエビデンス推論をより効果的に実施し、コンテンツバイアスの影響を最小限に抑えることができる。ちなみに、提案しているDALアプローチは、既存のバックボーンとうまく連携するプラグアンドプレイモジュールです。 2つのOOD設定下で総合的な実験を行い、4つの証拠を意識した偽ニュース検出バックボーンにDALを挿入する。その結果、DALは元の背骨といくつかの競争的脱バイアス法を著しく、安定的に上回っていることがわかった。

Evidence-aware fake news detection aims to conduct reasoning between news and evidence, which is retrieved based on news content, to find uniformity or inconsistency. However, we find that evidence-aware detection models suffer from biases, i.e., spurious correlations between news/evidence contents and true/fake news labels, and generalize poorly to Out-Of-Distribution (OOD) situations. To deal with this, we propose a novel Dual Adversarial Learning (DAL) approach. We incorporate news-aspect and evidence-aspect debiasing discriminators, whose targets are both true/fake news labels, in DAL. Then, DAL reversely optimizes the news-aspect and evidence-aspect debiasing discriminators to mitigate the impact of news and evidence content biases. At the same time, DAL also optimizes the main fake news predictor, so that the news-evidence interaction module can be learned. This process allows us to teach evidence-aware fake news detection models to better conduct news-evidence reasoning, and minimize the impact of content biases. Notably, our proposed DAL approach is a plug-and-play module that works well with existing backbones. We conduct comprehensive experiments under two OOD settings, and plug DAL into four evidence-aware fake news detection backbones. Results demonstrate that DAL significantly and stably outperforms the original backbones and some competitive debiasing methods.

翻訳日:2023-04-26 20:04:21 公開日:2023-04-25

# 関数近似を用いた効率的なオンラインRLにおける一般的なカバレッジ条件の利点

Provable benefits of general coverage conditions in efficient online RL with function approximation ( http://arxiv.org/abs/2304.12886v1 )

ライセンス: Link先を確認

Fanghui Liu, Luca Viano, Volkan Cevher

(参考訳) オンライン強化学習(RL)では、マルコフ決定プロセス(MDP)の標準的な構造仮定を採用する代わりに、特定のカバレッジ条件(元々オフラインRLから)を用いることで、サンプル効率の保証を確保するのに十分である(Xie et al. 2023)。本研究では,この新たな方向性に焦点をあてて,より可能かつ一般的なカバレッジ条件を掘り下げ,効率的なオンラインrlにおけるその可能性と有用性について検討する。我々は、集中度の変化、密度比の再現性、部分/レスト被覆条件でのトレードオフなど、より多くの概念を同定し、サンプル効率の良いオンラインRLにも有益であり、改善された後悔境界を達成できる。さらに,オンラインrlでは,探索的オフラインデータを用いることで,統計的かつ計算効率のよい保証を実現することができる。さらに、mdp構造(例えば線形mdp)が与えられたとしても、良好なカバレッジ条件は、$\widetilde{o}(\sqrt{t})$ を超えるより早い後悔を得るのに有益であり、また対数順序の後悔も得られる。これらの結果は、効率的なオンラインRLにおける一般的なカバレッジ条件の使用を正当化する。

In online reinforcement learning (RL), instead of employing standard structural assumptions on Markov decision processes (MDPs), using a certain coverage condition (originally from offline RL) is enough to ensure sample-efficient guarantees (Xie et al. 2023). In this work, we pursue this new direction by exploring more possible and general coverage conditions, and study their potential and utility in efficient online RL. We identify more concepts, including the $L^p$ variant of concentrability, the density ratio realizability, and a trade-off on the partial/rest coverage condition, that can also be beneficial to sample-efficient online RL, achieving improved regret bounds. Furthermore, if exploratory offline data are used, under our coverage conditions, both statistically and computationally efficient guarantees can be achieved for online RL. Moreover, even when the MDP structure is given, e.g., a linear MDP, we show that good coverage conditions are still beneficial for obtaining regret bounds faster than $\widetilde{O}(\sqrt{T})$, and even logarithmic-order regret. These results justify the use of general coverage conditions in efficient online RL.

翻訳日:2023-04-26 20:03:53 公開日:2023-04-25

# ポテンシャル流としての生成モデルにおける潜在トラバース

Latent Traversals in Generative Models as Potential Flows ( http://arxiv.org/abs/2304.12944v1 )

ライセンス: Link先を確認

Yue Song, Andy Keller, Nicu Sebe, Max Welling

(参考訳) 深層生成モデルにおける最近の顕著な進歩にもかかわらず、それらの潜在空間の構造はいまだに理解されていないため、意味論的に意味のある潜在トラバーサルの実行はオープンな研究課題である。ほとんどの先行研究はこの課題を、潜在構造を線形にモデル化し、対応する線形方向を見出すことで解決することを目的としている。そこで本研究では,学習された動的ポテンシャルランドスケープを持つ潜在構造物をモデル化し,ランドスケープの勾配を下るサンプルの流れとして潜在トラバースを行う。物理、最適輸送、神経科学にインスパイアされたこれらの潜在的景観は、物理的に現実的な偏微分方程式として学習され、空間と時間の両方で柔軟に変化する。絡み合いを実現するために、複数の電位を同時に学習し、分類器によって区別され、意味的に自己整合する。実験により,本手法は最先端のベースラインよりも定性的かつ定量的に歪んだ軌跡を達成できることが実証された。さらに,本手法をトレーニング中に正規化項として統合することにより,構造化表現の学習に対する帰納的バイアスとして作用し,最終的に類似した構造化データに対するモデル可能性を向上させることを実証する。

Despite the significant recent progress in deep generative models, the underlying structure of their latent spaces is still poorly understood, thereby making the task of performing semantically meaningful latent traversals an open research challenge. Most prior work has aimed to solve this challenge by modeling latent structures linearly, and finding corresponding linear directions which result in `disentangled' generations. In this work, we instead propose to model latent structures with a learned dynamic potential landscape, thereby performing latent traversals as the flow of samples down the landscape's gradient. Inspired by physics, optimal transport, and neuroscience, these potential landscapes are learned as physically realistic partial differential equations, thereby allowing them to flexibly vary over both space and time. To achieve disentanglement, multiple potentials are learned simultaneously, and are constrained by a classifier to be distinct and semantically self-consistent. Experimentally, we demonstrate that our method achieves both more qualitatively and quantitatively disentangled trajectories than state-of-the-art baselines. Further, we demonstrate that our method can be integrated as a regularization term during training, thereby acting as an inductive bias towards the learning of structured representations, ultimately improving model likelihood on similarly structured data.

翻訳日:2023-04-26 19:57:00 公開日:2023-04-25

# ユーザ中心のフェデレーション学習:パーソナライズのためのワイヤレスリソースの取引

User-Centric Federated Learning: Trading off Wireless Resources for Personalization ( http://arxiv.org/abs/2304.12930v1 )

ライセンス: Link先を確認

Mohamad Mestoukirdi, Matteo Zecchin, David Gesbert, Qianrui Li

(参考訳) フェデレートラーニング(FL)システムにおけるクライアント間の統計的不均一性はアルゴリズム収束時間を増加させ、一般化性能を低下させ、貧弱なモデルに見返りに大きな通信オーバーヘッドをもたらす。 FLが課すプライバシー制約に違反することなく、上記の問題に対処するためには、パーソナライズされたFLメソッドは、プライバシー保護転送を保証するために、データに直接アクセスすることなく、統計的に類似したクライアントを結合する必要がある。本研究では,パラメータサーバ(PS)におけるユーザ中心のアグリゲーションルールを設計し,容易に利用可能な勾配情報に基づいて各FLクライアントに対してパーソナライズされたモデルを生成する。提案する集約ルールは,重み付き集計経験的リスク最小化器の上限値に着想を得たものである。第2に,ユーザクラスタリングに基づく通信効率のよい変種を導出し,通信制約のあるシステムへの適用性を大幅に向上させる。提案アルゴリズムは,平均精度,ノード性能,通信オーバヘッドの訓練において,パーソナライズされたFLベースラインを上回っている。

Statistical heterogeneity across clients in a Federated Learning (FL) system increases the algorithm convergence time and reduces the generalization performance, resulting in a large communication overhead in return for a poor model. To tackle the above problems without violating the privacy constraints that FL imposes, personalized FL methods have to couple statistically similar clients without directly accessing their data in order to guarantee a privacy-preserving transfer. In this work, we design user-centric aggregation rules at the parameter server (PS) that are based on readily available gradient information and are capable of producing personalized models for each FL client. The proposed aggregation rules are inspired by an upper bound of the weighted aggregate empirical risk minimizer. Secondly, we derive a communication-efficient variant based on user clustering which greatly enhances its applicability to communication-constrained systems. Our algorithm outperforms popular personalized FL baselines in terms of average accuracy, worst node performance, and training communication overhead.

翻訳日:2023-04-26 19:55:45 公開日:2023-04-25

# ベイズ最適化のための量子ガウス過程回帰

Quantum Gaussian Process Regression for Bayesian Optimization ( http://arxiv.org/abs/2304.12923v1 )

ライセンス: Link先を確認

Frederic Rapp and Marco Roth

(参考訳) ガウス過程回帰は確立されたベイズ機械学習手法である。本稿では,パラメータ化量子回路に基づく量子カーネルを用いたガウス過程の回帰手法を提案する。ハードウェア効率の良い特徴写像とグラム行列の注意的な正則化を用いて、得られた量子ガウス過程の分散情報を保存できることを実証する。また,量子ガウス過程がベイズ最適化の代用モデルとして利用できることを示す。この量子ベイズ最適化アルゴリズムの性能を示すために,実世界のデータセット上で回帰を行う機械学習モデルのハイパーパラメータ最適化に適用する。我々は,量子ベイズ最適化を古典版と比較し,量子版がその性能に合致することを示す。

Gaussian process regression is a well-established Bayesian machine learning method. We propose a new approach to Gaussian process regression using quantum kernels based on parameterized quantum circuits. By employing a hardware-efficient feature map and careful regularization of the Gram matrix, we demonstrate that the variance information of the resulting quantum Gaussian process can be preserved. We also show that quantum Gaussian processes can be used as a surrogate model for Bayesian optimization, a task that critically relies on the variance of the surrogate model. To demonstrate the performance of this quantum Bayesian optimization algorithm, we apply it to the hyperparameter optimization of a machine learning model which performs regression on a real-world dataset. We benchmark the quantum Bayesian optimization against its classical counterpart and show that the quantum version can match its performance.

翻訳日:2023-04-26 19:55:27 公開日:2023-04-25

# gmnlp at semeval-2023 task 12: 系統別アダプタを用いた感情分析

GMNLP at SemEval-2023 Task 12: Sentiment Analysis with Phylogeny-Based Adapters ( http://arxiv.org/abs/2304.12979v1 )

ライセンス: Link先を確認

Md Mahfuz Ibn Alam, Ruoyu Xie, Fahim Faisal, Antonios Anastasopoulos

(参考訳) 本報告では,SemEval-2023共有タスクAfriSenti-SemEvalに対するGMUの感情分析システムについて述べる。我々は,モノリンガル,マルチリンガル,ゼロショットの3つのサブタスクに参加した。 AfroXLMR-largeはアフリカ語で訓練された訓練済みの多言語言語モデルである。また,オリジナルトレーニングデータとともに拡張トレーニングデータも導入する。微調整と並行して,複数のモデルを作成し,最終提案に最適なモデルをアサンブルするために,系統ベースのアダプタチューニングを行う。本システムでは,5:Amharicで最高のF1スコアを達成し,F1スコアを6.2ポイント上回っている。システム全体では、全15トラックに参加する10のシステムの中で5位です。

This report describes GMU's sentiment analysis system for the SemEval-2023 shared task AfriSenti-SemEval. We participated in all three sub-tasks: Monolingual, Multilingual, and Zero-Shot. Our approach uses models initialized with AfroXLMR-large, a pre-trained multilingual language model trained on African languages and fine-tuned correspondingly. We also introduce augmented training data along with original training data. Alongside finetuning, we perform phylogeny-based adapter tuning to create several models and ensemble the best models for the final submission. Our system achieves the best F1-score on track 5: Amharic, with 6.2 points higher F1-score than the second-best performing system on this track. Overall, our system ranks 5th among the 10 systems participating in all 15 tracks.

翻訳日:2023-04-26 19:48:45 公開日:2023-04-25

# 逆強化学習の理論的理解に向けて

Towards Theoretical Understanding of Inverse Reinforcement Learning ( http://arxiv.org/abs/2304.12966v1 )

ライセンス: Link先を確認

Alberto Maria Metelli, Filippo Lazzati, Marcello Restelli

(参考訳) 逆強化学習(IRL)は、専門家が示す振る舞いを正当化する報酬関数を回復するアルゴリズムの強力なファミリーである。 IRLのよく知られた制限は、観察された振る舞いを説明する複数の報酬が存在するため、報酬関数の選択の曖昧さである。この制限は、IRLを実現可能な報酬セット、すなわち専門家の行動に適合する報酬の領域を推定する問題として定式化することによって、近年回避されている。本稿では、生成モデルを用いた有限ホライゾン問題において、irlの理論ギャップを閉じる一歩を踏み出す。まず、実現可能な報酬セット、対応するPAC要件を推定し、特定の報酬のクラスの性質を議論する問題を正式に導入することから始める。次に、サンプル複雑性に関する最初のミニマックス下界を、次数$\Omega\Bigl( \frac{H^3SA}{\epsilon^2} \bigl( \log \bigl(\frac{1}{\delta}\bigr) + S \bigr)\Bigr)$, $S$と$A$のそれぞれ状態と動作の数、水平線$H$、$\epsilon$所望の精度、$\delta$の信頼度を推定する問題に対して与える。均一サンプリング戦略 (us-irl) のサンプル複雑性を分析し, 対数因子に対する上限値の一致を証明した。最後に、IRLにおけるいくつかのオープンな質問について概説し、今後の研究方向性を提案する。

Inverse reinforcement learning (IRL) denotes a powerful family of algorithms for recovering a reward function justifying the behavior demonstrated by an expert agent. A well-known limitation of IRL is the ambiguity in the choice of the reward function, due to the existence of multiple rewards that explain the observed behavior. This limitation has been recently circumvented by formulating IRL as the problem of estimating the feasible reward set, i.e., the region of the rewards compatible with the expert's behavior. In this paper, we make a step towards closing the theory gap of IRL in the case of finite-horizon problems with a generative model. We start by formally introducing the problem of estimating the feasible reward set, the corresponding PAC requirement, and discussing the properties of particular classes of rewards. Then, we provide the first minimax lower bound on the sample complexity for the problem of estimating the feasible reward set, of order $\Omega\Bigl( \frac{H^3SA}{\epsilon^2} \bigl( \log \bigl(\frac{1}{\delta}\bigr) + S \bigr)\Bigr)$, where $S$ and $A$ are the numbers of states and actions, respectively, $H$ is the horizon, $\epsilon$ is the desired accuracy, and $\delta$ is the confidence. We analyze the sample complexity of a uniform sampling strategy (US-IRL), proving a matching upper bound up to logarithmic factors. Finally, we outline several open questions in IRL and propose future research directions.

翻訳日:2023-04-26 19:48:31 公開日:2023-04-25

# ユニタリ回路ゲームにおける絡み合い遷移

Entanglement Transitions in Unitary Circuit Games ( http://arxiv.org/abs/2304.12965v1 )

ライセンス: Link先を確認

Raúl Morral-Yepes, Adam Smith, S. L. Sondhi, Frank Pollmann

(参考訳) ユニタリ回路における繰り返しの投影的測定は、測定速度が調整されるにつれて、絡み合い相転移を引き起こす可能性がある。そこで本研究では,射影測度を動的に選択したユニタリゲートに置き換え,絡み合いを最小限に抑える異なる設定について考察する。これは、2人のプレーヤーがランダムに割り当てられた結合に異なるレートでユニタリゲートを配置する1次元のユニタリ回路ゲームであると見なすことができる。異端者」は状態に関する限られた知識に基づいて、割り当てられた結合上の絡み合いのエントロピーを減らし、有限(領域法則)の絡み合いに制限することを目的としてユニタリゲートを選択する。結果として生じる絡み合いのダイナミクスを明らかにするために、3つの異なるシナリオを考えます。 (i)古典的な離散高さモデル (ii)クリフォード回路、及び (iii)一般的な$U(4)$ユニタリ回路。古典的回路モデルとクリフォード回路モデルの両方が、確率的フレドキン連鎖との接続を通して理解できるような類似した性質を持つゲートを解離器が配置する速度の関数として位相遷移を示す。対照的に、''entangler'' は常にハールランダムなユニタリゲートを使用するときに勝利し、エンタングリングの非ゼロな速度に対する広範囲な体積法的な絡み合いを観察する。

Repeated projective measurements in unitary circuits can lead to an entanglement phase transition as the measurement rate is tuned. In this work, we consider a different setting in which the projective measurements are replaced by dynamically chosen unitary gates that minimize the entanglement. This can be seen as a one-dimensional unitary circuit game in which two players get to place unitary gates on randomly assigned bonds at different rates: The "entangler" applies a random local unitary gate with the aim of generating extensive (volume law) entanglement. The "disentangler", based on limited knowledge about the state, chooses a unitary gate to reduce the entanglement entropy on the assigned bond with the goal of limiting to only finite (area law) entanglement. In order to elucidate the resulting entanglement dynamics, we consider three different scenarios: (i) a classical discrete height model, (ii) a Clifford circuit, and (iii) a general $U(4)$ unitary circuit. We find that both the classical and Clifford circuit models exhibit phase transitions as a function of the rate at which the disentangler places a gate, with similar properties that can be understood through a connection to the stochastic Fredkin chain. In contrast, the "entangler" always wins when using Haar random unitary gates, and we observe extensive, volume law entanglement for all non-zero rates of entangling.

翻訳日:2023-04-26 19:47:55 公開日:2023-04-25

# Chameleon: 連合学習における耐久性のあるバックドアの植え付けのためのピア画像への適応

Chameleon: Adapting to Peer Images for Planting Durable Backdoors in Federated Learning ( http://arxiv.org/abs/2304.12961v1 )

ライセンス: Link先を確認

Yanbo Dai, Songze Li

(参考訳) フェデレーション学習(FL)システムでは、分散クライアントがローカルモデルを中央サーバにアップロードし、グローバルモデルに集約する。悪意のあるクライアントは、汚染されたローカルモデルをアップロードすることでグローバルモデルにバックドアを植え付け、特定のパターンを持つ画像をターゲットラベルに誤分類させることができる。現在の攻撃で植え付けられたバックドアは耐久性がなく、攻撃者がモデル汚染をやめるとすぐに消滅する。本稿では、FLバックドアの耐久性と、良性画像と汚染画像(ローカルトレーニング中にラベルがターゲットラベルに書き換えられた画像)の関係との結びつきを調べる。具体的には、汚染画像の元のラベルおよびターゲットラベルを持つ良性画像が、バックドアの耐久性に重要な影響を及ぼすことがわかった。そこで我々は、コントラスト学習を利用してこの効果をさらに増幅し、より耐久性の高いバックドアを実現する新たな攻撃Chameleonを提案する。広範な実験により、Chameleonは幅広い画像データセット、バックドアタイプ、モデルアーキテクチャにおいて、ベースラインと比べてバックドアの寿命を$1.2\times \sim 4\times$に大幅に延ばすことが示された。

In a federated learning (FL) system, distributed clients upload their local models to a central server to aggregate into a global model. Malicious clients may plant backdoors into the global model through uploading poisoned local models, causing images with specific patterns to be misclassified into some target labels. Backdoors planted by current attacks are not durable, and vanish quickly once the attackers stop model poisoning. In this paper, we investigate the connection between the durability of FL backdoors and the relationships between benign images and poisoned images (i.e., the images whose labels are flipped to the target label during local training). Specifically, benign images with the original and the target labels of the poisoned images are found to have key effects on backdoor durability. Consequently, we propose a novel attack, Chameleon, which utilizes contrastive learning to further amplify such effects towards a more durable backdoor. Extensive experiments demonstrate that Chameleon significantly extends the backdoor lifespan over baselines by $1.2\times \sim 4\times$, for a wide range of image datasets, backdoor types, and model architectures.

翻訳日:2023-04-26 19:47:28 公開日:2023-04-25

# 機械翻訳における文レベルパラダイムの回避

Escaping the sentence-level paradigm in machine translation ( http://arxiv.org/abs/2304.12959v1 )

ライセンス: Link先を確認

Matt Post and Marcin Junczys-Dowmunt

(参考訳) 文書の文脈がさまざまな翻訳の曖昧さを解消するのに不可欠であることはよく知られており、実際、文書設定はほぼ全ての翻訳にとって最も自然な設定である。したがって、機械翻訳が研究と製品の両面で、数十年前からの文レベルの翻訳パラダイムにほぼ留まっているのは残念である。これは、本質的に文書ベースである大規模言語モデルからの競争圧力を踏まえると、ますます目立つ問題でもある。文書文脈機械翻訳の研究は数多く存在するが、様々な理由から定着していない。本稿では、3つの障害に同時に対処することで、この停滞から抜け出す道を提案する。どのアーキテクチャを使うべきか? それを学習するための文書レベルの情報をどこで入手するのか? そして、それが良いかどうかをどうやって知るのか? 特殊なアーキテクチャに関する研究とは対照的に、標準的なTransformerアーキテクチャでも十分な容量があれば十分であることを示す。次に、逆翻訳データのみから文書サンプルを取ることで、学習データの問題に対処する。逆翻訳データは入手しやすいだけでなく、機械翻訳出力を含む可能性のある並列文書データよりも高品質である。最後に、文書システム間の識別能力がより高い、既存のコントラスト指標の生成的変種を提案する。大規模データの4つの言語ペア(DE$\rightarrow$EN, EN$\rightarrow$DE, EN$\rightarrow$FR, EN$\rightarrow$RU)での結果は、これら3つの要素の組み合わせが文書レベルの性能を向上させることを示している。

It is well-known that document context is vital for resolving a range of translation ambiguities, and in fact the document setting is the most natural setting for nearly all translation. It is therefore unfortunate that machine translation -- both research and production -- largely remains stuck in a decades-old sentence-level translation paradigm. It is also an increasingly glaring problem in light of competitive pressure from large language models, which are natively document-based. Much work in document-context machine translation exists, but for various reasons has been unable to catch hold. This paper suggests a path out of this rut by addressing three impediments at once: what architectures should we use? where do we get document-level information for training them? and how do we know whether they are any good? In contrast to work on specialized architectures, we show that the standard Transformer architecture is sufficient, provided it has enough capacity. Next, we address the training data issue by taking document samples from back-translated data only, where the data is not only more readily available, but is also of higher quality compared to parallel document data, which may contain machine translation output. Finally, we propose generative variants of existing contrastive metrics that are better able to discriminate among document systems. Results in four large-data language pairs (DE$\rightarrow$EN, EN$\rightarrow$DE, EN$\rightarrow$FR, and EN$\rightarrow$RU) establish the success of these three pieces together in improving document-level performance.

翻訳日:2023-04-26 19:47:07 公開日:2023-04-25

# 高レベルなロボット説明のための報酬分解の詳細な検討

A Closer Look at Reward Decomposition for High-Level Robotic Explanations ( http://arxiv.org/abs/2304.12958v1 )

ライセンス: Link先を確認

Wenhao Lu, Sven Magg, Xufeng Zhao, Martin Gromniak, Stefan Wermter

(参考訳) ロボットのような知的エージェントの振る舞いを人間に説明することは、理解しにくい固有受容状態、変動する中間目標、そしてその結果として生じる予測不可能性のために困難である。さらに、強化学習エージェントのワンステップ説明は、各遷移におけるエージェントの将来の振る舞いを考慮できないため曖昧になりうるものであり、ロボットの動作を説明する複雑さが増す。タスク固有のプリミティブに対応する抽象化された行動を活用することで、動作レベルの説明を避ける。提案するフレームワークは、報酬分解(RD)と抽象化された行動空間を組み合わせた説明可能な学習フレームワークであり、タスク中のオブジェクト特性に基づいた曖昧さのない高レベルな説明を可能にする。2つのロボットシナリオの定量的・定性的な分析を通じてフレームワークの有効性を実証し、RD説明の出力成果物から得られる、人間が理解しやすい視覚的・テキスト的説明を示す。さらに、これらの成果物を大規模言語モデルと統合して推論や対話的な問い合わせを行う汎用性を示す。

Explaining the behavior of intelligent agents such as robots to humans is challenging due to their incomprehensible proprioceptive states, variational intermediate goals, and resultant unpredictability. Moreover, one-step explanations for reinforcement learning agents can be ambiguous as they fail to account for the agent's future behavior at each transition, adding to the complexity of explaining robot actions. By leveraging abstracted actions that map to task-specific primitives, we avoid explanations on the movement level. Our proposed framework combines reward decomposition (RD) with abstracted action spaces into an explainable learning framework, allowing for non-ambiguous and high-level explanations based on object properties in the task. We demonstrate the effectiveness of our framework through quantitative and qualitative analysis of two robot scenarios, showcasing visual and textual explanations, from output artifacts of RD explanation, that are easy for humans to comprehend. Additionally, we demonstrate the versatility of integrating these artifacts with large language models for reasoning and interactive querying.

翻訳日:2023-04-26 19:46:45 公開日:2023-04-25

# 量子スピン鎖における弦破れの動的局在転移

Dynamical localization transition of string breaking in quantum spin chains ( http://arxiv.org/abs/2304.12957v1 )

ライセンス: Link先を確認

Roberto Verdel and Guo-Yi Zhu and Markus Heyl

(参考訳) 2つの電荷を繋ぐ弦の分裂は、閉じ込めゲージ理論における最も驚くべき現象の1つである。この過程のダイナミクスは最近集中的に研究されており、多くの数値結果が次の二分法を示唆している。すなわち、閉じ込め弦は比較的速く崩壊するか、極めて長時間持続するかのいずれかである。ここでは、この二分法の基礎となるメカニズムとして、動的局在転移を提唱する。この目的のために、閉じ込められたスピン鎖の軽い中間子セクターにおける有効な弦破れの記述を導出し、この問題をフォック空間における動的局在転移と見なせることを示す。高速な弦破れダイナミクスと抑制された弦破れダイナミクスは、それぞれ非局在化した振る舞いと局在化した振る舞いに対応する。次に、動的な弦破れ問題を、弦が中間子浴に浸された「不純物」として表される量子不純物モデルへとさらに簡約する。この現象論的モデルは局在-非局在転移を示し、定性的に異なる弦破れの領域を理解するための一般的で単純な物理的基礎を与える。これらの知見は、1次元および高次元のより広いクラスの閉じ込め格子モデルに直接関係しており、現在のRydberg量子シミュレータで実現できる可能性がある。

The fission of a string connecting two charges is one of the most astounding phenomena in confining gauge theories. The dynamics of this process has been the subject of recent intensive studies, in which plenty of numerical results suggest the following dichotomy: the confining string can decay relatively fast or persist up to extremely long times. Here, we put forward a dynamical localization transition as the mechanism underlying this dichotomy. To this end, we derive an effective string breaking description in the light-meson sector of a confined spin chain and show that the problem can be regarded as a dynamical localization transition in Fock space. Fast and suppressed string breaking dynamics are identified with delocalized and localized behavior, respectively. We then provide a further reduction of the dynamical string breaking problem onto a quantum impurity model, where the string is represented as an "impurity" immersed in a meson bath. It is shown that this phenomenological model features a localization-delocalization transition, giving a general and simple physical basis to understand the qualitatively distinct string breaking regimes. These findings are directly relevant for a wider class of confining lattice models in one and higher dimensions and could be realized on present-day Rydberg quantum simulators.

翻訳日:2023-04-26 19:46:27 公開日:2023-04-25

# 単一能動素子によるデマルチプレクス型多光子源

Single-active-element demultiplexed multi-photon source ( http://arxiv.org/abs/2304.12956v1 )

ライセンス: Link先を確認

Lena M. Hansen, Lorenzo Carosini, Lennart Jehle, Francesco Giorgino, Romane Houvenaghel, Michal Vyvlecka, Juan C. Loredo, Philip Walther

(参考訳) 時間-空間デマルチプレキシングは、同じ空間モードの非同時事象を異なる出力経路に振り分ける。この技術は、固体量子エミッタを利用する際に、より光子数の多い多光子状態を利用可能にするため、現在広く採用されている。しかし、これまでの実装ではアクティブ素子の数を増やし続ける必要があり、すぐにリソースの制約に直面していた。本稿では、単一のアクティブ素子のみを用いて、原理的には任意の数の出力へルーティングできるデマルチプレキシング手法を提案し、実証する。本デバイスを高効率の量子ドット単一光子源と組み合わせ、デマルチプレクスされた識別不可能性の高い単一光子を最大8個測定した。本手法の実用上の限界を議論し、例えば数十個の出力のデマルチプレキシングに使用できる条件について述べる。以上の結果は、リソース効率の高い、より大規模な多光子源の実現への道を開く。

Temporal-to-spatial demultiplexing routes non-simultaneous events of the same spatial mode to distinct output trajectories. This technique has now been widely adopted because it gives access to higher-number multi-photon states when exploiting solid-state quantum emitters. However, implementations so far have required an ever-increasing number of active elements, rapidly facing resource constraints. Here, we propose and demonstrate a demultiplexing approach that utilizes only a single active element for routing to, in principle, an arbitrary number of outputs. We employ our device in combination with a high-efficiency quantum-dot-based single-photon source, and measure up to eight demultiplexed highly indistinguishable single photons. We discuss the practical limitations of our approach, and describe under which conditions it can be used to demultiplex, e.g., tens of outputs. Our results thus provide a path for the preparation of resource-efficient larger-scale multi-photon sources.

翻訳日:2023-04-26 19:46:07 公開日:2023-04-25

# ニューラルネットワークにおける非決定論的スタック

Nondeterministic Stacks in Neural Networks ( http://arxiv.org/abs/2304.12955v1 )

ライセンス: Link先を確認

Brian DuSell

(参考訳) 人間の言語は構成的な構文構造に満ちている。ニューラルネットワークは、言語を処理するコンピュータシステムの画期的な改善に貢献してきたが、広く使われているニューラルネットワークアーキテクチャは、構文を処理する能力にまだ限界がある。この問題に対処するため、先行研究は、構文とスタックの理論的な関係に着想を得て、ニューラルネットワークにスタックデータ構造を追加することを提案してきた。しかし、これらの手法は一度に1つの構文解析を追跡するように設計された決定論的スタックを用いている。一方、解析に非決定論的スタックを必要とする構文的曖昧さは、言語において極めて一般的である。本論文では、非決定論的スタックをニューラルネットワークに組み込む手法を提案することで、この不一致を解消する。動的計画法アルゴリズムを用いて指数関数的な数の計算を表現し、非決定論的プッシュダウンオートマトンを効率的にシミュレートする微分可能なデータ構造を開発する。このモジュールを、リカレントニューラルネットワーク(RNN)とトランスフォーマーという2つの主要なアーキテクチャに組み込む。これにより、形式的な認識能力が任意の文脈自由言語にまで高まり、決定論的文脈自由言語においても学習が助けられることを示す。経験的に、非決定論的スタックを持つニューラルネットワークは、理論的に解析の難しさが最大の言語を含め、従来のスタック拡張モデルよりもはるかに効果的に文脈自由言語を学習する。また、非決定論的スタックで拡張したRNNは、よく知られた非文脈自由パターンである交差直列依存性の学習など、驚くほど強力な振る舞いが可能であることも示す。自然言語モデリングでの改善を実証し、構文的一般化ベンチマークでの分析を示す。この研究は、より人間らしい方法で構文の使い方を学習するシステムの構築に向けた重要な一歩である。

Human language is full of compositional syntactic structures, and although neural networks have contributed to groundbreaking improvements in computer systems that process language, widely-used neural network architectures still exhibit limitations in their ability to process syntax. To address this issue, prior work has proposed adding stack data structures to neural networks, drawing inspiration from theoretical connections between syntax and stacks. However, these methods employ deterministic stacks that are designed to track one parse at a time, whereas syntactic ambiguity, which requires a nondeterministic stack to parse, is extremely common in language. In this dissertation, we remedy this discrepancy by proposing a method of incorporating nondeterministic stacks into neural networks. We develop a differentiable data structure that efficiently simulates a nondeterministic pushdown automaton, representing an exponential number of computations with a dynamic programming algorithm. We incorporate this module into two predominant architectures: recurrent neural networks (RNNs) and transformers. We show that this raises their formal recognition power to arbitrary context-free languages, and also aids training, even on deterministic context-free languages. Empirically, neural networks with nondeterministic stacks learn context-free languages much more effectively than prior stack-augmented models, including a language with theoretically maximal parsing difficulty. We also show that an RNN augmented with a nondeterministic stack is capable of surprisingly powerful behavior, such as learning cross-serial dependencies, a well-known non-context-free pattern. We demonstrate improvements on natural language modeling and provide analysis on a syntactic generalization benchmark. This work represents an important step toward building systems that learn to use syntax in a more human-like fashion.

翻訳日:2023-04-26 19:45:54 公開日:2023-04-25

# 射影確率近似における漸近的挙動と相転移:ジャンプ拡散アプローチ

Asymptotic Behaviors and Phase Transitions in Projected Stochastic Approximation: A Jump Diffusion Approach ( http://arxiv.org/abs/2304.12953v1 )

ライセンス: Link先を確認

Jiadong Liang, Yuze Han, Xiang Li, Zhihua Zhang

(参考訳) 本稿では,線形制約付き最適化問題を考察し,ループレス射影確率近似(LPSA)アルゴリズムを提案する。実行可能性を確保するために、n$-thイテレーションで確率$p_n$でプロジェクションを実行する。確率 $p_n$ とステップサイズ $\eta_n$ の特定の族を考えると、漸近的かつ連続的な観点からアルゴリズムを分析する。新しいジャンプ拡散近似を用いて、それらの再スケールされた最後のイテレートを特定の確率微分方程式(sdes)の解に弱収束させる軌道を示す。 SDEを解析することにより、LPSAの漸近挙動を$(p_n, \eta_n)$の異なる選択に対して同定する。このアルゴリズムは興味深い漸近バイアス分散トレードオフを示し、相対等級$p_n$ w.r.t.$\eta_n$に従って位相遷移現象を生じる。この発見は射影コストを最小化するために${(p_n, \eta_n)}_{n \geq 1}$を選択するための洞察を与える。さらに,ジャンプ拡散近似の実用的応用として,脱バイアスLPSA(DLPSA)を提案する。 DLPSAはバニラLPSAと比較してプロジェクションの複雑さを効果的に減少させる。

In this paper we consider linearly constrained optimization problems and propose a loopless projection stochastic approximation (LPSA) algorithm. It performs the projection with probability $p_n$ at the $n$-th iteration to ensure feasibility. Considering a specific family of the probability $p_n$ and step size $\eta_n$, we analyze our algorithm from an asymptotic and continuous perspective. Using a novel jump diffusion approximation, we show that the trajectories connecting those properly rescaled last iterates weakly converge to the solution of specific stochastic differential equations (SDEs). By analyzing SDEs, we identify the asymptotic behaviors of LPSA for different choices of $(p_n, \eta_n)$. We find that the algorithm presents an intriguing asymptotic bias-variance trade-off and yields phase transition phenomena, according to the relative magnitude of $p_n$ w.r.t. $\eta_n$. This finding provides insights into selecting appropriate $\{(p_n, \eta_n)\}_{n \geq 1}$ to minimize the projection cost. Additionally, we propose the Debiased LPSA (DLPSA) as a practical application of our jump diffusion approximation result. DLPSA is shown to effectively reduce projection complexity compared to vanilla LPSA.
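
The idea of projecting only occasionally can be illustrated with a small sketch. This is not the authors' implementation: the quadratic objective, the single hyperplane constraint, the polynomial schedules $\eta_n = n^{-\alpha}$ and $p_n = \min(1, n^{-\beta})$, and the names `lpsa_sketch` and `project_onto_hyperplane` are all assumptions chosen for illustration.

```python
import random


def project_onto_hyperplane(x, a, b):
    """Euclidean projection of x onto the set {y : a . y = b}."""
    scale = (sum(ai * xi for ai, xi in zip(a, x)) - b) / sum(ai * ai for ai in a)
    return [xi - scale * ai for xi, ai in zip(x, a)]


def lpsa_sketch(center, a, b, n_steps=20000, alpha=0.75, beta=0.25,
                noise=0.1, seed=0):
    """Minimize 0.5 * ||x - center||^2 subject to a . x = b.

    Each iteration takes a noisy gradient step and then projects onto the
    constraint set only with probability p_n (hypothetical schedules).
    """
    rng = random.Random(seed)
    x = project_onto_hyperplane([0.0] * len(center), a, b)
    for n in range(1, n_steps + 1):
        eta_n = n ** (-alpha)            # assumed step-size schedule
        p_n = min(1.0, n ** (-beta))     # assumed projection probability
        # Stochastic gradient of the quadratic objective (Gaussian noise).
        grad = [xi - ci + rng.gauss(0.0, noise) for xi, ci in zip(x, center)]
        x = [xi - eta_n * gi for xi, gi in zip(x, grad)]
        if rng.random() < p_n:
            x = project_onto_hyperplane(x, a, b)
    return x
```

With `center = [2.0, 0.0]` and the constraint $x_1 + x_2 = 1$, the constrained minimizer is $(1.5, -0.5)$. Because $p_n$ decays more slowly than $\eta_n$ in this sketch, the drift away from the feasible set between projections shrinks over time; other relative rates would change that bias, which is the trade-off the abstract describes.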

翻訳日:2023-04-26 19:45:26 公開日:2023-04-25

# 宇宙からでも何でもセグメント化できるか?

Segment anything, from space? ( http://arxiv.org/abs/2304.13000v1 )

ライセンス: Link先を確認

Simiao Ren, Francesco Luzi, Saad Lahrichi, Kaleb Kassaw, Leslie M. Collins, Kyle Bradbury, Jordan M. Malof

(参考訳) 最近、視覚タスクに特化して開発された最初の基盤モデルである「Segment Anything Model」(SAM)が登場した。 SAMは、1つ(またはそれ以上)の点、バウンディングボックス、マスクなどの安価な入力プロンプトに基づいて、入力画像中のオブジェクトをセグメント化できる。SAMの著者らは、多数の視覚ベンチマークタスクでSAMのゼロショット画像セグメンテーション精度を調べ、SAMは通常、対象タスクで学習された視覚モデルと同等か、時にはそれを上回る認識精度を達成することを見出した。セグメンテーションにおけるSAMの印象的な汎化性能は、自然画像を扱う視覚研究者にとって大きな意味を持つ。本研究では、SAMの優れた性能が俯瞰画像(オーバーヘッド画像)の問題にも及ぶかどうかを調べ、SAMの登場に対するコミュニティの対応の指針となることを目指す。多様で広く研究されている一連のベンチマークタスクでSAMの性能を検証する。 SAMは多くの場合、俯瞰画像にもよく汎化するが、俯瞰画像と対象オブジェクトに固有の特徴のために失敗する場合もあることがわかった。リモートセンシング画像に特有のこうした系統的な失敗事例を報告し、これらはコミュニティにとって有用な今後の研究課題となりうる。なお、これはワーキングペーパーであり、追加の分析と結果が完了し次第更新される。

Recently, the first foundation model developed specifically for vision tasks was released, termed the "Segment Anything Model" (SAM). SAM can segment objects in input imagery based upon cheap input prompts, such as one (or more) points, a bounding box, or a mask. The authors examined the zero-shot image segmentation accuracy of SAM on a large number of vision benchmark tasks and found that SAM usually achieved recognition accuracy similar to, or sometimes exceeding, vision models that had been trained on the target tasks. The impressive generalization of SAM for segmentation has major implications for vision researchers working on natural imagery. In this work, we examine whether SAM's impressive performance extends to overhead imagery problems, and help guide the community's response to its development. We examine SAM's performance on a set of diverse and widely-studied benchmark tasks. We find that SAM does often generalize well to overhead imagery, although it fails in some cases due to the unique characteristics of overhead imagery and the target objects. We report on these unique systematic failure cases for remote sensing imagery, which may motivate useful future research for the community. Note that this is a working paper, and it will be updated as additional analysis and results are completed.

翻訳日:2023-04-26 19:39:30 公開日:2023-04-25

# AudioGPT: 音声、音楽、音声、トーキングヘッドの理解と生成

AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head ( http://arxiv.org/abs/2304.12995v1 )

ライセンス: Link先を確認

Rongjie Huang, Mingze Li, Dongchao Yang, Jiatong Shi, Xuankai Chang, Zhenhui Ye, Yuning Wu, Zhiqing Hong, Jiawei Huang, Jinglin Liu, Yi Ren, Zhou Zhao, Shinji Watanabe

(参考訳) 大規模言語モデル(LLM)は、さまざまな領域やタスクにわたって顕著な能力を示し、学習と認知に関する我々の理解に挑戦している。最近の成功にもかかわらず、現在のLLMは複雑なオーディオ情報を処理したり、(SiriやAlexaのように)音声会話を行ったりすることができない。本研究では、LLM(すなわちChatGPT)を次の2つで補完するマルチモーダルAIシステムAudioGPTを提案する。 1)複雑なオーディオ情報を処理し、多数の理解・生成タスクを解決する基盤モデル 2)音声対話を支援するための入出力インタフェース(ASR, TTS)。人間の意図の理解と基盤モデルとの協調に関してマルチモーダルLLMを評価する需要が高まる中、我々は原則とプロセスを概説し、一貫性、能力、堅牢性の観点からAudioGPTを検証する。実験の結果、複数ラウンドの対話において、音声、音楽、音、トーキングヘッドの理解と生成を伴うAIタスクを解決するAudioGPTの能力が実証され、人間がかつてないほど容易に豊かで多様なオーディオコンテンツを作成できるようになることが示された。本システムは https://github.com/AIGC-Audio/AudioGPT で公開されている。

Large language models (LLMs) have exhibited remarkable capabilities across a variety of domains and tasks, challenging our understanding of learning and cognition. Despite the recent success, current LLMs are not capable of processing complex audio information or conducting spoken conversations (like Siri or Alexa). In this work, we propose a multi-modal AI system named AudioGPT, which complements LLMs (i.e., ChatGPT) with 1) foundation models to process complex audio information and solve numerous understanding and generation tasks; and 2) the input/output interface (ASR, TTS) to support spoken dialogue. With an increasing demand to evaluate multi-modal LLMs of human intention understanding and cooperation with foundation models, we outline the principles and processes and test AudioGPT in terms of consistency, capability, and robustness. Experimental results demonstrate the capabilities of AudioGPT in solving AI tasks with speech, music, sound, and talking head understanding and generation in multi-round dialogues, which empower humans to create rich and diverse audio content with unprecedented ease. Our system is publicly available at https://github.com/AIGC-Audio/AudioGPT.

翻訳日:2023-04-26 19:38:51 公開日:2023-04-25

# Rényiダイバージェンスの十分性

Sufficiency of Rényi divergences ( http://arxiv.org/abs/2304.12989v1 )

ライセンス: Link先を確認

Niklas Galke, Lauritz van Luijk, Henrik Wilming

(参考訳) 古典状態または量子状態の集合は、どちらの集合も他方に写す古典チャネルまたは量子チャネルの対が存在するとき、別の集合と同値である。ディコトミー(状態の対)の場合、これは(古典または量子の)Rényiダイバージェンス(RD)およびデータ処理不等式と密接に関係している。すなわち、チャネルをディコトミーに適用してもRDが変化しないならば、その像を元のディコトミーに戻す回復チャネルが存在する。ここでは、古典的なディコトミーについて、RDが等しいことだけで、2つの方向のいずれについてもチャネルが存在するのに十分であることを証明し、いくつかの応用について議論する。量子の場合には最小量子RDの等式で十分であると予想し、特殊な場合についてこれを証明する。また、Petz量子RDも最大量子RDも十分でないことを示す。我々の手法の副産物として、古典RD、Petz量子RD、最大量子RDが満たす不等式の無限のリストを得る。これらの不等式は最小量子RDでは成り立たない。

A set of classical or quantum states is equivalent to another one if there exists a pair of classical or quantum channels mapping either set to the other one. For dichotomies (pairs of states) this is closely connected to (classical or quantum) Rényi divergences (RD) and the data-processing inequality: If an RD remains unchanged when a channel is applied to the dichotomy, then there is a recovery channel mapping the image back to the initial dichotomy. Here, we prove for classical dichotomies that equality of the RDs alone is already sufficient for the existence of a channel in any of the two directions and discuss some applications. We conjecture that equality of the minimal quantum RDs is sufficient in the quantum case and prove it for special cases. We also show that neither the Petz quantum nor the maximal quantum RDs are sufficient. As a side result of our techniques we obtain an infinite list of inequalities fulfilled by the classical, the Petz quantum, and the maximal quantum RDs. These inequalities are not true for the minimal quantum RDs.
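
For the classical case, the quantities involved are easy to compute directly. The sketch below uses the standard definition $D_\alpha(p\|q) = \frac{1}{\alpha-1}\log\sum_i p_i^{\alpha} q_i^{1-\alpha}$ for $\alpha \in (0,1)\cup(1,\infty)$ and checks the data-processing inequality on a toy dichotomy. The function names, the example distributions, and the restriction to full-support distributions are assumptions made for illustration; nothing here reproduces the paper's proofs.

```python
import math


def renyi_divergence(p, q, alpha):
    """Classical Renyi divergence D_alpha(p || q), natural logarithm.

    Assumes p and q are probability vectors with strictly positive entries
    and alpha > 0, alpha != 1.
    """
    if alpha <= 0 or alpha == 1:
        raise ValueError("alpha must be positive and different from 1")
    total = sum(pi ** alpha * qi ** (1 - alpha) for pi, qi in zip(p, q))
    return math.log(total) / (alpha - 1)


def apply_channel(channel, p):
    """Apply a classical channel given as a column-stochastic matrix.

    channel[j][i] is the probability of output j given input i.
    """
    return [sum(row[i] * p[i] for i in range(len(p))) for row in channel]


# Toy dichotomy (p, q) and two channels: one merges outcomes 1 and 2
# (lossy), the other only permutes outcomes (reversible).
p = [0.7, 0.2, 0.1]
q = [0.2, 0.3, 0.5]
merge = [[1, 0, 0], [0, 1, 1]]
permute = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]
```

For a reversible channel such as `permute`, every $D_\alpha$ is unchanged and the inverse permutation is the recovery channel. For `merge`, the divergences can only decrease. The abstract's result concerns the converse direction: what equality of the divergences implies about the existence of channels.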

翻訳日:2023-04-26 19:38:30 公開日:2023-04-25

# COVID-19胸部X線診断のための並列アテンションブロックを用いたマルチスケール特徴融合

Multi-Scale Feature Fusion using Parallel-Attention Block for COVID-19 Chest X-ray Diagnosis ( http://arxiv.org/abs/2304.12988v1 )

ライセンス: Link先を確認

Xiao Qi, David J. Foran, John L. Nosher, and Ilker Hacihaliloglu

(参考訳) 世界的なcovid-19危機では、胸部x線(cxr)画像からのcovid-19の正確な診断が重要である。放射線学的評価において, 医療的意思決定と後続の疾患管理を補完するために, コンピュータ支援診断ツールが活用されている。患者を迅速にトリアージし, 放射線科医を支援するためには, 高精度で頑健な計算方法が必要である。本研究では,並列注意ブロックを用いてオリジナルcxr画像と局所位相特徴強調cxr画像をマルチスケールで融合する,新しい多機能融合ネットワークを提案する。我々は、さまざまな組織から取得したさまざまなCOVID-19データセットのモデルを検証し、一般化能力を評価する。本実験は,本手法が最先端性能を実現し,一般化能力の向上を図っている。

Under the global COVID-19 crisis, accurate diagnosis of COVID-19 from Chest X-ray (CXR) images is critical. To reduce intra- and inter-observer variability during radiological assessment, computer-aided diagnostic tools have been utilized to supplement medical decision-making and subsequent disease management. Computational methods with high accuracy and robustness are required for rapid triaging of patients and aiding radiologists in the interpretation of the collected data. In this study, we propose a novel multi-feature fusion network using parallel attention blocks to fuse the original CXR images and local-phase feature-enhanced CXR images at multiple scales. We examine our model on various COVID-19 datasets acquired from different organizations to assess the generalization ability. Our experiments demonstrate that our method achieves state-of-the-art performance and has improved generalization capability, which is crucial for widespread deployment.

翻訳日:2023-04-26 19:38:06 公開日:2023-04-25

# 糖尿病足部潰瘍分類における画像類似性の影響の定量化

Quantifying the Effect of Image Similarity on Diabetic Foot Ulcer Classification ( http://arxiv.org/abs/2304.12987v1 )

ライセンス: Link先を確認

Imran Chowdhury Dipto, Bill Cassidy, Connah Kendrick, Neil D. Reeves, Joseph M. Pappachan, Vishnu Chandrabalan, Moi Hoon Yap

(参考訳) 本研究は、深層学習の分類ネットワークを学習する際に、公開されている糖尿病性足潰瘍データセットに含まれる視覚的に類似した画像が及ぼす影響を調べる。深層学習アルゴリズムの学習に用いるデータセットにバイナリレベルで同一の重複画像が存在することは、ネットワーク性能を低下させうる望ましくないバイアスをもたらす、よく知られた問題である。しかし、視覚的に類似しているが同一ではない画像の影響は研究が不十分なテーマであり、糖尿病性足潰瘍の研究ではこれまで検討されていない。我々は、オープンソースのファジィアルゴリズムを用いて、Diabetic Foot Ulcers Challenge 2021(DFUC2021)トレーニングデータセット内の、類似度が段階的に高くなる画像群を特定する。それぞれの類似度しきい値に基づいて新しいトレーニングセットを作成し、さまざまな深層学習多クラス分類器の学習に用いる。次に、最も性能の良いモデルをDFUC2021テストセットで評価する。その結果、類似度しきい値80%の画像を除去したトレーニングセットで学習したモデルが、InceptionResNetV2ネットワークを用いて最も良い性能を達成した。このモデルは、F1スコア、精度(precision)、再現率(recall)がそれぞれ0.023、0.029、0.013向上した。これらの結果は、類似度の高い画像がDFUC2021データセットにおける性能を低下させるバイアスの一因となりうること、および80%類似した画像をトレーニングセットから除去することが分類性能の向上に役立つことを示している。

This research investigates the effect of visually similar images within a publicly available diabetic foot ulcer dataset when training deep learning classification networks. The presence of binary-identical duplicate images in datasets used to train deep learning algorithms is a well-known issue that can introduce unwanted bias which can degrade network performance. However, the effect of visually similar non-identical images is an under-researched topic, and has so far not been investigated in any diabetic foot ulcer studies. We use an open-source fuzzy algorithm to identify groups of increasingly similar images in the Diabetic Foot Ulcers Challenge 2021 (DFUC2021) training dataset. Based on each similarity threshold, we create new training sets that we use to train a range of deep learning multi-class classifiers. We then evaluate the performance of the best performing model on the DFUC2021 test set. Our findings show that the model trained on the training set with the 80% similarity threshold images removed achieved the best performance using the InceptionResNetV2 network. This model showed improvements in F1-score, precision, and recall of 0.023, 0.029, and 0.013, respectively. These results indicate that highly similar images can contribute towards the presence of performance degrading bias within the Diabetic Foot Ulcers Challenge 2021 dataset, and that the removal of images that are 80% similar from the training set can help to boost classification performance.
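
Threshold-based removal of near-duplicate images can be sketched as follows. The abstract does not name the fuzzy algorithm it uses, so this example substitutes a simple average hash as a stand-in; the function names, the tiny grayscale grids, and the greedy keep-first policy are all hypothetical choices for illustration.

```python
def average_hash(pixels):
    """Bit list: 1 where a pixel is brighter than the image mean, else 0."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    return [1 if value > mean else 0 for value in flat]


def similarity(hash_a, hash_b):
    """Fraction of matching bits between two equal-length hashes."""
    matches = sum(1 for a, b in zip(hash_a, hash_b) if a == b)
    return matches / len(hash_a)


def drop_similar(images, threshold=0.8):
    """Greedily keep images whose similarity to every kept image is below threshold.

    `images` is a list of (name, pixels) pairs; returns the kept names in order.
    """
    kept_names, kept_hashes = [], []
    for name, pixels in images:
        current = average_hash(pixels)
        if all(similarity(current, other) < threshold for other in kept_hashes):
            kept_names.append(name)
            kept_hashes.append(current)
    return kept_names


# Toy 4x4 grayscale "images": b is a slightly brighter copy of a,
# c has a different layout (dark top half, bright bottom half).
image_a = [[10, 10, 200, 200] for _ in range(4)]
image_b = [[15, 15, 205, 205] for _ in range(4)]
image_c = [[10] * 4, [10] * 4, [200] * 4, [200] * 4]
```

Calling `drop_similar` on the three toy images with the default threshold keeps `a` and `c` and drops `b`. Lowering the threshold removes more images, which mirrors the abstract's idea of building one training set per similarity threshold.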

翻訳日:2023-04-26 19:37:51 公開日:2023-04-25

# 大規模マルチタスク中国語理解の測定

Measuring Massive Multitask Chinese Understanding ( http://arxiv.org/abs/2304.12986v1 )

ライセンス: Link先を確認

Hui Zeng

(参考訳) 大規模な中国語言語モデルの開発は盛んであるが、それに対応する能力評価が不足している。そこで本研究では、大規模中国語言語モデルのマルチタスク精度を測定するテストを提案する。このテストは、医学、法学、心理学、教育の4つの主要領域を含み、医学には15のサブタスク、教育には8のサブタスクがある。ゼロショット設定において、最も性能の高いモデルは最も性能の低いモデルを平均で約22ポイント上回った。 4つの主要領域全体で、全てのモデルの平均ゼロショット精度は0.5を超えなかった。サブドメインでは、GPT-3.5-turboモデルのみが臨床医学において0.703のゼロショット精度を達成し、これは全サブタスクにおける全モデルの中で最高の精度であった。すべてのモデルは法学領域で性能が低く、最高のゼロショット精度でも0.259にとどまった。複数の分野にわたる知識の幅と深さを包括的に評価することにより、このテストはモデルの欠点をより正確に特定できる。

The development of large-scale Chinese language models is flourishing, yet there is a lack of corresponding capability assessments. Therefore, we propose a test to measure the multitask accuracy of large Chinese language models. This test encompasses four major domains, including medicine, law, psychology, and education, with 15 subtasks in medicine and 8 subtasks in education. We found that the best-performing models in the zero-shot setting outperformed the worst-performing models by nearly 22 percentage points on average. Across the four major domains, the average zero-shot accuracy of all models did not exceed 0.5. In the subdomains, only the GPT-3.5-turbo model achieved a zero-shot accuracy of 0.703 in clinical medicine, which was the highest accuracy among all models across all subtasks. All models performed poorly in the legal domain, with the highest zero-shot accuracy reaching only 0.259. By comprehensively evaluating the breadth and depth of knowledge across multiple disciplines, this test can more accurately identify the shortcomings of the models.

翻訳日:2023-04-26 19:37:29 公開日:2023-04-25

# Rubikの光ニューラルネットワーク:物理対応ローテーションアーキテクチャによるマルチタスク学習

Rubik's Optical Neural Networks: Multi-task Learning with Physics-aware Rotation Architecture ( http://arxiv.org/abs/2304.12985v1 )

ライセンス: Link先を確認

Yingjie Li, Weilu Gao, Cunxi Yu

(参考訳) 近年、電力効率、並列性、計算速度の面で機械学習(ML)に大きな利点をもたらす光ニューラルネットワーク(ONN)を発展させる取り組みが増えている。計算速度とエネルギー効率に大きな利点があるため、ONNを医療センシング、セキュリティスクリーニング、薬物検出、自動運転に活用することに大きな関心が寄せられている。しかし、再構成可能性の実装が難しいため、マルチタスク学習(MTL)アルゴリズムをONNに展開するには、物理的な回折システムを再構築・複製する必要があり、実際の応用シナリオにおけるエネルギー効率とコスト効率が大幅に低下する。本研究は、光学系の物理的性質を利用し、ルービックキューブを回すのと同じようにハードウェアを物理的に回転させることで複数のフィードフォワード関数を符号化する、新しいONNアーキテクチャRubikONNsを提案する。RubikONNs上でのMTL性能を最適化するために、ドメイン固有の物理認識トレーニングアルゴリズムRotAggとRotSeqを提案する。実験の結果、最先端の手法と比較して、わずかな精度低下でエネルギー効率とコスト効率が4$\times$以上向上することを示した。

Recently, there have been increasing efforts to advance optical neural networks (ONNs), which bring significant advantages for machine learning (ML) in terms of power efficiency, parallelism, and computational speed. Given the considerable benefits in computation speed and energy efficiency, there is significant interest in applying ONNs to medical sensing, security screening, drug detection, and autonomous driving. However, due to the challenge of implementing reconfigurability, deploying multi-task learning (MTL) algorithms on ONNs requires re-building and duplicating the physical diffractive systems, which significantly degrades the energy and cost efficiency in practical application scenarios. This work presents a novel ONN architecture, namely RubikONNs, which utilizes the physical properties of optical systems to encode multiple feed-forward functions by physically rotating the hardware similarly to rotating a Rubik's Cube. To optimize MTL performance on RubikONNs, two domain-specific physics-aware training algorithms, RotAgg and RotSeq, are proposed. Our experimental results demonstrate more than 4$\times$ improvements in energy and cost efficiency with marginal accuracy degradation compared to the state-of-the-art approaches.

翻訳日:2023-04-26 19:37:12 公開日:2023-04-25

# 高温の方が容易: スピン量子ビット周波数の予期せぬ温度依存性

Hotter is easier: unexpected temperature dependence of spin qubit frequencies ( http://arxiv.org/abs/2304.12984v1 )

ライセンス: Link先を確認

Brennan Undseth, Oriol Pietx-Casas, Eline Raymenants, Mohammad Mehmandoost, Mateusz T. Madzik, Stephan G.J. Philips, Sander L. de Snoo, David J. Michalak, Sergey V. Amitonov, Larysa Tryputen, Brian Paquelet Wuetz, Viviana Fezzi, Davide Degli Esposti, Amir Sammak, Giordano Scappucci, Lieven M. K. Vandersypen

(参考訳) スピンベースの量子プロセッサのサイズと複雑さが大きくなるにつれて、高いフィダリティの維持とクロストークの最小化が量子アルゴリズムと誤り訂正プロトコルの実装の成功に不可欠となる。特に最近の実験では、マイクロ波キュービット駆動に伴う過度な過渡的キュービット周波数シフトが強調されている。オフ共振マイクロ波バーストをプリパルスしてデバイスを定常状態にし、測定に先立って待ち時間、キュービット固有のキャリブレーションなど、小さなデバイスに対する回避策は、デバイススケーラビリティに悪影響を及ぼす。ここでは、この効果を理解し、克服する上で大きな進歩を遂げます。マイクロ波とベースバンドの制御信号による観測周波数シフトと一致した混合室温度とスピンラーモア周波数の驚くべき非単調関係について報告する。この装置を200mKで故意に動作させることは、キュービットコヒーレンスや単一キュービット忠実度ベンチマークを損なうことなく、有害な加熱効果を著しく抑制することを発見した。さらに、系統的非マルコフクロストークは大幅に削減される。本結果は,将来のスピンベース量子プロセッサのキャリブレーション手順を簡素化しつつ,マルチスピン制御の品質を向上させるための簡単な手段を提供する。

As spin-based quantum processors grow in size and complexity, maintaining high fidelities and minimizing crosstalk will be essential for the successful implementation of quantum algorithms and error-correction protocols. In particular, recent experiments have highlighted pernicious transient qubit frequency shifts associated with microwave qubit driving. Workarounds for small devices, including prepulsing with an off-resonant microwave burst to bring a device to a steady-state, wait times prior to measurement, and qubit-specific calibrations all bode ill for device scalability. Here, we make substantial progress in understanding and overcoming this effect. We report a surprising non-monotonic relation between mixing chamber temperature and spin Larmor frequency which is consistent with observed frequency shifts induced by microwave and baseband control signals. We find that purposefully operating the device at 200 mK greatly suppresses the adverse heating effect while not compromising qubit coherence or single-qubit fidelity benchmarks. Furthermore, systematic non-Markovian crosstalk is greatly reduced. Our results provide a straightforward means of improving the quality of multi-spin control while simplifying calibration procedures for future spin-based quantum processors.

翻訳日:2023-04-26 19:36:48 公開日:2023-04-25

# DSTC11におけるタスク指向対話トラックの会話からのインテント誘導

Intent Induction from Conversations for Task-Oriented Dialogue Track at DSTC 11 ( http://arxiv.org/abs/2304.12982v1 )

ライセンス: Link先を確認

James Gung, Raphael Shu, Emily Moeng, Wesley Rose, Salvatore Romeo, Yassine Benajiba, Arshit Gupta, Saab Mansour and Yi Zhang

(参考訳) 近年,仮想アシスタントの需要増加と採用に伴い,インテントの自動誘導やスロットや対話状態の誘導を通じて,ボットスキーマ設計を高速化する方法が研究されている。しかしながら、専用のベンチマークと標準化された評価の欠如により、システム間の進捗の追跡と比較が困難になっている。このチャレンジトラックは、第11回ダイアログシステム技術チャレンジの一環として開催され、人間エージェントと顧客間のカスタマーサービスインタラクションの現実的な設定において、顧客意図の自動誘導方法を評価するためのベンチマークを導入している。本稿では,インテントの自動誘導とそれに対応する評価手法に取り組むための2つのサブタスクを提案する。次に,タスク評価に適したデータセットを3つ提示し,簡単なベースラインを提案する。最後に、課題トラックの提出内容と結果を要約し、34チームから応募を受け取りました。

With increasing demand for and adoption of virtual assistants, recent work has investigated ways to accelerate bot schema design through the automatic induction of intents or the induction of slots and dialogue states. However, a lack of dedicated benchmarks and standardized evaluation has made progress difficult to track and comparisons between systems difficult to make. This challenge track, held as part of the Eleventh Dialog Systems Technology Challenge, introduces a benchmark that aims to evaluate methods for the automatic induction of customer intents in a realistic setting of customer service interactions between human agents and customers. We propose two subtasks for progressively tackling the automatic induction of intents and corresponding evaluation methodologies. We then present three datasets suitable for evaluating the tasks and propose simple baselines. Finally, we summarize the submissions and results of the challenge track, for which we received submissions from 34 teams.

翻訳日:2023-04-26 19:36:27 公開日:2023-04-25

# flickr-pad:新しい顔高分解能プレゼンテーションアタック検出データベース

Flickr-PAD: New Face High-Resolution Presentation Attack Detection Database ( http://arxiv.org/abs/2304.13015v1 )

ライセンス: Link先を確認

Diego Pasmino, Carlos Aravena, Juan Tapia and Christoph Busch

(参考訳) 現在,プレゼンテーションアタック検出は非常に活発な研究分野である。映像から抽出した画像を用いて,最先端のデータベースを構成する。主な問題は、多くのデータベースが低品質で小さな画像サイズであり、実際の遠隔バイオメトリックシステムでは運用シナリオを表現していないことである。現在、これらの画像は高品質で解像度の高いスマートフォンから撮影されている。画像品質の多様性を高めるため、この研究は「Flickr-PAD」と呼ばれるオープンアクセスFlickrイメージに基づいた新しいPADデータベースを提供する。新しい手作りデータベースは、高品質な印刷と画面のシナリオを示しています。これにより、研究者はより広いデータベース上の既存のアルゴリズムと新しいアプローチを比較することができる。このデータベースは他の研究者も利用できる。 MobileNet-V3(smallとlarge)とEfficientNet-B0に基づく3つのPADモデルのトレーニングと評価に、leave-one-outプロトコルが使用された。最良の結果はMobileNet-V3 largeで得られ、BPCER10は7.08%、BPCER20は11.15%だった。

Nowadays, Presentation Attack Detection is a very active research area. Several databases are constituted in the state-of-the-art using images extracted from videos. One of the main problems identified is that many databases present a low-quality, small image size and do not represent an operational scenario in a real remote biometric system. Currently, these images are captured from smartphones with high-quality and bigger resolutions. In order to increase the diversity of image quality, this work presents a new PAD database based on open-access Flickr images called: "Flickr-PAD". Our new hand-made database shows high-quality printed and screen scenarios. This will help researchers to compare new approaches to existing algorithms on a wider database. This database will be available for other researchers. A leave-one-out protocol was used to train and evaluate three PAD models based on MobileNet-V3 (small and large) and EfficientNet-B0. The best result was reached with MobileNet-V3 large with BPCER10 of 7.08% and BPCER20 of 11.15%.
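The BPCER10 and BPCER20 figures quoted above are usually read as the bona fide presentation classification error rate (BPCER) at the operating points where the attack presentation classification error rate (APCER) is fixed at 10% (1 in 10) and 5% (1 in 20). The following is a minimal sketch of that computation, not the authors' evaluation code. It assumes that a higher score means "more likely an attack"; the function name and the toy scores are hypothetical.

```python
import numpy as np

def bpcer_at_apcer(bona_fide_scores, attack_scores, target_apcer):
    """Lowest BPCER over all thresholds whose APCER stays <= target_apcer.

    Assumed convention: a sample is classified as an attack when
    score >= threshold (higher score = more attack-like).
    """
    bona_fide = np.asarray(bona_fide_scores, dtype=float)
    attacks = np.asarray(attack_scores, dtype=float)
    best = 1.0  # rejecting every bona fide sample is always feasible
    # Every observed score is a candidate threshold.
    for t in np.unique(np.concatenate([bona_fide, attacks])):
        apcer = np.mean(attacks < t)      # attacks accepted as bona fide
        bpcer = np.mean(bona_fide >= t)   # bona fide rejected as attacks
        if apcer <= target_apcer:
            best = min(best, bpcer)
    return float(best)

# Toy scores, purely illustrative.
bona_fide = [0.1, 0.2, 0.3, 0.4, 0.6]
attacks = [0.5, 0.7, 0.8, 0.9, 0.95, 0.96, 0.97, 0.98, 0.99, 0.995]
bpcer10 = bpcer_at_apcer(bona_fide, attacks, 0.10)  # APCER fixed at 10%
bpcer20 = bpcer_at_apcer(bona_fide, attacks, 0.05)  # APCER fixed at 5%
```

Because the APCER constraint is stricter for BPCER20 than for BPCER10, BPCER20 can never be lower than BPCER10 on the same scores, which matches the ordering of the two figures in the abstract.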

翻訳日:2023-04-26 19:29:06 公開日:2023-04-25

# 内視鏡画像とビデオにおける低侵襲手術器具のセグメンテーションのための手法とデータセット:最新技術のレビュー

Methods and datasets for segmentation of minimally invasive surgical instruments in endoscopic images and videos: A review of the state of the art ( http://arxiv.org/abs/2304.13014v1 )

ライセンス: Link先を確認

Tobias Rueckert (1), Daniel Rueckert (2 and 3), Christoph Palm (1 and 4) ((1) Regensburg Medical Image Computing (ReMIC), Ostbayerische Technische Hochschule Regensburg (OTH Regensburg), Germany, (2) Artificial Intelligence in Healthcare and Medicine, Klinikum rechts der Isar, Technical University of Munich, Germany, (3) Department of Computing, Imperial College London, UK, (4) Regensburg Center of Health Sciences and Technology (RCHST), OTH Regensburg, Germany)

(参考訳) コンピュータ・ロボット支援の低侵襲手術の分野では,内視鏡画像における手術器具の認識により,近年,大きな進歩を遂げている。特に器具の位置や種類の決定は、非常に興味深い。現在の研究は、空間的情報と時間的情報の両方が関係しており、外科的ツールの時間的移動の予測が最終セグメンテーションの質を向上させる可能性がある。公開データセットの提供は、最近、主にディープラーニングに基づく新しい手法の開発を奨励している。本稿では,手法の開発と評価に使用されるデータセットを特定し,文献上での使用頻度を定量化する。内視鏡画像における低侵襲手術器具のセグメンテーションと追跡に関する研究の現状について概説する。本論文は,1フレーム分割手法と時間情報を含む手法を考慮し,器具にいかなる種類のマーカーも付加せずに純粋に視覚的に機能する手法に焦点を当てた。レビューされた文献に関する議論が提供され、既存の欠点と将来の発展の可能性を強調している。検討された出版物は、Google Scholar、Web of Science、PubMedを通じて特定された。検索用語は「instrument segmentation」、「instrument tracking」、「surgical tool segmentation」、「surgical tool tracking」であり、2015年から2022年にかけて発行された408の論文が該当し、そのうち109が体系的選択基準により採用された。

In the field of computer- and robot-assisted minimally invasive surgery, enormous progress has been made in recent years based on the recognition of surgical instruments in endoscopic images. Especially the determination of the position and type of the instruments is of great interest here. Current work involves both spatial and temporal information with the idea, that the prediction of movement of surgical tools over time may improve the quality of final segmentations. The provision of publicly available datasets has recently encouraged the development of new methods, mainly based on deep learning. In this review, we identify datasets used for method development and evaluation, as well as quantify their frequency of use in the literature. We further present an overview of the current state of research regarding the segmentation and tracking of minimally invasive surgical instruments in endoscopic images. The paper focuses on methods that work purely visually without attached markers of any kind on the instruments, taking into account both single-frame segmentation approaches as well as those involving temporal information. A discussion of the reviewed literature is provided, highlighting existing shortcomings and emphasizing available potential for future developments. The publications considered were identified through the platforms Google Scholar, Web of Science, and PubMed. The search terms used were "instrument segmentation", "instrument tracking", "surgical tool segmentation", and "surgical tool tracking" and result in 408 articles published between 2015 and 2022 from which 109 were included using systematic selection criteria.

翻訳日:2023-04-26 19:28:50 公開日:2023-04-25

# 大規模視覚言語モデルのための安定・低精度トレーニング

Stable and low-precision training for large-scale vision-language models ( http://arxiv.org/abs/2304.13013v1 )

ライセンス: Link先を確認

Mitchell Wortsman, Tim Dettmers, Luke Zettlemoyer, Ari Morcos, Ali Farhadi, Ludwig Schmidt

(参考訳) 大規模言語視覚モデルのトレーニングを 1)高速化し, 2)安定化する新しい手法を紹介する。 1) 高速化に向けて,int8量子化トレーニングのための線形層であるSwitchBackを導入し,1BパラメータのCLIP ViT-Hugeにおいてbfloat16トレーニングの性能と0.1パーセントポイント以内で一致させながら,13～25%の高速化を実現した。これは現時点で最大規模のint8トレーニングである。 float8のGPUサポートは稀であるため主にint8に焦点を当てるが,シミュレーションを通じてfloat8トレーニングも分析する。 SwitchBackはfloat8に有効であることが示されるが,大きな特徴量の大きさが抑制されるようにネットワークをトレーニング・初期化すれば標準的な手法も成功することを示す。これはゼロで初期化したlayer-scaleによって実現する。 2) 安定したトレーニングに向けて損失スパイクを分析し,AdamWの第2モーメント推定器が2乗勾配を過小評価してから1～8イテレーション後に一貫して発生することを示した。その結果として,AdamW-Adafactorハイブリッドを推奨する。これはCLIP ViT-Hugeモデルのトレーニング時に損失スパイクを回避し,勾配クリッピングよりも優れているため,StableAdamWと呼ぶ。

We introduce new methods for 1) accelerating and 2) stabilizing training for large language-vision models. 1) Towards accelerating training, we introduce SwitchBack, a linear layer for int8 quantized training which provides a speed-up of 13-25% while matching the performance of bfloat16 training within 0.1 percentage points for the 1B parameter CLIP ViT-Huge -- the largest int8 training to date. Our main focus is int8 as GPU support for float8 is rare, though we also analyze float8 training through simulation. While SwitchBack proves effective for float8, we show that standard techniques are also successful if the network is trained and initialized so that large feature magnitudes are discouraged, which we accomplish via layer-scale initialized with zeros. 2) Towards stable training, we analyze loss spikes and find they consistently occur 1-8 iterations after the squared gradients become under-estimated by their AdamW second moment estimator. As a result, we recommend an AdamW-Adafactor hybrid, which we refer to as StableAdamW because it avoids loss spikes when training a CLIP ViT-Huge model and outperforms gradient clipping.
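The loss-spike diagnosis above (squared gradients under-estimated by the AdamW second-moment estimator) can be illustrated with a small NumPy sketch. It is one plausible reading of an AdamW-Adafactor hybrid, namely AdamW plus Adafactor-style update clipping, and is not taken from the authors' code. The function name, the default hyperparameters and the exact form of the ratio are assumptions made for illustration.

```python
import numpy as np

def adamw_step_with_update_clipping(param, grad, m, v, step, lr=1e-3,
                                    beta1=0.9, beta2=0.99, eps=1e-8,
                                    weight_decay=0.0):
    """One AdamW step whose learning rate shrinks when the squared gradient
    exceeds the second-moment estimate (assumed Adafactor-style clipping)."""
    m = beta1 * m + (1.0 - beta1) * grad
    v = beta2 * v + (1.0 - beta2) * grad ** 2
    m_hat = m / (1.0 - beta1 ** step)   # bias-corrected first moment
    v_hat = v / (1.0 - beta2 ** step)   # bias-corrected second moment
    # RMS of g^2 / v_hat: values well above 1 mean the estimator is
    # currently under-estimating the squared gradient.
    rms = float(np.sqrt(np.mean(grad ** 2 / np.maximum(v_hat, eps ** 2))))
    lr_eff = lr / max(1.0, rms)         # slow down only when rms > 1
    update = m_hat / (np.sqrt(v_hat) + eps) + weight_decay * param
    return param - lr_eff * update, m, v, rms

# Toy run: 50 small constant gradients, then a sudden large one.
param = np.zeros(4)
m = np.zeros(4)
v = np.zeros(4)
for step in range(1, 51):
    param, m, v, rms_steady = adamw_step_with_update_clipping(
        param, np.full(4, 0.01), m, v, step)
param, m, v, rms_spike = adamw_step_with_update_clipping(
    param, np.full(4, 1.0), m, v, 51)
```

In the toy run the ratio stays near 1 while gradients are stationary and jumps well above 1 at the sudden large gradient, which is the situation in which the effective learning rate is reduced.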

翻訳日:2023-04-26 19:28:25 公開日:2023-04-25

# 非構造化データと構造化データ: 大きな言語モデルを持つ両方の世界のベストを得られるか?

Unstructured and structured data: Can we have the best of both worlds with large language models? ( http://arxiv.org/abs/2304.13010v1 )

ライセンス: Link先を確認

Wang-Chiew Tan

(参考訳) 本稿では,大規模言語モデルを用いて非構造化データと構造化データの両方を問合せする可能性について考察する。また,両タイプのデータを対象とした質問応答システムの構築に関する研究課題についても概説する。

This paper presents an opinion on the potential of using large language models to query on both unstructured and structured data. It also outlines some research challenges related to the topic of building question-answering systems for both types of data.

翻訳日:2023-04-26 19:28:07 公開日:2023-04-25

# リモートセンシングにおけるビジュアルチャットGPTの可能性

The Potential of Visual ChatGPT For Remote Sensing ( http://arxiv.org/abs/2304.13009v1 )

ライセンス: Link先を確認

Lucas Prado Osco, Eduardo Lopes de Lemos, Wesley Nunes Gonçalves, Ana Paula Marques Ramos and José Marcato Junior

(参考訳) 自然言語処理(NLP)の最近の進歩、特にディープラーニングベースのコンピュータビジョン技術に関連するLarge Language Models(LLMs)は、様々なタスクを自動化する可能性を示している。 1つの注目すべきモデルはVisual ChatGPTであり、これはChatGPTのLLM機能とビジュアル計算を組み合わせて、効果的な画像解析を可能にする。テキスト入力に基づく画像の処理能力は、様々な分野に革命をもたらす可能性がある。しかし、リモートセンシング領域での応用は未検討のままである。 GPTアーキテクチャ上に構築された最先端のLCMである Visual ChatGPT は,リモートセンシング領域に関連する画像処理の課題に対処するための最初の論文である。現在の機能の中で、Visual ChatGPTは画像のテキスト記述を生成し、キャニーエッジと直線検出を実行し、画像セグメンテーションを実行することができる。これらは画像コンテンツに関する貴重な洞察を与え、情報の解釈と抽出を容易にする。衛星画像の公開データセットにおけるこれらの技術の適用性を探ることで、リモートセンシング画像を扱う際の現在のモデルの限界を実証し、その課題と今後の展望を明らかにする。 LLMとビジュアルモデルの組み合わせは、まだ開発の初期段階であるが、リモートセンシング画像処理を変換し、現場でアクセスしやすく実用的な応用機会を生み出す大きな可能性を秘めている。

Recent advancements in Natural Language Processing (NLP), particularly in Large Language Models (LLMs), associated with deep learning-based computer vision techniques, have shown substantial potential for automating a variety of tasks. One notable model is Visual ChatGPT, which combines ChatGPT's LLM capabilities with visual computation to enable effective image analysis. The model's ability to process images based on textual inputs can revolutionize diverse fields. However, its application in the remote sensing domain remains unexplored. This is the first paper to examine the potential of Visual ChatGPT, a cutting-edge LLM founded on the GPT architecture, to tackle the aspects of image processing related to the remote sensing domain. Among its current capabilities, Visual ChatGPT can generate textual descriptions of images, perform canny edge and straight line detection, and conduct image segmentation. These offer valuable insights into image content and facilitate the interpretation and extraction of information. By exploring the applicability of these techniques within publicly available datasets of satellite images, we demonstrate the current model's limitations in dealing with remote sensing images, highlighting its challenges and future prospects. Although still in early development, we believe that the combination of LLMs and visual models holds a significant potential to transform remote sensing image processing, creating accessible and practical application opportunities in the field.

翻訳日:2023-04-26 19:28:01 公開日:2023-04-25

# 複数の思考連鎖に対するメタ推論による質問応答

Answering Questions by Meta-Reasoning over Multiple Chains of Thought ( http://arxiv.org/abs/2304.13007v1 )

ライセンス: Link先を確認

Ori Yoran, Tomer Wolfson, Ben Bogin, Uri Katz, Daniel Deutch, Jonathan Berant

(参考訳) マルチホップ質問応答(QA)のための現代のシステムは、最終回答に到達する前に、質問を一連の推論ステップ、すなわちチェーン・オブ・シント(CoT)に分割する。多くの場合、複数の連鎖が最終回答の投票機構を通じてサンプリングされ集約されるが、中間ステップ自体は破棄される。このようなアプローチはパフォーマンスを向上させるが、チェーン間の中間ステップ間の関係を考慮せず、予測された回答の統一的な説明を提供しない。 MCR(Multi-Chain Reasoning)は,大規模言語モデルに対して,回答を集約するのではなく,複数の思考チェーン上でメタ推論を行うアプローチである。 MCRは、異なる推論連鎖を調べ、それらを混合し、説明を生成し、答えを予測する際に最も関係のある事実を選択する。 MCRは7つのマルチホップQAデータセットで強いベースラインを上回ります。さらに,本分析の結果から,MCRの説明は高品質であり,人間が回答を検証できることがわかった。

Modern systems for multi-hop question answering (QA) typically break questions into a sequence of reasoning steps, termed chain-of-thought (CoT), before arriving at a final answer. Often, multiple chains are sampled and aggregated through a voting mechanism over the final answers, but the intermediate steps themselves are discarded. While such approaches improve performance, they do not consider the relations between intermediate steps across chains and do not provide a unified explanation for the predicted answer. We introduce Multi-Chain Reasoning (MCR), an approach which prompts large language models to meta-reason over multiple chains of thought, rather than aggregating their answers. MCR examines different reasoning chains, mixes information between them and selects the most relevant facts in generating an explanation and predicting the answer. MCR outperforms strong baselines on 7 multi-hop QA datasets. Moreover, our analysis reveals that MCR explanations exhibit high quality, enabling humans to verify its answers.
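The voting baseline that the abstract contrasts with MCR can be sketched as follows. This is a hedged illustration: `majority_vote` and `gather_steps` are hypothetical helper names, the chains are toy data, and no language-model call is shown. The second helper only indicates what a meta-reasoning approach would additionally keep, namely the intermediate steps.

```python
from collections import Counter

def majority_vote(chains):
    """Voting baseline: keep only the final answers, discard the steps.

    chains: list of (reasoning_steps, final_answer) tuples.
    Returns the most frequent answer and its vote share.
    """
    answers = [answer for _steps, answer in chains]
    winner, count = Counter(answers).most_common(1)[0]
    return winner, count / len(answers)

def gather_steps(chains):
    """What voting throws away: the intermediate steps of every chain,
    flattened into one context that a meta-reasoner could read."""
    return [step for steps, _answer in chains for step in steps]

chains = [
    (["The Eiffel Tower is in Paris.", "Paris is in France."], "France"),
    (["The Eiffel Tower is in Paris.", "Paris is the capital of France."], "France"),
    (["The Eiffel Tower is in Berlin."], "Germany"),
]
answer, share = majority_vote(chains)
context = gather_steps(chains)
```

Voting returns only `answer`; the facts in `context` are never compared across chains, which is the gap the abstract describes.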

翻訳日:2023-04-26 19:27:39 公開日:2023-04-25

# PoseVocab:人間のアバターモデリングのための共同構造ポス埋め込み学習

PoseVocab: Learning Joint-structured Pose Embeddings for Human Avatar Modeling ( http://arxiv.org/abs/2304.13006v1 )

ライセンス: Link先を確認

Zhe Li, Zerong Zheng, Yuxiao Liu, Boyao Zhou, Yebin Liu

(参考訳) ポーズ駆動型人間のアバターを作成することは、低周波駆動のポーズから高周波のダイナミックな人間の外観へのマッピングをモデル化することであり、人間のアバターモデリングにおいて、高忠実な人間の詳細をエンコードできる効果的なポーズ符号化法が不可欠である。この目的のために,動的な人間の外観を学習するための最適なポーズ埋め込みをネットワークが発見するよう促す,新しいポーズ符号化法PoseVocabを提案する。キャラクターのマルチビューRGBビデオが与えられた後、PoseVocabはトレーニングポーズに基づいてキーポーズと潜在埋め込みを構築する。ポーズ一般化と時間的一貫性を達成するために,大域的なポーズベクトルではなく,各ジョイントの$so(3)$でキー回転をサンプリングし,各サンプルされたキー回転に対してポーズ埋め込みを割り当てる。これらのジョイント構造のポーズ埋め込みは、異なるキーポーズの下でのダイナミックな外観をエンコードするだけでなく、グローバルなポーズ埋め込みをジョイント構造のものに分解し、各ジョイントの動きに関連する外観の変動をよりよく学習する。メモリ効率を保ちながらポーズ埋め込みの表現能力を向上するために,よりきめ細かな人間の外観をモデル化するために,コンパクトで効果的な3D表現である特徴線を導入する。さらに、クエリポーズと空間的位置が与えられた場合、ポーズ埋め込みを補間し、動的ヒト合成のための条件付きポーズ特徴を取得する階層的なクエリ戦略を導入する。全体的に、PoseVocabは人間の外観の動的な詳細を効果的にエンコードし、新しいポーズの下でリアルで一般化されたアニメーションを可能にする。実験により,本手法は質的および定量的に合成品質の点で,他の最先端ベースラインよりも優れていることが示された。コードはhttps://github.com/lizhe00/PoseVocab で入手できる。

Creating pose-driven human avatars is about modeling the mapping from the low-frequency driving pose to high-frequency dynamic human appearances, so an effective pose encoding method that can encode high-fidelity human details is essential to human avatar modeling. To this end, we present PoseVocab, a novel pose encoding method that encourages the network to discover the optimal pose embeddings for learning the dynamic human appearance. Given multi-view RGB videos of a character, PoseVocab constructs key poses and latent embeddings based on the training poses. To achieve pose generalization and temporal consistency, we sample key rotations in $so(3)$ of each joint rather than the global pose vectors, and assign a pose embedding to each sampled key rotation. These joint-structured pose embeddings not only encode the dynamic appearances under different key poses, but also factorize the global pose embedding into joint-structured ones to better learn the appearance variation related to the motion of each joint. To improve the representation ability of the pose embedding while maintaining memory efficiency, we introduce feature lines, a compact yet effective 3D representation, to model more fine-grained details of human appearances. Furthermore, given a query pose and a spatial position, a hierarchical query strategy is introduced to interpolate pose embeddings and acquire the conditional pose feature for dynamic human synthesis. Overall, PoseVocab effectively encodes the dynamic details of human appearance and enables realistic and generalized animation under novel poses. Experiments show that our method outperforms other state-of-the-art baselines both qualitatively and quantitatively in terms of synthesis quality. Code is available at https://github.com/lizhe00/PoseVocab.

翻訳日:2023-04-26 19:27:11 公開日:2023-04-25

# インド言語におけるバイリンガル・セマンティック・パーシングの評価

Evaluating Inter-Bilingual Semantic Parsing for Indian Languages ( http://arxiv.org/abs/2304.13005v1 )

ライセンス: Link先を確認

Divyanshu Aggarwal, Vivek Gupta, Anoop Kunchukuttan

(参考訳) インド語の自然言語生成(IndicNLP)の進歩にもかかわらず、意味解析のような複雑な構造化タスクに関するデータセットが不足している。このギャップの理由の1つは論理形式の複雑さであり、英語から多言語への翻訳が難しい。このプロセスでは、論理形式、意図、スロットを翻訳された非構造的発話とアライメントする。そこで本研究では,11の異なるインド言語を対象としたセマンティック解析データセットIE-SEMPARSEを提案する。本稿では,提案課題の実用性を強調し,既存の多言語Seq2seqモデルを複数の列車試験戦略で評価する。実験の結果,元の多言語意味解析データセット(mTOP, multilingual TOP, multiATIS++ など)と提案した IE-SEMPARSE スイートの性能に高い相関関係が認められた。

Despite significant progress in Natural Language Generation for Indian languages (IndicNLP), there is a lack of datasets around complex structured tasks such as semantic parsing. One reason for this imminent gap is the complexity of the logical form, which makes English to multilingual translation difficult. The process involves alignment of logical forms, intents and slots with translated unstructured utterance. To address this, we propose an Inter-bilingual Seq2seq Semantic parsing dataset IE-SEMPARSE for 11 distinct Indian languages. We highlight the proposed task's practicality, and evaluate existing multilingual seq2seq models across several train-test strategies. Our experiment reveals a high correlation across performance of original multilingual semantic parsing datasets (such as mTOP, multilingual TOP and multiATIS++) and our proposed IE-SEMPARSE suite.

翻訳日:2023-04-26 19:26:25 公開日:2023-04-25

# 複雑なリアルタイム戦略ゲームにおけるマルチエージェントRLの集中制御

Centralized control for multi-agent RL in a complex Real-Time-Strategy game ( http://arxiv.org/abs/2304.13004v1 )

ライセンス: Link先を確認

Roger Creus Castanyer

(参考訳) マルチエージェント強化学習(MARL)は、共有環境で共存する複数の学習エージェントの行動を研究する。 MARLは、より複雑な学習ダイナミクスを含むため、シングルエージェントRLよりも難しい: 各エージェントの観察と報酬は、他のすべてのエージェントの関数である。 MARLの文脈では、リアルタイム戦略(RTS)ゲームは複数のプレイヤーが同時に相互作用し、異なる性質の多くのユニットを同時に制御する非常に困難な環境を表す。実際、RTSゲームは現在のRLメソッドでは難しいので、RLでそれらに取り組むことは興味深い。このプロジェクトは、Lux AI v2 KaggleコンペティションにおいてRLを適用するエンドツーエンドのエクスペリエンスを提供する。このコンペティションでは、コンペティターは、可変サイズのユニット群を制御し、マルチ変数の最適化、リソースの収集、アロケーション問題に取り組むために、他のコンペティタに対して1v1シナリオでエージェントを設計する。 RLエージェントのトレーニングには集中型アプローチを使用し、そのプロセスに沿って複数の設計判断を報告します。プロジェクトのソースコードは https://github.com/roger-creus/centralized-control-lux で公開している。

Multi-agent Reinforcement learning (MARL) studies the behaviour of multiple learning agents that coexist in a shared environment. MARL is more challenging than single-agent RL because it involves more complex learning dynamics: the observations and rewards of each agent are functions of all other agents. In the context of MARL, Real-Time Strategy (RTS) games represent very challenging environments where multiple players interact simultaneously and control many units of different natures all at once. In fact, RTS games are so challenging for the current RL methods, that just being able to tackle them with RL is interesting. This project provides the end-to-end experience of applying RL in the Lux AI v2 Kaggle competition, where competitors design agents to control variable-sized fleets of units and tackle a multi-variable optimization, resource gathering, and allocation problem in a 1v1 scenario against other competitors. We use a centralized approach for training the RL agents, and report multiple design decisions along the process. We provide the source code of the project: https://github.com/roger-creus/centralized-control-lux.

翻訳日:2023-04-26 19:26:10 公開日:2023-04-25

# 学習された構造化表現の一般化について

On the Generalization of Learned Structured Representations ( http://arxiv.org/abs/2304.13001v1 )

ライセンス: Link先を確認

Andrea Dittadi

(参考訳) 過去10年間に大きく進歩したにもかかわらず、ディープラーニングの手法は一般に人間レベルの体系的な一般化に欠ける。データの基盤となる構造を明示的に捉えることで、コネクショニストシステムがより予測可能で体系的な方法で一般化できることが主張されている。実際、人間の証拠は、記号のような構成的実体で世界を解釈することは知的行動や高レベルの推論に不可欠であることを示唆している。ディープラーニングシステムのもうひとつの一般的な制限は、大量のトレーニングデータを必要とすることだ。表現学習では、任意の下流タスクを効率的に学習するのに有用な汎用データ表現を学習するために、大きなデータセットが利用される。この論文は構造化表現学習に関するものである。我々は,その隠れた構造を捉えた非構造化データの表現をほとんど,あるいは全く監視せずに学習する手法について検討する。論文の第1部では,データの変動の説明的要因を異にする表現に注目した。分散表現学習を新しいロボットデータセットにスケールアップし、下流ロボットタスクにおける分布外一般化のための事前学習表現の役割を体系的に大規模に研究する。この論文の第2部はオブジェクト中心の表現に焦点を当てており、視覚シーンのオブジェクトのようなシンボルのようなエンティティの観点で入力の構成構造を捉えている。オブジェクト中心学習法は、無構造入力から有意義な実体を形成することを学習し、コネクショニスト基板上でシンボリック情報処理を可能にする。本研究では,複数の共通データセット上のメソッドの選択を訓練し,下流タスクの有用性と分布から一般化する能力について検討する。

Despite tremendous progress over the past decade, deep learning methods generally fall short of human-level systematic generalization. It has been argued that explicitly capturing the underlying structure of data should allow connectionist systems to generalize in a more predictable and systematic manner. Indeed, evidence in humans suggests that interpreting the world in terms of symbol-like compositional entities may be crucial for intelligent behavior and high-level reasoning. Another common limitation of deep learning systems is that they require large amounts of training data, which can be expensive to obtain. In representation learning, large datasets are leveraged to learn generic data representations that may be useful for efficient learning of arbitrary downstream tasks. This thesis is about structured representation learning. We study methods that learn, with little or no supervision, representations of unstructured data that capture its hidden structure. In the first part of the thesis, we focus on representations that disentangle the explanatory factors of variation of the data. We scale up disentangled representation learning to a novel robotic dataset, and perform a systematic large-scale study on the role of pretrained representations for out-of-distribution generalization in downstream robotic tasks. The second part of this thesis focuses on object-centric representations, which capture the compositional structure of the input in terms of symbol-like entities, such as objects in visual scenes. Object-centric learning methods learn to form meaningful entities from unstructured input, enabling symbolic information processing on a connectionist substrate. In this study, we train a selection of methods on several common datasets, and investigate their usefulness for downstream tasks and their ability to generalize out of distribution.

翻訳日:2023-04-26 19:25:50 公開日:2023-04-25

# DQS3D: 厳密に整合した量子化対応半教師付き3次元検出

DQS3D: Densely-matched Quantization-aware Semi-supervised 3D Detection ( http://arxiv.org/abs/2304.13031v1 )

ライセンス: Link先を確認

Huan-ang Gao, Beiwen Tian, Pengfei Li, Hao Zhao, Guyue Zhou

(参考訳) 本研究では, 乱雑な3次元室内シーンに要する高いアノテーションコストを考慮すると非常に重要である, 半教師付き3次元物体検出の問題について検討する。自己教示(self-teaching)の堅牢で原則的な枠組みは,近年,半教師付き学習に顕著な進歩をもたらしている。このパラダイムは画像レベルやピクセルレベルの予測には自然であるが、提案マッチングの問題により検出問題への適応は難しい。従来の手法は2段階のパイプラインに基づいており、第1段階で生成されヒューリスティックに選択された提案をマッチングするため、空間的に疎い訓練信号をもたらす。対照的に,一段階的に動作し,空間的に密集したトレーニング信号を可能にする,最初の半教師付き3次元検出アルゴリズムを提案する。この新設計の根本的な問題は、点対ボクセルの離散化に起因する量子化誤差であり、これは必然的に、ボクセル領域における2つの変換されたビュー間の不一致を引き起こす。この目的のために、我々はこのミスアライメントをオンザフライで補うクローズドフォームルールを導出し実装する。 ScanNet mAP@0.5 を 20% のアノテーションで 35.2% から 48.5% まで推し進めるなど、我々の結果は重要である。コードとデータは公開される予定だ。

In this paper, we study the problem of semi-supervised 3D object detection, which is of great importance considering the high annotation cost for cluttered 3D indoor scenes. We resort to the robust and principled framework of self-teaching, which has triggered notable progress for semi-supervised learning recently. While this paradigm is natural for image-level or pixel-level prediction, adapting it to the detection problem is challenged by the issue of proposal matching. Prior methods are based upon two-stage pipelines, matching heuristically selected proposals generated in the first stage and resulting in spatially sparse training signals. In contrast, we propose the first semi-supervised 3D detection algorithm that works in the single-stage manner and allows spatially dense training signals. A fundamental issue of this new design is the quantization error caused by point-to-voxel discretization, which inevitably leads to misalignment between two transformed views in the voxel domain. To this end, we derive and implement closed-form rules that compensate this misalignment on-the-fly. Our results are significant, e.g., promoting ScanNet mAP@0.5 from 35.2% to 48.5% using 20% annotation. Codes and data will be publicly available.

翻訳日:2023-04-26 19:20:05 公開日:2023-04-25

# completionformer:畳み込みと視覚変換による奥行き補完

CompletionFormer: Depth Completion with Convolutions and Vision Transformers ( http://arxiv.org/abs/2304.13030v1 )

ライセンス: Link先を確認

Zhang Youmin, Guo Xianda, Poggi Matteo, Zhu Zheng, Huang Guan, Mattoccia Stefano

(参考訳) スパース深度と対応するRGB画像が与えられた場合、深度補完は画像全体を通してスパース計測を空間的に伝播させ、密な深度予測を得ることを目的としている。深層学習に基づく深度補完手法の進歩にもかかわらず、畳み込み層やグラフモデルの局所性により、ネットワークが画素間の長距離関係をモデル化することは困難である。最近の完全トランスフォーマーベースのアーキテクチャは、グローバルな受容野により有望な結果を報告しているが、局所的な特徴の詳細が劣化するため、十分に開発されているCNNモデルとの性能と効率の差は依然として残っている。本稿では、ピラミッド構造における深度補完モデルを構築するための基本単位として、畳み込み注意層と視覚変換器を1ブロックに深く結合したJCAT(Joint Convolutional Attention and Transformer Block)を提案する。このハイブリッドアーキテクチャは、自然に畳み込みの局所接続と1つのモデルにおけるトランスフォーマーのグローバルコンテキストの両方にメリットがある。その結果,CompletionFormerは屋外のKITTI Depth Completionベンチマークと屋内のNYUv2データセットで最先端のCNNベース手法を上回り,純粋なトランスフォーマー手法に比べて高い効率(約1/3のFLOPs)を達成した。コードは https://github.com/youmi-zym/CompletionFormer で入手できる。

Given sparse depths and the corresponding RGB images, depth completion aims at spatially propagating the sparse measurements throughout the whole image to get a dense depth prediction. Despite the tremendous progress of deep-learning-based depth completion methods, the locality of the convolutional layer or graph model makes it hard for the network to model the long-range relationship between pixels. While recent fully Transformer-based architecture has reported encouraging results with the global receptive field, the performance and efficiency gaps to the well-developed CNN models still exist because of its deteriorative local feature details. This paper proposes a Joint Convolutional Attention and Transformer block (JCAT), which deeply couples the convolutional attention layer and Vision Transformer into one block, as the basic unit to construct our depth completion model in a pyramidal structure. This hybrid architecture naturally benefits both the local connectivity of convolutions and the global context of the Transformer in one single model. As a result, our CompletionFormer outperforms state-of-the-art CNNs-based methods on the outdoor KITTI Depth Completion benchmark and indoor NYUv2 dataset, achieving significantly higher efficiency (nearly 1/3 FLOPs) compared to pure Transformer-based methods. Code is available at https://github.com/youmi-zym/CompletionFormer.

翻訳日:2023-04-26 19:19:42 公開日:2023-04-25

# Bake off redux:最近の時系列分類アルゴリズムのレビューと実験的評価

Bake off redux: a review and experimental evaluation of recent time series classification algorithms ( http://arxiv.org/abs/2304.13029v1 )

ライセンス: Link先を確認

Matthew Middlehurst, Patrick Schäfer and Anthony Bagnall

(参考訳) 2017年、カリフォルニア大学リバーサイド校(UCR)のアーカイブから得られた85のデータセットに対して、18の時系列分類(TSC)アルゴリズムを比較した。この研究は一般に「ベイクオフ」と呼ばれ、9つのアルゴリズムのみが使用されていた動的時間ウォーピング(DTW)と回転フォレストベンチマークよりもはるかに優れた性能を示した。研究は、時系列データから抽出した特徴の種類によって各アルゴリズムを分類し、5つの主要なアルゴリズムタイプを分類した。このアルゴリズムの分類と、コード提供と再現性のためのアクセス可能な結果の分類は、TSC分野の人気向上に寄与した。このブームから6年以上が経過し、UCRアーカイブは112のデータセットに拡張され、多くの新しいアルゴリズムが提案されている。提案したカテゴリが、当初からどのように進歩してきたかを確認し、拡張されたUCRアーカイブを用いて、以前のベスト・オブ・カテゴリに対して新しいアルゴリズムの性能を評価する。分類法を拡張して、最近の発展を反映した3つの新しいカテゴリを含める。提案した距離,間隔,シェープレット,辞書,ハイブリッドベースアルゴリズムとともに,より新しい畳み込みアルゴリズムと特徴ベースアルゴリズム,ディープラーニングアプローチを比較した。本稿では,最近アーカイブに寄贈された30の分類データセットと,tscフォーマットに再構成された分類データセットを紹介し,これらを用いて,各カテゴリのベストパフォーマンスアルゴリズムをさらに評価する。近年提案されているHydra+MultiROCKET と HIVE-COTEv2 のアルゴリズムは,現在のTSC 問題と新しい TSC 問題の両方において,他の手法よりも優れていることがわかった。

In 2017, a research paper compared 18 Time Series Classification (TSC) algorithms on 85 datasets from the University of California, Riverside (UCR) archive. This study, commonly referred to as a 'bake off', identified that only nine algorithms performed significantly better than the Dynamic Time Warping (DTW) and Rotation Forest benchmarks that were used. The study categorised each algorithm by the type of feature they extract from time series data, forming a taxonomy of five main algorithm types. This categorisation of algorithms alongside the provision of code and accessible results for reproducibility has helped fuel an increase in popularity of the TSC field. Over six years have passed since this bake off, the UCR archive has expanded to 112 datasets and there have been a large number of new algorithms proposed. We revisit the bake off, seeing how each of the proposed categories have advanced since the original publication, and evaluate the performance of newer algorithms against the previous best-of-category using an expanded UCR archive. We extend the taxonomy to include three new categories to reflect recent developments. Alongside the originally proposed distance, interval, shapelet, dictionary and hybrid based algorithms, we compare newer convolution and feature based algorithms as well as deep learning approaches. We introduce 30 classification datasets either recently donated to the archive or reformatted to the TSC format, and use these to further evaluate the best performing algorithm from each category. Overall, we find that two recently proposed algorithms, Hydra+MultiROCKET and HIVE-COTEv2, perform significantly better than other approaches on both the current and new TSC problems.
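For readers unfamiliar with the DTW benchmark mentioned above, the following is a minimal sketch of a DTW distance combined with a 1-nearest-neighbour classifier. It assumes univariate series, a squared point-wise cost and no warping window. The function names and toy series are illustrative and are not taken from the paper or its code.

```python
def dtw_distance(a, b):
    """Dynamic Time Warping distance between two univariate series.

    Classic O(len(a) * len(b)) dynamic programme with squared point-wise
    cost and no warping-window constraint (an assumption of this sketch).
    """
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            # Extend the cheapest of the three admissible predecessor cells.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m] ** 0.5

def nearest_neighbour_label(query, train):
    """1-NN classification: label of the training series closest under DTW.

    train: list of (series, label) pairs.
    """
    return min(train, key=lambda pair: dtw_distance(query, pair[0]))[1]

train = [([0, 0, 1, 2, 1, 0], "bump"), ([2, 2, 2, 2, 2, 2], "flat")]
predicted = nearest_neighbour_label([0, 1, 2, 2, 1, 0, 0], train)
```

DTW tolerates local stretching along the time axis, so `[1, 2, 3]` and `[1, 2, 2, 3]` are at distance zero even though their lengths differ.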

翻訳日:2023-04-26 19:19:13 公開日:2023-04-25

# 対称性、制約、長距離相互作用をまたいだ創発的流体力学とリンドブラッド低エネルギースペクトルの統一化

Unifying Emergent Hydrodynamics and Lindbladian Low Energy Spectra across Symmetries, Constraints, and Long-Range Interactions ( http://arxiv.org/abs/2304.13028v1 )

ライセンス: Link先を確認

Olumakinde Ogunnaike, Johannes Feldmeier, Jong Yeon Lee

(参考訳) 種々の対称性,制約,相互作用範囲を有するブラウンランダム回路において電荷輸送を制御する創発的流体力学を同定する。これは、二重ヒルベルト空間において有効ハミルトニアンとして作用するリンドブラッド作用素の平均動力学と低エネルギースペクトルの間の写像によって達成される。単一モード近似を用いて、この有効ハミルトニアンの分散励起状態を明示的に構成することにより、保存された多極モーメントと可変相互作用範囲を持つ多体系における拡散的、劣微分的、超拡散的緩和の包括的理解を提供する。我々はさらに,双極子保存が存在するにもかかわらず拡散緩和を示すエキゾチックなクリロフ空間分解流体力学を同定し,数値的に検証する。このアプローチは、ランダムなユニタリ時間発展の下で保存された演算子のダイナミクスを定性的に理解するための汎用的で汎用的なフレームワークを提供する。

We identify emergent hydrodynamics governing charge transport in Brownian random circuits with various symmetries, constraints, and ranges of interactions. This is accomplished via a mapping between the averaged dynamics and the low energy spectrum of a Lindblad operator, which acts as an effective Hamiltonian in a doubled Hilbert space. By explicitly constructing dispersive excited states of this effective Hamiltonian using a single mode approximation, we provide a comprehensive understanding of diffusive, subdiffusive, and superdiffusive relaxation in many-body systems with conserved multipole moments and variable interaction ranges. Our approach further allows us to identify exotic Krylov-space-resolved hydrodynamics exhibiting diffusive relaxation despite the presence of dipole conservation, which we verify numerically. Our approach provides a general and versatile framework to qualitatively understand the dynamics of conserved operators under random unitary time evolution.

翻訳日:2023-04-26 19:18:42 公開日:2023-04-25

# 公開データセットのみを持つ強く再現可能な物体検出器

A Strong and Reproducible Object Detector with Only Public Datasets ( http://arxiv.org/abs/2304.13027v1 )

ライセンス: Link先を確認

Tianhe Ren, Jianwei Yang, Shilong Liu, Ailing Zeng, Feng Li, Hao Zhang, Hongyang Li, Zhaoyang Zeng, Lei Zhang

(参考訳) この研究は、テスト時拡張なしで7億(700M)のパラメータのみを用いて、COCO val2017で64.6 AP、COCO test-devで64.8 APを達成する、強力で再現可能なオブジェクト検出モデルFocal-Stable-DINOを提示する。強力なFocalNet-Hugeバックボーンと効果的なStable-DINO検出器の組み合わせを探索する。大規模なプライベートデータやマージデータ上で膨大な数のパラメータや複雑なトレーニング技術を用いる既存のSOTAモデルとは異なり、このモデルは公開データセットObjects365のみでトレーニングされるため、アプローチの再現性が保証される。

This work presents Focal-Stable-DINO, a strong and reproducible object detection model which achieves 64.6 AP on COCO val2017 and 64.8 AP on COCO test-dev using only 700M parameters without any test time augmentation. It explores the combination of the powerful FocalNet-Huge backbone with the effective Stable-DINO detector. Different from existing SOTA models that utilize an extensive number of parameters and complex training techniques on large-scale private data or merged data, our model is exclusively trained on the publicly available dataset Objects365, which ensures the reproducibility of our approach.

翻訳日:2023-04-26 19:18:23 公開日:2023-04-25

# 見ることは常に信じるとは限らない:AI生成画像の人間の知覚に関する定量的研究

Seeing is not always believing: A Quantitative Study on Human Perception of AI-Generated Images ( http://arxiv.org/abs/2304.13023v1 )

ライセンス: Link先を確認

Zeyu Lu, Di Huang, Lei Bai, Xihui Liu, Jingjing Qu, Wanli Ouyang

(参考訳) 写真は、人間が日常生活で何を経験したかを記録するための手段であり、しばしば信頼できる情報源と見なされる。しかし、人工知能(AI)技術の進歩が偽の写真を生み出し、写真に対する混乱と信頼の低下を引き起こすのではないかという懸念が高まっている。本研究の目的は、現在のaiベースの視覚コンテンツ生成モデルが一貫して人間の目を欺き、誤った情報を伝えることができるかどうかという疑問に答えることである。 50人の被験者を対象に高品質な定量的調査を行い、人間は実際の写真とaiが生成した偽の写真とを38.7%の程度で区別できないことを初めて明らかにした。また, 性別, 年齢, 経験など個人が生成するAIGC(AIGC)の背景は, 実際の写真とAI生成画像を区別する能力に大きく影響しないことがわかった。しかし、私たちは、人々が本物と偽の写真を区別するための手がかりとなる、AI生成画像にある種の欠陥があることを観察しています。我々の研究は、AI生成画像の潜在的なリスクに対する認識を高め、偽情報の拡散を防止するためにさらなる研究を促進することを願っている。ポジティブな観点から見れば、AI生成画像は様々な産業に革命をもたらす可能性があり、もしそれが適切に使用され、規制されたら、人類にとってより良い未来を生み出すことができる。

Photos serve as a way for humans to record what they experience in their daily lives, and they are often regarded as trustworthy sources of information. However, there is a growing concern that the advancement of artificial intelligence (AI) technology may produce fake photos, which can create confusion and diminish trust in photographs. This study aims to answer the question of whether the current state-of-the-art AI-based visual content generation models can consistently deceive human eyes and convey false information. By conducting a high-quality quantitative study with fifty participants, we reveal, for the first time, that humans cannot distinguish between real photos and AI-created fake photos to a significant degree (38.7%). Our study also finds that an individual's background, such as their gender, age, and experience with AI-generated content (AIGC), does not significantly affect their ability to distinguish AI-generated images from real photographs. However, we do observe that there tend to be certain defects in AI-generated images that serve as cues for people to distinguish between real and fake photos. We hope that our study can raise awareness of the potential risks of AI-generated images and encourage further research to prevent the spread of false information. From a positive perspective, AI-generated images have the potential to revolutionize various industries and create a better future for humanity if they are used and regulated properly.

翻訳日:2023-04-26 19:18:11 公開日:2023-04-25

# 単一モーフィング攻撃検出における顔特徴の可視化

Face Feature Visualisation of Single Morphing Attack Detection ( http://arxiv.org/abs/2304.13021v1 )

ライセンス: Link先を確認

Juan Tapia and Christoph Busch

(参考訳) 本稿では、単一モーフィング攻撃検出において真正(bona fide)画像とモーフィング画像の検出を可能にする、異なる顔特徴抽出アルゴリズムの説明可能な可視化手法を提案する。特徴抽出は、生画像、形状、テクスチャ、周波数、圧縮に基づいている。この可視化は、国境政策、特に疑わしい画像の詳細を調べる必要がある国境警備職員向けのグラフィカルユーザーインタフェースの開発に役立つ可能性がある。ランダムフォレスト分類器は、FRLLデータベースでモーフィング画像が利用可能な、3つのランドマークベースの顔モーフィング手法と1つのStyleGANベースのモーフィング手法を対象に、leave-one-outプロトコルで訓練された。モーフィング攻撃検出では、合成画像に対しては離散コサイン変換に基づく手法が、ランドマークベースの画像特徴に対してはBSIFが最良の結果を得た。

This paper proposes an explainable visualisation of different face feature extraction algorithms that enable the detection of bona fide and morphing images for single morphing attack detection. The feature extraction is based on raw image, shape, texture, frequency and compression. This visualisation may help to develop a Graphical User Interface for border policies and specifically for border guard personnel that have to investigate details of suspect images. A Random forest classifier was trained in a leave-one-out protocol on three landmarks-based face morphing methods and a StyleGAN-based morphing method for which morphed images are available in the FRLL database. For morphing attack detection, the Discrete Cosine-Transformation-based method obtained the best results for synthetic images and BSIF for landmark-based image features.

翻訳日:2023-04-26 19:17:49 公開日:2023-04-25

# アンサンブルの認証: S-リプシッツ性による一般認証理論

Certifying Ensembles: A General Certification Theory with S-Lipschitzness ( http://arxiv.org/abs/2304.13019v1 )

ライセンス: Link先を確認

Aleksandar Petrov, Francisco Eiras, Amartya Sanyal, Philip H.S. Torr, Adel Bibi

(参考訳) ディープラーニングモデルの堅牢性の改善と保証は、活発な研究テーマとなっている。複数の分類器を組み合わせてより良いモデルを提供するアンサンブル化は、一般化、不確実性推定、キャリブレーション、概念ドリフトの影響の緩和に有効であることが示されている。しかし、認証された堅牢性に対するアンサンブル化の影響は、あまり理解されていない。本研究では、S-リプシッツ(S-Lipschitz)分類器を導入してリプシッツ連続性を一般化し、これを用いてアンサンブルの理論的堅牢性を分析する。我々の結果は、ロバストな分類器のアンサンブルがどの構成分類器よりもロバストになる条件と、逆にロバスト性が低下する条件を厳密に示すものである。

Improving and guaranteeing the robustness of deep learning models has been a topic of intense research. Ensembling, which combines several classifiers to provide a better model, has been shown to be beneficial for generalisation, uncertainty estimation, calibration, and mitigating the effects of concept drift. However, the impact of ensembling on certified robustness is less well understood. In this work, we generalise Lipschitz continuity by introducing S-Lipschitz classifiers, which we use to analyse the theoretical robustness of ensembles. Our results give precise conditions under which ensembles of robust classifiers are more robust than any constituent classifier, as well as conditions under which they are less robust.

翻訳日:2023-04-26 19:17:34 公開日:2023-04-25

# DuETT: 電子健康記録用のデュアルイベントタイムトランスフォーマー

DuETT: Dual Event Time Transformer for Electronic Health Records ( http://arxiv.org/abs/2304.13017v1 )

ライセンス: Link先を確認

Alex Labach, Aslesha Pokhrel, Xiao Shi Huang, Saba Zuberi, Seung Eun Yi, Maksims Volkovs, Tomi Poutanen, Rahul G. Krishnan

(参考訳) 病院で記録された電子健康記録(EHR)は、通常、高いスパース性と不規則な観測を特徴とする幅広い数値時系列データを含んでいる。このようなデータを効果的にモデル化するには、時系列としての性質、異なる種類の観測の間の意味的な関係、およびデータのスパース構造に含まれる情報を活用する必要がある。自己教師付きTransformerは、NLPやコンピュータビジョンの様々な構造化タスクにおいて優れた性能を示している。しかし、多変量時系列データには、時間と記録されたイベントタイプという2つの次元にわたる構造化された関係が含まれており、時系列データへのTransformerの単純な適用では、この固有の構造を活用できない。また、自己注意層の計算量は系列長の2乗で増加するため、適切な入力設計がなければ入力系列長が大きく制限されうる。我々は、時間とイベントタイプの両方の次元に注意を向けるように設計されたTransformerの拡張であるDuETTアーキテクチャを導入し、EHRデータからロバストな表現を生成する。DuETTは、スパースな時系列を固定長の規則的な系列に変換した集約入力を使用する。これにより、従来のEHR Transformerモデルと比較して計算の複雑さが低下し、より重要なことに、より大きく深いニューラルネットワークの使用が可能になる。モデルの事前学習に豊かで有益な信号を与える自己教師付き予測タスクで訓練すると、本モデルはMIMIC-IVおよびPhysioNet-2012 EHRデータセットに基づく複数の下流タスクにおいて、最先端のディープラーニングモデルを上回る。

Electronic health records (EHRs) recorded in hospital settings typically contain a wide range of numeric time series data that is characterized by high sparsity and irregular observations. Effective modelling for such data must exploit its time series nature, the semantic relationship between different types of observations, and information in the sparsity structure of the data. Self-supervised Transformers have shown outstanding performance in a variety of structured tasks in NLP and computer vision. But multivariate time series data contains structured relationships over two dimensions: time and recorded event type, and straightforward applications of Transformers to time series data do not leverage this distinct structure. The quadratic scaling of self-attention layers can also significantly limit the input sequence length without appropriate input engineering. We introduce the DuETT architecture, an extension of Transformers designed to attend over both time and event type dimensions, yielding robust representations from EHR data. DuETT uses an aggregated input where sparse time series are transformed into a regular sequence with fixed length; this lowers the computational complexity relative to previous EHR Transformer models and, more importantly, enables the use of larger and deeper neural networks. When trained with self-supervised prediction tasks that provide rich and informative signals for model pre-training, our model outperforms state-of-the-art deep learning models on multiple downstream tasks from the MIMIC-IV and PhysioNet-2012 EHR datasets.

翻訳日:2023-04-26 19:17:21 公開日:2023-04-25
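
The aggregated input mentioned in the DuETT abstract above (sparse, irregular observations turned into a regular sequence of fixed length) can be illustrated with a minimal sketch. The abstract does not say how values are aggregated, so the per-cell mean and the observation count used here are assumptions, and the function name `aggregate_events` and its parameters are hypothetical, not taken from the authors' code.

```python
def aggregate_events(events, n_event_types, n_bins, t_max):
    """Bin irregular (time, event_type, value) observations onto a fixed grid.

    Returns two n_bins x n_event_types nested lists:
      means  - mean observed value per cell (0.0 where nothing was observed)
      counts - number of observations per cell, which keeps the sparsity pattern
    Assumption: mean/count aggregation is illustrative, not the paper's method.
    """
    sums = [[0.0] * n_event_types for _ in range(n_bins)]
    counts = [[0] * n_event_types for _ in range(n_bins)]

    for t, event_type, value in events:
        # Map a timestamp in [0, t_max] to a bin index in [0, n_bins - 1].
        b = min(int(t / t_max * n_bins), n_bins - 1)
        sums[b][event_type] += value
        counts[b][event_type] += 1

    means = [
        [s / c if c else 0.0 for s, c in zip(sum_row, count_row)]
        for sum_row, count_row in zip(sums, counts)
    ]
    return means, counts
```

Whatever the length of the original record, the output always has `n_bins` rows, so the sequence length a Transformer has to attend over is fixed in advance; the count grid is one simple way to keep the information carried by the sparsity pattern.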

# サブサンプルリッジアンサンブル: 同値性と一般化交差検証

Subsample Ridge Ensembles: Equivalences and Generalized Cross-Validation ( http://arxiv.org/abs/2304.13016v1 )

ライセンス: Link先を確認

Jin-Hong Du, Pratik Patil, Arun Kumar Kuchibhotla

(参考訳) 本研究では、特徴数がサンプルサイズに比例して増加し、その比が定数に収束する比例漸近領域において、サブサンプリングに基づくリッジアンサンブルを研究する。リッジアンサンブルの2乗予測リスクを、明示的なペナルティ$\lambda$と極限サブサンプルアスペクト比$\phi_s$(特徴数とサブサンプルサイズの比)の関数として解析することにより、任意の達成可能なリスクに対する$(\lambda, \phi_s)$平面上の等高線を特徴づける。その結果として、最適なフルリッジレスアンサンブル(すべての可能なサブサンプルで学習したもの)のリスクが、最適なリッジ予測器のリスクと一致することを証明する。さらに、リッジアンサンブルの予測リスクの推定について、サブサンプルサイズにわたる一般化交差検証(GCV)の強一様一致性を証明する。これにより、サンプル分割なしでGCVに基づくフルリッジレスアンサンブルのチューニングが可能となり、リスクが最適リッジリスクと一致する予測器が得られる。

We study subsampling-based ridge ensembles in the proportional asymptotics regime, where the feature size grows proportionally with the sample size such that their ratio converges to a constant. By analyzing the squared prediction risk of ridge ensembles as a function of the explicit penalty $\lambda$ and the limiting subsample aspect ratio $\phi_s$ (the ratio of the feature size to the subsample size), we characterize contours in the $(\lambda, \phi_s)$-plane at any achievable risk. As a consequence, we prove that the risk of the optimal full ridgeless ensemble (fitted on all possible subsamples) matches that of the optimal ridge predictor. In addition, we prove strong uniform consistency of generalized cross-validation (GCV) over the subsample sizes for estimating the prediction risk of ridge ensembles. This allows for GCV-based tuning of full ridgeless ensembles without sample splitting and yields a predictor whose risk matches optimal ridge risk.

翻訳日:2023-04-26 19:16:57 公開日:2023-04-25
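
As a rough illustration of the objects discussed in the abstract above, the sketch below averages ridge predictors fitted on random subsamples of size `k` and evaluates the generic GCV formula for a linear smoother, `mean(residual^2) / (1 - tr(S)/n)^2`. The function names, the normalisation of the Gram matrix by `k`, and the use of a finite number of random subsamples are assumptions made for this example; the estimator analysed in the paper may differ in detail.

```python
import numpy as np


def subsample_ridge_ensemble(X, y, lam, k, n_members, rng):
    """Average n_members ridge fits, each on a random subsample of size k.

    Returns the averaged coefficient vector and the n x n smoother matrix S
    of the ensemble on the full data, so that S @ y equals X @ beta.
    Requires lam > 0 so that every linear system is well posed.
    """
    n, p = X.shape
    beta = np.zeros(p)
    S = np.zeros((n, n))
    for _ in range(n_members):
        idx = rng.choice(n, size=k, replace=False)  # subsample without replacement
        Xs = X[idx]
        # A = (Xs^T Xs / k + lam * I)^{-1} Xs^T / k, shape (p, k)
        A = np.linalg.solve(Xs.T @ Xs / k + lam * np.eye(p), Xs.T / k)
        beta += A @ y[idx] / n_members
        # Only the subsampled columns of S receive weight from this member.
        S[:, idx] += X @ A / n_members
    return beta, S


def gcv_estimate(X, y, beta, S):
    """Generic GCV for a linear smoother: training error inflated by 1/(1 - tr(S)/n)^2."""
    n = len(y)
    residual = y - X @ beta
    return np.mean(residual ** 2) / (1.0 - np.trace(S) / n) ** 2
```

Because the estimate only needs the training residuals and the trace of `S`, it can be computed over a grid of `lam` and `k` values without holding out any data, which is the practical appeal of GCV-based tuning described in the abstract.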

PDF登録状況（公開日: 20230425）